Updates to my learnings on LLM integration

A few days back, in my post coding-with-llm, I mentioned a few challenges I faced when coding with LLMs. Since then, my understanding has been updated.

Structured Data

I mentioned that an SDK should help with prompting for structured JSON output, and I later found that the Vercel AI SDK does the job. It not only lets you specify a schema with zod, but also handles malformed JSON with built-in retries to guarantee a valid response.
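Here is a minimal sketch of what that looks like with the SDK's generateObject; the model name, schema, and prompt are illustrative:

```ts
import { generateObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

// generateObject validates the model's output against the zod schema,
// so the result is type-safe rather than a raw JSON string.
const { object } = await generateObject({
  model: openai('gpt-4o'),
  schema: z.object({
    title: z.string(),
    tags: z.array(z.string()),
  }),
  prompt: 'Extract a title and a list of tags from this article: ...',
});

console.log(object.title, object.tags);
```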

Managing Complexity

On managing complexity, the general rule is to solve one problem at a time. It turns out that, given LLMs' capability, complexity can also be managed by the model itself, which is how agents work these days. This is revolutionary compared with traditional programming: a key "intelligent" engineering practice, breaking a problem down, can now be handled by AI. It's still arguable whether LLMs have mastered decomposition, whereas there is probably consensus that they have mastered coding (the implementation of unambiguous logic in code).
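As a sketch of what "the model manages the complexity" means in practice, here is a multi-step tool-calling loop using the Vercel AI SDK. The searchDocs tool is hypothetical, and field names vary a bit across SDK versions; the point is that maxSteps lets the model chain tool calls and decompose the task on its own:

```ts
import { generateText, tool } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

// searchDocs is a stand-in tool; in a real agent it would hit a search index.
const { text, steps } = await generateText({
  model: openai('gpt-4o'),
  tools: {
    searchDocs: tool({
      description: 'Search the project documentation',
      parameters: z.object({ query: z.string() }),
      execute: async ({ query }) => `stub results for "${query}"`,
    }),
  },
  maxSteps: 5, // the model decides how many tool calls the task needs
  prompt: 'Find how retries are configured and summarize the steps.',
});

console.log(steps.length, text);
```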

Response Time

I mentioned a few workarounds for mitigating long response times. But I later realized that optimization won't address much of the user experience issue, because of how LLMs work. A model spills out tokens one by one, at around 200-300 tokens/second, meaning most responses take seconds; with reasoning models it's even worse. So, user-experience-wise, a different mode of interaction is needed compared with traditional applications, and the answers seem to be:

  • Chat. This is probably why almost all LLM applications today are some kind of chat: the response is streamed, so the user doesn't have to wait through a long period seeing no progress. By the way, the Vercel AI SDK provides a streamObject option for this (see the sketch after this list).
  • Batch processing, when the scenario doesn't require instant feedback.
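Here is a sketch of streaming a structured response with streamObject; the schema and prompt are illustrative:

```ts
import { streamObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

const { partialObjectStream } = await streamObject({
  model: openai('gpt-4o'),
  schema: z.object({
    summary: z.string(),
    bullets: z.array(z.string()),
  }),
  prompt: 'Summarize this changelog: ...',
});

// Render each partial object as tokens arrive, instead of blocking on the
// full response.
for await (const partial of partialObjectStream) {
  console.log(partial);
}
```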

Multi-Model Strategy

I used a few different models in my project, weighing factors like cost and response time. But I found that larger models simply perform better; using a smaller model to save money and a bit of time probably doesn't make sense if quality is compromised.

For LLM API consumers, though, using different models might still make sense, depending on how each model performs on different types of tasks. The big players may eventually address this themselves with things like LLM MoE (mixture-of-experts) routing, and then the main remaining benefit of using multiple models would be portability.
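That portability is largely what the SDK's provider abstraction buys you. A minimal sketch, where the model IDs and the routing condition are illustrative:

```ts
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';
import { anthropic } from '@ai-sdk/anthropic';

// The call site stays identical; switching providers is a one-line change.
const model = process.env.TASK === 'hard'
  ? anthropic('claude-3-5-sonnet-latest')
  : openai('gpt-4o-mini');

const { text } = await generateText({
  model,
  prompt: 'Classify this support ticket: ...',
});

console.log(text);
```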