Lessons Learned from Integrating LLMs in Programming
Traditional programming relies on clearly defined rules that can be translated into code and executed by a computer. However, not all problems can be neatly described as codable rules. This is where AI—specifically Large Language Models (LLMs)—shines.
Recently, I’ve been experimenting with LLM APIs for a hobby project, and I’ve learned several key lessons about coding with LLMs. These insights might hint at how programming could evolve if LLMs become essential components of applications.
While today’s LLMs are powerful, they’re not perfect. Rather than dwelling on their flaws, this article focuses on practical ways to work around their limitations. As LLMs keep evolving, some of these limitations may eventually be solved, but I feel several of the points below are not likely to be addressed any time soon.
The Application
My project is a traditional web app with a JS/TypeScript frontend and a basic Go backend. Its primary function is to let users set a goal, and then use an LLM to suggest actionable steps to achieve it.
Structured Data Challenges
Since the app relies on structured data (e.g., JSON), I needed the LLM to consistently produce well-formatted outputs. While LLMs can generate JSON, manually including schema examples in prompts is cumbersome:
const prompt = `
Given the following goal statement, extract the measurable goal as a JSON object with this format:
{
"target": { "unit": "...", "value": ... },
"duration": { "unit": "...", "value": ... }
}
If any value is missing, use null.
Statement: """${statement}"""
JSON:
`;
const content = await sendAiPrompt(prompt, user.token, aiConfig);

An extra layer could be added on top of the OpenAI interfaces to declare the expected response schema, which can then be translated into additional prompt text:
interface Task {
  title: string;
  description: string;
}

const sampleTask = { ... } as Task;

const completion = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'developer', content: 'Talk like a pirate.' },
    { role: 'user', content: 'Are semicolons optional in JavaScript?' },
  ],
  // imagined field: the expected response schema, derived from a sample object
  responseSchema: JSON.stringify(sampleTask),
});
This is pretty straightforward; I expect one of the big-name providers will soon add something like this to their API specs.
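In the meantime, a thin wrapper can fake this layer by turning a sample object into extra prompt text. A minimal sketch, assuming the sendAiPrompt helper from the first snippet (withResponseSchema is my own name for the wrapper):

// Hypothetical wrapper: serialize a sample object into the prompt so the
// expected schema lives in one place instead of being hand-written each time.
function withResponseSchema(prompt: string, sample: unknown): string {
  return `${prompt}

Respond with JSON only, matching this shape exactly (use null for unknown values):
${JSON.stringify(sample, null, 2)}`;
}

// Usage with the earlier helper:
const taskJson = await sendAiPrompt(
  withResponseSchema(
    `Suggest one actionable task for this goal: """${statement}"""`,
    { title: '...', description: '...' } // sample Task shape
  ),
  user.token,
  aiConfig
);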
Consistency Issues
Despite explicit instructions, LLMs sometimes return JSON in unexpected formats, such as

{ "tasks": [...] }

or

{ "response": [{...}, ...] }

This inconsistency highlights the need for robust error handling:
try {
  return JSON.parse(jsonString) as T;
} catch {
  // mandatory handling of an unexpected JSON string; fall back to null
  return null;
}

With reasoning models, the "think" section of the output might also contain JSON structure, so your JSON-extraction logic needs to account for that. The rule that I figured out for the AI to code for me is:
/**
* Extracts the last JSON object from a string and parses it.
* - Prefer JSON inside ```json ...``` code blocks (from the end)
* - Then prefer JSON inside any ```...``` code block (from the end)
* - Last resort: fallback to last {...} in the string
* Returns null if parsing fails.
 */

Still, parsing can fail, reinforcing the need for graceful error handling.
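For reference, here is a sketch of an implementation that follows the rule above (my own illustration; the brace scan is naive and ignores braces inside strings, and it returns null on failure, so callers still need the graceful handling just mentioned):

function extractLastJson<T>(text: string): T | null {
  // Candidate sources, most specific first; within each, try from the end.
  const fenced = (re: RegExp) => [...text.matchAll(re)].map((m) => m[1]);
  const sources = [
    fenced(/```json\s*([\s\S]*?)```/gi), // ```json ... ``` blocks
    fenced(/```\w*\s*([\s\S]*?)```/g),   // any ``` ... ``` block
    balancedObjects(text),               // raw {...} spans as a last resort
  ];

  for (const candidates of sources) {
    for (let i = candidates.length - 1; i >= 0; i--) {
      try {
        return JSON.parse(candidates[i]) as T;
      } catch {
        // keep trying earlier / looser candidates
      }
    }
  }
  return null;
}

// Collects every brace-balanced {...} span in the text.
function balancedObjects(text: string): string[] {
  const spans: string[] = [];
  for (let i = 0; i < text.length; i++) {
    if (text[i] !== '{') continue;
    let depth = 0;
    for (let j = i; j < text.length; j++) {
      if (text[j] === '{') depth++;
      if (text[j] === '}' && --depth === 0) {
        spans.push(text.slice(i, j + 1));
        break;
      }
    }
  }
  return spans;
}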
Managing Complexity
Initially, I asked the LLM to break down goals into hierarchical, structured plans. Despite prompt tweaks, the results were unsatisfactory. Yet, when I chatted with the same LLM directly, it provided excellent unstructured advice.
I realized that expecting a complex output from a single prompt violates basic engineering principles. Instead, I now:
- Request raw, unstructured responses for high-level guidance.
- Use follow-up prompts or algorithms to parse and structure the output.
Lesson: Even with AI’s intelligence, solving one problem at a time remains the best approach.
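As a rough sketch of what this two-step flow looks like in code (reusing sendAiPrompt, extractLastJson and the Task interface from earlier; the prompt wording is only illustrative):

// Step 1: ask for free-form, high-level advice -- no structure required yet.
const advice = await sendAiPrompt(
  `Give practical, high-level advice for reaching this goal: """${statement}"""`,
  user.token,
  aiConfig
);

// Step 2: a follow-up prompt whose only job is to structure that advice.
const tasks = extractLastJson<Task[]>(
  await sendAiPrompt(
    `Convert the advice below into a JSON array of tasks, each with "title" and "description".
Advice: """${advice}"""`,
    user.token,
    aiConfig
  )
) ?? [];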
Response Time Optimization
LLM API calls often take more than a second, while user interactions usually need to feel instant (under ~0.2s). There are a few options I figured out to mitigate this:
- Simplify requests – Fewer tokens = faster responses.
- Use lighter models when full reasoning isn’t needed.
- Parallelize requests (but watch for rate limits).
- Lock the UI during processing (e.g., React’s useEffect triggers).
- Show progress indicators to manage user expectations.
Note: Parallel requests may trigger rate limits. Workarounds include multi-provider fallbacks or multiple API keys.
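A sketch of the parallel-plus-fallback idea (subGoals, primaryConfig, and backupConfig are made-up names; the fallback simply retries rejected requests against a second provider):

// Fan out independent sub-prompts instead of awaiting them one by one.
const prompts = subGoals.map((g) => `Suggest one concrete next step for: """${g}"""`);

const results = await Promise.allSettled(
  prompts.map((p) => sendAiPrompt(p, user.token, primaryConfig))
);

// Naive fallback: anything that failed (e.g. a 429 rate limit) is retried elsewhere.
const steps = await Promise.all(
  results.map((r, i) =>
    r.status === 'fulfilled' ? r.value : sendAiPrompt(prompts[i], user.token, backupConfig)
  )
);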
Multi-Model Strategy
Not all tasks require the most advanced (and expensive) model. Choose based on:
- Importance of the request.
- Complexity of the task.
- Cost vs. benefit.
- Response time requirements.
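In practice this can be a small routing function. A sketch with made-up tiers (the model names are just examples):

type TaskTier = 'simple' | 'standard' | 'complex';

// Route to a cheaper, faster model unless the task really needs full reasoning.
function pickModel(tier: TaskTier, userFacing: boolean): string {
  if (tier === 'complex') return 'gpt-4o';                 // full reasoning, slower, pricier
  if (tier === 'standard' && !userFacing) return 'gpt-4o'; // background work can tolerate latency
  return 'gpt-4o-mini';                                    // default: fast and cheap
}

// e.g. extracting a measurable goal from a short statement, shown to the user immediately:
const model = pickModel('simple', true);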
Application Architecture
Since LLM API keys must stay server-side, requests should route through a backend. For prototyping, I found a hybrid approach effective:
- The backend acts as a secure proxy, handling authentication and API routing.
- The frontend manages prompts, enabling rapid iteration when testing different providers or adjusting for inconsistencies.
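A sketch of what the sendAiPrompt helper used throughout can look like under this hybrid setup (the /api/ai route and request shape are invented for illustration; the backend holds the provider API key):

// Frontend side: prompts are built here, but the request goes to our own backend,
// which authenticates the user and forwards the call with the server-side API key.
async function sendAiPrompt(
  prompt: string,
  userToken: string,
  aiConfig: { model: string }
): Promise<string> {
  const res = await fetch('/api/ai', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${userToken}`, // our app's auth token, not the provider key
    },
    body: JSON.stringify({ prompt, model: aiConfig.model }),
  });
  if (!res.ok) throw new Error(`AI proxy request failed: ${res.status}`);
  const data = await res.json();
  return data.content as string;
}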
Future Concerns
LLM providers frequently update models, which can introduce subtle behavioral changes. Unlike traditional APIs, where breaking changes are usually infrastructure-related and easier to detect, a behavior change in an LLM can easily go unnoticed and quietly lead to unexpected results.