3.LLM Application Logic
👉 #LLM #AI #RAG #Fine-tuning #System-Architecture
1. Mastering the Foundational Logic of LLM Application Development
📅 2026.03.23 08:43 EDT, from Gemini 2.0 Flash 📎 YouTube — Mastering the Foundational Logic of LLM Application Development
2024 perspective: this video gives a clear, accessible introduction to the foundational logic of LLM (Large Language Model) application development. Its core thesis: in the AGI (Artificial General Intelligence) era, the most valuable role is the AI application developer — someone who understands business, the boundaries of AI, and decomposition logic, and uses those to land LLM capabilities in real-world scenarios.
flowchart LR
classDef default fill:#2d2d2d,stroke:#ccc,color:#fff;
classDef titleNode fill:none,stroke:none,color:#FFD700,font-weight:bold,font-size:18px;
%% 1. Foundations
subgraph Principles [" "]
direction LR
PT["🔹 1. Foundations"]:::titleNode
P1[Next Token Prediction<br>Word continuation]
P2[Scaling Law<br>Scaling effect]
P3[Transformer<br>Bedrock architecture]
end
%% 2. Application Modes
subgraph Modes [" "]
direction LR
MT["🔹 2. Application Modes"]:::titleNode
M1[Embedded<br>Silent processing]
M2[Copilot<br>Human-AI co-pilot]
M3[Agent<br>Autonomous decomposition]
end
%% 3. Technology Arsenal
subgraph Tech [" "]
direction LR
TT["🔹 3. Technology Arsenal"]:::titleNode
T1[Prompt Engineering]
T2[RAG<br>Retrieval-augmented / open-book]
T3[Function Calling<br>External tools]
T4[Fine-tuning<br>Parameter tuning / closed-book]
end
%% 4. Decision Path
subgraph Workflow ["🔹 4. Decision Path in Practice"]
direction LR
W1[1. Pure Prompt <br>Test native ceiling] --> W2{Choose path<br>by need}
W2 -- Private data --> R1[RAG]
W2 -- System interaction --> R2[Function Calling]
W2 -- Extreme responsiveness --> R3[Fine-tuning]
end
%% Module connections
Principles ==> Modes ==> Tech ==> Workflow
%% Module styling
style Principles fill:#333,stroke:#fff,stroke-width:2px
style Modes fill:#4a148c,stroke:#fff,stroke-width:2px
style Tech fill:#01579b,stroke:#fff,stroke-width:2px
style Workflow fill:#d4a017,stroke:#fff,stroke-width:2px
1.1. The Underlying Principle of LLMs and Core Concepts
- The underlying principle of an LLM can be summarized as Next Token Prediction (word continuation): having learned statistical patterns from a massive corpus, the model computes a probability for each candidate next token, emits one, and repeats.
- Key terminology
- Token: the smallest unit a model processes; rule of thumb: 1000 tokens ≈ 500 Chinese characters, though the exact ratio depends on the tokenizer.
- Transformer: the core architecture of today's mainstream LLMs (e.g., GPT); the bedrock of the AI industry.
- Scaling Law: an empirical rule that model performance improves predictably as parameter count, training data, and compute grow; "throw more force at it and miracles happen".
1.2. The Three AI Application Modes — Classified by How Dominant AI Is in the Business Process
- Embedded mode — human-led; AI as a background utility silently processing data (e.g., auto-generated meeting minutes).
- In this mode, AI is "imperceptible". It is integrated into the existing business flow; users may not even need to talk to AI directly — AI just handles a specific step in the background as a utility tool.
- Core logic: humans drive the business process, AI automates one specific step.
- Examples
- Meeting minutes: you finish a Zoom or Teams meeting as usual; after closing the window, the system emails you AI-generated minutes.
- Access control: a security camera at your community gate uses AI to recognize your face and open the gate.
- Content moderation: when you post on a social platform, background AI scans for prohibited words; if it passes, the post is published, otherwise blocked.
- Copilot mode — human-AI collaboration; the human holds the steering wheel, AI assists (e.g., code completion).
- This is currently the dominant mode. AI is present in real time as an assistant, but does not have final decision authority.
- Core logic: human-AI collaboration; the human holds the "steering wheel" and decides direction; AI provides suggestions, drafts, or completions.
- Examples
- GitHub Copilot (code completion): a programmer types a few characters; AI auto-completes the code, but the programmer must review for bugs before committing.
- Smart writing assistant: while writing email or docs, AI suggests the next sentence or polishes paragraphs, but you decide whether to send.
- AI-assisted medical diagnosis: AI scans an X-ray and flags suspected lesions, but the final diagnosis must be signed by a human doctor.
- Agent mode — AI-led; the user sets a goal, AI autonomously decomposes the task, calls tools, and self-corrects.
- This is the highest form the industry is currently pursuing. AI no longer just follows instructions — it has reasoning and task decomposition capability.
- Core logic: AI drives the process. The user only specifies a high-level goal, and the AI autonomously decides the path, calls tools, finds errors, and corrects itself.
- Examples
- Automated travel planning: you say "plan a 5-day trip to Tokyo with a 2-week budget; book hotels and flights too." [11:40] The AI checks flights, compares hotels, checks weather, self-corrects if a hotel is full, and hands you the result. [11:51]
- Automated data engineering: you give the AI database access and a goal: "analyze the characteristics of churned users in the last three months and generate a visual report". The Agent writes its own SQL, executes it, adjusts logic when data is missing, and finally produces a chart for you.
- The three AI application modes are essentially classified by human-in-the-loop and autonomy.
- This is a classic taxonomy that was widely used across the industry in 2024 and 2025.
- Intelligence requirements and development difficulty grow progressively across the three.
- In real adoption, start with the simplest Embedded or Copilot; once the business logic is mature, consider building a complex Agent.
| Dimension | Embedded | Copilot | Agent |
|---|---|---|---|
| Driver | Human (full control) | Human-AI co-pilot (human steers) | AI-led (user sets the goal) |
| Interaction | Trigger / silent background | Conversational / real-time completion | Goal-driven |
| AI responsibility | Process specific data points | Assist generation, suggest options | Decompose tasks, execute autonomously |
| Intelligence | Lower (specific tasks) | Medium (needs context) | Very high (planning + correction) |
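To make the Agent column concrete, here is a minimal sketch of the "goal in, plan, act, self-correct" loop, modeled on the travel-planning example where a hotel turns out to be full. The tools `search_hotels` and `book_hotel` are hypothetical stubs with hard-coded data, and the "planning" is a fixed rule (cheapest first) standing in for what an LLM would decide in a real Agent.

```python
# Hypothetical tools; in a real system these would wrap external booking APIs.
def search_hotels(city: str) -> list[dict]:
    return [
        {"name": "Hotel A", "price": 120, "available": False},
        {"name": "Hotel B", "price": 150, "available": True},
    ]

def book_hotel(hotel: dict) -> str:
    if not hotel["available"]:
        raise RuntimeError(f"{hotel['name']} is full")
    return f"booked {hotel['name']}"

def run_agent(goal_city: str, max_attempts: int = 3) -> list[str]:
    """Plan, act, observe the result, and retry with the next option on failure."""
    log: list[str] = []
    # "Planning" is a fixed rule here; an LLM would produce this plan in practice.
    candidates = sorted(search_hotels(goal_city), key=lambda h: h["price"])
    log.append(f"plan: try {len(candidates)} hotels, cheapest first")
    for hotel in candidates[:max_attempts]:
        try:
            log.append(book_hotel(hotel))  # act
            return log
        except RuntimeError as err:        # observe the failure and self-correct
            log.append(f"self-correct: {err}, trying next option")
    log.append("gave up: no hotel could be booked")
    return log

for line in run_agent("Tokyo"):
    print(line)
# plan: try 2 hotels, cheapest first
# self-correct: Hotel A is full, trying next option
# booked Hotel B
```

In Embedded or Copilot mode the `except` branch would hand control back to a human; in Agent mode the loop itself decides what to try next.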
- 2026 AI applications no longer emphasize which mode they are — the focus is on the maturity of the Orchestration Layer.
- The core challenge has shifted from "can AI be autonomous" to "how do we monitor and evaluate the quality of these autonomous behaviors at scale".
- The video's classification is reasonable but not forward-looking.
- It is good for explaining "what AI can do" to non-technical executives, but not as a sole architectural basis for technology selection.
- From the 2026 vantage point, the classification is logically clean but shows clear over-simplification in real adoption and evolution.
- The video's classification is linear, assuming AI capability moves smoothly from "small utility" to "co-pilot" to "independent agent". In real businesses these boundaries are blurring.
- The Copilot–Agent boundary is collapsing
- Recent products such as GitHub Copilot (agent mode and coding agent) or Microsoft 365 Copilot agents no longer just wait for instructions; they can carry out tasks autonomously in the background.
- A Copilot can run tests, fix bugs, and submit a PR while you sleep. Is that still a Copilot or already an Agent? The classification fails.
- Multi-Agent System ignored
- The video focuses on a "single AI" identity; current trends emphasize orchestration.
- In real workflows it is not a single powerful Agent solving everything, but a "manager Agent" coordinating a team of "specialist Agents".
- Governance and security ignored
- This taxonomy only talks about "what AI can do", not "what we dare let it do".
- In data engineering, Agent mode's biggest blocker is not intelligence but permission over-exposure.
- The new 2026 trend: from modes to roles. Industry consensus is shifting from "application modes" to digital workforce architecture. You see the taxonomy evolving into:
- Task-Specific Agents
- Stop emphasizing "Embedded vs. Copilot"; emphasize expertise instead.
- For example, a dedicated finance reconciliation Agent that can be Embedded into ERP or act as a Copilot answering queries.
- Agentic Workflows
- An idea popularized by Andrew Ng and widely adopted in practice by 2026.
- Rather than chasing one all-powerful Agent, decompose the process into: Prompt → Iteration → Tool-Use → Reflection → Output.
- In this pipeline, AI's identity is dynamic: Copilot when drafting, Agent when self-checking.
- Two harder metrics for data engineering:
- Deterministic vs. probabilistic
- Embedded often pursues determinism (output must conform to a schema).
- Agent is highly probabilistic, which is a huge risk for production data modeling.
- Granularity of human-in-the-loop (HITL)
- The 2026 best practice is no longer "human steers the wheel" but checkpoint-based control.
- AI executes 80% of the task autonomously, but at critical decision points (e.g., `DROP TABLE` or modifying a production API) it pauses and waits for human confirmation.
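Checkpoint-based control can be sketched as a gate in front of the executor: routine statements run unattended, while statements matching a "critical" pattern wait for an approval callback. The pattern list, function names, and the `approve` callback are assumptions for illustration; a real deployment would define its own policy and would likely parse SQL rather than pattern-match it.

```python
import re
from typing import Callable

# Assumed policy: statements matching these patterns count as critical decision points.
RISKY_PATTERNS = [r"\bDROP\s+TABLE\b", r"\bTRUNCATE\b", r"\bDELETE\s+FROM\b"]

def needs_checkpoint(sql: str) -> bool:
    """Return True when the statement should pause for human confirmation."""
    return any(re.search(p, sql, flags=re.IGNORECASE) for p in RISKY_PATTERNS)

def run_with_checkpoints(
    statements: list[str],
    execute: Callable[[str], object],
    approve: Callable[[str], bool],
) -> list[tuple[str, str]]:
    """Run safe statements directly; run risky ones only if `approve` says yes."""
    results = []
    for sql in statements:
        if needs_checkpoint(sql) and not approve(sql):
            results.append(("skipped", sql))
            continue
        execute(sql)
        results.append(("executed", sql))
    return results

# Usage: the "database" is just a list, and the human reviewer rejects everything.
executed: list[str] = []
plan = ["SELECT count(*) FROM users", "DROP TABLE users_tmp"]
outcome = run_with_checkpoints(plan, execute=executed.append, approve=lambda sql: False)
print(outcome)
# [('executed', 'SELECT count(*) FROM users'), ('skipped', 'DROP TABLE users_tmp')]
```

In production the `approve` callback would typically block on a ticket, chat prompt, or review UI rather than return immediately.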
1.3. The Four Mainstream Technology Arsenals
- Prompt Engineering
- The foundation; the core mantra is to treat the AI as an extremely intelligent intern with no background knowledge: provide enough background and set clear constraints. [09:21]
- RAG (Retrieval-Augmented Generation)
- Hook the AI up to an external knowledge base (e.g., a vector database) so it can take an "open-book exam"; suitable when you need fresh or private data. [13:31]
- Function Calling
- Give the AI "hands and feet" — let it actively call external systems (e.g., weather lookup, finance system query). The model analyzes the request and proposes which tool to call. [14:22]
- Fine-tuning
- Train on thousands of Q&A pairs to adjust the model's parameters and bake knowledge into the model itself, like a "closed-book exam"; suitable when output format, response speed, or a specific personality must be tightly controlled. [15:06]
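The "open-book exam" idea behind RAG can be sketched in a few lines: score each document against the question, keep the best matches, and paste them into the prompt. The bag-of-words `embed` function below is a deliberately crude stand-in for a real embedding model and vector database, and the documents and prompt wording are made up for the example.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': word counts. A real system would call an embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str, docs: list[str], k: int = 1) -> str:
    """Augment the question with retrieved context before sending it to the model."""
    context = "\n".join(retrieve(query, docs, k))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"

# Made-up private knowledge base.
docs = [
    "Refunds are processed within 14 days of the return request.",
    "The office is closed on public holidays.",
    "Support tickets are answered within one business day.",
]
print(build_prompt("How long do refunds take?", docs))
```

The model's parameters never change here, which is the contrast with Fine-tuning: updating the knowledge base only requires editing `docs`.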
1.4. Decision Logic in Practice
When facing a specific business pain point, follow this three-step logic:
1. Prepare test data [17:36]
   - Manually test off-the-shelf LLMs with pure prompts to learn their raw capability ceiling.
   - If raw data quality is too poor, no architecture will work.
2. Choose the technology path [18:05]
   - Pure dialogue solves it: choose Prompt Engineering.
   - Involves private/real-time documents: choose RAG.
   - Involves external system actions: choose Function Calling.
3. Assess whether fine-tuning is needed [18:41]
   - Only consider Fine-tuning when: user volume makes RAG too costly, response speed must be sub-second, or you need to enforce a complex output format.
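The decision logic above can be written down as a small rule function. This is only a sketch of the notes' three steps: the `Requirements` field names are invented for the example, and real selection involves judgment that a handful of booleans cannot capture.

```python
from dataclasses import dataclass

@dataclass
class Requirements:
    # Hypothetical flags describing the business need.
    needs_private_or_realtime_docs: bool = False
    needs_external_actions: bool = False
    rag_too_costly_at_scale: bool = False
    needs_subsecond_latency: bool = False
    needs_strict_complex_format: bool = False

def choose_techniques(req: Requirements) -> list[str]:
    """Map requirements to techniques, following the three-step logic."""
    # Step 1: always start with pure prompts to find the native ceiling.
    techniques = ["Prompt Engineering"]
    # Step 2: add a path per need.
    if req.needs_private_or_realtime_docs:
        techniques.append("RAG")
    if req.needs_external_actions:
        techniques.append("Function Calling")
    # Step 3: fine-tuning only under the listed conditions.
    if (req.rag_too_costly_at_scale
            or req.needs_subsecond_latency
            or req.needs_strict_complex_format):
        techniques.append("Fine-tuning")
    return techniques

print(choose_techniques(Requirements(needs_private_or_realtime_docs=True)))
# ['Prompt Engineering', 'RAG']
```

Note that the paths are not mutually exclusive: a product that reads private documents and also triggers actions would get both RAG and Function Calling.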
1.X. Supplement: The "Toolbox" or "Development Paradigm" of AI Engineering
- The four concepts below are not underlying principles on the same level as RAG or Agent; they are wrappers and tools used to implement that underlying logic.
- The "four weapons" (Prompt, RAG, Function Calling, Fine-tuning) in the notes are raw materials and technical principles.
- The "three modes" (Embedded, Copilot, Agent) are final product forms.
- LangChain / LangGraph are the production equipment that turns raw materials into products.
- Vibe Coding is the directorial style you use when operating that equipment.
| Concept | Position in this framework | Role type |
|---|---|---|
| AI Skill | Execution unit of Agent mode | Capability component (Feature) |
| LangChain | Scaffolding for RAG / Function Calling | Toolkit (SDK / Library) |
| LangGraph | Engine for Agent mode | State-orchestration framework (Engine) |
| Vibe Coding | Development methodology in the Copilot/Agent era | Production paradigm (Paradigm) |
- AI Skills: a granular unit of capability — not a concrete framework, but a logical concept.
- Position: equivalent to a specific Transform step or UDF (User Defined Function) in a data pipeline.
- Relationship: in Agent mode, the Agent decomposes the task and calls different Skills (e.g., a translation Skill, a SQL-generation Skill, a web-search Skill).
- Essence: a high-level encapsulation of Prompt + Function Calling, giving the model the feel of "skill plug-ins".
- LangChain: the industrial-grade pipeline factory; one of the most widely used LLM orchestration frameworks.
- Position: equivalent to Airflow or dbt in data engineering.
- Relationship: it exists to implement the RAG and Function Calling described above; it wraps low-level API calls into "Chains".
- Problem solved: writing Python by hand to talk to OpenAI, vector DBs, search APIs leads to messy code; LangChain provides standard components so you assemble like Lego.
- LangGraph: a topology graph for complex state; a more advanced framework from the LangChain team, specifically for building Agents.
- Position: stateful multi-actor infrastructure; a cyclic state machine.
- Relationship: traditional LangChain chains are acyclic (a DAG, usually a straight line); a real Agent needs cycles and iteration (e.g., AI checks the result, finds it wrong, goes back).
- Essence: the advanced evolution tool for Agent mode; if LangChain is a straight-line assembly, LangGraph is a complex factory with logical loops.
- Vibe Coding: a dimensional shift in interaction; a development philosophy/paradigm whose name was coined by Andrej Karpathy in early 2025.
- Position: a shift in the "interaction protocol" of software engineering.
- Relationship: not a technical architecture but a development style; the developer no longer wrestles with syntax but describes the vibe, logical intent, and high-level requirements, letting the AI (Cursor, Windsurf, etc.) directly generate and run the code.
- Essence: the extreme expression of Copilot mode — code becomes consumable, while logic and "aesthetic/intuition" become the core productivity.
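The contrast drawn above between a linear chain and a cyclic state machine can be shown in plain Python. The sketch below deliberately uses neither the LangChain nor the LangGraph API: `run_chain`, `run_graph`, and the `draft`/`check` nodes are hypothetical names that only illustrate the control-flow difference.

```python
from typing import Callable

State = dict

def run_chain(steps: list[Callable[[State], State]], state: State) -> State:
    """Linear chain: every step runs exactly once, in order."""
    for step in steps:
        state = step(state)
    return state

def run_graph(
    nodes: dict[str, Callable[[State], State]],
    route: Callable[[str, State], str],
    start: str,
    state: State,
    max_steps: int = 10,
) -> State:
    """Cyclic graph: after each node, `route` picks the next one, possibly an earlier node."""
    current = start
    for _ in range(max_steps):  # step limit guards against infinite loops
        state = nodes[current](state)
        current = route(current, state)
        if current == "END":
            return state
    raise RuntimeError("step limit reached without finishing")

# Hypothetical nodes: 'draft' makes another attempt, 'check' accepts on the third.
def draft(state: State) -> State:
    return {**state, "attempts": state["attempts"] + 1}

def check(state: State) -> State:
    return {**state, "ok": state["attempts"] >= 3}

def route(current: str, state: State) -> str:
    if current == "draft":
        return "check"
    return "END" if state["ok"] else "draft"  # loop back when the check fails

start_state = {"attempts": 0, "ok": False}
print(run_chain([draft, check], start_state))  # {'attempts': 1, 'ok': False}
print(run_graph({"draft": draft, "check": check}, route, "draft", start_state))
# {'attempts': 3, 'ok': True}
```

The chain stops after one pass even though the check failed; the graph keeps cycling between `draft` and `check` until the state passes, which is the behavior an Agent needs.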
Sections covering "Agent Infrastructure Stack: 6 Layers" and "AI Agent Framework Selection Guide: 10 Frameworks" have been moved to their canonical homes.
📎 For the 6-layer Agent infrastructure, see `2.Application/3.Agent.md` §2.3 Architecture & Design.
📎 For the 10-framework selection guide, see `5.Framework/1.Agent_Frameworks_Overview.md`.