3.LLM Application Logic

👉 #LLM #AI #RAG #Fine-tuning #System-Architecture

1. Mastering the Foundational Logic of LLM Application Development

📅 2026.03.23 08:43 EDT, from Gemini 2.0 Flash 📎 YouTube — Mastering the Foundational Logic of LLM Application Development

2024 perspective: this video gives a clear, accessible introduction to the foundational logic of LLM (Large Language Model) application development. Its core thesis: in the AGI (Artificial General Intelligence) era, the most valuable role is the AI application developer, someone who understands the business, the limits of AI, and task decomposition, and who uses that understanding to apply LLM capabilities to real-world scenarios.

flowchart LR
  classDef default fill:#2d2d2d,stroke:#ccc,color:#fff;
  classDef titleNode fill:none,stroke:none,color:#FFD700,font-weight:bold,font-size:18px;

  %% 1. Foundations
  subgraph Principles [" "]
    direction LR
    PT["🔹 1. Foundations"]:::titleNode
    P1[Next Token Prediction<br>Word continuation]
    P2[Scaling Law<br>Scaling effect]
    P3[Transformer<br>Bedrock architecture]
  end

  %% 2. Application Modes
  subgraph Modes [" "]
    direction LR
    MT["🔹 2. Application Modes"]:::titleNode
    M1[Embedded<br>Silent processing]
    M2[Copilot<br>Human-AI co-pilot]
    M3[Agent<br>Autonomous decomposition]
  end

  %% 3. Technology Arsenal
  subgraph Tech [" "]
    direction LR
    TT["🔹 3. Technology Arsenal"]:::titleNode
    T1[Prompt Engineering]
    T2[RAG<br>Retrieval-augmented / open-book]
    T3[Function Calling<br>External tools]
    T4[Fine-tuning<br>Parameter tuning / closed-book]
  end

  %% 4. Decision Path
  subgraph Workflow ["🔹 4. Decision Path in Practice"]
    direction LR
    W1[1. Pure Prompt <br>Test native ceiling] --> W2{Choose path<br>by need}
    W2 -- Private data --> R1[RAG]
    W2 -- System interaction --> R2[Function Calling]
    W2 -- Extreme responsiveness --> R3[Fine-tuning]
  end

  %% Module connections
  Principles ==> Modes ==> Tech ==> Workflow

  %% Module styling
  style Principles fill:#333,stroke:#fff,stroke-width:2px
  style Modes fill:#4a148c,stroke:#fff,stroke-width:2px
  style Tech fill:#01579b,stroke:#fff,stroke-width:2px
  style Workflow fill:#d4a017,stroke:#fff,stroke-width:2px

1.1. The Underlying Principle of LLMs and Core Concepts

The underlying principle of an LLM can be summarized as Next Token Prediction (word continuation): based on a massive corpus, calculate probabilities and predict the next token.
Key terminology
Token: the smallest unit a model processes; rough rule of thumb (it varies by tokenizer): 1000 tokens ≈ 500 Chinese characters.
Transformer: the core architecture of today's mainstream LLMs (e.g., GPT); the bedrock of the AI industry.
Scaling Law: an empirical rule that model performance improves predictably as parameters, training data, and compute grow; often summarized as "throw more force at it and miracles happen".
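Next Token Prediction can be illustrated with a minimal sketch. The scores below are invented for illustration (a real model produces scores over a vocabulary of tens of thousands of tokens), and greedy selection is only one of several decoding strategies.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}

def predict_next_token(logits):
    """Greedy decoding: pick the single most probable next token."""
    probs = softmax(logits)
    return max(probs, key=probs.get)

# Hypothetical scores a model might assign after the prompt "The sky is"
toy_logits = {"blue": 4.0, "clear": 2.5, "falling": 0.5}
next_token = predict_next_token(toy_logits)
print(next_token)
```

Generating a full answer repeats this step: the chosen token is appended to the prompt and the model is asked for the next one.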

1.2. The Three AI Application Modes — Classified by How Dominant AI Is in the Business Process

Embedded mode — human-led; AI as a background utility silently processing data (e.g., auto-generated meeting minutes).
In this mode, AI is "imperceptible". It is integrated into the existing business flow; users may not even need to talk to AI directly — AI just handles a specific step in the background as a utility tool.
Core logic: humans drive the business process, AI automates one specific step.
Examples
- Meeting minutes: you finish a Zoom or Teams meeting as usual; after closing the window, the system emails you AI-generated minutes.
- Access control: a security camera at your community gate uses AI to recognize your face and open the gate.
- Content moderation: when you post on a social platform, background AI scans for prohibited words; if it passes, the post is published, otherwise blocked.
Copilot mode — human-AI collaboration; the human holds the steering wheel, AI assists (e.g., code completion).
This is currently the dominant mode. AI is present in real time as an assistant, but does not have final decision authority.
Core logic: human-AI collaboration; the human holds the "steering wheel" and decides direction; AI provides suggestions, drafts, or completions.
Examples
- GitHub Copilot (code completion): a programmer types a few characters; AI auto-completes the code, but the programmer must review for bugs before committing.
- Smart writing assistant: while writing email or docs, AI suggests the next sentence or polishes paragraphs, but you decide whether to send.
- AI-assisted medical diagnosis: AI scans an X-ray and flags suspected lesions, but the final diagnosis must be signed by a human doctor.
Agent mode — AI-led; the user sets a goal, AI autonomously decomposes the task, calls tools, and self-corrects.
This is the highest form the industry is currently pursuing. AI no longer just follows instructions — it has reasoning and task decomposition capability.
Core logic: AI drives the process. The user only specifies a high-level goal, and the AI autonomously decides the path, calls tools, finds errors, and corrects itself.
Examples
- Automated travel planning: you say "plan a 5-day trip to Tokyo with a 2-week budget; book hotels and flights too." [11:40] The AI checks flights, compares hotels, checks weather, self-corrects if a hotel is full, and hands you the result. [11:51]
- Automated data engineering: you give the AI database access and a goal: "analyze the characteristics of churned users in the last three months and generate a visual report". The Agent writes its own SQL, executes it, adjusts logic when data is missing, and finally produces a chart for you.
The three modes are distinguished by how much a human stays in the loop and how much autonomy the AI has.
This taxonomy was widely used across the industry in 2024 and 2025.
The intelligence required and the development difficulty both increase from Embedded to Copilot to Agent.
In real adoption, start with the simplest Embedded or Copilot; once the business logic is mature, consider building a complex Agent.

Dimension	Embedded	Copilot	Agent
Driver	Human (full control)	Human-AI co-pilot (human steers)	AI-led (user sets the goal)
Interaction	Trigger / silent background	Conversational / real-time completion	Goal-driven (objective-driven)
AI responsibility	Process specific data points	Assist generation, suggest options	Decompose tasks, execute autonomously
Intelligence	Lower (specific tasks)	Medium (needs context)	Very high (planning + correction)

2026 AI applications no longer emphasize which mode they are — the focus is on the maturity of the Orchestration Layer.
The core challenge has shifted from "can AI be autonomous" to "how do we monitor and evaluate the quality of these autonomous behaviors at scale".
The video's classification is reasonable but not forward-looking.
- It is good for explaining "what AI can do" to non-technical executives, but not as a sole architectural basis for technology selection.
- From the 2026 vantage point, the classification is logically clean but shows clear over-simplification in real adoption and evolution.
The video's classification is linear, assuming AI capability moves smoothly from "small utility" to "co-pilot" to "independent agent". In real businesses these boundaries are blurring.
The Copilot–Agent boundary is collapsing
- Recent versions of products such as GitHub Copilot and Microsoft 365 Copilot (Agent Mode) no longer just wait for instructions; they can run tasks autonomously in the background.
- A Copilot can run tests, fix bugs, and submit a PR while you sleep. Is that still a Copilot or already an Agent? The classification fails.
Multi-Agent System ignored
- The video focuses on a "single AI" identity; current trends emphasize orchestration.
- In real workflows it is not a single powerful Agent solving everything, but a "manager Agent" coordinating a team of "specialist Agents".
Governance and security ignored
- This taxonomy only talks about "what AI can do", not "what we dare let it do".
- In data engineering, Agent mode's biggest blocker is not intelligence but permission over-exposure.
The new 2026 trend: from modes to roles. Industry consensus is shifting from "application modes" to digital workforce architecture. You see the taxonomy evolving into:
Task-Specific Agents
- Stop emphasizing "Embedded vs. Copilot"; emphasize expertise instead.
- For example, a dedicated finance reconciliation Agent that can be Embedded into ERP or act as a Copilot answering queries.
Agentic Workflows
- An idea popularized by Andrew Ng that has become mainstream practice by 2026.
- Rather than chasing one all-powerful Agent, decompose the process into: Prompt → Iteration → Tool-Use → Reflection → Output.
- In this pipeline, AI's identity is dynamic: Copilot when drafting, Agent when self-checking.
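The Prompt → Iteration → Tool-Use → Reflection → Output pipeline above can be sketched as a bounded loop. Every function here is a hypothetical stand-in: `draft` and `reflect` would be LLM calls in a real system, and `run_tool` would be something like a test runner or a SQL executor.

```python
def draft(task, feedback=None):
    """Stand-in for an LLM drafting call (hypothetical)."""
    text = f"draft for: {task}"
    if feedback:
        text += f" [revised: {feedback}]"
    return text

def run_tool(text):
    """Stand-in for tool use; here it only measures the draft length."""
    return {"length": len(text)}

def reflect(text, tool_result):
    """Stand-in for a self-check; returns feedback, or None when satisfied."""
    if "revised" not in text:
        return "add missing detail"
    return None

def agentic_workflow(task, max_rounds=3):
    """Iterate draft -> tool -> reflection until the self-check passes."""
    feedback = None
    text = ""
    for round_no in range(1, max_rounds + 1):
        text = draft(task, feedback)       # Prompt / Iteration
        result = run_tool(text)            # Tool-Use
        feedback = reflect(text, result)   # Reflection
        if feedback is None:
            return text, round_no          # Output
    return text, max_rounds                # give up after the round budget

output, rounds = agentic_workflow("summarize churn report")
```

The `max_rounds` cap is a design choice in this sketch: without a budget, a reflection step that is never satisfied would loop forever.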
Two harder metrics for data engineering:
Deterministic vs. probabilistic
- Embedded often pursues determinism (output must conform to a schema).
- Agent is highly probabilistic, which is a huge risk for production data modeling.
Granularity of human-in-the-loop (HITL)
- The 2026 best practice is no longer "human steers the wheel" but checkpoint-based control.
- AI executes 80% of the task autonomously, but at critical decision points (e.g., DROP TABLE or modifying a production API) it pauses and waits for human confirmation.
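Checkpoint-based control can be sketched as a gate in front of the execution step. The pattern list and the `approve` callback are assumptions made for illustration; a production system would more likely rely on database permissions and a proper SQL parser than on regular expressions.

```python
import re

# Hypothetical patterns for actions that must pause for human sign-off.
RISKY_PATTERNS = [r"\bDROP\s+TABLE\b", r"\bTRUNCATE\b", r"\bDELETE\s+FROM\b"]

def needs_checkpoint(sql):
    """Return True when a statement matches any risky pattern."""
    return any(re.search(p, sql, flags=re.IGNORECASE) for p in RISKY_PATTERNS)

def execute_plan(statements, approve):
    """Process statements in order; risky ones proceed only if approve(sql) is True."""
    log = []
    for sql in statements:
        if needs_checkpoint(sql) and not approve(sql):
            log.append(("blocked", sql))
            continue
        log.append(("executed", sql))  # a real agent would run the statement here
    return log

plan = ["SELECT count(*) FROM users", "DROP TABLE users_tmp"]
log = execute_plan(plan, approve=lambda sql: False)  # the human declines everything
```

With this plan, the `SELECT` runs autonomously while the `DROP TABLE` waits on the human decision, which matches the "autonomous by default, pause at critical points" idea.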

1.3. The Four Mainstream Techniques (the Technology Arsenal)

Prompt Engineering
The basic foundation; the core mantra is: treat the AI as an extremely intelligent intern with no background knowledge — provide enough background, set clear constraints. [09:21]
RAG (Retrieval-Augmented Generation)
Hook the AI up to an external knowledge base (e.g., a vector database) so it can take an "open-book exam"; suitable when you need fresh or private data. [13:31]
Function Calling
Give the AI "hands and feet" — let it actively call external systems (e.g., weather lookup, finance system query). The model analyzes the request and proposes which tool to call. [14:22]
Fine-tuning
Use thousands of Q&A pairs to deeply change model parameters and bake knowledge into the brain — like a "closed-book exam"; suitable when output format, response speed, or specific personality must be tightly controlled. [15:06]
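The "open-book" idea behind RAG can be sketched in a few lines: retrieve the most relevant passage, then place it in the prompt. This is a toy under stated assumptions: `embed` uses word counts instead of a real embedding model, the document list stands in for a vector database, and the prompt wording is invented.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy 'embedding': a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, documents, top_k=1):
    """Rank documents by similarity to the question and keep the best top_k."""
    q = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:top_k]

def build_prompt(question, documents):
    """Assemble the augmented prompt that would be sent to the model."""
    context = "\n".join(retrieve(question, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

docs = [
    "Refunds are processed within 14 days of the return request.",
    "The office is closed on public holidays.",
]
prompt = build_prompt("How long do refunds take?", docs)
```

The model's parameters are untouched here, which is the contrast with fine-tuning: the knowledge lives in `docs` and can be updated without retraining.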

1.4. Decision Logic in Practice

When facing a specific business pain point, follow this three-step logic:
1. Prepare test data [17:36]
- Manually test off-the-shelf LLMs with pure prompts to learn their raw capability ceiling.
- If raw data quality is too poor, no architecture will work.
2. Choose the technology path [18:05]
- Pure dialogue solves it: choose Prompt Engineering.
- Involves private/real-time documents: choose RAG.
- Involves external system actions: choose Function Calling.
3. Assess whether fine-tuning is needed [18:41]
- Only consider Fine-tuning when: user volume makes RAG too costly, response speed must be sub-second, or you need to enforce a complex output format.
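The decision logic above can be written down as a small function. The parameter names are invented for this sketch, and always including Prompt Engineering as the baseline is an assumption drawn from the "test with pure prompts first" step.

```python
def choose_techniques(private_or_realtime_docs=False, external_actions=False,
                      rag_too_costly=False, needs_subsecond=False,
                      strict_format=False):
    """Map business requirements to the techniques named in these notes."""
    techniques = ["Prompt Engineering"]  # assumed baseline: always start here
    if private_or_realtime_docs:
        techniques.append("RAG")
    if external_actions:
        techniques.append("Function Calling")
    # Fine-tuning is the last resort, considered only under these conditions.
    if rag_too_costly or needs_subsecond or strict_format:
        techniques.append("Fine-tuning")
    return techniques

# Example: an internal knowledge assistant that must also query a finance system.
stack = choose_techniques(private_or_realtime_docs=True, external_actions=True)
```

The techniques are not mutually exclusive, so the function returns a list rather than a single choice.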

1.X. Supplement: The "Toolbox" or "Development Paradigm" of AI Engineering

The four concepts below do not sit at the same level as RAG or Agent; they are wrappers and tools used to implement them.
The "four weapons" (Prompt, RAG, Function Calling, Fine-tuning) in the notes are raw materials and technical principles.
The "three modes" (Embedded, Copilot, Agent) are final product forms.
LangChain / LangGraph are the production equipment that turns raw materials into products.
Vibe Coding is the directorial style you use when operating that equipment.

Concept	Position in this framework	Role type
AI Skill	Execution unit of Agent mode	Capability component (Feature)
LangChain	Scaffolding for RAG / Function Calling	Toolkit (SDK / Library)
LangGraph	Engine for Agent mode	State-orchestration framework (Engine)
Vibe Coding	Development methodology in the Copilot/Agent era	Production paradigm (Paradigm)

AI Skills: a granular unit of capability — not a concrete framework, but a logical concept.
Position: equivalent to a specific Transform step or UDF (User Defined Function) in a data pipeline.
Relationship: in Agent mode, the Agent decomposes the task and calls different Skills (e.g., a translation Skill, a SQL-generation Skill, a web-search Skill).
Essence: a high-level encapsulation of Prompt + Function Calling, giving the model the feel of "skill plug-ins".
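A Skill as "Prompt + Function Calling" can be sketched as a registry: each entry pairs a description (the part the model reads) with a function (the part the application runs). The `skill` decorator, the JSON call shape, and both toy skills are hypothetical and do not follow any specific vendor's API.

```python
import json

SKILLS = {}

def skill(name, description):
    """Register a function as a named Skill (hypothetical decorator)."""
    def register(fn):
        SKILLS[name] = {"description": description, "run": fn}
        return fn
    return register

@skill("uppercase", "Convert text to upper case (toy stand-in for a translation Skill).")
def uppercase(text):
    return text.upper()

@skill("word_count", "Count the words in a text.")
def word_count(text):
    return len(text.split())

def dispatch(tool_call_json):
    """Execute a model-proposed call shaped like {"name": ..., "arguments": {...}}."""
    call = json.loads(tool_call_json)
    entry = SKILLS.get(call["name"])
    if entry is None:
        raise KeyError(f"unknown skill: {call['name']}")
    return entry["run"](**call["arguments"])

# In a real system the JSON below would come from the model, not be hard-coded.
result = dispatch('{"name": "word_count", "arguments": {"text": "plan a five day trip"}}')
```

The model only proposes which Skill to call and with what arguments; the application code performs the call, which is where permission checks would belong.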
LangChain: the industrial-grade pipeline factory; one of the most widely used LLM orchestration frameworks.
Position: roughly analogous to Airflow or dbt in data engineering.
Relationship: it exists to implement the RAG and Function Calling described above; it wraps low-level API calls into "Chains".
Problem solved: writing Python by hand to talk to OpenAI, vector DBs, search APIs leads to messy code; LangChain provides standard components so you assemble like Lego.
LangGraph: a topology graph for complex state; a more advanced framework from the LangChain team, specifically for building Agents.
Position: stateful multi-actor infrastructure; a cyclic state machine.
Relationship: traditional LangChain is linear (DAG); a real Agent needs cycles and iteration (e.g., AI checks the result, finds it wrong, goes back).
Essence: the advanced evolution tool for Agent mode; if LangChain is a straight-line assembly, LangGraph is a complex factory with logical loops.
Vibe Coding: a dimensional shift in interaction — a development philosophy/paradigm popularized recently in AI circles by figures like Andrej Karpathy.
Position: a shift in the "interaction protocol" of software engineering.
Relationship: not a technical architecture but a development style; the developer no longer wrestles with syntax but describes the vibe, logical intent, and high-level requirements, letting the AI (Cursor, Windsurf, etc.) directly generate and run the code.
Essence: the extreme expression of Copilot mode — code becomes consumable, while logic and "aesthetic/intuition" become the core productivity.

Sections covering "Agent Infrastructure Stack: 6 Layers" and "AI Agent Framework Selection Guide: 10 Frameworks" have been moved to their canonical homes.

📎 For the 6-layer Agent infrastructure, see 2.Application/3.Agent.md §2.3 Architecture & Design.
📎 For the 10-framework selection guide, see 5.Framework/1.Agent_Frameworks_Overview.md.