3. Function Calling

👉 #AI #LLM #Agent #Coding

I. Function Calling (Tool Use)

📅 2026-04-28 Tuesday PST; Claude Opus 4.6 📎 LLM Function Calling Complete Guide 📎 Function Calling: OpenAI, Anthropic, Google 📎 Building Agentic LLM Systems

1. Overview

1.1. Definition & Why

Function Calling (Tool Use): during inference, the LLM emits a structured JSON instruction specifying which external function to call and with what arguments; the application layer executes it and feeds the result back to the model for further reasoning.
Key distinction: the LLM does not run any code itself — it only "decides what to call"; the actual execution is performed by the application.
Design intent: a pure-text LLM cannot interact with the outside world; Function Calling is the bridge that promotes an LLM from "chatbot" to "agent that takes actions".
Pain points solved:
Real-time data: training data has a cutoff, but APIs can fetch live weather / stock prices / news
Precise computation: LLMs are unreliable at math; a calculator function gives exact results
System integration: let the LLM operate databases, send emails, create tickets, deploy code
Structured output: force the LLM to output JSON conforming to a schema rather than free text
Terminology: OpenAI calls it "Function Calling" and Anthropic calls it "Tool Use"; Google also says "Function Calling" and declares tools through functionDeclarations. The essence is the same.

1.2. Features & Use Cases

Core capabilities:
Structured Output: constrain the model's output to a JSON schema so that code can parse it
Parallel Calling: call multiple functions in one inference (e.g., weather and flights at once)
Chained Calling: feed one function's result as the next function's input
Forced Calling: require the model to call a specific function
Auto Selection: model picks the best fit from a list of available functions
Typical scenarios:
Smart customer service: query order status, change shipping address, issue refunds
Data analysis: query DBs, generate charts, export reports
DevOps: check service status, restart instances, query logs
Personal assistant: check calendar, send email, book conference rooms, set reminders
RAG enhancement: expose "search the knowledge base" as a tool the model can call on demand
Multimodal: invoke image generation / TTS / video analysis services

1.3. Competitors

Function Calling is a foundational capability — no direct "competitors", but different implementation paths:

Approach	Mechanism	Pros	Cons
Native Function Calling	Model natively emits structured tool calls	Most reliable; specifically trained	Depends on model-provider API
Prompt-Based Parsing	Prompt asks for JSON, app parses it	Works on any model	Format unstable; needs extra validation
ReAct Pattern	Model alternates Thought / Action / Observation	Reasoning is transparent	High token cost across multiple turns
MCP (Model Context Protocol)	Standardized tool-interface protocol	Tools reusable across models	Ecosystem still developing

Mainstream-model Function Calling capability comparison (2026):

Model	Parallel	Forced	Streaming	Reliability
GPT-4o / GPT-4.1	✅	✅	✅	High
Claude Sonnet/Opus	✅	✅	✅	High
Gemini 2.5	✅	✅	✅	High
Llama 3.3	✅	✅	✅	Medium-high
Mistral Large	✅	✅	✅	Medium

2. Concept, Component, & Architecture

2.1. Key Concepts

(1) Tool Definition

Describes a function via JSON Schema: name, purpose, parameter types and constraints.
This is the "contract" of Function Calling — the model uses it to decide when to call and what to pass.
A good tool definition = clear function name + precise description + strict parameter schema.

(2) Tool Call

The structured instruction the model emits after reasoning: function name + JSON arguments.
The model can emit multiple tool calls in one inference (parallel calling).
The application is responsible for parsing tool calls, executing the function, and returning the result.

(3) Tool Result

The function's return value, sent back to the model as a message.
The model continues reasoning from the result: it may produce a final answer or issue another tool call.

(4) Tool Choice

auto: model decides whether to call a tool (default)
required: force the model to call at least one tool
none: forbid tool calls; text-only output
specific: must call a specific tool
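
The four modes above are spelled differently by each provider. The sketch below maps the generic names used in these notes onto the `tool_choice` values of OpenAI's Chat Completions API and Anthropic's Messages API. The helper names `to_openai_tool_choice` and `to_anthropic_tool_choice` are made up for illustration; check the provider docs for the current set of accepted values.

```python
from typing import Optional, Union


def to_openai_tool_choice(mode: str, tool_name: Optional[str] = None) -> Union[str, dict]:
    """Map a generic mode to an OpenAI Chat Completions `tool_choice` value."""
    if mode in ("auto", "required", "none"):
        return mode  # OpenAI accepts these three as plain strings
    if mode == "specific":
        if not tool_name:
            raise ValueError("mode 'specific' needs a tool_name")
        return {"type": "function", "function": {"name": tool_name}}
    raise ValueError(f"unknown mode: {mode}")


def to_anthropic_tool_choice(mode: str, tool_name: Optional[str] = None) -> dict:
    """Map a generic mode to an Anthropic Messages API `tool_choice` object."""
    if mode == "auto":
        return {"type": "auto"}
    if mode == "required":
        return {"type": "any"}  # Anthropic's name for "must call some tool"
    if mode == "none":
        return {"type": "none"}
    if mode == "specific":
        if not tool_name:
            raise ValueError("mode 'specific' needs a tool_name")
        return {"type": "tool", "name": tool_name}
    raise ValueError(f"unknown mode: {mode}")


print(to_openai_tool_choice("specific", "get_weather"))
print(to_anthropic_tool_choice("required"))
```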

(5) Execution Loop

The core pattern is a loop:
Step 1: User question + tool definitions → sent to the model
Step 2: Model returns a tool call (or a direct answer)
Step 3: Application executes the function and gets the result
Step 4: Send the result back to the model
Step 5: Repeat steps 2-4 until the model produces the final answer
This loop is the foundation of Agents — an Agent is essentially "LLM + Tool Loop".
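
A minimal, provider-agnostic sketch of this loop follows. To keep it runnable offline, `fake_model` is a scripted stand-in for a real LLM call and `get_weather` is a stub; the message shape is a simplified assumption, not any provider's exact format.

```python
import json
from typing import Callable, Dict, List


def get_weather(city: str) -> dict:
    """Stub tool; a real one would call a weather API."""
    return {"city": city, "temp": 28, "condition": "sunny"}


TOOLS: Dict[str, Callable[..., dict]] = {"get_weather": get_weather}


def fake_model(messages: List[dict]) -> dict:
    """Stand-in for an LLM: asks for a tool first, then answers from its result."""
    tool_msgs = [m for m in messages if m["role"] == "tool"]
    if not tool_msgs:
        return {"role": "assistant", "content": None,
                "tool_calls": [{"id": "call_1", "name": "get_weather",
                                "arguments": {"city": "Beijing"}}]}
    data = json.loads(tool_msgs[-1]["content"])
    return {"role": "assistant", "tool_calls": [],
            "content": f"{data['city']}: {data['condition']}, {data['temp']}°C"}


def run_loop(question: str, model=fake_model, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):              # cap iterations to avoid endless loops
        reply = model(messages)             # model returns a tool call or an answer
        messages.append(reply)
        if not reply["tool_calls"]:
            return reply["content"]         # final answer ends the loop
        for call in reply["tool_calls"]:    # execute every requested tool
            result = TOOLS[call["name"]](**call["arguments"])
            messages.append({"role": "tool", "tool_call_id": call["id"],
                             "content": json.dumps(result)})
    raise RuntimeError("tool loop did not finish within max_steps")


print(run_loop("What's the weather in Beijing?"))
```

Swapping `fake_model` for a function that calls a real provider leaves `run_loop` itself unchanged, which is why the loop is worth isolating.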

(6) Structured Output

A by-product of Function Calling: even with no real function behind it, a tool definition combined with a forced tool choice makes the model emit JSON shaped by the schema.
Use: data extraction, classification labeling, form filling — any structured-data scenario.
OpenAI's response_format: { type: "json_schema" } is purpose-built for this.
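
As a rough illustration, the sketch below defines an extraction "tool" that is never executed; only its schema matters. The tool name `record_contact`, its fields, and the helper `parse_structured_output` are hypothetical, and the arguments string is simulated rather than produced by a real model call.

```python
import json

# Hypothetical extraction tool: no function runs, the schema only shapes the output.
extract_contact_tool = {
    "type": "function",
    "function": {
        "name": "record_contact",
        "description": "Record the contact details found in the text.",
        "parameters": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "email": {"type": "string"},
            },
            "required": ["name", "email"],
        },
    },
}


def parse_structured_output(arguments_json: str, tool: dict) -> dict:
    """Parse the tool-call arguments and check that required fields are present."""
    data = json.loads(arguments_json)
    required = tool["function"]["parameters"]["required"]
    missing = [key for key in required if key not in data]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return data


# Simulated arguments string, standing in for tool_call.function.arguments.
sample_arguments = '{"name": "Ada Lovelace", "email": "ada@example.com"}'
contact = parse_structured_output(sample_arguments, extract_contact_tool)
print(contact["name"], contact["email"])
```

In a real request the tool would be passed together with a tool choice that forces `record_contact`, so the reply is always a tool call rather than free text.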

2.2. Core Components

(1) Schema Registry

Function: manage all available tool definitions (name, description, parameter schema).
Design: tool description quality directly affects the model's calling accuracy.
Best practice: function names start with verbs (get_weather, create_order); descriptions clearly state "when to use this tool".
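
One possible shape for such a registry is sketched below: a decorator records each function and derives a basic schema from its type hints and docstring. `ToolRegistry` is a made-up class for illustration, and the type mapping covers only a few primitive types.

```python
import inspect
from typing import Callable, Dict, List, get_type_hints

# Only a handful of primitives are mapped; anything else falls back to "string".
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


class ToolRegistry:
    def __init__(self) -> None:
        self._tools: Dict[str, dict] = {}

    def register(self, func: Callable) -> Callable:
        """Decorator: store the function plus a schema derived from its signature."""
        hints = get_type_hints(func)
        hints.pop("return", None)
        properties, required = {}, []
        for name, param in inspect.signature(func).parameters.items():
            properties[name] = {"type": _JSON_TYPES.get(hints.get(name), "string")}
            if param.default is inspect.Parameter.empty:
                required.append(name)  # parameters without defaults are required
        self._tools[func.__name__] = {
            "func": func,
            "schema": {
                "name": func.__name__,
                "description": inspect.getdoc(func) or "",
                "parameters": {"type": "object", "properties": properties,
                               "required": required},
            },
        }
        return func

    def schemas(self) -> List[dict]:
        return [entry["schema"] for entry in self._tools.values()]

    def get(self, name: str) -> Callable:
        return self._tools[name]["func"]


registry = ToolRegistry()


@registry.register
def get_weather(city: str, unit: str = "celsius") -> dict:
    """Get current weather for a city. Use when the user asks about weather."""
    return {"city": city, "temp": 28, "unit": unit}


print(registry.schemas()[0]["parameters"])
```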

(2) Router / Dispatcher

Function: parse the model's tool call and route to the corresponding function implementation.
Security: must verify the function name is in an allow-list and arguments conform to the schema.
Error handling: function not found, schema-validation failure, execution timeout — all need graceful handling.
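
A small dispatcher along these lines might look like the sketch below. The `ALLOWED_TOOLS` table and the error wording are assumptions; the argument check is deliberately simple (required and unexpected keys only), where a real system would validate against the full JSON Schema.

```python
import json


def get_weather(city: str, unit: str = "celsius") -> dict:
    """Stub implementation for the example."""
    return {"city": city, "temp": 28, "unit": unit}


# Allow-list: only names in this table can ever be executed.
ALLOWED_TOOLS = {
    "get_weather": {"func": get_weather,
                    "required": {"city"},
                    "allowed": {"city", "unit"}},
}


def dispatch(name: str, arguments_json: str) -> dict:
    """Route one tool call; always return a dict the model can read."""
    spec = ALLOWED_TOOLS.get(name)
    if spec is None:
        return {"error": f"unknown tool: {name}"}
    try:
        args = json.loads(arguments_json)
    except json.JSONDecodeError as exc:
        return {"error": f"arguments are not valid JSON: {exc.msg}"}
    if not isinstance(args, dict):
        return {"error": "arguments must be a JSON object"}
    missing = spec["required"] - set(args)
    unexpected = set(args) - spec["allowed"]
    if missing or unexpected:
        return {"error": f"bad arguments; missing={sorted(missing)}, "
                         f"unexpected={sorted(unexpected)}"}
    try:
        return {"result": spec["func"](**args)}
    except Exception as exc:  # report failures instead of crashing the loop
        return {"error": f"{type(exc).__name__}: {exc}"}


print(dispatch("get_weather", '{"city": "Beijing"}'))
print(dispatch("delete_user", '{"id": 1}'))
```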

(3) Executor

Function: actually run the function call (API request, DB query, system command).
Security: sandboxed execution, permission control, timeouts.
Concurrency: parallel tool calls can run concurrently to reduce latency.
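
Concurrent execution with per-call timeouts can be sketched with `asyncio`. Both tools below are stubs that sleep instead of calling a network service, and the call dictionaries use a simplified, assumed shape.

```python
import asyncio
from typing import List


async def get_weather(city: str) -> dict:
    await asyncio.sleep(0.05)  # simulated network latency
    return {"city": city, "condition": "sunny"}


async def search_flights(origin: str, dest: str) -> dict:
    await asyncio.sleep(0.05)  # simulated network latency
    return {"origin": origin, "dest": dest, "flights": 3}


TOOLS = {"get_weather": get_weather, "search_flights": search_flights}


async def run_one(call: dict, timeout_s: float) -> dict:
    """Run one tool call, converting a timeout into an error result."""
    try:
        coro = TOOLS[call["name"]](**call["arguments"])
        result = await asyncio.wait_for(coro, timeout=timeout_s)
        return {"tool_call_id": call["id"], "result": result}
    except asyncio.TimeoutError:
        return {"tool_call_id": call["id"], "error": "timeout"}


async def run_parallel(calls: List[dict], timeout_s: float = 1.0) -> List[dict]:
    # gather() runs the calls concurrently and keeps results in input order
    return await asyncio.gather(*(run_one(c, timeout_s) for c in calls))


calls = [
    {"id": "call_1", "name": "get_weather", "arguments": {"city": "Beijing"}},
    {"id": "call_2", "name": "search_flights",
     "arguments": {"origin": "Beijing", "dest": "Shanghai"}},
]
results = asyncio.run(run_parallel(calls))
print(results)
```

Carrying `tool_call_id` through each result keeps the pairing explicit even if a later refactor stops relying on result order.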

(4) Result Formatter

Function: format the function return into something the model can understand.
Tips: large result sets need truncation or summarization to avoid blowing up the context window.
Errors: on failure, return a clear error so the model can adjust its strategy.
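
A formatter covering both points could look like this sketch. The character budget and the error payload shape are assumptions; note that a truncated string is no longer valid JSON, so summarizing or paginating is the better choice when the model needs to parse the result.

```python
import json
from typing import Any, Optional


def format_tool_result(result: Any = None,
                       error: Optional[Exception] = None,
                       max_chars: int = 2000) -> str:
    """Turn a tool's return value or exception into a string for the model."""
    if error is not None:
        # A clear, structured error lets the model retry or change approach.
        return json.dumps({"status": "error",
                           "error_type": type(error).__name__,
                           "message": str(error)})
    text = json.dumps(result, ensure_ascii=False, default=str)
    if len(text) > max_chars:
        omitted = len(text) - max_chars
        text = text[:max_chars] + f"... [truncated {omitted} characters]"
    return text


print(format_tool_result({"temp": 28, "condition": "sunny"}))
print(format_tool_result(list(range(1000)), max_chars=50))
print(format_tool_result(error=TimeoutError("upstream took too long")))
```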

2.3. Architecture & Design

(1) Standard Function-Calling Flow

sequenceDiagram
  participant U as User
  participant A as Application
  participant M as LLM
  participant T as External Tool/API

  U->>A: "What's the weather in Beijing tomorrow?"
  A->>M: User message + Tool Definitions
  M->>A: Tool Call: get_weather(city="Beijing", date="tomorrow")
  A->>T: Call weather API
  T->>A: {"temp": 28, "condition": "sunny"}
  A->>M: Tool Result: {"temp": 28, "condition": "sunny"}
  M->>A: "Beijing tomorrow: sunny, 28°C — good for outings"
  A->>U: Final answer

(2) Multi-Tool Parallel Calling

sequenceDiagram
  participant U as User
  participant A as Application
  participant M as LLM
  participant T1 as Weather API
  participant T2 as Flight API

  U->>A: "Tomorrow's Beijing weather and flights to Shanghai"
  A->>M: User message + Tool Definitions
  M->>A: Tool Call 1: get_weather(...) + Tool Call 2: search_flights(...)

  par Parallel execution
    A->>T1: Query weather
    A->>T2: Query flights
  end

  T1->>A: Weather result
  T2->>A: Flight result
  A->>M: Both Tool Results
  M->>A: Combined answer
  A->>U: Final answer

(3) Relationship to Agent Architecture

flowchart TD
  A[Agent = LLM + Tools + Loop] --> B["Function Calling: the Agent's 'hand'"]
  A --> C["Memory: the Agent's 'brain'"]
  A --> D["Planning: the Agent's 'thought'"]

  B --> B1[MCP: standardized tool interface]
  B --> B2[Native Tool Use: model's built-in capability]
  B --> B3[ReAct: Reason + Act pattern]

2.4. Ecosystem

Protocol layer:
MCP (Model Context Protocol): Anthropic-led open standard; standardizes tool definition and invocation; tools become reusable across models
OpenAI Function Calling API: de-facto standard; most frameworks are compatible
A2A (Agent2Agent): Google-led open protocol for communication between agents; a remote agent can be wrapped as a tool so the model reaches it through Function Calling
Framework layer:
LangChain / LangGraph: @tool decorator quickly defines tools and auto-generates schemas
LlamaIndex: FunctionTool class wraps tools, integrates seamlessly with RAG pipelines
Vercel AI SDK: TypeScript ecosystem, tool() function, frontend-friendly
PydanticAI: auto-generates tool schemas from Pydantic models
Tool ecosystem:
MCP Server Hub: community-maintained MCP servers (DBs, file systems, APIs)
LangChain Tools: 100+ pre-built tools (search, calc, code execution)
OpenAI Plugins (superseded by GPT Actions): third-party integrations

3. Install, Configure, Secure, & Cheatsheets

3.1. OpenAI Function Calling Implementation

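import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def get_weather(city: str, unit: str = "celsius") -> dict:
    """Placeholder so the example runs; swap in a real weather API call."""
    return {"city": city, "temp": 28, "unit": unit, "condition": "sunny"}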

# Step 1: define tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city. Call this when the user asks about weather.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name, e.g., 'Beijing', 'Seattle'"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                },
                "required": ["city"]
            }
        }
    }
]

# Step 2: send the request
messages = [{"role": "user", "content": "How's Beijing's weather today?"}]
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools,
    tool_choice="auto"  # auto / required / none / {"type": "function", "function": {"name": "..."}}
)

# Step 3: handle the tool calls
message = response.choices[0].message
if message.tool_calls:
    messages.append(message)  # the assistant message containing tool_calls
    for tool_call in message.tool_calls:
        func_name = tool_call.function.name
        func_args = json.loads(tool_call.function.arguments)  # arguments arrive as a JSON string
        if func_name == "get_weather":
            result = get_weather(**func_args)  # your implementation
        else:
            result = {"error": f"unknown tool: {func_name}"}

        # Step 4: add one tool message per call; every tool_call_id needs a reply
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": json.dumps(result)
        })

    # Step 5: send all results back in a single follow-up request
    final = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools
    )
    print(final.choices[0].message.content)
else:
    print(message.content)  # the model answered directly

3.2. Anthropic Tool Use Implementation

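import json

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment


def get_weather(city: str) -> dict:
    """Placeholder so the example runs; swap in a real weather API call."""
    return {"city": city, "temp": 28, "condition": "sunny"}


tools = [
    {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "input_schema": {  # Anthropic uses input_schema instead of parameters
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"}
            },
            "required": ["city"]
        }
    }
]
messages = [{"role": "user", "content": "What's Beijing's weather like?"}]

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=tools,
    messages=messages
)

# Collect one tool_result per tool_use block
tool_results = []
for block in response.content:
    if block.type == "tool_use":
        result = get_weather(**block.input)  # block.input is already a dict
        tool_results.append({
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": json.dumps(result)
        })

if tool_results:
    # Send every result back in a single user message; keep tools in the request
    messages.append({"role": "assistant", "content": response.content})
    messages.append({"role": "user", "content": tool_results})
    final = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        tools=tools,
        messages=messages
    )
    print("".join(b.text for b in final.content if b.type == "text"))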

3.3. LangChain Shortcut

from langchain_core.tools import tool
from langchain_openai import ChatOpenAI

@tool
def get_weather(city: str, unit: str = "celsius") -> dict:
    """Get current weather for a city. Call this when the user asks about weather.

    Args:
        city: City name, e.g., 'Beijing', 'Seattle'
        unit: Temperature unit — celsius or fahrenheit
    """
    # Real implementation
    return {"city": city, "temp": 28, "condition": "sunny"}

# Auto-generates Tool Schema from docstring + type hints
llm = ChatOpenAI(model="gpt-4o")
llm_with_tools = llm.bind_tools([get_weather])

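response = llm_with_tools.invoke("What's the weather in Beijing?")
# The requested calls are on the AIMessage; executing them is still up to you
print(response.tool_calls)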

3.4. Security Best Practices

Input validation: never directly execute model-emitted arguments; strictly validate via JSON Schema or Pydantic.
Allow-list: only allow predefined functions; reject unknown function names.
Permission control: different users get different toolsets (e.g., regular users can't call delete_user).
Sandboxed execution: tools that run code must execute in a sandbox (Docker / gVisor).
Call limits: cap the number of tool calls per request (typically 10-20) to stop infinite tool-call loops.
Sensitive-action confirmation: write operations (create / modify / delete) need human-in-the-loop confirmation.
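Prompt-injection defense: tool results may contain malicious instructions, so treat them as untrusted data; never let returned text widen permissions or trigger write operations without validation.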
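
Three of the practices above (per-role toolsets, a call cap, and human confirmation for write operations) can be combined in one gate that runs before any tool executes. The sketch below is illustrative only: the role table, tool names, and the `confirm` callback are hypothetical.

```python
from typing import Callable, Tuple

# Hypothetical role -> toolset mapping
ROLE_TOOLS = {
    "user": {"get_order_status"},
    "admin": {"get_order_status", "delete_user"},
}
WRITE_TOOLS = {"delete_user"}  # tools that modify state need confirmation
MAX_TOOL_CALLS = 10            # per-request cap on tool calls


def authorize(role: str, tool_name: str, calls_so_far: int,
              confirm: Callable[[str], bool]) -> Tuple[bool, str]:
    """Decide whether a tool call may run; returns (allowed, reason)."""
    if calls_so_far >= MAX_TOOL_CALLS:
        return False, "tool-call budget exhausted"
    if tool_name not in ROLE_TOOLS.get(role, set()):
        return False, f"role '{role}' may not call {tool_name}"
    if tool_name in WRITE_TOOLS and not confirm(tool_name):
        return False, "human reviewer declined"
    return True, "ok"


# `confirm` would normally prompt a person; lambdas stand in for that here.
print(authorize("user", "delete_user", 0, confirm=lambda name: True))
print(authorize("admin", "delete_user", 0, confirm=lambda name: False))
print(authorize("admin", "delete_user", 0, confirm=lambda name: True))
```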

3.5. Cheatsheet — Three-Platform API Comparison

Dimension	OpenAI	Anthropic	Google
Tool-definition field	`tools[].function`	`tools[]`	`tools[].functionDeclarations`
Param schema	`parameters`	`input_schema`	`parameters`
Call output	`message.tool_calls[]`	`content[].type == "tool_use"`	`candidates[].content.parts[].functionCall`
Result return	`role: "tool"`	`type: "tool_result"`	`parts[].functionResponse`
Call ID	`tool_call.id`	`block.id`	Matched by function name (newer API versions also carry an optional `id`)
Parallel calls	✅ multiple tool_calls	✅ multiple tool_use blocks	✅ multiple functionCall parts

4. Bootcamp & Workshops

4.1. Official & Classic Tutorials

Resource	Link	Goal
OpenAI Function Calling Guide	platform.openai.com	Official guide
Anthropic Tool Use Guide	docs.anthropic.com	Complete Claude Tool Use docs
LangChain Tool Calling	python.langchain.com	Cross-model framework abstraction
DeepLearning.AI - Function Calling	deeplearning.ai	Practical course
MCP Official Docs	modelcontextprotocol.io	Standardized tool protocol

4.2. Troubleshooting

Symptom	Root Cause	Solution
Model answers directly without calling tools	Tool description unclear; model doesn't know when to call	Improve `description`; make "when to call" explicit
Wrong argument format / missing required args	Schema not strict enough	Add `required`; use `enum` to constrain values
Wrong tool selected	Ambiguous tool descriptions	Ensure each tool's description doesn't overlap; single responsibility
Infinite-loop tool calls	Model calls the same tool repeatedly	Set max calls; explicitly include "done" signal in result
Parallel-call results jumbled	Results not matched to tool-call IDs	Ensure each tool result carries the correct tool_call_id
Tool execution timeout	External API slow	Set timeouts; return timeout errors so the model can pivot
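
For the infinite-loop row, one hedged approach is a guard that counts total calls and identical repeats, and returns a message for the model instead of executing again. `LoopGuard` and its thresholds are assumptions for illustration.

```python
import json
from collections import Counter
from typing import Optional


class LoopGuard:
    """Tracks tool calls within one request and flags runaway patterns."""

    def __init__(self, max_calls: int = 10, max_repeats: int = 2) -> None:
        self.max_calls = max_calls
        self.max_repeats = max_repeats
        self.total = 0
        self.seen: Counter = Counter()

    def check(self, name: str, arguments: dict) -> Optional[str]:
        """Return None if the call may run, else a reason to send back as the tool result."""
        self.total += 1
        # Canonical key so argument order does not hide a repeat
        key = (name, json.dumps(arguments, sort_keys=True))
        self.seen[key] += 1
        if self.total > self.max_calls:
            return "call limit reached; answer with the information already gathered"
        if self.seen[key] > self.max_repeats:
            return f"{name} was repeated with identical arguments; use the earlier result"
        return None


guard = LoopGuard(max_calls=10, max_repeats=2)
for attempt in range(3):
    print(attempt, guard.check("get_weather", {"city": "Beijing"}))
```

Returning the reason as a tool result, rather than raising, gives the model the explicit "done" signal mentioned in the table.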

4.3. Common Q & A

Q: How is Function Calling related to MCP?
A: Function Calling is a model-layer capability (the model knows how to emit tool calls); MCP is a protocol-layer standard (defines how tools are discovered, described, and invoked). MCP builds on Function Calling so tools are reusable across models and apps.
Q: Do all LLMs support Function Calling?
A: In 2026, most commercial models (GPT-4o, Claude, Gemini) and many open-source models (Llama 3.3, Mistral) support it natively. Models that don't can simulate via Prompt Engineering, but reliability is lower.
Q: Difference between Function Calling and JSON Mode?
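A: JSON Mode only guarantees syntactically valid JSON. Function Calling targets a specific schema and adds a function name and call ID, which supports multi-turn interaction; strict schema conformance is only guaranteed when the provider's strict mode is enabled (for example, strict: true on OpenAI function definitions).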
Q: How to handle tool execution failure?
A: Return the error as a Tool Result so the model can decide next (retry, switch tool, inform user). Don't silently swallow errors.
Q: Max number of tools per request?
A: OpenAI supports up to 128; Anthropic recommends ≤ 64. But more tools means lower selection accuracy — keep it to 10-20 in practice, or use a router for dynamic loading.