3. Function Calling

👉 #AI #LLM #Agent #Coding

I. Function Calling (Tool Use)

📅 2026-04-28 Tuesday PST; Claude Opus 4.6 📎 LLM Function Calling Complete Guide 📎 Function Calling: OpenAI, Anthropic, Google 📎 Building Agentic LLM Systems

1. Overview

1.1. Definition & Why
  • Function Calling (Tool Use): during inference, the LLM emits a structured JSON instruction specifying which external function to call and with what arguments; the application layer executes it and feeds the result back to the model for further reasoning.
  • Key distinction: the LLM does not run any code itself — it only "decides what to call"; the actual execution is performed by the application.
  • Design intent: a pure-text LLM cannot interact with the outside world; Function Calling is the bridge that promotes an LLM from "chatbot" to "agent that takes actions".
  • Pain points solved:
    • Real-time data: training data has a cutoff, but APIs can fetch live weather / stock prices / news
    • Precise computation: LLMs are unreliable at math; a calculator function gives exact results
    • System integration: let the LLM operate databases, send emails, create tickets, deploy code
    • Structured output: force the LLM to output JSON conforming to a schema rather than free text
  • Terminology: OpenAI calls it "Function Calling", Anthropic calls it "Tool Use", and Google's Gemini API also says "function calling" (its tool definitions are called "function declarations"). The mechanism is the same in all three.
1.2. Features & Use Cases
  • Core capabilities:
    • Structured Output: force the model to output per a JSON schema, parseable by code
    • Parallel Calling: call multiple functions in one inference (e.g., weather and flights at once)
    • Chained Calling: feed one function's result as the next function's input
    • Forced Calling: require the model to call a specific function
    • Auto Selection: model picks the best fit from a list of available functions
  • Typical scenarios:
    • Smart customer service: query order status, change shipping address, issue refunds
    • Data analysis: query DBs, generate charts, export reports
    • DevOps: check service status, restart instances, query logs
    • Personal assistant: check calendar, send email, book conference rooms, set reminders
    • RAG enhancement: expose "search the knowledge base" as a tool the model can call on demand
    • Multimodal: invoke image generation / TTS / video analysis services
1.3. Competitors
  • Function Calling is a foundational capability — no direct "competitors", but different implementation paths:
| Approach | Mechanism | Pros | Cons |
|---|---|---|---|
| Native Function Calling | Model natively emits structured tool calls | Most reliable; specifically trained | Depends on model-provider API |
| Prompt-Based Parsing | Prompt asks for JSON, app parses it | Works on any model | Format unstable; needs extra validation |
| ReAct Pattern | Model alternates Thought / Action / Observation | Reasoning is transparent | High token cost across multi-turn |
| MCP (Model Context Protocol) | Standardized tool-interface protocol | Tools reusable across models | Ecosystem still developing |
  • Mainstream-model Function Calling capability comparison (2026):
| Model | Parallel | Forced | Streaming | Reliability |
|---|---|---|---|---|
| GPT-4o / GPT-4.1 | | | | High |
| Claude Sonnet/Opus | | | | High |
| Gemini 2.5 | | | | High |
| Llama 3.3 | | | | Medium-high |
| Mistral Large | | | | Medium |

2. Concept, Component, & Architecture

2.1. Key Concepts
(1) Tool Definition
  • Describes a function via JSON Schema: name, purpose, parameter types and constraints.
  • This is the "contract" of Function Calling — the model uses it to decide when to call and what to pass.
  • A good tool definition = clear function name + precise description + strict parameter schema.
(2) Tool Call
  • The structured instruction the model emits after reasoning: function name + JSON arguments.
  • The model can emit multiple tool calls in one inference (parallel calling).
  • The application is responsible for parsing tool calls, executing the function, and returning the result.
(3) Tool Result
  • The function's return value, sent back to the model as a message.
  • The model continues reasoning from the result: it may produce a final answer or issue another tool call.
(4) Tool Choice
  • auto: model decides whether to call a tool (default)
  • required: force the model to call at least one tool
  • none: forbid tool calls; text-only output
  • specific: must call a specific tool
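The four modes above are spelled differently by each provider. The following is a minimal sketch, assuming the OpenAI Chat Completions and Anthropic Messages request formats; the helper names `openai_tool_choice` and `anthropic_tool_choice` are hypothetical and exist only for illustration.

```python
from typing import Optional, Union


def openai_tool_choice(mode: str, name: Optional[str] = None) -> Union[str, dict]:
    """Translate a neutral mode into a value for OpenAI's `tool_choice` field."""
    if mode in ("auto", "required", "none"):
        return mode  # OpenAI accepts these three as plain strings
    if mode == "specific" and name:
        return {"type": "function", "function": {"name": name}}
    raise ValueError(f"unsupported mode: {mode!r}")


def anthropic_tool_choice(mode: str, name: Optional[str] = None) -> dict:
    """Translate a neutral mode into a value for Anthropic's `tool_choice` field."""
    simple = {"auto": "auto", "required": "any", "none": "none"}
    if mode in simple:
        return {"type": simple[mode]}  # Anthropic's name for "required" is "any"
    if mode == "specific" and name:
        return {"type": "tool", "name": name}
    raise ValueError(f"unsupported mode: {mode!r}")


print(openai_tool_choice("specific", "get_weather"))
print(anthropic_tool_choice("required"))
```

Keeping the neutral vocabulary in application code and translating at the edge makes it easier to switch providers later.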
(5) Execution Loop
  • The core pattern is a loop:
    1. User question + tool definitions → sent to the model
    2. Model returns a tool call (or a direct answer)
    3. Application executes the function and gets the result
    4. Application sends the result back to the model
    5. Repeat steps 2-4 until the model produces the final answer
  • This loop is the foundation of Agents — an Agent is essentially "LLM + Tool Loop".
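The loop can be sketched without any provider SDK. In this illustrative example, `fake_model` is a scripted stand-in for a real LLM call, and the message shapes are simplified assumptions rather than any vendor's exact wire format.

```python
import json
from typing import Callable


def get_weather(city: str) -> dict:
    """Stub tool; a real version would call a weather API."""
    return {"city": city, "temp": 28, "condition": "sunny"}


TOOLS: dict[str, Callable[..., dict]] = {"get_weather": get_weather}


def fake_model(messages: list[dict]) -> dict:
    """Stand-in for an LLM: requests one tool call, then answers from its result."""
    last = messages[-1]
    if last["role"] == "tool":
        data = json.loads(last["content"])
        answer = f"{data['city']}: {data['condition']}, {data['temp']} C"
        return {"role": "assistant", "content": answer}
    call = {"id": "call_1", "name": "get_weather",
            "arguments": json.dumps({"city": "Beijing"})}
    return {"role": "assistant", "content": None, "tool_calls": [call]}


def run_agent(question: str, model=fake_model, max_turns: int = 10) -> str:
    messages = [{"role": "user", "content": question}]      # step 1
    for _ in range(max_turns):                               # cap the loop
        reply = model(messages)                              # step 2
        messages.append(reply)
        calls = reply.get("tool_calls")
        if not calls:
            return reply["content"]                          # final answer
        for call in calls:
            args = json.loads(call["arguments"])
            result = TOOLS[call["name"]](**args)             # step 3
            messages.append({"role": "tool",                 # step 4
                             "tool_call_id": call["id"],
                             "content": json.dumps(result)})
    raise RuntimeError("tool loop did not finish within max_turns")


print(run_agent("What's the weather in Beijing?"))
```

Swapping `fake_model` for a function that calls a real provider API leaves the loop body unchanged, which is the sense in which an Agent is "LLM + Tool Loop".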
(6) Structured Output
  • A by-product of Function Calling: even with no real function behind it, a tool definition can be used to force the model to output JSON conforming to a schema.
  • Use: data extraction, classification labeling, form filling — any structured-data scenario.
  • OpenAI's response_format: { type: "json_schema" } is purpose-built for this.
2.2. Core Components
(1) Schema Registry
  • Function: manage all available tool definitions (name, description, parameter schema).
  • Design: tool description quality directly affects the model's calling accuracy.
  • Best practice: function names start with verbs (get_weather, create_order); descriptions clearly state "when to use this tool".
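A registry along these lines can derive the schema from the function itself. This is a rough sketch using only the standard library; the `ToolRegistry` class is hypothetical, and its type mapping covers only a few primitive types (anything else falls back to "string").

```python
import inspect
from typing import Callable, get_type_hints

# Deliberately small mapping from Python annotations to JSON Schema types
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


class ToolRegistry:
    def __init__(self) -> None:
        self._tools: dict[str, dict] = {}

    def register(self, func: Callable) -> Callable:
        """Decorator: record the function and a schema built from its signature."""
        hints = get_type_hints(func)
        properties, required = {}, []
        for name, param in inspect.signature(func).parameters.items():
            properties[name] = {"type": _JSON_TYPES.get(hints.get(name), "string")}
            if param.default is inspect.Parameter.empty:
                required.append(name)  # no default value means required
        self._tools[func.__name__] = {
            "name": func.__name__,
            "description": inspect.getdoc(func) or "",
            "parameters": {"type": "object", "properties": properties,
                           "required": required},
            "func": func,
        }
        return func

    def definitions(self) -> list[dict]:
        """Tool definitions without the Python callables, ready to serialize."""
        return [{k: v for k, v in t.items() if k != "func"}
                for t in self._tools.values()]

    def get(self, name: str) -> Callable:
        return self._tools[name]["func"]


registry = ToolRegistry()


@registry.register
def get_weather(city: str, unit: str = "celsius") -> dict:
    """Get current weather for a city. Use when the user asks about weather."""
    return {"city": city, "unit": unit, "temp": 28}


print(registry.definitions())
```

The docstring becomes the description, so the "when to use this tool" guidance lives next to the code it describes.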
(2) Router / Dispatcher
  • Function: parse the model's tool call and route to the corresponding function implementation.
  • Security: must verify the function name is in an allow-list and arguments conform to the schema.
  • Error handling: function not found, schema-validation failure, execution timeout — all need graceful handling.
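The dispatcher's three duties (allow-list, argument validation, graceful errors) might look like the sketch below. The hand-rolled `validate` function is an assumption made to keep the example dependency-free; a production system would more likely use a JSON Schema validator or Pydantic.

```python
import json


def get_weather(city: str, unit: str = "celsius") -> dict:
    """Stub tool used for illustration."""
    return {"city": city, "temp": 28, "unit": unit}


# Allow-list: tool name -> (implementation, parameter schema)
ALLOWED = {
    "get_weather": (get_weather, {
        "properties": {
            "city": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    }),
}

_PY_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}


def validate(args: dict, schema: dict) -> list[str]:
    """Return a list of human-readable problems; empty means valid."""
    errors = [f"missing required argument: {k}"
              for k in schema.get("required", []) if k not in args]
    for key, value in args.items():
        spec = schema["properties"].get(key)
        if spec is None:
            errors.append(f"unexpected argument: {key}")
        elif not isinstance(value, _PY_TYPES[spec["type"]]):
            errors.append(f"{key} must be of type {spec['type']}")
        elif "enum" in spec and value not in spec["enum"]:
            errors.append(f"{key} must be one of {spec['enum']}")
    return errors


def dispatch(name: str, raw_arguments: str) -> dict:
    """Route one tool call; every failure becomes an error dict, never a crash."""
    if name not in ALLOWED:
        return {"error": f"unknown tool: {name}"}
    func, schema = ALLOWED[name]
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        return {"error": f"arguments are not valid JSON: {exc.msg}"}
    if not isinstance(args, dict):
        return {"error": "arguments must be a JSON object"}
    problems = validate(args, schema)
    if problems:
        return {"error": "; ".join(problems)}
    try:
        return {"result": func(**args)}
    except Exception as exc:  # surface tool failures to the model
        return {"error": f"{type(exc).__name__}: {exc}"}


print(dispatch("get_weather", '{"city": "Beijing"}'))
print(dispatch("delete_user", "{}"))
```

Returning errors as data lets the application pass them back as a tool result, so the model can retry with corrected arguments.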
(3) Executor
  • Function: actually run the function call (API request, DB query, system command).
  • Security: sandboxed execution, permission control, timeouts.
  • Concurrency: parallel tool calls can run concurrently to reduce latency.
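Concurrent execution with timeouts can be sketched with a thread pool. The tool functions here are stubs, and the call shape (`id`, `name`, `arguments`) is a simplified assumption; tool names are assumed to have been validated already.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def get_weather(city: str) -> dict:
    time.sleep(0.05)  # simulate network latency
    return {"city": city, "condition": "sunny"}


def search_flights(origin: str, destination: str) -> dict:
    time.sleep(0.05)
    return {"origin": origin, "destination": destination, "flights": 3}


TOOLS = {"get_weather": get_weather, "search_flights": search_flights}


def execute_parallel(calls: list[dict], timeout_s: float = 5.0) -> list[dict]:
    """Run tool calls concurrently; results keep the order and IDs of the calls."""
    results = []
    with ThreadPoolExecutor(max_workers=max(1, len(calls))) as pool:
        futures = [(c["id"], pool.submit(TOOLS[c["name"]], **c["arguments"]))
                   for c in calls]
        for call_id, future in futures:
            try:
                # Waits up to timeout_s for each result in turn
                content = future.result(timeout=timeout_s)
                results.append({"tool_call_id": call_id, "content": content})
            except FutureTimeout:
                results.append({"tool_call_id": call_id, "error": "timed out"})
            except Exception as exc:
                results.append({"tool_call_id": call_id, "error": str(exc)})
    # Note: leaving the `with` block still waits for running threads, because
    # threads cannot be force-killed; tools should enforce their own timeouts.
    return results


calls = [
    {"id": "a", "name": "get_weather", "arguments": {"city": "Beijing"}},
    {"id": "b", "name": "search_flights",
     "arguments": {"origin": "Beijing", "destination": "Shanghai"}},
]
print(execute_parallel(calls))
```

Both stubs sleep for 50 ms, yet the batch finishes in roughly the time of one call, which is the latency benefit of running parallel tool calls concurrently.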
(4) Result Formatter
  • Function: format the function return into something the model can understand.
  • Tips: large result sets need truncation or summarization to avoid blowing up the context window.
  • Errors: on failure, return a clear error so the model can adjust its strategy.
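One possible shape for a result formatter is shown below. The envelope fields (`status`, `truncated`, `preview`) and the `max_chars` default are illustrative assumptions, not a provider convention.

```python
import json
from typing import Any, Optional


def format_tool_result(result: Any = None, error: Optional[str] = None,
                       max_chars: int = 2000) -> str:
    """Serialize a tool outcome into a compact string for the model."""
    if error is not None:
        # A clear error lets the model retry, switch tools, or tell the user
        return json.dumps({"status": "error", "message": error})
    body = json.dumps(result, ensure_ascii=False, default=str)
    if len(body) <= max_chars:
        return json.dumps({"status": "ok", "data": result},
                          ensure_ascii=False, default=str)
    # Cut-off JSON is not parseable, so wrap the prefix in a fresh envelope
    return json.dumps({"status": "ok", "truncated": True,
                       "total_chars": len(body), "preview": body[:max_chars]},
                      ensure_ascii=False)


print(format_tool_result({"temp": 28, "condition": "sunny"}))
print(format_tool_result(error="weather API timed out after 5s"))
print(format_tool_result(list(range(5000)), max_chars=40))
```

Telling the model that a result was truncated, and by how much, gives it the option to ask for a narrower query instead of reasoning over partial data unknowingly.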
2.3. Architecture & Design
(1) Standard Function-Calling Flow
sequenceDiagram
  participant U as User
  participant A as Application
  participant M as LLM
  participant T as External Tool/API

  U->>A: "What's the weather in Beijing tomorrow?"
  A->>M: User message + Tool Definitions
  M->>A: Tool Call: get_weather(city="Beijing", date="tomorrow")
  A->>T: Call weather API
  T->>A: {"temp": 28, "condition": "sunny"}
  A->>M: Tool Result: {"temp": 28, "condition": "sunny"}
  M->>A: "Beijing tomorrow: sunny, 28°C — good for outings"
  A->>U: Final answer
(2) Multi-Tool Parallel Calling
sequenceDiagram
  participant U as User
  participant A as Application
  participant M as LLM
  participant T1 as Weather API
  participant T2 as Flight API

  U->>A: "Tomorrow's Beijing weather and flights to Shanghai"
  A->>M: User message + Tool Definitions
  M->>A: Tool Call 1: get_weather(...) + Tool Call 2: search_flights(...)

  par Parallel execution
    A->>T1: Query weather
    A->>T2: Query flights
  end

  T1->>A: Weather result
  T2->>A: Flight result
  A->>M: Both Tool Results
  M->>A: Combined answer
  A->>U: Final answer
(3) Relationship to Agent Architecture
flowchart TD
  A[Agent = LLM + Tools + Loop] --> B["Function Calling: the Agent's 'hand'"]
  A --> C["Memory: the Agent's 'brain'"]
  A --> D["Planning: the Agent's 'thought'"]

  B --> B1[MCP: standardized tool interface]
  B --> B2[Native Tool Use: model's built-in capability]
  B --> B3[ReAct: Reason + Act pattern]
2.4. Eco-system
  • Protocol layer:
    • MCP (Model Context Protocol): Anthropic-led open standard; standardizes tool definition and invocation so tools become reusable across models
    • OpenAI Function Calling API: de-facto standard; most frameworks are compatible
    • A2A (Agent2Agent): Google-initiated open protocol for communication between agents; it complements tool calling (a remote agent can be exposed to another agent much like a tool) rather than being built on Function Calling itself
  • Framework layer:
    • LangChain / LangGraph: @tool decorator quickly defines tools and auto-generates schemas
    • LlamaIndex: FunctionTool class wraps tools, integrates seamlessly with RAG pipelines
    • Vercel AI SDK: TypeScript ecosystem, tool() function, frontend-friendly
    • PydanticAI: auto-generates tool schemas from Pydantic models
  • Tool ecosystem:
    • MCP Server Hub: community-maintained MCP servers (DBs, file systems, APIs)
    • LangChain Tools: 100+ pre-built tools (search, calc, code execution)
    • OpenAI Plugins (deprecated; succeeded by GPT Actions): third-party integrations

3. Install, Configure, Secure, & Cheatsheets

3.1. OpenAI Function Calling Implementation
from openai import OpenAI
import json

client = OpenAI()

# Step 1: define tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city. Call this when the user asks about weather.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name, e.g., 'Beijing', 'Seattle'"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                },
                "required": ["city"]
            }
        }
    }
]

# Step 2: send the request
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "How's Beijing's weather today?"}],
    tools=tools,
    tool_choice="auto"  # auto / required / none / {"type": "function", "function": {"name": "..."}}
)

# Step 3: run every tool call the model requested
message = response.choices[0].message
if message.tool_calls:
    messages = [
        {"role": "user", "content": "How's Beijing's weather today?"},
        message,  # the assistant message containing tool_calls
    ]
    for tool_call in message.tool_calls:
        func_name = tool_call.function.name
        func_args = json.loads(tool_call.function.arguments)
        if func_name != "get_weather":  # allow-list check
            raise ValueError(f"Unknown tool: {func_name}")
        # Execute the actual function
        result = get_weather(**func_args)  # your implementation

        # Every tool call needs its own tool message with the matching ID
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": json.dumps(result)
        })

    # Step 4: send all results back to the model in a single follow-up request
    final = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools
    )
    print(final.choices[0].message.content)
else:
    # The model answered directly without calling a tool
    print(message.content)
3.2. Anthropic Tool Use Implementation
import json

import anthropic

client = anthropic.Anthropic()


def get_weather(city: str) -> dict:
    # Placeholder implementation; replace with a real weather API call
    return {"city": city, "temp": 28, "condition": "sunny"}


tools = [
    {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "input_schema": {  # Anthropic uses input_schema instead of parameters
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"}
            },
            "required": ["city"]
        }
    }
]
messages = [{"role": "user", "content": "What's Beijing's weather like?"}]

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=tools,
    messages=messages
)

# Collect one tool_result block per tool_use block
tool_results = []
for block in response.content:
    if block.type == "tool_use":
        result = get_weather(**block.input)
        tool_results.append({
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": json.dumps(result)
        })

if tool_results:
    # Send all results back in one user message; tools must be passed again
    final = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        tools=tools,
        messages=messages + [
            {"role": "assistant", "content": response.content},
            {"role": "user", "content": tool_results}
        ]
    )
    print("".join(b.text for b in final.content if b.type == "text"))
3.3. LangChain Shortcut
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI

@tool(parse_docstring=True)  # also parse the Args section into per-parameter descriptions
def get_weather(city: str, unit: str = "celsius") -> dict:
    """Get current weather for a city. Call this when the user asks about weather.

    Args:
        city: City name, e.g., 'Beijing', 'Seattle'
        unit: Temperature unit — celsius or fahrenheit
    """
    # Real implementation
    return {"city": city, "temp": 28, "condition": "sunny"}

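# bind_tools attaches a tool schema generated from the type hints and docstring
llm = ChatOpenAI(model="gpt-4o")
llm_with_tools = llm.bind_tools([get_weather])

response = llm_with_tools.invoke("What's the weather in Beijing?")
# Requested calls are parsed into dicts with "name", "args", and "id"
print(response.tool_calls)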
3.4. Security Best Practices
  • Input validation: never directly execute model-emitted arguments; strictly validate via JSON Schema or Pydantic.
  • Allow-list: only allow predefined functions; reject unknown function names.
  • Permission control: different users get different toolsets (e.g., regular users can't call delete_user).
  • Sandboxed execution: tools that run code must execute in a sandbox (Docker / gVisor).
  • Rate limit: prevent infinite tool-call loops; cap max calls (typically 10-20).
  • Sensitive-action confirmation: write operations (create / modify / delete) need human-in-the-loop confirmation.
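  • Prompt-injection defense: tool results may contain malicious instructions; treat them as untrusted data rather than instructions, and sanitize or clearly delimit them before they reach the model.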
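Permission control and sensitive-action confirmation can be combined into a single gate in front of the executor. The sketch below is illustrative: the `TOOL_POLICY` table, the role names, and the `confirm` callback are all hypothetical.

```python
from typing import Callable

# Hypothetical policy: which roles may use a tool, and whether it writes data
TOOL_POLICY = {
    "get_order": {"roles": {"user", "admin"}, "writes": False},
    "issue_refund": {"roles": {"user", "admin"}, "writes": True},
    "delete_user": {"roles": {"admin"}, "writes": True},
}


def tools_for(role: str) -> list[str]:
    """Only these tool names should be sent to the model for this role."""
    return sorted(name for name, p in TOOL_POLICY.items() if role in p["roles"])


def authorize(role: str, tool: str,
              confirm: Callable[[str], bool]) -> tuple[bool, str]:
    """Decide whether a model-requested call may run; write ops need approval."""
    policy = TOOL_POLICY.get(tool)
    if policy is None:
        return False, "unknown tool"
    if role not in policy["roles"]:
        return False, "not permitted for this role"
    if policy["writes"] and not confirm(tool):
        return False, "rejected by user"
    return True, "ok"


print(tools_for("user"))
print(authorize("user", "delete_user", confirm=lambda tool: True))
print(authorize("user", "issue_refund", confirm=lambda tool: False))
```

Filtering the tool list before the request and re-checking at execution time gives two layers: the model never sees tools it may not use, and a manipulated tool call is still rejected.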
3.5. Cheatsheet — Three-Platform API Comparison
| Dimension | OpenAI | Anthropic | Google |
|---|---|---|---|
| Tool-definition field | tools[].function | tools[] | tools[].functionDeclarations |
| Param schema | parameters | input_schema | parameters |
| Call output | message.tool_calls[] | content[].type == "tool_use" | candidates[].content.parts[].functionCall |
| Result return | role: "tool" message | type: "tool_result" block in a user message | functionResponse part |
| Call ID | tool_call.id | block.id | No required ID; functionResponse echoes the function name |
| Parallel calls | ✅ multiple tool_calls | ✅ multiple tool_use blocks | ✅ multiple functionCall parts |
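The definition-field differences in the table can be bridged with small adapters. This sketch assumes a neutral spec with `name`, `description`, and `schema` keys (a hypothetical shape) and emits the request fragments for each provider; the Gemini fragment follows the REST field naming.

```python
def to_openai(spec: dict) -> dict:
    """OpenAI Chat Completions: schema lives under function.parameters."""
    return {"type": "function",
            "function": {"name": spec["name"],
                         "description": spec["description"],
                         "parameters": spec["schema"]}}


def to_anthropic(spec: dict) -> dict:
    """Anthropic Messages: flat entry, schema under input_schema."""
    return {"name": spec["name"],
            "description": spec["description"],
            "input_schema": spec["schema"]}


def to_gemini(specs: list[dict]) -> dict:
    """Gemini REST: one tools[] entry holding a list of functionDeclarations."""
    return {"functionDeclarations": [
        {"name": s["name"], "description": s["description"],
         "parameters": s["schema"]}
        for s in specs
    ]}


weather = {
    "name": "get_weather",
    "description": "Get current weather for a city.",
    "schema": {"type": "object",
               "properties": {"city": {"type": "string"}},
               "required": ["city"]},
}

print(to_openai(weather))
print(to_anthropic(weather))
print(to_gemini([weather]))
```

Providers support slightly different subsets of JSON Schema, so a real adapter may also need to strip or rewrite unsupported keywords.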

4. Bootcamp & Workshops

4.1. Official & Classic Tutorials
| Resource | Link | Goal |
|---|---|---|
| OpenAI Function Calling Guide | platform.openai.com | Official guide |
| Anthropic Tool Use Guide | docs.anthropic.com | Complete Claude Tool Use docs |
| LangChain Tool Calling | python.langchain.com | Cross-model framework abstraction |
| DeepLearning.AI - Function Calling | deeplearning.ai | Practical course |
| MCP Official Docs | modelcontextprotocol.io | Standardized tool protocol |
4.2. Trouble Shooting
| Symptom | Root Cause | Solution |
|---|---|---|
| Model answers directly without calling tools | Tool description unclear; model doesn't know when to call | Improve description; make "when to call" explicit |
| Wrong argument format / missing required args | Schema not strict enough | Add required; use enum to constrain values |
| Wrong tool selected | Ambiguous tool descriptions | Ensure each tool's description doesn't overlap; single responsibility |
| Infinite-loop tool calls | Model calls the same tool repeatedly | Set max calls; explicitly include "done" signal in result |
| Parallel-call results jumbled | Results not matched to tool-call IDs | Ensure each tool result carries the correct tool_call_id |
| Tool execution timeout | External API slow | Set timeouts; return timeout errors so the model can pivot |
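For the infinite-loop symptom, a small guard can enforce both a total budget and a cap on identical repeated calls. The `LoopGuard` class and its default limits are hypothetical; this is a sketch of the idea rather than a standard component.

```python
import json
from typing import Optional


class LoopGuard:
    """Tracks tool calls within one conversation and flags runaway loops."""

    def __init__(self, max_calls: int = 15, max_repeats: int = 2) -> None:
        self.max_calls = max_calls
        self.max_repeats = max_repeats
        self.total = 0
        self.seen: dict[str, int] = {}

    def check(self, name: str, arguments: dict) -> Optional[str]:
        """Return None if the call may proceed, otherwise a reason to stop."""
        self.total += 1
        if self.total > self.max_calls:
            return "tool-call budget exhausted"
        # Identical name + arguments (key order ignored) count as a repeat
        key = name + ":" + json.dumps(arguments, sort_keys=True)
        self.seen[key] = self.seen.get(key, 0) + 1
        if self.seen[key] > self.max_repeats:
            return (f"{name} was already called {self.max_repeats} times "
                    "with identical arguments")
        return None


guard = LoopGuard(max_calls=5, max_repeats=2)
for _ in range(3):
    print(guard.check("get_weather", {"city": "Beijing"}))
```

When `check` returns a reason, passing that text back as the tool result (instead of executing again) gives the model an explicit signal to stop and answer.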
4.3. Common Q & A
  • Q: How is Function Calling related to MCP?
  • A: Function Calling is a model-layer capability (the model knows how to emit tool calls); MCP is a protocol-layer standard (defines how tools are discovered, described, and invoked). MCP builds on Function Calling so tools are reusable across models and apps.
  • Q: Do all LLMs support Function Calling?
  • A: In 2026, most commercial models (GPT-4o, Claude, Gemini) and many open-source models (Llama 3.3, Mistral) support it natively. Models that don't can simulate via Prompt Engineering, but reliability is lower.
  • Q: Difference between Function Calling and JSON Mode?
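  • A: JSON Mode only guarantees syntactically valid JSON. Function Calling additionally targets a specific schema and carries a function name and call ID, which supports multi-turn interaction. Strict schema conformance is only guaranteed when the provider's strict or structured-output mode is enabled (e.g., strict: true on OpenAI tool definitions); otherwise arguments should still be validated.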
  • Q: How to handle tool execution failure?
  • A: Return the error as a Tool Result so the model can decide next (retry, switch tool, inform user). Don't silently swallow errors.
  • Q: Max number of tools per request?
  • A: OpenAI supports up to 128; Anthropic recommends ≤ 64. But more tools means lower selection accuracy — keep it to 10-20 in practice, or use a router for dynamic loading.
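The "router for dynamic loading" idea can be illustrated with a naive keyword-overlap selector that picks a few relevant tools per request. The scoring here is a deliberately simple assumption; a real router would more likely use embeddings or a classifier, and `select_tools` is a hypothetical helper.

```python
import re


def _words(text: str) -> set[str]:
    """Lowercase alphabetic tokens; underscores in tool names act as separators."""
    return set(re.findall(r"[a-z]+", text.lower()))


def select_tools(query: str, tools: list[dict], top_k: int = 3) -> list[dict]:
    """Return up to top_k tools whose name/description share words with the query."""
    query_words = _words(query)
    scored = []
    for tool in tools:
        overlap = len(query_words & _words(tool["name"] + " " + tool["description"]))
        if overlap:
            scored.append((overlap, tool["name"], tool))
    # Highest overlap first; ties broken alphabetically for determinism
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [tool for _, _, tool in scored[:top_k]]


catalog = [
    {"name": "get_weather", "description": "Get current weather for a city"},
    {"name": "search_flights", "description": "Search flights between two cities"},
    {"name": "create_ticket", "description": "Create a support ticket"},
]

picked = select_tools("What is the weather in Beijing?", catalog, top_k=1)
print([t["name"] for t in picked])
```

Only the selected definitions are then sent in the request, which keeps the per-request tool count small even when the full catalog is large.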