3.Function Calling
👉 #AI #LLM #Agent #Coding
📅 2026-04-28 Tuesday PST; Claude Opus 4.6
📎 LLM Function Calling Complete Guide
📎 Function Calling: OpenAI, Anthropic, Google
📎 Building Agentic LLM Systems
1. Overview
1.1. Definition & Why
- Function Calling (Tool Use): during inference, the LLM emits a structured JSON instruction specifying which external function to call and with what arguments; the application layer executes it and feeds the result back to the model for further reasoning.
- Key distinction: the LLM does not run any code itself — it only "decides what to call"; the actual execution is performed by the application.
- Design intent: a pure-text LLM cannot interact with the outside world; Function Calling is the bridge that promotes an LLM from "chatbot" to "agent that takes actions".
- Pain points solved:
- Real-time data: training data has a cutoff, but APIs can fetch live weather / stock prices / news
- Precise computation: LLMs are unreliable at math; a calculator function gives exact results
- System integration: let the LLM operate databases, send emails, create tickets, deploy code
- Structured output: force the LLM to output JSON conforming to a schema rather than free text
- Terminology: OpenAI calls it "Function Calling"; Anthropic calls it "Tool Use"; Google also calls it "Function Calling" and describes each tool with a "function declaration". Same essence.
1.2. Features & Use Cases
- Core capabilities:
- Structured Output: force the model to output per a JSON schema, parseable by code
- Parallel Calling: call multiple functions in one inference (e.g., weather and flights at once)
- Chained Calling: feed one function's result as the next function's input
- Forced Calling: require the model to call a specific function
- Auto Selection: model picks the best fit from a list of available functions
- Typical scenarios:
- Smart customer service: query order status, change shipping address, issue refunds
- Data analysis: query DBs, generate charts, export reports
- DevOps: check service status, restart instances, query logs
- Personal assistant: check calendar, send email, book conference rooms, set reminders
- RAG enhancement: expose "search the knowledge base" as a tool the model can call on demand
- Multimodal: invoke image generation / TTS / video analysis services
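Chained calling, listed under core capabilities above, can be illustrated without a model at all. The sketch below assumes two hypothetical stub tools, `geocode` and `get_weather_at`; in a real run the model would request them in two successive turns, and the application would pass the first result back before the second call is issued.

```python
def geocode(city: str) -> dict:
    """Stub tool: a real one would call a geocoding API."""
    known = {"Beijing": (39.9, 116.4)}
    lat, lon = known[city]
    return {"lat": lat, "lon": lon}


def get_weather_at(lat: float, lon: float) -> dict:
    """Stub tool: weather lookup keyed by coordinates."""
    return {"lat": lat, "lon": lon, "condition": "sunny"}


# Turn 1: the model asks for geocode(city="Beijing").
coords = geocode("Beijing")
# Turn 2: having seen the result, the model asks for get_weather_at(**coords).
weather = get_weather_at(**coords)
print(weather)
```

The point of the pattern is that the second call's arguments do not exist until the first result has been returned to the model.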
1.3. Competitors
- Function Calling is a foundational capability — no direct "competitors", but different implementation paths:
| Approach | Mechanism | Pros | Cons |
|---|---|---|---|
| Native Function Calling | Model natively emits structured tool calls | Most reliable; specifically trained | Depends on model-provider API |
| Prompt-Based Parsing | Prompt asks for JSON, app parses it | Works on any model | Format unstable; needs extra validation |
| ReAct Pattern | Model alternates Thought / Action / Observation | Reasoning is transparent | High token cost across multi-turn |
| MCP (Model Context Protocol) | Standardized tool-interface protocol | Tools reusable across models | Ecosystem still developing |
- Mainstream-model Function Calling capability comparison (2026):
| Model | Parallel | Forced | Streaming | Reliability |
|---|---|---|---|---|
| GPT-4o / GPT-4.1 | ✅ | ✅ | ✅ | High |
| Claude Sonnet/Opus | ✅ | ✅ | ✅ | High |
| Gemini 2.5 | ✅ | ✅ | ✅ | High |
| Llama 3.3 | ✅ | ✅ | ✅ | Medium-high |
| Mistral Large | ✅ | ✅ | ✅ | Medium |
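The Prompt-Based Parsing approach listed in this section leaves it to the application to pull JSON out of free text and validate it. A minimal sketch of that step follows. It assumes the prompt asked the model to reply with an object shaped like `{"tool": ..., "arguments": ...}`; that shape, the `ALLOWED_TOOLS` set, and the `parse_tool_request` helper are illustrative assumptions, not a standard.

```python
import json
import re

ALLOWED_TOOLS = {"get_weather"}  # hypothetical allow-list


def parse_tool_request(text: str):
    """Return (tool, arguments) parsed from free-form model text, or None."""
    # Take the span from the first "{" to the last "}"; models often wrap
    # the JSON in prose or code fences.
    match = re.search(r"\{.*\}", text, flags=re.DOTALL)
    if match is None:
        return None
    try:
        payload = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None
    tool = payload.get("tool")
    arguments = payload.get("arguments")
    if not isinstance(tool, str) or tool not in ALLOWED_TOOLS:
        return None
    if not isinstance(arguments, dict):
        return None
    return tool, arguments


reply = 'Sure! {"tool": "get_weather", "arguments": {"city": "Beijing"}} Hope that helps.'
print(parse_tool_request(reply))
print(parse_tool_request("I think it is sunny."))
```

Every failure path returns `None`, which is why this approach needs the extra validation the table mentions: the application cannot assume the reply contains a usable call.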
2. Concept, Component, & Architecture
2.1. Key Concepts
(1) Tool Definition
- Describes a function via JSON Schema: name, purpose, parameter types and constraints.
- This is the "contract" of Function Calling — the model uses it to decide when to call and what to pass.
- A good tool definition = clear function name + precise description + strict parameter schema.
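To make the three ingredients above concrete (clear name, precise description, strict parameter schema), here is a small sketch that assembles an OpenAI-style definition. The `make_tool` helper and the `get_order_status` tool are hypothetical names used only for illustration.

```python
def make_tool(name: str, description: str, properties: dict, required: list) -> dict:
    """Build an OpenAI-style tool definition (hypothetical helper)."""
    missing = [field for field in required if field not in properties]
    if missing:
        raise ValueError(f"required fields lack a schema: {missing}")
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
                # Reject arguments the schema does not declare.
                "additionalProperties": False,
            },
        },
    }


order_tool = make_tool(
    name="get_order_status",
    description="Look up the status of an order. Call this when the user asks where an order is.",
    properties={
        "order_id": {"type": "string", "description": "Order identifier, e.g. 'A-1001'"},
    },
    required=["order_id"],
)
print(order_tool["function"]["name"])
```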
(2) Tool Call
- The structured instruction the model emits after reasoning: function name + JSON arguments.
- The model can emit multiple tool calls in one inference (parallel calling).
- The application is responsible for parsing tool calls, executing the function, and returning the result.
(3) Tool Result
- The function's return value, sent back to the model as a message.
- The model continues reasoning from the result: it may produce a final answer or issue another tool call.
(4) Tool Choice
- auto: model decides whether to call a tool (default)
- required: force the model to call at least one tool
- none: forbid tool calls; text-only output
- specific: must call a specific tool
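The four modes above map onto the `tool_choice` request parameter. The sketch below converts a mode name into the value the OpenAI Chat Completions API expects; the `to_openai_tool_choice` helper is a hypothetical convenience, and other providers spell these options differently.

```python
from typing import Optional, Union


def to_openai_tool_choice(mode: str, tool_name: Optional[str] = None) -> Union[str, dict]:
    """Translate a mode name into an OpenAI-style tool_choice value."""
    if mode in ("auto", "required", "none"):
        return mode
    if mode == "specific":
        if not tool_name:
            raise ValueError("specific mode needs a tool name")
        return {"type": "function", "function": {"name": tool_name}}
    raise ValueError(f"unknown tool-choice mode: {mode}")


print(to_openai_tool_choice("auto"))
print(to_openai_tool_choice("specific", "get_weather"))
```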
(5) Execution Loop
- The core pattern is a loop:
- Step 1: User question + tool definitions → sent to the model
- Step 2: Model returns a tool call (or a direct answer)
- Step 3: Application executes the function and gets the result
- Step 4: Send the result back to the model
- Step 5: Repeat steps 2-4 until the model produces the final answer
- This loop is the foundation of Agents — an Agent is essentially "LLM + Tool Loop".
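The loop above can be sketched without any provider SDK. In the example below, `fake_model` is a scripted stand-in for an LLM (an assumption made so the code runs offline), and the message shapes loosely follow the OpenAI convention rather than any exact wire format.

```python
import json


def get_weather(city: str) -> dict:
    """Stub tool; a real one would call a weather API."""
    return {"city": city, "temp": 28, "condition": "sunny"}


TOOLS = {"get_weather": get_weather}


def fake_model(messages: list) -> dict:
    """Stand-in for an LLM: requests a tool first, then answers from the result."""
    last = messages[-1]
    if last["role"] == "tool":
        data = json.loads(last["content"])
        answer = f"{data['city']}: {data['condition']}, {data['temp']}°C"
        return {"content": answer, "tool_calls": []}
    call = {"id": "call_1", "name": "get_weather", "arguments": {"city": "Beijing"}}
    return {"content": None, "tool_calls": [call]}


def run_agent(question: str, model=fake_model, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        reply = model(messages)
        if not reply["tool_calls"]:
            return reply["content"]  # a direct answer ends the loop
        messages.append({"role": "assistant", "tool_calls": reply["tool_calls"]})
        for call in reply["tool_calls"]:
            result = TOOLS[call["name"]](**call["arguments"])
            messages.append(
                {"role": "tool", "tool_call_id": call["id"], "content": json.dumps(result)}
            )
    raise RuntimeError("step limit reached without a final answer")


print(run_agent("What's the weather in Beijing?"))
```

The `max_steps` cap is a design choice rather than part of the pattern itself: without it, a model that keeps requesting tools would never return.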
(6) Structured Output
- A by-product of Function Calling: even without calling a real function, you can use a tool definition to force the model to output JSON conforming to a schema.
- Use: data extraction, classification labeling, form filling — any structured-data scenario.
- OpenAI's response_format: { type: "json_schema" } is purpose-built for this.
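As a rough illustration of schema-constrained output, the sketch below builds a `response_format` value in the shape OpenAI's structured-output mode accepts, then re-checks a reply on the application side. The ticket schema, the `parse_ticket` helper, and the sample reply are all hypothetical; no API call is made.

```python
import json

ticket_schema = {
    "type": "object",
    "properties": {
        "category": {"type": "string", "enum": ["billing", "shipping", "other"]},
        "urgent": {"type": "boolean"},
    },
    "required": ["category", "urgent"],
    "additionalProperties": False,
}

# Value for the response_format request parameter (structured-output mode).
response_format = {
    "type": "json_schema",
    "json_schema": {"name": "ticket", "strict": True, "schema": ticket_schema},
}


def parse_ticket(raw: str) -> dict:
    """Parse a model reply and re-check the schema's main constraints."""
    data = json.loads(raw)
    if not isinstance(data, dict) or set(data) != set(ticket_schema["required"]):
        raise ValueError("reply does not have exactly the required keys")
    if data["category"] not in ticket_schema["properties"]["category"]["enum"]:
        raise ValueError(f"bad category: {data['category']}")
    if not isinstance(data["urgent"], bool):
        raise ValueError("urgent must be a boolean")
    return data


sample_reply = '{"category": "shipping", "urgent": true}'  # hand-written stand-in
print(parse_ticket(sample_reply))
```

Re-validating on the application side is cheap insurance for providers or modes that do not enforce the schema strictly.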
2.2. Core Components
(1) Schema Registry
- Function: manage all available tool definitions (name, description, parameter schema).
- Design: tool description quality directly affects the model's calling accuracy.
- Best practice: function names start with verbs (get_weather, create_order); descriptions clearly state "when to use this tool".
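A schema registry can be as small as a dictionary with a few guard rails. The `ToolRegistry` class below is a hypothetical sketch, not any framework's API: it stores each definition next to its implementation and emits OpenAI-style definitions on demand.

```python
class ToolRegistry:
    """In-memory registry of tool definitions and their implementations."""

    def __init__(self):
        self._tools = {}

    def register(self, name: str, description: str, parameters: dict, func) -> None:
        if name in self._tools:
            raise ValueError(f"tool already registered: {name}")
        if len(description.split()) < 3:
            raise ValueError("description should say what the tool does and when to use it")
        self._tools[name] = {"description": description, "parameters": parameters, "func": func}

    def definitions(self) -> list:
        """Return OpenAI-style tool definitions to send with a request."""
        return [
            {
                "type": "function",
                "function": {
                    "name": name,
                    "description": entry["description"],
                    "parameters": entry["parameters"],
                },
            }
            for name, entry in self._tools.items()
        ]

    def implementation(self, name: str):
        """Look up the Python callable behind a tool name."""
        return self._tools[name]["func"]


registry = ToolRegistry()
registry.register(
    name="create_order",
    description="Create a new order. Call this when the user confirms a purchase.",
    parameters={
        "type": "object",
        "properties": {"item": {"type": "string"}},
        "required": ["item"],
    },
    func=lambda item: {"order_id": "A-1", "item": item},
)
print([d["function"]["name"] for d in registry.definitions()])
```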
(2) Router / Dispatcher
- Function: parse the model's tool call and route to the corresponding function implementation.
- Security: must verify the function name is in an allow-list and arguments conform to the schema.
- Error handling: function not found, schema-validation failure, execution timeout — all need graceful handling.
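The dispatcher duties above (allow-list, argument validation, graceful errors) can be sketched with the standard library alone. The `REGISTRY` layout and the `dispatch` function are illustrative assumptions; production code would more likely validate with a JSON Schema library or Pydantic.

```python
import json


def get_weather(city: str, unit: str = "celsius") -> dict:
    """Stub tool implementation."""
    return {"city": city, "unit": unit, "temp": 28}


REGISTRY = {
    "get_weather": {
        "func": get_weather,
        "required": ["city"],
        "types": {"city": str, "unit": str},
        "enums": {"unit": {"celsius", "fahrenheit"}},
    }
}


def dispatch(name: str, raw_arguments: str) -> dict:
    """Validate a tool call and run it; always returns a dict, never raises."""
    spec = REGISTRY.get(name)
    if spec is None:  # allow-list check
        return {"error": f"unknown tool: {name}"}
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        return {"error": f"arguments are not valid JSON: {exc.msg}"}
    if not isinstance(args, dict):
        return {"error": "arguments must be a JSON object"}
    for key in spec["required"]:
        if key not in args:
            return {"error": f"missing required argument: {key}"}
    for key, value in args.items():
        expected = spec["types"].get(key)
        if expected is None:
            return {"error": f"unexpected argument: {key}"}
        if not isinstance(value, expected):
            return {"error": f"argument {key} must be of type {expected.__name__}"}
        allowed = spec["enums"].get(key)
        if allowed is not None and value not in allowed:
            return {"error": f"argument {key} must be one of {sorted(allowed)}"}
    try:
        return {"result": spec["func"](**args)}
    except Exception as exc:  # surface failures to the model instead of crashing
        return {"error": f"{name} failed: {exc}"}


print(dispatch("get_weather", '{"city": "Beijing"}'))
print(dispatch("delete_user", '{"id": 1}'))
```

Returning an error dictionary instead of raising keeps the conversation going: the error can be sent back as a tool result so the model can correct its call.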
(3) Executor
- Function: actually run the function call (API request, DB query, system command).
- Security: sandboxed execution, permission control, timeouts.
- Concurrency: parallel tool calls can run concurrently to reduce latency.
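For the concurrency point above, one plausible approach is a thread pool with a per-call timeout. The sketch assumes I/O-bound tools and that tool names were already validated upstream; `run_parallel` and both stub tools are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def get_weather(city: str) -> dict:
    time.sleep(0.05)  # simulate network latency
    return {"city": city, "condition": "sunny"}


def search_flights(origin: str, destination: str) -> dict:
    time.sleep(0.05)
    return {"route": f"{origin}->{destination}", "flights": 3}


TOOLS = {"get_weather": get_weather, "search_flights": search_flights}


def run_parallel(calls: list, timeout_s: float = 2.0) -> dict:
    """Run tool calls concurrently; return {call_id: {"result"|"error": ...}}."""
    results = {}
    with ThreadPoolExecutor(max_workers=max(1, len(calls))) as pool:
        futures = {
            call["id"]: pool.submit(TOOLS[call["name"]], **call["arguments"])
            for call in calls
        }
        for call_id, future in futures.items():
            try:
                results[call_id] = {"result": future.result(timeout=timeout_s)}
            except FutureTimeout:
                # The wait is abandoned, but the worker thread itself is not killed.
                results[call_id] = {"error": "tool timed out"}
            except Exception as exc:
                results[call_id] = {"error": f"{type(exc).__name__}: {exc}"}
    return results


calls = [
    {"id": "call_1", "name": "get_weather", "arguments": {"city": "Beijing"}},
    {"id": "call_2", "name": "search_flights",
     "arguments": {"origin": "Beijing", "destination": "Shanghai"}},
]
results = run_parallel(calls)
print(results)
```

Keying results by call ID keeps each result attached to the call that produced it, regardless of which tool finishes first. A thread timeout only stops the waiting, so tools that can hang should also set their own network timeouts.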
(4) Result Formatter
- Function: format the function return into something the model can understand.
- Tips: large result sets need truncation or summarization to avoid blowing up the context window.
- Errors: on failure, return a clear error so the model can adjust its strategy.
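The formatting tips above (truncate large results, return clear errors) might be implemented along these lines. The character budget and both helper names are assumptions; a real system might count tokens or summarize instead of cutting text.

```python
import json


def format_tool_result(result, max_chars: int = 2000) -> str:
    """Serialize a tool result for the model, truncating oversized payloads."""
    text = json.dumps(result, ensure_ascii=False, default=str)
    if len(text) <= max_chars:
        return text
    # Tell the model the data was cut so it does not treat the preview as complete.
    return json.dumps(
        {"truncated": True, "original_chars": len(text), "preview": text[:max_chars]},
        ensure_ascii=False,
    )


def format_tool_error(tool_name: str, exc: Exception) -> str:
    """Describe a failure in a form the model can act on."""
    return json.dumps({"tool": tool_name, "error": type(exc).__name__, "message": str(exc)})


print(format_tool_result({"temp": 28}))
print(format_tool_error("get_weather", TimeoutError("upstream API took too long")))
```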
2.3. Architecture & Design
(1) Standard Function-Calling Flow
sequenceDiagram
participant U as User
participant A as Application
participant M as LLM
participant T as External Tool/API
U->>A: "What's the weather in Beijing tomorrow?"
A->>M: User message + Tool Definitions
M->>A: Tool Call: get_weather(city="Beijing", date="tomorrow")
A->>T: Call weather API
T->>A: {"temp": 28, "condition": "sunny"}
A->>M: Tool Result: {"temp": 28, "condition": "sunny"}
M->>A: "Beijing tomorrow: sunny, 28°C — good for outings"
A->>U: Final answer
(2) Parallel Calling Flow
sequenceDiagram
participant U as User
participant A as Application
participant M as LLM
participant T1 as Weather API
participant T2 as Flight API
U->>A: "Tomorrow's Beijing weather and flights to Shanghai"
A->>M: User message + Tool Definitions
M->>A: Tool Call 1: get_weather(...) + Tool Call 2: search_flights(...)
par Parallel execution
A->>T1: Query weather
A->>T2: Query flights
end
T1->>A: Weather result
T2->>A: Flight result
A->>M: Both Tool Results
M->>A: Combined answer
A->>U: Final answer
(3) Relationship to Agent Architecture
flowchart TD
A["Agent = LLM + Tools + Loop"] --> B["Function Calling: the Agent's 'hand'"]
A --> C["Memory: the Agent's 'brain'"]
A --> D["Planning: the Agent's 'thought'"]
B --> B1[MCP: standardized tool interface]
B --> B2[Native Tool Use: model's built-in capability]
B --> B3[ReAct: Reason + Act pattern]
2.4. Eco-system
- Protocol layer:
- MCP (Model Context Protocol): Anthropic-led open standard; standardizes tool definition and invocation; tools become reusable across models
- OpenAI Function Calling API: de-facto standard; most frameworks are compatible
- A2A (Agent-to-Agent): Google-led open protocol for agents to discover each other and delegate tasks; it complements tool calling rather than replacing it
- Framework layer:
- LangChain / LangGraph: @tool decorator quickly defines tools and auto-generates schemas
- LlamaIndex: FunctionTool class wraps tools, integrates seamlessly with RAG pipelines
- Vercel AI SDK: TypeScript ecosystem, tool() function, frontend-friendly
- PydanticAI: auto-generates tool schemas from Pydantic models
- Tool ecosystem:
- MCP Server Hub: community-maintained MCP servers (DBs, file systems, APIs)
- LangChain Tools: 100+ pre-built tools (search, calc, code execution)
- ChatGPT Plugins (superseded by GPT Actions): third-party integrations
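The schema auto-generation that the frameworks above advertise can be approximated in a few lines. This sketch handles only simple scalar type hints and is not how any particular framework does it; `schema_from_function` and `PY_TO_JSON` are hypothetical names.

```python
import inspect
from typing import get_type_hints

PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}


def schema_from_function(func) -> dict:
    """Derive a tool definition from a function's signature and docstring."""
    hints = get_type_hints(func)
    hints.pop("return", None)
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        # Unknown or missing annotations fall back to "string".
        properties[name] = {"type": PY_TO_JSON.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default value means the argument is required
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def get_weather(city: str, days: int = 1) -> dict:
    """Get the weather forecast for a city."""
    return {"city": city, "days": days}


schema = schema_from_function(get_weather)
print(schema)
```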
3. Implementation & Best Practices
3.1. OpenAI Function Calling Implementation
from openai import OpenAI
import json

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def get_weather(city: str, unit: str = "celsius") -> dict:
    """Placeholder implementation; swap in a real weather API call."""
    return {"city": city, "unit": unit, "temp": 28, "condition": "sunny"}

# Step 1: define tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city. Call this when the user asks about weather.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name, e.g., 'Beijing', 'Seattle'"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                },
                "required": ["city"]
            }
        }
    }
]

# Step 2: send the request
messages = [{"role": "user", "content": "How's Beijing's weather today?"}]
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools,
    tool_choice="auto"  # auto / required / none / {"type": "function", "function": {"name": "..."}}
)

# Step 3: handle the tool calls (the model may emit several in one turn)
message = response.choices[0].message
if message.tool_calls:
    messages.append(message)  # the assistant message containing tool_calls
    for tool_call in message.tool_calls:
        func_name = tool_call.function.name
        func_args = json.loads(tool_call.function.arguments)
        # Execute the actual function; reject names that were never defined
        if func_name == "get_weather":
            result = get_weather(**func_args)
        else:
            result = {"error": f"unknown function: {func_name}"}
        # Each tool call gets its own result message, matched by tool_call_id
        messages.append({
            "role": "tool",
            "tool_call_id": tool_call.id,
            "content": json.dumps(result)
        })
    # Step 4: send all results back to the model in one follow-up request
    final = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools
    )
    print(final.choices[0].message.content)
else:
    # The model answered directly without calling a tool
    print(message.content)
3.2. Anthropic Tool Use Implementation
import json
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def get_weather(city: str) -> dict:
    """Placeholder implementation; swap in a real weather API call."""
    return {"city": city, "temp": 28, "condition": "sunny"}

tools = [
    {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "input_schema": {  # Anthropic uses input_schema instead of parameters
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"}
            },
            "required": ["city"]
        }
    }
]
messages = [{"role": "user", "content": "What's Beijing's weather like?"}]

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=tools,
    messages=messages
)

# Handle tool_use blocks: collect one tool_result per tool_use block
tool_results = []
for block in response.content:
    if block.type == "tool_use":
        result = get_weather(**block.input)
        tool_results.append({
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": json.dumps(result)
        })

# Send results back (tool results travel in a user message)
if tool_results:
    final = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        tools=tools,
        messages=messages + [
            {"role": "assistant", "content": response.content},
            {"role": "user", "content": tool_results}
        ]
    )
    print("".join(b.text for b in final.content if b.type == "text"))
3.3. LangChain Shortcut
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI

@tool
def get_weather(city: str, unit: str = "celsius") -> dict:
    """Get current weather for a city. Call this when the user asks about weather.

    Args:
        city: City name, e.g., 'Beijing', 'Seattle'
        unit: Temperature unit, either celsius or fahrenheit
    """
    # Placeholder; a real implementation would call a weather API
    return {"city": city, "unit": unit, "temp": 28, "condition": "sunny"}

# Auto-generates Tool Schema from docstring + type hints
llm = ChatOpenAI(model="gpt-4o")
llm_with_tools = llm.bind_tools([get_weather])
response = llm_with_tools.invoke("What's the weather in Beijing?")
print(response.tool_calls)  # tool calls requested by the model, if any
3.4. Security Best Practices
- Input validation: never directly execute model-emitted arguments; strictly validate via JSON Schema or Pydantic.
- Allow-list: only allow predefined functions; reject unknown function names.
- Permission control: different users get different toolsets (e.g., regular users can't call delete_user).
- Sandboxed execution: tools that run code must execute in a sandbox (Docker / gVisor).
- Rate limit: prevent infinite tool-call loops; cap max calls (typically 10-20).
- Sensitive-action confirmation: write operations (create / modify / delete) need human-in-the-loop confirmation.
- Prompt-injection defense: tool returns may contain malicious instructions — filter them.
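Three of the practices above (per-role toolsets, a call cap, and confirmation for write operations) can be combined in one small gate that runs before every tool execution. The role table, tool names, and `ToolGuard` class are hypothetical; a real deployment would take roles from its own auth system.

```python
ROLE_TOOLS = {
    "user": {"get_order"},
    "admin": {"get_order", "delete_user"},
}
WRITE_TOOLS = {"delete_user"}  # actions that need a human to confirm


class ToolGuard:
    """Decides whether a requested tool call may run in this session."""

    def __init__(self, role: str, max_calls: int = 10):
        self.allowed = ROLE_TOOLS.get(role, set())
        self.max_calls = max_calls
        self.calls = 0

    def check(self, tool_name: str, confirmed: bool = False) -> str:
        if self.calls >= self.max_calls:
            return "denied: call budget exhausted"
        if tool_name not in self.allowed:
            return "denied: tool not permitted for this role"
        if tool_name in WRITE_TOOLS and not confirmed:
            return "pending: needs human confirmation"
        self.calls += 1  # only approved calls consume budget
        return "ok"


guard = ToolGuard("user")
print(guard.check("get_order"))
print(guard.check("delete_user"))
```

Returning a reason string rather than a boolean lets the application pass the denial back to the model as a tool result, so it can explain the refusal to the user.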
3.5. Provider API Comparison
| Dimension | OpenAI | Anthropic | Google |
|---|---|---|---|
| Tool-definition field | tools[].function | tools[] | tools[].functionDeclarations |
| Param schema | parameters | input_schema | parameters |
| Call output | message.tool_calls[] | content[].type == "tool_use" | candidates[].content.parts[].functionCall |
| Result return | role: "tool" | type: "tool_result" | parts[].functionResponse |
| Call ID | tool_call.id | block.id | None (matched by order) |
| Parallel calls | ✅ multiple tool_calls | ✅ multiple tool_use blocks | ✅ multiple functionCall parts |
4. Bootcamp & Workshops
4.1. Official & Classic Tutorials
4.2. Troubleshooting
| Symptom | Root Cause | Solution |
|---|---|---|
| Model answers directly without calling tools | Tool description unclear; model doesn't know when to call | Improve description; make "when to call" explicit |
| Wrong argument format / missing required args | Schema not strict enough | Add required; use enum to constrain values |
| Wrong tool selected | Ambiguous tool descriptions | Ensure each tool's description doesn't overlap; single responsibility |
| Infinite-loop tool calls | Model calls the same tool repeatedly | Set max calls; explicitly include "done" signal in result |
| Parallel-call results jumbled | Results not matched to tool-call IDs | Ensure each tool result carries the correct tool_call_id |
| Tool execution timeout | External API slow | Set timeouts; return timeout errors so the model can pivot |
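For the infinite-loop symptom in the table, one possible guard is to count identical calls as well as total calls. The `LoopDetector` class and its thresholds are illustrative assumptions, not recommended values.

```python
import json
from collections import Counter


class LoopDetector:
    """Flags a run that repeats the same call or exceeds a total call budget."""

    def __init__(self, max_repeats: int = 2, max_total: int = 10):
        self.max_repeats = max_repeats
        self.max_total = max_total
        self.seen = Counter()

    def should_stop(self, name: str, arguments: dict) -> bool:
        # Canonical JSON makes argument order irrelevant when comparing calls.
        key = (name, json.dumps(arguments, sort_keys=True))
        self.seen[key] += 1
        total = sum(self.seen.values())
        return self.seen[key] > self.max_repeats or total > self.max_total


detector = LoopDetector(max_repeats=2)
for attempt in range(1, 4):
    stop = detector.should_stop("get_weather", {"city": "Beijing"})
    print(attempt, stop)
```

When `should_stop` returns `True`, the application can end the loop and tell the model (or the user) that the call budget was hit, rather than silently continuing.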
4.3. Common Q & A
- Q: How is Function Calling related to MCP?
- A: Function Calling is a model-layer capability (the model knows how to emit tool calls); MCP is a protocol-layer standard (defines how tools are discovered, described, and invoked). MCP builds on Function Calling so tools are reusable across models and apps.
- Q: Do all LLMs support Function Calling?
- A: In 2026, most commercial models (GPT-4o, Claude, Gemini) and many open-source models (Llama 3.3, Mistral) support it natively. Models that don't can simulate via Prompt Engineering, but reliability is lower.
- Q: Difference between Function Calling and JSON Mode?
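- A: JSON Mode only guarantees syntactically valid JSON. Function Calling targets a specific schema and adds a function name and call ID, which supports multi-turn interaction. Strict schema conformance is only guaranteed when the provider offers a strict mode and you enable it (e.g., strict: true on OpenAI function definitions); otherwise validate the arguments yourself.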
- Q: How to handle tool execution failure?
- A: Return the error as a Tool Result so the model can decide next (retry, switch tool, inform user). Don't silently swallow errors.
- Q: Max number of tools per request?
- A: OpenAI supports up to 128; Anthropic recommends ≤ 64. But more tools means lower selection accuracy — keep it to 10-20 in practice, or use a router for dynamic loading.
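The "router for dynamic loading" idea in the last answer could start as simple keyword matching: score each tool against the user's query and send only the top few definitions to the model. The scoring rule and the `select_tools` helper are assumptions made for illustration; embedding-based retrieval is a common alternative.

```python
import re


def select_tools(query: str, tools: list, k: int = 3) -> list:
    """Rank tools by word overlap between the query and name + description."""
    query_words = set(re.findall(r"[a-z]+", query.lower()))
    scored = []
    for tool in tools:
        text = (tool["name"] + " " + tool["description"]).lower()
        overlap = len(query_words & set(re.findall(r"[a-z]+", text)))
        scored.append((overlap, tool["name"]))
    # Highest overlap first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for overlap, name in scored[:k] if overlap > 0]


catalog = [
    {"name": "get_weather", "description": "Get current weather for a city"},
    {"name": "search_flights", "description": "Search flights between two cities"},
    {"name": "send_email", "description": "Send an email to a recipient"},
]
print(select_tools("What is the weather in Beijing?", catalog))
print(select_tools("Send an email about flights", catalog, k=2))
```

Word overlap is crude (it ignores synonyms and treats stop words as signal), so this is best read as a placeholder for whatever retrieval method the application already has.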