Introduction
OpenAI’s latest models, such as o1 and GPT-4o, deliver accurate, context-aware responses across diverse fields. Much of the recent progress in Large Language Models (LLMs) comes from improved utility and a marked reduction in common issues such as hallucinations. Techniques like retrieval-augmented generation (RAG) improve accuracy and reduce hallucinations by giving models access to external, pre-indexed data. However, when applications need real-time data such as weather forecasts, stock prices (useful for judging bullish or bearish trends), and other dynamic updates, function calling becomes the key capability. Function calling in LLMs, also known as tool calling, allows a model to invoke APIs or other systems and so perform specific tasks autonomously.
This article explores 6 LLMs that support function-calling capabilities, offering real-time API integration for enhanced accuracy and automation. These models are shaping the next generation of AI agents, enabling them to autonomously handle tasks involving data retrieval, processing, and real-time decision-making.
What is Function Calling in LLMs?
Function calling is a methodology that enables large language models (LLMs) to interact with external systems, APIs, and tools. Given a collection of functions or tools and details on how to use them, the model can choose the appropriate function for a specific task and supply its arguments, and the application then executes that function.
This capability significantly extends the functionality of LLMs beyond simple text generation, allowing them to engage with the real world. Instead of only producing text-based responses, LLMs with function-calling capabilities can now perform actions, control devices, access databases for information retrieval, and complete a variety of tasks by utilizing external tools and services.
However, not all LLMs are equipped with function-calling abilities. Only models that have been specifically trained or fine-tuned for this purpose can recognize when a prompt requires invoking a function. The Berkeley Function-Calling Leaderboard, for instance, evaluates how well different LLMs handle a variety of programming languages and API scenarios, highlighting the versatility and reliability of these models in executing multiple, complex functions in parallel. This capability is essential for creating AI systems operating across various software environments and managing tasks requiring simultaneous actions.
Typically, applications utilizing function-calling LLMs follow a two-step process: first mapping the user prompt to the correct function and input parameters, and then processing the function’s output to generate a final, coherent response.
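The two-step process can be sketched without any model at all. The snippet below is a minimal illustration, not any vendor's API: `get_stock_price`, the `TOOLS` registry, the price value, and the JSON shape of the simulated model output are all assumptions made for this example.

```python
import json

# Hypothetical local function the model is allowed to call.
def get_stock_price(symbol: str) -> dict:
    # Hard-coded illustrative value; a real application would query a market-data API.
    prices = {"ACME": 123.45}
    return {"symbol": symbol, "price": prices.get(symbol)}

# Registry mapping tool names to Python callables.
TOOLS = {"get_stock_price": get_stock_price}

def dispatch(tool_call_json: str) -> str:
    """Execute the function and arguments that the model selected in step 1."""
    call = json.loads(tool_call_json)
    func = TOOLS[call["name"]]  # raises KeyError if the model names an unknown tool
    result = func(**call["arguments"])
    return json.dumps(result)

# Step 1 (simulated): the model maps "What is ACME trading at?" to a function
# name and arguments. The shape of this JSON is an assumption.
model_output = '{"name": "get_stock_price", "arguments": {"symbol": "ACME"}}'

# Step 2 input: this string would be sent back to the model for the final answer.
tool_result = dispatch(model_output)
print(tool_result)
```

In a real application, `model_output` comes from the LLM, and `tool_result` is appended to the conversation so the model can write the final response.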
LLMs that Support Function Calling
Here are 6 LLMs that support function calling:
1. OpenAI GPT-4o
Link to the doc: GPT-4o Function Calling
Function calling in GPT-4o allows developers to connect large language models to external tools and systems, enhancing their capabilities. By leveraging this feature, AI can interact with APIs, fetch data, execute functions, and perform tasks requiring external resource integration. This capability is particularly useful in building intelligent assistants, automating workflows, or developing dynamic applications that can perform actions based on user input.
Example Use Cases
Function calling with GPT-4o opens up a wide range of practical applications, including but not limited to:
- Fetching data for assistants: AI assistants can use function calling to retrieve data from external systems. For example, when a user asks, “What are my recent orders?”, the assistant can use a function call to fetch the latest order details from a database before formulating a response.
- Performing actions: Beyond data retrieval, function calling enables assistants to execute actions, such as scheduling a meeting based on user preferences and calendar availability.
- Performing computations: For specific tasks like mathematical problem solving, function calling allows the assistant to carry out computations, ensuring accurate responses without relying solely on the model’s general reasoning capabilities.
- Building workflows: Function calls can orchestrate complex workflows. An example would be a pipeline that processes unstructured data, converts it into a structured format, and stores it in a database for further use.
- Modifying UI elements: Function calling can be integrated into user interfaces to update based on user inputs dynamically. For instance, it can trigger functions that modify a map UI by rendering pins based on user location or search queries.
These capabilities make GPT-4o well suited to building autonomous AI agents, from virtual assistants to complex data analysis tools.
2. Gemini 1.5 Flash
Link to the doc: Gemini 1.5 Flash function calling
Function calling is a powerful feature of Gemini 1.5 Flash that allows developers to define custom functions and integrate them with Gemini models. Instead of directly invoking these functions, the models generate structured data outputs that specify the function names and suggested arguments. This approach enables the creation of dynamic applications that can interact with external APIs, databases, and various services, providing real-time and contextually relevant responses to user queries.
Introduction to Function Calling with Gemini 1.5 Flash
The function calling feature in Gemini 1.5 Flash lets developers extend the capabilities of Gemini models with custom functionality. By defining custom functions and supplying them to the model, applications can perform specific tasks, fetch real-time data, and interact with external systems. This enhances the model’s ability to provide comprehensive and accurate responses tailored to user needs.
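Because the model only suggests a call rather than running it, the application decides whether and how to execute it. The sketch below illustrates that hand-off with plain Python data. The `track_order` function, its declaration, and the shape of the `suggested` dictionary are assumptions for illustration and are not taken from the Gemini SDK.

```python
# Hypothetical function declaration in an OpenAPI-style schema (illustrative only).
track_order_declaration = {
    "name": "track_order",
    "description": "Look up the shipping status of an order.",
    "parameters": {
        "type": "object",
        "properties": {
            "order_id": {"type": "string", "description": "Order identifier"}
        },
        "required": ["order_id"],
    },
}

def track_order(order_id: str) -> dict:
    # Stub standing in for a call to an order management system.
    return {"order_id": order_id, "status": "shipped"}

def execute_suggestion(function_call: dict) -> dict:
    """Run a model-suggested call only if it matches the declared function."""
    if function_call["name"] != track_order_declaration["name"]:
        raise ValueError(f"Undeclared function: {function_call['name']}")
    required = track_order_declaration["parameters"]["required"]
    missing = [p for p in required if p not in function_call["args"]]
    if missing:
        raise ValueError(f"Missing arguments: {missing}")
    return track_order(**function_call["args"])

# Simulated structured output from the model: a function name plus suggested
# arguments. The exact field names are an assumption.
suggested = {"name": "track_order", "args": {"order_id": "A-1001"}}
print(execute_suggestion(suggested))
```

Keeping execution on the application side means the suggested arguments can be validated, logged, or rejected before anything touches an external system.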
Example Use Cases
Function calling with Gemini 1.5 Flash can be leveraged across various domains to enhance application functionality and user experience. Here are some illustrative use cases:
- E-commerce Platforms:
  - Product Recommendations: Integrate with inventory databases to provide real-time product suggestions based on user preferences and availability.
  - Order Tracking: Fetch and display the latest order status by calling external order management systems.
- Customer Support:
  - Ticket Management: Automatically create, update, or retrieve support tickets by interacting with CRM systems.
  - Knowledge Base Access: Retrieve relevant articles or documentation to assist in resolving user queries.
- Healthcare Applications:
  - Appointment Scheduling: Access and manage appointment slots by interfacing with medical scheduling systems.
  - Patient Information Retrieval: Securely fetch patient records or medical history from databases to provide informed responses.
- Travel and Hospitality:
  - Flight Information: Call airline APIs to retrieve real-time flight statuses, availability, and booking options.
  - Hotel Reservations: Check room availability, book reservations, and manage bookings through hotel management systems.
- Finance and Banking:
  - Account Information: Provide up-to-date account balances and transaction histories by interfacing with banking systems.
  - Financial Transactions: Facilitate fund transfers, bill payments, and other financial operations securely.
3. Anthropic Claude Sonnet 3.5
Link to the doc: Anthropic Claude Sonnet 3.5 function calling
Anthropic’s Claude 3.5 Sonnet supports function calling, enabling integration with external tools to perform specific tasks. This allows Claude to interact dynamically with external systems and return results to the user in real time. By incorporating custom tools, you can expand Claude’s functionality beyond text generation, enabling it to access external APIs, fetch data, and perform actions essential for specific use cases.
In the context of Claude’s function calling, external tools or APIs can be defined and made available for the model to call during a conversation. Claude intelligently determines when a tool is necessary based on the user’s input, formats the request appropriately, and provides the result in a clear response. This mechanism enhances Claude’s versatility, allowing it to go beyond just answering questions or generating text by integrating real-world data or executing code through external APIs.
How Does Function Calling Work?
To integrate function calling with Claude, follow these steps:
- Provide Claude with tools and a user prompt:
  - In the API request, define tools with specific names, descriptions, and input schemas. For instance, a tool might retrieve weather data or execute a calculation.
  - The user prompt may require these tools, such as: “What’s the weather in San Francisco?”
- Claude decides to use a tool:
  - Claude assesses whether any of the available tools are relevant to the user’s query.
  - If applicable, Claude constructs a formatted request to call the tool, and the API responds with a tool_use stop_reason, indicating that Claude intends to use a tool.
- Extract tool input, run the code, and return results:
  - The tool name and input are extracted on the client side.
  - You execute the tool’s logic (e.g., calling an external API) and return the result as a new user message with a tool_result content block.
- Claude uses the tool result to formulate a response:
  - Claude analyzes the tool’s output and integrates it into the final response to the user’s original prompt.
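The client-side part of these steps can be sketched with plain dictionaries. This is a minimal illustration that makes no network call: the `get_weather` tool, its made-up temperature reading, the `toolu_demo_1` id, and the exact layout of the simulated `response` are assumptions. The `tool_use` stop reason and the `tool_result` content block are the names used in the steps above.

```python
import json
from typing import Optional

# Tool definition with the three parts described above: name, description, input schema.
weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"],
    },
}

def get_weather(location: str) -> str:
    # Stub with a made-up reading; replace with a real weather API call.
    return json.dumps({"location": location, "temperature_c": 18})

# Simulated API response (assumed layout) for "What's the weather in San Francisco?"
response = {
    "stop_reason": "tool_use",
    "content": [
        {"type": "text", "text": "Let me check the weather."},
        {
            "type": "tool_use",
            "id": "toolu_demo_1",  # hypothetical id
            "name": "get_weather",
            "input": {"location": "San Francisco"},
        },
    ],
}

def build_tool_result_message(resp: dict) -> Optional[dict]:
    """Extract tool inputs, run the tool, and wrap the output as a user message."""
    if resp["stop_reason"] != "tool_use":
        return None  # the model answered directly; nothing to execute
    results = []
    for block in resp["content"]:
        if block["type"] == "tool_use" and block["name"] == weather_tool["name"]:
            output = get_weather(**block["input"])
            results.append(
                {"type": "tool_result", "tool_use_id": block["id"], "content": output}
            )
    return {"role": "user", "content": results}

follow_up = build_tool_result_message(response)
print(json.dumps(follow_up, indent=2))
```

The returned message would be appended to the conversation and sent back so that Claude can write the final answer from the tool output.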
Example Use Cases
Here are some example use cases for function calling with Claude:
- Weather Forecasting:
  - User prompt: “What’s the weather like in San Francisco today?”
  - Tool use: Claude could call an external weather API to retrieve the current forecast, returning the result as part of the response.
- Currency Conversion:
  - User prompt: “What’s 100 USD in EUR?”
  - Tool use: Claude could use a currency conversion tool to calculate the equivalent value in real time and provide the exact result.
- Task Automation:
  - User prompt: “Set a reminder for tomorrow at 9 AM.”
  - Tool use: Claude could call a task scheduling tool to set the reminder in an external system.
- Data Lookup:
  - User prompt: “What is Tesla’s stock price?”
  - Tool use: Claude could query an external stock market API to fetch the latest stock price for Tesla.
By enabling function calling, Claude 3.5 Sonnet significantly enhances its ability to assist users by integrating custom and real-world solutions into everyday interactions.
Claude excels in scenarios where safety and interpretability are paramount, making it a reliable choice for applications that require secure and accurate external system integrations.
4. Cohere Command R+
Link to the doc: Cohere Command R+ Function Calling
Function calling, often referred to as Single-Step Tool Use, is a key capability of Command R+ that allows the system to interact directly with external tools like APIs, databases, or search engines in a structured and dynamic manner. The model makes intelligent decisions about which tool to use and what parameters to pass, simplifying the interaction with external systems and APIs.
This capability is central to many advanced use cases because it enables the model to perform tasks that require retrieving or manipulating external data, rather than relying solely on its pre-trained knowledge.
Definition and Mechanics
Command R+ utilizes function calling by making two key inferences:
- Tool Selection: The model identifies which tool should be used based on the conversation and selects the appropriate parameters to pass to the tool.
- Response Generation: Once the external tool returns the data, the model processes that information and generates the final response to the user, integrating it smoothly into the conversation.
Command R+ has been specifically trained to handle this functionality using a specialized prompt template. This ensures that the model can consistently deliver high-quality results when interacting with external tools. Deviating from the recommended template may reduce the performance of the function calling feature.
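The two inferences can be pictured as two model calls with one tool execution between them. The sketch below is a hedged illustration, not Cohere's API or prompt template: `fake_tool_selection` and `fake_response_generation` are placeholders for the two model inferences, and `query_weather` returns invented data.

```python
def query_weather(location: str, day: str) -> dict:
    # Stub with invented data; stands in for a real weather API.
    return {"location": location, "day": day, "forecast": "partly cloudy", "high_f": 75}

TOOLS = {"query_weather": query_weather}

def fake_tool_selection(message: str) -> dict:
    # Placeholder for inference 1: a real model would pick the tool and parameters.
    return {
        "name": "query_weather",
        "parameters": {"location": "New York", "day": "tomorrow"},
    }

def fake_response_generation(message: str, tool_output: dict) -> str:
    # Placeholder for inference 2: a real model would write this sentence itself.
    return (
        f"{tool_output['day'].capitalize()} in {tool_output['location']}, expect "
        f"{tool_output['forecast']} skies with a high of {tool_output['high_f']}°F."
    )

def single_step_tool_use(message, select=fake_tool_selection, respond=fake_response_generation):
    call = select(message)                              # inference 1: tool selection
    output = TOOLS[call["name"]](**call["parameters"])  # application runs the tool
    return respond(message, output)                     # inference 2: response generation

answer = single_step_tool_use("What's the weather in New York tomorrow?")
print(answer)
```

Passing the two inferences in as callables keeps the control flow visible; in practice both would be calls to the same model.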
Example Use Cases
- Weather Forecast Retrieval: Command R+ can be programmed to call a weather API when a user asks about the current weather or future forecasts. The model selects the appropriate parameters (like location and time), makes the API request, and generates a human-friendly response using the returned data.
  Example:
  - User: “What’s the weather in New York tomorrow?”
  - Command R+: Calls a weather API with the parameters for “New York” and “tomorrow” and responds, “Tomorrow in New York, expect partly cloudy skies with a high of 75°F.”
- Database Lookup: In scenarios where the user is looking for specific information stored in a database, such as customer details or order history, Command R+ can execute queries dynamically and return the requested information.
  Example:
  - User: “Can you give me the details for customer ID 12345?”
  - Command R+: Calls the database, retrieves the relevant customer details, and responds with the appropriate information, “Customer 12345 is John Doe, registered on June 3rd, 2022, with an active subscription.”
- Search Engine Queries: If a user is searching for information that is not contained in the model’s knowledge base, Command R+ can leverage a search engine API to retrieve up-to-date information and then present it to the user in an easily understandable format.
  Example:
  - User: “What’s the latest news on electric vehicle advancements?”
  - Command R+: Calls a search engine API to retrieve recent articles or updates, then summarizes the findings: “Recent advancements in electric vehicles include breakthroughs in battery technology, offering a range increase of 20%.”
5. Mistral Large 2
Link to the doc: Mistral Large 2 Function Calling
Mistral Large 2, an advanced language model with 123 billion parameters, excels in generating code, solving mathematical problems, and handling multilingual tasks. One of its most powerful features is enhanced function calling, which allows it to execute complex, multi-step processes both in parallel and sequentially. Function calling refers to the model’s ability to dynamically interact with external tools, APIs, or other models to retrieve or process data based on specific user instructions. This capability significantly extends its application across various fields, making it a versatile solution for advanced computational and business applications.
Function Calling Capabilities
Mistral Large 2 has been trained to handle intricate function calls by leveraging both its reasoning skills and its capability to integrate with external processes. Whether it’s calculating complex equations, generating real-time reports, or interacting with APIs to fetch live data, the model’s robust function calling can coordinate tasks that demand high-level problem-solving. The model excels at determining when to call specific functions and how to sequence them for optimal results, whether through parallelization or sequential steps.
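The difference between parallel and sequential function calls is easiest to see on the application side. The following sketch assumes a model turn that requested two independent calls, then runs a dependent step once both have returned. The tool names, their return values, and the shape of `parallel_calls` are hypothetical and not taken from the Mistral API.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical tools with invented return values.
def check_inventory(sku: str) -> dict:
    return {"sku": sku, "in_stock": 4}

def get_shipping_days(region: str) -> dict:
    return {"region": region, "days": 3}

def summarize(inventory: dict, shipping: dict) -> str:
    return (
        f"{inventory['in_stock']} units of {inventory['sku']} available; "
        f"ships to {shipping['region']} in {shipping['days']} days."
    )

TOOLS = {"check_inventory": check_inventory, "get_shipping_days": get_shipping_days}

# Simulated batch of independent tool calls requested in a single model turn.
parallel_calls = [
    {"name": "check_inventory", "arguments": {"sku": "SKU-42"}},
    {"name": "get_shipping_days", "arguments": {"region": "EU"}},
]

def run_parallel(calls: list) -> list:
    """Run independent calls concurrently and return results in request order."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(TOOLS[c["name"]], **c["arguments"]) for c in calls]
        return [f.result() for f in futures]

inventory, shipping = run_parallel(parallel_calls)
# Sequential step: it depends on both earlier results, so it runs afterwards.
report = summarize(inventory, shipping)
print(report)
```

Threads suit this case because tool calls are usually I/O-bound API requests; calls whose inputs depend on earlier outputs still have to wait their turn.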
Example Use Cases
- Automated Business Workflows:
  - Mistral Large 2 can be integrated into customer support systems, where it can automatically process user queries and call different functions to check inventory, schedule appointments, or escalate issues to human agents when necessary. Its ability to sequence and parallelize function calls can handle a high volume of inquiries, reducing response time and enhancing productivity.
- Data Processing and Retrieval:
  - Mistral Large 2 can interact with multiple APIs to fetch, analyze, and present data in complex data environments, such as financial markets or scientific research. For example, in financial systems, the model could pull real-time stock data, run risk assessments, and provide investment recommendations based on a series of function calls to relevant APIs and tools.
- Dynamic Report Generation:
  - Mistral Large 2 can function as a report generator, pulling data from various sources, applying business logic, and producing customized reports. This is especially useful in industries like logistics, where real-time data processing is crucial. By sequentially calling functions that gather data on shipping statuses, calculate metrics, and forecast trends, the model enables seamless reporting with minimal human input.
- Scientific Computations and Simulations:
  - Its enhanced mathematical capabilities combined with function calling make Mistral Large 2 suitable for complex scientific simulations. For instance, in climate modeling, the model can call external data sources to gather real-time atmospheric data, perform parallel calculations across different environmental variables, and then generate predictive models.
6. Meta LLaMA 3.2
LLaMA 3.2, developed by Meta, stands out for its open-source accessibility and introduction of function calling, making it a powerful tool for developers who require flexibility and customization. This version hasn’t seen as widespread commercialization as other AI models, but its emphasis on adaptability is ideal for teams with strong development resources, especially in research and AI experimentation contexts.
Key Features
- Open-Source Function Calling: One of the unique selling points of LLaMA 3.2 is its open-source nature. This allows developers to customize and tailor function calling for their specific projects, making it particularly useful for internal enterprise applications.
- Adaptability: Thanks to its open-source foundation, LLaMA 3.2 can be adapted to various use cases. This makes it attractive for researchers, academic institutions, or startups looking for more control over their AI tools without heavy commercial overhead.
- Large-Scale Applications: LLaMA 3.2’s function calling capabilities are designed to interact with real-time data and handle large-scale AI system requirements. This feature will benefit enterprises working on proprietary solutions or custom-built AI systems.
As of now, LLaMA 3.2 benchmarks are still in development and haven’t been fully tested, so we’re awaiting comprehensive comparisons to models like GPT-4o. However, its introduction is an exciting leap in function-based AI interaction and flexibility, bringing new opportunities for experimentation and custom solutions.
Steps for Implementing Function Calling in Applications
To integrate function calling into your application, follow these steps:
- Select the Function: Identify the specific function within your codebase that the model should have access to. This function might interact with external systems, update databases, or modify user interfaces.
- Describe the Function to the Model: Provide a clear description of the function, including its purpose and the expected input/output, so the model understands how to interact with it.
- Pass Function Definitions to the Model: When passing messages to the model, include these function definitions, making them available as “tools” that the model can choose to use when responding to prompts.
- Handle the Model’s Response: Once the model has invoked the function, process the response as appropriate within your application.
- Provide the Result Back to the Model: After the function is executed, pass the result back to the model so it can incorporate this information into its final response to the user.
Implementing Function Calling Using GPT-4o
The following example manages a conversation with GPT-4o, using function calling to obtain weather data when needed.
1. Imports and Setup
import json
import os
import requests
from openai import OpenAI
client = OpenAI()
- Imports:
  - json: For handling JSON data.
  - os: For interacting with the operating system, for example reading environment variables.
  - requests: For making HTTP requests to external APIs.
  - OpenAI: From the openai package to interact with OpenAI’s API.
- Client Initialization:
  - client = OpenAI(): Creates an instance of the OpenAI client to interact with the API.
2. Defining the get_current_weather Function
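def get_current_weather(latitude, longitude):
    """Get the current weather in a given latitude and longitude"""
    base = "https://api.openweathermap.org/data/2.5/weather"
    # Read the API key from the environment instead of hard-coding it.
    # OPENWEATHER_API_KEY is an assumed variable name; set it to your own key.
    key = os.environ["OPENWEATHER_API_KEY"]
    request_url = f"{base}?lat={latitude}&lon={longitude}&appid={key}&units=metric"
    response = requests.get(request_url, timeout=10)
    response.raise_for_status()  # fail early on HTTP errors such as an invalid key
    result = {
        "latitude": latitude,
        "longitude": longitude,
        **response.json()["main"],  # temperature, pressure, humidity, etc.
    }
    return json.dumps(result)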
- Purpose: Fetches current weather data for specified geographic coordinates using the OpenWeatherMap API.
- Parameters:
  - latitude: The latitude of the location.
  - longitude: The longitude of the location.
- Process:
  - Constructs the API request URL with the provided latitude and longitude.
  - Sends a GET request to the OpenWeatherMap API.
  - Parses the JSON response, extracting relevant weather information.
  - Returns the weather data as a JSON-formatted string.
3. Defining the run_conversation Function
def run_conversation(content):
    messages = [{"role": "user", "content": content}]
    tools = [
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather in a given latitude and longitude",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "latitude": {
                            "type": "string",
                            "description": "The latitude of a place",
                        },
                        "longitude": {
                            "type": "string",
                            "description": "The longitude of a place",
                        },
                    },
                    "required": ["latitude", "longitude"],
                },
            },
        }
    ]
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        tools=tools,
        tool_choice="auto",
    )
    response_message = response.choices[0].message
    tool_calls = response_message.tool_calls
    if tool_calls:
        messages.append(response_message)
        available_functions = {
            "get_current_weather": get_current_weather,
        }
        for tool_call in tool_calls:
            print(f"Function: {tool_call.function.name}")
            print(f"Params:{tool_call.function.arguments}")
            function_name = tool_call.function.name
            function_to_call = available_functions[function_name]
            function_args = json.loads(tool_call.function.arguments)
            function_response = function_to_call(
                latitude=function_args.get("latitude"),
                longitude=function_args.get("longitude"),
            )
            print(f"API: {function_response}")
            messages.append(
                {
                    "tool_call_id": tool_call.id,
                    "role": "tool",
                    "name": function_name,
                    "content": function_response,
                }
            )
        second_response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            stream=True
        )
        return second_response
4. Executing the Conversation
if __name__ == "__main__":
    question = "What's the weather like in Paris and San Francisco?"
    response = run_conversation(question)
    for chunk in response:
        print(chunk.choices[0].delta.content or "", end='', flush=True)
Let’s Understand the Code
Function Definition and Input
The run_conversation function takes a user’s input as its argument and starts a conversation by creating a message representing the user’s role and content. This initiates the chat flow where the user’s message is the first interaction.
Tools Setup
A list of tools is defined, and one such tool is a function called get_current_weather. This function is described as retrieving the current weather based on the provided latitude and longitude coordinates. The parameters for this function are clearly specified, including that both latitude and longitude are required inputs.
Generating the First Chat Response
The function then calls the GPT-4o model to generate a response based on the user’s message. The model has access to the tools (such as get_current_weather), and it automatically decides whether to use any of these tools. The response from the model may include tool calls, which are captured for further processing.
Handling Tool Calls
If the model decides to invoke a tool, the tool calls are processed. The function retrieves the appropriate tool (in this case, the get_current_weather function), extracts the parameters (latitude and longitude), and calls the function to get the weather information. The result from this function is then printed and appended to the conversation as a response from the tool.
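Model-generated arguments arrive as a JSON string and are not guaranteed to be valid or complete, so it can help to check them before dispatching. The helper below is a defensive sketch and not part of the code above: `safe_dispatch`, `fake_weather`, and the error message format are assumptions introduced for this example.

```python
import json

def safe_dispatch(name, raw_arguments, available_functions, required):
    """Return the tool output, or a JSON error string the model can read."""
    func = available_functions.get(name)
    if func is None:
        return json.dumps({"error": f"unknown function: {name}"})
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError:
        return json.dumps({"error": "arguments were not valid JSON"})
    if not isinstance(args, dict):
        return json.dumps({"error": "arguments must be a JSON object"})
    missing = [key for key in required if key not in args]
    if missing:
        return json.dumps({"error": f"missing arguments: {missing}"})
    return func(**args)

def fake_weather(latitude, longitude):
    # Stand-in for get_current_weather so the example runs offline; invented reading.
    return json.dumps({"latitude": latitude, "longitude": longitude, "temp": 21.0})

functions = {"get_current_weather": fake_weather}
required_args = ["latitude", "longitude"]

ok = safe_dispatch(
    "get_current_weather", '{"latitude": "48.85", "longitude": "2.35"}',
    functions, required_args,
)
bad = safe_dispatch(
    "get_current_weather", '{"latitude": "48.85"}', functions, required_args
)
print(ok)
print(bad)
```

Returning the error as tool content, rather than raising, gives the model a chance to correct its arguments in the next turn.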
Generating the Second Chat Response
After the tool’s output is integrated into the conversation, a second request is sent to the GPT-4o model to generate a new response enriched with the tool’s output. This second response is streamed and returned as the function’s final output.
Output
if __name__ == "__main__":
    question = "What's the weather like in Delhi?"
    response = run_conversation(question)
    for chunk in response:
        print(chunk.choices[0].delta.content or "", end='', flush=True)
Comparing the Top 6 LLMs on Function Calling Benchmarks
This radar chart visualizes the performance of several AI language models based on different functional metrics. The models are:
- GPT-4o (2024-08-06) – in pink
- Gemini 1.5 Flash Preview (0514) – in light blue
- Claude 3.5 (Sonnet-20240620) – in yellow
- Mistral Large 2407 – in purple
- Command-R Plus (Prompt Original) – in green
- Meta-LLaMA-3 70B Instruct – in dark blue
How Do They Perform?
This radar chart compares the performance of different models on function calling (FC) across several tasks. Here’s a brief breakdown of how they perform:
- Overall Accuracy: GPT-4o-2024-08-06 (FC) shows the highest accuracy, with Gemini-1.5-Flash-Preview-0514 (FC) also performing well.
- Non-live AST Summary: All models perform similarly, but GPT-4o and Gemini-1.5 have a slight edge.
- Non-live Exec Summary: The performance is quite even across all models.
- Live Summary: There’s a bit more variation, with no one model dominating, though GPT-4o and Gemini still perform solidly.
- Multi-Turn Summary: GPT-4o-2024-08-06 (FC) leads slightly, followed by Gemini-1.5.
- Hallucination Measurement: GPT-4o performs best in minimizing hallucinations, with other models, such as Claude-3.5-Sonnet-20240620 (FC), performing moderately well.
The function-calling (FC) aspect refers to how well these models can handle structured tasks, execute commands, or interact functionally. GPT-4o, Gemini 1.5, and Claude 3.5 generally lead across most metrics, with GPT-4o often taking the top spot. These models excel in accuracy and structured summaries (both live and non-live). Command-R Plus performs decently, particularly in summary tasks, but isn’t as dominant in overall accuracy.
Meta-LLaMA and Mistral Large are competent but fall behind in critical areas like hallucinations and multi-turn summaries, making them less reliable for function-calling tasks compared to GPT-4o and Claude.
In terms of overall function-calling performance, GPT-4o is clearly in the lead, as it is well balanced across all metrics, making it a strong choice for tasks requiring accuracy and minimal hallucination. However, Claude 3.5 and Meta-LLaMA may have a slight advantage for specific tasks like Live Summaries.
How does Function Calling Relate to AI Agents?
Function calling enhances the capabilities of AI agents by allowing them to integrate specific, real-world functionality that they may not inherently possess. Here’s how the two are linked:
- Decision-Making and Task Execution: AI agents can use function calling to execute specific tasks based on their decisions. For example, a virtual assistant AI agent might use function calling to book flights by interacting with external APIs, making the agent more dynamic and effective.
- Modularity: Function calling allows for a modular approach where the agent can focus on decision-making while external functions handle specialized tasks (e.g., retrieving live data, performing analytics). This makes the agent more flexible and capable of performing a wide range of tasks without needing to have every capability built into its core logic.
- Autonomy: Function calling allows AI agents to fetch data autonomously or execute tasks in real-time, which can be crucial for applications in fields like finance, logistics, or automated customer support. It enables agents to interact with external systems dynamically without constant human input.
- Expanded Capabilities: AI agents rely on function calling to bridge the gap between general AI (e.g., language understanding) and domain-specific tasks (e.g., fetching medical data or scheduling meetings). Through function calling, the agent expands its knowledge and operational range by interfacing with the right tools or APIs.
Example of Integration
Imagine a customer support AI agent for an e-commerce platform. When a customer asks about their order status, the AI agent could:
- Understand the query via natural language processing.
- Call a specific function to access the company’s database through an API to retrieve the order details.
- Respond with the results, like the order’s current location and expected delivery date.
In this scenario, the AI agent uses function calling to access external systems to provide a meaningful, goal-driven interaction, which it couldn’t achieve with just basic language processing.
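The order-status scenario can be sketched as a small agent loop. This is an illustrative toy rather than a production agent: `scripted_model` is a placeholder for an LLM, and `get_order_status`, its invented data, and the message shapes are assumptions.

```python
def get_order_status(order_id: str) -> dict:
    # Stub with invented data standing in for the company's order API.
    return {"order_id": order_id, "location": "Regional hub", "eta": "2 days"}

TOOLS = {"get_order_status": get_order_status}

def scripted_model(messages: list) -> dict:
    """Placeholder for an LLM: requests a tool first, then answers from its result."""
    last = messages[-1]
    if last["role"] == "tool":
        data = last["content"]
        return {
            "type": "answer",
            "text": f"Order {data['order_id']} is at: {data['location']}; "
                    f"expected in {data['eta']}.",
        }
    return {
        "type": "tool_call",
        "name": "get_order_status",
        "arguments": {"order_id": "12345"},
    }

def run_agent(user_message: str, model=scripted_model, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_steps):  # cap the loop so a confused model cannot spin forever
        decision = model(messages)
        if decision["type"] == "answer":
            return decision["text"]
        result = TOOLS[decision["name"]](**decision["arguments"])
        messages.append({"role": "tool", "name": decision["name"], "content": result})
    raise RuntimeError("Agent did not finish within max_steps")

reply = run_agent("Where is my order 12345?")
print(reply)
```

The loop shows the division of labour described above: the model decides what to do next, and the application executes functions and feeds results back.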
In summary, function calling serves as a powerful tool that extends the abilities of AI agents. While the agent provides decision-making and goal-oriented actions, function calling enables the agent to interface with external functions or systems, adding real-world interactivity and specialized task execution. This synergy between AI agents and function calling leads to more robust and capable AI-driven systems.
Conclusion
Function calling in LLMs is essential for applications requiring real-time data access and dynamic interaction with external systems. The six LLMs covered here (OpenAI GPT-4o, Gemini 1.5 Flash, Anthropic Claude Sonnet 3.5, Cohere Command R+, Mistral Large 2, and Meta LLaMA 3.2) each offer distinct advantages depending on the use case. Whether the focus is on enterprise workflows, lightweight mobile applications, or AI safety, these models are paving the way for more accurate, reliable, and interactive AI agents that can automate tasks, reduce hallucinations, and provide meaningful real-time insights.
Frequently Asked Questions
Q1. What is function calling in LLMs?
Ans. Function calling allows large language models (LLMs) to interact with external systems, APIs, or tools to perform real-world tasks beyond text generation.
Q2. How does function calling improve the accuracy of LLMs?
Ans. Function calling enhances accuracy by enabling LLMs to retrieve real-time data, execute tasks, and make informed decisions through external tools.
Q3. Which LLMs support function calling?
Ans. Top LLMs with function calling include OpenAI’s GPT-4o, Gemini 1.5 Flash, Anthropic Claude Sonnet 3.5, Cohere Command R+, Mistral Large 2, and Meta LLaMA 3.2.
Q4. What are common use cases for function calling?
Ans. Use cases include real-time data retrieval, automated workflows, scheduling, weather forecasting, and API-based tasks like stock or product updates.
Q5. How does function calling relate to AI agents?
Ans. It allows AI agents to perform tasks that require external data or actions autonomously, enhancing their efficiency and decision-making in dynamic environments.