Introduction
Large Language Models (LLMs) are strong reasoners, but on their own they cannot access calculators, databases, or APIs. To let an LLM act in the real world, we give it tools.
This article shows how to build a full tool-enabled AI system using:
Ollama: Runs the LLM locally
FastMCP: Exposes tools via Model Context Protocol
MCP Client: Connects LLM decisions to real tool execution
We’ll cover two architectures:
Single-turn tool calling (controlled execution)
Agent loop (multi-step reasoning)
What is MCP?
Model Context Protocol (MCP) is an open standard that lets LLMs discover and call external tools through a uniform interface.
It separates responsibilities cleanly:
| Component | Responsibility |
|---|---|
| LLM (Ollama) | Decides which tool to use |
| MCP Client | Bridge between LLM and tools |
| FastMCP Server | Hosts and executes tools |
System Architecture
Flow Overview
1. Client fetches tool schemas from the MCP server
2. Tool schemas are sent to the LLM
3. LLM decides whether to call a tool
4. Client executes the tool via MCP
5. Tool result is returned to the LLM
6. LLM generates the final natural-language answer
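Before diving into the full clients, it helps to isolate the one transformation they both perform: wrapping each MCP tool schema in the structure Ollama's `tools=` parameter expects. A minimal sketch with plain dicts (`make_ollama_tool` and the example schema are illustrative helpers, not part of the `mcp` API):

```python
def make_ollama_tool(name: str, description: str, input_schema: dict) -> dict:
    """Wrap an MCP tool's metadata in the structure Ollama's tools= parameter expects."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": input_schema,
        },
    }

# Example: a JSON schema like the one FastMCP generates for add_numbers below
schema = {
    "type": "object",
    "properties": {"a": {"type": "number"}, "b": {"type": "number"}},
    "required": ["a", "b"],
}
tool = make_ollama_tool("add_numbers", "Add two numbers.", schema)
print(tool["function"]["name"])  # → add_numbers
```

The same wrapping is done inline in both clients below; pulling it into a helper just makes the shape explicit.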
Step 1: Install Requirements
pip install mcp ollama
Install a tool-capable model (not all models support tool calling, so check before use). The clients below use llama3.2:
ollama pull llama3.2
Start Ollama in the background:
ollama serve
Step 2: Create MCP Tool Server (FastMCP)
This server exposes Python functions as tools.
```python
# server.py
from datetime import datetime

from mcp.server.fastmcp import FastMCP

# Create MCP server
mcp = FastMCP("Sample MCP Server Tools")

# ------------------ TOOL 1: ADDITION ------------------
@mcp.tool()
def add_numbers(a: float, b: float) -> float:
    """Add two numbers and return the result."""
    return a + b

# ------------------ TOOL 2: MULTIPLICATION ------------------
@mcp.tool()
def multiply_numbers(a: float, b: float) -> float:
    """Multiply two numbers and return the result."""
    return a * b

# ------------------ TOOL 3: TEXT SUMMARY ------------------
@mcp.tool()
def summarize_text(text: str) -> str:
    """Return a short summary of the given text."""
    if len(text) < 50:
        return text
    return text[:50] + "..."

# ------------------ TOOL 4: WORD COUNT ------------------
@mcp.tool()
def count_words(text: str) -> int:
    """Count the number of words in the given text."""
    return len(text.split())

# ------------------ TOOL 5: CURRENT TIME ------------------
@mcp.tool()
def get_current_time() -> str:
    """Get the current system time."""
    return datetime.now().strftime("%Y-%m-%d %H:%M:%S")

# ------------------ TOOL 6: REVERSE STRING ------------------
@mcp.tool()
def reverse_text(text: str) -> str:
    """Reverse the given text."""
    return text[::-1]

# Run the MCP server
if __name__ == "__main__":
    mcp.run()
```
Run the server on its own to verify it starts:
python server.py
It should stay running.
(venv) PS C:\Ollama-AI\MCP> python server.py
Architecture 1: Single-Turn Tool Calling
In this approach, the LLM makes at most one tool call and then produces the final answer.
Best for:
Simple queries
Predictable workflows
Strict execution control
```python
# client.py
import asyncio
import sys

import ollama
from mcp import ClientSession
from mcp.client.stdio import stdio_client, StdioServerParameters


async def main():
    # Start MCP server process
    server_params = StdioServerParameters(
        command=sys.executable,
        args=["server.py"],
    )

    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Get tools from MCP
            result = await session.list_tools()

            # Convert MCP tools → Ollama format
            ollama_tools = []
            for tool in result.tools:
                ollama_tools.append({
                    "type": "function",
                    "function": {
                        "name": tool.name,
                        "description": tool.description,
                        "parameters": tool.inputSchema,
                    },
                })

            # Send tools to LLM
            user_question = "What is 7 multiplied by 6?"
            response = ollama.chat(
                model="llama3.2",
                messages=[{"role": "user", "content": user_question}],
                tools=ollama_tools,
            )
            message = response["message"]

            # Check if the LLM wants to call a tool
            if "tool_calls" in message:
                tool_call = message["tool_calls"][0]
                tool_name = tool_call["function"]["name"]
                arguments = tool_call["function"]["arguments"]

                print(f"\nLLM decided to call tool: {tool_name}")
                print(f"Arguments: {arguments}")

                # Execute the tool via MCP
                tool_result = await session.call_tool(tool_name, arguments)
                print(f"Tool result: {tool_result.content}")

                # Send the tool result back to the LLM for the final answer
                final_response = ollama.chat(
                    model="llama3.2",
                    messages=[
                        {"role": "user", "content": user_question},
                        message,
                        {
                            "role": "tool",
                            "content": str(tool_result.content),
                            "name": tool_name,
                        },
                    ],
                )
                print("\nFinal Answer from LLM:")
                print(final_response["message"]["content"])
            else:
                print("\nLLM answered directly:")
                print(message["content"])


asyncio.run(main())
```
Output:
(venv) PS C:\Ollama-AI\mcp> python client.py
LLM decided to call tool: multiply_numbers
Arguments: {'a': 7, 'b': 6}
Tool result: [TextContent(type='text', text='42.0', annotations=None, meta=None)]
Final Answer from LLM:
The result of 7 multiplied by 6 is 42.
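Note that `call_tool` returns a list of content blocks rather than a bare value, which is why the transcript shows `[TextContent(...)]` instead of `42.0`. A small helper can pull the text out; the `TextContent` class here is a self-contained stand-in for the real one in the `mcp` package, just to keep the sketch runnable:

```python
from dataclasses import dataclass


@dataclass
class TextContent:  # stand-in for the TextContent type returned by the mcp package
    type: str
    text: str


def extract_text(content_blocks) -> str:
    """Join the text of all text-type blocks in a tool result."""
    return "".join(block.text for block in content_blocks if block.type == "text")


# A tool result shaped like the one in the transcript above
result_content = [TextContent(type="text", text="42.0")]
print(extract_text(result_content))  # → 42.0
```

Passing the extracted text (rather than the `str()` of the whole list) back to the model gives it a cleaner value to reason over.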
Architecture 2: Agent Loop (Multi-Step Reasoning)
This architecture lets the LLM chain multiple tool calls, feeding each result back into the conversation until it can answer.
Best for:
Multi-step questions
Tasks that depend on intermediate tool results
Complex workflows
```python
# client_agent.py
import asyncio
import sys

import ollama
from mcp import ClientSession
from mcp.client.stdio import stdio_client, StdioServerParameters


async def main():
    server_params = StdioServerParameters(
        command=sys.executable,
        args=["server.py"],
    )

    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            result = await session.list_tools()
            ollama_tools = [
                {
                    "type": "function",
                    "function": {
                        "name": tool.name,
                        "description": tool.description,
                        "parameters": tool.inputSchema,
                    },
                }
                for tool in result.tools
            ]

            # User question for the agent
            question = "If I multiply 6 by 5 and then add 10, what do I get?"
            messages = [{"role": "user", "content": question}]

            # Agent loop: keep calling the LLM, executing any requested tool
            # and feeding the result back, until the LLM answers directly or
            # the safety limit is reached.
            for _ in range(5):  # safety limit
                response = ollama.chat(
                    model="llama3.2",
                    messages=messages,
                    tools=ollama_tools,
                )
                msg = response["message"]
                messages.append(msg)

                if "tool_calls" in msg:
                    call = msg["tool_calls"][0]
                    tool_name = call["function"]["name"]
                    args = call["function"]["arguments"]

                    tool_result = await session.call_tool(tool_name, args)

                    # Append the tool result so the next LLM turn can use it
                    messages.append({
                        "role": "tool",
                        "name": tool_name,
                        "content": str(tool_result.content),
                    })
                else:
                    print(msg["content"])
                    break


asyncio.run(main())
```
Output:
(venv) PS C:\Ollama-AI\mcp> python client_agent.py
The result of multiplying 6 by 5 and then adding 10 is 40.
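The loop's control flow — call the model, execute any requested tool, feed the result back, stop when no tool is requested — can be exercised without Ollama at all by scripting the model's replies. A sketch under that assumption (the `fake_chat` stand-in and both lambda tools are illustrative, not part of any library):

```python
# Plain-function stand-ins for the MCP tools
TOOLS = {
    "multiply_numbers": lambda a, b: a * b,
    "add_numbers": lambda a, b: a + b,
}

# Scripted "model" replies: two tool calls, then a final answer
script = iter([
    {"tool_calls": [{"function": {"name": "multiply_numbers", "arguments": {"a": 6, "b": 5}}}]},
    {"tool_calls": [{"function": {"name": "add_numbers", "arguments": {"a": 30, "b": 10}}}]},
    {"content": "The result is 40."},
])


def fake_chat(messages):
    """Stand-in for ollama.chat that replays the scripted replies."""
    return next(script)


messages = [{"role": "user", "content": "Multiply 6 by 5, then add 10."}]
final = None
for _ in range(5):  # safety limit, as in the real loop
    msg = fake_chat(messages)
    messages.append(msg)
    if "tool_calls" in msg:
        call = msg["tool_calls"][0]["function"]
        result = TOOLS[call["name"]](**call["arguments"])
        # Feed the tool result back for the next "model" turn
        messages.append({"role": "tool", "name": call["name"], "content": str(result)})
    else:
        final = msg["content"]
        break

print(final)  # → The result is 40.
```

This kind of scripted harness is also a convenient way to unit-test the loop before wiring in a real model.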
Single Turn vs Agent Loop
| Feature | Single Turn | Agent Loop |
|---|---|---|
| Tool calls | One | Many |
| Reasoning depth | Limited | Advanced |
| Control | High | Flexible |
| Best for | Simple tools | Complex workflows |
Why MCP Architecture is Powerful
Separation of Concerns
LLM reasons
MCP tools execute
Model Flexibility
Switch Ollama models without changing tool code.
Transport Flexibility
Same tools work over:
STDIO (local)
WebSocket (remote)
HTTP (cloud)
Security
The LLM cannot run arbitrary code; it can only invoke the predefined tools exposed by the server.
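That guarantee is worth enforcing explicitly on the client side as well: before executing a call, check the requested name against the tools actually advertised by the MCP server. A minimal sketch (`validate_tool_call` is a hypothetical helper, not part of the `mcp` package):

```python
def validate_tool_call(tool_name: str, known_tools: set) -> str:
    """Reject any tool name the MCP server never advertised."""
    if tool_name not in known_tools:
        raise ValueError(f"LLM requested unknown tool: {tool_name!r}")
    return tool_name


# In the real clients, this set would come from session.list_tools()
known = {"add_numbers", "multiply_numbers", "summarize_text",
         "count_words", "get_current_time", "reverse_text"}

print(validate_tool_call("add_numbers", known))  # → add_numbers
```

In the clients above, the check would sit between reading `tool_calls` and calling `session.call_tool`, turning a hallucinated tool name into a clean error instead of an opaque server failure.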
The sample code can be downloaded from my GitHub repository: Jayant0516 (Jayant Kumar)