AI Agents  

Implementation of MCP Using FastMCP and Ollama LLM

Introduction

Large Language Models (LLMs) are powerful at reasoning, but they cannot directly access calculators, databases, or APIs. To make LLMs act in the real world, we use tools.

This article shows how to build a full tool-enabled AI system using:

  • Ollama: Runs the LLM locally

  • FastMCP: Exposes tools via Model Context Protocol

  • MCP Client: Connects LLM decisions to real tool execution

We’ll cover two architectures:

  1. Single-turn tool calling (controlled execution)

  2. Agent loop (multi-step reasoning)

What is MCP?

Model Context Protocol (MCP) is a standard that allows LLMs to:

  • Discover available tools

  • Understand tool input/output schema

  • Request tool execution

  • Receive structured results

It separates responsibilities cleanly:

  • LLM (Ollama): decides which tool to use

  • MCP Client: bridges the LLM and the tools

  • FastMCP Server: hosts and executes tools
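For illustration, this is roughly what a discovered tool looks like once converted into the OpenAI-style "function" format that Ollama accepts. The field names mirror the conversion done in the client code later in this article; the `parameters` value is standard JSON Schema (the exact schema the server emits may differ slightly):

```python
# Sketch of one discovered MCP tool in the format Ollama expects.
# "parameters" holds the tool's inputSchema (JSON Schema).
tool_schema = {
    "type": "function",
    "function": {
        "name": "add_numbers",
        "description": "Add two numbers and return the result.",
        "parameters": {
            "type": "object",
            "properties": {
                "a": {"type": "number"},
                "b": {"type": "number"},
            },
            "required": ["a", "b"],
        },
    },
}
```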

System Architecture

Flow Overview

  1. Client fetches tool schemas from MCP server

  2. Tool schemas are sent to the LLM

  3. LLM decides whether to call a tool

  4. Client executes the tool via MCP

  5. Tool result is returned to LLM

  6. LLM generates the final natural language answer
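The six steps above can be sketched as a toy round trip. The stub functions below stand in for Ollama and the MCP server; they are illustrative placeholders, not real library calls:

```python
# Toy simulation of the tool-calling flow; no real LLM or MCP server involved.
TOOLS = {"multiply_numbers": lambda a, b: a * b}  # step 1: "discovered" tools

def fake_llm(question, tools):
    # Steps 2-3: a real LLM would read the tool schemas and decide;
    # here the decision is hard-coded for illustration.
    return {"tool": "multiply_numbers", "args": {"a": 7, "b": 6}}

def answer(question):
    decision = fake_llm(question, TOOLS)                  # steps 2-3
    result = TOOLS[decision["tool"]](**decision["args"])  # step 4: execute tool
    # Steps 5-6: the result goes back to the LLM, which phrases the answer.
    return f"The result is {result}."

print(answer("What is 7 multiplied by 6?"))  # The result is 42.
```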

Step 1: Install Requirements

pip install mcp ollama

Install a tool-capable model (not all LLMs support tool calling, so check before use):

ollama pull llama3.2

Start Ollama in the background:

ollama serve

Step 2: Create MCP Tool Server (FastMCP)

This server exposes Python functions as tools.

# server.py
from mcp.server.fastmcp import FastMCP
from datetime import datetime

# Create MCP server
mcp = FastMCP("Sample MCP Server Tools")


# ------------------ TOOL 1: ADDITION ------------------
@mcp.tool()
def add_numbers(a: float, b: float) -> float:
    """Add two numbers and return the result."""
    return a + b

# ------------------ TOOL 2: MULTIPLICATION ------------------
@mcp.tool()
def multiply_numbers(a: float, b: float) -> float:
    """Multiply two numbers and return the result."""
    return a * b

# ------------------ TOOL 3: TEXT SUMMARY ------------------
@mcp.tool()
def summarize_text(text: str) -> str:
    """Return a short summary of the given text."""
    if len(text) < 50:
        return text
    return text[:50] + "..."

# ------------------ TOOL 4: WORD COUNT ------------------
@mcp.tool()
def count_words(text: str) -> int:
    """Count the number of words in the given text."""
    return len(text.split())

# ------------------ TOOL 5: CURRENT TIME ------------------
@mcp.tool()
def get_current_time() -> str:
    """Get the current system time."""
    return datetime.now().strftime("%Y-%m-%d %H:%M:%S")

# ------------------ TOOL 6: REVERSE STRING ------------------
@mcp.tool()
def reverse_text(text: str) -> str:
    """Reverse the given text."""
    return text[::-1]

# Run the MCP server
if __name__ == "__main__":
    mcp.run()

Run alone to test:

python server.py

It should stay running.

(venv) PS C:\Ollama-AI\MCP> python server.py
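Since the tools are plain Python functions, their logic can also be sanity-checked on its own before involving MCP or the LLM. The snippet below repeats two of the server's function bodies standalone:

```python
# Same logic as the server's summarize_text and count_words tools,
# checked directly without MCP in the loop.
def summarize_text(text: str) -> str:
    if len(text) < 50:
        return text
    return text[:50] + "..."

def count_words(text: str) -> int:
    return len(text.split())

assert summarize_text("short text") == "short text"  # under 50 chars: unchanged
assert summarize_text("x" * 80) == "x" * 50 + "..."  # long text: truncated
assert count_words("one two three") == 3
```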

Architecture 1: Single-Turn Tool Calling

This approach allows one tool call, then produces the final answer.

Best for:

  • Simple queries

  • Predictable workflows

  • Strict execution control

# client.py
import asyncio
import sys
import ollama
from mcp import ClientSession
from mcp.client.stdio import stdio_client, StdioServerParameters

async def main():
    # Start MCP server process
    server_params = StdioServerParameters(
        command=sys.executable,
        args=["server.py"],
    )

    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Get tools from MCP
            result = await session.list_tools()

            # Convert MCP tools → Ollama format
            ollama_tools = []
            for tool in result.tools:
                ollama_tools.append({
                    "type": "function",
                    "function": {
                        "name": tool.name,
                        "description": tool.description,
                        "parameters": tool.inputSchema,
                    }
                })

            # Send tools to LLM
            user_question = "What is 7 multiplied by 6?"

            response = ollama.chat(
                model="llama3.2",
                messages=[{"role": "user", "content": user_question}],
                tools=ollama_tools,
            )

            message = response["message"]

            # Check if LLM wants to call a tool
            if "tool_calls" in message:
                tool_call = message["tool_calls"][0]
                tool_name = tool_call["function"]["name"]
                arguments = tool_call["function"]["arguments"]

                print(f"\nLLM decided to call tool: {tool_name}")
                print(f"Arguments: {arguments}")

                # Execute tool via MCP
                tool_result = await session.call_tool(tool_name, arguments)
                print(f"Tool result: {tool_result.content}")

                # Send tool result back to LLM for final answer
                final_response = ollama.chat(
                    model="llama3.2",
                    messages=[
                        {"role": "user", "content": user_question},
                        message,
                        {
                            "role": "tool",
                            "content": str(tool_result.content),
                            "name": tool_name,
                        },
                    ],
                )

                print("\nFinal Answer from LLM:")
                print(final_response["message"]["content"])

            else:
                print("\nLLM answered directly:")
                print(message["content"])

asyncio.run(main())

Output:

(venv) PS C:\Ollama-AI\mcp> python client.py
LLM decided to call tool: multiply_numbers
Arguments: {'a': 7, 'b': 6}
Tool result: [TextContent(type='text', text='42.0', annotations=None, meta=None)]
Final Answer from LLM:
The result of 7 multiplied by 6 is 42.

Architecture 2: Agent Loop (Multi-Step Reasoning)

This allows the LLM to chain multiple tool calls.

Best for:

  • Multi-step problems

  • Planning and reasoning tasks

  • Dynamic tool usage

# client_agent_loop.py
import asyncio
import sys
import ollama
from mcp import ClientSession
from mcp.client.stdio import stdio_client, StdioServerParameters

async def main():
    server_params = StdioServerParameters(
        command=sys.executable,
        args=["server.py"],
    )

    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            result = await session.list_tools()

            ollama_tools = [
                {
                    "type": "function",
                    "function": {
                        "name": tool.name,
                        "description": tool.description,
                        "parameters": tool.inputSchema,
                    },
                }
                for tool in result.tools
            ]

            # User question for the agent
            question = "If I multiply 6 by 5 and then add 10, what do I get?"
            messages = [{"role": "user", "content": question}]

            # Agent loop: let the LLM keep calling tools across iterations,
            # breaking out once it replies without a tool call or the
            # safety limit is reached.
            for _ in range(5):  # safety limit
                response = ollama.chat(
                    model="llama3.2",
                    messages=messages,
                    tools=ollama_tools,
                )

                msg = response["message"]
                messages.append(msg)

                if "tool_calls" in msg:
                    call = msg["tool_calls"][0]
                    tool_name = call["function"]["name"]
                    args = call["function"]["arguments"]

                    tool_result = await session.call_tool(tool_name, args)
                    # Append tool result to messages for next LLM response
                    messages.append({
                        "role": "tool",
                        "name": tool_name,
                        "content": str(tool_result.content),
                    })
                else:
                    print(msg["content"])
                    break

asyncio.run(main())

Output:

(venv) PS C:\Ollama-AI\mcp> python client_agent.py
The result of multiplying 6 by 5 and then adding 10 is 40.

Single Turn vs Agent Loop

Feature           Single Turn       Agent Loop
Tool calls        One               Many
Reasoning depth   Limited           Advanced
Control           High              Flexible
Best for          Simple tools      Complex workflows

Why MCP Architecture is Powerful

Separation of Concerns

  • LLM reasons

  • MCP tools execute

Model Flexibility

Switch Ollama models without changing tool code.

Transport Flexibility

Same tools work over:

  • STDIO (local)

  • WebSocket (remote)

  • HTTP (cloud)

Security

LLM cannot run arbitrary code — only predefined tools.
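Because every tool is a function you wrote, you can also validate arguments inside the tool before doing anything. The example below is a hypothetical tool, not part of the server above, and omits the `@mcp.tool()` decorator so it stays self-contained:

```python
# Hypothetical tool showing argument validation inside the function body.
def safe_divide(a: float, b: float) -> float:
    """Divide a by b, rejecting division by zero instead of crashing."""
    if b == 0:
        raise ValueError("b must be non-zero")
    return a / b

print(safe_divide(10, 4))  # 2.5
```

On a FastMCP server, a raised exception is how a tool signals a bad request back to the client instead of executing with invalid input.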

Sample code can be downloaded from my GitHub repository: Jayant0516 (Jayant Kumar)