
Ollama API: A Complete Guide to Local AI with Generate, Embeddings & Model Management

AI development is no longer tied to the cloud. With Ollama, you can run powerful large language models (LLMs) directly on your local machine and interact with them through a simple API.

The Ollama API makes it easy to:

  • Generate text with any open-source model

  • Create embeddings for search and retrieval

  • Manage models (list, pull, create, delete) with simple HTTP calls

In this guide, we'll explore each major API endpoint of Ollama.

What is Ollama?

Ollama is a lightweight platform to run, manage, and interact with open-source LLMs locally on macOS, Linux, and Windows (via WSL). It supports models like LLaMA, Mistral, Gemma, Phi, and more.

Unlike cloud APIs, Ollama runs models locally, giving you:

  • Privacy – your data stays on your machine

  • Zero API costs – no per-token billing

  • Flexibility – swap models easily

Overview of the Ollama API

The Ollama API is a REST interface exposed at http://localhost:11434.

Endpoint                 Purpose
POST /api/generate       Generate text
POST /api/embeddings     Generate embeddings (vector representations)
GET /api/tags            List installed models
POST /api/pull           Download (pull) a new model
POST /api/create         Create a custom model
DELETE /api/delete       Remove a model
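
Before calling any of these endpoints, it is worth confirming that the local server is actually reachable. A minimal sanity check, assuming the default port 11434, simply requests the base URL and prints whatever status the server returns:

import requests

# Quick sanity check: a running Ollama server answers on its base URL
# (default port 11434) with a short plain-text status message.
response = requests.get("http://localhost:11434")
print(response.status_code, response.text)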

1. Generating Text (/api/generate)

import requests

url = "http://localhost:11434/api/generate"
payload = {
    "model": "gemma:2b",
    "prompt": "Explain quantum computing in simple terms.",
    "stream": False  # return a single JSON object instead of a token stream
}

response = requests.post(url, json=payload)  # requests serializes the payload to JSON
print(response.json()["response"])

Output

Quantum computing uses quantum bits (qubits) which can be both 0 and 1 at the same time...
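
By default, /api/generate streams its answer as newline-delimited JSON, one small chunk per line, which is what the "stream": False flag above switches off. If you would rather print tokens as they are produced, a sketch like the following reads the stream directly:

import requests, json

url = "http://localhost:11434/api/generate"
payload = {
    "model": "gemma:2b",
    "prompt": "Explain quantum computing in simple terms."
}

# Each streamed line is a JSON object carrying a "response" fragment,
# with a "done" flag set on the final message.
with requests.post(url, json=payload, stream=True) as response:
    for line in response.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        print(chunk.get("response", ""), end="", flush=True)
        if chunk.get("done"):
            break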

2. Generating Embeddings (/api/embeddings)

Embeddings are numeric vectors representing text, essential for search, clustering, and RAG systems.

url = "http://localhost:11434/api/embeddings"
payload = {
    "model": "gemma:2b",
    "prompt": "Artificial intelligence is transforming industries."
}

response = requests.post(url, json=payload)
data = response.json()
print("Dimensions:", len(data["embedding"]))
print("First 10 numbers:", data["embedding"][:10])

Output

Dimensions: 2048
First 10 numbers: [0.0123, -0.0456, 0.0897, -0.2314, 0.1456, -0.0678, 0.1234, 0.5678, -0.0987, 0.4567]
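
To see why these vectors are useful for search, you can compare two of them with cosine similarity. The sketch below is not an Ollama endpoint; embed and cosine are small helper functions written here for illustration:

import math
import requests

def embed(text):
    # Illustrative helper: wraps the /api/embeddings call shown above.
    r = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": "gemma:2b", "prompt": text},
    )
    return r.json()["embedding"]

def cosine(a, b):
    # Cosine similarity: values near 1.0 mean the texts are closely related.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

v1 = embed("Artificial intelligence is transforming industries.")
v2 = embed("AI is changing how businesses operate.")
print("Similarity:", cosine(v1, v2))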

3. Listing Installed Models (/api/tags)

url = "http://localhost:11434/api/tags"
response = requests.get(url)
print(response.json())

Output

{'models': [{'name': 'gemma:2b'}, {'name': 'mistral:7b'}]}
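
The real response carries more metadata per model than the simplified output above; depending on your Ollama version it includes fields such as size and modified_at. A short loop makes the listing easier to read:

response = requests.get("http://localhost:11434/api/tags")
for model in response.json()["models"]:
    # "size" (in bytes) is present in recent Ollama versions;
    # use .get() in case a field is missing in yours.
    print(model["name"], "-", model.get("size", "?"), "bytes")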

4. Pulling (Downloading) a Model (/api/pull)

url = "http://localhost:11434/api/pull"
payload = {"name": "mistral:7b"}

response = requests.post(url, json=payload)
print(response.text)  # newline-delimited JSON progress updates

Output

{"status": "pulling manifest"}
...per-layer download progress...
{"status": "success"}
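
Because the pull endpoint streams newline-delimited JSON, you can parse each line and turn the completed/total byte counts into a simple progress display:

import requests, json

url = "http://localhost:11434/api/pull"
payload = {"name": "mistral:7b"}

with requests.post(url, json=payload, stream=True) as response:
    for line in response.iter_lines():
        if not line:
            continue
        update = json.loads(line)
        status = update.get("status", "")
        if "total" in update and "completed" in update:
            pct = update["completed"] / update["total"] * 100
            print(f"{status}: {pct:.1f}%")
        else:
            print(status)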

5. Creating a Custom Model (/api/create)

Example Modelfile

FROM gemma:2b
SYSTEM "You are a Shakespearean poet."

Python Code

url = "http://localhost:11434/api/create"
payload = {
    "name": "shakespeare-gemma",
    "modelfile": 'FROM gemma:2b\nSYSTEM "You are a Shakespearean poet."'
}

response = requests.post(url, json=payload)
print(response.text)  # streamed status updates

Output

...intermediate status updates...
{"status": "success"}
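
Once created, the custom model can be called through /api/generate like any other installed model. (Note that the exact fields /api/create accepts, such as "modelfile" versus separate "from" and "system" fields, have changed between Ollama releases, so check the API reference for the version you run.) A quick usage sketch:

url = "http://localhost:11434/api/generate"
payload = {
    "model": "shakespeare-gemma",  # the custom model created above
    "prompt": "Describe a sunrise.",
    "stream": False
}

response = requests.post(url, json=payload)
print(response.json()["response"])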

Real-World Use Cases

  • Local Chatbots

  • Knowledge-based Assistants with private data (RAG) – a minimal sketch follows this list

  • Offline AI-enhanced Apps

  • Custom Fine-Tuned Models in desktop/web apps
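
As a taste of the RAG pattern, the sketch below chains two endpoints: it embeds a few documents, retrieves the one closest to the question, and passes it to the model as context. It reuses the illustrative embed and cosine helpers from the embeddings section:

docs = [
    "Ollama runs large language models locally on your machine.",
    "Photosynthesis converts sunlight into chemical energy.",
]
question = "How can I run an LLM on my own computer?"

# Retrieve: pick the document whose embedding is closest to the question.
doc_vectors = [embed(d) for d in docs]
q_vector = embed(question)
best_doc = max(zip(docs, doc_vectors), key=lambda pair: cosine(q_vector, pair[1]))[0]

# Generate: answer the question with the retrieved document as context.
payload = {
    "model": "gemma:2b",
    "prompt": f"Context: {best_doc}\n\nQuestion: {question}\nAnswer:",
    "stream": False,
}
response = requests.post("http://localhost:11434/api/generate", json=payload)
print(response.json()["response"])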

Conclusion

The Ollama API gives you everything you need to build private, local AI workflows:

  • POST /api/generate – text generation

  • POST /api/embeddings – vector embeddings

  • GET /api/tags – list installed models

  • POST /api/pull – download new models

  • POST /api/create – create custom models

  • DELETE /api/delete – remove models

With these endpoints, you can run chatbots, create embeddings, manage models, and deploy full AI applications — all locally on your machine.