Introduction
Large Language Models (LLMs) such as GPT and LLaMA (including models run locally through tools like Ollama) generate text by predicting the next token from a probability distribution. Controlling that distribution lets developers influence the creativity, randomness, length, and coherence of the output. This article explains the key parameters and provides example code to experiment with them.
How LLMs Generate Text
An LLM takes a prompt and produces a probability distribution over the next token:
P(token ∣ previous tokens)
The model then samples from this distribution to generate the next word. Parameters such as temperature, top-k, and top-p shape this distribution, controlling the output style.
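This process can be sketched with a toy vocabulary (the words and logit scores below are made up for illustration): the model's raw scores are turned into probabilities with a softmax, and the next token is sampled from that distribution.

```python
import math
import random

def softmax(logits):
    # Turn raw model scores (logits) into a probability distribution
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy next-token candidates after the prompt "The cat sat on the"
vocab = ["mat", "sofa", "keyboard", "spaceship"]
logits = [4.0, 2.5, 1.0, -1.0]
probs = softmax(logits)

# Sample the next token from the distribution
next_token = random.choices(vocab, weights=probs, k=1)[0]
```

Every parameter described below works by reshaping `probs` before this sampling step.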
1. Temperature – "Randomness / Creativity"
What it does: Controls how likely the model is to pick less probable tokens.
Low temperature: model picks the safest, most likely word.
High temperature: model may choose creative or unusual words.
| Temperature | Behavior |
|---|---|
| 0 | Deterministic (greedy) |
| 1 | Creative |
| > 1 | Very creative but risky |
Example:
Prompt: "The cat sat on the _______"
| Temperature | Output Example |
|---|---|
| 0.0 | "mat." |
| 0.5 | "sofa." |
| 1.0 | "keyboard." |
| 1.5 | "spaceship." |
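Under the hood, temperature divides the logits before the softmax; a minimal sketch (reusing the toy logits from above):

```python
import math

def softmax_with_temperature(logits, temperature):
    # temperature = 0 means greedy: all probability on the top token
    if temperature == 0:
        probs = [0.0] * len(logits)
        probs[logits.index(max(logits))] = 1.0
        return probs
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 2.5, 1.0, -1.0]   # toy scores for "mat", "sofa", "keyboard", "spaceship"
greedy = softmax_with_temperature(logits, 0)
cold = softmax_with_temperature(logits, 0.5)    # sharpens the distribution
hot = softmax_with_temperature(logits, 1.5)     # flattens the distribution
```

Low temperature concentrates probability mass on the top token; high temperature spreads it out, which is why "spaceship" only shows up at 1.5.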
2. Top-k – "Limit Choices"
What it does: Only allows the model to consider the top k most probable tokens at each step.
Effect: Reduces the chance of weird, very low-probability words.
Prevents unlikely words and gives more control than temperature alone.
| Top-k | Behavior |
|---|---|
| 5 | Very safe |
| 40 | Balanced |
| 100 | More creative |
Example:
Prompt: "I went to the __________"
| Top-k | Output Example |
|---|---|
| 5 | "store" |
| 50 | "store", "park", "library" |
| 100 | "store", "park", "library", "zoo", "museum" |
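Top-k filtering can be sketched as zeroing out every token outside the k most probable ones and renormalising (the probabilities below are made up for illustration):

```python
def top_k_filter(probs, k):
    # Keep only the k most probable tokens, zero out the rest, renormalise
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    filtered = [p if i in top else 0.0 for i, p in enumerate(probs)]
    total = sum(filtered)
    return [p / total for p in filtered]

probs = [0.5, 0.25, 0.15, 0.07, 0.03]   # toy distribution over 5 tokens
limited = top_k_filter(probs, 2)         # only the top 2 tokens survive
```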
3. Top-p / Nucleus Sampling – "Cumulative Probability"
What it does: Instead of a fixed k, consider only the smallest set of most-probable tokens whose cumulative probability reaches p.
Effect: More adaptive than top-k.
| Top-p | Behavior |
|---|---|
| 0.5 | Very focused |
| 0.9 | Balanced |
| 0.95 | Creative |
Example:
Prompt: "She loves eating _____________"
| Top-p | Output Example |
|---|---|
| 0.5 | "pizza" |
| 0.8 | "pizza", "ice cream" |
| 0.95 | "pizza", "ice cream", "sushi", "cake" |
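Nucleus sampling can be sketched the same way as top-k, except the cutoff adapts to the distribution: tokens are added in order of probability until their cumulative mass reaches p (toy numbers again):

```python
def top_p_filter(probs, p):
    # Keep the smallest set of most-probable tokens whose cumulative
    # probability reaches p, then renormalise
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    filtered = [pr if i in keep else 0.0 for i, pr in enumerate(probs)]
    total = sum(filtered)
    return [pr / total for pr in filtered]

probs = [0.5, 0.25, 0.15, 0.07, 0.03]
nucleus = top_p_filter(probs, 0.8)   # 0.5 + 0.25 < 0.8, so a third token is kept
```

When the model is confident, the nucleus is small; when it is uncertain, more tokens survive — that adaptivity is what top-k lacks.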
4. Max Tokens / Max Length
What it does: Limits the number of tokens generated.
Effect: Controls output length.
Prevents run-on, unbounded output.
Example:
Prompt: "Explain gravity in one sentence"
| Max Tokens | Output Example |
|---|---|
| 5 | "Gravity pulls things." |
| 15 | "Gravity is the force that pulls objects toward Earth." |
| 30 | "Gravity is the universal force that attracts all objects with mass towards each other, keeping planets in orbit." |
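Conceptually, max tokens is just a hard cap on the generation loop; a minimal sketch with a stand-in "model" (the `next_token_fn` callable is hypothetical):

```python
def generate(next_token_fn, max_tokens):
    # Hard cap: stop after max_tokens steps no matter what
    tokens = []
    for _ in range(max_tokens):
        tokens.append(next_token_fn(tokens))
    return tokens

# A stand-in "model" that always emits the same word
tokens = generate(lambda prev: "gravity", max_tokens=5)
```

Note that the cap counts tokens, not words — real tokenizers often split a word into several tokens, so 15 tokens is usually fewer than 15 words.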
5. Presence Penalty
What it does: Discourages the model from repeating words it has already used.
Effect: Encourages new topics/words.
Very useful for storytelling.
Example:
Prompt: "Write a story about a dragon"
| Presence Penalty | Output Difference |
|---|---|
| 0 | "The dragon flew. The dragon breathed fire. The dragon roared." |
| 1.0 | "The dragon soared. It breathed fire. Its roar echoed through the mountains." |
6. Frequency Penalty
What it does: Penalizes tokens in proportion to how many times they have already appeared, so the penalty grows with each repetition (the presence penalty, by contrast, is a flat one-time cost).
Effect: Reduces word-level repetition.
Example: Prompt: "Describe a forest"
| Frequency Penalty | Output Example |
|---|---|
| 0 | "The trees were tall. The trees were green. The trees were beautiful." |
| 1.0 | "Tall trees stretched toward the sky, their leaves shimmering in the sunlight." |
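Both penalties can be sketched as amounts subtracted from the logits of tokens that have already been generated (exact formulas vary by implementation; this follows the OpenAI-style convention):

```python
from collections import Counter

def apply_penalties(logits, generated, presence_penalty, frequency_penalty):
    # generated: list of token ids produced so far
    counts = Counter(generated)
    adjusted = list(logits)
    for token, n in counts.items():
        adjusted[token] -= presence_penalty        # flat cost: token appeared at all
        adjusted[token] -= frequency_penalty * n   # grows with each repetition
    return adjusted

logits = [2.0, 1.0, 0.5]
generated = [0, 0, 0, 1]   # token 0 used three times, token 1 once
adjusted = apply_penalties(logits, generated,
                           presence_penalty=0.6, frequency_penalty=0.4)
```

After the adjustment the heavily repeated token 0 drops from the most favoured position to the least, which is exactly the "The trees were... The trees were..." fix shown in the tables above.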
7. Stop Sequences
What it does: Tells the model to stop generating as soon as it emits a specified string.
Example: Prompt: "List three fruits: "
Without a stop sequence, the model might continue generating more fruits endlessly.
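The mechanism can be sketched as an early-exit check in the generation loop (simplified: real implementations match stop strings in the decoded text, where a stop sequence may span several tokens):

```python
def generate_with_stop(next_token_fn, max_tokens, stop):
    # Stop as soon as the model emits a stop string
    tokens = []
    for _ in range(max_tokens):
        token = next_token_fn(tokens)
        if token in stop:
            break
        tokens.append(token)
    return " ".join(tokens)

# A stand-in "model" that would happily keep listing fruits
stream = iter(["apple,", "banana,", "cherry", "\n", "grape,"])
text = generate_with_stop(lambda prev: next(stream), max_tokens=10, stop=["\n"])
```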
8. Logit Bias
What it does: Adds a fixed positive or negative adjustment to the logits of specific tokens before sampling, making them more or less likely to appear.
Example:
Prompt: "The answer is _____________"
Useful for forcing or blocking specific words.
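The mechanism is an additive shift applied before the softmax. As a sketch (following the OpenAI-style convention where the bias map goes from token id to an adjustment, with roughly -100 banning a token and +100 all but forcing it; Ollama's generate API does not expose this parameter):

```python
def apply_logit_bias(logits, bias):
    # bias maps token index -> additive adjustment applied before softmax
    return [x + bias.get(i, 0.0) for i, x in enumerate(logits)]

logits = [1.0, 2.0, 3.0]
banned = apply_logit_bias(logits, {2: -100.0})   # push token 2 out of reach
boosted = apply_logit_bias(logits, {0: 5.0})     # favour token 0
```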
9. Seed
A seed fixes the random number generator, so the model produces the same output every time for the same prompt and parameters. The seed only matters when sampling is actually random: with temperature set to 0 the model already picks the top token deterministically, so the seed has no effect.
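The effect is the same as seeding any pseudo-random generator; a minimal sketch:

```python
import random

def sample_tokens(probs, seed, n=5):
    # A fixed seed makes the pseudo-random sampling fully repeatable
    rng = random.Random(seed)
    return rng.choices(range(len(probs)), weights=probs, k=n)

a = sample_tokens([0.5, 0.3, 0.2], seed=42)
b = sample_tokens([0.5, 0.3, 0.2], seed=42)   # identical to a
c = sample_tokens([0.5, 0.3, 0.2], seed=7)    # a different draw
```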
Recommended Parameter Settings
| Task | Temperature | Top-p | Top-k |
|---|---|---|---|
| Math / logic | 0.0–0.2 | 1.0 | 20 |
| Code | 0.1–0.3 | 0.9 | 40 |
| Technical writing | 0.3–0.5 | 0.9 | 40 |
| Chatbot | 0.6–0.8 | 0.9 | 50 |
| Storytelling | 0.8–1.2 | 0.95 | 100 |
Ollama Implementation
```python
import requests
import json

url = "http://localhost:11434/api/generate"

# In Ollama's generate API, sampling parameters go inside the
# "options" object, not at the top level of the payload.
payload = {
    "model": "llama3",
    "prompt": "Write a short story about a robot",
    "options": {
        "temperature": 0.7,
        "top_k": 40,
        "top_p": 0.9,
        "num_predict": 100,
        "presence_penalty": 0.6,
        "frequency_penalty": 0.4,
        "seed": 42,
        "stop": ["\n"]
    }
}

# The API streams one JSON object per line
response = requests.post(url, json=payload, stream=True)
for line in response.iter_lines():
    if line:
        print(json.loads(line)["response"], end="")
```
Key takeaway
Temperature, top-p, top-k - control creativity & randomness
Max tokens, stop sequences - control length & end of generation
Presence/frequency penalties - control repetition
Logit bias - control specific words
Seed - reproducibility