Large language models (LLMs) don’t understand text the way humans do. Before passing any input into a model, it must be tokenized — split into smaller pieces (tokens) and mapped to numeric IDs.
But if you’ve used both Hugging Face and Ollama, you might have noticed something confusing:
In Hugging Face (local library), you must explicitly call the tokenizer.
In Ollama, you just pass text, and it works.
In Hugging Face’s online API, you also just pass text (no explicit tokenization).
So, why these differences? Let’s break it down.
What is Tokenization?
Tokenization is the process of converting text into tokens, then into IDs that a model can understand.
```
Text:   "Hello world!"
Tokens: ['▁Hello', '▁world', '!']
IDs:    [15043, 3186, 29991]
```
Important: The model itself never tokenizes. It only consumes token IDs. The tokenization step must always happen — either by you or by the runtime.
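The text → tokens → IDs pipeline can be sketched with a toy vocabulary. The vocabulary and IDs below are invented for illustration only; real tokenizers learn subword vocabularies with algorithms like BPE, WordPiece, or SentencePiece.

```python
# Toy tokenizer: the vocabulary and IDs are made up for illustration;
# real tokenizers use learned subword vocabularies.
VOCAB = {"Hello": 15043, "world": 3186, "!": 29991}
INV_VOCAB = {v: k for k, v in VOCAB.items()}

def encode(text: str) -> list[int]:
    # Split on whitespace and peel off trailing punctuation --
    # a crude stand-in for real subword segmentation.
    tokens = []
    for word in text.split():
        if len(word) > 1 and word[-1] in "!?.,":
            tokens.extend([word[:-1], word[-1]])
        else:
            tokens.append(word)
    return [VOCAB[t] for t in tokens]

def decode(ids: list[int]) -> str:
    return " ".join(INV_VOCAB[i] for i in ids)

ids = encode("Hello world!")
print(ids)          # [15043, 3186, 29991]
print(decode(ids))  # Hello world !
```

However the encoding happens, the model only ever sees the list of IDs; `decode` is the inverse step that runtimes apply to the model's output IDs.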
Hugging Face (Local Transformers Library)
When you download models via the `transformers` library, you’re getting the raw model weights.
The model expects token IDs, so you must explicitly use the tokenizer:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")

inputs = tokenizer("I love Hugging Face!", return_tensors="pt")
outputs = model(**inputs)
```
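Note that `outputs` holds raw logits, not text; turning them into a label is yet another explicit step on your side. Here is a sketch of that step in plain Python, with illustrative logit values standing in for a real model run (this model's label order is NEGATIVE, POSITIVE):

```python
import math

# Illustrative logits for a two-class sentiment model; in the real
# pipeline these numbers come from outputs.logits.
logits = [-3.1, 3.4]

# Softmax turns raw scores into probabilities.
exps = [math.exp(x) for x in logits]
probs = [e / sum(exps) for e in exps]

labels = ["NEGATIVE", "POSITIVE"]
prediction = labels[probs.index(max(probs))]
print(prediction)  # POSITIVE
```

This pre- and post-processing is exactly what the hosted options below do for you automatically.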
Hugging Face Online (Inference API / Spaces)
```python
from huggingface_hub import InferenceClient

client = InferenceClient("distilbert-base-uncased-finetuned-sst-2-english", token="<your API Token>")
result = client.text_classification("I love Hugging Face!")
```
You only pass text; there is no tokenizer step on your side. That’s because Hugging Face’s server bundles the model with its tokenizer: tokenization still happens, it’s just hidden behind the API. Here, tokenization is implicit because Hugging Face abstracts it away.
Ollama
Ollama works the same way as Hugging Face’s online API:

```shell
ollama run gemma "Hello world!"
```
Behind the scenes:
Ollama applies the tokenizer (bundled with the model).
Converts text → tokens → IDs.
Runs the model on those IDs.
Decodes the output back into text.
From your perspective, you never see token IDs.
In Ollama, tokenization is also implicit, handled by the runtime.
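Under the hood, Ollama exposes a REST API (by default at `http://localhost:11434/api/generate`) that accepts plain text: the request you send never contains token IDs. A minimal sketch of building such a request (actually sending it assumes a running local Ollama server):

```python
import json

def build_generate_request(model: str, prompt: str) -> str:
    # The request body carries raw text; Ollama's runtime tokenizes it
    # with the tokenizer bundled in the model file before inference.
    return json.dumps({"model": model, "prompt": prompt, "stream": False})

payload = build_generate_request("gemma", "Hello world!")
print(payload)

# To actually send it (requires a running Ollama server), you could POST
# payload to http://localhost:11434/api/generate with any HTTP client.
```

Notice there is no `input_ids` field anywhere: tokenization and detokenization both live inside the Ollama runtime.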
Comparison
| Platform | Who does tokenization? | Explicit or implicit? | User experience |
|---|---|---|---|
| Hugging Face (local) | You (via `AutoTokenizer`) | Explicit | Low-level control, more flexible |
| Hugging Face (online API) | Server | Implicit | Just send text, easy |
| Ollama | Ollama runtime | Implicit | Just send text, easy |
Analogy
Hugging Face local = processor chip
You get the raw engine (weights). You must wire it up with the right tokenizer.
Hugging Face API & Ollama = smartphone
Everything is bundled — hardware, software, tokenizer, and runtime. You just use it.