
Why Explicit Tokenization is Required in Hugging Face but Not in Ollama

Large language models (LLMs) don’t understand text the way humans do. Before any input is passed to a model, it must be tokenized — split into smaller pieces (tokens) and mapped to numeric IDs.

But if you’ve used both Hugging Face and Ollama, you might have noticed something confusing:

  • In Hugging Face (local library), you must explicitly call the tokenizer.

  • In Ollama, you just pass text, and it works.

  • In Hugging Face’s online API, you also just pass text (no explicit tokenization).

So, why these differences? Let’s break it down.

What is Tokenization?

Tokenization is the process of converting text into tokens, then into IDs that a model can understand.

Text:   "Hello world!"
Tokens: ['▁Hello', '▁world', '!']
IDs:    [15043, 3186, 29991]

Important: The model itself never tokenizes. It only consumes token IDs. The tokenization step must always happen — either by you or by the runtime.
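You can see this round trip yourself with any tokenizer. Here is a minimal sketch using the DistilBERT checkpoint from the next section (a WordPiece tokenizer, so the exact tokens and IDs differ from the SentencePiece-style example above):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")

tokens = tokenizer.tokenize("Hello world!")    # e.g. ['hello', 'world', '!']
ids = tokenizer.convert_tokens_to_ids(tokens)  # the numeric IDs the model consumes
text = tokenizer.decode(ids)                   # back to (normalized) text

print(tokens, ids, text)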

Hugging Face (Local Transformers Library)

When you load a model with the transformers library, you’re downloading the raw model weights.

The model expects token IDs, so you must explicitly use the tokenizer:

from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and the model weights separately
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")

# The explicit step: convert text to token IDs before calling the model
inputs = tokenizer("I love Hugging Face!", return_tensors="pt")
outputs = model(**inputs)
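Note that outputs here are raw logits over the sentiment labels, not text. A short sketch of decoding them, continuing from the snippet above:

import torch

probs = torch.softmax(outputs.logits, dim=-1)  # logits -> probabilities
label_id = int(probs.argmax())                 # index of the top-scoring label
print(model.config.id2label[label_id])         # e.g. "POSITIVE"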

Hugging Face Online (Inference API / Spaces)

from huggingface_hub import InferenceClient

# The client sends raw text to the hosted model; no tokenizer in sight
client = InferenceClient("distilbert-base-uncased-finetuned-sst-2-english", token="<your API Token>")
result = client.text_classification("I love Hugging Face!")

You only pass text — no tokenizer step.

That’s because Hugging Face’s server bundles the model together with its tokenizer. Tokenization is still happening; it’s just hidden behind the API.

Here, tokenization is implicit: Hugging Face abstracts it away.
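A raw HTTP request makes the same point: the JSON body carries plain text, never token IDs. A sketch against the classic Inference API endpoint (the URL and token placeholder are illustrative):

import requests

API_URL = "https://api-inference.huggingface.co/models/distilbert-base-uncased-finetuned-sst-2-english"
headers = {"Authorization": "Bearer <your API Token>"}

# The request body is plain text; the server tokenizes before inference
resp = requests.post(API_URL, headers=headers, json={"inputs": "I love Hugging Face!"})
print(resp.json())  # e.g. [[{'label': 'POSITIVE', 'score': 0.99...}]]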

Ollama

Ollama works the same way as Hugging Face’s online API:

ollama run gemma "Hello world!"

Behind the scenes:

  1. Ollama applies the tokenizer (bundled with the model).

  2. Converts text → tokens → IDs.

  3. Runs the model on those IDs.

  4. Decodes the output back into text.
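Written out by hand with transformers, those four steps look roughly like this (GPT-2 is just a stand-in model here; Ollama’s own runtime is not transformers-based):

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tokenizer("Hello world!", return_tensors="pt").input_ids  # steps 1-2: text -> tokens -> IDs
out = model.generate(ids, max_new_tokens=20,                    # step 3: run the model on IDs
                     pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(out[0], skip_special_tokens=True))       # step 4: IDs -> text

Ollama runs this entire pipeline inside its runtime.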

From your perspective, you never see token IDs.

In Ollama, tokenization is also implicit, handled by the runtime.
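The same implicit behavior holds when you call Ollama programmatically. A sketch against its local REST API, assuming Ollama is running on the default port with the gemma model pulled:

import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "gemma", "prompt": "Hello world!", "stream": False},  # plain text in
)
print(resp.json()["response"])  # plain text out; token IDs stay inside the runtime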

Comparison

Platform                  | Who Does Tokenization?  | Explicit or Implicit? | User Experience
--------------------------|-------------------------|-----------------------|---------------------------------
Hugging Face (local)      | You (via AutoTokenizer) | Explicit              | Low-level control, more flexible
Hugging Face (online API) | Server                  | Implicit              | Just send text, easy
Ollama                    | Ollama runtime          | Implicit              | Just send text, easy

Analogy

  • Hugging Face local = processor chip
    You get the raw engine (weights). You must wire it up with the right tokenizer.

  • Hugging Face API & Ollama = smartphone
    Everything is bundled — hardware, software, tokenizer, and runtime. You just use it.