What is Named Entity Recognition?
Named Entity Recognition (NER) is a foundational task in Natural Language Processing (NLP) that involves identifying entities like names, places, companies, and more within text. It’s used in chatbots, search engines, news analytics, and countless real-world applications. This guide shows you how to use spaCy and transformers together to build high-performance NER systems that are fast, accurate, and production-ready.
Why Named Entity Recognition Is Important
If your app or service processes text—whether emails, social media, or documents—you need to understand what's in that text. NER lets you:
- Identify customers and locations in support tickets
- Extract financial terms from contracts
- Track company and product names in news articles
Without accurate NER, your NLP pipeline is flying blind.
Why Use spaCy with Transformers?
spaCy is one of the most user-friendly NLP libraries out there. By integrating transformer models like BERT, RoBERTa, and others using the spacy-transformers extension, you get the best of both worlds: transformer-level accuracy with spaCy’s blazing-fast and customizable pipeline.
Key Benefits
- Plug-and-play models: Load and run with just a few lines of code
- State-of-the-art accuracy: Backed by transformer models
- Efficient for production: Designed to scale and optimize performance
- Customizable: Easy to fine-tune or extend for domain-specific needs
Getting Started: Installation
To install spaCy with transformer support:
```bash
pip install spacy[transformers]
```
Then download a transformer-powered English model:
```bash
python -m spacy download en_core_web_trf
```
Running NER with spaCy + Transformers
Here’s a basic example that shows how easy it is to get started:
```python
import spacy

# Load the transformer model
nlp = spacy.load("en_core_web_trf")

# Process some text
doc = nlp("Apple is acquiring a London-based AI startup for $1 billion.")

# Print recognized entities
for ent in doc.ents:
    print(ent.text, ent.label_)
```
Expected output:

```
Apple ORG
London GPE
$1 billion MONEY
```
These labels (ORG for organizations, GPE for geopolitical entities, etc.) come from spaCy’s built-in entity types.
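If you're ever unsure what a label means, spaCy's built-in spacy.explain helper returns a short description for any of these tags:

```python
import spacy

# Ask spaCy for a human-readable description of each entity label
for label in ("ORG", "GPE", "MONEY"):
    print(f"{label}: {spacy.explain(label)}")
```

This works without loading any model, which makes it handy for quick lookups in a REPL.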
Fine-Tuning NER on Custom Data
Pretrained models are great, but what if you're working with niche data—like legal documents, biomedical research, or social media slang? You’ll want to fine-tune your own NER model.
How to Fine-Tune an NER Model
1. Convert your labeled data into spaCy's .spacy binary format.
2. Customize a config file using spaCy's configuration system.
3. Train your model using the CLI.
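The first step, converting labeled data to the .spacy format, can be sketched with spaCy's DocBin. The training data and output path below are illustrative; in practice your annotations would come from a labeling tool:

```python
import spacy
from spacy.tokens import DocBin

# Toy labeled data: (text, [(start_char, end_char, label), ...])
TRAIN_DATA = [
    ("Apple is opening an office in Berlin.",
     [(0, 5, "ORG"), (30, 36, "GPE")]),
]

nlp = spacy.blank("en")  # a blank pipeline; we only need the tokenizer
db = DocBin()
for text, annotations in TRAIN_DATA:
    doc = nlp.make_doc(text)
    spans = [
        doc.char_span(start, end, label=label)
        for start, end, label in annotations
    ]
    # char_span returns None when offsets don't align with token boundaries
    doc.ents = [span for span in spans if span is not None]
    db.add(doc)

db.to_disk("./train.spacy")
```

Run the same conversion for your development set to produce the dev.spacy file referenced in the training command below.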
Example training command:

```bash
python -m spacy train config.cfg --output ./output --paths.train ./train.spacy --paths.dev ./dev.spacy
```
After training, you can load your model like this:
```python
nlp = spacy.load("./output/model-best")
```
Best Practices for NER Projects
- ✅ Use a GPU: Training transformer models on CPU is slow. Use spacy.prefer_gpu() for speed boosts.
- ✅ Label carefully: NER performance hinges on clean, accurate annotations.
- ✅ Evaluate with real-world examples: Use metrics like precision, recall, and F-score, but also test on real inputs.
- ✅ Augment your data: Use synthetically generated examples to improve model robustness.
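To put the evaluation advice into practice, spaCy's Scorer can compute entity precision, recall, and F-score from Example objects. A minimal sketch with hand-made gold annotations and a hypothetical model prediction (the text and offsets are illustrative):

```python
import spacy
from spacy.scorer import Scorer
from spacy.training import Example

nlp = spacy.blank("en")
text = "Apple is opening an office in Berlin."

# Gold annotations vs. what a hypothetical model predicted
gold = {"entities": [(0, 5, "ORG"), (30, 36, "GPE")]}
predicted = nlp.make_doc(text)
predicted.ents = [predicted.char_span(0, 5, label="ORG")]  # the model missed "Berlin"

example = Example.from_dict(predicted, gold)
scores = Scorer().score([example])
print(scores["ents_p"], scores["ents_r"], scores["ents_f"])
```

In a real project you would build the Example objects from your trained pipeline's output on a held-out dev set rather than by hand.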
Example Use Case: Parsing Job Postings
Say you’re building a job search engine. Here’s how you might use spaCy + transformers to extract key info:
```python
text = "We’re hiring a Senior Backend Engineer in Berlin to work on scalable cloud infrastructure at Spotify."

doc = nlp(text)
for ent in doc.ents:
    print(ent.text, ent.label_)
```
With a model fine-tuned to recognize job titles (TITLE is not one of spaCy's built-in labels), the output might be:

```
Senior Backend Engineer TITLE
Berlin GPE
Spotify ORG
```
You can then feed these entities into a structured database or search index.
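As a sketch of that step, grouping the extracted entities by label produces a record that maps cleanly onto a database row or search document. The helper below is hypothetical and operates on plain (text, label) pairs, so it works with any pipeline's output:

```python
from collections import defaultdict

def entities_to_record(ents):
    """Group (text, label) pairs into a dict ready for indexing.

    `ents` can be any iterable of (text, label) tuples, e.g.
    [(ent.text, ent.label_) for ent in doc.ents].
    """
    record = defaultdict(list)
    for text, label in ents:
        record[label].append(text)
    return dict(record)

ents = [("Senior Backend Engineer", "TITLE"), ("Berlin", "GPE"), ("Spotify", "ORG")]
print(entities_to_record(ents))
# {'TITLE': ['Senior Backend Engineer'], 'GPE': ['Berlin'], 'ORG': ['Spotify']}
```

Using lists as values keeps duplicate mentions (a company named twice in one posting) instead of silently dropping them.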
Final Thoughts
NER is more than just a cool NLP trick—it's a crucial part of any intelligent system that processes text. By combining spaCy with transformers, you can get cutting-edge accuracy without sacrificing usability or performance.
If you’re building anything from smart assistants to legal research tools, NER with spaCy + transformers gives you the edge to do it right.