
Named Entity Recognition (NER) with spaCy and Transformers

What is Named Entity Recognition?

Named Entity Recognition (NER) is a foundational task in Natural Language Processing (NLP): it locates spans of text that name real-world entities, such as people, places, and companies, and assigns each span a type. It’s used in chatbots, search engines, news analytics, and many other real-world applications. This guide shows you how to use spaCy and transformers together to build accurate, production-ready NER systems.

Why Named Entity Recognition Is Important

If your app or service processes text—whether emails, social media, or documents—you need to understand what's in that text. NER lets you:

  • Identify customers and locations in support tickets
  • Extract financial terms from contracts
  • Track company and product names in news articles

Without accurate NER, your NLP pipeline is flying blind.

Why Use spaCy with Transformers?

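spaCy is a widely used open-source NLP library with an approachable API. The spacy-transformers extension lets transformer models such as BERT and RoBERTa run as a component inside spaCy’s pipeline, so you get transformer-level accuracy together with spaCy’s configurable pipeline and training tools. The trade-off is speed: transformer pipelines are slower than spaCy’s CPU-optimized models (such as en_core_web_sm) and benefit greatly from a GPU.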

Key Benefits

  • Plug-and-play models: Load and run with just a few lines of code
  • State-of-the-art accuracy: Backed by transformer models
  • Built for production: Supports batch processing with nlp.pipe and GPU execution
  • Customizable: Easy to fine-tune or extend for domain-specific needs

Getting Started: Installation

To install spaCy with transformer support (the quotes stop shells such as zsh from interpreting the square brackets):

pip install "spacy[transformers]"

Then download a transformer-powered English model:

python -m spacy download en_core_web_trf
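Before loading the model in application code, it can help to confirm the download succeeded. The sketch below uses spaCy's spacy.util.is_package helper; the model_is_installed wrapper is a hypothetical name chosen for illustration.

```python
from spacy.util import is_package

MODEL_NAME = "en_core_web_trf"

def model_is_installed(name: str) -> bool:
    """Return True if the named pipeline is installed as a Python package."""
    return is_package(name)

installed = model_is_installed(MODEL_NAME)
if installed:
    print(f"{MODEL_NAME} is installed and can be loaded with spacy.load().")
else:
    print(f"{MODEL_NAME} is missing; run: python -m spacy download {MODEL_NAME}")
```

Checking up front gives a clearer message than the error spacy.load raises when the package cannot be found.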

Running NER with spaCy + Transformers

Here’s a basic example that shows how easy it is to get started:

import spacy

# Load the transformer model
nlp = spacy.load("en_core_web_trf")

# Process some text
doc = nlp("Apple is acquiring a London-based AI startup for $1 billion.")

# Print recognized entities
for ent in doc.ents:
    print(ent.text, ent.label_)

Expected output

Apple ORG
London GPE
$1 billion MONEY

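These labels (ORG for organizations, GPE for geopolitical entities such as countries and cities, MONEY for monetary values, etc.) come from the OntoNotes 5 annotation scheme that spaCy’s pretrained English models were trained on. Exact predictions can vary slightly between model versions.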
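If a label is unfamiliar, spaCy's spacy.explain helper returns a short description of it without needing a loaded model. A minimal sketch (describe_labels is a hypothetical helper name used only for illustration):

```python
import spacy

def describe_labels(labels):
    """Map each entity label to spaCy's description (None if the label is unknown)."""
    return {label: spacy.explain(label) for label in labels}

descriptions = describe_labels(["ORG", "GPE", "MONEY"])
for label, description in descriptions.items():
    print(f"{label}: {description}")
```

To list every label a loaded pipeline can predict, nlp.get_pipe("ner").labels should return them as a tuple.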

Fine-Tuning NER on Custom Data

Pretrained models are great, but what if you're working with niche data—like legal documents, biomedical research, or social media slang? You’ll want to fine-tune your own NER model.

How to Fine-Tune an NER Model

  1. Convert your labeled data into spaCy's .spacy binary format.
  2. Generate a base config with python -m spacy init config, then customize it using spaCy’s configuration system.
  3. Train your model using the CLI.
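Step 1 can be sketched as follows, assuming your annotations are stored as character offsets. The TRAIN_DATA sample and the build_docbin helper are hypothetical; real projects usually export this data from an annotation tool.

```python
import spacy
from spacy.tokens import DocBin

# Hypothetical labeled examples: (text, [(start_char, end_char, label), ...])
TRAIN_DATA = [
    ("Spotify is hiring in Berlin.", [(0, 7, "ORG"), (21, 27, "GPE")]),
]

def build_docbin(examples):
    """Convert character-offset annotations into a DocBin of annotated Docs."""
    nlp = spacy.blank("en")  # tokenizer only; no trained components required
    doc_bin = DocBin()
    for text, annotations in examples:
        doc = nlp.make_doc(text)
        spans = []
        for start, end, label in annotations:
            span = doc.char_span(start, end, label=label)
            if span is None:
                # Offsets that do not line up with token boundaries are skipped
                continue
            spans.append(span)
        doc.ents = spans
        doc_bin.add(doc)
    return doc_bin

db = build_docbin(TRAIN_DATA)
print(f"Converted {len(db)} document(s)")
```

Calling db.to_disk("./train.spacy") then writes the file that the training command expects. In practice it is worth logging skipped spans rather than dropping them silently, since misaligned offsets usually point to annotation problems.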

Example training command

python -m spacy train config.cfg --output ./output --paths.train ./train.spacy --paths.dev ./dev.spacy

After training, you can load your model like this:

nlp = spacy.load("./output/model-best")

Best Practices for NER Projects

  • Use a GPU: Training transformer models on CPU is slow. Call spacy.prefer_gpu() before loading the pipeline so spaCy uses a GPU when one is available, and pass --gpu-id 0 to spacy train.
  • Label carefully: NER performance hinges on clean, accurate annotations.
  • Evaluate with real-world examples: Use metrics like precision, recall, and F-score, but also test on real inputs.
  • Augment your data: Use synthetically generated examples to improve model robustness.
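To make the evaluation metrics concrete, here is a minimal sketch of exact-match precision, recall, and F-score over entity spans. It is plain Python written for illustration, not spaCy's own scorer; the prf function and the sample spans are hypothetical.

```python
def prf(gold, predicted):
    """Exact-match precision, recall, and F1 over (start, end, label) tuples."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# One entity is correct; the other has the right span but the wrong label
gold_spans = [(0, 7, "ORG"), (21, 27, "GPE")]
predicted_spans = [(0, 7, "ORG"), (21, 27, "LOC")]

precision, recall, f1 = prf(gold_spans, predicted_spans)
print(f"P={precision:.2f} R={recall:.2f} F={f1:.2f}")
```

A prediction counts only when both the boundaries and the label match, which is why the mislabeled span hurts precision and recall at the same time. For a trained pipeline, spaCy's python -m spacy evaluate command reports these metrics per label.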

Example Use Case: Parsing Job Postings

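Say you’re building a job search engine. The pretrained en_core_web_trf model has no label for job titles, so this example assumes you have fine-tuned a model (as described above) on data annotated with a custom TITLE label:

import spacy

# Assumes a pipeline fine-tuned with a custom TITLE label, as saved by spacy train
nlp = spacy.load("./output/model-best")

text = "We’re hiring a Senior Backend Engineer in Berlin to work on scalable cloud infrastructure at Spotify."
doc = nlp(text)

# Print each recognized entity with its label
for ent in doc.ents:
    print(ent.text, ent.label_)

Output might be

Senior Backend Engineer TITLE
Berlin GPE
Spotify ORG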

You can then feed these entities into a structured database or search index.
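One way to prepare entities for a database or search index is to group them by label into a plain dictionary. The sketch below is an assumption-laden illustration: entities_to_record is a hypothetical helper, and a blank pipeline with rule-based entity_ruler patterns stands in for a trained model so the example runs without any downloads.

```python
from collections import defaultdict

import spacy

def entities_to_record(doc):
    """Group entity texts by label, e.g. {"ORG": ["Spotify"]}."""
    record = defaultdict(list)
    for ent in doc.ents:
        record[ent.label_].append(ent.text)
    return dict(record)

# Stand-in for a trained model: exact-phrase rules on a blank English pipeline
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "TITLE", "pattern": "Senior Backend Engineer"},
    {"label": "GPE", "pattern": "Berlin"},
    {"label": "ORG", "pattern": "Spotify"},
])

doc = nlp("We're hiring a Senior Backend Engineer in Berlin at Spotify.")
record = entities_to_record(doc)
print(record)
```

The resulting dictionary serializes directly to JSON. With a real model, replace the blank pipeline and rules with spacy.load and keep entities_to_record unchanged.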

Final Thoughts

NER is a core building block of any system that needs to understand text. Combining spaCy with transformers gives you state-of-the-art accuracy inside a pipeline that remains straightforward to train, customize, and deploy.

If you’re building anything from smart assistants to legal research tools, NER with spaCy + transformers gives you the edge to do it right.
