Getting Started with Natural Language Processing (NLP) in C# .NET

Natural Language Processing (NLP) is an exciting field that allows computers to understand and process human language. NLP powers applications such as chatbots, sentiment analyzers, and text summarizers. While Python gets most of the attention in NLP (thanks to libraries like NLTK and spaCy), Microsoft’s ML.NET provides powerful tools for .NET developers to build NLP models in C#.

In this article, we’ll start from scratch and build a simple Sentiment Analysis model using C# and ML.NET. Along the way, we’ll discuss how this qualifies as an NLP model and explore its practical applications.

What You’ll Build

We’ll build a sentiment analysis application, a common NLP project, that can classify text (e.g., a customer review) as "positive" or "negative" based on its sentiment. For example:

  • Input: "I love this product!"
  • Predicted Sentiment: "Positive"

This article will show how to create, train, and use this NLP model in C# while explaining its place in Natural Language Processing.

Prerequisites

Before we begin, make sure you have the following:

  1. Visual Studio 2019 or later.
  2. .NET 5 or later.
  3. Familiarity with basic C# syntax.

Step 1. Install ML.NET

ML.NET is Microsoft’s machine learning framework for .NET developers. To add ML.NET to your project:

  1. Create a new .NET Console Application in Visual Studio.
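  2. Add the following NuGet packages. You can do this via the Package Manager Console (Microsoft.ML.DataView is a dependency of Microsoft.ML, so it is normally pulled in automatically):
    Install-Package Microsoft.ML
    Install-Package Microsoft.ML.DataView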

These packages provide the necessary classes and methods for machine learning workflows, including support for text processing and classification.

Step 2. Prepare Your Data

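Every machine learning model needs data. Create a tab-separated text file called sentiments.tsv in your project directory, and set its Copy to Output Directory property to "Copy if newer" so the program can find it at run time. Populate it with sample reviews like this, using a real tab character between the two columns: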

Sentiment	Text
Positive	The movie was fantastic overall.
Negative	The service feels terrible at best.
Positive	Her attitude is outstanding every time!
Negative	This product seems bad in the future.
Positive	My day was amazing and full of joy.
Negative	His performance is poor at best.
Positive	The movie appears great to me.
Negative	The event was horrible overall.

Each row contains two columns:

  • Sentiment: The label (Positive or Negative sentiment).
  • Text: The corresponding text review.

This data will be used to train the model. Eight rows are enough to demonstrate the workflow, but far too few for reliable predictions on real text.
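Copying the sample from a web page often turns tab characters into spaces, which would leave the loader with a single column. As a hedged alternative, the file can be generated from code so the separators are guaranteed to be real tabs. The names SampleDataWriter and WriteSampleData below are illustrative and not part of ML.NET; the rows are the same eight reviews listed above.

```csharp
using System;
using System.IO;
using System.Linq;

public static class SampleDataWriter
{
    // The same eight labeled reviews shown in the article.
    public static readonly (string Sentiment, string Text)[] Rows =
    {
        ("Positive", "The movie was fantastic overall."),
        ("Negative", "The service feels terrible at best."),
        ("Positive", "Her attitude is outstanding every time!"),
        ("Negative", "This product seems bad in the future."),
        ("Positive", "My day was amazing and full of joy."),
        ("Negative", "His performance is poor at best."),
        ("Positive", "The movie appears great to me."),
        ("Negative", "The event was horrible overall.")
    };

    // Writes a header plus one tab-separated line per row and returns the lines written.
    public static string[] Write(string path)
    {
        string[] lines = new[] { "Sentiment\tText" }
            .Concat(Rows.Select(r => $"{r.Sentiment}\t{r.Text}"))
            .ToArray();
        File.WriteAllLines(path, lines);
        return lines;
    }
}

public static class WriteSampleData
{
    public static void Main()
    {
        string[] lines = SampleDataWriter.Write("sentiments.tsv");
        Console.WriteLine($"Wrote {lines.Length - 1} rows to sentiments.tsv");
    }
}
```

Run this as its own small console program, or call SampleDataWriter.Write from your own Main before loading the data.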

Step 3. Create the Sentiment Analysis Application

Now, let’s write the code to create, train, and use the machine learning model.

  1. Define Data Structures
    Define two classes to represent the input data (SentimentData) and the prediction output (SentimentPrediction):

    // Both classes use attributes from the Microsoft.ML.Data namespace.
    public class SentimentData
    {
        [LoadColumn(0)]
        public string Sentiment { get; set; }          // Label: Positive or Negative
        [LoadColumn(1)]
        public string Text { get; set; }               // Review text
    }
    
    public class SentimentPrediction
    {
        [ColumnName("PredictedLabel")]
        public string Sentiment { get; set; }          // Predicted sentiment class
        public float[] Score { get; set; }             // Probabilities for each class
    }

  2. Write the Main Program
    Here’s the full program to train and test the sentiment analysis model:

    using System;
    using Microsoft.ML;
    using Microsoft.ML.Data;
    
    class Program
    {
        static void Main(string[] args)
        {
            // Initialize the ML.NET environment
            MLContext mlContext = new MLContext();
    
            // Load training data
            string dataPath = "sentiments.tsv";
            IDataView dataView = mlContext.Data.LoadFromTextFile<SentimentData>(
                path: dataPath,
                hasHeader: true,
                separatorChar: '\t'
            );
    
            // Define the data pipeline
            var dataPipeline = mlContext.Transforms.Text.FeaturizeText(
                    outputColumnName: "Features",
                    inputColumnName: nameof(SentimentData.Text))
                .Append(mlContext.Transforms.Conversion.MapValueToKey(
                    outputColumnName: "Label",
                    inputColumnName: nameof(SentimentData.Sentiment)))
                .Append(mlContext.MulticlassClassification.Trainers.SdcaMaximumEntropy())
                .Append(mlContext.Transforms.Conversion.MapKeyToValue(
                    outputColumnName: "PredictedLabel"));
    
            // Train the model
            var trainingModel = dataPipeline.Fit(dataView);
    
            // Save the model
            string modelPath = "sentimentModel.zip";
            mlContext.Model.Save(trainingModel, dataView.Schema, modelPath);
    
            // Load the model and make a prediction
            var loadedModel = mlContext.Model.Load(modelPath, out _);
            var predictionEngine = mlContext.Model.CreatePredictionEngine<SentimentData, SentimentPrediction>(loadedModel);
    
            // Read one line of text from the user and classify it
            Console.WriteLine("Enter the text");
            var input = Console.ReadLine();
            var newSample = new SentimentData { Text = input };
            var prediction = predictionEngine.Predict(newSample);
    
            // Display prediction result
            Console.WriteLine($"Text: {newSample.Text}");
            // Score[0] belongs to the first label seen in the training file (Positive in sentiments.tsv)
            float positiveProbability = prediction.Score[0];
            Console.WriteLine(string.Join(",", prediction.Score));
            // Report low-confidence predictions as Neutral, even though the model only knows two labels
            string sentiment;
            if (positiveProbability > 0.66f)
            {
                sentiment = "Positive";
            }
            else if (positiveProbability < 0.33f)
            {
                sentiment = "Negative";
            }
            else
            {
                sentiment = "Neutral";
            }
            Console.WriteLine(sentiment);
        }
    }
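The program above trains and predicts but never measures how well the model fits. A minimal sketch of that step, assuming the same sentiments.tsv file and the Microsoft.ML package, is shown below. It scores the model on its own training data, so the numbers will be optimistic; with a larger dataset you would hold out a separate test set. ReviewRow, EvaluateSketch, and EvaluateOnTrainingData are illustrative names.

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.Data;

public class ReviewRow
{
    [LoadColumn(0)] public string Sentiment { get; set; }
    [LoadColumn(1)] public string Text { get; set; }
}

public static class EvaluateSketch
{
    // Trains on the file at dataPath and returns metrics computed on that same data.
    public static MulticlassClassificationMetrics EvaluateOnTrainingData(string dataPath)
    {
        var mlContext = new MLContext(seed: 0);

        IDataView data = mlContext.Data.LoadFromTextFile<ReviewRow>(
            path: dataPath, hasHeader: true, separatorChar: '\t');

        // Same steps as the article's pipeline. The final MapKeyToValue is left out
        // for simplicity; it only changes how the predicted label is displayed.
        var pipeline = mlContext.Transforms.Text
            .FeaturizeText("Features", nameof(ReviewRow.Text))
            .Append(mlContext.Transforms.Conversion.MapValueToKey("Label", nameof(ReviewRow.Sentiment)))
            .Append(mlContext.MulticlassClassification.Trainers.SdcaMaximumEntropy());

        var model = pipeline.Fit(data);
        IDataView scored = model.Transform(data);

        // Uses the default column names: Label, Score, PredictedLabel.
        return mlContext.MulticlassClassification.Evaluate(scored);
    }

    public static void Main()
    {
        MulticlassClassificationMetrics metrics = EvaluateOnTrainingData("sentiments.tsv");
        Console.WriteLine($"Micro accuracy: {metrics.MicroAccuracy:F2}");
        Console.WriteLine($"Macro accuracy: {metrics.MacroAccuracy:F2}");
        Console.WriteLine($"Log loss:       {metrics.LogLoss:F2}");
    }
}
```

Accuracy close to 1.0 here mostly shows that the model memorized eight sentences, not that it generalizes.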

How Is This an NLP Model?

This program qualifies as an NLP model because it processes and analyzes natural language text—unstructured data generated by humans—and uses machine learning to derive insights. Here’s how:

  1. Text as Input: The raw input to this model is human language text (e.g., "This product is great!").
  2. Feature Extraction: The FeaturizeText transformer converts the text into a numeric vector suitable for machine learning. This step is an NLP technique: by default it normalizes and tokenizes the text, then builds word and character n-gram features (a bag-of-words style representation).
  3. Text Classification: The model is trained on two categories, Positive and Negative, and the program reports Neutral when the score falls between its two thresholds. Text classification is a core NLP task.
  4. Applications in Real-World NLP: Sentiment analysis is commonly used in product review analysis, customer feedback analysis, and social media monitoring.
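To make the feature extraction step more concrete, here is a deliberately simplified bag-of-words sketch in plain C#. It is not how FeaturizeText is implemented; it only illustrates the idea of turning a sentence into word counts over a fixed vocabulary. The class and method names are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BagOfWords
{
    // Lowercase the text and split it on anything that is not a letter or digit.
    public static string[] Tokenize(string text) =>
        new string(text.ToLowerInvariant()
                .Select(c => char.IsLetterOrDigit(c) ? c : ' ')
                .ToArray())
            .Split(' ', StringSplitOptions.RemoveEmptyEntries);

    // Build a sorted vocabulary from every token seen in the corpus.
    public static string[] BuildVocabulary(IEnumerable<string> corpus) =>
        corpus.SelectMany(Tokenize)
              .Distinct()
              .OrderBy(w => w, StringComparer.Ordinal)
              .ToArray();

    // Count how often each vocabulary word occurs in the text.
    public static float[] Vectorize(string text, string[] vocabulary)
    {
        var counts = new float[vocabulary.Length];
        foreach (string token in Tokenize(text))
        {
            int index = Array.IndexOf(vocabulary, token);
            if (index >= 0) counts[index]++;   // words outside the vocabulary are ignored
        }
        return counts;
    }
}

public static class BagOfWordsDemo
{
    public static void Main()
    {
        string[] corpus =
        {
            "The movie was fantastic overall.",
            "The event was horrible overall."
        };
        string[] vocabulary = BagOfWords.BuildVocabulary(corpus);
        float[] vector = BagOfWords.Vectorize("The movie was fantastic!", vocabulary);

        Console.WriteLine(string.Join(",", vocabulary)); // event,fantastic,horrible,movie,overall,the,was
        Console.WriteLine(string.Join(",", vector));     // 0,1,0,1,0,1,1
    }
}
```

A classifier never sees the words themselves, only vectors like the second line of output, which is why the featurization step matters so much.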

Output Sample

When you run the program, it trains the sentiment analysis model, saves it to sentimentModel.zip, and prompts you for a line of text. It then prints the text you entered, the comma-separated class scores, and the resulting label (Positive, Negative, or Neutral).

Enhancing the Model

While this is a beginner-friendly example, you can enhance the NLP capabilities by:

  1. Preprocessing the Text: Remove stop words, apply stemming/lemmatization, and normalize text before featurizing.
  2. Using Pretrained Embeddings: Integrate pre-trained NLP models like Word2Vec or BERT for better feature extraction.
  3. Extending to Other NLP Tasks: Expand this framework to perform other NLP tasks like named entity recognition (NER), text summarization, or question answering.
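Part of the first item can be handled inside FeaturizeText itself. The sketch below, which assumes the Microsoft.ML package from Step 1, passes a TextFeaturizingEstimator.Options object to lowercase the text, drop punctuation, remove English stop words, and use word unigrams and bigrams. Stemming and lemmatization are not covered here. TextRow, PreprocessingSketch, and Featurize are illustrative names, and the two sample sentences are taken from the training file.

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.Transforms.Text;

public class TextRow
{
    public string Text { get; set; }
}

public static class PreprocessingSketch
{
    // Featurizes the given texts with explicit preprocessing options.
    public static IDataView Featurize(MLContext mlContext, string[] texts)
    {
        var rows = Array.ConvertAll(texts, t => new TextRow { Text = t });
        IDataView data = mlContext.Data.LoadFromEnumerable(rows);

        var options = new TextFeaturizingEstimator.Options
        {
            // Normalize case and strip punctuation before tokenizing.
            CaseMode = TextNormalizingEstimator.CaseMode.Lower,
            KeepPunctuations = false,
            // Use the built-in English stop word list.
            StopWordsRemoverOptions = new StopWordsRemovingEstimator.Options
            {
                Language = TextFeaturizingEstimator.Language.English
            },
            // Word unigrams and bigrams.
            WordFeatureExtractor = new WordBagEstimator.Options
            {
                NgramLength = 2,
                UseAllLengths = true
            }
        };

        var estimator = mlContext.Transforms.Text.FeaturizeText(
            "Features", options, nameof(TextRow.Text));

        return estimator.Fit(data).Transform(data);
    }

    public static void Main()
    {
        var mlContext = new MLContext();
        IDataView transformed = Featurize(mlContext, new[]
        {
            "The movie was fantastic overall.",
            "The event was horrible overall."
        });

        // Prints the type of the generated feature vector column.
        Console.WriteLine(transformed.Schema["Features"].Type);
    }
}
```

To use these options in the sentiment pipeline, the same FeaturizeText call would replace the one in the main program; whether it improves accuracy depends on the data, so compare results before and after.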

Conclusion

You’ve created your first NLP model using ML.NET in C#. This sentiment analysis application processed human language, extracted meaningful features, and classified text into predefined categories. While this example focuses on sentiment classification, ML.NET can be extended to tackle various NLP challenges.

By exploring this example, you’ve taken a big step into the fascinating world of NLP. Continue experimenting with the model, refine it, and try applying it to real-world datasets!

Happy coding!