How To Do Binary Classification in ASP.Net Core Using ML.Net


 
In this article, we will see binary classification in ASP.Net Core using ML.Net. I have used a mushroom classification problem to demonstrate binary classification. To quickly review what machine learning and binary classification are, please refer to this article.
 

Prerequisite

Problem

 
This project demonstrates how to use ML.Net to classify mushrooms as edible or poisonous. This type of task is very common in machine learning and is usually called a two-class or binary classification problem. The purpose of this project is to show how we can use ML.Net to add machine learning-based features to our .NET applications.

 
Data

 
This dataset includes descriptions of hypothetical samples corresponding to 23 species of gilled mushrooms in the Agaricus and Lepiota family, drawn from The Audubon Society Field Guide to North American Mushrooms (1981). Each species is identified as definitely edible, definitely poisonous, or of unknown edibility and not recommended. This last class was combined with the poisonous one. The Guide clearly states that there is no simple rule for determining the edibility of a mushroom; no rule like "leaflets three, let it be" for poison oak and ivy.

 
Attributes

 

Features

 
Feature Name: Values
cap-shape: bell=b, conical=c, convex=x, flat=f, knobbed=k, sunken=s
cap-surface: fibrous=f, grooves=g, scaly=y, smooth=s
cap-color: brown=n, buff=b, cinnamon=c, gray=g, green=r, pink=p, purple=u, red=e, white=w, yellow=y
bruises: bruises=t, no=f
odor: almond=a, anise=l, creosote=c, fishy=y, foul=f, musty=m, none=n, pungent=p, spicy=s
gill-attachment: attached=a, descending=d, free=f, notched=n
gill-spacing: close=c, crowded=w, distant=d
gill-size: broad=b, narrow=n
gill-color: black=k, brown=n, buff=b, chocolate=h, gray=g, green=r, orange=o, pink=p, purple=u, red=e, white=w, yellow=y
stalk-shape: enlarging=e, tapering=t
stalk-root: bulbous=b, club=c, cup=u, equal=e, rhizomorphs=z, rooted=r, missing=?
stalk-surface-above-ring: fibrous=f, scaly=y, silky=k, smooth=s
stalk-surface-below-ring: fibrous=f, scaly=y, silky=k, smooth=s
stalk-color-above-ring: brown=n, buff=b, cinnamon=c, gray=g, orange=o, pink=p, red=e, white=w, yellow=y
stalk-color-below-ring: brown=n, buff=b, cinnamon=c, gray=g, orange=o, pink=p, red=e, white=w, yellow=y
veil-type: partial=p, universal=u
veil-color: brown=n, orange=o, white=w, yellow=y
ring-number: none=n, one=o, two=t
ring-type: cobwebby=c, evanescent=e, flaring=f, large=l, none=n, pendant=p, sheathing=s, zone=z
spore-print-color: black=k, brown=n, buff=b, chocolate=h, green=r, orange=o, purple=u, white=w, yellow=y
population: abundant=a, clustered=c, numerous=n, scattered=s, several=v, solitary=y
habitat: grasses=g, leaves=l, meadows=m, paths=p, urban=u, waste=w, woods=d
 

Label (Class)

 
Label: edible=e, poisonous=p

 
Solution

 
To solve this problem, first, we will build an estimator to define the ML pipeline we want to use. Then we will train this estimator on existing data, evaluate how good it is using cross-validation, and lastly, we'll consume the model to predict whether a few examples are edible or poisonous.
Below are the steps:
  • Load the dataset from CSV data file
  • Preprocess the data - Create an estimator and transform the data
  • Train the model by providing training dataset as input to the model
  • Evaluate the model using cross-validation
  • Predict the labels of test data

Code

 
I have added the code files to this article.  However, to get the most updated version, please refer to this link Mushroom-Classification-using-C-Sharp-and-ML.Ne
 
Step 1 - Create New Project
 
Open Visual Studio. Click on the menu File > New > Project to open the New Project window. In this window, select Visual C# > .NET Core in the left panel and then Console App (.NET Core) in the right panel. In the Name field, enter the project name “MushroomClassifier” and click the OK button.
 
 
 
Step 2 - Install NuGet Package
 
In the solution explorer, right-click on the project name and then click on Manage NuGet Packages… option
 
  
 
In the Browse tab, search for Microsoft.ML and install it. This adds the ML.Net assemblies and their dependencies to the project.
 
 
 
Step 3 - Import Data File
  • Download the data file mushroom.csv from the zip source
  • Create a new folder named “Data” in the project. Right-click on it and choose Add > Existing Item
  • Browse to the location of the downloaded mushroom.csv file and add it to the project
     
 
  
Step 4 - Create Data Models
 
Create a new folder inside the project called “DataModels” and then create a new class inside it called “MushroomModelInput.cs”. This class contains input features of the model.
    using Microsoft.ML.Data;

    namespace MushroomClassifier.DataModels
    {
        // Input schema: one property per CSV column, mapped by column index.
        public class MushroomModelInput
        {
            // Column 0 holds the class label: e (edible) or p (poisonous).
            [LoadColumn(0)]
            public string mClass { get; set; }
            [LoadColumn(1)]
            public string cap_shape { get; set; }
            [LoadColumn(2)]
            public string cap_surface { get; set; }
            [LoadColumn(3)]
            public string cap_color { get; set; }
            [LoadColumn(4)]
            public string bruises { get; set; }
            [LoadColumn(5)]
            public string odor { get; set; }
            [LoadColumn(6)]
            public string gill_attachment { get; set; }
            [LoadColumn(7)]
            public string gill_spacing { get; set; }
            [LoadColumn(8)]
            public string gill_size { get; set; }
            [LoadColumn(9)]
            public string gill_color { get; set; }
            [LoadColumn(10)]
            public string stalk_shape { get; set; }
            [LoadColumn(11)]
            public string stalk_root { get; set; }
            [LoadColumn(12)]
            public string stalk_surface_above_ring { get; set; }
            [LoadColumn(13)]
            public string stalk_surface_below_ring { get; set; }
            [LoadColumn(14)]
            public string stalk_color_above_ring { get; set; }
            [LoadColumn(15)]
            public string stalk_color_below_ring { get; set; }
            [LoadColumn(16)]
            public string veil_type { get; set; }
            [LoadColumn(17)]
            public string veil_color { get; set; }
            [LoadColumn(18)]
            public string ring_number { get; set; }
            [LoadColumn(19)]
            public string ring_type { get; set; }
            [LoadColumn(20)]
            public string spore_print_color { get; set; }
            [LoadColumn(21)]
            public string population { get; set; }
            [LoadColumn(22)]
            public string habitat { get; set; }
        }
    }
Create another class called “MushroomModelPrediction.cs”. This class contains the predicted Output/Label and corresponding score.
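    using Microsoft.ML.Data;

    namespace MushroomClassifier.DataModels
    {
        public class MushroomModelPrediction
        {
            // Predicted class after mapping the key back to its original value (e or p).
            [ColumnName("PredictedLabel")]
            public string Label { get; set; }
            // One score per class.
            public float[] Score { get; set; }
        }
    }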
Step 5 - Model Building
 
Create and initialize an instance of the “MLContext” class in Program.cs. MLContext is the starting point for all ML.Net operations; it creates an ML.Net environment that can be shared across the model creation workflow. Conceptually, it is similar to DbContext in Entity Framework.
    MLContext mlContext = new MLContext();
Add the LoadData method after the Main method. It loads the data from the CSV file and splits it into training and testing datasets. The data is loaded into an IDataView, which is a flexible, efficient way of describing tabular data (numeric and text) in ML.Net. Typical train/test ratios are 75/25 or 80/20. In this example, I have used 75/25, that is, a test data fraction of 0.25.
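    // Requires: using Microsoft.ML; using static Microsoft.ML.DataOperationsCatalog;
    // _dataFilePath is a static field holding the path to mushroom.csv (its declaration is not shown in this article).
    public static TrainTestData LoadData(MLContext mlContext, double testDataFraction)
    {
        // Read the CSV file into an IDataView, using MushroomModelInput as the schema.
        IDataView mushroomDataView = mlContext.Data.LoadFromTextFile<MushroomModelInput>(
            _dataFilePath, hasHeader: true, separatorChar: ',', allowSparse: false);

        // Split the rows into training and testing sets.
        TrainTestData mushroomTrainTestData = mlContext.Data.TrainTestSplit(mushroomDataView, testFraction: testDataFraction);
        return mushroomTrainTestData;
    }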
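To get a feel for what the split returns, here is a minimal sketch that runs TrainTestSplit on in-memory rows instead of mushroom.csv. The ToyRow class, the SplitDemo class, and the 100 generated rows are hypothetical and exist only for this illustration; it assumes the Microsoft.ML package is installed.

```csharp
using System;
using System.Linq;
using Microsoft.ML;

// Hypothetical two-column row type used only for this illustration.
public class ToyRow
{
    public string Label { get; set; }
    public string Odor { get; set; }
}

public static class SplitDemo
{
    public static void Main()
    {
        // A fixed seed makes the random split repeatable between runs.
        var mlContext = new MLContext(seed: 1);

        // Build 100 made-up in-memory rows instead of reading the CSV file.
        var rows = Enumerable.Range(0, 100)
            .Select(i => new ToyRow
            {
                Label = i % 2 == 0 ? "e" : "p",
                Odor = i % 2 == 0 ? "n" : "f"
            });
        IDataView data = mlContext.Data.LoadFromEnumerable(rows);

        // Hold out roughly 25% of the rows for testing.
        var split = mlContext.Data.TrainTestSplit(data, testFraction: 0.25);

        int trainCount = mlContext.Data
            .CreateEnumerable<ToyRow>(split.TrainSet, reuseRowObject: false).Count();
        int testCount = mlContext.Data
            .CreateEnumerable<ToyRow>(split.TestSet, reuseRowObject: false).Count();

        // Rows are assigned at random, so the counts are close to 75/25 rather than exact.
        Console.WriteLine($"Train rows: {trainCount}, test rows: {testCount}");
    }
}
```

The TrainSet and TestSet properties are both IDataView instances, so they can be passed directly to Fit, Transform, or CrossValidate.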
In most cases, we need to preprocess the data before feeding it to the learning algorithm. For example, the learning algorithm works only on numerical data, but our raw dataset contains non-numerical data. So we first need to transform the data into a form that the ML algorithm accepts. Add the ProcessData method for data transformation.
    public static IEstimator<ITransformer> ProcessData(MLContext mlContext)
    {
        // Map the class column (e/p) to a key-typed "Label" column, then turn each
        // text feature into a numeric vector and concatenate them into "Features".
        // Note: ring_type is not featurized here, so the model does not use it.
        var pipeline = mlContext.Transforms.Conversion.MapValueToKey(outputColumnName: "Label", inputColumnName: nameof(MushroomModelInput.mClass))
            .Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "cap_shape", outputColumnName: "cap_shapeFeaturized"))
            .Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "cap_surface", outputColumnName: "cap_surfaceFeaturized"))
            .Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "cap_color", outputColumnName: "cap_colorFeaturized"))
            .Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "bruises", outputColumnName: "bruisesFeaturized"))
            .Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "odor", outputColumnName: "odorFeaturized"))
            .Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "gill_attachment", outputColumnName: "gill_attachmentFeaturized"))
            .Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "gill_spacing", outputColumnName: "gill_spacingFeaturized"))
            .Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "gill_size", outputColumnName: "gill_sizeFeaturized"))
            .Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "gill_color", outputColumnName: "gill_colorFeaturized"))
            .Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "stalk_shape", outputColumnName: "stalk_shapeFeaturized"))
            .Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "stalk_root", outputColumnName: "stalk_rootFeaturized"))
            .Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "stalk_surface_above_ring", outputColumnName: "stalk_surface_above_ringFeaturized"))
            .Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "stalk_surface_below_ring", outputColumnName: "stalk_surface_below_ringFeaturized"))
            .Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "stalk_color_above_ring", outputColumnName: "stalk_color_above_ringFeaturized"))
            .Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "stalk_color_below_ring", outputColumnName: "stalk_color_below_ringFeaturized"))
            .Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "veil_type", outputColumnName: "veil_typeFeaturized"))
            .Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "veil_color", outputColumnName: "veil_colorFeaturized"))
            .Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "ring_number", outputColumnName: "ring_numberFeaturized"))
            .Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "spore_print_color", outputColumnName: "spore_print_colorFeaturized"))
            .Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "population", outputColumnName: "populationFeaturized"))
            .Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "habitat", outputColumnName: "habitatFeaturized"))
            .Append(mlContext.Transforms.Concatenate(outputColumnName: "Features", inputColumnNames: new string[] {
                "cap_shapeFeaturized",
                "cap_surfaceFeaturized",
                "cap_colorFeaturized",
                "bruisesFeaturized",
                "odorFeaturized",
                "gill_attachmentFeaturized",
                "gill_spacingFeaturized",
                "gill_sizeFeaturized",
                "gill_colorFeaturized",
                "stalk_shapeFeaturized",
                "stalk_rootFeaturized",
                "stalk_surface_above_ringFeaturized",
                "stalk_surface_below_ringFeaturized",
                "stalk_color_above_ringFeaturized",
                "stalk_color_below_ringFeaturized",
                "veil_typeFeaturized",
                "veil_colorFeaturized",
                "ring_numberFeaturized",
                "spore_print_colorFeaturized",
                "populationFeaturized",
                "habitatFeaturized"
            }));
        return pipeline;
    }
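Since every feature in this dataset is a single-letter category code, one-hot encoding is another common way to turn the columns into numbers. The following is only a sketch of that alternative, not the method used in this article; the ToyMushroom and OneHotDemo classes and the three sample rows are hypothetical, and only two of the columns are shown.

```csharp
using System;
using Microsoft.ML;

// Hypothetical row type with just two of the mushroom features.
public class ToyMushroom
{
    public string Odor { get; set; }
    public string Bruises { get; set; }
}

public static class OneHotDemo
{
    public static void Main()
    {
        var mlContext = new MLContext(seed: 1);

        // Made-up sample rows, used only to fit the encoder.
        var rows = new[]
        {
            new ToyMushroom { Odor = "n", Bruises = "t" },
            new ToyMushroom { Odor = "f", Bruises = "f" },
            new ToyMushroom { Odor = "a", Bruises = "t" },
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(rows);

        // Encode each categorical column as an indicator vector,
        // then concatenate the vectors into a single "Features" column.
        var pipeline = mlContext.Transforms.Categorical.OneHotEncoding(new[]
            {
                new InputOutputColumnPair("OdorEncoded", "Odor"),
                new InputOutputColumnPair("BruisesEncoded", "Bruises"),
            })
            .Append(mlContext.Transforms.Concatenate("Features", "OdorEncoded", "BruisesEncoded"));

        IDataView transformed = pipeline.Fit(data).Transform(data);

        // Print the column type to see the width of the resulting feature vector.
        Console.WriteLine(transformed.Schema["Features"].Type);
    }
}
```

With this approach, each distinct category value seen during Fit gets its own slot in the vector, which keeps the feature space small for columns with only a handful of possible values.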
After creating the data processing pipeline, we need to add the learning algorithms to the pipeline using the below code.
    var trainPipeline = pipeline
        .Append(mlContext.MulticlassClassification.Trainers.OneVersusAll(
            mlContext.BinaryClassification.Trainers.AveragedPerceptron("Label", "Features", numberOfIterations: 10)))
        .Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel"));
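The pipeline above wraps a binary trainer in the multiclass OneVersusAll trainer. Because the problem has only two classes, a binary trainer can also be used on its own, provided the label is a boolean column. The sketch below shows that variant under stated assumptions: the ToyInput, ToyOutput, and BinaryTrainerDemo classes, the single Odor feature, the made-up rows, and the choice of SdcaLogisticRegression are all illustrative and not taken from this article's project.

```csharp
using System;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical input type: binary trainers expect a boolean label.
public class ToyInput
{
    public bool Label { get; set; }   // true = poisonous (an assumption of this sketch)
    public string Odor { get; set; }
}

// Hypothetical output type for a binary classifier.
public class ToyOutput
{
    [ColumnName("PredictedLabel")]
    public bool IsPoisonous { get; set; }
    public float Probability { get; set; }
}

public static class BinaryTrainerDemo
{
    public static void Main()
    {
        var mlContext = new MLContext(seed: 1);

        // Made-up rows: foul odor (f) is poisonous, no odor (n) is edible.
        var rows = Enumerable.Range(0, 40)
            .Select(i => new ToyInput { Label = i % 2 == 0, Odor = i % 2 == 0 ? "f" : "n" });
        IDataView data = mlContext.Data.LoadFromEnumerable(rows);

        // One-hot encode the single feature and train a binary logistic regression model.
        var pipeline = mlContext.Transforms.Categorical.OneHotEncoding("Features", "Odor")
            .Append(mlContext.BinaryClassification.Trainers.SdcaLogisticRegression(
                labelColumnName: "Label", featureColumnName: "Features"));

        ITransformer model = pipeline.Fit(data);

        // Predict a single example; the Label of the input is ignored at prediction time.
        var engine = mlContext.Model.CreatePredictionEngine<ToyInput, ToyOutput>(model);
        ToyOutput result = engine.Predict(new ToyInput { Odor = "f" });

        Console.WriteLine($"Poisonous: {result.IsPoisonous}, probability: {result.Probability:0.###}");
    }
}
```

A model trained this way would be evaluated with the mlContext.BinaryClassification catalog rather than the MulticlassClassification catalog used in the rest of this article.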
Then we evaluate the model using cross-validation to check that it performs as well as expected. We do not need to go into the details of cross-validation, as it is built into ML.Net. But to give the idea: the training dataset is divided into a fixed number of folds N; N-1 folds are used for training and the remaining fold is used for testing. This process is repeated N times, each time with a different fold held out for testing.
    // Requires: using System; using System.Linq;
    // trainDataView is assumed to be the TrainSet of the split returned by LoadData.
    Console.WriteLine("=============== Starting 10 fold cross validation ===============");
    var crossValResults = mlContext.MulticlassClassification.CrossValidate(data: trainDataView, estimator: trainPipeline, numberOfFolds: 10, labelColumnName: "Label");

    // Average each metric over the 10 folds.
    var metricsInMultipleFolds = crossValResults.Select(r => r.Metrics);
    var microAccuracyValues = metricsInMultipleFolds.Select(m => m.MicroAccuracy);
    var microAccuracyAverage = microAccuracyValues.Average();
    var macroAccuracyValues = metricsInMultipleFolds.Select(m => m.MacroAccuracy);
    var macroAccuracyAverage = macroAccuracyValues.Average();
    var logLossValues = metricsInMultipleFolds.Select(m => m.LogLoss);
    var logLossAverage = logLossValues.Average();
    var logLossReductionValues = metricsInMultipleFolds.Select(m => m.LogLossReduction);
    var logLossReductionAverage = logLossReductionValues.Average();

    Console.WriteLine($"*************************************************************************************************************");
    Console.WriteLine($"*       Metrics Multi-class Classification model      ");
    Console.WriteLine($"*------------------------------------------------------------------------------------------------------------");
    Console.WriteLine($"*       Average MicroAccuracy:    {microAccuracyAverage:0.###} ");
    Console.WriteLine($"*       Average MacroAccuracy:    {macroAccuracyAverage:0.###} ");
    Console.WriteLine($"*       Average LogLoss:          {logLossAverage:#.###} ");
    Console.WriteLine($"*       Average LogLossReduction: {logLossReductionAverage:#.###} ");
    Console.WriteLine($"*************************************************************************************************************");

    // Now we need to train the model using the code below.
    Console.WriteLine("=============== Create and Train the Model ===============");
    var model = trainPipeline.Fit(trainDataView);
    Console.WriteLine("=============== End of training ===============");
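To make the fold rotation behind cross-validation concrete, here is a small sketch in plain C# with no ML.Net dependency. The KFoldDemo class and its Split method are hypothetical names used only for illustration, and rows are assigned to folds by index here, whereas ML.Net assigns them at random.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KFoldDemo
{
    // Returns, for each fold, the row indices used for training and for testing.
    public static List<(int[] Train, int[] Test)> Split(int rowCount, int folds)
    {
        var result = new List<(int[] Train, int[] Test)>();
        for (int fold = 0; fold < folds; fold++)
        {
            // Row i belongs to fold (i % folds); the current fold is held out for testing.
            int[] test = Enumerable.Range(0, rowCount).Where(i => i % folds == fold).ToArray();
            int[] train = Enumerable.Range(0, rowCount).Where(i => i % folds != fold).ToArray();
            result.Add((train, test));
        }
        return result;
    }

    public static void Main()
    {
        // 10 rows and 5 folds: each round trains on 8 rows and tests on 2.
        foreach (var (train, test) in Split(rowCount: 10, folds: 5))
        {
            Console.WriteLine($"train=[{string.Join(",", train)}] test=[{string.Join(",", test)}]");
        }
    }
}
```

Every row is used for testing exactly once and for training in all the other rounds, which is why the averaged metrics are a more stable estimate than a single train/test split.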
Step 6 - Prediction
 
Now that our ML model is built, we are ready to make predictions. First, create an instance of the input data model class MushroomModelInput.
    var mushroomInput1 = new MushroomModelInput {
        cap_shape = "x",
        cap_surface = "s",
        cap_color = "n",
        bruises = "t",
        odor = "p",
        gill_attachment = "f",
        gill_spacing = "c",
        gill_size = "n",
        gill_color = "k",
        stalk_shape = "e",
        stalk_root = "e",
        stalk_surface_above_ring = "s",
        stalk_surface_below_ring = "s",
        stalk_color_above_ring = "w",
        stalk_color_below_ring = "w",
        veil_type = "p",
        veil_color = "w",
        ring_number = "o",
        ring_type = "p",
        spore_print_color = "k",
        population = "s",
        habitat = "u"
    };
Create a method PredictSingleResult. This method creates a prediction engine using the MLContext object and the ML model that we built in the last step. The prediction engine takes the input instance as a parameter and produces an output object that contains the predicted label and the related scores.
    public static MushroomModelPrediction PredictSingleResult(MLContext mlContext, ITransformer model, MushroomModelInput input)
    {
        // Create the prediction engine, which maps the input data model to the output data model.
        var predictEngine = mlContext.Model.CreatePredictionEngine<MushroomModelInput, MushroomModelPrediction>(model);
        var predOutput = predictEngine.Predict(input);
        return predOutput;
    }

Output 

 
Below is the output from cross-fold validation.
 
 
  
Next is the output from the single input prediction. The predicted label is e, that is, edible.
 
 
 

Conclusion

 
In this article, we learned how to implement a machine learning task (binary classification) in a .NET Core application using ML.Net, with the help of an interesting example (mushroom classification). ML.Net is a great machine learning framework for .NET applications and .NET developers. It has many built-in machine learning algorithms and also allows adding new algorithms or customizing existing ones.
 
Thanks for reading :) 

