How To Do Binary Classification in ASP.Net Core Using ML.Net

Deepak Kumar
Jan 22, 2020


MushroomClassificationusingCandML.Net.zip


In this article, we will see binary classification in ASP.Net Core using ML.Net. I have used a mushroom classification problem to demonstrate binary classification. To quickly review what machine learning and binary classification are, please refer to this article.

Prerequisite

Visual Studio 2017 15.9.12 or later
.Net Core 2.1 or later
Microsoft.ML NuGet package
Basic knowledge of Machine Learning

Problem

This project demonstrates how to use ML.Net to classify mushrooms as edible or poisonous. This type of task is very popular in the machine learning world and is often referred to as a two-class or binary classification problem. The purpose of this project is to see how we can leverage the capabilities of ML.Net to implement machine learning-based features in our .Net applications.

Data

This dataset includes descriptions of hypothetical samples corresponding to 23 species of gilled mushrooms in the Agaricus and Lepiota family, drawn from The Audubon Society Field Guide to North American Mushrooms (1981). Each species is identified as definitely edible, definitely poisonous, or of unknown edibility and not recommended. This last class was combined with the poisonous one. The Guide clearly states that there is no simple rule for determining the edibility of a mushroom; no rule like "leaflets three, let it be" for poison oak and ivy.

Attributes

Features

Features Name	Values
cap-shape	bell=b,conical=c,convex=x,flat=f, knobbed=k,sunken=s
cap-surface	fibrous=f,grooves=g,scaly=y,smooth=s
cap-color	brown=n,buff=b,cinnamon=c,gray=g,green=r,pink=p,purple=u,red=e,white=w,yellow=y
bruises	bruises=t,no=f
odor	almond=a,anise=l,creosote=c,fishy=y,foul=f,musty=m,none=n,pungent=p,spicy=s
gill-attachment	attached=a,descending=d,free=f,notched=n
gill-spacing	close=c,crowded=w,distant=d
gill-size	broad=b,narrow=n
gill-color	black=k,brown=n,buff=b,chocolate=h,gray=g, green=r,orange=o,pink=p,purple=u,red=e,white=w,yellow=y
stalk-shape	enlarging=e,tapering=t
stalk-root	bulbous=b,club=c,cup=u,equal=e,rhizomorphs=z,rooted=r,missing=?
stalk-surface-above-ring	fibrous=f,scaly=y,silky=k,smooth=s
stalk-surface-below-ring	fibrous=f,scaly=y,silky=k,smooth=s
stalk-color-above-ring	brown=n,buff=b,cinnamon=c,gray=g,orange=o,pink=p,red=e,white=w,yellow=y
stalk-color-below-ring	brown=n,buff=b,cinnamon=c,gray=g,orange=o,pink=p,red=e,white=w,yellow=y
veil-type	partial=p,universal=u
veil-color	brown=n,orange=o,white=w,yellow=y
ring-number	none=n,one=o,two=t
ring-type	cobwebby=c,evanescent=e,flaring=f,large=l,none=n,pendant=p,sheathing=s,zone=z
spore-print-color	black=k,brown=n,buff=b,chocolate=h,green=r,orange=o,purple=u,white=w,yellow=y
population	abundant=a,clustered=c,numerous=n,scattered=s,several=v,solitary=y
habitat	grasses=g,leaves=l,meadows=m,paths=p,urban=u,waste=w,woods=d

Label (Class)

Label

edible=e, poisonous=p

Solution

To solve this problem, first, we will build an estimator to define the ML pipeline we want to use. Then we will train this estimator on existing data, evaluate how good it is using cross-validation, and lastly, we'll consume the model to predict whether a few examples are edible or poisonous.

Below are the steps:

Load the dataset from CSV data file
Preprocess the data - Create an estimator and transform the data
Train the model by providing training dataset as input to the model
Evaluate the model using cross-validation
Predict the labels of test data

Code

I have added the code files to this article. However, to get the most updated version, please refer to this link Mushroom-Classification-using-C-Sharp-and-ML.Ne

Step 1- Create New Project

Open Visual Studio. Click on the menu File >> New >> Project. It will open the New Project window. In this window, select Visual C# >> .NET Core in the left panel and then Console App (.NET Core) in the right panel. In the Name field, enter the project name “MushroomClassifier” and click on the OK button.

Step 2 – Install NuGet Package

In the solution explorer, right-click on the project name and then click on Manage NuGet Packages… option

In the browse section, enter Microsoft.ML and install it. It will add ML.Net dll and related dependencies to the project.

Step 3- Import Data file

Download the data file mushroom.csv from the zip source
Create a new folder named “Data” in the project. Right-click on it and choose to Add >> Existing Item
Browse to the location of the downloaded mushroom.csv file and add it to the project

Step 4- Create Data Models

Create a new folder inside the project called “DataModels” and then create a new class inside it called “MushroomModelInput.cs”. This class contains input features of the model.

using Microsoft.ML.Data; // needed for the LoadColumn attribute

namespace MushroomClassifier.DataModels
{
    public class MushroomModelInput
    {
        // Column 0 of the CSV file is the label: e (edible) or p (poisonous)
        [LoadColumn(0)]
        public string mClass { get; set; }
        [LoadColumn(1)]
        public string cap_shape { get; set; }
        [LoadColumn(2)]
        public string cap_surface { get; set; }
        [LoadColumn(3)]
        public string cap_color { get; set; }
        [LoadColumn(4)]
        public string bruises { get; set; }
        [LoadColumn(5)]
        public string odor { get; set; }
        [LoadColumn(6)]
        public string gill_attachment { get; set; }
        [LoadColumn(7)]
        public string gill_spacing { get; set; }
        [LoadColumn(8)]
        public string gill_size { get; set; }
        [LoadColumn(9)]
        public string gill_color { get; set; }
        [LoadColumn(10)]
        public string stalk_shape { get; set; }
        [LoadColumn(11)]
        public string stalk_root { get; set; }
        [LoadColumn(12)]
        public string stalk_surface_above_ring { get; set; }
        [LoadColumn(13)]
        public string stalk_surface_below_ring { get; set; }
        [LoadColumn(14)]
        public string stalk_color_above_ring { get; set; }
        [LoadColumn(15)]
        public string stalk_color_below_ring { get; set; }
        [LoadColumn(16)]
        public string veil_type { get; set; }
        [LoadColumn(17)]
        public string veil_color { get; set; }
        [LoadColumn(18)]
        public string ring_number { get; set; }
        [LoadColumn(19)]
        public string ring_type { get; set; }
        [LoadColumn(20)]
        public string spore_print_color { get; set; }
        [LoadColumn(21)]
        public string population { get; set; }
        [LoadColumn(22)]
        public string habitat { get; set; }
    }
}

Create another class called “MushroomModelPrediction.cs”. This class contains the predicted Output/Label and corresponding score.

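// Requires: using Microsoft.ML.Data; (for the ColumnName attribute)
public class MushroomModelPrediction
{
    // Predicted class code: e (edible) or p (poisonous)
    [ColumnName("PredictedLabel")]
    public string Label { get; set; }
    // One score per class
    public float[] Score { get; set; }
}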

Step 5- Model Building

Create and initialize the “MLContext” class in Program.cs. MLContext is the starting point: it creates an ML.Net environment that can be shared across the model creation workflow. It is conceptually similar to DbContext in Entity Framework.

MLContext mlContext = new MLContext();

Add the LoadData method after the Main method. It loads the data from the CSV file and splits it into training and test sets. The data is loaded into an IDataView, which is a flexible, efficient way of describing tabular data (numeric and text) in ML.Net. A train/test ratio of 75/25 or 80/20 is typical. In this example, I have used 75/25, that is, a test fraction of 0.25.

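// Requires: using Microsoft.ML; using static Microsoft.ML.DataOperationsCatalog;
// _dataFilePath is assumed to be a static member of Program holding the path to mushroom.csv.
public static TrainTestData LoadData(MLContext mlContext, double testDataFraction)
{
    // Read the CSV file into an IDataView; the first row is a header
    IDataView mushroomDataView = mlContext.Data.LoadFromTextFile<MushroomModelInput>(_dataFilePath, hasHeader: true, separatorChar: ',', allowSparse: false);
    // Randomly split the rows into training and test sets
    TrainTestData mushroomTrainTestData = mlContext.Data.TrainTestSplit(mushroomDataView, testFraction: testDataFraction);
    return mushroomTrainTestData;
}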
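To get a feel for how the split behaves, here is a minimal sketch that splits an in-memory data set and counts the rows on each side. The SplitRow and SplitSketch names are made up for this illustration and are not part of the project. The test fraction is treated here as a target for a random split, so the counts are not asserted to be exact.

```csharp
using System.Linq;
using Microsoft.ML;

// Hypothetical single-column row type, used only for this sketch.
public class SplitRow
{
    public float Value { get; set; }
}

public static class SplitSketch
{
    // Splits rowCount in-memory rows and returns how many rows
    // ended up in the training set and in the test set.
    public static (int Train, int Test) CountSplit(int rowCount, double testFraction)
    {
        var mlContext = new MLContext(seed: 1);

        var rows = Enumerable.Range(0, rowCount)
            .Select(i => new SplitRow { Value = i })
            .ToList();
        IDataView data = mlContext.Data.LoadFromEnumerable(rows);

        // Same call that LoadData uses on the CSV data
        var split = mlContext.Data.TrainTestSplit(data, testFraction: testFraction);

        int train = mlContext.Data
            .CreateEnumerable<SplitRow>(split.TrainSet, reuseRowObject: false)
            .Count();
        int test = mlContext.Data
            .CreateEnumerable<SplitRow>(split.TestSet, reuseRowObject: false)
            .Count();
        return (train, test);
    }
}
```

Calling SplitSketch.CountSplit(1000, 0.25) returns two counts that add up to 1000, with roughly a quarter of the rows in the test set.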

In most cases, we need to pre-process the data before feeding it to the learning algorithm. For example, the learning algorithm works only on numerical data, but our raw dataset contains non-numerical data. So we first need to transform the data into a form that the ML algorithm accepts. Add the ProcessData method for data transformation.

public static IEstimator<ITransformer> ProcessData(MLContext mlContext)
{
    // Map the label text (e/p) to a key, then turn each text column into a numeric vector
    var pipeline = mlContext.Transforms.Conversion.MapValueToKey(outputColumnName: "Label", inputColumnName: nameof(MushroomModelInput.mClass))
        .Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "cap_shape", outputColumnName: "cap_shapeFeaturized"))
        .Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "cap_surface", outputColumnName: "cap_surfaceFeaturized"))
        .Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "cap_color", outputColumnName: "cap_colorFeaturized"))
        .Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "bruises", outputColumnName: "bruisesFeaturized"))
        .Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "odor", outputColumnName: "odorFeaturized"))
        .Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "gill_attachment", outputColumnName: "gill_attachmentFeaturized"))
        .Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "gill_spacing", outputColumnName: "gill_spacingFeaturized"))
        .Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "gill_size", outputColumnName: "gill_sizeFeaturized"))
        .Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "gill_color", outputColumnName: "gill_colorFeaturized"))
        .Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "stalk_shape", outputColumnName: "stalk_shapeFeaturized"))
        .Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "stalk_root", outputColumnName: "stalk_rootFeaturized"))
        .Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "stalk_surface_above_ring", outputColumnName: "stalk_surface_above_ringFeaturized"))
        .Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "stalk_surface_below_ring", outputColumnName: "stalk_surface_below_ringFeaturized"))
        .Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "stalk_color_above_ring", outputColumnName: "stalk_color_above_ringFeaturized"))
        .Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "stalk_color_below_ring", outputColumnName: "stalk_color_below_ringFeaturized"))
        .Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "veil_type", outputColumnName: "veil_typeFeaturized"))
        .Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "veil_color", outputColumnName: "veil_colorFeaturized"))
        .Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "ring_number", outputColumnName: "ring_numberFeaturized"))
        // Note: ring_type is loaded by MushroomModelInput but is not featurized here, so the model does not use it
        .Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "spore_print_color", outputColumnName: "spore_print_colorFeaturized"))
        .Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "population", outputColumnName: "populationFeaturized"))
        .Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "habitat", outputColumnName: "habitatFeaturized"))
        // Combine all featurized columns into the single "Features" vector the trainer reads
        .Append(mlContext.Transforms.Concatenate(outputColumnName: "Features", inputColumnNames: new string[] {
            "cap_shapeFeaturized",
            "cap_surfaceFeaturized",
            "cap_colorFeaturized",
            "bruisesFeaturized",
            "odorFeaturized",
            "gill_attachmentFeaturized",
            "gill_spacingFeaturized",
            "gill_sizeFeaturized",
            "gill_colorFeaturized",
            "stalk_shapeFeaturized",
            "stalk_rootFeaturized",
            "stalk_surface_above_ringFeaturized",
            "stalk_surface_below_ringFeaturized",
            "stalk_color_above_ringFeaturized",
            "stalk_color_below_ringFeaturized",
            "veil_typeFeaturized",
            "veil_colorFeaturized",
            "ring_numberFeaturized",
            "spore_print_colorFeaturized",
            "populationFeaturized",
            "habitatFeaturized"
        }));
    return pipeline;
}
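Since every column in this dataset is a single-letter category code, one-hot encoding is another common way to turn such values into numbers. The sketch below is an alternative technique, not the approach used in this article. It encodes two columns of a few made-up rows so that the resulting feature vector can be inspected. CapRow, CapFeatures and OneHotSketch are hypothetical names introduced only for this example.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;

// Hypothetical two-column input row, used only for this sketch.
public class CapRow
{
    public string cap_shape { get; set; }
    public string cap_color { get; set; }
}

// Receives the concatenated feature vector after the transform.
public class CapFeatures
{
    public float[] Features { get; set; }
}

public static class OneHotSketch
{
    // One-hot encodes both columns and returns one feature vector per row.
    public static List<float[]> Encode(IEnumerable<CapRow> rows)
    {
        var mlContext = new MLContext(seed: 1);
        IDataView data = mlContext.Data.LoadFromEnumerable(rows.ToList());

        var pipeline = mlContext.Transforms.Categorical.OneHotEncoding(new[]
            {
                new InputOutputColumnPair("cap_shapeEncoded", "cap_shape"),
                new InputOutputColumnPair("cap_colorEncoded", "cap_color")
            })
            .Append(mlContext.Transforms.Concatenate(
                "Features", "cap_shapeEncoded", "cap_colorEncoded"));

        // Fit learns the distinct values; Transform applies the encoding
        IDataView transformed = pipeline.Fit(data).Transform(data);

        return mlContext.Data
            .CreateEnumerable<CapFeatures>(transformed, reuseRowObject: false)
            .Select(r => r.Features)
            .ToList();
    }
}
```

Each distinct value of a column gets its own slot in the vector, so a row has exactly one slot set per encoded column. Whether this works better than FeaturizeText for a given model is something to check with cross-validation rather than assume.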

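After creating the data processing pipeline, we add the learning algorithm to the pipeline using the code below. ML.Net's binary trainers expect a Boolean label, whereas here the label is a key column, so the binary AveragedPerceptron trainer is wrapped in the multiclass OneVersusAll trainer. MapKeyToValue then converts the predicted key back to the original label value (e or p).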

var trainPipeline = pipeline.Append(mlContext.MulticlassClassification.Trainers.OneVersusAll(mlContext.BinaryClassification.Trainers.AveragedPerceptron("Label", "Features", numberOfIterations: 10)))
.Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel"));

Then we evaluate the model using k-fold cross-validation to check that it performs as well as expected. We do not need to go into the details of cross-validation because it is built into ML.Net. To get the idea: the training dataset is divided into a fixed number of folds N; N-1 folds are used for training and the remaining fold is used for testing. This process is repeated N times, with a different fold held out for testing each time.
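The fold rotation described above can be sketched in a few lines of plain C#. This is only an illustration of the idea and not how ML.Net implements CrossValidate internally. The KFoldSketch class and its round-robin assignment of rows to folds are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KFoldSketch
{
    // Assigns row indices 0..rowCount-1 to folds in round-robin order and
    // returns one (train, test) pair of index arrays per fold.
    public static List<(int[] Train, int[] Test)> Split(int rowCount, int foldCount)
    {
        if (foldCount < 2 || foldCount > rowCount)
            throw new ArgumentOutOfRangeException(nameof(foldCount));

        var result = new List<(int[] Train, int[] Test)>();
        for (int fold = 0; fold < foldCount; fold++)
        {
            int current = fold;
            // The current fold is held out for testing...
            int[] test = Enumerable.Range(0, rowCount)
                .Where(i => i % foldCount == current)
                .ToArray();
            // ...and the remaining N-1 folds are used for training
            int[] train = Enumerable.Range(0, rowCount)
                .Where(i => i % foldCount != current)
                .ToArray();
            result.Add((train, test));
        }
        return result;
    }
}
```

For example, KFoldSketch.Split(10, 5) yields five pairs, each with 8 training indices and 2 test indices, and every row is used for testing exactly once.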

Console.WriteLine("=============== Starting 10 fold cross validation ===============");
var crossValResults = mlContext.MulticlassClassification.CrossValidate(data: trainDataView, estimator: trainPipeline, numberOfFolds: 10, labelColumnName: "Label");
var metricsInMultipleFolds = crossValResults.Select(r => r.Metrics);
var microAccuracyValues = metricsInMultipleFolds.Select(m => m.MicroAccuracy);
var microAccuracyAverage = microAccuracyValues.Average();
var macroAccuracyValues = metricsInMultipleFolds.Select(m => m.MacroAccuracy);
var macroAccuracyAverage = macroAccuracyValues.Average();
var logLossValues = metricsInMultipleFolds.Select(m => m.LogLoss);
var logLossAverage = logLossValues.Average();
var logLossReductionValues = metricsInMultipleFolds.Select(m => m.LogLossReduction);
var logLossReductionAverage = logLossReductionValues.Average();
Console.WriteLine($"*************************************************************************************************************");
Console.WriteLine($"* Metrics Multi-class Classification model ");
Console.WriteLine($"*------------------------------------------------------------------------------------------------------------");
Console.WriteLine($"* Average MicroAccuracy: {microAccuracyAverage:0.###} ");
Console.WriteLine($"* Average MacroAccuracy: {macroAccuracyAverage:0.###} ");
Console.WriteLine($"* Average LogLoss: {logLossAverage:#.###} ");
Console.WriteLine($"* Average LogLossReduction: {logLossReductionAverage:#.###} ");
Console.WriteLine($"*************************************************************************************************************");
//Now we need to train the model using below code
Console.WriteLine("=============== Create and Train the Model ===============");
var model = trainPipeline.Fit(trainDataView);
Console.WriteLine("=============== End of training ===============");
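The 25% of rows held back by the train/test split can also be used to check the trained model on data it has not seen. The following is a minimal sketch of one way to do that, not code from the attached project. EvaluateOnTestSet is a hypothetical helper, and RunDemo exercises it on made-up rows (OdorRow is invented for this example) so that the block can run without the CSV file.

```csharp
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical two-column row used only for this demo.
public class OdorRow
{
    public string mClass { get; set; }
    public string odor { get; set; }
}

public static class TestSetEvaluation
{
    // Scores the held-out rows with the trained model and computes multiclass metrics.
    public static MulticlassClassificationMetrics EvaluateOnTestSet(
        MLContext mlContext, ITransformer model, IDataView testData)
    {
        IDataView predictions = model.Transform(testData);
        return mlContext.MulticlassClassification.Evaluate(predictions, labelColumnName: "Label");
    }

    // Trains on synthetic rows and evaluates on the held-out part of the split.
    public static MulticlassClassificationMetrics RunDemo()
    {
        var mlContext = new MLContext(seed: 1);

        // Synthetic rows, invented for illustration; not taken from mushroom.csv
        var rows = new List<OdorRow>();
        for (int i = 0; i < 200; i++)
        {
            bool edible = i % 2 == 0;
            rows.Add(new OdorRow { mClass = edible ? "e" : "p", odor = edible ? "n" : "f" });
        }

        IDataView data = mlContext.Data.LoadFromEnumerable(rows);
        var split = mlContext.Data.TrainTestSplit(data, testFraction: 0.25);

        // Same trainer as in the article, on a single one-hot encoded column
        var pipeline = mlContext.Transforms.Conversion.MapValueToKey("Label", nameof(OdorRow.mClass))
            .Append(mlContext.Transforms.Categorical.OneHotEncoding("Features", nameof(OdorRow.odor)))
            .Append(mlContext.MulticlassClassification.Trainers.OneVersusAll(
                mlContext.BinaryClassification.Trainers.AveragedPerceptron(
                    "Label", "Features", numberOfIterations: 10)))
            .Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel"));

        ITransformer model = pipeline.Fit(split.TrainSet);
        return EvaluateOnTestSet(mlContext, model, split.TestSet);
    }
}
```

In the mushroom project, the call would look like TestSetEvaluation.EvaluateOnTestSet(mlContext, model, trainTestData.TestSet), where trainTestData stands for whatever variable holds the result of LoadData. The returned object exposes the same MicroAccuracy, MacroAccuracy and LogLoss properties that are printed for the cross-validation folds.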

Step 6- Prediction

Now that our ML model is built, we are ready to make predictions. To test it, first create an instance of the input data model class MushroomModelInput.

var mushroomInput1 = new MushroomModelInput {
    cap_shape = "x",
    cap_surface = "s",
    cap_color = "n",
    bruises = "t",
    odor = "p",
    gill_attachment = "f",
    gill_spacing = "c",
    gill_size = "n",
    gill_color = "k",
    stalk_shape = "e",
    stalk_root = "e",
    stalk_surface_above_ring = "s",
    stalk_surface_below_ring = "s",
    stalk_color_above_ring = "w",
    stalk_color_below_ring = "w",
    veil_type = "p",
    veil_color = "w",
    ring_number = "o",
    ring_type = "p",
    spore_print_color = "k",
    population = "s",
    habitat = "u"
};

Create a method PredictSingleResult. This method creates a prediction engine using the MLContext object and the model which we built in the last step. The prediction engine takes the test input instance as a parameter and produces an output object which contains the predicted label and the related scores.

public static MushroomModelPrediction PredictSingleResult(MLContext mlContext, ITransformer model, MushroomModelInput input)
{
    // Create the prediction engine, which maps one input object to one prediction object
    var predictEngine = mlContext.Model.CreatePredictionEngine<MushroomModelInput, MushroomModelPrediction>(model);
    var predOutput = predictEngine.Predict(input);
    return predOutput;
}
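The prediction object only carries the raw label code and the score array, so a small helper can make the console output easier to read. The sketch below is a hypothetical formatter, not part of the attached project. It relies only on the label codes listed in the Data section (edible=e, poisonous=p).

```csharp
using System.Globalization;
using System.Linq;

public static class PredictionFormatter
{
    // Turns a predicted label code and its score array into a readable line.
    public static string Describe(string predictedLabel, float[] score)
    {
        string meaning;
        switch (predictedLabel)
        {
            case "e": meaning = "edible"; break;
            case "p": meaning = "poisonous"; break;
            default: meaning = "unknown label"; break;
        }

        // Report the highest score, if any scores are present
        string top = (score != null && score.Length > 0)
            ? score.Max().ToString("0.###", CultureInfo.InvariantCulture)
            : "n/a";

        return $"{predictedLabel} ({meaning}), top score {top}";
    }
}
```

With the method above, a call could look like this: var result = PredictSingleResult(mlContext, model, mushroomInput1); followed by Console.WriteLine(PredictionFormatter.Describe(result.Label, result.Score));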

Output

Below is the output from cross-fold validation.

Next is the output from the single input prediction. The predicted label is e, that is, edible.

Conclusion

In this article, we learned how to implement a machine learning task (binary classification) in a .Net Core application using ML.Net with the help of an interesting example (mushroom classification). ML.Net is a great machine learning framework for .Net applications and .Net developers. It has many built-in machine learning algorithms and can be extended with new algorithms or customized versions of existing ones.

Thanks for reading :)
