Getting Started With Machine Learning .NET For Clustering Model

At Build 2018, Microsoft introduced a preview of ML.NET (Machine Learning .NET), a cross-platform, open source machine learning framework built mainly for .NET developers. With it, we can develop our own machine learning applications and custom modules in C# or F#, and run them on Windows, Linux, and macOS. ML.NET is still in development, but the preview version is already available to work and play with.

In this article, we will see how to develop an ML.NET application that uses a clustering model to segment mobile phone sales.

Machine Learning for Clustering Model

Machine Learning is a set of programs used to train the computer to predict outputs from data. Examples of live applications that use Machine Learning are Windows Cortana, the Facebook News Feed, self-driving cars, future stock prediction, Gmail spam detection, PayPal fraud detection, etc.

Machine Learning has 3 main types:

  • Supervised learning
    The machine gets labeled inputs and their desired outputs. For example, taxi fare prediction.

  • Unsupervised learning
    The machine gets inputs without desired outputs. For example, customer segmentation.

  • Reinforcement learning
    The algorithm learns by interacting with a dynamic environment. For example, self-driving cars.

For each type, we use an algorithm to train the machine to produce results. The algorithm families for each machine learning type are:

  • Supervised learning has Regression and Classification Algorithms 
  • Unsupervised learning has Clustering and Association Algorithms
  • Reinforcement learning has Classification and Control Algorithms

In my previous article, I explained how to predict future stock for an item using an ML.NET regression model, a form of supervised learning.

In this article and sample program, we will see how to build a clustering model with ML.NET that segments mobile phone sales by phone model, customer gender, and sale period (before 2010 and after 2010).

Things to know before starting ML.NET

Initialize the Model

To work with Machine Learning, we first need to pick the algorithm that best fits our problem. Machine learning offers clustering, regression, classification, and anomaly detection models. In this article we use the clustering model to predict customer segmentation of mobile phone usage.

Train

We need to train the machine learning model. Training is the process in which the model analyzes the input data, learns its patterns, and is saved as a trained model. For example, we will create a csv file in our application that holds the customer details as input: Male, Female, Before2010, After2010, and the MobilePhone type. We give more than 100 records in the csv file as samples, with all the necessary details, and pass this csv file as input to our model. The model is trained on this data and then used to predict a result. The predicted result is displayed in our console application as a Cluster ID together with scores given as distances.

Score

The score here is not the same as in our regression model. In regression we have labeled inputs as well as labeled outputs, but in the clustering model we don't have a desired output. Here the score contains an array of squared Euclidean distances to the cluster centroids. Ref link - ML.NET to cluster.
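To make the idea of the score concrete, here is a minimal sketch of how squared Euclidean distances to a set of centroids can be computed by hand. The class name CentroidDistance and its methods are hypothetical helpers written for this illustration; they are not part of ML.NET, and the framework's own implementation may differ.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration only; not an ML.NET type.
public static class CentroidDistance
{
    // Squared Euclidean distance between two vectors of equal length.
    public static float SquaredDistance(float[] point, float[] centroid)
    {
        if (point.Length != centroid.Length)
            throw new ArgumentException("Vectors must have the same length.");

        float sum = 0f;
        for (int i = 0; i < point.Length; i++)
        {
            float diff = point[i] - centroid[i];
            sum += diff * diff;
        }
        return sum;
    }

    // One distance per centroid, so the result length equals the number of clusters.
    public static float[] DistancesTo(float[] point, float[][] centroids)
    {
        return centroids.Select(c => SquaredDistance(point, c)).ToArray();
    }
}
```

Calling CentroidDistance.DistancesTo with one point and three centroids returns three values; the smallest one belongs to the closest centroid.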

Prerequisites

Make sure you have installed all the prerequisites on your computer. If not, then download and install Visual Studio 2017 15.6 or later with the ".NET Core cross-platform development" workload installed.

Code part

Step 1 - Create C# Console Application

After installing the prerequisites, click Start >> Programs >> Visual Studio 2017 >> Visual Studio 2017 on your desktop. Click New >> Project. Select Visual C# >> Windows Desktop >> Console App (.NET Framework). Enter your project name and click OK.

Step 2 - Add Microsoft ML package

Right click on your project and click on Manage NuGet Packages.

Select Browse tab and search for Microsoft.ML

Click on Install, I Accept and wait until the installation is complete.

We can see that the Microsoft.ML package has been installed and all the references for Microsoft.ML have been added to our project references.

Step 3 - Creating Train Data

Now we need to create a model training dataset. For this we will add a csv file for training the model. We will create a new folder called Data in our project to hold our csv files.

Add Data Folder

Right click the project, click Add >> New Folder, and name the folder “Data”.

Creating Train CSV file

Right click the Data folder, click Add >> New Item, select the text file, and name it “custTrain.csv”.

Select the properties of “custTrain.csv” and change Copy to Output Directory to “Copy always”.

Add your csv file data like below.

Here we have added the data with the following fields.

  • Male – Total number of phones (Feature)
  • Female – Total number of phones (Feature)
  • Before2010 – Total number of phones (Feature)
  • After2010 – Total number of phones (Feature)
  • MobilePhone – Mobile phone type.

Note
We need to add a minimum of 100 records of data to train our model.

Step 4 - Creating Class for Input Data and Prediction

Now we need to create a class for the input data and the prediction. To do this, right click our project, add a new class, and name it “CustData.cs”.

In our class, we first need to import Microsoft.ML.Runtime.Api, which is needed for the column attributes and for creating the ClusterPrediction class.

    using Microsoft.ML.Runtime.Api;

Next, we need to add all our columns to our class in the same order as in our csv file, and set the column indexes from 0 to 3.

    class CustData
    {
        // Each attribute value is the zero-based column index in custTrain.csv.
        [Column("0")]
        public float Male;

        [Column("1")]
        public float Female;

        [Column("2")]
        public float Before2010;

        [Column("3")]
        public float After2010;
    }

Creating the prediction class. Next we need to create a prediction class and add our prediction columns to it. Here we map the PredictedLabel and Score columns to the PredictedCustId and Distances fields. PredictedLabel contains the ID of the predicted cluster. The Score column contains an array of squared Euclidean distances to the cluster centroids, and the array length is equal to the number of clusters. For more details refer to this link - ML.NET to cluster

Note
In the prediction class, the column names must be set to “Score” and “PredictedLabel”, with the data type float[] for Score and uint for PredictedLabel.

    public class ClusterPrediction
    {
        // ID of the predicted cluster
        [ColumnName("PredictedLabel")]
        public uint PredictedCustId;

        // Squared Euclidean distances to each cluster centroid
        [ColumnName("Score")]
        public float[] Distances;
    }

Step 5 - Program.cs

To work with ML.NET, we open our “Program.cs” file and first import all the needed ML.NET namespaces.

    using Microsoft.ML.Legacy;
    using Microsoft.ML.Legacy.Data;
    using Microsoft.ML.Legacy.Trainers;
    using Microsoft.ML.Legacy.Transforms;

Also import the below to your program.cs file.

    using System.Threading.Tasks;
    using System.IO;

Dataset Path

We set the paths for the custTrain.csv training data and for the model file. For the training data we give the “custTrain.csv” path.

The final trained model needs to be saved. For this we set the model path to the “custClusteringModel.zip” file. The trained model will be saved to this zip file at runtime, in our bin folder, with all needed files.

    // Each path segment is a separate argument to Path.Combine.
    static readonly string _dataPath = Path.Combine(Environment.CurrentDirectory, "Data", "custTrain.csv");
    static readonly string _modelPath = Path.Combine(Environment.CurrentDirectory, "Data", "custClusteringModel.zip");

Change the Main method to an async Task Main method, like the code below.

    static async Task Main(string[] args) { }

Before running this, we need to perform 2 important tasks for our program to run successfully.

First, set the Platform target to x64. ML.NET only runs on x64. To do this, right click the project, select Properties >> Build, and change the Platform target to x64.

Second, in order to use the async Task Main method we need to change the language version to C# 7.1.

In Project Properties >> Build tab, click the Advanced button at the bottom and change the Language Version to C# 7.1.

Working with Training Model

First, we need to train the model and save the model to the zip file. For this, in our Main method we call the Train method, which returns a PredictionModel typed with the CustData and ClusterPrediction classes.

    static async Task Main(string[] args)
    {
        PredictionModel<CustData, ClusterPrediction> model = await Train();
    }

    public static async Task<PredictionModel<CustData, ClusterPrediction>> Train()
    {
        // The method body is filled in during the next steps.
    }

Train and Save Model

In the above method we add the code to train the model and save the model to the zip file.

LearningPipeline

The first step in training is to create a LearningPipeline().

The LearningPipeline collects the data loading, column, and learner steps used to train the model.

TextLoader

The TextLoader is used to read all the data from the training csv file. Here we set useHeader: true so that the first (header) row of the csv file is not read as data.
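As a rough picture of what skipping the header means, the sketch below parses csv lines into float rows by hand. CsvSketch and ParseRows are hypothetical names used only for this illustration, and the sample values in the usage note are made up; the real TextLoader does considerably more than this.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper for illustration only; not how TextLoader is implemented.
public static class CsvSketch
{
    // Parses csv lines into float rows, optionally skipping the first (header) line.
    // Only the first featureCount cells of each line are read as numbers.
    public static List<float[]> ParseRows(IEnumerable<string> lines, bool useHeader, int featureCount)
    {
        var rows = new List<float[]>();
        bool first = true;
        foreach (string line in lines)
        {
            bool isHeader = first && useHeader;
            first = false;
            if (isHeader || string.IsNullOrWhiteSpace(line))
                continue;

            string[] cells = line.Split(',');
            if (cells.Length < featureCount)
                throw new FormatException("Row has fewer cells than expected: " + line);

            var row = new float[featureCount];
            for (int i = 0; i < featureCount; i++)
                row[i] = float.Parse(cells[i], CultureInfo.InvariantCulture);
            rows.Add(row);
        }
        return rows;
    }
}
```

For example, passing the two lines "Male,Female,Before2010,After2010,MobilePhone" and "10,20,5,25,Demo" (made-up values) with useHeader set to true and featureCount set to 4 yields a single row of four floats.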

ColumnConcatenator

Next, we use the ColumnConcatenator to combine all our input columns into a single Features column to be trained and evaluated.

Adding Learning Algorithm

KMeansPlusPlusClusterer

The learner will train the model. We have selected the clustering model for our sample, and we will be using the KMeansPlusPlusClusterer learner. KMeansPlusPlusClusterer is one of the clustering learners provided by ML.NET. Here we add the KMeansPlusPlusClusterer to our pipeline.

We also need to set the K value, which is the number of clusters used by our model. Here we have 3 segments, Windows Mobile, Samsung, and Apple, so we set K = 3 in our program for the 3 clusters.
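To show what a k-means style learner does with the K value, here is a simplified sketch of the basic iteration: assign every row to its nearest centroid, then move each centroid to the mean of its rows. This is plain Lloyd-style k-means seeded with the first k rows, not the k-means++ seeding the learner's name refers to, and it is not ML.NET's implementation. KMeansSketch, Fit, and Nearest are hypothetical names.

```csharp
using System;
using System.Linq;

// Simplified k-means for illustration only; not the ML.NET KMeansPlusPlusClusterer.
public static class KMeansSketch
{
    static float SquaredDistance(float[] a, float[] b)
    {
        float sum = 0f;
        for (int i = 0; i < a.Length; i++)
        {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Zero-based index of the centroid closest to the point.
    public static int Nearest(float[] point, float[][] centroids)
    {
        int best = 0;
        for (int c = 1; c < centroids.Length; c++)
            if (SquaredDistance(point, centroids[c]) < SquaredDistance(point, centroids[best]))
                best = c;
        return best;
    }

    // Runs a fixed number of assign-and-update iterations and returns k centroids.
    // Assumption: the first k rows are used as the starting centroids.
    public static float[][] Fit(float[][] data, int k, int iterations)
    {
        if (k < 1 || k > data.Length)
            throw new ArgumentException("k must be between 1 and the number of rows.");

        int dims = data[0].Length;
        float[][] centroids = data.Take(k).Select(row => (float[])row.Clone()).ToArray();

        for (int iter = 0; iter < iterations; iter++)
        {
            var sums = new float[k][];
            var counts = new int[k];
            for (int c = 0; c < k; c++) sums[c] = new float[dims];

            // Assignment step: add each row to the running sum of its nearest cluster.
            foreach (float[] row in data)
            {
                int c = Nearest(row, centroids);
                counts[c]++;
                for (int j = 0; j < dims; j++) sums[c][j] += row[j];
            }

            // Update step: move each centroid to the mean of its assigned rows.
            for (int c = 0; c < k; c++)
            {
                if (counts[c] == 0) continue; // an empty cluster keeps its old centroid
                for (int j = 0; j < dims; j++) centroids[c][j] = sums[c][j] / counts[c];
            }
        }
        return centroids;
    }
}
```

With K = 3 the result holds three centroids, and each prediction then reports one distance per centroid. A larger K splits the data into more, smaller groups.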

Train and Save Model

Finally, we will train and save the model from this method.

    public static async Task<PredictionModel<CustData, ClusterPrediction>> Train()
    {
        // Start learning
        var pipeline = new LearningPipeline();

        // Load the training data, skipping the header row
        pipeline.Add(new TextLoader(_dataPath).CreateFrom<CustData>(useHeader: true, separator: ','));

        // Combine the input columns into a single Features column
        pipeline.Add(new ColumnConcatenator(
                "Features",
                "Male",
                "Female",
                "Before2010",
                "After2010"));

        // Add the KMeansPlusPlus algorithm with K = 3 (we have 3 clusters)
        pipeline.Add(new KMeansPlusPlusClusterer() { K = 3 });

        // Train the model
        var model = pipeline.Train<CustData, ClusterPrediction>();

        // Save the trained model to the zip file, then return it
        await model.WriteAsync(_modelPath);
        return model;
    }

Prediction Results

Now it's time to produce the model's predicted results. For this we will add one more class, and in this class we will give the inputs.

Create a new class file named “TestCustData.cs”.

In the TestCustData class we set values on a CustData object, the class we already created to define the columns for model training.

    static class TestCustData
    {
        // Sample input passed to the trained model for prediction
        internal static readonly CustData PredictionObj = new CustData
        {
            Male = 300f,
            Female = 100f,
            Before2010 = 400f,
            After2010 = 1400f
        };
    }

We can see that our custTrain.csv file contains the same data as these inputs.

Produce the Model-Predicted results

In our program's Main method, we add the below code at the bottom, after the call to the Train method, to predict the cluster ID and distances and display the results from the model in the command window.

    var prediction = model.Predict(TestCustData.PredictionObj);
    Console.WriteLine($"Cluster: {prediction.PredictedCustId}");
    Console.WriteLine($"Distances: {string.Join(" ", prediction.Distances)}");
    Console.ReadLine();
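When reading the output, the Distances array holds one value per cluster, and the smallest value should belong to the cluster the input was assigned to. The sketch below finds the position of that smallest value. PredictionSketch and IndexOfNearest are hypothetical names for this illustration. How the zero-based position maps to the printed cluster ID (for example, whether IDs start at 1) is an assumption to check against your own output.

```csharp
using System;

// Hypothetical helper for illustration only; not part of ML.NET.
public static class PredictionSketch
{
    // Returns the zero-based position of the smallest distance in the array.
    public static int IndexOfNearest(float[] distances)
    {
        if (distances == null || distances.Length == 0)
            throw new ArgumentException("At least one distance is required.");

        int best = 0;
        for (int i = 1; i < distances.Length; i++)
            if (distances[i] < distances[best])
                best = i;
        return best;
    }
}
```

Calling PredictionSketch.IndexOfNearest(prediction.Distances) next to the printed cluster ID is a quick way to check that the two agree for your data.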

Build and Run

When we run the program, we can see the result in the command window like below.

Conclusion

ML.NET (Machine Learning DotNet) is a great framework for all the dotnet lovers who are looking to work with machine learning. For now only the preview version of ML.NET is available, and I can’t wait for the public release. In this article I have used clustering, an unsupervised learning type. If you are a .NET lover who is new to machine learning and looking forward to working with it, then ML.NET is for you. It's a great framework for getting started with machine learning. Hope you all enjoyed reading this article, and see you all soon with another post.