Getting Started With Machine Learning .NET For Clustering Model

Introduction

 
In Build 2018, Microsoft introduced the preview of ML.NET (Machine Learning .NET) which is a cross-platform, open-source machine learning framework. Yes, now it's easy to develop our own Machine Learning application or develop custom modules using a Machine Learning framework. ML.NET is a machine learning framework that was mainly developed for .NET developers. We can use C# or F# to develop ML.NET applications. ML.NET is an open source and can be run on Windows, Linux, and macOS. The ML.NET is still in development, however, we can use the preview version to work and play with ML.NET.
 
Here are the reference links.
In this article, we will see how to develop our first ML.NET application to predict the item's stock quantity. 
 

Machine Learning for Clustering Model

 
Machine Learning is nothing but a set of programs that are used to train the computer to predict and display the output for us. Examples of live applications that are using Machine Learning are Windows Cortana, Facebook News Feed, Self-Driving Cars, Future Stock Prediction, Gmail Spam detection, Paypal fraud detection, etc.
 
In Machine Learning, there are 3 main types,
  • Supervised learning
    Machine gets labeled inputs and their desired outputs. For example, Taxi Fare detection. 

  • Unsupervised learning
    Machine gets inputs without desired outputs. Example - Customer Segmentations.

  • Reinforcement learning
    In this kind of algorithm, we will interact with the dynamic interaction. For example -  Self-Driving Cars.
In each type, we will be using an algorithm to train the machine for producing results. We can see the algorithm for each machine learning type.
  • Supervised learning has Regression and Classification Algorithms 
  • Unsupervised learning has Clustering and Association Algorithms
  • Reinforcement learning has Classification and Control Algorithms
In my previous article, I have explained about predicting future stock for an item using ML.NET for the regression model for supervised learning.
In this article and sample program, we will see how to work on a clustering model for predicting mobile sales by model, gender,  before 2010 and after 2010 using the clustering model with ML.NET.
 
Ref link,
Things to know before starting ML.NET
 
Initialize the Model
For working with Machine Learning first we need to pick our best-fit machine learning algorithm. Machine learning has clustering, regression, classification and anomaly detection modules. Here in this article, we will be using the Clustering model for predicting the Customer Segmentation of mobile phone usage.
 
Train
We need to train the machine learning model. Training is the process of analyzing input data by model. The training is mainly used for the model to learn the pattern and save it as a trained model. For example, we will be creating a CSV file in our application and in the CSV file we will be giving the Customer details as Male, Female, Before2010 and After2010 and MobilePhone type for the Input. We give more than 100 records in the CSV file as samples with all the necessary details. We need to give this CSV file as input to our model. Our model needs to be trained and using this data, our model needs to be analyzed to predict the result. The predicted result will be displayed as Cluster ID and scored as the distance to us in our console application.
 
Score
The score here is not the same as our regression model, wherein Regression we will be having the labeled input as well as labeled output, but for the Clustering model we don’t have the desired output here in score will contain the array with squared Euclidean distances to the cluster centroids. Ref link - ML.NET to the cluster.
 
Prerequisites
Make sure you have installed all the prerequisites on your computer. If not, then download and install Visual Studio 2017 15.6 or later with the ".NET Core cross-platform development" workload installed.
 
Code part
 
Step 1 - Create a C# Console Application
 
After installing the prerequisites, click Start >> Programs >> Visual Studio 2017 >> Visual Studio 2017 on your desktop. Click New >> Project. Select Visual C# >> Windows Desktop >> Console APP (.Net Framework). Enter your project name and click OK.
 
Machine Learning .NET For Clustering Model
 
Step 2 - Add Microsoft ML package
 
Right-click on your project and click on Manage NuGet Packages.
 
Machine Learning .NET For Clustering Model
 
Select the Browse tab and search for Microsoft.ML
 
Machine Learning .NET For Clustering Model
 
Click on Install, I Accept and wait until the installation is complete.
 
Machine Learning .NET For Clustering Model
 
We can see Microsoft.ML package has been installed and all the references for Microsoft.ML has been added to our project references.
 
Machine Learning .NET For Clustering Model
 
Step 3 - Creating Train Data
 
Now we need to create a Model training dataset. For creating this we will add CSV file for training the model. We will create a new folder called Data in our project to add to our CSV files.
 
Add Data Folder
 
Right-click the project and Add New Folder and name the folder as “Data”.
 
Machine Learning .NET For Clustering Model
 
Creating a Train CSV file
 
Right-click the Data folder click on Add >> New Item >> select the text file and name it as “custTrain.csv”.
 
Machine Learning .NET For Clustering Model
 
Select the properties of the “StockTrain.csv” change the Copy in Output Directory to Copy always”.
 
Machine Learning .NET For Clustering Model
 
Add your CSV file data like below.
 
Here we have added the data with the following fields.
(Feature)
  • Male - Total number of phones (Feature)
  • Female – Total number of phones (Feature)
  • Before2010 – Total number of phones (Feature)
  • After2010 – Total number of phones (Feature)
  • MobilePhone – Mobile Phone Type.
Note
We need a minimum of 100 records of data to be added to train our Model
 
Step 4 - Creating Class for Input Data and Prediction
Now we need to create a class for Input Data and prediction; for doing this right-click our project and add a new class and name it as “CustData.cs”
In our class, first, we need to import the Microsoft.ML.Runtime.Api for column and ClusterPrediction Class creation.
  1. using Microsoft.ML.Runtime.Api;  
Next, we need to add all our columns, like our CSV file, in the same order in our class and set the column from 0 to 3.
  1. class CustData    
  2.     {    
  3.         [Column("0")]    
  4.         public float Male;    
  5.     
  6.         [Column("1")]    
  7.         public float Female;    
  8.     
  9.         [Column("2")]    
  10.         public float Before2010;    
  11.     
  12.         [Column("3")]    
  13.         public float After2010;    
  14.     }   
Creating prediction class. Now we need to create a prediction class and, in this class, we need to add our Prediction column. Here we add PredictedLabel and Score column as PredictedCustId and Distances. Predicted Labels will contain the ID of the predicted cluster. The score column contains an array with squared Euclidean distances to the cluster centroids. The array length is equal to the number of clusters. For more details refer to this link - ML.NET to cluster
 
Note
Important to note is that in the prediction column we need to set the column name as the “Score” and also set the data type as the float[] for Score and for PredictedLabel set as uint.
  1. public class ClusterPrediction    
  2.     {    
  3.         [ColumnName("PredictedLabel")]    
  4.         public uint PredictedCustId;    
  5.     
  6.         [ColumnName("Score")]    
  7.         public float[] Distances;    
  8.     }  
Step 5 - Program.cs
To work with ML.NET we open our “program.cs” file and first we import all the needed ML.NET references.
  1. using Microsoft.ML.Legacy;    
  2. using Microsoft.ML.Legacy.Data;    
  3. using Microsoft.ML.Legacy.Trainers;    
  4. using Microsoft.ML.Legacy.Transforms;   
Also import the below to your program.cs file.
  1. using System.Threading.Tasks;    
  2. using System.IO;   
Dataset Path
We set the custTrain.csv data and Model data path. For the traindata we give “custTrain.csv” path
 
The final trained model needs to be saved. For this, we set modelpath with the “custClusteringModel. zip” file. The trained model will be saved in the zip file automatically during the runtime of the program in our bin folder with all needed files.
  1. static readonly string _dataPath = Path.Combine(Environment.CurrentDirectory, "Data""custTrain.csv");    
  2. static readonly string _modelPath = Path.Combine(Environment.CurrentDirectory, "Data""custClusteringModel.zip");   
Change the Main method to async Task Main method like below code
  1. static async Task Main(string[] args){  }   
Before doing this, we need to perform 2 important tasks to successfully run our program.
 
The first is to set Platform Target as x64. The ML.NET only runs in x64, for doing this right-click the project and select properties >> Select Build and change the Platform target to x64.
 
 In order to run with our async Task Main method, we need to change the language version to C#7.1.
 
In the Project Properties >> Build tab >> click on the Advance button at the bottom and change the Language Version to C#7.1
 
Machine Learning .NET For Clustering Model
 
Working with Training Model
 
First, we need to train the model and save the model to the zip file. For this, in our main method, we call the predictionModel method and pass the CustData and ClusterPrediction class and return the model to the main method.
  1. static async Task Main(string[] args)    
  2.       {    
  3.           PredictionModel<CustData, ClusterPrediction> model = await Train();    
  4.       }    
  5.     
  6. public static async Task<PredictionModel<CustData, ClusterPrediction>> Train()    
  7.       {    
  8.       }   
Train and Save Model
 
In the above method, we add the function to train the model and save the model to the zip file.
 
LearningPipeline
 
In training, the first step will be working the LearningPipeline().
 
The LearningPipeline loads all the training data to train the model.
 
TextLoader
 
The TextLoader is used to get all the data from the train CSV file for training and here we set the useHeader:true to avoid reading the first row from the CSV file.
 
ColumnConcatenator
 
Next, we add all our columns to be trained and evaluated.
 

Adding Learning Algorithm

 
KMeansPlusPlusClusterer
 
The learner will train the model. We have selected the Clustering model for our sample and we will be using KMeansPlusPlusClustererlearner. KMeansPlusPlusClusterer is one of the clustering learners provided by the ML.NET. Here we add the KMeansPlusPlusClusterer to our pipeline.
 
We also need to set the K value as to how many clusters we are using for our model. Here we have 3 segments as Windows Mobile, Samsung, and Apple so we have set K=4 in our program for the 3 clusters.
 
Train and Save Model
 
Finally, we will train and save the model from this method.
  1. public static async Task<PredictionModel<CustData, ClusterPrediction>> Train()    
  2.         {    
  3.             // Start Learning    
  4.             var pipeline = new LearningPipeline();    
  5.                  
  6.             // Load Train Data    
  7.             pipeline.Add(new TextLoader(_dataPath).CreateFrom<CustData>(useHeader: true, separator: ','));    
  8.             // </Snippet6>    
  9.     
  10.             // Add Features columns    
  11.             pipeline.Add(new ColumnConcatenator(    
  12.                     "Features",    
  13.                     "Male",    
  14.                     "Female",    
  15.                     "Before2010",    
  16.                     "After2010"));    
  17.                  
  18.             // Add KMeansPlus Algorithm for k=3 (We have 3 set of clusters)    
  19.             pipeline.Add(new KMeansPlusPlusClusterer() { K = 3 });    
  20.                 
  21.             // Start Training the model and return the model    
  22.             var model = pipeline.Train<CustData, ClusterPrediction>();    
  23.             return model;    
  24.         }    
Prediction Results
 
Now it's time for us to produce the result of predicted results by model. For this we will add one more class and, in this Class we will give the inputs.
Create a new Class named “TestCustData.cs“
  1. static class TestCustData    
  2.     {    
  3.         internal static readonly CustData PredictionObj = new CustData    
  4.         {    
  5.             Male = 300f,    
  6.             Female = 100f,    
  7.             Before2010 = 400f,    
  8.             After2010 = 1400f    
  9.         };    
  10.     }   
We add the values to the TestCustDataClass which we already created and defined the columns for Model training.
 
We can see in our custTrain.csv file we have the same data for the inputs.
 
Machine Learning .NET For Clustering Model
 
Produce the Model-Predicted results
 
In our program's main method, we will add the below code at the bottom after the training method to predict the result of ClusterID and distances, and display the results from the model to users in the command window.
  1. var prediction = model.Predict(TestCustData.PredictionObj);    
  2.           Console.WriteLine($"Cluster: {prediction.PredictedCustId}");    
  3.           Console.WriteLine($"Distances: {string.Join(" ", prediction.Distances)}");    
  4.           Console.ReadLine();    
Build and Run
 
When we can run the program, we can see the result in the command window like below.
 
Machine Learning .NET For Clustering Model
 

Conclusion

 
ML.NET (Machine Learning DotNet) is a great framework for all the dotnet lovers who are all looking to work with machine learning. Now only the preview version of ML.NET is available and I can’t wait till the release of the public version of ML.NET. Here in this article, I have used the clustering for Unsupervised type. If you are .Net lovers, and are not aware of Machine Learning and are looking forward to working with machine learning then ML.Net is for you. It's a great framework for getting started with ML.NET. Hope you all enjoy reading this article and see you all soon with another post.