Your First Experiment In Azure ML Studio

Introduction

 
To create a Machine Learning experiment, we will use an automobile dataset and try to predict the price of an automobile from factors such as its make and technical specifications. Before we start building this experiment in Azure ML Studio, we need to sign up for (or sign in to) the platform. To do that, visit the Azure ML Studio site and sign in using your Outlook account. You may also use any other Microsoft account, work account, or school account.
 

Steps Involved in Creating an Experiment

 
Building the experiment can be divided into three parts: (a) creating the model, (b) training the model, and (c) scoring and testing the model. The steps involved are:
 

Creation of Model

  • Step 1: Get the data
  • Step 2: Prepare the data
  • Step 3: Define features

Train the Model

  • Step 4: Choose and apply a learning algorithm

Score and Test the Model

  • Step 5: Predict the new automobile price.
Let’s get started.
 
Step 1 - Get the Data
 
The very first step is to get the data. Data can come in different types, formats, and structures. Azure ML Studio comes with many sample datasets that we can use. For this experiment, we are going to use Automobile Price Data (Raw), which is already available in the Azure ML workspace. Note that we can also import data from various other sources.
 
1.1 At the bottom of the Machine Learning Studio window, you’ll find the ‘+New’ button. Click on it to create a new experiment, and then select Blank Experiment.
 
Figure: The +New button
 
Figure: Select Blank Experiment
 
1.2 At the top of the canvas, you’ll find the default experiment name. Rename it to Automobile Price Prediction.
 
Figure: Rename the experiment
 
1.3 On the left, you’ll find a palette of datasets and modules. In the search box at the top of this palette, type "Automobile" to find the dataset labeled "Automobile Price Data (Raw)". Drag and drop this dataset onto the experiment canvas.
 
Figure: Search for automobile
 
Figure: Drag and drop the data on the canvas
 
To visualize this dataset, click on the output port of the dataset and then select "Visualize". 
 
Figure: Click on Visualize
 
In this dataset, the data is stored in rows and columns. Each row represents one automobile, and each column describes a feature of that automobile. Our task is to predict the price of an automobile, which is stored in column 26, titled ‘price’.
 
Figure: Dataset
 
You may close the window by clicking the ‘x’ button in the upper right corner.
 
Step 2 - Prepare the data
 
This step is often called data pre-processing: the data is cleaned up before it can be analyzed. For instance, many columns in this dataset have missing values. The normalized-losses column in particular has a large proportion of missing values, so we will drop that column from the analysis. First, we’ll remove the normalized-losses column, and then we’ll remove every row that still has missing values.
 
2.1 In the search box, type "Select Columns" to find the "Select Columns in Dataset" module. Drag and drop this module onto the canvas. With this module, we can select the columns we want to include in, or exclude from, the model.
 
Figure: Search Select columns
 
2.2 Click on the output port of Automobile Price Data (Raw) and connect it to the input port of the "Select Columns in Dataset".
 
Figure: Connect data and select the column module
 
2.3 Click on the "Select Columns in Dataset" module. On the right side, you’ll find the Properties pane. Click on "Launch Column Selector".
 
Figure: Click on Select Columns in Dataset Module
 
Figure: Click on Launch Column Selector
  • In the Select Columns window, click on "With Rules".
  • Under Begin With, click "All Columns". This selects all columns except those we are going to exclude.
  • To exclude the normalized-losses column, select "Exclude" and "column names" from the drop-downs. In the list of columns displayed, select normalized-losses so that it is added to the text box.
  • Click OK to close the column selector.
 
Figure: Exclude normalized-losses column
 
Look at the properties pane of "Select Columns in Dataset". It indicates that it allows all columns to pass except normalized-losses.
 
Figure: The normalized-losses column is now excluded
 
Tip: Double-click a module to add a comment, which makes the experiment easier to understand.
 
2.4 Let us now deal with the rows that have missing values. As before, search for "Clean Missing Data" and drag and drop the module onto the canvas.
 
Figure: Search for clean missing data
  • Connect the output port of "Select Columns in Dataset" to the input port of "Clean Missing Data".
  • In the Properties pane, under Cleaning Mode, select "Remove Entire Row". This removes every row that has a missing value.
 
Figure: Connect Clean Missing Data to Select Columns in Dataset
 
Figure: Select Remove Entire Row
2.5 At the bottom of the window, click Run. 
 
Figure: Click on RUN
 
After the experiment has finished running, all modules are marked with green checkmarks, indicating that they finished successfully. You’ll also find the status "Finished Running" in the top right corner.
 
Figure: Green marks after a successful run
 
OK, let us visualize our dataset now. Click on the left output port of the Clean Missing Data module and select "Visualize". Note that there are no missing values and that the normalized-losses column has been dropped. Our data is now clean and ready for analysis.
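
To make the two cleaning operations concrete, here is a minimal Python sketch of the same idea outside Studio. It is not how the Studio modules are implemented. The rows are made-up toy values (not taken from the dataset), and the helper names exclude_columns and remove_rows_with_missing are hypothetical.

```python
# Toy rows only: these values are made up for illustration and are not
# taken from Automobile Price Data (Raw). None marks a missing value.
rows = [
    {"make": "audi", "normalized-losses": 164, "horsepower": 102, "price": 13950},
    {"make": "bmw", "normalized-losses": None, "horsepower": 101, "price": 16430},
    {"make": "mazda", "normalized-losses": 113, "horsepower": None, "price": 6795},
    {"make": "volvo", "normalized-losses": None, "horsepower": 114, "price": None},
]


def exclude_columns(rows, excluded):
    """Return copies of the rows without the excluded column names."""
    return [{k: v for k, v in row.items() if k not in excluded} for row in rows]


def remove_rows_with_missing(rows):
    """Keep only the rows in which every value is present."""
    return [row for row in rows if all(v is not None for v in row.values())]


# Same order as in the experiment: drop the column first, then the rows.
without_losses = exclude_columns(rows, {"normalized-losses"})
clean_rows = remove_rows_with_missing(without_losses)

print(len(rows), "rows before cleaning")
print(len(clean_rows), "rows after cleaning")
print([row["make"] for row in clean_rows])
```

The order matters. In this toy data the bmw row is missing only its normalized-losses value, so dropping that column first lets the row survive the row-removal step.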
 
Step 3 - Define Features
 
Features in machine learning are most often columns in the dataset that help us derive the output. In this dataset, each row indicates one automobile and each column is a feature of that automobile.
 
Some features are good for predicting the target value and some are not. Some features are strongly correlated with each other, so one of them can be dropped. In our case, ‘city-mpg’ and ‘highway-mpg’ are closely related. Hence, we can keep one of them and drop the other without much effect on the predictions.
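
One common way to check how closely two columns are related is the Pearson correlation coefficient. The sketch below is not part of the Studio experiment. It uses made-up mileage values (not read from the dataset) and a hypothetical helper named pearson, just to show the idea.

```python
import math

# Toy values, made up for illustration (not read from the dataset).
city_mpg = [21, 24, 19, 31, 27, 17]
highway_mpg = [27, 30, 25, 38, 33, 22]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


r = pearson(city_mpg, highway_mpg)
print(f"correlation between city-mpg and highway-mpg: {r:.3f}")
```

A value close to 1.0 means the two columns carry almost the same information, which is the reason for keeping only one of them.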
 
To get started, let us use the following set of features.
 
make, body-style, wheel-base, engine-size, horsepower, peak-rpm, highway-mpg, price
 
3.1 In the search box, once again type "Select Columns" and drag & drop the "Select Columns in Dataset" module onto the experiment canvas. Connect the left output port of the Clean Missing Data module to the input port of Select Columns in Dataset.
 
Figure: Connect the two modules
 
3.2 From the Properties pane, click on "Launch Column Selector".
  • Click on "With Rules" and under Begin With, click "No Columns".
  • Select "Include" and "column names" from the drop-downs, then add the following list of columns to the text box:
make, body-style, wheel-base, engine-size, horsepower, peak-rpm, highway-mpg, price
  • Click OK.
 
Figure: Include specific columns
 
After this module runs, it provides a filtered dataset containing only the features we listed. Only these features will be passed to the learning algorithm. Remember, you can always come back to this module and add or remove features to get a better output.
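
The "begin with no columns, then include a list" rule can be sketched in Python as follows. This is only an illustration of the filtering idea: the single row holds made-up values, and include_columns is a hypothetical helper, not a Studio function.

```python
# The feature list used in this step of the experiment.
FEATURES = [
    "make", "body-style", "wheel-base", "engine-size",
    "horsepower", "peak-rpm", "highway-mpg", "price",
]


def include_columns(rows, included):
    """Start from no columns and keep only the listed ones, in list order."""
    return [{name: row[name] for name in included} for row in rows]


# One made-up row that also carries a column we do not want to keep.
rows = [
    {
        "make": "audi", "body-style": "sedan", "wheel-base": 99.8,
        "engine-size": 109, "horsepower": 102, "peak-rpm": 5500,
        "city-mpg": 24, "highway-mpg": 30, "price": 13950,
    },
]

selected = include_columns(rows, FEATURES)
print(list(selected[0].keys()))
```

Keeping the feature list in one place makes it easy to come back later and try a different set of columns.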
 
Step 4 - Choose and apply the algorithm
 
Now that our data is ready for analysis, we can construct a predictive model, which involves training and testing. We will use most of the data (70% - 80%) to train the model, and the rest of the data to test the model, that is, to check how accurate its predictions are.
As discussed previously, regression is used to predict a number. Since the price of an automobile is a number, we will use a regression algorithm.
 
We train our model by giving it sample data, i.e., training data that includes the price. The model analyzes the data and finds the relationship between the price and the automobile features. We then test the model with the test data: we give it sets of features for automobiles whose prices we already know, and see how closely the model predicts those known prices.
 
We are going to split our dataset into a training dataset and a test dataset for training and testing the model.
 
4.1 In the palette, search for the "Split Data" module and drag it onto the experiment canvas. Then connect the output port of the previous "Select Columns in Dataset" module to its input port.
 
Figure: Add the Split Data module and connect it to the previous module
 
4.2 Click the Split Data module and, in the Properties section, set "Fraction of rows in the first output dataset" to 0.75. This means 75% of the data will be used to train the model and the remaining 25% will be used for testing. You can always come back and change this value.
 
Figure: Split the dataset into train and test dataset
 
The Random seed property initializes the random selection of rows. Different seed values produce different random samples for training and testing.
 
4.3 Run the experiment so that the selected features are passed on and the dataset is split into a training dataset and a test dataset. Click on the left output port of the Split Data module and select "Visualize" to see the training dataset, then click on the right output port and select "Visualize" to see the test dataset.
 
Figure: Train dataset with 145 records i.e. 75% of the original dataset
 
Figure: Test dataset with 48 records i.e. 25% of the original dataset
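
The split itself can be sketched in a few lines of Python. This is a simplified illustration, not the Split Data module's actual implementation. The helper split_rows is hypothetical, the seed value 42 is arbitrary, and rounding the row count to the nearest integer is an assumption. The 193 rows are simply the 145 + 48 records from the figure captions above.

```python
import random


def split_rows(rows, fraction, seed):
    """Shuffle a copy of the rows with a fixed seed and cut it in two."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    # Assumption: the size of the first output is rounded to the nearest row.
    cut = round(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


# Stand-in row ids: 193 rows = 145 + 48, as in the figure captions.
rows = list(range(193))

train, test = split_rows(rows, fraction=0.75, seed=42)
print(len(train), "training rows,", len(test), "test rows")

# The same seed reproduces exactly the same split.
train_again, test_again = split_rows(rows, fraction=0.75, seed=42)
print(train == train_again and test == test_again)
```

With 193 rows and a fraction of 0.75, this gives 145 training rows and 48 test rows. Every row lands in exactly one of the two outputs.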
 
4.4 It’s time to select our machine learning algorithm. In the palette on the left side, expand the Machine Learning category and then expand Initialize Model. Here, you can see many machine learning algorithms. Under the Regression category, find the Linear Regression module and drag and drop it onto the canvas.
 
Figure: Look for the Linear Regression module in the palette
 
Alternatively, you can simply search for Linear Regression in the palette and drag & drop the module onto the experiment canvas.
 
Figure: Type Linear Regression in search box
 
4.5 Search for the Train Model module and drag and drop it onto the canvas. Connect the left output port of the Split Data module (the training dataset) to the right input port of Train Model, and connect the output port of the Linear Regression module to the left input port of Train Model.
 
Figure: Feeding model with algorithm and train dataset
 
4.6 Click the "Train Model" module. From the Properties pane, click on "Launch Column Selector".
  • Click on "By Name".
  • Select the “price” column, which is the value we are going to predict, from the Available Columns section and move it to Selected Columns.
 
Figure: Selecting a column to predict
 
4.7 RUN the experiment.
 
The model is now trained and can predict the price of an automobile when given a set of feature values.
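
To give a feel for what training a linear regression means, here is a minimal sketch that fits a straight line with ordinary least squares using a single feature. It is a simplification: the experiment passes several feature columns to the Linear Regression module, and this sketch does not reproduce how that module works internally. The training pairs are invented and chosen to lie exactly on a line so the result is easy to check. The helpers fit_line and predict are hypothetical.

```python
# Invented training pairs (engine-size, price); not taken from the dataset.
engine_size = [90, 110, 130, 150, 180]
price = [6500, 8500, 10500, 12500, 15500]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / spread_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(engine_size, price)


def predict(x):
    """Predict a price for one engine size using the fitted line."""
    return slope * x + intercept


print("slope:", slope, "intercept:", intercept)
print("predicted price for engine-size 140:", predict(140))
```

"Training" here means finding the slope and intercept that best fit the known prices. Once they are found, the model can estimate a price for a feature value it has not seen.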
 
Figure: Green checkmarks after successful RUN
 
Step 5 - Predict New Automobile Price
 
As we trained our model with 75% of data, the leftover (25%) data can be used to score how well the model has performed.
 
5.1 Search for Score Model module and drag & drop it to the experiment canvas.
 
Connect the test data output port of the "Split Data" module to the right input port of Score Model, and connect the output port of Train Model to the left input port of Score Model.
 
Figure: Connect Score with Train Model and Split Data
 
5.2 RUN the experiment, then click on the output port of the Score Model and select Visualize. The output shows the price predicted by the model alongside the known price values from the test data.
 
Figure: Predicted and Known Values Compared
 
5.3 Finally, we check the quality of the results. Search for the Evaluate Model module and drag & drop it onto the experiment canvas. Connect the output port of the Score Model to the left input port of the Evaluate Model.
 
Figure: Connect Score Model and Evaluate Model
 
5.4 RUN the experiment.
 
Click on the output port of the Evaluate Model and select Visualize.
 
Figure: Output of the Evaluate Model module
 
Note
 
The Evaluate Model module has two input ports, so it can be used to compare the output of two different models side by side. We may use two different algorithms in the experiment and use Evaluate Model to check which one gives the better output.
 
Note
 
The difference between a predicted value and the actual value is called the error.
 
The following statistics are shown for our model:
  • Mean Absolute Error (MAE)
     
    The average of the absolute errors.
     
  • Root Mean Squared Error (RMSE)
     
    The square root of the average of the squared errors of the predictions made on the test dataset.
     
  • Relative Absolute Error
     
    The sum of the absolute errors divided by the sum of the absolute differences between each actual value and the average of all actual values.
     
  • Relative Squared Error
     
    The sum of the squared errors divided by the sum of the squared differences between each actual value and the average of all actual values.
     
  • Coefficient of Determination
     
    A statistical metric that indicates how well the model fits the data. It is also known as the R squared value.
 
For all of the error statistics, smaller is better: the smaller the error, the closer the predicted values are to the actual values. For the Coefficient of Determination, the closer its value is to one (1.0), the better the predictions.
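
The five statistics can be computed by hand from a list of actual and predicted prices. The sketch below uses the standard textbook definitions and made-up values (not the experiment's real output). The function name regression_metrics is hypothetical, and it is assumed here, not confirmed, that Evaluate Model uses the same formulas.

```python
import math


def regression_metrics(actual, predicted):
    """Compute MAE, RMSE, RAE, RSE and R squared from two equally long lists."""
    n = len(actual)
    mean_actual = sum(actual) / n
    abs_errors = [abs(p - a) for a, p in zip(actual, predicted)]
    sq_errors = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    # How far the actual values are from their own average.
    abs_deviation = sum(abs(a - mean_actual) for a in actual)
    sq_deviation = sum((a - mean_actual) ** 2 for a in actual)
    rse = sum(sq_errors) / sq_deviation
    return {
        "mae": sum(abs_errors) / n,
        "rmse": math.sqrt(sum(sq_errors) / n),
        "rae": sum(abs_errors) / abs_deviation,
        "rse": rse,
        "r2": 1 - rse,
    }


# Made-up prices for illustration only.
actual = [10000, 12000, 14000, 16000]
predicted = [11000, 11000, 15000, 15000]

metrics = regression_metrics(actual, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this toy example every prediction is off by exactly 1000, so MAE and RMSE are both 1000. The relative errors are 0.5 and 0.2, and the coefficient of determination is 0.8.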

