Manage Data In Microsoft Azure Machine Learning

Nitin Pandit
10y

In this article, we'll learn how to load the Titanic dataset into Azure Machine Learning and use it to train a model. Microsoft Azure Machine Learning is a data modeling environment that offers an end-to-end approach, from a problem to an answer. The data describes the passengers of an ocean liner (the Titanic) by attributes such as class, age, sex, and survival.

First, sign in to Microsoft Azure. After signing in, go to the URL.

The Azure Machine Learning page will open; make sure you are logged in to Azure.

In the top left of the window, expand the details (shown in the rectangle).


The menu for Azure Machine Learning will open. Click on it to open its submenu, then click on Studio. The Azure Machine Learning page will open.


The following screenshot shows a number of items related to machine learning.


Next, go to the following link.

First, log in to the Kaggle website.

This website hosts the Titanic data along with many very large datasets. It provides a direct download path for each file, so you can save it to your local system.

For the dataset, go to the “Data” tab.


When you click on Data, you will see several files with extensions such as .csv and .py.

We will use the data in the “train.csv” file to train a model that predicts whether a passenger survived. Click on the “train.csv” file to download it.


Go back to the Azure ML page. We need to upload that .csv file as a dataset, as in the following figure.

To create a dataset, open the datasets section and click on the “New” button.


It will ask you to upload from a local file, so click on it and upload the .csv file.


Choose the .csv data file from the location where you downloaded it. Enter a name for the dataset and click on the check button to create it.


After you click on the check button, a new dataset is created; in my case, the dataset name is “My Training DataSet”. Now we need to create an experiment.

Click on “EXPERIMENTS” and create a new experiment.

From the samples click on “Blank Experiment”.

It will show the welcome page of the experiment, as in the following screenshot. Here, you can drag and drop items from the menu on the left side.


Under the saved datasets you can see the dataset you created, “My Training DataSet”. Drag and drop it onto the working area on the right side of the page.


Under Data Transformation, expand Manipulation and click on “Project Columns”. You can also find it directly from the search box.

Drag and drop Project Columns onto the working area. You will see a red "Values required" mark on it.


To check all the information in your dataset, right-click on it and select “Visualize”.

There are 891 rows and 12 columns in “My Training DataSet”, which I created from the Titanic data.
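If you want to double-check those numbers outside the Studio, a minimal Python sketch like the one below can count the rows and columns of a CSV file. The function name `dataset_shape` and the inline sample rows are assumptions made for illustration; they are not part of Azure Machine Learning.

```python
import csv
import io


def dataset_shape(csv_file):
    """Return (row_count, column_count) for a CSV file-like object with a header row."""
    reader = csv.reader(csv_file)
    header = next(reader)               # first line holds the column names
    row_count = sum(1 for _ in reader)  # every remaining line is one data row
    return row_count, len(header)


# Tiny inline sample standing in for train.csv (hypothetical rows).
sample = io.StringIO(
    "PassengerId,Survived,Pclass,Age\n"
    "1,0,3,22\n"
    "2,1,1,38\n"
    "3,1,3,\n"
)
print(dataset_shape(sample))  # (3, 4)
```

With the downloaded file you could call it as `with open("train.csv", newline="") as f: print(dataset_shape(f))`; going by the Visualize output above, it should report 891 rows and 12 columns.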


Connect the output of your dataset to the input of Project Columns.


Click on Project Columns to launch the column selector, where you can select the desired columns.


After you click on Launch column selector, the following pop-up will open. From the left side, select the columns you want to keep in Project Columns, then click on the check button.
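To make the idea of projecting columns concrete, here is a small Python sketch that keeps only the chosen columns of each row. It only illustrates the concept; `project_columns` and the sample rows are hypothetical and say nothing about how the Studio module is implemented.

```python
def project_columns(rows, keep):
    """Return new rows (dicts) that contain only the columns named in `keep`."""
    return [{name: row[name] for name in keep} for row in rows]


# Hypothetical rows using a few Titanic-style column names.
rows = [
    {"PassengerId": "1", "Survived": "0", "Pclass": "3", "Age": "22"},
    {"PassengerId": "2", "Survived": "1", "Pclass": "1", "Age": "38"},
]

selected = project_columns(rows, ["Survived", "Pclass", "Age"])
print(selected[0])  # {'Survived': '0', 'Pclass': '3', 'Age': '22'}
```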


In some cases data is missing, such as the age of some passengers. In this situation we can use Clean Missing Data. Search for Clean Missing Data in the search box, then drag and drop it onto the page.


Again, connect the output of Project Columns to the input of Clean Missing Data.

Next, choose the columns to be cleaned: select Clean Missing Data and, on the right side of the page, click on "Launch column selector".


We want to select only Age from all the columns. Click on column names and select Age in the dropdown menu. Click on the Next button to complete the action.

Set the cleaning mode to replace with median, as in the following figure.
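Replacing with the median means that every empty Age value is filled with the middle value of the ages we do know. A minimal Python sketch of that idea follows; it assumes missing values appear as empty strings, and the helper name `replace_missing_with_median` is made up for this example.

```python
from statistics import median


def replace_missing_with_median(rows, column):
    """Fill empty values of `column` with the median of the known values."""
    known = [float(row[column]) for row in rows if row[column] != ""]
    fill = median(known)
    return [
        {**row, column: float(row[column]) if row[column] != "" else fill}
        for row in rows
    ]


# Hypothetical ages; the second passenger has no recorded age.
rows = [{"Age": "22"}, {"Age": ""}, {"Age": "38"}, {"Age": "26"}]
cleaned = replace_missing_with_median(rows, "Age")
print([row["Age"] for row in cleaned])  # [22.0, 26.0, 38.0, 26.0]
```

The median is often preferred over the mean for a column like age because a few very old or very young passengers do not pull it far in either direction.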


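Next, we want to split the data. Search for Split Data in the search box and drop it onto the page. Then, connect the Clean Missing Data box to the Split Data box. Click on the Split Data box and you will see that you can set a few parameters for it on the right of the screen. Enter “0.7” for “fraction of rows”, so that 70 percent of the rows go to the first output (used for training) and the remaining 30 percent go to the second output (used for testing).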
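The split itself is easy to picture in code. The sketch below shuffles the rows and cuts them at the given fraction; the function `split_rows`, the fixed seed, and the rounding rule are assumptions for illustration, and the Studio module may shuffle and round differently.

```python
import random


def split_rows(rows, fraction=0.7, seed=0):
    """Shuffle the rows and return (first_part, second_part) cut at `fraction`."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)   # fixed seed makes the split repeatable
    cut = round(len(shuffled) * fraction)   # number of rows in the first part
    return shuffled[:cut], shuffled[cut:]


train, test = split_rows(range(10), fraction=0.7)
print(len(train), len(test))  # 7 3
```

Every row ends up in exactly one of the two parts, so the model is later scored on rows it never saw during training.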


Now find “Train Model” in the menu and drop it onto the page. Connect the first output of Split Data to the second input of Train Model.

Train Model shows a red mark because it does not know which column we want to predict, so we need to tell it. Click on Train Model and then “Launch column selector” in the right pane.


Start typing “Survived” into the column name box. A suggestion box will open. Select the first item named “Survived” in the list and click on the check mark.


Now, I’m going to drop Multiclass Decision Forest onto the working page and connect it to the first input of Train Model.


We have taken the necessary steps to train a model from our training data. Before going on with performance evaluation, let’s run our experiment to verify that it works.

Click on the “Run” button at the bottom of the page.


You will see green check marks appear, one by one, on the boxes in the working area. A check mark means that the calculation inside that box has finished and its results are ready for the next box to use.

To view the structure of the model, right-click on Train Model and select “Visualize”.

Now, the “Train Model” box contains a trained model.

Now we need to drop a Score Model onto the page to test our model and see how it performs. Type Score Model in the search box and drop it onto the working page.

Connect the output of Train Model to the first input of Score Model, and the second output of Split Data to the second input of Score Model.


Now run the experiment again so that Score Model is executed.


Right-click on Score Model and select “Visualize”.


The result is shown in the following screenshot, where the lowest fare can be seen by Pclass.


Now add an Evaluate Model and connect the output of Score Model to it. Click on the Run button.


Right click on Evaluate Model and click on “Visualize”.

The following screenshot shows several kinds of accuracy metrics for the completed run.

Under Metrics, you can see how well the algorithm performed by checking out “Overall Accuracy”. The value here shows what percentage of the passengers in the test set (the 30 percent of rows we held back with Split Data) we have correctly predicted. In our case, it is somewhere around 67 percent. This means that we have trained a model that can predict whether a passenger survived with about 67 percent accuracy.
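Overall accuracy is simply the number of correct predictions divided by the total number of predictions. Here is a minimal Python sketch of that calculation; the label lists are made up for illustration and are not the output of the experiment above.

```python
def overall_accuracy(actual, predicted):
    """Return the fraction of predictions that match the actual labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


# Hypothetical Survived labels (1 = survived, 0 = did not survive).
actual = [0, 1, 1, 0, 1, 0]
predicted = [0, 1, 0, 0, 1, 1]

print(round(overall_accuracy(actual, predicted), 2))  # 0.67
```

In this made-up example four of the six predictions are correct, which gives 4 / 6, or about 0.67.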


I have tried to show how you can easily use Azure Machine Learning Studio for basic tasks.

I hope you enjoyed this article. Stay tuned for more articles on Azure and other Microsoft technologies.

Thanks.

Connect (“Nitin Pandit”);