Data Source And Manipulation In Azure ML

Overview

  1. Uploading data in Azure ML.
  2. Using uploaded data in an experiment.
  3. Converting data to another format using a conversion module.
  4. Canvas usability features.

In this article we will learn how to upload data and convert it into another format using Azure ML Studio. The Azure ML Studio toolbar has a large “NEW +” button. Click it, select “Dataset” from the navigation menu on the page that opens, and then choose “From Local File”, which lets you upload a file from your local computer. See the example in the image.
 
Azure ML Studio supports many data formats. They are listed in a dropdown, and the type is selected automatically based on the uploaded file. Most types come in two variants, one with a header and one without, to specify whether the data has a header row. The most-used formats are CSV, TSV, and zip files. Along with these, the dropdown offers plain text (.txt) and R object data types.
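
To make the header versus no-header choice concrete, here is a minimal local sketch in plain Python. It is not Azure ML code: the sample data, the `read_rows` helper, and the generated `Col1`, `Col2` names are assumptions made up for illustration. It reads the same text twice, once treating the first line as column names and once treating it as data.

```python
import csv
import io

# Hypothetical sample data; the same three lines are read two ways.
SAMPLE = "age,income\n34,52000\n45,61000\n"


def read_rows(text, has_header, delimiter=","):
    """Return (column_names, data_rows) for delimited text."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if has_header:
        # The first line names the columns; the rest is data.
        return rows[0], rows[1:]
    # Without a header every line is data, so placeholder names are generated.
    names = [f"Col{i + 1}" for i in range(len(rows[0]))]
    return names, rows


with_header = read_rows(SAMPLE, has_header=True)
without_header = read_rows(SAMPLE, has_header=False)
print(with_header)
print(without_header)
```

Picking the wrong variant is easy to spot: with the no-header option, the line `age,income` shows up as the first data row instead of as column names.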

Once the upload finishes, the data is available under Datasets in the left navigation menu. There you can review all datasets uploaded for past experiments, and they are ready to use in future experiments.
 
To use a dataset, you next have to create a blank experiment. Click the “NEW +” toolbar button again and select Blank Experiment. On the new experiment canvas, your uploaded datasets and the sample datasets appear together under Saved Datasets in a tree view, as shown in the image.
 
In Azure ML, an experiment looks much like a flowchart: modules are the nodes, and the links between them make the data flow easy to follow.
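
The flowchart idea can be sketched locally as a small dependency graph. This is only an illustration in plain Python (3.9 or later, for `graphlib`), not how Azure ML stores experiments; the node names are hypothetical. Each node lists the nodes it reads from, and a topological sort gives an order in which the data can flow.

```python
from graphlib import TopologicalSorter

# Each key is a node on the canvas; its value is the set of nodes it reads from.
# The node names are hypothetical examples.
experiment = {
    "My Dataset": set(),                # starting point, no inputs
    "Convert to CSV": {"My Dataset"},   # reads the dataset's output
}

# A valid run order always places a node after everything it depends on.
order = list(TopologicalSorter(experiment).static_order())
print(order)
```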
 
Here are some canvas features that are useful when you are working on a big experiment. The numbers below match the labels in the image.
  1. Zoom ratio input bar: set a zoom level of your choice.
  2. 1:1: zooms to actual size.
  3. Zoom to Fit: zooms so that the selected node fits on the screen.
 
Now we are ready to start our experiment. Drag and drop the dataset onto the canvas, and a node is added with a link point. In our case the node has only a single, one-directional link point, because a dataset is the starting point of the experiment. These link points are used to connect the data flow to other nodes or to visualize the dataset. To visualize the dataset, right-click the node and select “Visualize”, as in the example image.
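
As a rough local analogue of inspecting a dataset, the sketch below computes a few per-column statistics in plain Python. It is not the Studio's Visualize feature: the sample data and the `summarize` helper are hypothetical, and the code assumes every column is numeric and has at least one non-empty value.

```python
import csv
import io
import statistics

# Hypothetical sample data; the last row has a missing income value.
SAMPLE = "age,income\n34,52000\n45,61000\n29,\n"


def summarize(text):
    """Return per-column count, missing count, min, max, and mean."""
    reader = csv.DictReader(io.StringIO(text))
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            columns[name].append(value)

    summary = {}
    for name, values in columns.items():
        # Empty strings are treated as missing values.
        present = [float(v) for v in values if v != ""]
        summary[name] = {
            "count": len(values),
            "missing": len(values) - len(present),
            "min": min(present),
            "max": max(present),
            "mean": statistics.mean(present),
        }
    return summary


summary = summarize(SAMPLE)
for column, stats in summary.items():
    print(column, stats)
```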
 
Now let's see how to convert datasets to different formats. For this you can use the dataset conversion modules. Drag and drop a conversion module onto the canvas and link it to the dataset node. The available conversions are Convert to ARFF, Convert to CSV, Convert to Dataset, Convert to SVMLight, and Convert to TSV. Select the conversion module that fits your need. See the example image.
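
To show what these target formats look like, here is a small local sketch that turns the same CSV text into TSV, SVMLight, and ARFF. It is plain Python written for illustration, not the Azure ML modules themselves. The sample data and helper names are hypothetical, every column is assumed numeric, and the label column name is an assumption.

```python
import csv
import io

# Hypothetical sample: one label column and two numeric features.
CSV_TEXT = "label,x1,x2\n1,0.5,0\n-1,0,2.0\n"


def parse_csv(text):
    """Split CSV text into a header row and data rows."""
    rows = list(csv.reader(io.StringIO(text)))
    return rows[0], rows[1:]


def to_tsv(header, rows):
    """Same table, tab-separated."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()


def to_svmlight(header, rows, label_column="label"):
    """One line per row: label, then 1-based index:value pairs for non-zero features."""
    label_idx = header.index(label_column)
    lines = []
    for row in rows:
        features = [v for i, v in enumerate(row) if i != label_idx]
        pairs = [f"{i}:{v}" for i, v in enumerate(features, start=1)
                 if float(v) != 0.0]
        lines.append(" ".join([row[label_idx]] + pairs))
    return "\n".join(lines) + "\n"


def to_arff(header, rows, relation="dataset"):
    """Minimal ARFF: attribute declarations followed by comma-separated data."""
    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute {name} numeric" for name in header]
    lines += ["", "@data"]
    lines += [",".join(row) for row in rows]
    return "\n".join(lines) + "\n"


header, rows = parse_csv(CSV_TEXT)
tsv_text = to_tsv(header, rows)
svmlight_text = to_svmlight(header, rows)
arff_text = to_arff(header, rows)
print(tsv_text)
print(svmlight_text)
print(arff_text)
```

TSV keeps the table shape and only changes the delimiter, SVMLight drops zero-valued features and the column names, and ARFF adds a typed attribute declaration for each column.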
 
I hope the points covered here are clear. The next part continues in other blog posts. Keep exploring. Happy programming!!