Handwriting Recognition

In this article, we’ll learn how to use Automated ML in Azure to solve a classification problem: handwritten digit recognition. We’ll use a machine learning workspace in Azure and its Automated ML functionality to experiment automatically with numerous algorithms and hyperparameters and find the most accurate model.

Azure Machine Learning 

Azure Machine Learning consolidates and extends the model training and deployment functionality that originated in Machine Learning Studio. It provides machine learning tools for all skill levels, offers an open and interoperable framework with support for multiple languages, and enables robust end-to-end MLOps. It also supports Automated Machine Learning. Read the article Auto ML to learn more about it.

Automated Machine Learning 

Automated Machine Learning (Auto ML) automates the machine learning model development process, which is mostly iterative and extremely time-consuming. This enables developers, analysts, and data scientists to build scalable and efficient machine learning models more productively. Azure provides Auto ML as a feature, which makes it easier to obtain production-ready models without spending much time. Dozens of models can be trained and compared in the same run, and the most accurate ones can then be selected for use.

Classification 

Classification is a supervised learning approach that assigns each data point to one of a set of predefined categories. Spam filtering, fraud detection, and object detection are typical classification problems.
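
To make the idea concrete, here is a minimal sketch of a classifier in plain Python. It uses a nearest-centroid rule, which is only one simple technique and not what Automated ML runs internally; the toy email features and labels are hypothetical.

```python
from collections import defaultdict
from math import dist


def fit_centroids(samples, labels):
    """Average the feature vectors of each class to get one centroid per class."""
    grouped = defaultdict(list)
    for features, label in zip(samples, labels):
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(centroids, features):
    """Assign the class whose centroid is closest to the given features."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical toy data: (number of links, number of exclamation marks) per email.
train_x = [(0, 0), (1, 0), (0, 1), (7, 9), (8, 6), (9, 8)]
train_y = ["ham", "ham", "ham", "spam", "spam", "spam"]

centroids = fit_centroids(train_x, train_y)
print(predict(centroids, (1, 1)))  # ham
print(predict(centroids, (6, 7)))  # spam
```

The model learns from labelled examples and then assigns a category to unseen data, which is the same pattern we will follow for digits, with ten classes instead of two.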

Handwriting Recognition 

Handwriting recognition is the process of interpreting handwritten input from sources such as paper documents, photographs, and even touch screens, and converting it into intelligible text.

Here, we’ll use the Auto ML functionality in Azure to identify handwritten digits in images through classification. Let us get started.

Handwriting Recognition – Auto ML in Azure 

Pre-requisite 

First of all, you need to create a machine learning workspace in Azure by following the article Azure Machine Learning - Create Workspace For Machine Learning.

Step 1 

Visit the machine learning workspace you created by following the pre-requisite article.

Step 2 

Here, click on Launch Studio.  

You’ll be directed to the welcome page of the Azure Machine Learning Studio.  

Step 3 

Here, click on Go to Workspace.  

Now, on the left-hand side menu, click on Automated ML.  

Step 4 

Now, click the New Automated ML run button.  

Step 5 

Now, we’ll set up the Automated ML run.

First of all, we need to choose the dataset.  

Click on Create dataset.  

As you can see, we have multiple options here. For learning purposes, we’ll take a dataset from Open Datasets.

Here, we can search for the MNIST database of handwritten digits. Click on the option.  

Now, click on Next.

Here, we name the dataset and set the filter options. For Subset, select All (include train dataset and test dataset), and for the Register option, select Tabular.

Once this is done, click on Create.

Step 6 

Now, a pop-up confirms that the dataset was created successfully. It also includes a link to access the dataset.

Here, we can view the details of the dataset we selected.  
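
To picture what a tabular image dataset looks like, the sketch below flattens one 28 x 28 MNIST-style image into a single row of 784 pixel values plus a label. This is an illustration under assumptions: the column names (`pixel_0` to `pixel_783` and `label`) are hypothetical and may not match the names used by the registered dataset.

```python
IMAGE_SIDE = 28  # MNIST digits are 28 x 28 grayscale images


def image_to_row(pixels, label):
    """Flatten a 2-D pixel grid into a single tabular row with a label column."""
    if len(pixels) != IMAGE_SIDE or any(len(r) != IMAGE_SIDE for r in pixels):
        raise ValueError("expected a 28 x 28 image")
    row = {}
    for y, line in enumerate(pixels):
        for x, value in enumerate(line):
            row[f"pixel_{y * IMAGE_SIDE + x}"] = value  # hypothetical column name
    row["label"] = label  # hypothetical target column name
    return row


# A blank image with one bright pixel at row 2, column 3, labelled as digit 7.
image = [[0] * IMAGE_SIDE for _ in range(IMAGE_SIDE)]
image[2][3] = 255
row = image_to_row(image, 7)

print(len(row))         # 785 columns: 784 pixels + 1 label
print(row["pixel_59"])  # 255, because 2 * 28 + 3 = 59
```

Each row is therefore one training example, and the label column is what we will later pick as the target.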

Step 7 

Now, on the Automated ML run details page, select the Dataset name and click on Next.  

Under the configuration settings for the run, we create a new Experiment.

First, preview the dataset.

Next, name your Experiment and select the Target Column, which is the value that Automated ML in Azure will be trained to predict.

Step 8

Next, click on New under Compute Cluster.

Set the Virtual Machine Tier to Dedicated and the Virtual Machine Type to CPU, which is sufficient for this case. For more demanding workloads, it is wiser to choose GPU.

Now, for the VM size, we choose Standard_DS12_v2, which is memory optimized and suited to datasets of 1 to 10 GB. It has 4 cores, 28 GB of RAM, and 56 GB of storage, and costs around $0.30 per hour. It is more expensive than some other VM offerings, but it lets our task run and complete faster.

Once all is set, click on Next, which takes us to the Advanced Settings page.
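
A quick back-of-the-envelope estimate helps judge whether the VM price is acceptable. The sketch below simply multiplies the hourly rate by the run time and the node count; the single-node assumption and the flat rate of about $0.30 per hour are simplifications, and actual pricing varies by region and over time.

```python
def estimate_cost(hourly_rate_usd, hours, nodes=1):
    """Approximate compute cost: rate x hours x number of nodes."""
    return round(hourly_rate_usd * hours * nodes, 2)


# Figures from this article: about $0.30 per hour, one node (an assumption).
print(estimate_cost(0.30, 1.5))  # 0.45
print(estimate_cost(0.30, 2.0))  # 0.6
```

Even a two-hour experiment on this VM size stays well under a dollar of compute under these assumptions.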

Step 9 

Here, we set the number of nodes and the idle seconds before scale down. Leave the node counts at their defaults and set the scale-down time to 120 seconds, which is long enough for our run while keeping costs in check.

With this, click on Next.  

We’ll be notified once the cluster is set up.
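
For readers who prefer code over the studio form, the same cluster could be sketched with the v1 Python SDK (azureml-core). This is an assumption-laden sketch, not the method used in this article: the cluster name and the node counts are hypothetical, and `create_cluster` only works with the SDK installed and a real workspace object.

```python
CLUSTER_NAME = "my-cpu-cluster"  # hypothetical name


def cluster_settings():
    """Collect the choices made in the studio form as keyword arguments."""
    return {
        "vm_size": "STANDARD_DS12_V2",
        "vm_priority": "dedicated",
        "min_nodes": 0,  # assumption: allow scale-down to zero when idle
        "max_nodes": 1,  # assumption: stands in for the default node count
        "idle_seconds_before_scaledown": 120,
    }


def create_cluster(workspace):
    """Create the compute cluster; needs azureml-core and a real workspace."""
    # Imported lazily so the settings above can be inspected without the SDK.
    from azureml.core.compute import AmlCompute, ComputeTarget

    config = AmlCompute.provisioning_configuration(**cluster_settings())
    target = ComputeTarget.create(workspace, CLUSTER_NAME, config)
    target.wait_for_completion(show_output=True)
    return target


print(cluster_settings()["idle_seconds_before_scaledown"])  # 120
```

With a minimum of zero nodes, the cluster releases its VM after 120 idle seconds, which is what keeps the bill low between runs.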

Step 10 

Now, on the Auto ML configuration page, select the cluster you just created (mine is ojashcluster) and click on Next.

Classification 

Step 11 

Here, under Select Task and Settings, we select Classification, as handwritten digit recognition is a classification problem.

Also click on Enable Deep Learning, which helps with featurizing the data and can support better accuracy.

Validation and Testing 

Step 12 

Now comes the integral part of validation and testing, as we’ve discussed in previous articles such as Machine Learning Workflow.

Here, we set the Validation Data Percentage to 20 and the Test Data Percentage to 10.
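
To see what these percentages mean in rows, here is a small sketch. It assumes the registered dataset holds all 70,000 MNIST images, that the test share is held out first, and that the validation share is then taken from the remaining rows; the service's exact splitting behaviour may differ.

```python
def split_counts(total_rows, validation_pct, test_pct):
    """Return (train, validation, test) row counts.

    Assumption: the test share is held out first, and the validation
    share is then taken from the remaining rows.
    """
    test_rows = total_rows * test_pct // 100
    remaining = total_rows - test_rows
    validation_rows = remaining * validation_pct // 100
    train_rows = remaining - validation_rows
    return train_rows, validation_rows, test_rows


# MNIST has 60,000 training and 10,000 test images, 70,000 in total.
print(split_counts(70_000, validation_pct=20, test_pct=10))
# (50400, 12600, 7000)
```

Under these assumptions, roughly 50,000 images are used to fit models, while the rest are reserved to compare models and to check the final one.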

Once done, click on Finish.  
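
The choices made in Steps 11 and 12 can also be written down as code. The sketch below maps them onto `AutoMLConfig` from the v1 Python SDK; it is an illustration rather than what the studio executes. The label column name and the primary metric are assumptions, and `submit_run` requires the SDK, a real workspace, a registered dataset, and a compute target.

```python
def automl_settings():
    """Mirror the studio choices as AutoMLConfig keyword arguments."""
    return {
        "task": "classification",
        "primary_metric": "AUC_weighted",  # assumption: the metric reported later
        "label_column_name": "label",      # hypothetical target column name
        "enable_dnn": True,                # the Enable Deep Learning checkbox
        "validation_size": 0.2,            # Validation Data Percentage = 20
        "test_size": 0.1,                  # Test Data Percentage = 10
    }


def submit_run(workspace, training_data, compute_target, experiment_name):
    """Submit the experiment; needs the azureml SDK and a real workspace."""
    # Imported lazily so the settings can be checked without the SDK installed.
    from azureml.core import Experiment
    from azureml.train.automl import AutoMLConfig

    config = AutoMLConfig(
        training_data=training_data,
        compute_target=compute_target,
        **automl_settings(),
    )
    return Experiment(workspace, experiment_name).submit(config, show_output=True)


print(automl_settings()["task"])  # classification
```

Keeping the settings in one dictionary makes it easy to rerun the same experiment later with a single value changed.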

Step 13 

Now, the automated run is created once the data has been validated.

Following up, we can see the Automated ML run notification for our handwritten digit recognition.

We can see the updates in Properties. The Status moves from Not Started to Setting up the run, Model Training, and so on.

Step 14 

Under Models in Automated ML, we can see the algorithms that have been tried and their results with different hyperparameter setups.

We can also explore the numerous child runs for each individual task.

Step 15 

As the run continues, the number of algorithms tested increases, and better results keep adding up.

Step 16 

Now, once the run is completed, we are notified about it in the Status.

On the left, we have the Best Model Summary, which shows the best algorithm for this task along with its specific hyperparameters and AUC weighted score.
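
To get a feel for the AUC weighted figure, the sketch below computes a one-vs-rest AUC for each class and averages the results, weighting each class by how many samples it has. This is how a class-weighted AUC is commonly defined, and we assume the studio's metric follows the same idea; the probabilities and labels are hypothetical.

```python
def binary_auc(scores, is_positive):
    """AUC as the chance that a random positive outranks a random negative."""
    positives = [s for s, p in zip(scores, is_positive) if p]
    negatives = [s for s, p in zip(scores, is_positive) if not p]
    wins = sum(
        1.0 if pos > neg else 0.5 if pos == neg else 0.0
        for pos in positives
        for neg in negatives
    )
    return wins / (len(positives) * len(negatives))


def auc_weighted(probabilities, labels):
    """One-vs-rest AUC per class, averaged with class-frequency weights."""
    classes = sorted(set(labels))
    total = 0.0
    for index, cls in enumerate(classes):
        scores = [row[index] for row in probabilities]
        is_positive = [label == cls for label in labels]
        total += binary_auc(scores, is_positive) * sum(is_positive)
    return total / len(labels)


# Hypothetical predicted probabilities for classes 0, 1, 2 (one row per sample).
probabilities = [
    (0.8, 0.1, 0.1),
    (0.4, 0.5, 0.1),
    (0.2, 0.7, 0.1),
    (0.5, 0.4, 0.1),
    (0.1, 0.2, 0.7),
]
labels = [0, 0, 1, 1, 2]

print(round(auc_weighted(probabilities, labels), 4))
```

A value of 1.0 would mean every class is ranked perfectly against the others, so a score above 0.99 indicates that the model separates the digits very well.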

Step 17 

We can now visit the metrics to visualize how the model performs across the numerous evaluation measures.

Also, we can explore the confusion matrix and see the summary of the performance.  
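
As a reminder of how a confusion matrix is built, here is a minimal sketch in plain Python. The labels are hypothetical and cover only three digits; in the studio the matrix has one row and one column for each of the ten digits.

```python
def confusion_matrix(actual, predicted, classes):
    """Rows are true classes, columns are predicted classes."""
    position = {cls: i for i, cls in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, guess in zip(actual, predicted):
        matrix[position[truth]][position[guess]] += 1
    return matrix


# Hypothetical labels: one 4 is mistaken for a 9, and one 9 for a 4.
actual = [4, 4, 4, 7, 7, 9, 9, 9]
predicted = [4, 4, 9, 7, 7, 9, 9, 4]

matrix = confusion_matrix(actual, predicted, classes=[4, 7, 9])
for row in matrix:
    print(row)
# [2, 0, 1]
# [0, 2, 0]
# [1, 0, 2]

accuracy = sum(matrix[i][i] for i in range(3)) / len(actual)
print(accuracy)  # 0.75
```

The diagonal holds the correct predictions, and every off-diagonal cell shows which digit was confused with which.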

Thus, as we can see, we ran the algorithms under Automated ML for over an hour and a half and obtained an AUC weighted score of over 0.99. Our task went through numerous algorithms, such as LightGBM and XGBoost Classifier, with variations in the hyperparameters.

Step 18 

Lastly, once all the tasks are completed, make sure to delete all the resources to avoid incurring additional charges. You can do this by deleting the main resource group as a whole, which also deletes all the resources within it. Visit the resource group in Azure and click on Delete resource group.

Conclusion 

Thus, in this article, we learned about using Automated ML for the classification problem of handwritten digit recognition. In an hour and a half of running, we obtained an AUC weighted score of 0.99945 with MaxAbsScaler, LightGBM, which is splendid. Compared to testing each algorithm manually, this is far more efficient: Automated ML in Azure took care of experimenting with numerous algorithms and hyperparameter variations, and it cost us barely a dollar for almost 2 hours of the experiment.

