Modeling And Deployment Of Machine Learning Applications

In this article, we’ll learn about the various properties of modeling and deployment of machine learning applications. We’ll explore Hyperparameters, Model Versioning, Model Monitoring, Model Updating, Model Routing, and Model Predictions. Moreover, we'll also dive into on-demand predictions and batch predictions.


A machine learning model can be understood as a file that has been trained on sets of data using algorithms so that it becomes capable of recognizing specific types of patterns. Modeling is the process of using mathematical models to generate predictions and identify patterns. It is one of the key processes in the Machine Learning Workflow. To learn more about the Machine Learning Workflow, check out the previous article, Machine Learning Workflow And Its Methods Of Deployment.


Deployment to production can be understood as the process of integrating a machine learning model into a production environment so that decisions and predictions can be made from the input data supplied to the model.

Production Environment

Production, or the production environment, refers to the location where the application is live for the public and its intended users. It can be any form of software application, from web to mobile, that is in use by many people, so it is critical that it responds to user requests as quickly as possible.

This article covers some of the properties of modeling and deployment for machine learning applications. Learning these concepts will help you become familiar with the features offered by different cloud platform services for machine learning, from Amazon SageMaker to Google Cloud Platform and Microsoft Azure. These services reduce the work required, because the functionality the platforms provide can be integrated instead of being implemented in our own handwritten code.

Properties of Modeling


A hyperparameter in machine learning is a parameter whose value cannot be estimated from the data. Because the value cannot be learned directly by the estimator during training, model developers must set it themselves. This is why hyperparameter tuning is integral to optimizing model training. Various cloud platform services provide methods for automatic hyperparameter tuning during model training. If the machine learning cloud service does not provide an automatic hyperparameter tuning option, the methods in the Scikit-learn Python library can be used instead.


Scikit-learn is a free machine learning library for the Python programming language with support for numerous algorithms for classification, clustering, and regression, including support vector machines, and more. It also includes methods that help with hyperparameter tuning.
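To make the idea of tuning concrete, here is a minimal sketch of an exhaustive grid search written with only the Python standard library. The tiny nearest-neighbour classifier, the toy data, and the names knn_predict, accuracy, and grid_search are all illustrative assumptions rather than part of any library; Scikit-learn packages the same idea, with cross-validation, in classes such as GridSearchCV.

```python
from itertools import product


def knn_predict(train, x, k):
    """Label x by majority vote among its k nearest 1-D training points."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)


def accuracy(train, validation, k):
    """Fraction of validation points that the k-NN model labels correctly."""
    hits = sum(knn_predict(train, x, k) == label for x, label in validation)
    return hits / len(validation)


def grid_search(train, validation, param_grid):
    """Try every combination in param_grid and keep the first best one."""
    best_params, best_score = None, -1.0
    names = list(param_grid)
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = accuracy(train, validation, **params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Toy data (hypothetical): (feature, label) pairs with one noisy point at 2.9.
TRAIN = [(1.0, 0), (1.2, 0), (1.4, 0), (2.9, 0), (3.0, 1), (3.2, 1), (3.4, 1)]
VALIDATION = [(1.1, 0), (1.5, 0), (2.8, 1), (3.1, 1)]

if __name__ == "__main__":
    # k is a hyperparameter: the developer chooses it, it is not learned.
    print(grid_search(TRAIN, VALIDATION, {"k": [1, 3, 5]}))
```

On this toy data the noisy training point misleads the model when k is 1, so the search prefers a larger k. With real data, the validation score would normally come from cross-validation rather than a single hold-out set.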

Properties of Deployment

The properties of deployment can be listed as follows.

Model Versioning

Model versioning is one of the characteristics of deployment for any machine learning model. Maintaining the version of the deployed model is essential for future reference and supports the other properties of deployment. Beyond saving the model version as part of the model's metadata in a database, the deployment platform should also allow you to indicate which model version is deployed. This makes maintaining, monitoring, and updating the deployed model easier in the future.
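The two responsibilities described above, storing the version with the model's metadata and indicating which version is deployed, can be sketched as a small in-memory registry. The class name ModelRegistry, its methods, and the metadata fields are hypothetical; a real platform would keep this information in its own database.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ModelRegistry:
    """In-memory stand-in for the metadata store a platform would keep."""
    versions: Dict[str, dict] = field(default_factory=dict)
    deployed_version: Optional[str] = None

    def register(self, version: str, metadata: dict) -> None:
        """Save a model version together with its metadata."""
        if version in self.versions:
            raise ValueError(f"version {version!r} is already registered")
        self.versions[version] = dict(metadata)

    def deploy(self, version: str) -> None:
        """Mark a registered version as the one serving predictions."""
        if version not in self.versions:
            raise KeyError(f"unknown version {version!r}")
        self.deployed_version = version

    def deployed_metadata(self) -> dict:
        """Return the metadata of the currently deployed version."""
        if self.deployed_version is None:
            raise RuntimeError("no model version is deployed")
        return self.versions[self.deployed_version]


if __name__ == "__main__":
    registry = ModelRegistry()
    registry.register("1.0.0", {"accuracy": 0.91})
    registry.register("1.1.0", {"accuracy": 0.93})
    registry.deploy("1.1.0")
    print(registry.deployed_version, registry.deployed_metadata())
```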

Model Monitoring

Model monitoring is another property of deployment, which enables easy monitoring of deployed models. It is of utmost importance to check that the deployed model keeps performing above the threshold set for its performance metrics. Otherwise, it must be replaced in time by a better-performing model that meets the requirements of the application. Model monitoring helps us achieve that.
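As a rough illustration of the threshold check described above, the sketch below tracks accuracy over a sliding window of recent predictions and flags the model once it drops below a chosen threshold. The class name ModelMonitor, the choice of accuracy as the metric, and the window size are assumptions made for this example.

```python
from collections import deque


class ModelMonitor:
    """Track accuracy over the most recent predictions of a deployed model."""

    def __init__(self, threshold, window=100):
        self.threshold = threshold            # minimum acceptable accuracy
        self.outcomes = deque(maxlen=window)  # True for each correct prediction

    def record(self, prediction, actual):
        """Store whether a prediction matched the label observed later."""
        self.outcomes.append(prediction == actual)

    def accuracy(self):
        """Accuracy over the current window, or None if nothing is recorded."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_update(self):
        """True once the windowed accuracy falls below the threshold."""
        current = self.accuracy()
        return current is not None and current < self.threshold


if __name__ == "__main__":
    monitor = ModelMonitor(threshold=0.8, window=4)
    for prediction, actual in [(1, 1), (0, 0), (1, 0), (0, 1)]:
        monitor.record(prediction, actual)
    print(monitor.accuracy(), monitor.needs_update())
```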

Model Updating and Routing

Another property of deployment for machine learning applications is model updating and routing. Updating a deployed model should be as easy as possible. When the deployed model fails to meet the performance metrics, an update of the model is necessary. When the input data received for predictions changes in some fundamental way, that input data must be collected and used to update the model.

Moreover, the deployment platform must support routing different proportions of user requests to each deployed model variant so that their performance can be compared. Routing in this way enables testing a model's performance against the rest of the model variants.
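Proportional routing of this kind can be sketched with weighted random selection. The class name TrafficRouter, the variant names, and the 90/10 split below are hypothetical; managed platforms expose their own configuration for this, which is not shown here.

```python
import random
from collections import Counter


class TrafficRouter:
    """Send each request to a model variant in proportion to its weight."""

    def __init__(self, weights, seed=None):
        if not weights or any(w < 0 for w in weights.values()):
            raise ValueError("weights must be non-empty and non-negative")
        if sum(weights.values()) <= 0:
            raise ValueError("at least one weight must be positive")
        self.variants = list(weights)
        self.weights = [weights[name] for name in self.variants]
        self._rng = random.Random(seed)  # seeded only to make demos repeatable

    def route(self):
        """Pick the variant that should serve the next request."""
        return self._rng.choices(self.variants, weights=self.weights, k=1)[0]


if __name__ == "__main__":
    # Hypothetical split: 90% of requests to the current model, 10% to a candidate.
    router = TrafficRouter({"model_v1": 0.9, "model_v2": 0.1}, seed=0)
    print(Counter(router.route() for _ in range(1000)))
```

Logging which variant served each request, together with the outcome, is what allows the variants to be compared afterwards.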

Model Predictions

The type of predictions provided by the deployed model is another property of deployment. Two types of predictions are mainly used.

  • On-demand Predictions 
  • Batch Predictions 

On-demand Predictions

The on-demand predictions can be understood as real-time synchronous predictions. 

With on-demand predictions, each prediction request needs a low-latency response, while the request volume can vary widely. Predictions are returned in response to the requests. The requests and responses go through an API using JSON or XML formatted strings. You can learn more about the communication between models and applications in the previous article, How Is The Communication Between Machine Learning Models And Applications Performed In Production Environment. Each prediction request can contain one or more requests for a prediction. However, the number it can contain depends on the size of the data sent in the request. The on-demand prediction request size limit for ML Engine of Google Cloud Platform is 1.5MB and for SageMaker of AWS is 5MB.
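The sketch below shows what preparing such a request might look like: several inputs are packed into one JSON string, and the encoded size is checked against the limits quoted above before anything is sent. The function names, the "instances" and "predictions" field names, and the reading of MB as 1024 * 1024 bytes are assumptions for illustration; each platform defines its own request format.

```python
import json

# Limits quoted in the article, read here as binary megabytes (an assumption).
SIZE_LIMIT_BYTES = {
    "gcp_ml_engine": int(1.5 * 1024 * 1024),
    "aws_sagemaker": 5 * 1024 * 1024,
}


def build_request(instances, platform):
    """Encode one or more inputs as a JSON body and enforce the size limit."""
    body = json.dumps({"instances": instances}).encode("utf-8")
    limit = SIZE_LIMIT_BYTES[platform]
    if len(body) > limit:
        raise ValueError(
            f"request is {len(body)} bytes, over the {limit}-byte limit"
        )
    return body


def parse_response(body):
    """Extract the list of predictions from a JSON response body."""
    return json.loads(body)["predictions"]


if __name__ == "__main__":
    request = build_request([[5.1, 3.5], [6.2, 2.9]], "gcp_ml_engine")
    print(len(request), "bytes:", request.decode("utf-8"))
```

A request that exceeds the limit would have to be split into several smaller requests or submitted as a batch prediction instead.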

On-demand predictions are mostly used to give users, customers, and employees real-time responses online based on the deployed model. For example, the machine learning-enabled recommendation systems of YouTube and Amazon use on-demand prediction requests in the web application.

Batch Predictions

Batch predictions, also known as asynchronous or batch-based predictions, handle a high volume of requests submitted periodically, so latency is not a concern. A batch request points to a formatted data file of requests, and the predictions are returned to a file. These files are stored in the cloud provider's storage. Because cloud storage services on different platforms limit the size of the files they can store, there is a limit on how much data can be processed in each batch request. For instance, the size limit enforced on an object in Amazon S3 storage is the limit for batch prediction in Amazon SageMaker.
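The file-in, file-out pattern can be sketched locally as follows. A temporary directory stands in for cloud storage, the input is assumed to hold one JSON request per line, and the built-in sum stands in for a trained model; the function names and file names are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def batch_predict(input_path, output_path, predict_fn):
    """Read one JSON request per line and write one JSON prediction per line."""
    count = 0
    with open(input_path) as src, open(output_path, "w") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines in the request file
            features = json.loads(line)
            dst.write(json.dumps({"prediction": predict_fn(features)}) + "\n")
            count += 1
    return count


def demo():
    """Run one batch over a temporary directory standing in for cloud storage."""
    with tempfile.TemporaryDirectory() as bucket:
        requests = Path(bucket) / "requests.jsonl"
        results = Path(bucket) / "predictions.jsonl"
        rows = [[1, 2], [3, 4], [5, 6]]
        requests.write_text("\n".join(json.dumps(r) for r in rows) + "\n")
        # sum() is a placeholder for the deployed model's predict function.
        batch_predict(requests, results, predict_fn=sum)
        lines = results.read_text().splitlines()
        return [json.loads(line)["prediction"] for line in lines]


if __name__ == "__main__":
    print(demo())
```

Because the whole file is processed in one pass, throughput matters more than the response time of any single prediction.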

Batch predictions mostly support business decision-making. For example, with a complex model that predicts customer satisfaction across hundreds of products, where the estimates are needed for weekly reports, batch prediction is run over the customer data each week.


Thus, in this article, we learned about the various properties of the modeling and deployment of machine learning applications. We learned about hyperparameters and how Scikit-learn helps with hyperparameter tuning, as well as various characteristics of deployment such as Model Versioning, Model Monitoring, Model Updating, Model Routing, and Model Predictions. We also learned about on-demand predictions and batch predictions.