Introduction To AWS SageMaker

Ojash Shrestha

In this article, we’ll learn about AWS SageMaker and the tools it provides for machine learning. The article describes how the different SageMaker tools support the machine learning workflow, and covers SageMaker Instances, the AWS Regions where SageMaker is available, and the instances that can be used for deep learning. This gives us a brief overview of Amazon SageMaker as a whole.

Amazon SageMaker

Amazon SageMaker is a cloud platform dedicated to artificial intelligence, machine learning, and deep learning that enables creating, training, tuning, and deploying machine learning models in the cloud. Large-scale machine learning models can be managed easily with Amazon SageMaker. It provides numerous tools that simplify the machine learning workflow, which was discussed in detail in the previous article, Machine Learning Workflow.

The primary components of the Machine Learning Workflow are:

Exploration and Processing of data
Modeling
Deployment

Exploration and Processing of data

As discussed in the previous article, this is the step in which the data is retrieved, cleaned, and explored. The preparation and transformation of data fall in this step too. Amazon SageMaker supports this step with tools such as Ground Truth and Notebook.

Modeling

Models are developed and trained during the modeling process. Validation and evaluation also fall under this step. Modeling is the process of using mathematical models to identify patterns in data and generate predictions. This is one of the key processes in the Machine Learning Workflow. To learn more about the Machine Learning Workflow, check out the previous article, Machine Learning Workflow And Its Methods Of Deployment.

Deployment

The models are deployed to production using Amazon SageMaker. SageMaker also supports monitoring and updating the model and the data.

Amazon SageMaker provides a plethora of tools for use in the Machine Learning Workflow. Let us discuss them.

Ground Truth

Ground Truth in Amazon SageMaker is used to manage labeling jobs, labeling datasets, and labeling workforces. It is a fully managed data labeling service that makes it easy to build highly accurate training datasets for various machine learning applications. Ground Truth also supports automatic data labeling.
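As a minimal sketch of what a labeling dataset can look like, the snippet below builds a Ground Truth style input manifest: a JSON Lines file in which each line points to one object to be labeled through a "source-ref" key. The bucket name and object keys are hypothetical, and the helper function is only an illustration, not part of any AWS library.

```python
import json


def build_input_manifest(s3_uris):
    """Return JSON Lines text with one {"source-ref": uri} object per line."""
    lines = []
    for uri in s3_uris:
        if not uri.startswith("s3://"):
            raise ValueError(f"expected an s3:// URI, got {uri!r}")
        lines.append(json.dumps({"source-ref": uri}))
    # Terminate every record with a newline; an empty input gives an empty file.
    return "".join(line + "\n" for line in lines)


# Hypothetical bucket and object keys, used only for illustration.
manifest = build_input_manifest([
    "s3://example-bucket/images/cat-001.jpg",
    "s3://example-bucket/images/dog-002.jpg",
])
print(manifest, end="")
```

The resulting text would typically be uploaded to S3 and referenced when a labeling job is created.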

Notebook

The Notebook tools in Amazon SageMaker help create Jupyter notebook instances, attach Git repositories, and configure the lifecycle of the notebooks. Amazon SageMaker supports the end-to-end workflow across the notebook, training, and hosting environments. However, if one wants to use pre-existing tools outside SageMaker, results can easily be transferred in and out at different stages, as the business requires.
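The sketch below shows how a notebook instance with a lifecycle configuration and an attached Git repository might be described for the boto3 SageMaker client. It only builds the request dictionaries, so it runs without AWS credentials. The names, role ARN, repository URL, and start-up script are hypothetical placeholders.

```python
import base64


def lifecycle_config_request(name, on_start_script):
    """Build keyword arguments for create_notebook_instance_lifecycle_config."""
    # The API expects the shell script as a base64-encoded string.
    encoded = base64.b64encode(on_start_script.encode("utf-8")).decode("ascii")
    return {
        "NotebookInstanceLifecycleConfigName": name,
        "OnStart": [{"Content": encoded}],
    }


def notebook_instance_request(name, role_arn, lifecycle_config_name, repo_url):
    """Build keyword arguments for create_notebook_instance."""
    return {
        "NotebookInstanceName": name,
        "InstanceType": "ml.t2.medium",
        "RoleArn": role_arn,
        "LifecycleConfigName": lifecycle_config_name,
        "DefaultCodeRepository": repo_url,
    }


# Hypothetical values, for illustration only.
script = "#!/bin/bash\necho 'notebook started'\n"
config_request = lifecycle_config_request("example-lifecycle", script)
notebook_request = notebook_instance_request(
    "example-notebook",
    "arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    "example-lifecycle",
    "https://github.com/example/example-repo.git",
)

# With boto3 installed and credentials configured, these could be sent as:
#   client = boto3.client("sagemaker")
#   client.create_notebook_instance_lifecycle_config(**config_request)
#   client.create_notebook_instance(**notebook_request)
print(notebook_request["InstanceType"])
```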

Training

SageMaker supports training for different models. Choosing the machine learning algorithms, defining the training jobs, and tuning the hyperparameters can all be done easily. The data can be prepared, and models can be built, trained, and deployed seamlessly with SageMaker.
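To make the idea of defining a training job concrete, here is a hedged sketch that assembles the request for the boto3 create_training_job call. The image URI, role ARN, S3 paths, and hyperparameters are hypothetical, and the instance type and limits are arbitrary choices for illustration.

```python
def training_job_request(job_name, image_uri, role_arn,
                         train_s3_uri, output_s3_uri, hyperparameters):
    """Build keyword arguments for the SageMaker create_training_job call."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        # The API requires hyperparameter values to be strings.
        "HyperParameters": {k: str(v) for k, v in hyperparameters.items()},
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": train_s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": "ml.m4.xlarge",
            "InstanceCount": 1,
            "VolumeSizeInGB": 10,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }


# Hypothetical values, for illustration only.
request = training_job_request(
    job_name="example-training-job",
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/example-algorithm:latest",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    train_s3_uri="s3://example-bucket/train/",
    output_s3_uri="s3://example-bucket/output/",
    hyperparameters={"epochs": 10, "learning_rate": 0.01},
)

# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("sagemaker").create_training_job(**request)
print(request["HyperParameters"])
```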

Inference

Trained models can be compiled, and endpoints can be configured for deployment. Inference can also be made more economical with services such as Elastic Inference, which optimizes the amount of GPU capacity used for inference computation.
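Once an endpoint exists, a prediction request might look like the sketch below. It assumes a model that accepts CSV input and returns text. The client is passed in as a parameter, so a small stand-in class (hypothetical, returning a fixed score per row) lets the example run without AWS access; in practice the client would be boto3.client("sagemaker-runtime").

```python
import csv
import io


def to_csv_payload(rows):
    """Serialize rows of feature values as CSV text without a trailing newline."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue().rstrip("\n")


def predict(runtime_client, endpoint_name, rows):
    """Send rows to an endpoint and return the response body as text."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=to_csv_payload(rows).encode("utf-8"),
    )
    return response["Body"].read().decode("utf-8")


class FakeRuntimeClient:
    """Stand-in for the real runtime client so the sketch runs offline."""

    def invoke_endpoint(self, EndpointName, ContentType, Body):
        n_rows = len(Body.decode("utf-8").splitlines())
        # Return one fixed, made-up score per input row.
        return {"Body": io.BytesIO(",".join(["0.5"] * n_rows).encode("utf-8"))}


# "example-endpoint" is a hypothetical endpoint name.
result = predict(FakeRuntimeClient(), "example-endpoint", [[5.1, 3.5], [6.2, 2.9]])
print(result)
```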

SageMaker Instances

SageMaker Instances can be understood as dedicated virtual machines that are highly optimized to fit a multitude of machine learning use cases. The combination of CPU, GPU, primary memory, GPU memory, and network capacity characterizes the instance type. However, it is important to realize that SageMaker instance types, their names, and their prices differ from those of EC2.
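One visible naming difference is that SageMaker instance names carry an "ml." prefix, as in ml.p2.xlarge. The sketch below splits such a name into its family and size. It assumes the three-part "ml.family.size" pattern; the helper itself is hypothetical and not part of any AWS SDK.

```python
from typing import NamedTuple


class InstanceType(NamedTuple):
    family: str
    size: str


def parse_sagemaker_instance_type(name):
    """Split a name such as 'ml.p2.xlarge' into its family and size."""
    parts = name.split(".")
    if len(parts) != 3 or parts[0] != "ml" or not all(parts):
        raise ValueError(f"not a SageMaker instance type name: {name!r}")
    return InstanceType(family=parts[1], size=parts[2])


parsed = parse_sagemaker_instance_type("ml.p2.xlarge")
print(parsed.family, parsed.size)
```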

Availability Zones and Instance Types

Amazon SageMaker offers a wide variety of instance types. Which instance types are supported depends on the Amazon Web Services (AWS) Region and the Availability Zone. The AWS Regions that support Amazon SageMaker are listed below:

Region Name	Code
US East (Ohio)	us-east-2
US East (N. Virginia)	us-east-1
US West (N. California)	us-west-1
US West (Oregon)	us-west-2
Africa (Cape Town)	af-south-1
Asia Pacific (Hong Kong)	ap-east-1
Asia Pacific (Mumbai)	ap-south-1
Asia Pacific (Osaka)	ap-northeast-3
Asia Pacific (Seoul)	ap-northeast-2
Asia Pacific (Singapore)	ap-southeast-1
Asia Pacific (Sydney)	ap-southeast-2
Asia Pacific (Tokyo)	ap-northeast-1
Canada (Central)	ca-central-1
China (Beijing)	cn-north-1
China (Ningxia)	cn-northwest-1
Europe (Frankfurt)	eu-central-1
Europe (Ireland)	eu-west-1
Europe (London)	eu-west-2
Europe (Milan)	eu-south-1
Europe (Paris)	eu-west-3
Europe (Stockholm)	eu-north-1
Middle East (Bahrain)	me-south-1
South America (São Paulo)	sa-east-1
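A region code from this table is what is passed when creating an AWS client. As a small sketch, the table can be kept as a lookup so that a code is checked before use. The dictionary simply transcribes the table above, and the helper functions are hypothetical.

```python
# Region codes and names transcribed from the table above.
LISTED_REGIONS = {
    "us-east-2": "US East (Ohio)",
    "us-east-1": "US East (N. Virginia)",
    "us-west-1": "US West (N. California)",
    "us-west-2": "US West (Oregon)",
    "af-south-1": "Africa (Cape Town)",
    "ap-east-1": "Asia Pacific (Hong Kong)",
    "ap-south-1": "Asia Pacific (Mumbai)",
    "ap-northeast-3": "Asia Pacific (Osaka)",
    "ap-northeast-2": "Asia Pacific (Seoul)",
    "ap-southeast-1": "Asia Pacific (Singapore)",
    "ap-southeast-2": "Asia Pacific (Sydney)",
    "ap-northeast-1": "Asia Pacific (Tokyo)",
    "ca-central-1": "Canada (Central)",
    "cn-north-1": "China (Beijing)",
    "cn-northwest-1": "China (Ningxia)",
    "eu-central-1": "Europe (Frankfurt)",
    "eu-west-1": "Europe (Ireland)",
    "eu-west-2": "Europe (London)",
    "eu-south-1": "Europe (Milan)",
    "eu-west-3": "Europe (Paris)",
    "eu-north-1": "Europe (Stockholm)",
    "me-south-1": "Middle East (Bahrain)",
    "sa-east-1": "South America (São Paulo)",
}


def is_listed_region(code):
    """Return True if the region code appears in the table above."""
    return code in LISTED_REGIONS


def codes_with_prefix(prefix):
    """Return the listed region codes for a geographic prefix such as 'eu'."""
    return sorted(c for c in LISTED_REGIONS if c.startswith(prefix + "-"))


print(is_listed_region("eu-west-1"))
print(codes_with_prefix("us"))
```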

More about the supported ML instance types, from Standard Instances to Compute Optimized and Accelerated Computing, can be found in the official AWS documentation, along with the detailed pricing structure.

Instances for Deep Learning

A deep learning project requires instances suited to each part of the work. The table below lists specific instances that can be used for different segments of the machine learning workflow.

SageMaker Instance	vCPU	GPU Memory (GiB)	Primary Memory (GiB)	Network Performance	Use Case Examples	Quota Limit (Default)
ml.t2.medium	2	-	4	Low to Moderate	Run the Notebooks	0-20
ml.m4.xlarge	4	-	16	High	Training, Batch Transformation using different models, Deploying models	0-20
ml.p2.xlarge	4	12	61	High	Training, Batch Transformation for GPU accelerated Model in PyTorch	0-1

Depending on the work scenario and demand, different SageMaker Instances are used for different purposes. The table above describes each SageMaker Instance in terms of the memory allocated to it, its network performance, and its example use cases.
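The selection logic implied by the table can be sketched in a few lines. The specifications below are copied from the table, with memory sizes assumed to be in GiB. The use-case labels and the pick_instance helper are hypothetical simplifications, not an AWS API.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class InstanceSpec:
    name: str
    vcpu: int
    gpu_memory_gib: Optional[int]  # None means the instance has no GPU
    memory_gib: int
    use_cases: Tuple[str, ...]


# Values taken from the table above; use-case labels are simplified.
INSTANCES = (
    InstanceSpec("ml.t2.medium", 2, None, 4, ("notebook",)),
    InstanceSpec("ml.m4.xlarge", 4, None, 16,
                 ("training", "batch transform", "deployment")),
    InstanceSpec("ml.p2.xlarge", 4, 12, 61, ("training", "batch transform")),
)


def pick_instance(use_case, needs_gpu=False):
    """Return the first listed instance matching the use case and GPU need."""
    for spec in INSTANCES:
        has_gpu = spec.gpu_memory_gib is not None
        if use_case in spec.use_cases and has_gpu == needs_gpu:
            return spec.name
    return None


print(pick_instance("notebook"))
print(pick_instance("training", needs_gpu=True))
```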

Amazon SageMaker also provides numerous other services through SageMaker Studio, such as SageMaker Pipelines, SageMaker Autopilot, SageMaker Experiments, SageMaker Debugger, SageMaker Model Monitor, SageMaker Clarify, SageMaker JumpStart, and many more. Most of these services are similar to the ones provided in Microsoft Azure ML Studio, as described in the previous article, Microsoft Azure AI Fundamentals. In a future article, we’ll learn more about these Amazon SageMaker services.

Conclusion

In this article, we learned what Amazon SageMaker is, the primary components of the Machine Learning Workflow that SageMaker supports end to end, and the different tools available in Amazon SageMaker that make this possible. We also learned about SageMaker Instances and the AWS Regions that support SageMaker, and looked at example instances for different segments of deep learning projects.