Amazon Launches Open-Source SageMaker XGBoost Algorithm Container

XGBoost is a well-known and effective ML algorithm for regression and classification tasks.

Amazon announced the open-source SageMaker XGBoost algorithm container, which brings increased flexibility, scalability, extensibility, and Managed Spot Training.
 
XGBoost is a popular and efficient machine learning algorithm for regression and classification tasks on tabular datasets. It uses a technique known as gradient boosting on trees and performs remarkably well in ML competitions. Amazon SageMaker has supported XGBoost as a built-in managed algorithm since its launch.
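
For context, a minimal sketch of how a training job with the built-in algorithm can be launched from the SageMaker Python SDK (v2) might look like the following; the IAM role ARN and S3 paths are placeholders.

```python
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder IAM role

# Resolve the XGBoost container image for the current region;
# version "1.0-1" selects the container that ships XGBoost 1.0.
image_uri = sagemaker.image_uris.retrieve(
    framework="xgboost", region=session.boto_region_name, version="1.0-1"
)

estimator = Estimator(
    image_uri=image_uri,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/xgboost/output",  # placeholder bucket
    sagemaker_session=session,
)

# In built-in algorithm mode, hyperparameters are passed straight to XGBoost.
estimator.set_hyperparameters(
    objective="reg:squarederror", num_round=100, max_depth=5, eta=0.2
)

# CSV training data in S3 (placeholder path, label in the first column).
train_input = TrainingInput("s3://my-bucket/xgboost/train/", content_type="text/csv")
estimator.fit({"train": train_input})
```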
 
The open-source XGBoost container supports the latest XGBoost 1.0 release and the improvements that come with it, including better performance scaling on multi-core instances and improved stability for distributed training.
 
The new script mode enables you to customize or use your own training script. You can add custom pre- or post-processing logic, run additional steps after training, or take advantage of the full range of XGBoost functions, such as cross-validation support.
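
As an illustration, a script-mode entry point might look like the hypothetical train.py below, which reads the training channel SageMaker mounts into the container, runs XGBoost's built-in cross-validation, and saves the model artifact. The file name and the assumption that the label sits in the first CSV column are illustrative, not prescribed by the container.

```python
# train.py -- a hypothetical script-mode entry point (names and paths are illustrative).
import argparse
import os

import pandas as pd
import xgboost as xgb

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    # In script mode, SageMaker passes hyperparameters as command-line arguments.
    parser.add_argument("--num-round", type=int, default=100)
    parser.add_argument("--max-depth", type=int, default=5)
    # SageMaker sets these environment variables inside the training container.
    parser.add_argument("--train", default=os.environ.get("SM_CHANNEL_TRAIN"))
    parser.add_argument("--model-dir", default=os.environ.get("SM_MODEL_DIR"))
    args = parser.parse_args()

    # Custom pre-processing: assume a headerless CSV with the label first.
    df = pd.read_csv(os.path.join(args.train, "train.csv"), header=None)
    dtrain = xgb.DMatrix(df.iloc[:, 1:], label=df.iloc[:, 0])

    params = {"objective": "reg:squarederror", "max_depth": args.max_depth}

    # Full access to the XGBoost API, e.g. built-in cross-validation.
    cv_results = xgb.cv(params, dtrain, num_boost_round=args.num_round, nfold=5)
    print("CV RMSE (last round):", cv_results["test-rmse-mean"].iloc[-1])

    # Train the final model and save it where SageMaker expects model artifacts.
    booster = xgb.train(params, dtrain, num_boost_round=args.num_round)
    booster.save_model(os.path.join(args.model_dir, "xgboost-model"))
```

Such a script would then be handed to the SageMaker XGBoost estimator through its entry_point parameter.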
 
The container also brings a more efficient implementation of distributed training: it can scale out to more instances and reduces out-of-memory errors. And because the container is open source, you can extend, fork, or modify the algorithm to suit your needs, beyond what script mode offers.
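
Under those assumptions, scaling out is mostly a matter of requesting more instances on the estimator. The sketch below uses the SageMaker Python SDK's XGBoost framework estimator together with the hypothetical train.py from above; the role ARN and S3 paths are placeholders.

```python
from sagemaker.xgboost import XGBoost

# Hypothetical role and bucket names; "train.py" is the script sketched earlier.
xgb_estimator = XGBoost(
    entry_point="train.py",
    framework_version="1.0-1",   # the open-source container with XGBoost 1.0
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    instance_count=3,            # scale training out across several instances
    instance_type="ml.m5.2xlarge",
    hyperparameters={"num-round": 100, "max-depth": 5},
)
xgb_estimator.fit({"train": "s3://my-bucket/xgboost/train/"})
```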
 
Amazon said that Managed Spot Training support saves up to 90% on Amazon SageMaker XGBoost training jobs. This fully managed option allows data scientists to take advantage of unused compute capacity in the AWS Cloud. SageMaker manages the Spot Instances itself, so you don’t have to worry about polling for capacity. In addition, the new version of the container automatically manages checkpoints to make sure your job finishes reliably.
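
A rough sketch of what enabling Managed Spot Training could look like, assuming SageMaker Python SDK v2 parameter names and the same placeholder role, bucket, and train.py as above:

```python
from sagemaker.xgboost import XGBoost

spot_estimator = XGBoost(
    entry_point="train.py",
    framework_version="1.0-1",
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",  # placeholder
    instance_count=1,
    instance_type="ml.m5.xlarge",
    use_spot_instances=True,    # run the job on Spot capacity
    max_run=3600,               # maximum training time, in seconds
    max_wait=7200,              # maximum time to wait for Spot capacity (must be >= max_run)
    checkpoint_s3_uri="s3://my-bucket/xgboost/checkpoints/",  # resume after interruptions
)
spot_estimator.fit({"train": "s3://my-bucket/xgboost/train/"})
```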
 
The XGBoost container now also supports the Parquet and RecordIO-protobuf input formats.
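
For illustration, input channels for these formats might be declared roughly as follows; the S3 paths are placeholders and the content type strings are assumptions based on the formats the container accepts.

```python
from sagemaker.inputs import TrainingInput

# Hypothetical S3 paths; content_type tells the container how to parse the channel.
parquet_input = TrainingInput(
    "s3://my-bucket/xgboost/train-parquet/", content_type="application/x-parquet"
)
protobuf_input = TrainingInput(
    "s3://my-bucket/xgboost/train-protobuf/", content_type="application/x-recordio-protobuf"
)

# estimator.fit({"train": parquet_input})  # using an estimator like the ones sketched above
```
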
To learn more, you can visit the official announcement here.