Using Amazon SageMaker for Scalable Machine Learning Training

Using SageMaker

Not long ago, Amazon unveiled SageMaker, its machine learning training and deployment infrastructure. To understand why it might be useful, it's worth considering the current difficulties of scaling machine learning services out to the cloud. The wild successes of deep learning have driven up demand and taught people to expect its accuracy, which is not easy to achieve without troves of data and the GPU-backed, distributed training platforms to ingest them.

A few services out there attempt to help you clear this data barrier: Google ML Engine (and soon AutoML) lets you train and deploy custom TensorFlow models on Google’s cloud infrastructure, Azure has an analytics platform, and BitFusion is trying to help distribute GPUs across cloud providers. There isn’t exactly a mad dash to become the AWS of machine learning, but there is healthy competition. That being said, there already is an AWS of machine learning: AWS.

Prebuilt or Custom Models

SageMaker provides a number of supervised algorithms such as XGBoost and linear models for regression and classification, all out of the box and ready to train. It also offers unsupervised algorithms like principal component analysis and k-means to round out the traditional analytics toolkit. These prebuilt containers let you specify the data on which you want to train your model and just let it go. But the more powerful feature (in this author’s opinion) is the ability to spin up Docker containers of your own creation, using custom models in any framework you like, be it Caffe2 or TensorFlow or Theano. SageMaker injects your training data into your container at the start of the job and saves your models when you are done. The rest is up to your runtime environment.
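To make that contract concrete, here is a minimal sketch of what a custom container's training entry point might look like. SageMaker mounts each input channel under /opt/ml/input/data/<channel> and uploads whatever the job writes to /opt/ml/model; the channel name "training", the CSV layout, and the scikit-learn model below are illustrative assumptions, not anything SageMaker requires.

```python
# train -- the executable SageMaker invokes inside a custom container
import json
import os
import pickle

import pandas as pd
from sklearn.linear_model import LogisticRegression  # illustrative model choice

PREFIX = "/opt/ml"
# Each input channel is mounted under /opt/ml/input/data/<channel_name>;
# "training" is an assumed channel name set in the job's InputDataConfig.
TRAIN_DIR = os.path.join(PREFIX, "input/data/training")
MODEL_DIR = os.path.join(PREFIX, "model")  # anything written here is saved to S3
PARAM_PATH = os.path.join(PREFIX, "input/config/hyperparameters.json")


def train():
    with open(PARAM_PATH) as f:
        params = json.load(f)  # hyperparameters arrive as a JSON dict of strings

    # Assumption: the channel holds CSV files with the label in the first column.
    frames = [pd.read_csv(os.path.join(TRAIN_DIR, name), header=None)
              for name in os.listdir(TRAIN_DIR)]
    data = pd.concat(frames)
    y, X = data.iloc[:, 0], data.iloc[:, 1:]

    model = LogisticRegression(C=float(params.get("C", "1.0")))
    model.fit(X, y)

    # SageMaker tars up /opt/ml/model and uploads it to the job's S3 output path.
    with open(os.path.join(MODEL_DIR, "model.pkl"), "wb") as out:
        pickle.dump(model, out)


if __name__ == "__main__":
    train()
```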

SageMaker also has nvidia-docker support built in, meaning you can spin up GPU instances for training sessions. For those on smaller budgets, the ability to pay for costly GPU hardware only for the time a training job actually needs it is incredibly handy.
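Launching such a job against a GPU instance is a single boto3 call. A sketch follows; the job name, ECR image URI, role ARN, and S3 paths are placeholders to swap for your own:

```python
import boto3

sm = boto3.client("sagemaker")

# All names, ARNs, and S3 URIs below are illustrative placeholders.
sm.create_training_job(
    TrainingJobName="custom-model-gpu-001",
    AlgorithmSpecification={
        # A training image you pushed to ECR; on GPU instance types
        # SageMaker runs it with nvidia-docker.
        "TrainingImage": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-model:latest",
        "TrainingInputMode": "File",
    },
    RoleArn="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    InputDataConfig=[{
        "ChannelName": "training",
        "DataSource": {"S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": "s3://my-bucket/training-data/",
            "S3DataDistributionType": "FullyReplicated",
        }},
    }],
    OutputDataConfig={"S3OutputPath": "s3://my-bucket/models/"},
    ResourceConfig={
        "InstanceType": "ml.p2.xlarge",  # GPU instance, billed only while the job runs
        "InstanceCount": 1,
        "VolumeSizeInGB": 50,
    },
    StoppingCondition={"MaxRuntimeInSeconds": 3600},
)
```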

This all amounts to a powerful, albeit small and new, tool. It allows your machine learning training scripts to become stateless machines into which new data can be fed and out of which come new models. It can even inject saved models from your training jobs into inference containers, which are spun up as REST servers for immediate inference.
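Here is what that deploy-and-infer flow might look like with boto3. Again, every name, ARN, and S3 URI is a placeholder, and the CSV request body is just an assumption about what your container's scoring code accepts:

```python
import boto3

sm = boto3.client("sagemaker")

# Wrap the training job's saved artifacts and the inference image into a Model.
sm.create_model(
    ModelName="custom-model",
    PrimaryContainer={
        "Image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-model:latest",
        # Artifacts written by the training job above (path is an assumption):
        "ModelDataUrl": "s3://my-bucket/models/custom-model-gpu-001/output/model.tar.gz",
    },
    ExecutionRoleArn="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
)

# Describe the serving fleet, then stand up the REST endpoint.
sm.create_endpoint_config(
    EndpointConfigName="custom-model-config",
    ProductionVariants=[{
        "VariantName": "primary",
        "ModelName": "custom-model",
        "InstanceType": "ml.m4.xlarge",  # CPU box; inference rarely needs the training GPU
        "InitialInstanceCount": 1,
    }],
)
sm.create_endpoint(
    EndpointName="custom-model-endpoint",
    EndpointConfigName="custom-model-config",
)

# Once the endpoint is in service, inference is a single SDK call.
runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint(
    EndpointName="custom-model-endpoint",
    ContentType="text/csv",
    Body="5.1,3.5,1.4,0.2",
)
print(response["Body"].read().decode("utf-8"))
```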

These features integrate nicely with existing AWS SDKs like boto3, as the sketches above show. At Filestack, we use SageMaker to deploy custom models and start running inference right away. As ML services become more in demand, so too will tools that apply this decade’s lessons in scalable cloud architecture to the new era of cloud-based AI.
