In this long read we look at SageMaker from Amazon. Packt authors Julien Simon and Francesco Pochetti (of Learn Amazon SageMaker) talk you through the cloud machine learning platform and how to use AWS infrastructure for developing high quality and low cost machine learning models.
If I had to pick a single source of frustration in the Machine Learning world today, it would without doubt be the breakneck speed at which everything in this domain evolves: algorithms, infrastructure, frameworks, best practices. By the time you experiment with a technique, a new method smashes the state of the art.
By the time you refactor that computer vision pipeline, your library of choice gets a major upgrade. Let alone all the headaches coming with shipping actual models to actual customers in production. Add debugging, scaling and monitoring to the list.
The way to cope with the ML madness can be summarized in two points: being up-to-date and having the right set of tools.
Once again, it is a matter of handling speed efficiently. You want to be on top of the latest advances in the field, in order to jump quickly from one technique to another. To do that effectively, though, you need the right set of tools: tools that streamline your experimentation pipelines, shorten the time from development to production, and let you scale projects without the hassle of infrastructure maintenance, setup and updates. If you think this is what any Machine-Learning-oriented organization deserves, then AWS and SageMaker are what you are looking for.
Having worked with SageMaker myself, I can bring my testimony to the table. The completeness of the product is truly impressive: the team has got your back for all the steps of the ML pipeline. From data ingestion through seamless integration with all the AWS storage solutions, to rapid prototyping within the familiar Jupyter notebooks. From automated data labelling, to automated model debugging. From hyper-parameter tuning and experiment handling, to the breadth of services supporting the deployment stage.
Off-the-shelf Docker images, A/B testing, canary deployments, and tracking of shifts in feature distributions, just to name a few. SageMaker is a fully-fledged environment letting practitioners hit the ground running.
You still might wonder why you need Machine Learning at all. You have your rule-based systems in place. You master them inside out, and they are driving that nice uplift your company is currently enjoying. This is true. Thing is, it will not last forever.
Rules are good starting points, they are simple to debug, and provide tangible benefits almost immediately. They are not easy to adapt to a rapidly evolving market, though.
Eventually, the initial uplift will start shrinking and the competitive advantage will fade. That is when you realize you need to play smarter. The patterns a human spots in the data, the same ones that drove a rule-based approach in the first place, win in the short term.
However, in the long run, you must increase the level of abstraction, and try removing the human touch from the workflow as much as possible. Welcome Machine Learning to the stage. The scale, efficiency, and financial gains reached by employing modern statistical learning strategies are almost limitless. The “almost” part of the story is generally driven by the technical debt slowing down Data Science projects. Which, once again, is why you imperatively need the right tools, SageMaker being one of those.
I still hear people pointing out that they are neither engineers nor scientists. They are analysts or more-or-less technical product managers: business people, in short. Do these individuals need to hear all the ML fuss? Yes, they do. Statistical learning is not only about building Netflix’s recommendation engine from the ground up, or shipping Tesla’s autonomous driving system. Those are impressive but still niche examples of the power of these techniques. As far as I am concerned, being able to build a model is a lot more impactful in a business context. First, because of the much higher number of professionals potentially involved. Second, because you do not necessarily build a model to blindly predict A given B.
You might want to train an algorithm to model the interaction between A and B, their interdependencies, the process bridging them. In a nutshell, you train an algorithm to gain insights. This is what data-driven decision-making is all about, and what every moderately technical business person should pursue as a must. For this to happen, we need to democratize the access to Machine Learning, of course. Demystify the black box. Online learning resources are nowadays widely available to cover for the scientific part. What about the infrastructural side of the story instead? This is where AWS and SageMaker come to the rescue, bridging the gap between the product manager analyzing quarterly financial results, and the research scientist shipping self-flying drones. A single environment to win them all.
Machine learning (ML) practitioners use a large collection of tools in the course of their projects: open source libraries, deep learning frameworks, and more. In addition, they often have to write their own tools for automation and orchestration. Managing these tools and their underlying infrastructure is time-consuming and error-prone. This is the very problem that Amazon SageMaker was designed to address. Amazon SageMaker is a fully managed service that helps you quickly build and deploy ML models.
Whether you’re just beginning with ML or you’re an experienced practitioner, you’ll find SageMaker features to improve the agility of your workflows, as well as the performance of your models. You’ll be able to focus 100% on the ML problem at hand, without spending any time installing, managing, and scaling ML tools and infrastructure.
Amazon SageMaker was launched at AWS re:Invent 2017. Since then, a lot of new features have been added. At the core of Amazon SageMaker is the ability to build, train, optimize, and deploy models on fully managed infrastructure, and at any scale. This lets you focus on studying and solving the ML problem at hand, instead of spending time and resources on building and managing infrastructure. Simply put, you can go from building to training to deploying more quickly. Let’s zoom in on each step and highlight relevant SageMaker capabilities.
Amazon SageMaker provides you with two development environments:
- Notebook instances: Fully managed Amazon EC2 instances that come preinstalled with the most popular tools and libraries: Jupyter, Anaconda, and so on.
- Amazon SageMaker Studio: A full-fledged integrated development environment for ML projects.
When it comes to experimenting with algorithms, you can choose from the following:
- A collection of 17 built-in algorithms for ML and deep learning, already implemented and optimized to run efficiently on AWS. No ML code to write!
- A collection of built-in open source frameworks (TensorFlow, PyTorch, Apache MXNet, scikit-learn, and more), where you simply bring your own code.
- Your own code running in your own container: custom Python, R, C++, Java, and so on.
- Algorithms and pretrained models from AWS Marketplace for ML
In addition, Amazon SageMaker Autopilot uses AutoML to automatically build, train, and optimize models without the need to write a single line of ML code.
Amazon SageMaker also includes two major capabilities that help with building and preparing datasets:
- Amazon SageMaker Ground Truth: Annotate datasets at any scale. Workflows for popular use cases are built in (image detection, entity extraction, and more), and you can implement your own. Annotation jobs can be distributed to workers that belong to private, third-party, or public workforces.
- Amazon SageMaker Processing: Run data processing and model evaluation batch jobs, using either scikit-learn or Spark.
As mentioned earlier, Amazon SageMaker takes care of provisioning and managing your training infrastructure. You’ll never spend any time managing servers, and you’ll be able to focus on ML. On top of this, SageMaker brings advanced capabilities such as the following:
- Managed storage using either Amazon S3, Amazon EFS, or Amazon FSx for Lustre depending on your performance requirements
- Managed spot training, using Amazon EC2 Spot instances for training in order to reduce costs by up to 80%
- Distributed training automatically distributes large-scale training jobs on a cluster of managed instances
- Pipe mode streams infinitely large datasets from Amazon S3 to the training instances, removing the need to copy data around
- Automatic model tuning runs hyperparameter optimization in order to deliver high-accuracy models more quickly
- Amazon SageMaker Experiments easily tracks, organizes, and compares all your SageMaker jobs
- Amazon SageMaker Debugger captures the internal model state during training, inspects it to observe how the model learns, and detects unwanted conditions that hurt accuracy.
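To make a few of these options concrete, here is a minimal sketch of the kind of request the low-level `CreateTrainingJob` API accepts, combining managed spot training, Pipe mode, and a multi-instance cluster. It only assembles a plain Python dictionary and never contacts AWS. The helper name `build_training_request`, its default values, and every name, ARN, and S3 path you pass in are hypothetical placeholders, not values from the book.

```python
def build_training_request(job_name, image_uri, role_arn, train_s3_uri, output_s3_uri,
                           instance_type="ml.p3.2xlarge", instance_count=2,
                           use_spot=True, max_run_seconds=3600, max_wait_seconds=7200):
    """Assemble a CreateTrainingJob-style request as a plain dict (nothing is sent to AWS)."""
    # With spot capacity, the total wait time (including interruptions)
    # must be at least as long as the maximum run time.
    if use_spot and max_wait_seconds < max_run_seconds:
        raise ValueError("max_wait_seconds must be >= max_run_seconds for spot training")

    request = {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            # Pipe mode streams data from S3 instead of copying it first.
            "TrainingInputMode": "Pipe",
        },
        "RoleArn": role_arn,
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": train_s3_uri,
                # Each instance receives a different shard of the dataset.
                "S3DataDistributionType": "ShardedByS3Key",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
            "VolumeSizeInGB": 50,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": max_run_seconds},
        "EnableManagedSpotTraining": use_spot,
    }
    if use_spot:
        request["StoppingCondition"]["MaxWaitTimeInSeconds"] = max_wait_seconds
    return request
```

In a real project, a dictionary like this would typically be passed to `boto3.client("sagemaker").create_training_job(**request)`, or the same settings would be expressed through the higher-level SageMaker Python SDK.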
Just as with training, Amazon SageMaker takes care of all your deployment infrastructure, and brings a slew of additional features:
- Real-time endpoints: This creates an HTTPS API that serves predictions from your model. As you would expect, autoscaling is available.
- Batch transform: This uses a model to predict data in batch mode
- Infrastructure monitoring with Amazon CloudWatch: This helps you to view real-time metrics and keep track of infrastructure performance
- Amazon SageMaker Model Monitor: This captures data sent to an endpoint, and compares it with a baseline to identify and alert on data quality issues (missing features, data drift, and more).
- Amazon SageMaker Neo: This compiles models for a specific hardware architecture, including embedded platforms, and deploys an optimized version using a lightweight runtime.
- Amazon Elastic Inference: This adds fractional GPU acceleration to CPU-based instances in order to find the best cost/performance ratio for your prediction infrastructure.
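The idea behind comparing captured data with a baseline can be illustrated with a small, self-contained sketch. This is not how Model Monitor is implemented: it is a toy check, under the assumption that a feature has "drifted" when its mean in a batch of captured records moves more than a few baseline standard deviations. The function names and the threshold are hypothetical.

```python
from statistics import mean, stdev

def build_baseline(records):
    """Compute per-feature (mean, standard deviation) from baseline records (list of dicts)."""
    features = records[0].keys()
    return {f: (mean(r[f] for r in records), stdev(r[f] for r in records)) for f in features}

def check_batch(baseline, batch, z_threshold=3.0):
    """Return a list of human-readable issues found in a batch of captured records."""
    issues = []
    for name, (mu, sigma) in baseline.items():
        # Keep only records where the feature is present and not None.
        values = [r[name] for r in batch if r.get(name) is not None]
        missing = len(batch) - len(values)
        if missing:
            issues.append(f"{name}: missing in {missing} record(s)")
        if values and sigma > 0:
            # Distance between batch mean and baseline mean,
            # expressed in baseline standard deviations.
            shift = abs(mean(values) - mu) / sigma
            if shift > z_threshold:
                issues.append(f"{name}: mean shifted by {shift:.1f} baseline std devs")
    return issues
```

A monitoring job would run a check of this kind on a schedule and raise an alert whenever the returned list is not empty.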
Demonstrating the strengths of Amazon SageMaker
Alice and Bob are both passionate, hardworking people who try their best to build great ML solutions. Unfortunately, a lot of things stand in their way and slow them down. In this section, let’s look at the challenges that they face in their daily projects, and how Amazon SageMaker could help them be more productive.
Solving Alice’s problems
Alice has a PhD and works in a large public research lab. She’s a trained data scientist, with a strong background in math and statistics. She spends her time on large scientific projects involving bulky datasets. Alice generally doesn’t know much about IT and infrastructure, and she honestly doesn’t care at all for these topics. Her focus is on advancing her research, and publishing papers.
For her daily work, she can rely on her own powerful (but expensive) desktop workstation. She enjoys the fact that she can work on her own, but she can only experiment with a fraction of her dataset if she wants to keep training times reasonable.
She tries to maintain the software configuration of her machine herself, as IT doesn’t know much about the esoteric tools she uses. When something goes wrong, she wastes precious hours fixing it, and that’s frustrating.
When Alice wants to run large experiments, she has to use remote servers hosted in the computing centre: a farm of very powerful multi-GPU servers, connected to a petabyte of network-attached storage. Of course, she has to share these servers with other researchers.
Every week, the team leads meet and try to prioritize projects and workloads: this is never easy, and decisions often need to be escalated to the lab director.
Let’s see how SageMaker and cloud computing can help Alice.
Launching an inexpensive SageMaker notebook instance in minutes, Alice could start running some sample notebooks, and she would quickly become familiar with the service, as it’s based on the same tools she already uses. Scaling up, she could then train her own model on a cluster of powerful GPU instances, created on demand with just a couple of lines of code. That’s more computing power than she would ever have managed to use in the computing centre, and she wouldn’t have to set up anything!
Thanks to the automatic model tuning feature in SageMaker, Alice would also be able to significantly improve the accuracy of her models in just a few hours of parallel optimization. Again, doing this with her previous setup would have been impossible due to the lack of computing resources.
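The principle of parallel hyperparameter optimization can be sketched with a plain random search. This is only an illustration of the idea: the objective below is a made-up stand-in for a real training job, the parameter names and ranges are assumptions, and a managed tuning service would typically use smarter strategies than uniform random sampling.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def toy_objective(params):
    """Stand-in for a training job: returns a score that peaks at lr=0.1, depth=6."""
    return -((params["learning_rate"] - 0.1) ** 2) - 0.01 * (params["max_depth"] - 6) ** 2

def random_search(objective, n_trials=20, max_parallel=4, seed=0):
    """Evaluate randomly sampled configurations in parallel and return the best one."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    candidates = [
        {"learning_rate": rng.uniform(0.01, 0.3), "max_depth": rng.randint(2, 10)}
        for _ in range(n_trials)
    ]
    # Each candidate plays the role of one training job; up to
    # max_parallel of them are evaluated at the same time.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        scores = list(pool.map(objective, candidates))
    best_score, best_params = max(zip(scores, candidates), key=lambda pair: pair[0])
    return best_params, best_score

if __name__ == "__main__":
    params, score = random_search(toy_objective)
    print(params, score)
```

Raising `max_parallel` shortens the wall-clock time of the search without changing the number of trials, which is the trade-off Alice would be exploiting with on-demand instances.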
Deploying models would be equally straightforward: adapting a couple of lines of code found in a sample notebook, Alice would use the batch transform feature to predict her test dataset, again spending no time at all worrying about tools or infrastructure.
Last but not least, keeping track of her expenses would be easy: the AWS console would tell her how much she’s spent, which would be less than expected thanks to the on-demand nature of SageMaker infrastructure!
Solving Bob’s problems
Bob is a DevOps engineer, and he’s in charge of a large training cluster shared by a team of data scientists. They can start their distributed jobs in seconds, and it’s just simpler for Bob to manage a single cluster. Auto Scaling is set up, but capacity planning is still needed to find the right number of EC2 instances and to optimize the cost using the right mix of Reserved, Spot, and On-Demand instances.
Bob has a weekly meeting with the team to make sure they’ll have enough instances… and they also ping him on Slack when they need extra capacity on the fly. Bob tries to automatically reduce capacity at night and on weekends when the cluster is less busy, but he’s quite sure they’re spending too much anyway. Oh, well.
Once models have been trained and validated, Bob uses Continuous Integration and Continuous Deployment (CI/CD) to deploy them automatically to the production Docker cluster. Bob maintains bespoke containers for training and prediction: libraries, dependencies, and in-house tools. That takes a bit of time, but he enjoys doing it. He just hopes that no one will ask him to do PyTorch and Apache MXNet too.
Let’s see how Bob could use SageMaker to improve his ML workflows.
As SageMaker is based on Docker containers, Bob could get rid of his bespoke containers and use their built-in counterparts. Migrating the training workloads to SageMaker would be pretty easy. This would help Bob get rid of his training cluster, and let every data scientist train completely on demand instead. With Managed Spot Training, Bob could certainly optimize training costs even more.
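As a rough way to quantify what spot training saves on a given job, the sketch below compares the time a job actually ran with the time that was billed. It assumes savings are measured as the fraction of training time that was not billed, using the kind of figures a training job description reports (training seconds and billable seconds). The function name is hypothetical.

```python
def spot_savings_percent(training_seconds, billable_seconds):
    """Percentage of training time that was not billed, thanks to spot pricing."""
    if training_seconds <= 0:
        raise ValueError("training_seconds must be positive")
    if not 0 <= billable_seconds <= training_seconds:
        raise ValueError("billable_seconds must be between 0 and training_seconds")
    return (1 - billable_seconds / training_seconds) * 100

if __name__ == "__main__":
    # A one-hour job billed for 15 minutes.
    print(f"{spot_savings_percent(3600, 900):.0f}% saved")
```

Tracking this number per job would give Bob a simple view of how much of the team’s training spend comes from spot capacity.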
The data science team would quickly adopt advanced features like distributed training, Pipe mode, and automatic model tuning. This would save them a lot of time, and the team would no longer have to maintain the kludgy code they have written to implement similar features.
Of course, Alice and Bob are fictional characters. Yet, I keep meeting many customers who share some (and sometimes all) of their pain points. That may be your case too, which is why you should consider Amazon SageMaker.