The field of machine learning is full of exciting research and development. The news is abuzz with the latest developments in fields such as self-driving cars, robotics, and facial recognition. But when it comes to the more practical side of machine learning—deploying models to production, monitoring them, and post-deployment troubleshooting—resources are much harder to find.
This is one of the reasons I reached out to Luigi Patruno, founder of the blog and newsletter ML in Production. Patruno created his blog and newsletter to establish a hub for content dedicated to building and monitoring machine learning systems. Outside of ML in Production, Patruno also works as the director of data science at 2U, an education technology company that helps universities bring their academic programs online.
In this interview, we talk about why resources for production ML are so hard to find, the importance of monitoring machine learning systems, advice for project managers embarking on machine learning projects, and challenges for businesses and data scientists.
Why is it hard to find resources for monitoring machine learning models?
First of all it’s a relatively new discipline. Some companies have been running machine learning systems in production for decades, but most companies are only just getting started. For the few companies that were early adopters of the technology—I’m thinking mostly ad tech and advertising systems—there was really no need to democratize the information. And as more companies have begun adopting the technologies, there are relatively few players who know how to do this well.
The majority of people who do know how to do this well are spread out across various companies. So the number of people with this knowledge is quite small, the information is disparate, and there are even fewer people with this knowledge who are also writing about it.
This was one of the founding hypotheses of my newsletter, and it’s why I offer a place to get these resources directly. It’s one attempt to make it easier for data scientists, machine learning engineers, ML product managers, software engineers, and DevOps folks to get this information.
Why is it so important to monitor machine learning systems post-deployment?
I like to say that the real work starts once the model has been deployed. And the reason is that it’s the first time you’re actually testing your models.
What I mean is that prior to deploying your model in the real world, that model has only seen historical data. You’ve trained it on historical datasets and validated it on other historical datasets, and everything was in a sandbox environment. You know everything the model sees, so you can improve the model based on those datasets. To properly validate your model, you need a good validation scheme that really mirrors what the post-deployment world looks like.
For instance, let’s say you’re training a customer segmentation model where you need to offer predictions on future data. You have historical data on your customers, but now you want to be able to evaluate new customers or new potential customers. Typically, you set up your validation scheme to mirror that process. Maybe you train on two years’ worth of past data, and you validate on the following year. You know what the data looks like, but once you deploy your model, things totally change.
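A minimal sketch of such a time-based split, assuming the customer data lives in a pandas DataFrame with a date column; the column name, cutoff, and sample rows here are invented for illustration, not taken from the interview:

```python
import pandas as pd

def time_based_split(df: pd.DataFrame, date_col: str, cutoff: str):
    """Split a dataset so validation rows strictly follow the training
    period, mirroring how the model will be used on future customers."""
    train = df[df[date_col] < cutoff]
    valid = df[df[date_col] >= cutoff]
    return train, valid

# Hypothetical example: train on 2018-2019 signups, validate on 2020.
customers = pd.DataFrame({
    "signup_date": ["2018-03-01", "2019-07-15", "2020-02-20", "2020-09-05"],
    "segment": ["a", "b", "a", "c"],
})
train, valid = time_based_split(customers, "signup_date", "2020-01-01")
```

The point of the split being time-based rather than random is exactly the one made above: the validation set should look like the future the model will actually face.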
It’s difficult to know exactly what type of data is coming in. Ideally, your data will match the training datasets you used because the new data coming in should match the distribution of your previous data. But if that’s not the case, then you need to consider retraining your model or coming up with a new model architecture.
So that’s what I mean about the work really starting after deployment. That’s when you really start seeing whether the model is doing what you want. Post-deployment is the first time that the generalization of your model is actually tested. If your models don’t generalize, then they’re useless, because you just don’t know what data is coming in.
Can you talk more about the importance of understanding and responding to how a model works with real world data?
Suppose you’ve trained on a certain set of data that you’ve deemed to be representative. You have certain evaluation metrics, and before deployment you worked with your product management team and established a threshold of predictive performance that you need to hit in order to deploy. So you work on that problem and finally reach that level of predictive performance, but when you deploy your models, new data starts coming in that you haven’t previously included in your training set for whatever reason. One possible reason is that the world has changed and now the distribution has shifted, and there was nothing you could have done about that during the experimental phase.
Even after deploying a model, you need some mechanism to detect those shifts, because once you detect them you can take corrective action. If you’re unable to detect them, your model will continue to spit out possibly useless results, or even counterproductive results that can be harmful to your business or situation.
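One common mechanism for detecting this kind of shift, sketched here with simulated data, is a two-sample Kolmogorov-Smirnov test comparing a feature’s training distribution against its live distribution. The interview doesn’t prescribe a specific test; this is just one widely used option, with invented numbers:

```python
import numpy as np
from scipy.stats import ks_2samp

def feature_drifted(train_values, live_values, alpha=0.01):
    """Flag a feature whose live distribution differs significantly
    from its training distribution (two-sample KS test)."""
    stat, p_value = ks_2samp(train_values, live_values)
    return p_value < alpha

rng = np.random.default_rng(0)
train_f = rng.normal(0.0, 1.0, size=5000)    # feature seen at training time
same_f = rng.normal(0.0, 1.0, size=5000)     # live data, same distribution
shifted_f = rng.normal(1.5, 1.0, size=5000)  # live data after a simulated shift
```

In practice a check like this would run on a schedule against recent production traffic, raising an alert when a feature drifts so the team can decide whether to retrain.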
So the monitoring piece is really critical, because you need to know when it’s time to take corrective action. Machine learning is really interesting because, unlike typical if-then software, which will usually throw an error when something goes wrong, in machine learning your model will continue to spit out predictions; you don’t have a red flag going up in the air telling you there’s a problem to fix.
Do you see a shift towards post-deployment strategies as more businesses deploy machine learning models?
I believe that more and more companies are going to start planning for those stages before implementation. Implementation is really expensive when it comes to machine learning, because a priori you just don’t know what results you’re going to get. It may be pretty easy to get to 70 or 80% accuracy, but getting to 90% accuracy, or to 99%, is going to get exponentially more expensive.
So you need to understand what level of accuracy you have to get to, how quickly you can get there (and I believe it’s possible to have that understanding in most cases), and what your system will look like when you finally deploy it.
If you don’t have that understanding, then you may not be able to even justify the investment to build a model. It’s critical to understand what metrics you need to drive, what business outcomes you want to achieve, and how the model and the systems you build are driving those things.
What general advice would you give to a company looking to implement a machine learning system? What basic points can project managers keep in mind to ensure a system works?
The first thing is to make sure you have in your mind what the goal of your system is, because oftentimes the goal of the system and the goals of the specific models are slightly different. And not because of major misalignments, but because the metrics that you’re optimizing for when you’re building a model are different than the metrics you care about at the business level.
So for example, at the business level you might care about user engagement with your system. That’s your north star metric. But then when it comes to tuning the model, you actually care about things like increasing the accuracy or the AUC of your specific model.
So the metrics that you care about are a little different at each level, and there has to be some good work done to translate the north star business metrics into your machine learning metrics. So at the root level, understand what the goals of your system are, and from there translate those into a good set of metrics that you can monitor easily.
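As a rough illustration of that two-level view, the model might be tuned on a metric like AUC while the business tracks something like an engagement rate derived from the same outcomes. The engagement framing and the numbers below are hypothetical, not from the interview:

```python
from sklearn.metrics import roc_auc_score

# Hypothetical click-prediction setup: the north star is engagement,
# but the model is tuned on a proxy metric such as AUC.
y_true = [0, 0, 1, 1, 0, 1, 0, 1]                 # did the user engage?
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.15, 0.9]  # model's predicted scores

model_metric = roc_auc_score(y_true, y_score)     # what the model optimizes
engagement_rate = sum(y_true) / len(y_true)       # what the business tracks
```

The translation work described above is deciding, and then validating, that moving the first number actually moves the second.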
Once you’re there and you’re training your first version of the model, train something simple. Some companies and teams rush to build complex models because it’s sort of sexy, but that can actually lead to a lot of potential issues when it comes to debugging the model. That can become really difficult.
So start with the business goal, figure out how to measure that goal, and then start with a really simple model that’s easy to understand and simple to debug. Then get your pipeline right. Once you have that running and it’s running well, you’ll be able to see which parts of your pipeline and your systems are underperforming. You can start optimizing from there.
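A sketch of such a deliberately simple starting point, using synthetic data as a stand-in for real customer data; the interview doesn’t name a specific model, so logistic regression here is just one reasonable simple baseline:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for real data.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# A deliberately simple first model: easy to understand, easy to debug.
baseline = make_pipeline(StandardScaler(), LogisticRegression())
baseline.fit(X_train, y_train)
score = baseline.score(X_valid, y_valid)
```

A baseline like this gives you a working end-to-end pipeline and a reference number; anything more complex then has to justify itself against it.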
Your ML course covers similar material. Can you tell me more about who it’s for and why you started it?
A couple of years ago, if you wanted to build models, deploy them, and monitor them, you needed to be a machine learning specialist. Then you needed to be a software engineering specialist to write the web APIs for the services you deploy. Then you needed to be a DevOps specialist to stand up the necessary infrastructure underlying all these systems.
But today it’s not really necessary to be a specialist in each of these areas. There are platforms that allow you to leverage technology to bring your average level of skill up so you can take the models you’ve trained on your laptop and deploy them into the real world in a scalable way. One of these tools is Amazon SageMaker from AWS.
I really like SageMaker because it gives you the ability to do a lot of these different functionalities in a really scalable and encapsulated way. So the point of my course is to teach you how to build, deploy, and monitor machine learning models once they get to production with Amazon SageMaker.
The target audience for my course includes data scientists who are responsible for the end-to-end process, software engineers and DevOps people working tangentially with ML who want to learn more about it, and machine learning engineers whose responsibility it is to build, deploy, and monitor machine learning models.
Finally, what do you see as the biggest challenges facing companies and data scientists looking to implement AI processes or solutions?
I think one of the biggest challenges is that you’re not guaranteed any particular outcomes when you embark on a machine learning project. So you really need to invest early in understanding how feasible a project is going to be. Then you need buy-in from upper management to understand that the investments you’re putting into a project may not actually turn into a lot of value because you don’t know what you’re going to see until you start looking at the data.
So you can think about this sort of problem along two axes. One is how much value solving a problem can bring to the company. The second is feasibility, or how achievable the solution is. If you’re working on highly feasible but low-value problems, you’re going to spend a lot of money building things that don’t really matter. But if you’re working on high-value problems where the feasibility is very low, then you’ll never get the outcome you’re looking for because it’s just going to be too expensive. So the challenge is to pick projects that sit in the right places along this two-by-two matrix.
And if you have a couple of them, then you can start to take bets. What I do on my team is I try to think about each different machine learning project as a different bet, and I try to allocate resources to these projects based on my belief as to the probability of success in each of them. From there, once you start seeing certain results, you can change the allocations of your resources in order to bet more on certain projects.
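A toy sketch of that proportional-betting idea; the project names and probability estimates below are invented for illustration, not figures from the interview:

```python
def allocate_budget(projects: dict[str, float], budget: float) -> dict[str, float]:
    """Split a fixed budget across ML projects in proportion to the
    team's estimated probability of success for each one."""
    total = sum(projects.values())
    return {name: budget * p / total for name, p in projects.items()}

# Hypothetical portfolio of bets and success estimates.
bets = allocate_budget(
    {"churn": 0.6, "ranking": 0.3, "forecasting": 0.1}, 100_000
)
```

As early results come in, the probability estimates get revised and the allocation shifts, which is exactly the "bet more on certain projects" step described above.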
If you’d like to follow more of Luigi Patruno’s work, be sure to check out ML in Production here. Details on his machine learning course are available here. If you’d like to read more interviews with experts and professionals in the field of AI and machine learning, be sure to check out the related resources section below, and sign up to the Lionbridge AI newsletter for interviews and news delivered directly to your inbox.