MLOps is becoming increasingly popular and is the topic of many talks. Many of these talks explain key MLOps concepts, such as model registries and monitoring, or show how specific tools such as Azure ML and AWS SageMaker work. Few address the challenges of adopting MLOps in your current setup: how do you make your existing infrastructure and way of working suitable for MLOps requirements?
MLOps adoption challenges
MLOps transformation comes with numerous challenges; among them, we emphasize two.
1. MLOps tooling is fragmented
There are tons of tools on the market (see the MAD landscape), and none of them can meet all of your organization’s needs. Despite the variety of tools at your disposal, you will always have to combine several of them creatively. Even if you find a tool that offers an end-to-end solution, you still need to integrate it into your system.
2. Organizational change is required
Implementing MLOps requires organizational changes, such as hiring ML Engineers and training your platform team on what MLOps entails. You may need to purchase and implement specialized software. All of this can take a lot of time and patience.
We faced all of these challenges, and we survived. In this article, we would like to share what worked for us.
Our team
We work as machine learning engineers within a centralized team at Ahold Delhaize, one of the largest food retail companies in the world. Our focus is to support our European brands within the data science domain. Some brands have large data science teams, others do not. By operating from a centralized unit, we have the opportunity to gain insights into those brands from a broader perspective. When we started working in this global team, we had two main goals in mind: increase the reusability of code between brands and boost MLOps maturity.
We have many brands with e-commerce websites, and they all want similar data science products: personalized offers, demand forecasting, and better search engines. Increasing code reusability across brands can lead to significant savings in both time and money.
However, we can’t just copy and paste pieces of code. We need a more structured approach that allows us to share models efficiently across brands. That brings us to the second goal, increasing MLOps maturity.
Data science teams from different brands show varying degrees of MLOps maturity, and even the most advanced teams can miss key MLOps components, such as monitoring and tracking. To measure this gap, we conducted a maturity assessment.
MLOps maturity levels
We took inspiration from Microsoft’s MLOps maturity model and Google’s definition of MLOps levels. However, both focus on the organization as a whole. We preferred to look at each data science product developed within each brand, so that the assessment would produce actionable insights.
Level 0: We have a model developed in a notebook, which is where most of our data scientists love to start.
Level 1: We have DevOps but no MLOps. Data scientists produce model artifacts and send them to the DevOps team as zip or pickle files, for example via email.
Level 2: We have automated training, so a pipeline produces model artifacts, but it is still detached from the deployment pipeline controlled by the DevOps team.
Level 3: We have automated training and deployment.
Level 4: We have more advanced components, such as central monitoring and standardized A/B testing.
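To make the levels concrete, here is a minimal sketch of how such an assessment could be encoded per data science product. The class and field names are hypothetical, and treating each level as requiring all the previous ones (and level 4 as requiring both monitoring and A/B testing) is our simplifying assumption for the example, not the exact questionnaire we used.

```python
from dataclasses import dataclass
from enum import IntEnum


class MaturityLevel(IntEnum):
    NOTEBOOK_ONLY = 0         # model lives in a notebook
    DEVOPS_NO_MLOPS = 1       # artifacts handed over manually to DevOps
    AUTOMATED_TRAINING = 2    # training pipeline exists, deployment is separate
    AUTOMATED_DEPLOYMENT = 3  # training and deployment are both automated
    ADVANCED = 4              # central monitoring, standardized A/B testing


@dataclass
class ProductCapabilities:
    """Hypothetical yes/no answers collected for one data science product."""
    deployed_by_devops: bool = False
    automated_training: bool = False
    automated_deployment: bool = False
    central_monitoring: bool = False
    standardized_ab_testing: bool = False


def assess(c: ProductCapabilities) -> MaturityLevel:
    """Return the highest level whose prerequisites are all met."""
    if not c.deployed_by_devops:
        return MaturityLevel.NOTEBOOK_ONLY
    if not c.automated_training:
        return MaturityLevel.DEVOPS_NO_MLOPS
    if not c.automated_deployment:
        return MaturityLevel.AUTOMATED_TRAINING
    # Assumption: level 4 needs both advanced components.
    if not (c.central_monitoring and c.standardized_ab_testing):
        return MaturityLevel.AUTOMATED_DEPLOYMENT
    return MaturityLevel.ADVANCED
```

Scoring each product rather than the whole organization means the result points directly at the next missing capability for that product.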
We identified the models and brands that were between level 1 and level 2 and decided to start our MLOps transformation there. Our goal was to bring all models to level 3: with automated deployment in place, we could reuse models across brands.
The way of working
The main reason ML models were poorly implemented and maintained was the way the teams worked.
Data scientists developed models in the development environment and, once a model worked, sent the Python package or code (via email!) to a DevOps team. The DevOps team deployed the project in a production environment that the data scientists could not access, so resolving errors, which occurred frequently, took a lot of time. We call this way of working “throwing it over the fence”; it was common in software development until DevOps became a thing.
The recipe for MLOps transformation
To cover all MLOps principles, at least two teams should be involved: the IT/platform team and the data science team. The data science team welcomed the proposed change in the way of working, as they would no longer have to address errors in an inefficient way. The platform team, on the other hand, needed more planning to change their way of working, since it required a structural change in their deployment procedure.
MLOps Standards
We prepared MLOps standards, in the form of a checklist for production, and shared them with the platform team. This checklist summarizes what is needed to efficiently produce and maintain ML models. We divided it into four parts: documentation, code quality, traceability, and monitoring. Some example requirements are shown below.
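As an illustration of how a checklist like this can be made machine-checkable, here is a small sketch. Only the four category names come from our standards; the `Checklist` class and the individual requirements in the example are placeholders invented for this snippet, not our actual list.

```python
from dataclasses import dataclass, field

# The four parts of the checklist named in the article.
CATEGORIES = ("documentation", "code quality", "traceability", "monitoring")


@dataclass
class Checklist:
    """Requirements per category, each mapped to whether it is met."""
    items: dict = field(default_factory=lambda: {c: {} for c in CATEGORIES})

    def add(self, category: str, requirement: str, met: bool = False) -> None:
        if category not in self.items:
            raise ValueError(f"unknown category: {category}")
        self.items[category][requirement] = met

    def missing(self) -> dict:
        """Return unmet requirements, grouped by category."""
        gaps = {}
        for category, requirements in self.items.items():
            unmet = [r for r, met in requirements.items() if not met]
            if unmet:
                gaps[category] = unmet
        return gaps

    def production_ready(self) -> bool:
        """A project is ready only when no requirement is unmet."""
        return not self.missing()


# Placeholder requirements, invented for this example.
checklist = Checklist()
checklist.add("documentation", "README explains how to run training", met=True)
checklist.add("code quality", "unit tests run in CI", met=True)
checklist.add("traceability", "model artifact records its code version")
checklist.add("monitoring", "prediction volume is logged")

print(checklist.missing())
print(checklist.production_ready())
```

Keeping the gaps grouped by category makes it easy to see which of the four areas a project still needs to work on before going to production.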
Data Science Project Methodology
Simply providing a list of required standards was insufficient to initiate a shift in the current approach. To illustrate the rationale behind this checklist, we created a document, a data science methodology, explaining each requirement and its underlying reasoning.
The MLOps standards and the data science project methodology helped all teams understand the necessity of MLOps principles and identify the missing parts. Eventually, we started working in a more transparent and robust way.
Data science teams are responsible for data exploration, model development, bug fixing, and deployment to the production environment via service credentials. The platform team provides infrastructure provisioning, delivers data, and supports MLOps needs.
Golden Path
The last step in our MLOps transformation was to provide a golden path: a guideline for implementing each criterion in the MLOps standards.
Example golden path: Traceability & Reproducibility
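To give a flavor of what traceability and reproducibility mean in code, here is a minimal sketch using only the Python standard library. It is not our actual golden path: the function names, the record fields, and the sample values are assumptions made for this example, and in practice an experiment tracking tool would usually store this information for you.

```python
import hashlib
import json
from datetime import datetime, timezone


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest identifying the exact bytes used."""
    return hashlib.sha256(data).hexdigest()


def build_run_record(code_version: str, training_data: bytes, params: dict) -> dict:
    """Collect what is needed to trace a model back to its inputs."""
    return {
        "code_version": code_version,  # e.g. a git commit hash
        "data_sha256": fingerprint(training_data),
        "params": params,
        "created_at": datetime.now(timezone.utc).isoformat(),
    }


def same_inputs(a: dict, b: dict) -> bool:
    """Two runs used the same inputs if code, data, and params all match."""
    keys = ("code_version", "data_sha256", "params")
    return all(a[k] == b[k] for k in keys)


# Hypothetical values, for illustration only.
record = build_run_record(
    code_version="abc1234",
    training_data=b"store,week,sales\n1,1,100\n",
    params={"max_depth": 6},
)
print(json.dumps(record, indent=2, sort_keys=True))
```

Storing a record like this next to every model artifact is what lets you answer, for any prediction, which code, data, and parameters produced the model behind it.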
Impact of MLOps transformation
As a result of the MLOps transformation, data science teams now have permission to deploy machine learning models, which allowed us to standardize deployments and create reusable workflows.
Thanks to this standardization, we cut model deployment time from 6 months to 1 month and reduced compute costs by 60%, in some cases even by 90%. Additionally, because traceability is ensured, we can now easily find the root cause when an ML model makes an unexpected prediction.