What do ML engineers deploy: batch use case
In the article Deployment strategies for ML products, we talked about the need for 3 environments with access to production data (DEV, ACC, PRD) and how those environments are used in the ML deployment process. We touched briefly on what exactly is being deployed, but it is worth working through a concrete example.
I will take a very common example from the retail industry, probably the use case with the most impact for any retailer: demand forecasting for a warehouse or stores. Typically, we are talking about multiple models here: one for each product category, and there can be tens or hundreds of them.
Steps involved in the deployment
Demand forecasting is usually implemented as a batch process, where predictions for the coming x days are delivered daily: via SFTP transfer, or by writing to a database. What are the steps involved to make it happen?
1. Data preprocessing. Usually, there is one big table to which new data is processed and added incrementally each day. This table contains the features needed for both model retraining and model inference.
2. (Conditional) model retraining. The model can be retrained periodically (for example, every week), or only when significant data drift occurs; otherwise, the latest artifact is reused (a minimal sketch of this decision follows the list).
3. Model inference (generation of predictions).
4. Delivery of predictions.
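To make step 2 concrete, here is a minimal sketch of the conditional-retraining decision. The function names, the drift flag, and the artifact handling are illustrative assumptions, not a prescribed implementation:

```python
from datetime import date

import joblib


def should_retrain(today: date, drift_detected: bool) -> bool:
    """Retrain every Monday (weekday 0), or when significant data drift occurred."""
    return today.weekday() == 0 or drift_detected


def load_or_train(artifact_path: str, train_fn, features, drift_detected: bool = False):
    """Retrain and persist a new artifact, or reuse the latest one."""
    if should_retrain(date.today(), drift_detected):
        model = train_fn(features)          # the caller supplies the training function
        joblib.dump(model, artifact_path)   # persist the new artifact
    else:
        model = joblib.load(artifact_path)  # reuse the latest artifact
    return model
```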
Let’s assume we have 5 different product categories, predictions need to be delivered daily, and each model needs to be retrained only on Mondays. The process will then look like this: one shared data preprocessing task, followed by a train (Mondays only) and predict task per category running in parallel, and finally a step that combines the predictions and delivers them.
We want to have separate processes for training and inference for each product category for multiple reasons:
speed & costs: now we train 5 models in parallel, and a smaller machine suffices for each process. It is typically more cost-efficient to run 5 smaller machines instead of one big one;
ability to repair runs per category: it happens that something is wrong with the data for a specific product category, and we do not want to rerun the whole thing just because one piece failed.
Where is the deployment logic defined?
There are 4 main pieces involved:
1. Demand forecast Python package with unit tests + at least 2 modules:
data preprocessing module;
ML model module with a custom model class implementing the train & predict logic (a minimal sketch follows this list).
2. Execution files: how many of them you need depends on the orchestration system. In the case of Databricks and Databricks Workflows, you would need 5 Python execution scripts:
data_preprocessing.py;
train.py (parameterized for each category; see the example after this list);
predict.py (parameterized for each category);
combine_predictions.py
sftp_transfer.py
The last 2 steps can be combined into one script, but it may make sense to keep them separate for better granularity.
3. Orchestration logic. In the case of Databricks, it is possible to deploy a job in 3 different ways: with the Databricks API and a job JSON definition, with Terraform, or with dbx (a simplified job definition is sketched below).
4. CD pipeline. A pipeline that uploads all the files & updates the definition of the orchestration logic.
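To illustrate the first piece: the custom model class can be a thin wrapper that puts the train & predict logic behind one interface. This is a minimal sketch; the class name, feature columns, and the underlying regressor are assumptions, not a prescribed design:

```python
# demand_forecast/model.py -- illustrative custom model class
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


class DemandForecastModel:
    """Puts the train & predict logic for one product category behind one interface."""

    def __init__(self, feature_columns, target_column="demand"):
        self.feature_columns = feature_columns
        self.target_column = target_column
        self._estimator = RandomForestRegressor(n_estimators=200, random_state=42)

    def train(self, df: pd.DataFrame) -> "DemandForecastModel":
        self._estimator.fit(df[self.feature_columns], df[self.target_column])
        return self

    def predict(self, df: pd.DataFrame) -> pd.Series:
        preds = self._estimator.predict(df[self.feature_columns])
        return pd.Series(preds, index=df.index, name="prediction")
```

Keeping train & predict behind one class makes the logic unit-testable and keeps the execution scripts trivial.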
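The execution scripts from the second piece then stay thin: they parse parameters and call into the package. Here is a sketch of the parameterized train.py (the parameter names, feature columns, and paths are made up for illustration):

```python
# train.py -- thin execution script, parameterized per product category
import argparse

import joblib
import pandas as pd

from demand_forecast.model import DemandForecastModel  # the package module above

FEATURES = ["price", "promo_flag", "weekday", "lag_7_sales"]  # illustrative features


def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--category", required=True)
    parser.add_argument("--features-path", required=True)
    parser.add_argument("--artifact-path", required=True)
    args = parser.parse_args()

    df = pd.read_parquet(args.features_path)
    df = df[df["category"] == args.category]  # each run trains one category only

    model = DemandForecastModel(feature_columns=FEATURES).train(df)
    joblib.dump(model, args.artifact_path)


if __name__ == "__main__":
    main()
```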
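And for the third piece, the orchestration logic for the workflow above can be expressed as a job JSON definition and sent to the Databricks Jobs API. A trimmed sketch built on the Jobs API 2.1 task format (cluster configuration and file paths are omitted or simplified, and the Monday-only retraining condition is ignored for brevity):

```python
# deploy_job.py -- sketch: create the workflow via the Databricks Jobs API 2.1
import os

import requests

categories = ["cat_a", "cat_b", "cat_c", "cat_d", "cat_e"]  # placeholder names

tasks = [{"task_key": "data_preprocessing",
          "spark_python_task": {"python_file": "data_preprocessing.py"}}]
for cat in categories:
    # Fan out: each category gets its own train and predict task.
    tasks.append({"task_key": f"train_{cat}",
                  "depends_on": [{"task_key": "data_preprocessing"}],
                  "spark_python_task": {"python_file": "train.py",
                                        "parameters": ["--category", cat]}})
    tasks.append({"task_key": f"predict_{cat}",
                  "depends_on": [{"task_key": f"train_{cat}"}],
                  "spark_python_task": {"python_file": "predict.py",
                                        "parameters": ["--category", cat]}})
tasks.append({"task_key": "combine_predictions",
              "depends_on": [{"task_key": f"predict_{cat}"} for cat in categories],
              "spark_python_task": {"python_file": "combine_predictions.py"}})
tasks.append({"task_key": "sftp_transfer",
              "depends_on": [{"task_key": "combine_predictions"}],
              "spark_python_task": {"python_file": "sftp_transfer.py"}})

job = {"name": "demand-forecast-daily", "tasks": tasks}  # cluster config omitted

resp = requests.post(f"{os.environ['DATABRICKS_HOST']}/api/2.1/jobs/create",
                     headers={"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"},
                     json=job)
resp.raise_for_status()
```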
What about different environments?
We mentioned earlier that there are 3 different environments needed for the development and deployment of an ML project:
DEV is used to run integration testing
ACC is used for acceptance testing
PRD is where the actual production deployment runs
To do proper integration/acceptance testing, we need to test the orchestration as well; but to keep the code DRY, we do not want to repeat the orchestration logic multiple times, so we parametrize it:
Triggering: we may want to trigger code execution in DEV/ACC just once and not have any triggers active, while code execution in PRD runs on a schedule
Environment variables / Python parameters may differ between environments to allow, for example, different file paths
Jinja2 can be a very useful tool to accommodate this.
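For example, a Jinja2 template can carry the environment-specific parts of the job definition. A minimal sketch (the variable names, cron expression, and paths are illustrative assumptions):

```python
# render_job.py -- sketch: parametrize the job definition per environment with Jinja2
from jinja2 import Template

JOB_TEMPLATE = Template("""
{
  "name": "demand-forecast-{{ env }}",
  {% if env == "prd" %}
  "schedule": {"quartz_cron_expression": "0 0 5 * * ?", "timezone_id": "UTC"},
  {% endif %}
  "tasks": [{"task_key": "data_preprocessing",
             "spark_python_task": {"python_file": "data_preprocessing.py",
                                   "parameters": ["--output-path", "{{ base_path }}"]}}]
}
""")

# DEV/ACC get no schedule (runs are triggered once, manually); PRD runs daily.
for env, base_path in [("dev", "/mnt/dev/forecast"), ("prd", "/mnt/prd/forecast")]:
    print(JOB_TEMPLATE.render(env=env, base_path=base_path))
```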
Conclusions & next steps
In this article, we touched on a batch use case and looked at what is being deployed, without going deep into code details. There are other aspects that are important when deploying an ML model: logging, monitoring, and alerting. We will discuss those in further articles.
As a next step, we also want to show more code, including how the orchestration can be implemented. Stay tuned!