Dagger — CI/CD as Code that Runs Anywhere
Written with Arne Müller.
The current developer workflow for CI/CD pipelines is not pleasant. It takes many "git push" attempts before things work as expected, since triggering the pipeline is often the only way to test it end-to-end.
The main issue is that the tools available on the workflow runner are not the same as those in your development environment, and the tools themselves are not always as flexible as we would wish: Jenkins with its Groovy language, GitHub Actions with all the custom actions from the marketplace. Very few people enjoy writing CI/CD pipelines.
The Dagger project, created by Solomon Hykes, co-founder and former CTO of Docker, addresses this problem. Just like Docker bridges the gap between your development and production environments by packaging your logic into Docker images, Dagger does the same for your pipelines by running your workflows inside containers.
Dagger is not just about closing the gap between local and remote; it also abstracts away the language of your CI/CD pipelines. You can keep your favorite CI/CD tool but define the pipeline logic in the language of your preference (Dagger provides SDKs for Node.js, Go, and Python).
In this article, we show how to use the Dagger Python SDK together with GitHub Actions to run unit tests, build a Python package, and build and push a Docker image. We also look into how this differs from using GitHub Actions alone. No more "push and pray".
Old approaches
For this article, we took the unit tests described in our article about testing ML code (https://marvelousmlops.substack.com/p/how-to-test-ml-code). The idea is to run those tests as part of a CI/CD pipeline in GitHub Actions and be able to run them locally as well. We also build a Python package and a Docker image that can be used for model training.
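The tests themselves are ordinary pytest functions. As a hedged sketch of the kind of test we mean (the `rmse` function below is illustrative only, not taken from that article):

```python
import math


def rmse(y_true: list[float], y_pred: list[float]) -> float:
    """Root mean squared error of two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("length mismatch")
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def test_rmse_perfect_prediction():
    # identical predictions give zero error
    assert rmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]) == 0.0


def test_rmse_known_value():
    # a constant offset of 1 gives an RMSE of exactly 1
    assert rmse([0.0, 0.0], [1.0, 1.0]) == 1.0
```

Because such tests are plain functions with no CI-specific dependencies, the same `pytest tests` invocation works on a laptop, inside a container, or on a runner.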
Without Dagger, we would define the pipeline logic using GitHub Actions and test it locally using act. We start with a "hello world" pipeline that looks like this:
name: gh-action
on:
  push:
    branches: [master]
  workflow_dispatch:

jobs:
  build:
    name: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: '3.11'
      - name: Install deps
        run: |
          pip install -r requirements.txt
          pip install ".[test]"
      - name: Run tests
        run: pytest tests
      - name: Build python package
        run: |
          python3 -m pip install --upgrade build
          python3 -m build
      - name: Login to Docker Hub
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USER }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      - name: Build model serving docker image
        uses: docker/build-push-action@v2
        with:
          context: .
          file: Dockerfile
          push: true
          tags: "marvelousmlops/dagger_example:latest"
The pipeline defined above has the following steps:
- check out the repository onto the runner so that we can access its files
- set up Python 3.11 and install dependencies
- run the tests
- build the Python package
- build and push the Docker image
To run the GitHub Actions pipeline, secrets are required: we need them to push the Docker image to Docker Hub. Secrets are defined as repository secrets on GitHub. To run GitHub Actions locally, we use act and pass the secrets like this:
act -s DOCKERHUB_USER=<user> -s DOCKERHUB_TOKEN=<token>
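Whether the pipeline runs via act or on a real runner, a missing secret tends to surface late, as a confusing Docker Hub error. One way to fail fast is a small validation helper; this is a sketch of our own (`require_secrets` is not part of act or Dagger):

```python
import os


def require_secrets(*names: str) -> dict[str, str]:
    """Return the requested environment variables, failing fast with a
    clear message if any of them is missing or empty."""
    missing = [n for n in names if not os.environ.get(n)]
    if missing:
        raise RuntimeError(f"Missing required secrets: {', '.join(missing)}")
    return {n: os.environ[n] for n in names}


# Example: call this at the top of the pipeline script, before any
# network or registry operations.
# creds = require_secrets("DOCKERHUB_USER", "DOCKERHUB_TOKEN")
```

The same check works locally and on the runner, since in both cases the secrets arrive as environment variables.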
Act is easy to set up (check out the installation guide) and works well in simple scenarios. In more complex scenarios, issues may arise.
Act comes with some limitations:
- It is constrained by the Docker images it uses (from the act documentation): "These default images do not contain all the tools that GitHub Actions offers by default in their runners. Many things can work improperly or not at all while running those images. Additionally, some software might still not work even if installed properly, since GitHub Actions are running in fully virtualized machines while act is using Docker containers."
- It can be challenging to work with if your workflow contains cross-repository dependencies.
- Some custom actions and workflows may not behave as expected.
Here comes Dagger
With Dagger, the pipeline is defined using Python. We follow the example from Dagger documentation.
In this pipeline, we create a Dagger client using dagger.Connection(), obtain a reference to the current directory on the host, define a container to run the code in (here, python:3.11-slim-bullseye), set a working directory, install dependencies, and run the tests. We then build a Python package and a Docker image with the package installed.
# ci/test_base.py
import os
import sys

import anyio
import dagger


async def test_and_publish():
    async with dagger.Connection(dagger.Config(log_output=sys.stderr)) as client:
        src = client.host().directory(".")
        source = (
            client.container()
            # pull container
            .from_("python:3.11-slim-bullseye")
            # mount source directory
            .with_directory("/ws", src)
        )
        # install package
        runner = (
            source.with_workdir("/ws")
            .with_exec(["pip", "install", "-r", "requirements.txt"])
            .with_exec(["pip", "install", ".[test]"])
        )
        # run tests
        test = runner.with_exec(["pytest", "-v", "tests"])
        # build python package; chaining from `test` ensures the tests
        # actually execute (Dagger is lazy) and must pass before the build
        build_dir = (
            test.with_exec(
                ["python3", "-m", "pip", "install", "--upgrade", "build"]
            )
            .with_exec(["python3", "-m", "build"])
            .directory("dist")
        )
        await build_dir.export("dist")
        # build and publish image
        image_ref = "marvelousmlops/dagger_example:latest"
        secret = client.set_secret(
            name="dockerhub_secret", plaintext=os.environ["DOCKERHUB_TOKEN"]
        )
        build = (
            src.with_directory("/tmp/dist", client.host().directory("dist"))
            .docker_build(dockerfile="Dockerfile_dagger")
            .with_registry_auth(
                # registry host, not the full image reference
                address="docker.io",
                secret=secret,
                username=os.environ["DOCKERHUB_USER"],
            )
        )
        await build.publish(image_ref)
        print(f"Published image to: {image_ref}")


if __name__ == "__main__":
    anyio.run(test_and_publish)
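The image reference above is hard-coded to latest; on CI you may prefer tagging with the commit SHA so every pipeline run produces a traceable image. A hedged sketch of such a helper (`image_ref` is our own addition, not part of Dagger; it relies on GITHUB_SHA, which GitHub Actions sets automatically on its runners):

```python
import os


def image_ref(repository: str, default_tag: str = "latest") -> str:
    """Build an image reference, preferring the short commit SHA from CI.

    GITHUB_SHA is present on GitHub Actions runners; locally it is usually
    absent, so the tag falls back to `default_tag`.
    """
    sha = os.environ.get("GITHUB_SHA")
    tag = sha[:7] if sha else default_tag
    return f"{repository}:{tag}"
```

In the pipeline, replacing the hard-coded string with `image_ref("marvelousmlops/dagger_example")` would keep the local behavior (latest) while tagging CI builds per commit.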
The GitHub Actions pipeline is minimal compared to the one we saw earlier. It is important to mention that we expose the secrets as environment variables in the GitHub Actions workflow so that the Dagger pipeline can access them.
name: dagger
on:
  push:
    branches: [master]
  workflow_dispatch:

jobs:
  build:
    name: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: '3.11'
      - name: Install deps
        run: pip install dagger-io==0.9.11
      - name: Install Dagger CLI
        run: cd /usr/local && { curl -L https://dl.dagger.io/dagger/install.sh | sh; cd -; }
      - name: Setup env vars
        run: |
          echo "DOCKERHUB_TOKEN=${{ secrets.DOCKERHUB_TOKEN }}" >> $GITHUB_ENV
          echo "DOCKERHUB_USER=${{ secrets.DOCKERHUB_USER }}" >> $GITHUB_ENV
      - name: Run Dagger pipeline
        run: dagger run python ci/test_base.py
The benefit is that this workflow will barely change over time, no matter how complex the Python pipeline definition gets. This makes migrating to another CI/CD provider very simple.
Check the full code in the repository: https://github.com/marvelousmlops/dagger_example.
Conclusions
Overall, based on the example, we can say that Dagger comes with the following advantages:
- Python is much more flexible than GitHub Actions YAML definitions, which allows you to create more complex flows and more elegant solutions.
- What runs locally is exactly what runs on your CI/CD provider.
- It is platform-agnostic, making you less dependent on the chosen CI/CD provider.
Seems like a nice way to run CI/CD pipelines!