Sharpen your cookiecutter: speed up repo creation with workflows
We talk about ChatGPT, Bard, DALL-E and all the other cool things recently released in the AI field, but let’s face reality: we still have a long way to go to improve the maturity of data and AI in our organizations. Too often, we see data analysts who do not use version control and run their analyses only on their local machines. We aim to make their first step easier by providing a workflow that creates a repository with all the necessary files and permissions. And yes, it starts with cookiecutter.
The main purpose of using a cookiecutter template is to reuse existing project structures and avoid tedious, old-school copy-pasting. It’s very easy to get started on a project using cookiecutter. You prepare a template with the necessary files, define some parameters (project name, author name, repo name, etc.), and run the cookiecutter command to create a folder from the template. This way, you ensure that mandatory files like README and .gitignore are not forgotten. It also nudges developers to follow standards, write tests, fill in docstrings, etc.
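To make the templating idea concrete, here is a minimal sketch of how placeholders in folder names and file contents are filled in from parameters. It is a toy substitution written for illustration, not cookiecutter's implementation (cookiecutter itself renders with Jinja2), and the template files and the parameter names repo_name and author_name are assumptions.

```python
import re

# Matches placeholders such as {{cookiecutter.repo_name}} or {{ cookiecutter.repo_name }}.
PLACEHOLDER = re.compile(r"\{\{\s*cookiecutter\.(\w+)\s*\}\}")


def render(text: str, context: dict) -> str:
    """Replace every cookiecutter-style placeholder with its value from context."""
    return PLACEHOLDER.sub(lambda match: str(context[match.group(1)]), text)


def render_template(files: dict, context: dict) -> dict:
    """Render both the paths and the contents of an in-memory template."""
    return {render(path, context): render(body, context) for path, body in files.items()}


# Hypothetical template: one folder named after the repo, with two files inside.
template = {
    "{{cookiecutter.repo_name}}/README.md": "# {{ cookiecutter.repo_name }}\nOwner: {{cookiecutter.author_name}}\n",
    "{{cookiecutter.repo_name}}/.gitignore": "__pycache__/\n",
}
context = {"repo_name": "ds-churn", "author_name": "Jane Doe"}

project = render_template(template, context)
for path in sorted(project):
    print(path)
```

The same parameters drive both the folder name and the file contents, which is why a single set of inputs is enough to produce a complete project skeleton.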
This is all great, and it works perfectly on a local machine. But how about automating the end-to-end process, including the repository creation, not just the content folder?
Especially for data analysts or early-stage data scientists, it is much easier to trigger a workflow on GitHub and have the repository created for them than to do it manually.
In this article, we will show you how we constructed our cookiecutter template, which not only provides the content structure but also creates the repository in your organization with the required permissions.
It’s as simple as triggering a GitHub Actions workflow. You can see our example repository here: https://github.com/marvelousmlops/cookiecutter-datascience
Disclaimer: We didn’t pick a commonly used Data Science project template with data, model, notebook, etc. folders and many configuration files. We kept our template as simple as possible for two reasons: 1. We don’t think you need all these models/, docs/, notebooks/, references/, reports/ folders to start with. 2. We want to focus on automating the creation of a repository.
Optional Prerequisites:
1. Team creation
In any organization, it can be very handy to create teams that group certain users. This lets you check users and team memberships to control who can create which repository in the organization. For example, we created a team called ml-engineers in our organization marvelousmlops. We will use this later for checking the user and team.
2. GitHub App
A GitHub App is a good way to obtain a token when secrets.GITHUB_TOKEN is too limited (it is scoped to the repository that runs the workflow) and a personal access token is not suitable.
Follow the instructions to create a GitHub App: https://docs.github.com/en/apps/creating-github-apps/setting-up-a-github-app/creating-a-github-app
You should generate a GitHub App within your organization account. https://github.com/organizations/<your-organization-name>/settings/apps
Choose only necessary permissions. For security reasons, it’s advised to keep permissions as minimal as possible for each GitHub app you create.
The permissions you need to create a repository and add collaborators:
Repository Permission → Administration: read-write (to create repos)
Repository Permission → Contents: read-write (to commit and push to repos)
Repository Permission → Workflows: read-write (to create workflows in repos, e.g. CI.yml)
Repository Permission → Metadata: read-only (default, mandatory)
Organization Permission → Members: read-only (to check members and teams within the organization)
The configuration page of your GitHub App shows its App ID, and you can generate a private key on the same page. Both will be needed in our cookiecutter repository.
The user who is creating the app must be an admin in the organization as we want to install this app on the organization to use in our repositories.
Then go to your organization page and install the GitHub App you just created. Instructions: https://docs.github.com/en/apps/maintaining-github-apps/installing-github-apps
https://github.com/organizations/<your-organization-name>/settings/installations
In our example, it looks like this:
Creating a team and a GitHub App is optional, but highly recommended. Both are good practices that give you control over repository creation within your organization and let you avoid personal access tokens.
Cookiecutter template repository
You need to create a repository in your organization for your customized cookiecutter template. We created a repository called cookiecutter-datascience.
https://github.com/marvelousmlops/cookiecutter-datascience
If you choose to use a GitHub App for authentication, you must add APP_ID and APP_PRIVATE_KEY (mentioned above) as secrets to this repository. APP_ID is the App ID shown on the GitHub App configuration page. APP_PRIVATE_KEY is the private key you can generate on the same page. The workflow uses these secrets to authenticate as the GitHub App and generate a token that gives it permission to create a repository.
If you choose to use a personal access token, add it as a secret to your repository and use it in the workflow directly.
This is what it looks like after adding the two secrets to the repository.
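For readers curious about what happens with APP_ID and APP_PRIVATE_KEY, the sketch below outlines the GitHub App authentication flow as described in GitHub's documentation: the app signs a short-lived JWT with its private key and exchanges it for an installation access token. The function names are hypothetical, the RS256 signing step is deliberately left out because it needs a third-party library, and no request is actually sent. In our workflow, a ready-made action takes care of all of this.

```python
import urllib.request

GITHUB_API = "https://api.github.com"


def app_jwt_claims(app_id: str, now: int) -> dict:
    """Build the claims of the JWT that identifies the GitHub App.

    The JWT must still be signed with the app's private key (RS256),
    which is not shown here.
    """
    return {
        "iat": now - 60,   # issued slightly in the past to tolerate clock drift
        "exp": now + 540,  # must be no more than 10 minutes in the future
        "iss": app_id,     # the App ID from the app's configuration page
    }


def installation_token_request(installation_id: int, signed_jwt: str) -> urllib.request.Request:
    """Prepare (but do not send) the request that exchanges the JWT for a token."""
    return urllib.request.Request(
        f"{GITHUB_API}/app/installations/{installation_id}/access_tokens",
        method="POST",
        headers={
            "Authorization": f"Bearer {signed_jwt}",
            "Accept": "application/vnd.github+json",
        },
    )


# Placeholder values, for illustration only.
claims = app_jwt_claims("12345", now=1_700_000_000)
request = installation_token_request(42, "signed-jwt-goes-here")
print(claims)
print(request.get_method(), request.full_url)
```

The resulting installation token is short-lived and limited to the permissions granted to the app, which is the main advantage over a long-lived personal access token.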
GitHub Actions Workflow to create a repo
In the cookiecutter-datascience repository, the project template is provided in the folder {{cookiecutter.reponame}}. As we mentioned earlier, it’s as minimal as possible. You can add or remove any file or folder according to your project’s needs.
The most important part is the workflow that creates the repository: .github/workflows/create_repo.yml
This workflow does the following:
1. Takes the necessary inputs: repo_name, product_name, operation_team (you can modify, add, or remove inputs as needed).
2. Generates a token using the GitHub App (you can change this to use a personal access token from secrets).
3. Checks the user and team membership (optional). This is an additional action, defined further below. If you don’t need to check team membership, remove this step, and don’t forget to delete or modify lines 65 and 71, where TEAM_NAME is used as an environment variable.
4. Renders cookiecutter (simply rendering a Jinja template).
5. Creates a new repository and pushes the rendered content to it.
name: Create Repo
on:
  workflow_dispatch:
    inputs:
      repo_name:
        description: 'Name of the repository to be created'
        required: true
        default: ''
      product_name:
        description: 'Name of the product, eg. topn, search'
        required: true
        default: ''
      operation_team:
        description: 'Operation team, eg. MLOps, Data Science'
        required: true
        default: ''
jobs:
  create-repository:
    runs-on: ubuntu-20.04
    name: Creating Organization Repository
    steps:
      - name: Echo inputs
        id: echo_inputs
        run: |
          echo "Repo name: ${{ inputs.repo_name }}"
          echo "Product_name: ${{ inputs.product_name }}"
          echo "Operation team: ${{ inputs.operation_team }}"
          echo "Who triggered the workflow: ${{ github.actor }}"
      # Generate a token using the GitHub App
      - name: Generate token
        id: generate_token
        uses: tibdex/github-app-token@v1
        with:
          app_id: ${{ secrets.APP_ID }}
          private_key: ${{ secrets.APP_PRIVATE_KEY }}
      - name: Setup git token
        id: setup_git_token
        shell: bash
        run: |
          echo "GITHUB_TOKEN=${{ steps.generate_token.outputs.token }}" >> $GITHUB_ENV
      - name: Checkout the project repo
        id: checkout_repo
        uses: actions/checkout@v3
      - name: Check user team and repo name
        uses: ./.github/actions/check_user
        id: check_user
        with:
          user: ${{ github.actor }}
          repo_name: ${{ inputs.repo_name }}
      - name: Install dependencies
        run: |
          pip install jinja2-cli==0.8.2
          pip install cookiecutter==2.1.1
        shell: bash
      - name: Render cookiecutter.json
        env:
          TEAM_NAME: ${{ steps.check_user.outputs.team_name }}
        run: |
          jinja2 cookiecutter_tmpl.json \
            -D "repo_name=${{ inputs.repo_name }}" \
            -D "product_name=${{ inputs.product_name }}" \
            -D "operation_team=${{ inputs.operation_team }}" \
            -D "code_owners=@${{ env.TEAM_NAME }}" \
            > cookiecutter.json
          cat cookiecutter.json
      - name: Run cookiecutter
        run: |
          cookiecutter . --no-input
      - name: Push to repo
        run: |
          git config --global user.email "github-actions[bot]@users.noreply.github.com"
          git config --global user.name "github-actions[bot]"
          # Set the default branch before `git init` so that it takes effect
          git config --global init.defaultBranch master
          cd ./${{ inputs.repo_name }}
          git init
          git add .
          git commit -m "Add cookiecutter template"
          gh repo create marvelousmlops/${{ inputs.repo_name }} --source=. --public --remote=origin --push
      - name: Add user to ${{ inputs.repo_name }}
        shell: bash
        run: |
          gh api --method=PUT "repos/marvelousmlops/${{ inputs.repo_name }}/collaborators/${{ github.actor }}" -f permission=admin
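The "Render cookiecutter.json" step can be hard to picture because cookiecutter_tmpl.json is not shown here. As a rough illustration, the Python sketch below builds the kind of JSON that step might produce from the workflow inputs. It uses the json module as a stand-in for jinja2-cli, and it assumes the template simply maps each -D variable to a key of the same name, which may not match the real template file.

```python
import json


def render_cookiecutter_json(repo_name: str, product_name: str,
                             operation_team: str, team_name: str) -> str:
    """Build a cookiecutter.json body from the workflow inputs.

    The key names mirror the -D variables passed to jinja2 in the workflow;
    the actual keys in cookiecutter_tmpl.json are an assumption.
    """
    context = {
        "repo_name": repo_name,
        "product_name": product_name,
        "operation_team": operation_team,
        # The workflow prefixes the team with "@" so it can be used as a code owner.
        "code_owners": f"@{team_name}",
    }
    return json.dumps(context, indent=2)


# Example values, chosen for illustration only.
rendered = render_cookiecutter_json(
    repo_name="ds-churn",
    product_name="search",
    operation_team="Data Science",
    team_name="marvelousmlops/data-scientists",
)
print(rendered)
```

Because cookiecutter is then run with --no-input, every value it needs has to be present in the rendered cookiecutter.json, which is why all workflow inputs are marked as required.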
Optional: check team membership (assuming that teams called ml-engineers and data-scientists exist in the organization). This is the composite action in .github/actions/check_user that the workflow calls:
name: Check user team and repository name
description: >
  Check if the user is part of the team
inputs:
  user:
    description: 'User to check'
    required: true
  repo_name:
    description: 'Repository name'
    required: true
outputs:
  team_name:
    description: Team name in GitHub
    value: ${{ steps.check_team_and_repo.outputs.team_name }}
runs:
  using: "composite"
  steps:
    - name: Check user for team ml-engineers
      uses: tspascoal/get-user-teams-membership@v2
      id: mlengineers
      with:
        GITHUB_TOKEN: ${{ env.GITHUB_TOKEN }}
        username: ${{ github.actor }}
        team: "ml-engineers"
    - name: Check user for team data-scientists
      uses: tspascoal/get-user-teams-membership@v2
      id: datascientists
      with:
        GITHUB_TOKEN: ${{ env.GITHUB_TOKEN }}
        username: ${{ github.actor }}
        team: "data-scientists"
    # Example conditions: if the user is a member of ml-engineers, they can create any repo
    # If the user is a member of data-scientists, the repo name should start with ds-
    - name: Check if team membership and repo name match
      id: check_team_and_repo
      shell: bash
      run: |
        if [[ "${{ steps.mlengineers.outputs.isTeamMember }}" == "true" ]]; then
          team="marvelousmlops/ml-engineers"
          echo "team_name=$team" >> "$GITHUB_OUTPUT"
          echo "user belongs to $team"
        elif [[ "${{ inputs.repo_name }}" == "ds-"* && "${{ steps.datascientists.outputs.isTeamMember }}" == "true" ]]; then
          team="marvelousmlops/data-scientists"
          echo "team_name=$team" >> "$GITHUB_OUTPUT"
          echo "user belongs to $team"
        else
          echo "user is not a member of expected team to create repo with given name"
          exit 1
        fi
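The decision rule in the last step is easier to reason about (and to unit test) when written as a plain function. Below is a minimal Python sketch of the same rule as stated in the comments of the action: ml-engineers may create any repository, while data-scientists may only create repositories whose name starts with ds-. The function name resolve_team is hypothetical and is not part of the repository.

```python
def resolve_team(repo_name: str, is_ml_engineer: bool, is_data_scientist: bool,
                 org: str = "marvelousmlops") -> str:
    """Return the team that will own the new repo, or raise if creation is not allowed."""
    if is_ml_engineer:
        # ml-engineers can create a repository with any name.
        return f"{org}/ml-engineers"
    if is_data_scientist and repo_name.startswith("ds-"):
        # data-scientists are restricted to the ds- prefix.
        return f"{org}/data-scientists"
    raise PermissionError(
        "user is not a member of expected team to create repo with given name"
    )


print(resolve_team("any-name", is_ml_engineer=True, is_data_scientist=False))
print(resolve_team("ds-churn", is_ml_engineer=False, is_data_scientist=True))
```

Raising an exception here plays the role of exit 1 in the bash step: the workflow stops before any repository is created.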