
Commit a590ab5

Create CD pipeline
1 parent: 73b799b

File tree: .github/workflows/cd.yml, Makefile, docs/challenge.md, train.py

4 files changed, +142 -4 lines changed


.github/workflows/cd.yml (+86)

@@ -0,0 +1,86 @@
name: 'Continuous Deployment'

on:
  push:
    branches:
      - main
      - develop
      - release/*

jobs:
  deployment:
    runs-on: ubuntu-latest
    environment: dev
    env:
      branch: main

    steps:
      - uses: actions/checkout@v4

      - name: Get the branch name
        id: get_branch_name
        run: |
          echo "branch=${GITHUB_HEAD_REF:-${GITHUB_REF#refs/heads/}}" >> $GITHUB_OUTPUT

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.9'

      - name: Authenticate to GCP
        uses: 'google-github-actions/auth@v2'
        with:
          credentials_json: '${{ secrets.CD_SA_KEYS }}'

      - name: Install dependencies
        run: |
          pip install -r requirements.txt -r requirements-dev.txt

      - name: Run training script
        run: |
          python train.py

      - name: Authenticate Docker to GAR
        uses: docker/login-action@v3
        with:
          registry: '${{ vars.GCP_REGION }}-docker.pkg.dev'
          username: _json_key
          password: ${{ secrets.CD_SA_KEYS }}

      - name: Build and push Docker image
        uses: docker/build-push-action@v6
        with:
          push: true
          tags: '${{ vars.GAR_REPOSITORY }}/${{ vars.GAR_IMAGE_NAME }}-${{ steps.get_branch_name.outputs.branch }}'

      - name: Deploy the service to Cloud Run
        id: 'deploy'
        uses: 'google-github-actions/deploy-cloudrun@v2'
        with:
          service: '${{ vars.GCR_SERVICE_NAME }}-${{ steps.get_branch_name.outputs.branch }}'
          image: '${{ vars.GAR_REPOSITORY }}/${{ vars.GAR_IMAGE_NAME }}-${{ steps.get_branch_name.outputs.branch }}'
          region: '${{ vars.GCP_REGION }}'
          flags: '--allow-unauthenticated'

    outputs:
      service_url: ${{ steps.deploy.outputs.url }}

  stress_test:
    runs-on: ubuntu-latest
    needs: deployment

    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.9'

      - name: Install dependencies
        run: |
          pip install -r requirements-test.txt

      - name: Run stress test
        run: |
          make stress-test API_URL=${{ needs.deployment.outputs.service_url }}
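
The "Get the branch name" step relies on Bash parameter expansion, which can be hard to read at a glance. Purely as an illustrative sketch (the workflow itself keeps the Bash one-liner), here is the equivalent logic in Python, assuming the standard GitHub Actions environment variables GITHUB_HEAD_REF (set on pull request events) and GITHUB_REF (e.g. refs/heads/develop on pushes):

import os

def branch_name() -> str:
    """Mirror ${GITHUB_HEAD_REF:-${GITHUB_REF#refs/heads/}} from the workflow."""
    head_ref = os.environ.get("GITHUB_HEAD_REF", "")
    if head_ref:  # non-empty only for pull_request events
        return head_ref
    # On push events, GITHUB_REF looks like refs/heads/<branch>.
    return os.environ.get("GITHUB_REF", "").removeprefix("refs/heads/")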

Makefile (+1 -1)

@@ -23,7 +23,7 @@ install: ## Install dependencies
pip install -r requirements-test.txt
pip install -r requirements.txt

-STRESS_URL = https://delay-model-dpmrk4cwxq-uw.a.run.app
+STRESS_URL = $(API_URL)
.PHONY: stress-test
stress-test:
# change stress url to your deployed app

docs/challenge.md (+28 -3)

@@ -128,8 +128,33 @@ The results of the stress test are an error rate of 0%, an average response time

On this final step, the goal is to set up a proper CI/CD pipeline.

-The CI workflows focus on running the tests and assesing the quality of the code each time there's a push to the repository, with the goal of detecting bugs earlier, correcting code faster and ensuring good code quality practices.
+The Continuous Integration (CI) workflow focuses on running the tests and assessing the quality of the code each time there is a push to the repository, with the goal of detecting bugs earlier, correcting code faster and ensuring good code-quality practices.

-The CD workflows focus on training the model, deploying the API and running the stress testing against it. These workflows only run when there's a push to the `main`, `develop` or `release` branches on the repository.
+The Continuous Deployment (CD) workflow focuses on training the model, deploying the API and running the stress test against it. This workflow only runs when there is a push to the `main`, `develop` or `release` branches.

-* Undesirable model tracking
+Let's describe each workflow in more detail.

### Continuous Integration

The goals of this workflow are to check the quality of the code and to test it. For the first goal, the code is checked with `black`, `flake8` and `isort` to ensure that the style and format are correct and fit the repository standards. For the second goal, the provided test suites (`model-test` and `api-test`) are run to ensure that changes to the code don't break the functionality of the `DelayModel` class and the API.

**Observation:** The test suites require a trained model to be available for testing. However, these suites run on GitHub-hosted runners and don't have access to local models. To circumvent this, the model checkpoint is tracked with Git and pushed to the remote. This is not desirable, since models can grow rapidly in size and managing them inside the repository can become a problem. The ideal solution would be to maintain a proper Model Registry, with remote storage and proper version management, so that trained models can be uploaded to it and downloaded for testing or deployment. Due to time restrictions, and since the model checkpoint is lightweight in this case, the decision was made to track the model.
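
To make the suggested alternative concrete, here is a minimal sketch of the storage side of such a registry using the `google-cloud-storage` client; the bucket name and object layout are hypothetical and are not resources that exist in this project:

from google.cloud import storage

REGISTRY_BUCKET = "delay-model-registry"  # hypothetical bucket name

def upload_model(local_path: str, version: str) -> str:
    """Upload a trained checkpoint under a versioned object path."""
    blob = storage.Client().bucket(REGISTRY_BUCKET).blob(f"models/{version}/model_checkpoint.pkl")
    blob.upload_from_filename(local_path)
    return f"gs://{REGISTRY_BUCKET}/{blob.name}"

def download_model(version: str, local_path: str) -> None:
    """Fetch a specific model version for testing or deployment."""
    blob = storage.Client().bucket(REGISTRY_BUCKET).blob(f"models/{version}/model_checkpoint.pkl")
    blob.download_to_filename(local_path)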

### Continuous Deployment

The goal of this workflow is to train the model, build a Docker image containing it, and deploy that image to a Cloud Run service. The workflow only runs when there is a push to the `main`, `develop` or `release` branches, and it deploys a separate API for each of them. The reasoning is that having separate deployments for the different stages of feature development and releases makes it possible to test how changes affect the deployment, while keeping the `main` API intact and serving only released features.
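
For concreteness, the per-branch names are composed in `cd.yml` from the environment's configuration variables and the branch name. Restated as plain Python (the example values are the `dev` environment variables listed below):

GAR_REPOSITORY = "us-west1-docker.pkg.dev/rodrigo-tryolabs-latam/delay-model-service"
GAR_IMAGE_NAME = "delay-model-api"
GCR_SERVICE_NAME = "delay-model"

def per_branch_names(branch: str) -> dict:
    """Restate the image tag and service name expressions used in cd.yml."""
    return {
        "image": f"{GAR_REPOSITORY}/{GAR_IMAGE_NAME}-{branch}",
        "service": f"{GCR_SERVICE_NAME}-{branch}",
    }

# e.g. per_branch_names("develop")["service"] == "delay-model-develop"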

Here are the most important steps taken to develop this workflow:

* A GCP Service Account, `cd-pipeline-sa`, was created to grant the GitHub Actions runner permission to push the Docker image to the Artifact Registry repository and to deploy the Cloud Run service. The roles given to this SA are:
  - `Artifact Registry Writer`: allows the SA to push Docker images to Artifact Registry repositories.
  - `Cloud Run Admin`: gives the SA full control over the deployed Cloud Run services.
  - `Service Account User`: lets the SA act as the default Cloud Run service account, which is required when deploying from the GitHub Action.

  We created a single SA for simplicity, since it is only used in one workflow. Ideally, there would be multiple SAs, each with more granular, reduced permissions; for example, a "Cloud Run SA" that only controls the services, and a separate "Artifact Registry SA" that only has access to the repository.
* A `dev` environment was created in the GitHub repository, containing configuration variables (mostly names used throughout the GCP deployment) and secrets (the key for the `cd-pipeline-sa` SA). The configuration variables are:
  - `GAR_IMAGE_NAME=delay-model-api`
  - `GAR_REPOSITORY=us-west1-docker.pkg.dev/rodrigo-tryolabs-latam/delay-model-service`
  - `GCP_PROJECT_ID=rodrigo-tryolabs-latam`
  - `GCP_REGION=us-west1`
  - `GCR_SERVICE_NAME=delay-model`
* After the service is deployed, the stress test runs against the deployed API. As mentioned, a different API is deployed depending on the branch. To point the stress-test script at the correct API, a small modification to the `Makefile` was needed so that the API URL is passed as an argument to the `make stress-test` command. The final command is `make stress-test API_URL=<api-url>`.
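
The diff does not show what `make stress-test` executes under the hood. Purely as an illustration of the kind of check it performs, here is a minimal load probe in Python that reports the two metrics mentioned earlier (error rate and average response time); the probe endpoint, request count and concurrency are assumptions, not values taken from the repository:

import time
from concurrent.futures import ThreadPoolExecutor

import requests

API_URL = "https://delay-model-develop-xxxxx.a.run.app"  # placeholder; use the deployed service URL

def one_request(_: int):
    """Time a single request and record whether it succeeded."""
    start = time.perf_counter()
    response = requests.get(API_URL, timeout=10)  # hypothetical probe against the service root
    return response.status_code < 400, time.perf_counter() - start

with ThreadPoolExecutor(max_workers=20) as pool:
    results = list(pool.map(one_request, range(200)))

errors = sum(1 for ok, _ in results if not ok)
avg_latency = sum(latency for _, latency in results) / len(results)
print(f"error rate: {errors / len(results):.1%}, average response time: {avg_latency:.3f}s")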

train.py (+27)

@@ -0,0 +1,27 @@
import pandas as pd

from challenge.model import DelayModel

print("Loading data...")
# Read the data
df = pd.read_csv("data/data.csv")
print("-> Data loaded")

# Create the model
model = DelayModel()

print("Preprocessing data...")
# Preprocess the data
X_train, y_train = model.preprocess(df, "delay")
print("-> Preprocessed data")

print("Training model...")
# Train the model
model.fit(X_train, y_train)
print("-> Model trained")

print("Saving model...")
# Store the model
model.save("challenge/tmp/model_checkpoint.pkl")
print("-> Model saved")
