
chore(ci): add e2e tests for cloud distros #1259

Open
wants to merge 81 commits into base: main
81 commits
c17641b
chore: add e2e tests for cloud distros
noahpb Feb 5, 2025
4025736
lint fix
noahpb Feb 5, 2025
5ba76b0
add validate task
noahpb Feb 5, 2025
aa1c4d7
trigger workflow
noahpb Feb 5, 2025
9fca24c
mute task, make passthrough gw validation default to false
noahpb Feb 5, 2025
f3235e5
add metrics-server
noahpb Feb 5, 2025
ec770f5
exclude metrics-server on aks, group into one task
noahpb Feb 6, 2025
a80d8fc
syntax fix maybe?
noahpb Feb 6, 2025
27cb7fd
no metrics server on eks
noahpb Feb 6, 2025
391dbc0
more syntax fun
noahpb Feb 6, 2025
8ae0ab8
try differnt syntax
noahpb Feb 6, 2025
5871a11
change logic in if statement
noahpb Feb 6, 2025
6bba450
add additional entries to hosts file
noahpb Feb 6, 2025
97185ba
Merge branch 'main' into feat/e2e-test-nightly
noahpb Feb 13, 2025
3a4d424
query for hostname
noahpb Feb 13, 2025
1900b7b
get ip from hostname, lint
noahpb Feb 14, 2025
3cf1f9a
allow e2e test to be built for multiple archs
noahpb Feb 18, 2025
a2f0432
fix arch value
noahpb Feb 18, 2025
edb4d69
switch arch default
noahpb Feb 18, 2025
ea443d1
Merge branch 'main' into feat/e2e-test-nightly
noahpb Feb 18, 2025
f4c1e4c
workaround for eks e2e tests
noahpb Feb 19, 2025
9cd4b65
rm temp workaround
noahpb Feb 19, 2025
b1c9103
add aws-lb-controller to rke2, cleanup
noahpb Feb 20, 2025
7d8437f
Merge branch 'main' into feat/e2e-test-nightly
noahpb Feb 21, 2025
c4794d2
debugging
noahpb Feb 21, 2025
770b805
use `dig` instead of `curl` for ipv4 lookup
noahpb Feb 21, 2025
71398c8
switch to azure gov
noahpb Feb 21, 2025
38206c7
set azure gov env
noahpb Feb 21, 2025
ce92ce4
set audience and arm env
noahpb Feb 21, 2025
19af81a
set environment in config block
noahpb Feb 21, 2025
9766662
try access key
noahpb Feb 21, 2025
8529730
add logic for fetching az token
noahpb Feb 24, 2025
1350d44
revert using key
noahpb Feb 24, 2025
510c15d
update region
noahpb Feb 24, 2025
38a7454
add `availability_zone` for rke2 agent node(s)
noahpb Feb 24, 2025
8fbfb08
switch availability zone
noahpb Feb 24, 2025
e8cf788
Merge branch 'main' into feat/e2e-test-nightly
noahpb Feb 24, 2025
a657f9e
sku and psql dns fix
noahpb Feb 24, 2025
8aaa050
buy time to debug
noahpb Feb 24, 2025
ef7c0e2
dns fix and os disk type
noahpb Feb 24, 2025
3da3c74
switch to zone a
noahpb Feb 24, 2025
b33f8bf
update velero credentials override
noahpb Feb 24, 2025
77a216d
rm temp debug changes
noahpb Feb 24, 2025
b7503da
rm unused terraform.tfvars
noahpb Feb 24, 2025
f5f03e4
tmp disable eks e2e testing
noahpb Feb 24, 2025
e5b4c4c
increase disk space for rke2 nodes
noahpb Feb 24, 2025
87b87c1
specify `backup.velero.io` kind in `./src/velero/tasks.yaml`
noahpb Feb 25, 2025
65ca3dd
Merge branch 'main' into feat/e2e-test-nightly
noahpb Feb 25, 2025
6d20335
more disk for rke2 nodes
noahpb Feb 25, 2025
6ae6eda
Merge branch 'main' into feat/e2e-test-nightly
noahpb Feb 25, 2025
cba3465
create new task for coredns-custom cm
noahpb Feb 25, 2025
9df9e48
rm task call
noahpb Feb 25, 2025
346f8cb
Merge branch 'main' into feat/e2e-test-nightly
noahpb Mar 6, 2025
4accee0
update rke2 tasks to do coredns override
noahpb Mar 6, 2025
b00e4d1
Merge branch 'main' into feat/e2e-test-nightly
noahpb Mar 6, 2025
0591e66
fix lint
noahpb Mar 6, 2025
270f35f
address pr feedback
noahpb Mar 6, 2025
29271d9
switch to gov iam role
noahpb Mar 7, 2025
50232fd
include util task
noahpb Mar 7, 2025
5056ab4
add coredns fixes and add netpol for rke2
noahpb Mar 7, 2025
79969f5
Merge branch 'main' into feat/e2e-test-nightly
noahpb Mar 7, 2025
335353f
fix task name
noahpb Mar 7, 2025
c37402b
include nested dirs in workflow trigger for rke2 IaC
noahpb Mar 10, 2025
f7f4e22
fix: adjust network tests to work across k8s distros
noahpb Mar 11, 2025
07a437f
metrics fix for rke2 hopefully
noahpb Mar 11, 2025
1aeb6d3
add args for etcd and kube-scheduler
noahpb Mar 11, 2025
de2e1dc
add sudo
noahpb Mar 12, 2025
cafa022
fix etcd args
noahpb Mar 12, 2025
e02a1b1
add eks testing and coredns patch
noahpb Mar 12, 2025
bdbfa14
eks coredns cm patch
noahpb Mar 12, 2025
614cb57
rke2 components update
noahpb Mar 12, 2025
b11335b
Merge branch 'main' into feat/e2e-test-nightly
noahpb Mar 12, 2025
bfc9375
ignore e2e test failures on eks
noahpb Mar 13, 2025
7c291e4
switch to local path provisioner
noahpb Mar 13, 2025
1b1d7a9
dont expand vars
noahpb Mar 13, 2025
994e5aa
Merge branch 'main' into feat/e2e-test-nightly
noahpb Mar 13, 2025
bdf747c
selinux fix for local path provisioner rke2
noahpb Mar 13, 2025
c097ba9
install longhorn
noahpb Mar 14, 2025
7e885ad
no longhorn ui replicas
noahpb Mar 14, 2025
f8bee5e
Merge branch 'main' into feat/e2e-test-nightly
noahpb Mar 14, 2025
5368bb4
rm `chcon` command for local path provisioner
noahpb Mar 14, 2025
14 changes: 14 additions & 0 deletions .github/bundles/rke2/uds-bundle.yaml
@@ -44,6 +44,20 @@ packages:
- istio-ambient
- metrics-server
overrides:
istio-admin-gateway:
gateway:
values:
- path: service.annotations
value:
service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
service.beta.kubernetes.io/aws-load-balancer-target-node-labels: "kubernetes.io/os=linux"
istio-tenant-gateway:
gateway:
values:
- path: service.annotations
value:
service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
service.beta.kubernetes.io/aws-load-balancer-target-node-labels: "kubernetes.io/os=linux"
velero:
velero:
variables:
2 changes: 1 addition & 1 deletion .github/test-infra/aws/rke2/data.tf
@@ -10,7 +10,7 @@ data "aws_vpc" "vpc" {

data "aws_subnet" "rke2_ci_subnet" {
vpc_id = data.aws_vpc.vpc.id
availability_zone = "${var.region}c"
availability_zone = "${var.region}a"

filter {
name = "tag:Name"
10 changes: 10 additions & 0 deletions .github/test-infra/aws/rke2/iam.tf
@@ -77,6 +77,16 @@ data "aws_iam_policy_document" "aws_ccm" {
}
}

data "http" "aws-lb-controller-iam" {
url = "https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/v2.11.0/docs/install/iam_policy_us-gov.json"
}

resource "aws_iam_role_policy" "aws-lb-controller" {
name = "${local.cluster_name}-lb-controller"
role = aws_iam_role.rke2_server.id
policy = data.http.aws-lb-controller-iam.response_body
}

resource "aws_iam_role_policy" "s3_token" {
name = "${local.cluster_name}-server-token"
role = aws_iam_role.rke2_server.id
8 changes: 5 additions & 3 deletions .github/test-infra/aws/rke2/main.tf
@@ -26,6 +26,7 @@ locals {
ccm_external = true,
token_bucket = module.statestore.bucket,
token_object = module.statestore.token_object
cluster_name = local.tags.cluster_name
}
}

@@ -95,7 +96,7 @@ resource "aws_instance" "rke2_ci_control_plane_node" {
associate_public_ip_address = true

root_block_device {
volume_size = 100
volume_size = 250
}

tags = merge(local.tags, { "kubernetes.io/cluster/${local.cluster_name}" = "owned" })
@@ -107,15 +108,16 @@ resource "aws_instance" "rke2_ci_agent_node" {
ami = data.aws_ami.rhel_rke2.image_id
instance_type = var.agent_instance_type
key_name = aws_key_pair.control_plane_key_pair.key_name
user_data = templatefile("${path.module}/scripts/user_data.sh", merge(local.userdata, { BOOTSTRAP_IP = aws_instance.rke2_ci_bootstrap_node.private_ip }))
user_data = templatefile("${path.module}/scripts/user_data.sh", merge(local.userdata, { BOOTSTRAP_IP = aws_instance.rke2_ci_bootstrap_node.private_ip, AGENT_NODE = true }))
subnet_id = data.aws_subnet.rke2_ci_subnet.id
user_data_replace_on_change = true
iam_instance_profile = aws_iam_instance_profile.rke2_server.name
vpc_security_group_ids = [aws_security_group.rke2_ci_node_sg.id]
associate_public_ip_address = true
availability_zone = "${var.region}a"

root_block_device {
volume_size = 100
volume_size = 250
}

tags = merge(local.tags, { "kubernetes.io/cluster/${local.cluster_name}" = "owned" })
2 changes: 1 addition & 1 deletion .github/test-infra/aws/rke2/scripts/get-kubeconfig.sh
100644 → 100755
@@ -27,7 +27,7 @@ done
mkdir -p ~/.kube

# Copy kubectl from cluster node
ssh -o StrictHostKeyChecking=no -i key.pem ${node_user}@${bootstrap_ip} "mkdir -p /home/${node_user}/.kube && sudo cp /etc/rancher/rke2/rke2.yaml /home/${node_user}/.kube/config && sudo chown ${node_user} /home/${node_user}/.kube/config" > /dev/null
ssh -o StrictHostKeyChecking=no -i key.pem ${node_user}@${bootstrap_ip} "mkdir -p /home/${node_user}/.kube && sudo cp /etc/rancher/rke2/rke2.yaml /home/${node_user}/.kube/config && sudo chown ${node_user} /home/${node_user}/.kube/config" > /dev/null
scp -o StrictHostKeyChecking=no -i key.pem ${node_user}@${bootstrap_ip}:/home/${node_user}/.kube/config ./rke2-config > /dev/null

# Replace the loopback address with the cluster hostname
42 changes: 24 additions & 18 deletions .github/test-infra/aws/rke2/scripts/user_data.sh
@@ -2,8 +2,6 @@
# Copyright 2024 Defense Unicorns
# SPDX-License-Identifier: AGPL-3.0-or-later OR LicenseRef-Defense-Unicorns-Commercial



info() {
echo "[INFO] " "$@"
}
@@ -42,32 +40,39 @@ spec:
- --cloud-provider=aws
EOM

#longhorn helm values: https://github.com/longhorn/longhorn/tree/master/chart
cat > /var/lib/rancher/rke2/server/manifests/01-longhorn.yaml << EOM
# aws lb controller helm values: https://github.com/kubernetes-sigs/aws-load-balancer-controller/tree/main/helm/aws-load-balancer-controller#configuration
cat > /var/lib/rancher/rke2/server/manifests/01-lb-controller.yaml << EOM
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
name: longhorn
name: aws-load-balancer-controller
namespace: kube-system
spec:
chart: longhorn
repo: https://charts.longhorn.io
version: 1.7.1
targetNamespace: kube-system
chart: aws-load-balancer-controller
repo: https://aws.github.io/eks-charts
version: 1.11.0
targetNamespace: kube-system
valuesContent: |-
clusterName: ${cluster_name}
EOM

#metallb helm values: https://github.com/metallb/metallb/tree/main/charts/metallb
cat > /var/lib/rancher/rke2/server/manifests/02-metallb.yaml << EOM
#longhorn helm values: https://github.com/longhorn/longhorn/tree/master/chart
cat > /var/lib/rancher/rke2/server/manifests/02-longhorn.yaml << EOM
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
name: metallb
name: longhorn
namespace: kube-system
spec:
chart: metallb
repo: https://metallb.github.io/metallb
version: 0.14.8
chart: longhorn
repo: https://charts.longhorn.io
version: 1.8.1
targetNamespace: kube-system
valuesContent: |-
defaultSettings:
deletingConfirmationFlag: true
longhornUI:
replicas: 0
EOM

info "Installing awscli"
@@ -89,14 +94,15 @@ chmod +x yq
./yq -i '.cloud-provider-name += "external"' /etc/rancher/rke2/config.yaml
./yq -i '.disable-cloud-controller += "true"' /etc/rancher/rke2/config.yaml
./yq -i '.kube-apiserver-arg += "service-account-key-file=/irsa/signer.key.pub"' /etc/rancher/rke2/config.yaml
./yq -i '.kube-apiserver-arg += "service-account-key-file=/irsa/signer.key.pub"' /etc/rancher/rke2/config.yaml
./yq -i '.kube-apiserver-arg += "service-account-signing-key-file=/irsa/signer.key"' /etc/rancher/rke2/config.yaml
./yq -i '.kube-apiserver-arg += "api-audiences=kubernetes.svc.default"' /etc/rancher/rke2/config.yaml
./yq -i '.kube-apiserver-arg += "service-account-issuer=https://${BUCKET_REGIONAL_DOMAIN_NAME}"' /etc/rancher/rke2/config.yaml
./yq -i '.kube-apiserver-arg += "audit-log-path=/var/log/kubernetes/audit/audit.log"' /etc/rancher/rke2/config.yaml
#Fix for metrics server scraping of kubernetes api server components
./yq -i '.kube-controller-manager-arg[2] = "bind-address=0.0.0.0"' /etc/rancher/rke2/config.yaml
./yq -i '.kube-scheduler-arg += "bind-address=0.0.0.0"' /etc/rancher/rke2/config.yaml
./yq -i '.etcd-arg += "listen-metrics-urls=http://0.0.0.0:2381"|.etcd-arg style="double"' /etc/rancher/rke2/config.yaml
rm -rf ./yq


}

pre_userdata
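For orientation, the `yq` edits in the hunk above append entries to the RKE2 server config. A rough sketch of what `/etc/rancher/rke2/config.yaml` could contain after they run — illustrative only: the real file also carries values written earlier by the bootstrap script, and `${BUCKET_REGIONAL_DOMAIN_NAME}` is substituted by `templatefile`:

```yaml
# Illustrative sketch of the resulting RKE2 config, not the actual generated file.
cloud-provider-name: external
disable-cloud-controller: "true"
kube-apiserver-arg:
  - service-account-key-file=/irsa/signer.key.pub
  - service-account-signing-key-file=/irsa/signer.key
  - api-audiences=kubernetes.svc.default
  - service-account-issuer=https://${BUCKET_REGIONAL_DOMAIN_NAME}
  - audit-log-path=/var/log/kubernetes/audit/audit.log
# Metrics-server fix: expose control-plane component metrics on all interfaces
kube-controller-manager-arg:
  - bind-address=0.0.0.0   # written to index [2]; earlier entries are set elsewhere
kube-scheduler-arg:
  - bind-address=0.0.0.0
etcd-arg:
  - "listen-metrics-urls=http://0.0.0.0:2381"
```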
6 changes: 6 additions & 0 deletions .github/workflows/test-aks.yaml
@@ -88,12 +88,18 @@ jobs:
- name: Create IAC
run: uds run -f tasks/iac.yaml apply-tofu --no-progress --set K8S_DISTRO=aks --set CLOUD=azure

- name: Configure Cluster DNS
run: uds run -f tasks/utils.yaml aks-coredns-setup --no-progress

- name: Deploy Core Bundle
env:
UDS_CONFIG: .github/bundles/aks/uds-config.yaml
run: uds deploy .github/bundles/aks/uds-bundle-uds-core-aks-nightly-*.tar.zst --confirm
timeout-minutes: 30

- name: Test UDS Core
run: uds run -f tasks/test.yaml uds-core-non-k3d --set EXCLUDED_PACKAGES="metrics-server"

- name: Debug Output
if: ${{ always() }}
uses: ./.github/actions/debug-output
7 changes: 7 additions & 0 deletions .github/workflows/test-eks.yaml
@@ -92,12 +92,19 @@ jobs:
run: uds run -f tasks/iac.yaml create-iac --no-progress --set K8S_DISTRO=eks --set CLOUD=aws
timeout-minutes: 20

- name: Configure Cluster DNS
run: uds run -f tasks/utils.yaml eks-coredns-setup --no-progress

- name: Deploy Core Bundle
env:
UDS_CONFIG: .github/bundles/eks/uds-config.yaml
run: uds deploy .github/bundles/eks/uds-bundle-uds-core-eks-nightly-*.tar.zst --confirm
timeout-minutes: 30

- name: Test UDS Core
run: uds run -f tasks/test.yaml uds-core-non-k3d --set EXCLUDED_PACKAGES="metrics-server"
continue-on-error: true

Contributor Author comment:
Noting the continue-on-error: true here. EKS VPC CNI permits outbound traffic to the internet for the network tests, despite not having a NetworkPolicy to explicitly allow this. Ignoring failures for now until we decide how to proceed.
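The failing tests assume egress to the internet is denied unless a policy explicitly allows it — that is, deny-by-default egress policies along these lines are expected to be enforced (a generic sketch, not a manifest from this PR; note the EKS VPC CNI does not enforce NetworkPolicy unless its network-policy support is enabled):

```yaml
# Hypothetical default-deny egress policy of the kind the network tests rely on.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-egress
  namespace: example-app   # placeholder namespace
spec:
  podSelector: {}          # selects every pod in the namespace
  policyTypes:
    - Egress               # with no egress rules listed, all egress is denied
```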

- name: Debug Output
if: ${{ always() }}
uses: ./.github/actions/debug-output
6 changes: 6 additions & 0 deletions .github/workflows/test-rke2.yaml
@@ -95,12 +95,18 @@ jobs:
run: uds run -f tasks/iac.yaml rke2-cluster-ready --no-progress
timeout-minutes: 20

- name: Configure Cluster DNS
run: uds run -f tasks/utils.yaml rke2-coredns-setup --no-progress

- name: Deploy Core Bundle
env:
UDS_CONFIG: .github/bundles/rke2/uds-config.yaml
run: uds deploy .github/bundles/rke2/uds-bundle-uds-core-rke2-nightly-*.tar.zst --confirm
timeout-minutes: 30

- name: Test UDS Core
run: uds run -f tasks/test.yaml uds-core-non-k3d

- name: Debug Output
if: ${{ always() }}
uses: ./.github/actions/debug-output
2 changes: 1 addition & 1 deletion src/istio/tasks.yaml
@@ -6,7 +6,7 @@ tasks:
inputs:
validate_passthrough:
description: Whether to validate the passthrough gateway
default: "true"
default: "false"
actions:
- description: Validate the Istio Admin Gateway
wait:
13 changes: 11 additions & 2 deletions src/test/tasks.yaml
@@ -1,6 +1,9 @@
# Copyright 2024 Defense Unicorns
# SPDX-License-Identifier: AGPL-3.0-or-later OR LicenseRef-Defense-Unicorns-Commercial

includes:
- utils: ../../tasks/utils.yaml

tasks:
- name: validate
actions:
@@ -20,9 +23,14 @@

- name: create-deploy
description: Test app used for UDS Core validation
inputs:
architecture:
description: "System architecture that the test-apps package should be built for."
default: ${UDS_ARCH}

actions:
- description: Create zarf package for the test resources
cmd: "uds zarf package create src/test --confirm --no-progress --skip-sbom"
cmd: uds zarf package create src/test --confirm --no-progress --skip-sbom -a ${{ index .inputs "architecture" }}

- description: Deploy the test resources
cmd: "uds zarf package deploy build/zarf-package-uds-core-test-apps-*.zst --confirm --no-progress"
@@ -116,9 +124,10 @@

- description: Verify the authservice tenant app is protected by checking redirect
maxRetries: 3
task: utils:tenant-gw-ip
cmd: |
set -e
SSO_REDIRECT=$(uds zarf tools kubectl run curl-test --image=cgr.dev/chainguard/curl:latest -q --restart=Never --rm -i -- -Ls -o /dev/null -w %{url_effective} "https://protected.uds.dev")
SSO_REDIRECT=$(uds zarf tools kubectl run curl-test --image=cgr.dev/chainguard/curl:latest -q --restart=Never --rm -i -- --resolve 'protected.uds.dev:$TENANT_GW_IP:443' -Ls -o /dev/null -w %{url_effective} "https://protected.uds.dev")

case "${SSO_REDIRECT}" in
"https://sso.uds.dev"*)
10 changes: 5 additions & 5 deletions src/velero/tasks.yaml
@@ -47,26 +47,26 @@ tasks:
- description: wait for the backup object
wait:
cluster:
kind: Backup
kind: backup.velero.io
name: ${BACKUP_NAME}
namespace: velero
- description: check the status of the backup object
cmd: |-
STATUS=$(uds zarf tools kubectl get backups -n velero ${BACKUP_NAME} -o jsonpath='{.status.phase}')
STATUS=$(uds zarf tools kubectl get backup.velero.io -n velero ${BACKUP_NAME} -o jsonpath='{.status.phase}')
if [ ${STATUS} != "Completed" ]; then
echo "Status is '$STATUS'... waiting to see if it changes"

# local testing indicates the status is "Finalizing" for a few seconds after completion
sleep 30

# check again...
STATUS=$(uds zarf tools kubectl get backups -n velero ${BACKUP_NAME} -o jsonpath='{.status.phase}')
STATUS=$(uds zarf tools kubectl get backup.velero.io -n velero ${BACKUP_NAME} -o jsonpath='{.status.phase}')
if [ ${STATUS} != "Completed" ]; then
echo "Status is $STATUS... something isn't right.."

# get backup object
uds zarf tools kubectl get backups -n velero ${BACKUP_NAME} -o yaml
uds zarf tools kubectl get backups -A -o yaml
uds zarf tools kubectl get backup.velero.io -n velero ${BACKUP_NAME} -o yaml
uds zarf tools kubectl get backup.velero.io -A -o yaml
echo "::endgroup::"

# get backupstoragelocations
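The check-sleep-recheck logic in the task above can be expressed as a generic polling helper; a small sketch of the same pattern (`poll_until` and `fake_status` are illustrative names, not part of the repo's tasks):

```shell
#!/bin/sh
# Generic poller mirroring the Backup status check: run a command repeatedly
# until it prints the wanted value or attempts are exhausted.
poll_until() {
  want="$1"; attempts="$2"; delay="$3"; shift 3
  i=0
  while [ "$i" -lt "$attempts" ]; do
    got="$("$@")"
    if [ "$got" = "$want" ]; then
      echo "reached '$want' after $((i + 1)) attempt(s)"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "gave up; last value was '$got'" >&2
  return 1
}

# Stub that reports "Finalizing" twice before "Completed", like a backup
# object that is still finalizing when first polled.
state_file=$(mktemp)
echo 0 > "$state_file"
fake_status() {
  n=$(cat "$state_file")
  echo $((n + 1)) > "$state_file"
  [ "$n" -ge 2 ] && echo "Completed" || echo "Finalizing"
}

poll_until "Completed" 5 0 fake_status
# → reached 'Completed' after 3 attempt(s)
```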
9 changes: 6 additions & 3 deletions tasks/iac.yaml
@@ -1,6 +1,8 @@
# Copyright 2024 Defense Unicorns
# SPDX-License-Identifier: AGPL-3.0-or-later OR LicenseRef-Defense-Unicorns-Commercial

includes:
- util: ./utils.yaml

variables:
- name: CLUSTER_NAME
@@ -118,18 +120,19 @@ tasks:
break
fi
done

# wait for cluster components
while true; do
if [ $(uds zarf tools kubectl get po,job -A --no-headers=true | egrep -v 'Running|Complete' | wc -l) -gt 0 ]; then
if [ $(uds zarf tools kubectl get po,job -A --no-headers=true | egrep -v 'helm-install|Running|Complete' | wc -l) -gt 0 ]; then
echo "Waiting for cluster components to be ready...";
sleep 5;
else
echo "Cluster is ready"
break
fi
done
uds zarf tools kubectl apply -f ./metallb.yaml
#uds zarf tools kubectl apply -f ./metallb.yaml
- task: util:rke2-coredns-setup
- task: util:rke2-allow-prom-kube-dns
dir: .github/test-infra/aws/rke2/
maxTotalSeconds: 600
