The following is an example of how to deploy an Azure Kubernetes Service (AKS) cluster within an Azure subscription.
This configuration deploys an AKS cluster (primary) with two node pools:
- The default node pool (`sys`) - contains 2 system mode nodes
- A user node pool (`usr`) - contains 3 user mode nodes
The resource group name is generated with the `random_pet` resource from the hashicorp/random provider and is prefixed with `example`.
The VM size for both pools is `Standard_D5_v2` in this example. You can change this to suit your needs by editing the `vm_size` values in `main.tf`.
The standard Azure networking profile is used.
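As a quick sanity check after the cluster is up, you can list the node pools with the Azure CLI and confirm the node counts and VM sizes described above; the cluster and resource group names below are placeholders for whatever your deployment generates:

```bash
# List the node pools of the deployed cluster (substitute your own names)
az aks nodepool list \
  --resource-group <your-resource-group> \
  --cluster-name <your-cluster-name> \
  --query "[].{Name:name, Mode:mode, Count:count, Size:vmSize}" \
  --output table
```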
- Azure CLI installed: az cli install instructions
- Terraform installed: terraform install instructions
- Kubernetes CLI installed: kubectl install instructions
- `gettext` package installed
- An active Azure subscription.
- An account in Azure Entra ID.
- Access credentials to the competition Tailscale tailnet.
- Minikube installed, if you want to test the full CRS locally (linux/amd64 system recommended).
This section covers deploying the CRS locally using minikube on macOS or Linux.
- Docker Desktop (macOS/Windows), Colima (macOS/Linux), or Docker Engine (Linux)
- docker-buildx: Required for building images (`brew install docker-buildx` on macOS)
- minikube: `brew install minikube` (macOS) or see the minikube install docs
- helm: `brew install helm` (macOS)
- kubectl: `brew install kubectl` (macOS)
The Docker runtime must have enough resources allocated for the full CRS deployment.
Docker Desktop (macOS/Windows):
- Open Docker Desktop > Settings > Resources
- Allocate at least:
- CPUs: 4-6 (depending on your machine)
- Memory: 8-12 GB (10GB recommended for full deployment)
- Disk: 80+ GB
- Click "Apply & Restart"
Colima (macOS/Linux):
```bash
colima start --cpu 6 --memory 10 --disk 80
```

After starting Colima, ensure the buildx plugin is linked:

```bash
brew install docker-buildx
```

docker-buildx is a Docker CLI plugin. For Docker to find the plugin, add `cliPluginsExtraDirs` to `~/.docker/config.json`:

```json
"cliPluginsExtraDirs": [
  "/opt/homebrew/lib/docker/cli-plugins"
]
```

Important: Minikube cannot use more resources than the Docker runtime allocates. If `env.template` specifies `MINIKUBE_CPU=6` and `MINIKUBE_MEMORY_GB=10`, but Docker only has 4 CPUs and 8GB RAM, minikube will fail to start or pods will remain Pending.
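Before continuing, it can help to confirm what the Docker runtime actually exposes. One way to check, assuming a standard Docker CLI, is:

```bash
# Show the CPUs and memory the Docker runtime currently makes available
docker info --format 'CPUs: {{.NCPU}}  Memory: {{.MemTotal}} bytes'
```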
```bash
cd deployment
cp env.template env
# Edit env to configure:
# - MINIKUBE_CPU/MINIKUBE_MEMORY_GB (match your Docker Desktop resources)
# - OPENAI_API_KEY and/or ANTHROPIC_API_KEY
# - Comment out GHCR_AUTH if building locally
make up
```

Cause: Minikube is requesting more resources than Docker Desktop has available.
Solution: Either increase Docker Desktop resources or reduce minikube resources in env:
```bash
export MINIKUBE_CPU=4
export MINIKUBE_MEMORY_GB=7
```

Cause: Kubernetes cannot schedule pods due to insufficient CPU or memory.
Solution: This is expected with limited resources. Check which pods are pending:

```bash
kubectl get pods -n crs | grep Pending
kubectl describe pod <pod-name> -n crs | grep -A5 Events
```

For local development, core services (redis, task-server, scheduler) should run. Fuzzer bots and multiple replicas may remain Pending.
Cause: GHCR_AUTH is set to the placeholder value <your-ghcr-base64-auth>.
Solution: Either set a valid base64-encoded GitHub token or comment out the line:
```bash
# export GHCR_AUTH="<your-ghcr-base64-auth>"
```

For local builds, GHCR authentication is not required.
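If you build images locally and need to make one visible to the cluster by hand (outside of whatever the Makefile already does for you), minikube can load a local image directly; the image name below is a placeholder:

```bash
# Load a locally built image into the minikube node (placeholder image name)
minikube image load my-local-image:dev
```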
With 8GB RAM allocated to Docker Desktop:
| Resource Setting | Expected Result |
|---|---|
| MINIKUBE_MEMORY_GB=10 | Minikube fails to start |
| MINIKUBE_MEMORY_GB=7 | Minikube starts, some pods Pending |
| MINIKUBE_MEMORY_GB=6 | More pods Pending, core services run |
- Minikube uses QEMU emulation for amd64 images on ARM64 Macs
- Some components may run slower due to emulation overhead
- The `gcr.io/oss-fuzz-base/base-runner` image is amd64-only
- For best performance, use a linux/amd64 machine or cloud VM
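To confirm which architecture your minikube node reports (and therefore whether emulation is in play), one quick check is:

```bash
# Print each Kubernetes node name and its reported CPU architecture
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.architecture}{"\n"}{end}'
```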
```bash
# Check all pods
kubectl get pods -n crs

# Check core services are running
kubectl get pods -n crs | grep -E "(redis|task-server|scheduler)"

# View logs for a specific service
kubectl logs -n crs -l app=scheduler --tail=50

# Port forward to access the API locally
kubectl port-forward -n crs service/buttercup-competition-api 31323:1323
```

`az login --tenant <your-tenant>` - will open authentication in a browser.
Show current tenant and subscription name:
```bash
az account show --query "{SubscriptionID:id, Tenant:tenantId}" --output table
```

Example output:

```text
SubscriptionID                        Tenant
------------------------------------  ------------------------------------
<YOUR-SUBSCRIPTION-ID>                c67d49bd-f3ec-4c7f-b9ec-653480365699
```

A service principal account (SPA) is required to automate the creation of resources and objects within your subscription.
- You can create an SPA in several ways; the following describes using the Azure CLI.
```bash
az ad sp create-for-rbac --name "ExampleSPA" --role Contributor --scopes /subscriptions/<YOUR-SUBSCRIPTION-ID>
```

Replace "ExampleSPA" with the name of the SPA you wish to create.
Replace `<YOUR-SUBSCRIPTION-ID>` with your Azure subscription ID.
If using resource group locks, additional configuration may be necessary, which is out of scope for this example; e.g. adding the role `Microsoft.Authorization/locks/` for write, read, and delete to the SPA.
On successful SPA creation, you will receive output similar to the following:
```json
{
  "appId": "34df5g78-dsda1-7754-b9a3-ee699876d876",
  "displayName": "ExampleSPA",
  "password": "jfhn6~lrQQSH124jfuy96ksv_ILa2q128fhn8s",
  "tenant": "n475hfjk-g7hj-77jk-juh7-1234567890ab"
}
```

Make note of these values; they will be needed in the AKS deployment as the following environment variables:

```bash
TF_ARM_TENANT_ID="<tenant-value>"
TF_ARM_CLIENT_ID="<appID-value>"
TF_ARM_CLIENT_SECRET="<password-value>"
TF_ARM_SUBSCRIPTION_ID="<YOUR-SUBSCRIPTION-ID>"
```

Copy `env.template` to `env` and modify appropriately, following the comments there.
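As a sketch of how the service principal output maps onto those variables in your `env` file (the values below are the sample ones from the output above; whether your template uses `export` and exactly which variables it expects may differ):

```bash
# Service principal credentials consumed by the Terraform wrapper (sample/placeholder values)
export TF_ARM_TENANT_ID="n475hfjk-g7hj-77jk-juh7-1234567890ab"
export TF_ARM_CLIENT_ID="34df5g78-dsda1-7754-b9a3-ee699876d876"
export TF_ARM_CLIENT_SECRET="jfhn6~lrQQSH124jfuy96ksv_ILa2q128fhn8s"
export TF_ARM_SUBSCRIPTION_ID="<YOUR-SUBSCRIPTION-ID>"
```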
Based on the environment variables specified, you can do different kinds of deployments, such as production, staging, or local with minikube.
WIP (Current example is using the jmalloc/echo-server image as a PoC)
The crs-webapp image expects the following environment variables to be passed to it for HTTP basic authentication:
- `CRS_KEY_ID` - The CRS's username/ID
- `CRS_KEY_TOKEN` - The CRS's password
- `COMPETITION_API_KEY_ID` - The competition API's username/ID
- `COMPETITION_API_KEY_TOKEN` - The competition API's password
These values can be generated with the following python calls:
`key_id`:

```bash
python3 -c 'import uuid; print(str(uuid.uuid4()))'
```

`key_token`:

```bash
python3 -c 'import secrets, string; print("".join(secrets.choice(string.ascii_uppercase + string.ascii_lowercase + string.digits) for _ in range(32)))'
```

WIP (Current example is using the jmalloc/echo-server image as a PoC)
You will need to have a GitHub personal access token (PAT) scoped to at least read:packages.
To create the PAT, go to your account, Settings > Developer settings > Personal access tokens, and generate a Token (classic) with the scopes needed for your use case.
For this example, the read:packages scope is required.
Once you have your PAT, you will need to base64 encode it for use within secrets.tf:
echo -n "ghcr_username:ghcr_token" | base64 --wrap=0replace
ghcr_usernameandghcr_tokenwith your GitHub username and your PAT respectively.
Add your base64 encoded credentials to GHCR_AUTH
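For example, setting and sanity-checking the value in a shell (where you ultimately store `GHCR_AUTH`, such as your `env` file or `secrets.tf`, depends on your setup):

```bash
# Encode the credentials and confirm they decode back to "username:token"
export GHCR_AUTH="$(echo -n "ghcr_username:ghcr_token" | base64 --wrap=0)"
echo -n "$GHCR_AUTH" | base64 --decode
```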
By default, Terraform stores its state locally. It is best practice to store Terraform state in a remote location.
This can help with collaboration, security, recovery, and scalability. To do this within Azure, you need to create a few supporting resources.
The following is an example of how to create the resources needed for remote state configuration.
These resources will be used in the backend.tf configuration file.
- Create the remote state resource group:

  ```bash
  az group create --name example-tfstate-rg --location eastus
  ```

- Create a storage account for remote state:

  ```bash
  az storage account create --resource-group example-tfstate-rg --name examplestorageaccountname --sku Standard_LRS --encryption-services blob
  ```

- Create a storage container for remote state:

  ```bash
  az storage container create --name tfstate --account-name examplestorageaccountname --auth-mode login
  ```

Replace the values for `resource_group_name`, `storage_account_name`, and `container_name` with the ones you created above.
```hcl
terraform {
  backend "azurerm" {
    resource_group_name  = "example-tfstate-rg"
    storage_account_name = "examplestorageaccountname"
    container_name       = "tfstate"
    key                  = "terraform.tfstate"
  }
}
```

The deployment of the AKS cluster and its resources is performed by the Makefile, which leverages crs-architecture.sh. This wrapper uses several mechanisms to properly configure both the Terraform and Kubernetes environments.
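Note that if you had already initialized Terraform with local state before adding the azurerm backend above, you would typically re-initialize and migrate the existing state; this is standard Terraform behavior, not something the Makefile necessarily does for you:

```bash
# Re-initialize Terraform against the new backend and migrate any existing local state
terraform init -migrate-state
```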
- Log into your Azure tenant with `az login --tenant <your-tenant>`
- Clone this repository if needed: `git clone git@github.com:tob-challenges/example-crs-architecture.git /<local_dir>`
- Make required changes to `backend.tf`
- Make any wanted changes to `main.tf`, `outputs.tf`, `providers.tf`, and `variables.tf`
- Update `./env` with accurate values for each variable
- Run `make up` from within the `example-crs-architecture` directory

This will execute a deployment of your cluster via a combination of Terraform and kubectl based on your unique values in `./env`.
- `az aks get-credentials --name <your-cluster-name> --resource-group <your-resource-group>` - retrieves access credentials and updates kubeconfig for the current cluster
- `kubectl get namespaces` - lists all namespaces
- `kubectl get -n crs-webservice all` - lists all resources within the crs-webservice namespace
- `kubectl get -n tailscale all` - lists all resources within the tailscale namespace
- `kubectl get pods -A` - lists all pods in all namespaces
- `kubectl config get-contexts` - lists the current contexts in your kubeconfig
- `kubectl describe deployment -n crs-webservice crs-webapp` - prints detailed information about the crs-webapp deployment
- `kubectl logs <podName>` - retrieves the stdout and stderr streams from the containers within the specified pod
- `kubectl get svc -A` - lists the status of all services
- `kubectl get -n crs-webservice ingress` - lists the tailscale ingress address of your API
- `kubectl port-forward -n crs service/buttercup-competition-api 31323:1323` - makes the competition-api service available locally at port 31323
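Once the port-forward above is running, you can probe the API from your workstation. The exact routes and which credentials the service expects depend on your configuration, so treat the following as a connectivity check only:

```bash
# Basic connectivity check against the forwarded competition API (route and auth are assumptions)
curl -i -u "${CRS_KEY_ID}:${CRS_KEY_TOKEN}" http://localhost:31323/
```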
- `terraform state list` - lists all resources in the deployment
- `terraform state show '<resource>'` - replace `<resource>` with the resource you want to view from the `list` command
To tear down your AKS cluster, run the following:

- Run `make down` from within the `example-crs-architecture` directory
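To confirm the teardown actually removed the Azure resources (recall that the generated resource group names are prefixed with `example`), one quick check is:

```bash
# List any remaining resource groups whose names start with "example"
az group list --query "[?starts_with(name, 'example')].name" --output table
```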
To reset or redeploy a running CRS cluster:

- Run `make down && make up` from within the `example-crs-architecture` directory
Optionally, you can run `make clean` to remove any generated files created from the included templates at runtime. This action is executed during `make up`.
See CLEAN.md.