/kind bug
1. What kops version are you running? The command `kops version` will display this information.
Client version: 1.35.0 (git-v1.35.0)
2. What Kubernetes version are you running? `kubectl version` will print the version if a cluster is running, or provide the Kubernetes version specified as a kops flag.
Tested using multiple versions of Kubernetes, including v1.32.11 and v1.34.5.
3. What cloud provider are you using?
AWS
4. What commands did you run? What is the simplest way to reproduce this issue?
```shell
kops replace -f cluster.yaml --force
kops update cluster --yes --admin
kops validate cluster --wait 30m
```
5. What happened after the commands executed?
A fresh cluster never completes validation. The two aws-load-balancer-controller pods are stuck in ContainerCreating, with the event `MountVolume.SetUp failed for volume "cert" : secret "aws-load-balancer-webhook-tls" not found` in the pod events list.
Sometimes it works without intervention (some sort of race condition?).
If I instead create the cluster with the controller disabled (`spec.awsLoadBalancerController.enabled: false`), wait for validation to complete, then set it to true and roll out the change, the pods come up without issue.
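If this really is a race between cert-manager issuing `aws-load-balancer-webhook-tls` and the controller pods trying to mount it, one manual recovery is to wait for the secret and then restart the controller deployment. A minimal polling sketch (illustrative only; the `kube-system` namespace and deployment name are assumptions about the addon's defaults, not kops code):

```python
import subprocess
import time

def wait_for(predicate, timeout_s=600, interval_s=10, sleep=time.sleep):
    """Poll `predicate` until it returns True; give up after `timeout_s`."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if predicate():
            return True
        sleep(interval_s)
    return False

def secret_exists(namespace, name):
    """True once `kubectl get secret` succeeds for the given secret."""
    result = subprocess.run(
        ["kubectl", "-n", namespace, "get", "secret", name],
        capture_output=True,
    )
    return result.returncode == 0

# Example against a live cluster (assumed names, adjust to your setup):
#   if wait_for(lambda: secret_exists("kube-system", "aws-load-balancer-webhook-tls")):
#       subprocess.run(["kubectl", "-n", "kube-system", "rollout", "restart",
#                       "deployment/aws-load-balancer-controller"], check=True)
```

This is only a workaround sketch for stuck clusters; the expectation in this report is that kops orders the addon rollout so no such script is needed.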
6. What did you expect to happen?
The cluster to create successfully, without needing subsequent edits to the cluster config after it is spun up.
7. Please provide your cluster manifest. Execute `kops get --name my.example.com -o yaml` to display your cluster manifest. You may want to remove your cluster name and other sensitive information.
```yaml
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  name: kops.mydomain.com
spec:
  awsLoadBalancerController:
    enabled: true
    cpuRequest: "100m"
    cpuLimit: "200m"
    memoryRequest: "200Mi"
    memoryLimit: "500Mi"
  certManager:
    enabled: true
    managed: true
  clusterAutoscaler:
    enabled: true
    expander: least-waste
    balanceSimilarNodeGroups: false
    emitPerNodegroupMetrics: false
    awsUseStaticInstanceList: false
    scaleDownUtilizationThreshold: "0.5"
    skipNodesWithCustomControllerPods: true
    skipNodesWithLocalStorage: true
    skipNodesWithSystemPods: true
    newPodScaleUpDelay: 0s
    scaleDownDelayAfterAdd: 10m0s
    scaleDownUnneededTime: 10m0s
    scaleDownUnreadyTime: 20m0s
    cpuRequest: "100m"
    memoryRequest: "300Mi"
  metricsServer:
    enabled: true
    insecure: true
  nodeTerminationHandler:
    cpuRequest: 200m
    enabled: true
    enableSQSTerminationDraining: true
    managedASGTag: "aws-node-termination-handler/managed"
    prometheusEnable: true
  api:
    loadBalancer:
      class: Network
      type: Public
  authorization:
    rbac: {}
  channel: stable
  cloudProvider: aws
  configBase: s3://kops-bucket/kops.mydomain.com
  dnsZone: mydomain.com
  etcdClusters:
  - cpuRequest: 200m
    etcdMembers:
    - encryptedVolume: true
      instanceGroup: control-plane-eu-west-2a
      name: a
    - encryptedVolume: true
      instanceGroup: control-plane-eu-west-2b
      name: b
    - encryptedVolume: true
      instanceGroup: control-plane-eu-west-2c
      name: c
    manager:
      backupRetentionDays: 90
    memoryRequest: 100Mi
    name: main
  - cpuRequest: 100m
    etcdMembers:
    - encryptedVolume: true
      instanceGroup: control-plane-eu-west-2a
      name: a
    - encryptedVolume: true
      instanceGroup: control-plane-eu-west-2b
      name: b
    - encryptedVolume: true
      instanceGroup: control-plane-eu-west-2c
      name: c
    manager:
      backupRetentionDays: 90
    memoryRequest: 100Mi
    name: events
  iam:
    allowContainerRegistry: true
    legacy: false
  kubeProxy:
    enabled: false
  kubelet:
    anonymousAuth: false
  kubernetesApiAccess:
  - 0.0.0.0/0
  - ::/0
  # Use 1.32.11 is known good
  # kubernetesVersion: 1.34.5
  kubernetesVersion: 1.32.11
  masterPublicName: api.kops.mydomain.com
  networkCIDR: 172.20.0.0/16
  networking:
    cilium:
      enableNodePort: true
  nonMasqueradeCIDR: 100.64.0.0/10
  sshAccess:
  - 0.0.0.0/0
  - ::/0
  subnets:
  - cidr: 172.20.64.0/18
    name: eu-west-2a
    type: Private
    zone: eu-west-2a
  - cidr: 172.20.128.0/18
    name: eu-west-2b
    type: Private
    zone: eu-west-2b
  - cidr: 172.20.192.0/18
    name: eu-west-2c
    type: Private
    zone: eu-west-2c
  - cidr: 172.20.0.0/21
    name: utility-eu-west-2a
    type: Utility
    zone: eu-west-2a
  - cidr: 172.20.8.0/21
    name: utility-eu-west-2b
    type: Utility
    zone: eu-west-2b
  - cidr: 172.20.16.0/21
    name: utility-eu-west-2c
    type: Utility
    zone: eu-west-2c
  topology:
    bastion:
      bastionPublicName: bastion.kops.mydomain.com
    dns:
      type: Public
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: kops.mydomain.com
  name: control-plane-eu-west-2a
spec:
  mixedInstancesPolicy:
    onDemandAboveBase: 0
    onDemandBase: 0
    spotAllocationStrategy: price-capacity-optimized
    instances:
    - t3.small
    - t3.medium
    - t3.large
    - t3.xlarge
    # maxPrice: "0.025"
  associatePublicIp: true
  image: 099720109477/ubuntu/images/hvm-ssd-gp3/ubuntu-noble-24.04-amd64-server-20251212
  machineType: t3.small
  maxSize: 1
  minSize: 1
  role: Master
  subnets:
  - eu-west-2a
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: kops.mydomain.com
  name: control-plane-eu-west-2b
spec:
  mixedInstancesPolicy:
    onDemandAboveBase: 0
    onDemandBase: 0
    spotAllocationStrategy: price-capacity-optimized
    instances:
    - t3.small
    - t3.medium
    - t3.large
    - t3.xlarge
    # maxPrice: "0.025"
  associatePublicIp: true
  image: 099720109477/ubuntu/images/hvm-ssd-gp3/ubuntu-noble-24.04-amd64-server-20251212
  machineType: t3.small
  maxSize: 1
  minSize: 1
  role: Master
  subnets:
  - eu-west-2b
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: kops.mydomain.com
  name: control-plane-eu-west-2c
spec:
  mixedInstancesPolicy:
    onDemandAboveBase: 0
    onDemandBase: 0
    spotAllocationStrategy: price-capacity-optimized
    instances:
    - t3.small
    - t3.medium
    - t3.large
    - t3.xlarge
    # maxPrice: "0.025"
  associatePublicIp: true
  image: 099720109477/ubuntu/images/hvm-ssd-gp3/ubuntu-noble-24.04-amd64-server-20251212
  machineType: t3.small
  maxSize: 1
  minSize: 1
  role: Master
  subnets:
  - eu-west-2c
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: kops.mydomain.com
  name: nodes-eu-west-2a
spec:
  mixedInstancesPolicy:
    onDemandAboveBase: 0
    onDemandBase: 0
    spotAllocationStrategy: price-capacity-optimized
    instances:
    - t3.medium
    - t3.large
    - t3.xlarge
    # maxPrice: "0.025"
  associatePublicIp: true
  image: 099720109477/ubuntu/images/hvm-ssd-gp3/ubuntu-noble-24.04-amd64-server-20251212
  machineType: t3.medium
  maxSize: 5
  minSize: 1
  role: Node
  subnets:
  - eu-west-2a
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: kops.mydomain.com
  name: nodes-eu-west-2b
spec:
  mixedInstancesPolicy:
    onDemandAboveBase: 0
    onDemandBase: 0
    spotAllocationStrategy: price-capacity-optimized
    instances:
    - t3.medium
    - t3.large
    - t3.xlarge
    # maxPrice: "0.025"
  associatePublicIp: true
  image: 099720109477/ubuntu/images/hvm-ssd-gp3/ubuntu-noble-24.04-amd64-server-20251212
  machineType: t3.medium
  maxSize: 5
  minSize: 1
  role: Node
  subnets:
  - eu-west-2b
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: kops.mydomain.com
  name: nodes-eu-west-2c
spec:
  mixedInstancesPolicy:
    onDemandAboveBase: 0
    onDemandBase: 0
    spotAllocationStrategy: price-capacity-optimized
    instances:
    - t3.medium
    - t3.large
    - t3.xlarge
    # maxPrice: "0.025"
  associatePublicIp: true
  image: 099720109477/ubuntu/images/hvm-ssd-gp3/ubuntu-noble-24.04-amd64-server-20251212
  machineType: t3.medium
  maxSize: 5
  minSize: 1
  role: Node
  subnets:
  - eu-west-2c
---
apiVersion: kops.k8s.io/v1alpha2
kind: InstanceGroup
metadata:
  labels:
    kops.k8s.io/cluster: kops.mydomain.com
  name: bastions
spec:
  mixedInstancesPolicy:
    onDemandAboveBase: 0
    onDemandBase: 0
    spotAllocationStrategy: price-capacity-optimized
    instances:
    - t3.micro
    - t3.small
    - t3.medium
    - t3.large
    - t3.xlarge
    # maxPrice: "0.025"
  associatePublicIp: true
  image: 099720109477/ubuntu/images/hvm-ssd-gp3/ubuntu-noble-24.04-amd64-server-20251212
  machineType: t3.micro
  maxSize: 5
  minSize: 1
  role: Bastion
  subnets:
  - eu-west-2a
  - eu-west-2b
  - eu-west-2c
```
8. Please run the commands with the most verbose logging by adding the `-v 10` flag. Paste the logs into this report, or into a gist and provide the gist link here.
Available if needed
9. Anything else we need to know?