Why minDomains Is Critical for TopologySpreadConstraints on EKS with EBS
Background
When deploying a StatefulSet (e.g., a 3-replica etcd cluster) on EKS with Karpenter and EBS-backed PVCs, topologySpreadConstraints with DoNotSchedule may silently fail to distribute pods across Availability Zones. All pods and their EBS volumes can end up in a single AZ, completely defeating the high-availability intent.
The Configuration That Failed
```yaml
topologySpreadConstraints:
  - labelSelector:
      matchLabels:
        app.kubernetes.io/name: etcd
    maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
```

Expected: 3 pods spread across 3 AZs (1-1-1). Actual: all 3 pods scheduled in us-east-1a.
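One way to confirm this kind of outcome is to tally pods per zone. The sketch below is illustrative only: it assumes the pod and node lists are already parsed from `kubectl get ... -o json` output, and the helper name `pods_per_zone` and the sample data are hypothetical.

```python
from collections import Counter

ZONE_LABEL = "topology.kubernetes.io/zone"


def pods_per_zone(pods: dict, nodes: dict) -> Counter:
    """Count pods per zone from parsed `kubectl get ... -o json` lists."""
    # Map each node name to the value of its zone label.
    node_zone = {
        node["metadata"]["name"]: node["metadata"].get("labels", {}).get(
            ZONE_LABEL, "<no zone label>"
        )
        for node in nodes["items"]
    }
    counts = Counter()
    for pod in pods["items"]:
        node_name = pod["spec"].get("nodeName")
        if node_name is None:
            # Pending pods have no nodeName yet.
            counts["<unscheduled>"] += 1
        else:
            counts[node_zone.get(node_name, "<unknown node>")] += 1
    return counts


# Hypothetical sample data mirroring the failure: three nodes, all in one AZ.
sample_nodes = {"items": [
    {"metadata": {"name": f"node-{i}", "labels": {ZONE_LABEL: "us-east-1a"}}}
    for i in range(3)
]}
sample_pods = {"items": [
    {"metadata": {"name": f"etcd-{i}"}, "spec": {"nodeName": f"node-{i}"}}
    for i in range(3)
]}

print(dict(pods_per_zone(sample_pods, sample_nodes)))  # {'us-east-1a': 3}
```

To run it against a real cluster, the two inputs could be loaded with `json.load` from the output of `kubectl get pods -l app.kubernetes.io/name=etcd -o json` and `kubectl get nodes -o json`.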
Root Cause: minDomains Defaults to 1
The Kubernetes scheduler only counts topology domains from nodes that already exist. When Karpenter batch-creates nodes in a single AZ, the scheduler sees only one eligible domain.
With one domain, the skew is calculated as:

```
skew = max_pods_in_any_zone - min_pods_in_any_zone = 3 - 3 = 0
```

A skew of 0 always satisfies maxSkew: 1, so DoNotSchedule never fires. The constraint is technically met; it just doesn't do what you expect.
This happens because minDomains defaults to 1. The scheduler only requires at least 1 topology domain to exist, and with all nodes in a single AZ, that requirement is trivially satisfied.
The Fix: Set minDomains Explicitly
```yaml
topologySpreadConstraints:
  - labelSelector:
      matchLabels:
        app.kubernetes.io/name: etcd
    maxSkew: 1
    minDomains: 3
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
```

With minDomains: 3, the scheduler treats the global minimum as 0 whenever fewer than 3 eligible zone domains exist, so the skew for a zone equals the number of matching pods it would hold. If only 1 AZ has nodes, the first pod fits (skew 1), but the second would raise the skew to 2 and cannot be scheduled; it stays Pending. Karpenter then sees the Pending pod and is forced to launch a node in a different AZ.
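The effect of minDomains can be reproduced with a toy model of the skew check. This is a simplified sketch, not the scheduler's actual code: the function names `fits` and `schedule` are hypothetical, every pod is assumed to match the label selector, and zones without nodes are simply absent from the input, just as they are invisible to the scheduler.

```python
def fits(zone_counts: dict, zone: str, max_skew: int = 1, min_domains: int = 1) -> bool:
    """Return True if one more matching pod may be placed in `zone`."""
    # With fewer eligible domains than minDomains, the global minimum is treated as 0.
    if len(zone_counts) < min_domains:
        global_min = 0
    else:
        global_min = min(zone_counts.values())
    # Skew is evaluated as if the incoming pod were already placed in `zone`.
    return zone_counts[zone] + 1 - global_min <= max_skew


def schedule(replicas: int, zones_with_nodes: list, min_domains: int = 1) -> dict:
    """Place replicas one at a time; pods that fit nowhere are counted as pending."""
    counts = {zone: 0 for zone in zones_with_nodes}
    pending = 0
    for _ in range(replicas):
        candidates = [z for z in counts if fits(counts, z, min_domains=min_domains)]
        if candidates:
            # Prefer the least-loaded zone among those that pass the check.
            target = min(candidates, key=lambda z: counts[z])
            counts[target] += 1
        else:
            pending += 1
    return {"placed": counts, "pending": pending}


# Nodes exist only in us-east-1a.
print(schedule(3, ["us-east-1a"]))
# {'placed': {'us-east-1a': 3}, 'pending': 0}
print(schedule(3, ["us-east-1a"], min_domains=3))
# {'placed': {'us-east-1a': 1}, 'pending': 2}
```

With the default of 1, all three pods land in the single visible zone. With min_domains=3, only the first pod is placed and the other two remain pending, which is the signal that prompts new capacity in other zones.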
Why This Specifically Affects EBS + Karpenter
Three factors combine to create this problem:
- Karpenter batch-creates nodes. When multiple pods go Pending simultaneously, Karpenter may launch all nodes in whichever AZ is cheapest or has the most capacity.
- The scheduler only knows existing domains. Unlike podAntiAffinity, which rejects a node outright, topologySpreadConstraints calculates skew across known domains. No node in a zone = that zone doesn't exist to the scheduler.
- EBS volumes are zone-locked. Once a PVC binds to an EBS volume in us-east-1a, that pod can never move to another AZ. Even with WaitForFirstConsumer, if the first scheduling decision is wrong, the volume permanently pins the pod.
Alternative: Use podAntiAffinity with Zone TopologyKey
If minDomains is not available (it has been stable since Kubernetes 1.30; earlier releases depend on the MinDomainsInPodTopologySpread feature gate being enabled), podAntiAffinity is a stronger and simpler guarantee:
```yaml
podAntiAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app.kubernetes.io/name: etcd
      topologyKey: topology.kubernetes.io/zone
```

This tells the scheduler: "no two etcd pods in the same zone, period." The scheduler will reject any node in an AZ that already has an etcd pod, which forces Karpenter to provision nodes in other AZs.
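The same rule can be modeled in a few lines to show its main consequence: the replica count is capped by the number of zones. This is a simplified sketch with a hypothetical function name, and it assumes nodes can be provisioned in every listed zone.

```python
def place_with_zone_anti_affinity(replicas: int, zones: list) -> dict:
    """Place replicas so that no two share a zone; extras stay pending."""
    placed = {}
    pending = []
    for i in range(replicas):
        pod = f"etcd-{i}"
        # Required anti-affinity filters out every zone that already hosts a matching pod.
        free_zones = [z for z in zones if z not in placed.values()]
        if free_zones:
            placed[pod] = free_zones[0]
        else:
            pending.append(pod)
    return {"placed": placed, "pending": pending}


zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
print(place_with_zone_anti_affinity(3, zones))
# {'placed': {'etcd-0': 'us-east-1a', 'etcd-1': 'us-east-1b', 'etcd-2': 'us-east-1c'}, 'pending': []}
print(place_with_zone_anti_affinity(4, zones)["pending"])
# ['etcd-3']
```

Three replicas spread 1-1-1, while a fourth has no zone left and would stay Pending indefinitely in a three-AZ region.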
Comparison
| Aspect | topologySpreadConstraints (no minDomains) | topologySpreadConstraints (with minDomains) | podAntiAffinity (zone) |
|---|---|---|---|
| Guarantees cross-AZ spread | No | Yes | Yes |
| Behavior when nodes exist in only 1 AZ | Silently allows all pods in 1 AZ | First pod schedules; the rest stay Pending until more AZs are available | First pod schedules; the rest stay Pending until more AZs are available |
| Flexibility for >3 replicas | Allows uneven but bounded spread | Allows uneven but bounded spread | Strictly 1 pod per AZ (max = number of AZs) |
| Karpenter awareness | Weak signal | Strong signal (Pending pods) | Strongest signal (hard node rejection) |
Recommended Configuration for StatefulSets with EBS
```yaml
spec:
  template:
    spec:
      topologySpreadConstraints:
        - labelSelector:
            matchLabels:
              app.kubernetes.io/name: etcd
          maxSkew: 1
          minDomains: 3
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app.kubernetes.io/name: etcd
              topologyKey: kubernetes.io/hostname
```

This combines minDomains: 3 for cross-AZ spread with hostname anti-affinity for cross-node spread, giving EBS-backed StatefulSets a hard scheduling guarantee at both the zone and the node level.