Kubernetes gives you enormous flexibility — but that flexibility comes with a cost trap. Clusters that start small balloon quickly when teams over-provision, leave idle workloads running, or skip resource limits entirely.
This post covers the exact strategies we use at CogniVeu to help clients cut their Kubernetes cloud bills by 30–40% without sacrificing reliability.
1. Right-size your resource requests
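The single biggest source of waste is over-provisioned requests. If every pod requests 2 CPU but only uses 0.3, you are reserving roughly 6-7x more CPU than you use, and because nodes are provisioned to fit requests rather than usage, you pay for that gap in extra nodes.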
How to fix it:
- Deploy Goldilocks — it runs VPA in recommendation mode and surfaces right-size suggestions per workload in a dashboard.
- Set `requests` to p95 of actual usage, not peak. Use `kubectl top pods` or the Prometheus metric `container_cpu_usage_seconds_total` to measure.
- Set `limits` only when you need hard caps (latency-sensitive services). For batch workloads, skip limits to allow bursting on spare capacity.
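For a single workload, the recommendation-only setup that Goldilocks automates can be sketched by hand as a VerticalPodAutoscaler with updates turned off. This is a minimal sketch, assuming the VPA components are already installed in the cluster; the `checkout-api` Deployment and the `shop` namespace are hypothetical names used only for illustration.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: checkout-api-recommender   # hypothetical name
  namespace: shop                  # hypothetical namespace
spec:
  # The workload whose usage the recommender should observe.
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: checkout-api             # hypothetical Deployment
  updatePolicy:
    # "Off" means: compute recommendations, never evict or resize pods.
    updateMode: "Off"
```

Running `kubectl describe vpa checkout-api-recommender -n shop` then shows the recommended requests in the object's status, which you can compare against your own p95 measurements before changing anything.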

    # Before: over-provisioned
    resources:
      requests:
        cpu: "2"
        memory: "4Gi"

    # After: right-sized
    resources:
      requests:
        cpu: "300m"
        memory: "512Mi"

2. Use Spot / Preemptible nodes for batch and stateless workloads
Spot instances on AWS (or Preemptible on GKE) cost 60–80% less than On-Demand. Most stateless services, CI runners, and batch jobs can tolerate interruptions.
Karpenter makes this easy:

    apiVersion: karpenter.sh/v1beta1
    kind: NodePool
    metadata:
      name: spot-pool
    spec:
      template:
        spec:
          requirements:
            # Allow both capacity types so On-Demand is available as a fallback.
            - key: karpenter.sh/capacity-type
              operator: In
              values: ["spot", "on-demand"]
            - key: node.kubernetes.io/instance-type
              operator: In
              values: ["m5.large", "m5.xlarge", "m5a.large"]
          nodeClassRef:
            name: default
      disruption:
        # v1beta1 name; the karpenter.sh/v1 API calls this WhenEmptyOrUnderutilized.
        consolidationPolicy: WhenUnderutilized

When both capacity types are allowed, Karpenter prefers Spot and falls back to On-Demand automatically if Spot capacity is unavailable.
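On the workload side, a pod can opt in to Spot capacity explicitly through the `karpenter.sh/capacity-type` label that Karpenter sets on the nodes it provisions. The following is a minimal sketch, assuming a hypothetical interruption-tolerant `report-worker` Deployment; the workload name, image and grace period are placeholders, not recommendations.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: report-worker              # hypothetical workload name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: report-worker
  template:
    metadata:
      labels:
        app: report-worker
    spec:
      # Schedule only onto nodes that Karpenter launched as Spot capacity.
      nodeSelector:
        karpenter.sh/capacity-type: spot
      # Time allowed to finish in-flight work after SIGTERM (placeholder value).
      terminationGracePeriodSeconds: 60
      containers:
        - name: worker
          image: registry.example.com/report-worker:1.0   # placeholder image
          resources:
            requests:
              cpu: "300m"
              memory: "512Mi"
```

Pinning with a `nodeSelector` removes the On-Demand fallback for that workload, so it suits only jobs that can wait when Spot capacity is scarce. Leave the selector off for services that should follow the pool's Spot-first, On-Demand-second behaviour.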
3. Enable cluster autoscaler consolidation
Idle nodes are expensive. Both Cluster Autoscaler and Karpenter can consolidate underutilised nodes.
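- Karpenter: Set `consolidationPolicy: WhenUnderutilized` (shown above). Karpenter does not use a fixed utilisation threshold; it drains and removes a node when its pods fit on other nodes, or replaces it with a cheaper one.
- Cluster Autoscaler: Set `--scale-down-utilization-threshold=0.5` (the default) so that a node becomes a scale-down candidate when its requested CPU and memory drop below 50% of allocatable. Raise the value to consolidate more aggressively.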
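Consolidation works by draining nodes, so it is worth bounding how many replicas of a service can be down at once. Both Cluster Autoscaler and Karpenter honour PodDisruptionBudgets when they drain a node. A minimal sketch, assuming a hypothetical `checkout-api` Deployment with at least three replicas labelled `app: checkout-api`:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: checkout-api-pdb           # hypothetical name
spec:
  # Evictions are allowed only while at least two matching pods stay available.
  minAvailable: 2
  selector:
    matchLabels:
      app: checkout-api            # hypothetical label
```

A budget that can never be satisfied (for example, `minAvailable` equal to the replica count) blocks consolidation of the affected nodes entirely, so keep at least one replica of headroom.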
4. Schedule non-critical workloads with low-priority classes
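Use PriorityClass so that dev/staging pods are the first to be preempted or evicted when nodes are under pressure, making room for production without adding nodes. Pods with no PriorityClass default to priority 0, so production workloads need their own class with a value higher than the one defined here; otherwise the "low-priority" pods outrank them.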

    apiVersion: scheduling.k8s.io/v1
    kind: PriorityClass
    metadata:
      name: low-priority
    value: 100
    preemptionPolicy: Never
    globalDefault: false

5. Namespace-level chargeback with Kubecost
You can't optimise what you can't measure. Kubecost gives you per-namespace, per-team, per-label cost breakdowns with no code changes.
Once teams see their own costs, over-provisioning drops dramatically.
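Per-team and per-label breakdowns are only as good as the labels on the workloads. A minimal sketch, assuming a `team` and `env` label convention; the label keys, workload name and image are illustrative choices for this example, not Kubecost requirements.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: invoice-api                # hypothetical workload name
  namespace: payments              # hypothetical namespace
spec:
  replicas: 2
  selector:
    matchLabels:
      app: invoice-api
  template:
    metadata:
      labels:
        app: invoice-api
        # Labels on the pod template end up on the pods, where cost tools read them.
        team: payments             # assumed label key for the owning team
        env: production            # assumed label key for the environment
    spec:
      containers:
        - name: api
          image: registry.example.com/invoice-api:1.0   # placeholder image
          resources:
            requests:
              cpu: "300m"
              memory: "512Mi"
```

Applying the same label keys across every namespace lets costs be grouped by team or environment without touching application code.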
Results
Applying these techniques across a 40-node EKS cluster for one of our clients produced the following itemised savings:
| Technique | Monthly Saving |
|---|---|
| Right-sizing requests | ~$1,200 |
| Spot instance migration (70% of workloads) | ~$2,800 |
| Consolidation (removed 8 idle nodes) | ~$960 |
| Total | ~$4,960 / month |
Conclusion
Kubernetes cost optimisation is not a one-time exercise — it is a continuous practice. Start with right-sizing (highest ROI, lowest risk), add Karpenter for intelligent node provisioning, and use Kubecost to maintain visibility as the cluster grows.
Need help implementing this for your cluster? Get in touch with CogniVeu.
