The Kubernetes Production Readiness Checklist Engineers Actually Use
A practitioner's checklist for taking a Kubernetes cluster from “it works on my laptop” to “I am happy to be on call for this.”

Control plane and node baselines
Use a managed control plane unless you have a specific reason not to. The operational overhead of self-managed control planes is rarely justified outside hyperscaler-adjacent organizations. Standardize on three kinds of node pool: a small system pool, a general workload pool, and at least one pool with the specialized hardware or taints your workloads need.
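Pin Kubernetes versions intentionally and plan upgrades quarterly. Falling more than two minor versions behind turns a routine upgrade into a project, because control plane upgrades have to step through each minor version in turn rather than jump straight to the target.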
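To make the cost of falling behind concrete, here is a minimal sketch that lists the hops between a current and a target version. It assumes control plane upgrades move one minor version at a time, which is how kubeadm and the major managed services behave. The helper names `parse_minor` and `upgrade_path` are hypothetical and not part of any Kubernetes tooling.

```python
def parse_minor(version: str) -> tuple[int, int]:
    """Return (major, minor) from a version string such as 'v1.29.4' or '1.29'."""
    parts = version.lstrip("v").split(".")
    return int(parts[0]), int(parts[1])


def upgrade_path(current: str, target: str) -> list[str]:
    """List each minor version to step through, assuming one minor per hop."""
    cur_major, cur_minor = parse_minor(current)
    tgt_major, tgt_minor = parse_minor(target)
    if cur_major != tgt_major:
        raise ValueError("this sketch only handles upgrades within one major version")
    if tgt_minor < cur_minor:
        raise ValueError("downgrades are not supported")
    # Every minor version between current (exclusive) and target (inclusive).
    return [f"{cur_major}.{m}" for m in range(cur_minor + 1, tgt_minor + 1)]


if __name__ == "__main__":
    hops = upgrade_path("v1.27.9", "v1.30.2")
    print(f"{len(hops)} sequential upgrades: {' -> '.join(hops)}")
```

A cluster three minor versions behind needs three separate upgrade cycles, each with its own deprecation review and validation pass, which is where the "project" comes from.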
Identity, secrets, and admission control
Workload identity federation removes the need for long-lived service account keys. Combine it with a secrets manager — AWS Secrets Manager, Azure Key Vault, or HashiCorp Vault — surfaced through the Secrets Store CSI driver.
Use an admission controller — OPA Gatekeeper or Kyverno — to enforce baseline policies: no privileged containers, required resource limits, required liveness and readiness probes, mandatory labels for cost allocation.
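The baseline policies above can be sketched as plain checks against a Pod manifest. This is an illustration of the rules, not a replacement for Gatekeeper or Kyverno, which express them as constraints or policies evaluated at admission time. The label names in `REQUIRED_LABELS` and the function `baseline_violations` are assumptions made for the example.

```python
REQUIRED_LABELS = {"team", "cost-center"}  # hypothetical cost-allocation labels


def baseline_violations(pod: dict) -> list[str]:
    """Return human-readable violations for a Pod manifest given as a dict.

    Only spec.containers is inspected; a real policy would also cover
    initContainers and ephemeralContainers.
    """
    violations = []

    labels = pod.get("metadata", {}).get("labels", {})
    for label in sorted(REQUIRED_LABELS - set(labels)):
        violations.append(f"missing label: {label}")

    for container in pod.get("spec", {}).get("containers", []):
        name = container.get("name", "<unnamed>")
        if container.get("securityContext", {}).get("privileged", False):
            violations.append(f"{name}: privileged container")
        if not container.get("resources", {}).get("limits"):
            violations.append(f"{name}: no resource limits")
        for probe in ("livenessProbe", "readinessProbe"):
            if probe not in container:
                violations.append(f"{name}: missing {probe}")

    return violations


if __name__ == "__main__":
    pod = {
        "metadata": {"labels": {"team": "payments"}},
        "spec": {"containers": [{"name": "app",
                                 "securityContext": {"privileged": True}}]},
    }
    for violation in baseline_violations(pod):
        print(violation)
```

Running the same checks in CI against rendered manifests gives developers feedback before the admission controller rejects a deployment.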
Observability
Three pillars apply: metrics with Prometheus and a long-term store such as Thanos, Mimir, or a managed equivalent; logs to a centralized destination with retention that matches your incident response needs; traces with OpenTelemetry instrumentation.
Dashboards are a starting point. The investment that pays off is service-level objectives with alerting tied to error budget burn rate, not to raw error counts.
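Burn rate is the observed error ratio divided by the error budget (one minus the SLO target). The sketch below shows that calculation and a two-window paging check. The default threshold of 14.4 is an assumption taken from commonly cited SRE guidance: at that rate, one hour of burn consumes about 2% of a 30-day budget. The function names are hypothetical.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How many times faster than 'exactly on budget' the budget is being spent."""
    budget = 1.0 - slo_target
    if budget <= 0:
        raise ValueError("slo_target must be below 1.0")
    return error_ratio / budget


def should_page(long_window_ratio: float, short_window_ratio: float,
                slo_target: float, threshold: float = 14.4) -> bool:
    """Page only when both the long and the short window exceed the threshold.

    The long window shows the burn is sustained; the short window shows it
    is still happening, so the alert clears soon after recovery.
    """
    return (burn_rate(long_window_ratio, slo_target) >= threshold
            and burn_rate(short_window_ratio, slo_target) >= threshold)


if __name__ == "__main__":
    # 2% of requests failing against a 99.9% SLO.
    print(f"burn rate: {burn_rate(0.02, 0.999):.1f}")
    print("page:", should_page(0.02, 0.02, 0.999))
```

The same error ratio means different things under different SLOs: 0.5% errors burns a 99.9% budget five times too fast but sits comfortably inside a 99% budget, which is why raw error counts make poor alert conditions.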
Networking and ingress
Use the Gateway API, not legacy Ingress, for new clusters. It gives you cleaner separation of concerns and broader support for advanced routing.
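As a sketch of that separation of concerns, the snippet below builds an HTTPRoute that an application team would own, attached to a Gateway owned by a platform team, with a weighted split across two backends. The field names follow the `gateway.networking.k8s.io/v1` HTTPRoute schema; the Gateway, namespace, hostname, and Service names are hypothetical, and attaching across namespaces assumes the Gateway's listener allows routes from the application namespace.

```python
import json


def http_route(name, namespace, gateway, gateway_namespace, hostname, backends):
    """Build an HTTPRoute manifest; backends is a list of (service, port, weight)."""
    return {
        "apiVersion": "gateway.networking.k8s.io/v1",
        "kind": "HTTPRoute",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # The route attaches to a Gateway it does not own.
            "parentRefs": [{"name": gateway, "namespace": gateway_namespace}],
            "hostnames": [hostname],
            "rules": [{
                "matches": [{"path": {"type": "PathPrefix", "value": "/"}}],
                "backendRefs": [
                    {"name": service, "port": port, "weight": weight}
                    for service, port, weight in backends
                ],
            }],
        },
    }


if __name__ == "__main__":
    route = http_route(
        name="checkout",
        namespace="shop",
        gateway="shared-gateway",
        gateway_namespace="infra",
        hostname="checkout.example.com",
        backends=[("checkout", 8080, 90), ("checkout-canary", 8080, 10)],
    )
    # kubectl accepts JSON as well as YAML, so this output can be applied directly.
    print(json.dumps(route, indent=2))
```

The weighted `backendRefs` are part of the core route spec here, whereas with Ingress a canary split typically depends on controller-specific annotations.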
Decide explicitly whether you need a service mesh. If service-to-service mTLS, identity-based authorization, or fine-grained traffic shaping is a requirement, the operational cost of Istio or Linkerd is justified. If none of them is, do not pay it.
Disaster recovery and backups
Cluster state lives in etcd, but workload state lives in persistent volumes and external systems. Back up both. Test the restore quarterly against a fresh cluster — a backup you have never restored is not a backup.
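One way to keep the quarterly restore test honest is to record when each kind of backup was last restored and flag anything stale or never tested. The sketch below assumes you keep such a record somewhere; the kind names in `REQUIRED_KINDS`, the 90-day default, and the function `overdue_restore_tests` are all hypothetical.

```python
from datetime import date, timedelta

# Both halves of the state: etcd and persistent volumes. Names are illustrative.
REQUIRED_KINDS = ("cluster-state", "persistent-volumes")


def overdue_restore_tests(last_tested: dict, today: date,
                          max_age_days: int = 90) -> list[str]:
    """Return backup kinds never restored, or not restored within max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    overdue = []
    for kind in REQUIRED_KINDS:
        tested = last_tested.get(kind)
        # A kind with no recorded restore counts as overdue.
        if tested is None or tested < cutoff:
            overdue.append(kind)
    return overdue


if __name__ == "__main__":
    record = {"cluster-state": date(2024, 5, 1)}
    for kind in overdue_restore_tests(record, today=date(2024, 6, 30)):
        print(f"restore test overdue: {kind}")
```

Treating a missing record as overdue matches the point above: an untested backup gets no benefit of the doubt.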
Reader questions, answered
Do we need a service mesh?
Only if you have a concrete requirement for mTLS, identity-based authorization, or advanced traffic management. The operational cost is real.
Self-managed or managed Kubernetes?
Managed unless you have a hyperscaler-class operations team or a regulatory constraint that forces self-managed.

Raza Ahmad is a technology author and IT infrastructure specialist based in Melbourne, Australia. He writes practitioner-grade guides on cloud computing (Azure and AWS), cybersecurity, enterprise networking with Cisco platforms, Linux administration, DevOps, and virtualization. His work focuses on translating complex infrastructure topics into clear, accurate guidance that engineers, system administrators, and IT decision makers can put to work in production environments. Every article published under his byline is fact-checked against current vendor documentation, official standards, and Raza's own hands-on experience operating the technologies he covers.