Introduction: The morning I forgot to back up the database
A few years ago, around 7 a.m., I spilled coffee on my keyboard while frantically trying to kick‑off a manual database dump before the workday stampede. I succeeded—but only after the caffeine trickled into the space bar. That was the day I muttered: “Never again. Let the robots remember the schedule.”
What exactly is a CronJob in Kubernetes?
If you’ve ever shared that panic, Kubernetes CronJobs are your new best friend. In essence, a CronJob is the Kubernetes-native way to run a Job (a one-off Pod that exits when its work is finished) on a recurring schedule you define with familiar cron syntax. CronJobs graduated to the stable `batch/v1` API back in v1.21 and have been quietly gaining creature comforts (like native time-zone support) ever since.
Why CronJobs matter (even if you already know crontab)
Traditional `crontab` entries live on a single node. If that box restarts, or you rebuild your infrastructure, your tasks vanish. CronJobs, by contrast, live in etcd along with the rest of your cluster state. They survive node failures, cluster upgrades, and even a coffee tsunami on your laptop. Kubernetes handles when to start each Job, tracks history, enforces concurrency rules, and lets you scale the underlying Pods like any other workload.
Core concepts in plain English
| Setting | What it does | Real-world analogy |
|---|---|---|
| `schedule` | Five-field cron expression that tells Kubernetes when to run (cheat sheet below). | Setting an alarm on your phone. |
| `concurrencyPolicy` | `Allow`, `Forbid`, or `Replace`: controls what happens if a new run collides with one already in progress. | Do you brew a second pot of coffee if the first pot isn’t finished? |
| `startingDeadlineSeconds` | Time window to catch up if the cluster misses a start time. | Snooze window after a missed alarm. |
| `successfulJobsHistoryLimit` / `failedJobsHistoryLimit` | How many past runs Kubernetes keeps. | Cleaning out yesterday’s to-do list. |
| `suspend` | Pauses future runs without deleting the CronJob. | Sliding the alarm toggle off while on vacation. |
| `timeZone` (v1.27+) | Lets you pin the schedule to a specific zone. | Ensuring a meeting invite shows up at 10 a.m. your time, not UTC. |
(We’ll peek at each of these in action in a minute.)
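If the five-field syntax looks cryptic, here is a quick cheat sheet for the `schedule` values used throughout this post. This is plain cron notation, nothing Kubernetes-specific:

```yaml
# Field order:  minute  hour  day-of-month  month  day-of-week
schedule: "0 2 * * *"       # 02:00 every day
# schedule: "*/15 * * * *"  # every 15 minutes
# schedule: "0 3 * * 0"     # 03:00 every Sunday (0 = Sunday)
```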
A quick YAML tour
Below is a trimmed example I use in workshops to prune old log files. Copy-paste it into a file named `log-cleanup.yaml`, tweak the image, and apply it with `kubectl apply -f log-cleanup.yaml`.
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: log-cleanup
spec:
  schedule: "0 2 * * *"            # 2 a.m. every day
  timeZone: "Asia/Kolkata"         # new in v1.27+
  startingDeadlineSeconds: 300     # give it 5 min if it slips
  concurrencyPolicy: Forbid        # skip if yesterday is still running
  successfulJobsHistoryLimit: 5
  failedJobsHistoryLimit: 10
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: cleanup
            image: alpine:3.20
            command: ["sh", "-c", "find /var/log -type f -mtime +7 -delete"]
```
Notice how the `timeZone` line eliminates mental gymnastics when daylight-saving rules change.
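Once the manifest is applied, a couple of ordinary kubectl commands confirm the schedule is registered; the exact output columns vary slightly between versions:

```bash
kubectl apply -f log-cleanup.yaml
kubectl get cronjob log-cleanup     # SCHEDULE, SUSPEND, ACTIVE, LAST SCHEDULE
kubectl get jobs --watch            # watch the 2 a.m. run show up as a Job

# Flip the suspend field from the table above on and off without deleting anything
kubectl patch cronjob log-cleanup -p '{"spec":{"suspend":true}}'
kubectl patch cronjob log-cleanup -p '{"spec":{"suspend":false}}'
```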
How the controller thinks
- Schedules are calculated in advance: CronJobs don’t poll every second; the controller pre-computes the next run time.
- At tick time, a Job is created; the Job controller then spins up the Pods.
- If a prior run is still active, `concurrencyPolicy` decides the outcome: `Allow` starts the new run anyway (the default), `Forbid` skips this run, and `Replace` terminates the old Job and launches a fresh one.
- Missed windows: if the cluster hiccups and your Job starts late, Kubernetes checks `startingDeadlineSeconds`. Exceeded? The run is abandoned and recorded as missed.
- Garbage collection: once a run ends, Kubernetes prunes old Job objects based on your history limits, keeping your API clean.
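When you’re testing this behavior, you don’t have to wait for the next tick: kubectl can stamp out a one-off Job from the CronJob’s template (the Job name here is whatever you like):

```bash
# Create an ad-hoc Job from the CronJob's jobTemplate and follow its logs
kubectl create job --from=cronjob/log-cleanup log-cleanup-manual
kubectl logs job/log-cleanup-manual --follow
```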
Personal anecdote #2: The accidental billing storm
I once set `concurrencyPolicy: Allow` on a nightly reporting CronJob that exported data to BigQuery. A network blip slowed one run, and the next day’s job started on time, doubling my cost and clogging downstream dashboards. Lesson learned: `Forbid` is your friend when jobs are expensive or stateful.
Three beginner‑friendly use cases
- Database backups
  Schedule: `0 */6 * * *` (every six hours).
  Gotcha: pipe `mysqldump` straight into `gzip` to shrink network egress (see the sketch after this list).
- Static site rebuilds
  Schedule: `*/15 * * * *` (every quarter hour) for near-real-time blogs.
  Tip: add an environment variable like `GIT_COMMIT=$(date)` so the Job’s Pod spec changes on every run and avoids image-pull caching issues.
- Data cleanup
  Schedule: `0 3 * * 0` (Sunday 3 a.m.).
  Insight: combine with `Replace` to guarantee exactly one cleaning pass, even if the last one hung.
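Here is a minimal sketch of the backup pattern from the first item. The image, host, Secret, and PersistentVolumeClaim names are placeholders you’d swap for your own; it assumes a MySQL-compatible database and a mounted backup volume:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: db-backup                # hypothetical name
spec:
  schedule: "0 */6 * * *"        # every six hours
  concurrencyPolicy: Forbid      # don't stack backups if one runs long
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: dump
            image: mysql:8.0
            env:
            - name: MYSQL_PWD                 # mysql clients read the password from this env var
              valueFrom:
                secretKeyRef:
                  name: db-credentials        # hypothetical Secret
                  key: password
            command:
            - sh
            - -c
            - mysqldump -h db-host -u backup --all-databases | gzip > /backups/dump-$(date +%F-%H%M).sql.gz
            volumeMounts:
            - name: backups
              mountPath: /backups
          volumes:
          - name: backups
            persistentVolumeClaim:
              claimName: backup-pvc           # hypothetical PVC
```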
These patterns work the same whether you’re on vanilla Kubernetes, GKE, EKS, or OpenShift—the object spec is universal.
New goodies in modern clusters
| Feature | Introduced | Why you care |
|---|---|---|
| Native time-zone field (`spec.timeZone`) | GA in v1.27 | No more guessing whether UTC means “tomorrow” or “today” when daylight saving flips. |
| Sidecar containers for Jobs (sketched below) | v1.28 | Perfect for shipping logs from each run to S3 without custom bash glue. |
| CronJob metrics in kube-state-metrics | v1.29 | View missed schedules and active runs in Prometheus/Grafana dashboards. |
| Controller performance tweaks | Ongoing (v1.30–1.32) | Reduced API-server calls; large clusters with thousands of CronJobs now stay responsive. |
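A rough sketch of the sidecar pattern, assuming your cluster has native sidecar support (an init container with `restartPolicy: Always` keeps running alongside the main container and is shut down when the Job’s main container finishes). The log-shipper container and its command are placeholders:

```yaml
jobTemplate:
  spec:
    template:
      spec:
        restartPolicy: Never
        initContainers:
        - name: log-shipper              # hypothetical sidecar
          image: alpine:3.20
          restartPolicy: Always          # marks this init container as a sidecar
          command: ["sh", "-c", "tail -F /work/run.log"]
          volumeMounts:
          - name: work
            mountPath: /work
        containers:
        - name: main
          image: alpine:3.20
          command: ["sh", "-c", "echo 'report generated' >> /work/run.log"]
          volumeMounts:
          - name: work
            mountPath: /work
        volumes:
        - name: work
          emptyDir: {}
```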
Common pitfalls & how to dodge them
| Pitfall | Symptom | Fix |
|---|---|---|
| Forgot `restartPolicy: Never` | Job Pods keep restarting in failure loops. | Set `restartPolicy: OnFailure` or `Never`. |
| Overlapping runs | Duplicate data, ballooning resource usage. | Use `concurrencyPolicy: Forbid` or `Replace`. |
| Missed runs after `suspend` | CronJob wakes up but skips several schedules. | Acceptable; missed windows are counted, not replayed. |
| Pods never cleaned up | Thousands of completed Pods clutter `kubectl get pods`. | Tune the history limits or set a TTL on finished Jobs (see the snippet below). |
| Wrong time zone | Job fires an hour late/early. | Add `timeZone:` or make sure the cluster-wide offset matches your expectation. |
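For the cleanup pitfall, the history limits from the earlier manifest prune old Job objects, and `ttlSecondsAfterFinished` on the Job spec tells Kubernetes to delete each finished Job (and its Pods) after a delay. A small fragment of the jobTemplate:

```yaml
jobTemplate:
  spec:
    ttlSecondsAfterFinished: 3600   # delete the Job and its Pods one hour after it finishes
    template:
      spec:
        restartPolicy: Never
        containers:
        - name: cleanup
          image: alpine:3.20
          command: ["sh", "-c", "find /var/log -type f -mtime +7 -delete"]
```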
Monitoring and observability
- Prometheus: the `kube_cronjob_next_schedule_time` and `kube_cronjob_status_last_schedule_time` metrics let you alert on delays of more than 5 minutes (a sample rule follows this list).
- kubectl: `kubectl get cj log-cleanup -o yaml` surfaces `lastScheduleTime`, `active`, and `lastSuccessfulTime`.
- Logs: each Job owns its own Pod(s), so `kubectl logs job/log-cleanup-2748293` remains the quickest debugging path.
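If you want the Prometheus alert spelled out, here is a rough sketch of an alerting rule built on those kube-state-metrics series; the alert name and the 5-minute threshold are assumptions you should tune to your own schedules:

```yaml
# Prometheus alerting rule (fragment); alert name, labels, and threshold are placeholders
groups:
- name: cronjobs
  rules:
  - alert: CronJobRunningLate
    expr: time() - kube_cronjob_next_schedule_time > 300
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "CronJob {{ $labels.cronjob }} is more than 5 minutes past its scheduled start"
```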
When not to use a CronJob
If your workload is:
- Millisecond‑sensitive (e.g., financial trades),
- Continuously streaming (ETL pipelines), or
- Longer than the interval between runs,
…consider a dedicated controller or a message queue instead. CronJobs excel at discrete, periodic tasks, not infinite loops.
Conclusion: Your cluster’s dependable timekeeper
CronJobs turn Kubernetes from a 24/7 application runner into a full‑blown scheduling platform. By declaring what you want done and when, you hand operational drudgery to the control plane—and reclaim your mornings from coffee‑soaked keyboard chaos.
Whether you’re rotating logs once a week, rebuilding Docker images nightly, or distributing swag to conference attendees every quarter hour (true story!), the pattern is the same: write a tiny YAML, commit it to Git, and let Kubernetes remember instead of you.
So the next time someone asks, “Did we run the cleanup script?” you can smile, sip that coffee calmly, and answer: “My cluster has already handled it—right on schedule.”