Part 9. Batch in distributed environments: Leader Election, Kubernetes CronJob, and lock strategy comparison
This part presents criteria for choosing an execution entity and a lock strategy that prevent duplicate batch execution across multiple instances while keeping operational complexity under control.
Series: Mastering Spring Boot Batch Strategies

Baseline versions
- Java 21
- Spring Boot 3.3.x
- Spring Batch 5.2.x
- Quartz 2.3.x
- PostgreSQL 15
- OpenSearch 2.x
1) Problem statement
The most common batch failure in a distributed environment is duplicate execution. Once an application scales out to 10 pods, every in-app batch trigger can fire 10 times. The root cause is not the choice of technology but the fact that the executing agent was never clearly defined.
Two questions matter most.
- Who triggers the batch? (Kubernetes or the application itself)
- Where is the guarantee enforced that only one instance runs at a time? (Lock or leader election)
2) Summary of key concepts
Kubernetes CronJob vs in-app scheduler
| Criterion | Kubernetes CronJob | In-app scheduler (@Scheduled/Quartz) |
|---|---|---|
| Execution entity | Platform (K8s) | Application |
| Independence from app deploys/restarts | High | Low (tied to the app life cycle) |
| Code proximity | Low (external Job Pod) | High (same codebase) |
| Operational difficulty | Depends on K8s | Depends on app code and lock design |
| Recommended situation | Simple standalone batches, standardized infrastructure | Batches tightly coupled to domain logic |
Comparison of lock/leader election strategies
| Strategy | Advantages | Disadvantages | Recommended Situation |
|---|---|---|---|
| DB lock | Strong consistency, no additional infrastructure | Increased DB contention | Low trigger frequency, DB-centric systems |
| Redis lock | Fast and scalable | Must account for TTL tuning and split-brain | High-frequency triggers |
| ZooKeeper/etcd election | Strong leader-election support | High operational complexity | Organizations with a large platform team |
| K8s Lease | Kubernetes-native | Kubernetes dependency | Standard K8s environments |
3) Code example
Example A: Kubernetes CronJob
apiVersion: batch/v1
kind: CronJob
metadata:
  name: settlement-job
spec:
  schedule: "*/5 * * * *"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 5
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: settlement
              image: my-registry/batch:1.0.0
              args: ["--job=settlement"]
Example B: DB-based leader lock
CREATE TABLE batch_leader_lock (
    lock_name  VARCHAR(100) PRIMARY KEY,
    holder_id  VARCHAR(100) NOT NULL,
    expires_at TIMESTAMP NOT NULL
);

-- Attempt to acquire leadership: insert the row, or take it over only if the current lease has expired.
-- One affected row means this instance holds the lock; zero rows means another holder's lease is still live.
INSERT INTO batch_leader_lock (lock_name, holder_id, expires_at)
VALUES ('settlement', :holder_id, NOW() + INTERVAL '30 second')
ON CONFLICT (lock_name)
DO UPDATE SET holder_id = EXCLUDED.holder_id,
              expires_at = EXCLUDED.expires_at
WHERE batch_leader_lock.expires_at < NOW();
Example C: Check and execute lock in Spring service
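public void runIfLeader() {
    // The lease lasts 30 seconds; a run that can outlive it needs renewal (heartbeat) or a longer TTL.
    boolean acquired = leaderLockRepository.tryAcquire("settlement", instanceId, Duration.ofSeconds(30));
    if (!acquired) {
        return; // another instance is the current leader
    }
    try {
        settlementService.execute();
    } finally {
        // Release only the lease held by this instance.
        leaderLockRepository.release("settlement", instanceId);
    }
}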
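Example C calls tryAcquire and release without showing what they must guarantee. The following is a minimal sketch of those semantics, assuming an in-memory stand-in for the batch_leader_lock table rather than a real JDBC repository. The class name InMemoryLeaderLockRepository and the injected Clock are hypothetical and exist only for illustration; a production version would run the upsert from Example B and check the affected row count.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory stand-in for the batch_leader_lock table (illustration only). */
class InMemoryLeaderLockRepository {

    private record Lease(String holderId, Instant expiresAt) {}

    private final Map<String, Lease> leases = new ConcurrentHashMap<>();
    private final Clock clock;

    InMemoryLeaderLockRepository(Clock clock) {
        this.clock = clock;
    }

    /** Takes the lock if it is free or expired, mirroring the upsert's "expires_at < NOW()" condition. */
    boolean tryAcquire(String lockName, String holderId, Duration ttl) {
        Instant now = clock.instant();
        Lease candidate = new Lease(holderId, now.plus(ttl));
        Lease stored = leases.compute(lockName, (name, current) ->
                (current == null || current.expiresAt().isBefore(now)) ? candidate : current);
        // Reference comparison: true only when our own lease object was stored.
        return stored == candidate;
    }

    /** Releases only if the caller is the current holder, so a stale holder cannot drop a successor's lease. */
    boolean release(String lockName, String holderId) {
        boolean[] released = {false};
        leases.computeIfPresent(lockName, (name, current) -> {
            if (current.holderId().equals(holderId)) {
                released[0] = true;
                return null; // returning null removes the entry
            }
            return current;
        });
        return released[0];
    }
}

class LeaderLockDemo {
    public static void main(String[] args) {
        var repo = new InMemoryLeaderLockRepository(Clock.systemUTC());
        Duration ttl = Duration.ofSeconds(30);
        System.out.println(repo.tryAcquire("settlement", "pod-a", ttl)); // true: the lock was free
        System.out.println(repo.tryAcquire("settlement", "pod-b", ttl)); // false: pod-a's lease is still live
        System.out.println(repo.release("settlement", "pod-b"));         // false: pod-b is not the holder
        System.out.println(repo.release("settlement", "pod-a"));         // true: the holder releases
    }
}
```

The holder check in release matters because a slow instance may reach its finally block after its lease expired and another instance took over. An unconditional delete at that point would remove the new leader's lock.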
Example D: Execution history keyset query
SELECT id, job_name, instance_id, status, started_at
FROM batch_job_execution
WHERE job_name = 'settlement'
  AND id > :last_id
ORDER BY id ASC
LIMIT 200;
4) Real-world failure/operational scenarios
Situation: A Redis lock renewal failed because of a network partition, two instances each concluded that they were the leader, and both ran the batch at the same time (split-brain).
Cause:
- Only lock acquisition was implemented; there was no fencing token.
- The downstream writer did not enforce the rule that only the latest token is valid.
- The lock TTL was short, so re-election happened frequently during temporary delays.
Improvements:
- Issue and store a monotonically increasing token each time the leader lock is granted.
- Have the writer verify the token and reject operations that carry a lower (stale) token.
- Re-tune the lock TTL and heartbeat interval based on the P99 network latency.
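The fencing-token improvements above can be sketched as follows. This is a simplified illustration, not the article's implementation: FencingTokenIssuer and FencedWriter are hypothetical names, and the in-process AtomicLong stands in for whatever the lock store would provide in practice (for example a database sequence or a version column incremented on each acquisition).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical issuer: hands out a strictly increasing token on every leader acquisition. */
class FencingTokenIssuer {
    private final AtomicLong counter = new AtomicLong();

    long nextToken() {
        return counter.incrementAndGet();
    }
}

/** Hypothetical downstream writer: rejects any write whose token is older than the newest one seen. */
class FencedWriter {
    private long highestTokenSeen = 0;
    private final List<String> accepted = new ArrayList<>();

    synchronized boolean write(long token, String payload) {
        if (token < highestTokenSeen) {
            return false; // the sender is a stale leader; drop its write
        }
        highestTokenSeen = token;
        accepted.add(payload);
        return true;
    }

    synchronized List<String> accepted() {
        return List.copyOf(accepted);
    }
}

class FencingDemo {
    public static void main(String[] args) {
        FencingTokenIssuer issuer = new FencingTokenIssuer();
        FencedWriter writer = new FencedWriter();

        long oldLeaderToken = issuer.nextToken(); // first leader is elected
        writer.write(oldLeaderToken, "settle-batch-1");

        long newLeaderToken = issuer.nextToken(); // the lease expires and a second leader is elected
        writer.write(newLeaderToken, "settle-batch-2");

        // The first leader wakes up after the partition and still believes it leads.
        boolean staleAccepted = writer.write(oldLeaderToken, "settle-batch-1-late");
        System.out.println(staleAccepted);     // false
        System.out.println(writer.accepted()); // [settle-batch-1, settle-batch-2]
    }
}
```

The check sits in the writer because the lock alone cannot stop a paused or partitioned instance from acting on an expired lease. Only the component that applies the side effect can refuse it.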
5) Design Checklist
- Is the execution entity clearly defined as either a K8s CronJob or the application?
- Is concurrent execution blocked by concurrencyPolicy or a distributed lock?
- Are fencing tokens applied to leader election?
- Is there an abort-and-recovery procedure for lock expiration or renewal failure?
- Are execution history and leader changes tracked?
- Does the chosen locking technology (DB/Redis/ZooKeeper) match the platform team's capabilities?
6) Summary
The core of distributed batch design is not guaranteeing "execute only once" but building a structure that stays "safe even if duplicate execution occurs." Designing the execution entity, lock, idempotency, and fencing tokens together keeps operational risk under control.
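As one way to picture "safe even if duplicate execution occurs," here is a minimal idempotency sketch. It assumes each run is identified by a run key such as job name plus business date. IdempotentJobRunner and the key format are hypothetical, and the in-memory set stands in for what would more likely be a table with a unique constraint on the run key.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical guard that lets a given run key execute at most once, even when triggered twice. */
class IdempotentJobRunner {
    private final Set<String> claimedRunKeys = ConcurrentHashMap.newKeySet();

    /** Runs the task if the key has not been claimed yet; returns false for a duplicate trigger. */
    boolean runOnce(String runKey, Runnable task) {
        if (!claimedRunKeys.add(runKey)) {
            return false; // the same run was already claimed by an earlier trigger
        }
        try {
            task.run();
            return true;
        } catch (RuntimeException e) {
            claimedRunKeys.remove(runKey); // free the key so a retry can run
            throw e;
        }
    }
}

class IdempotentRunDemo {
    public static void main(String[] args) {
        IdempotentJobRunner runner = new IdempotentJobRunner();
        int[] executions = {0};
        String runKey = "settlement:2024-01-01"; // assumed key format: job name plus business date

        runner.runOnce(runKey, () -> executions[0]++);
        runner.runOnce(runKey, () -> executions[0]++); // duplicate trigger from a second pod

        System.out.println(executions[0]); // 1
    }
}
```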
7) Next part preview
The next part covers performance optimization: batch size, commit interval, JVM memory calculation, and backpressure design, explained with concrete numerical models.