Part 12. Integrated reference architecture and final selection guide
We present a final decision matrix and a reference architecture that combine @Scheduled, Quartz, Spring Batch, and manual batch execution to fit real operating conditions.
Series: Mastering Spring Boot Batch Strategies
Baseline versions
- Java 21
- Spring Boot 3.3.x
- Spring Batch 5.2.x
- Quartz 2.3.x
- PostgreSQL 15
- OpenSearch 2.x
1) Problem statement
In practice, failures usually start not with the technology choice itself but with trying to handle every batch with a single tool. Scheduled settlement, re-indexing, operator-driven reprocessing, and event correction each have different requirements, so no single tool is the right answer. The answer is a combination.
This final part has two goals:
- Classify requirements quickly and select an appropriate execution model.
- Provide realistic criteria that weigh failure handling, performance, and operational complexity together.
2) Summary of key concepts
Integrated Selection Matrix
| Requirement | First choice | Supplement | Remarks |
|---|---|---|---|
| Simple periodic task (1-2 instances) | @Scheduled | Idempotency key + execution history | Quick to get started |
| Complex schedules / calendars / misfire handling | Quartz | JDBC JobStore + dashboard | Strong operational control |
| Bulk data transformation / restart required | Spring Batch | Keyset reader + chunk tuning | The standard choice |
| Operator-driven reprocessing | Manual batch API/UI | Permissions / audit / rollback | Ensures control and reproducibility |
| Bulk search indexing | Spring Batch + OpenSearch Bulk | PIT + Search After | Respects search cluster limits |
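The first-choice column of the matrix can be read as a small set of rules. The sketch below is one possible encoding, not a definitive one: the class, enum, and method names are hypothetical, and the order in which the rules are checked is an assumption made only so that the example is deterministic.

```java
import java.util.Set;

/** Hypothetical names throughout; each rule mirrors one row of the selection matrix. */
final class ExecutionModelSelector {

    enum Trait { COMPLEX_SCHEDULE, BULK_OR_RESTARTABLE, OPERATOR_TRIGGERED, SEARCH_INDEXING }

    enum Model { SCHEDULED_ANNOTATION, QUARTZ, SPRING_BATCH, MANUAL_API, SPRING_BATCH_WITH_OPENSEARCH_BULK }

    /** Returns the first-choice model for a set of requirement traits (rule order is an assumption). */
    static Model firstChoice(Set<Trait> traits) {
        if (traits.contains(Trait.OPERATOR_TRIGGERED)) return Model.MANUAL_API;
        if (traits.contains(Trait.SEARCH_INDEXING)) return Model.SPRING_BATCH_WITH_OPENSEARCH_BULK;
        if (traits.contains(Trait.BULK_OR_RESTARTABLE)) return Model.SPRING_BATCH;
        if (traits.contains(Trait.COMPLEX_SCHEDULE)) return Model.QUARTZ;
        return Model.SCHEDULED_ANNOTATION; // simple periodic task on one or two instances
    }
}
```

A real service usually combines several rows (for example, a Quartz trigger that launches a Spring Batch job), so a function like this is better suited to a first-pass classification than to a final decision.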
Principle of separation of execution layers
- Trigger layer: @Scheduled/Quartz/K8s CronJob
- Processing layer: Spring Batch Step/Chunk
- Data layer: DB Keyset/Range + OpenSearch Bulk
- Control layer: lock/leader election/idempotency
- Operational layer: observability/alerts/DLQ/manual reprocessing
Reference Architecture Diagram
[Figure: reference architecture diagram]
3) Code example
Example A: Unified Orchestrator Interface
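import java.time.LocalDateTime;
import java.util.Map;
import lombok.RequiredArgsConstructor;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.stereotype.Service;

public interface BatchOrchestrator {
    Long launchScheduled(String jobName, LocalDateTime scheduledAt);
    Long launchManual(String jobName, Map<String, String> params, String requestedBy);
    Long launchRecovery(String jobName, Long failedExecutionId);
}

// ExecutionGuard is an application-specific component (not shown here).
@Service
@RequiredArgsConstructor
public class DefaultBatchOrchestrator implements BatchOrchestrator {
    private final ExecutionGuard executionGuard;
    private final JobLauncher jobLauncher;

    @Override
    public Long launchScheduled(String jobName, LocalDateTime scheduledAt) {
        // The dedup key is used to reject a second launch for the same job and scheduled time.
        String dedupKey = jobName + ":" + scheduledAt;
        executionGuard.assertNotRunning(dedupKey);
        return run(jobName, Map.of("scheduledAt", scheduledAt.toString(), "dedupKey", dedupKey));
    }

    @Override
    public Long launchManual(String jobName, Map<String, String> params, String requestedBy) {
        return run(jobName, params); // placeholder: requestedBy should also be recorded for audit
    }

    @Override
    public Long launchRecovery(String jobName, Long failedExecutionId) {
        return run(jobName, Map.of("failedExecutionId", String.valueOf(failedExecutionId)));
    }

    private Long run(String jobName, Map<String, String> params) {
        // Save the execution history, then launch the Spring Batch job.
        return 1L; // placeholder execution ID
    }
}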
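Example A calls executionGuard.assertNotRunning(dedupKey) without showing the guard itself. The following is a minimal sketch of what such a guard could look like inside a single JVM. The class name InMemoryExecutionGuard and the release method are assumptions for illustration; with more than one instance, the same check would have to live in a shared store such as a database row with a unique constraint or a distributed lock.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical single-JVM guard; a multi-instance setup needs a shared store instead. */
final class InMemoryExecutionGuard {
    private final Set<String> running = ConcurrentHashMap.newKeySet();

    /** Registers the key, or throws if an execution with the same key is already registered. */
    void assertNotRunning(String dedupKey) {
        // Set.add is atomic here, so two concurrent callers cannot both succeed.
        if (!running.add(dedupKey)) {
            throw new IllegalStateException("Already running: " + dedupKey);
        }
    }

    /** Call when the execution finishes so the key can be used again. */
    void release(String dedupKey) {
        running.remove(dedupKey);
    }
}
```

Checking and registering in one atomic step matters: a separate "is it running?" query followed by an insert leaves a window in which two triggers can both pass the check.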
Example B: SQL for Operations Dashboard
-- Assumes a custom execution-history table with these columns. Spring Batch's own
-- BATCH_JOB_EXECUTION metadata table has a different layout (for example STATUS = 'COMPLETED', START_TIME).
SELECT job_name,
       COUNT(*) FILTER (WHERE status = 'SUCCEEDED') AS success_count,
       COUNT(*) FILTER (WHERE status = 'FAILED') AS failed_count,
       AVG(duration_ms) AS avg_duration_ms,
       PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY duration_ms) AS p99_duration_ms
FROM batch_job_execution
WHERE started_at >= NOW() - INTERVAL '24 hours'
GROUP BY job_name
ORDER BY failed_count DESC, p99_duration_ms DESC;
Example C: Keyset query for reprocessing
SELECT id, job_name, error_code, started_at
FROM batch_job_execution
WHERE status = 'FAILED'
  AND id > :last_id
ORDER BY id ASC
LIMIT 300;
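The query in Example C returns one page. The loop that drives it can be sketched as follows, assuming positive IDs and a caller-supplied fetchPage function that stands in for the actual database call; KeysetPager, drain, and fetchPage are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

/** Hypothetical driver loop for a keyset query such as "WHERE id > :last_id ORDER BY id LIMIT :limit". */
final class KeysetPager {

    /**
     * Calls fetchPage(lastId, limit) until a short page comes back.
     * fetchPage must return IDs greater than lastId in ascending order.
     */
    static List<Long> drain(BiFunction<Long, Integer, List<Long>> fetchPage, int limit) {
        if (limit <= 0) throw new IllegalArgumentException("limit must be positive");
        List<Long> all = new ArrayList<>();
        long lastId = 0L; // assumes IDs are positive
        while (true) {
            List<Long> page = fetchPage.apply(lastId, limit);
            all.addAll(page);
            if (page.size() < limit) return all; // a short page means the end was reached
            lastId = page.get(page.size() - 1);  // the last ID of this page becomes the next cursor
        }
    }
}
```

Collecting every ID into one list is only for demonstration. A reprocessing job would normally handle each page as it arrives and persist lastId so that a restart can resume from the cursor.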
Example D: OpenSearch Incremental Synchronization Query
POST /_search
{
  "size": 1000,
  "sort": [
    { "updated_at": "asc" },
    { "_id": "asc" }
  ],
  "search_after": ["2026-03-03T08:00:00Z", "product-9988"],
  "query": {
    "range": {
      "updated_at": { "gte": "2026-03-03T00:00:00Z" }
    }
  }
}
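The second sort key in Example D is what keeps documents with the same updated_at from being skipped between pages. The in-memory sketch below imitates that behaviour without calling OpenSearch at all; SearchAfterSketch, Doc, and page are hypothetical names, and the list stands in for an index.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** In-memory imitation of search_after paging; it does not use any OpenSearch client. */
final class SearchAfterSketch {

    record Doc(Instant updatedAt, String id) {}

    // Same ordering as the sort clause: updated_at ascending, then _id ascending as the tiebreaker.
    static final Comparator<Doc> SORT = Comparator.comparing(Doc::updatedAt).thenComparing(Doc::id);

    /** Returns up to size documents that sort strictly after the cursor (a null cursor means the first page). */
    static List<Doc> page(List<Doc> index, Doc cursor, int size) {
        return index.stream()
                .sorted(SORT)
                .filter(d -> cursor == null || SORT.compare(d, cursor) > 0)
                .limit(size)
                .collect(Collectors.toList());
    }
}
```

If the cursor held only updated_at, a page boundary that falls between two documents with the same timestamp would either repeat one of them or drop it, depending on whether the comparison is inclusive.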
4) Real-world failure/operational scenarios
Situation: During Black Friday traffic, order settlement (Quartz), product indexing (Spring Batch + OpenSearch), and the operator reprocessing API all ramped up at the same time. The shared DB connection pool was exhausted, every batch was delayed, and some failed with timeouts.
Cause:
- Each batch was optimized in isolation, so there was no system-wide capacity model.
- No resource priority (Settlement > Reindex > Report) was defined.
- Failed batches were retried immediately, which added to the load.
Improvements:
- Fix a resource budget (CPU/DB pool/IO) per batch type.
- Introduce a priority queue and automatically defer low-priority jobs.
- Apply exponential backoff to failure retries, plus a no-execution window during peak hours.
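The retry improvement in the list above can be sketched with two small functions. This is an illustration under stated assumptions: the names are hypothetical, the delay doubles per attempt up to a cap, and the no-execution window is assumed not to cross midnight.

```java
import java.time.Duration;
import java.time.LocalTime;

/** Hypothetical helper for retry scheduling; not part of Spring Batch or Quartz. */
final class RetryPolicySketch {

    /** Delay before the given attempt (1-based): base * 2^(attempt - 1), capped at max. */
    static Duration backoff(int attempt, Duration base, Duration max) {
        if (attempt < 1) throw new IllegalArgumentException("attempt must be >= 1");
        int shift = Math.min(attempt - 1, 20); // bound the exponent so the multiplier stays small
        Duration delay = base.multipliedBy(1L << shift);
        return delay.compareTo(max) > 0 ? max : delay;
    }

    /** True if time falls inside [start, end); assumes the window does not cross midnight. */
    static boolean inBlackout(LocalTime time, LocalTime start, LocalTime end) {
        return !time.isBefore(start) && time.isBefore(end);
    }
}
```

A scheduler could compute the next retry time from backoff and, if that time falls inside the window, push it to the end of the window instead. Adding random jitter to the delay is a common refinement so that many failed jobs do not retry at the same instant.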
5) Design Checklist
- Are batch requirements first classified as schedule/event/manual/bulk/near-real-time?
- Is the execution owner (@Scheduled/Quartz/K8s/manual) clearly defined?
- Is the DB read strategy designed around Keyset/Range?
- Does the OpenSearch path use PIT + Search After + controlled Bulk requests?
- Are locking, leader election, idempotency, and fencing tokens designed together?
- Are DLQ and reprocessing procedures documented?
- Are performance targets and recovery objectives (RTO/RPO) tracked as numbers?
6) Summary
The conclusion of this series is simple: batch processing is an operating-model problem, not a technology problem. @Scheduled, Quartz, Spring Batch, and manual batch execution do not compete with one another; they divide roles. Combine them according to your requirements, failure model, and operational capability.
7) Next episode preview
The series ends with this part. A follow-up article will cover the repository structure and observability dashboard templates for running all four batch models side by side in a single service, based on an actual sample project.