Part 12. Integrated reference architecture and final selection guide

We present a final decision matrix and a reference architecture that combine @Scheduled, Quartz, Spring Batch, and manual batch execution to fit real operating conditions.

Series: Mastering Spring Boot Batch Strategies

12 parts in total. You are currently viewing Part 12.

Baseline versions

  • Java 21
  • Spring Boot 3.3.x
  • Spring Batch 5.2.x
  • Quartz 2.3.x
  • PostgreSQL 15
  • OpenSearch 2.x

1) Problem statement

In practice, failures usually start not with the choice of technology itself but with trying to solve every batch workload with a single tool. Scheduled settlement, re-indexing, operator-driven reprocessing, and event correction each have different requirements, so there is no single correct answer. The answer is a combination.

This final part has two goals.

  • Quickly classify requirements and select an appropriate execution model.
  • Provide realistic criteria that weigh failure handling, performance, and operational complexity together.

2) Summary of key concepts

Integrated Selection Matrix

| Requirement | 1st choice | Supplement | Remarks |
|---|---|---|---|
| Simple periodic task (1-2 instances) | @Scheduled | Idempotency key + execution history | Quick to start |
| Complex schedule/calendar/misfire handling | Quartz | JDBC JobStore + dashboard | Strong operational control |
| Bulk data conversion / restart required | Spring Batch | Keyset reader + chunk tuning | Default choice |
| Operator-driven reprocessing | Manual batch API/UI | Permissions/audit/rollback | Secures control and reproducibility |
| Bulk search indexing | Spring Batch + OpenSearch Bulk | PIT + Search After | Respects search cluster limits |
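
The matrix can also be read as a small decision function. The sketch below is one possible encoding, not a rule taken from any library: the `Requirement` record, its field names, and the order of the checks are assumptions made for illustration.

```java
import java.util.Objects;

/** Execution models named in the selection matrix. */
enum ExecutionModel { SCHEDULED_ANNOTATION, QUARTZ, SPRING_BATCH, MANUAL_API, SPRING_BATCH_OPENSEARCH_BULK }

/** Hypothetical requirement descriptor; the fields are assumptions for this sketch. */
record Requirement(boolean operatorTriggered,
                   boolean bulkSearchIndexing,
                   boolean bulkDataOrRestart,
                   boolean complexCalendarOrMisfire) {}

final class ExecutionModelSelector {
    private ExecutionModelSelector() {}

    /** Checks the most specific rows first and falls back to @Scheduled for simple periodic tasks. */
    static ExecutionModel select(Requirement r) {
        Objects.requireNonNull(r, "requirement");
        if (r.operatorTriggered()) return ExecutionModel.MANUAL_API;
        if (r.bulkSearchIndexing()) return ExecutionModel.SPRING_BATCH_OPENSEARCH_BULK;
        if (r.bulkDataOrRestart()) return ExecutionModel.SPRING_BATCH;
        if (r.complexCalendarOrMisfire()) return ExecutionModel.QUARTZ;
        return ExecutionModel.SCHEDULED_ANNOTATION;
    }
}
```

A real service will often match several rows at once (for example a Quartz trigger that launches a Spring Batch job), so a function like this is better treated as a first-pass classifier than as the final answer.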

Principle of separating execution layers

  • Trigger layer: @Scheduled/Quartz/K8s CronJob
  • Processing layer: Spring Batch Step/Chunk
  • Data layer: DB Keyset/Range + OpenSearch Bulk
  • Control layer: lock/leader election/idempotency
  • Operational layer: observability/alerts/DLQ/manual reprocessing

Reference Architecture Diagram

[Figure: reference architecture diagram]

3) Code examples

Example A: Unified Orchestrator Interface

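import java.time.LocalDateTime;
import java.util.HashMap;
import java.util.Map;
import lombok.RequiredArgsConstructor;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.stereotype.Service;

public interface BatchOrchestrator {
    Long launchScheduled(String jobName, LocalDateTime scheduledAt);
    Long launchManual(String jobName, Map<String, String> params, String requestedBy);
    Long launchRecovery(String jobName, Long failedExecutionId);
}

@Service
@RequiredArgsConstructor
public class DefaultBatchOrchestrator implements BatchOrchestrator {

    private final ExecutionGuard executionGuard; // project-specific duplicate-run guard
    private final JobLauncher jobLauncher;

    @Override
    public Long launchScheduled(String jobName, LocalDateTime scheduledAt) {
        // One dedup key per (job, scheduled time), so a re-fired trigger cannot start a second run.
        String dedupKey = jobName + ":" + scheduledAt;
        executionGuard.assertNotRunning(dedupKey);
        return run(jobName, Map.of("scheduledAt", scheduledAt.toString(), "dedupKey", dedupKey));
    }

    @Override
    public Long launchManual(String jobName, Map<String, String> params, String requestedBy) {
        // Assumed behavior: record the requester as a parameter so the run can be audited.
        Map<String, String> merged = new HashMap<>(params);
        merged.put("requestedBy", requestedBy);
        return run(jobName, merged);
    }

    @Override
    public Long launchRecovery(String jobName, Long failedExecutionId) {
        // Assumed behavior: link the new run to the failed execution it recovers.
        return run(jobName, Map.of("recoveryOf", String.valueOf(failedExecutionId)));
    }

    private Long run(String jobName, Map<String, String> params) {
        // Placeholder: save the execution history, then launch the Spring Batch job via jobLauncher.
        return 1L;
    }
}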
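
Example A calls `executionGuard.assertNotRunning(dedupKey)` without showing the guard itself. As a minimal sketch, assuming the guard only needs to reject a key that is already claimed, an in-memory version could look like the following. The `ExecutionGuard` interface shape and the `release` method are assumptions; only `assertNotRunning` appears in the example.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Assumed shape of the guard used in Example A. */
interface ExecutionGuard {
    void assertNotRunning(String dedupKey);
    void release(String dedupKey); // hypothetical addition for this sketch
}

/** Single-JVM guard backed by a concurrent set. */
final class InMemoryExecutionGuard implements ExecutionGuard {
    private final Set<String> running = ConcurrentHashMap.newKeySet();

    @Override
    public void assertNotRunning(String dedupKey) {
        // add() is atomic on this set: it returns false if another caller already claimed the key.
        if (!running.add(dedupKey)) {
            throw new IllegalStateException("Already running: " + dedupKey);
        }
    }

    @Override
    public void release(String dedupKey) {
        running.remove(dedupKey);
    }
}
```

An in-memory set only protects a single instance. With two or more instances the claim has to live in shared storage, for example a database table with a unique constraint on the dedup key.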

Example B: SQL for Operations Dashboard

-- Assumes a custom execution-history table with these columns. Spring Batch's own
-- BATCH_JOB_EXECUTION metadata table has a different layout (e.g. START_TIME, STATUS = 'COMPLETED').
SELECT job_name,
       COUNT(*) FILTER (WHERE status = 'SUCCEEDED') AS success_count,
       COUNT(*) FILTER (WHERE status = 'FAILED') AS failed_count,
       AVG(duration_ms) AS avg_duration_ms,
       PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY duration_ms) AS p99_duration_ms
FROM batch_job_execution
WHERE started_at >= NOW() - INTERVAL '24 hours'
GROUP BY job_name
ORDER BY failed_count DESC, p99_duration_ms DESC;

Example C: Keyset query for reprocessing

SELECT id, job_name, error_code, started_at
FROM batch_job_execution
WHERE status = 'FAILED'
  AND id > :last_id
ORDER BY id ASC
LIMIT 300;
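
The query returns one page, and the caller has to feed the last `id` of each page back in as `:last_id`. The loop below is a minimal sketch of that driver. The `pageFetcher` function stands in for the SQL call, and the `FailedExecution` record and `KeysetReprocessor` class are hypothetical names for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

/** Minimal row shape for this sketch; the real table has more columns. */
record FailedExecution(long id, String jobName) {}

final class KeysetReprocessor {
    private KeysetReprocessor() {}

    /**
     * Drains all pages. pageFetcher stands in for the keyset query:
     * given (lastId, limit) it must return rows with id > lastId, ordered by id ascending.
     */
    static List<FailedExecution> readAll(
            BiFunction<Long, Integer, List<FailedExecution>> pageFetcher, int limit) {
        List<FailedExecution> all = new ArrayList<>();
        long lastId = 0L; // assumes ids are positive
        while (true) {
            List<FailedExecution> page = pageFetcher.apply(lastId, limit);
            if (page.isEmpty()) {
                break;
            }
            all.addAll(page);
            lastId = page.get(page.size() - 1).id(); // the cursor is the last id of the page
            if (page.size() < limit) {
                break; // a short page means there are no more rows
            }
        }
        return all;
    }
}
```

Because the cursor is a primary-key value rather than an offset, each page costs roughly the same regardless of how deep the scan is, which is the reason to prefer keyset reads over `OFFSET` for reprocessing.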

Example D: OpenSearch Incremental Synchronization Query

POST /_search
{
  "size": 1000,
  "sort": [
    { "updated_at": "asc" },
    { "_id": "asc" }
  ],
  "search_after": ["2026-03-03T08:00:00Z", "product-9988"],
  "query": {
    "range": {
      "updated_at": { "gte": "2026-03-03T00:00:00Z" }
    }
  }
}
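
The `search_after` values in the request are the sort values of the last hit from the previous page. A dependency-free sketch of how a client might advance that cursor is shown below. It builds the JSON by hand instead of using an OpenSearch client, assumes the values contain no characters that need JSON escaping, and the `SortedHit` and `SearchAfterCursor` names are hypothetical.

```java
import java.util.List;
import java.util.Optional;

/** One hit reduced to its sort values [updated_at, _id], matching the "sort" clause of the query. */
record SortedHit(String updatedAt, String id) {}

final class SearchAfterCursor {
    private SearchAfterCursor() {}

    /** The next cursor is the last hit of the page; an empty page ends the scan. */
    static Optional<SortedHit> next(List<SortedHit> page) {
        return page.isEmpty() ? Optional.empty() : Optional.of(page.get(page.size() - 1));
    }

    /** Builds the request body; a null cursor means first page, so search_after is omitted. */
    static String requestBody(int size, String since, SortedHit cursor) {
        String searchAfter = cursor == null
                ? ""
                : "\"search_after\":[\"" + cursor.updatedAt() + "\",\"" + cursor.id() + "\"],";
        return "{\"size\":" + size + ","
                + "\"sort\":[{\"updated_at\":\"asc\"},{\"_id\":\"asc\"}],"
                + searchAfter
                + "\"query\":{\"range\":{\"updated_at\":{\"gte\":\"" + since + "\"}}}}";
    }
}
```

The tiebreaker in the sort matters: without a second, unique sort field, documents sharing the same `updated_at` could be skipped or repeated between pages.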

4) Real-world failure/operational scenarios

Situation: During Black Friday traffic, order settlement (Quartz), product indexing (Spring Batch + OpenSearch), and calls to the operator reprocessing API all increased at the same time. The shared DB connection pool was exhausted, every batch was delayed, and some failed with timeouts.

Cause:

  • Each batch was optimized in isolation, so there was no system-wide limit model.
  • No resource priority (Settlement > Reindex > Report) was defined.
  • Failed batches were retried immediately, which added to the load.

Improvements:

  1. Fix a resource budget (CPU/DB pool/IO) per batch type.
  2. Introduce priority queues and automatically defer low-priority tasks.
  3. For failure retries, apply exponential backoff plus a no-execution window during peak hours.
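
The third improvement can be sketched as a small policy object. This is a minimal illustration under stated assumptions, not a Spring or Quartz API: the `RetryPolicy` class, the doubling factor, and the window semantics (start inclusive, end exclusive, equal start and end meaning no window) are all choices made for this example.

```java
import java.time.Duration;
import java.time.LocalTime;

final class RetryPolicy {
    private final Duration baseDelay;
    private final Duration maxDelay;
    private final LocalTime blackoutStart;
    private final LocalTime blackoutEnd;

    RetryPolicy(Duration baseDelay, Duration maxDelay, LocalTime blackoutStart, LocalTime blackoutEnd) {
        this.baseDelay = baseDelay;
        this.maxDelay = maxDelay;
        this.blackoutStart = blackoutStart;
        this.blackoutEnd = blackoutEnd;
    }

    /** Delay before retry number attempt (1-based): base * 2^(attempt - 1), capped at maxDelay. */
    Duration backoff(int attempt) {
        if (attempt < 1) {
            throw new IllegalArgumentException("attempt must be >= 1");
        }
        int shift = Math.min(attempt - 1, 62);
        long base = baseDelay.toMillis();
        long max = maxDelay.toMillis();
        // Comparing against max >> shift avoids overflow in base << shift.
        return base > (max >> shift) ? maxDelay : Duration.ofMillis(base << shift);
    }

    /** True when now falls inside the no-execution window; handles windows that cross midnight. */
    boolean inBlackout(LocalTime now) {
        if (blackoutStart.equals(blackoutEnd)) {
            return false; // treated as "no window" in this sketch
        }
        if (blackoutStart.isBefore(blackoutEnd)) {
            return !now.isBefore(blackoutStart) && now.isBefore(blackoutEnd);
        }
        return !now.isBefore(blackoutStart) || now.isBefore(blackoutEnd);
    }
}
```

A scheduler would call `inBlackout` before re-queuing a failed job and, if it returns false, wait for `backoff(attempt)`. Adding random jitter to the delay is a common refinement when many jobs fail at the same moment.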

5) Design Checklist

  • Have batch requirements first been classified as schedule/event/manual/bulk/NRT?
  • Has the execution entity (@Scheduled/Quartz/K8s/Manual) been clearly defined?
  • Is the DB read strategy designed around Keyset/Range?
  • Does OpenSearch access use PIT + Search After + Bulk control?
  • Are lock/leader election/idempotency/fencing tokens designed together?
  • Are DLQ and reprocessing operating procedures documented?
  • Are performance goals and failure recovery goals (RTO/RPO) managed numerically?

6) Summary

The conclusion of this series is simple. Batch processing is a problem of the operational system, not of the technology. @Scheduled, Quartz, Spring Batch, and manual batch execution do not compete with each other; they share roles. They must be combined according to requirements, failure models, and operational capability.

7) Next episode preview

The series ends with this part. A follow-up article will cover the implementation repo structure and an observability dashboard template for running the four batch models side by side in a single service, based on an actual sample project.
