Part 1. Nature and classification of batches: schedule, event, manual, bulk, near-real-time

Target versions

Java 21
Spring Boot 3.3.x
Spring Batch 5.2.x
Quartz 2.3.x
PostgreSQL 15
OpenSearch 2.x

1) Problem statement

In practice, a "batch" is not a single technique but a bundle of execution models, each with its own failure model. This is why @Scheduled, Quartz, Spring Batch, and manual execution APIs coexist within the same Spring Boot application. The problem is that many teams choose among them solely for "ease of implementation."

On-time settlement cannot tolerate delays of even a few minutes.
Point recalculation may be delayed by several hours, but data consistency comes first.
Operator reprocessing prioritizes reproducibility and an audit trail over speed.
Index synchronization is more cost-effective when changes are "reflected within minutes" than when it is fully real-time.

In other words, batch strategy should be determined by SLA, RPO, failure recovery method, and data volume, not by code style.

2) Summary of key concepts

Decisions become simpler if you first sort batches into five categories.

Category	Trigger	Representative technology	Strengths	Key risks
Schedule-based	Time (cron)	`@Scheduled`, Quartz	Predictability	Duplicate execution across multiple instances
Event-based	Message / change event	Kafka consumer, CDC	Low latency	Needs backpressure during event surges
Manual run	Operator trigger	Admin API/UI	Control, flexible reprocessing	Missing authorization or audit trail
Bulk processing	Large range scan	Spring Batch chunk	Transaction management, restartability	Memory pressure, lock contention
Near-real-time	Short cycle + incremental query	Schedule + event mix	Good freshness for the cost	Consistency between the two paths

The transaction isolation level should match the batch category. For example, READ COMMITTED is sufficient for an aggregation batch, but an inventory calculation batch that queries the same condition repeatedly within one transaction should consider REPEATABLE READ or a snapshot read. Lock contention and latency tend to grow as the isolation level rises, so picking a high level by default backfires.

In bulk batches, JVM memory is also an architectural variable. For example, if the average object handled per item is 4 KB and the chunk size is 2,000, one step needs at least 8 MB plus object overhead and serialization buffers. Multiply this by a multi-threaded step (e.g. 8 threads) and you get direct heap pressure.
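The arithmetic above can be written down as a small estimator. This is only a rough sketch: the class name `ChunkHeapEstimator`, the decimal reading of 4 KB as 4,000 bytes, the 1.5 overhead factor, and the assumption that each thread holds exactly one full chunk at a time are all illustrative choices, not measured values.

```java
// Rough heap estimate for one chunk-oriented step (illustrative sketch).
final class ChunkHeapEstimator {

    private ChunkHeapEstimator() {}

    /** Payload bytes held by one in-flight chunk: average item size times chunk size. */
    static long payloadBytesPerChunk(long avgItemBytes, int chunkSize) {
        return Math.multiplyExact(avgItemBytes, (long) chunkSize);
    }

    /** Payload bytes across all threads, assuming each thread holds one full chunk. */
    static long payloadBytesForStep(long avgItemBytes, int chunkSize, int threads) {
        return Math.multiplyExact(payloadBytesPerChunk(avgItemBytes, chunkSize), (long) threads);
    }

    /** Applies an assumed multiplier for object headers, buffers, and similar overhead. */
    static long withOverhead(long payloadBytes, double overheadFactor) {
        return Math.round(payloadBytes * overheadFactor);
    }

    public static void main(String[] args) {
        long perChunk = payloadBytesPerChunk(4_000, 2_000);   // 8,000,000 bytes, about 8 MB
        long perStep = payloadBytesForStep(4_000, 2_000, 8);  // 64,000,000 bytes, about 64 MB
        long padded = withOverhead(perStep, 1.5);             // 1.5 is an assumed factor
        System.out.println(perChunk + " " + perStep + " " + padded);
    }
}
```

Even this crude estimate makes the trade-off visible: doubling either the chunk size or the thread count doubles the payload the heap has to hold at once.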

[Figure: Classification decision diagram]

3) Code examples

Example A: Policy routing by batch type

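import java.time.Duration;
import org.springframework.transaction.annotation.Isolation;

// The five batch categories from section 2.
public enum BatchType {
    SCHEDULED,
    EVENT_DRIVEN,
    MANUAL,
    BULK,
    NEAR_REAL_TIME
}

// Execution settings that differ per batch category.
public record BatchExecutionPolicy(
    Duration timeout,     // upper bound for one run
    int maxRetry,         // retries after a failed attempt
    Isolation isolation,  // transaction isolation level
    int chunkSize         // items processed per chunk
) {}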

public class BatchPolicyResolver {
    public BatchExecutionPolicy resolve(BatchType type) {
        return switch (type) {
            case SCHEDULED -> new BatchExecutionPolicy(Duration.ofMinutes(30), 1, Isolation.READ_COMMITTED, 500);
            case EVENT_DRIVEN -> new BatchExecutionPolicy(Duration.ofMinutes(5), 5, Isolation.READ_COMMITTED, 200);
            case MANUAL -> new BatchExecutionPolicy(Duration.ofHours(2), 0, Isolation.REPEATABLE_READ, 1000);
            case BULK -> new BatchExecutionPolicy(Duration.ofHours(6), 2, Isolation.REPEATABLE_READ, 2000);
            case NEAR_REAL_TIME -> new BatchExecutionPolicy(Duration.ofMinutes(2), 3, Isolation.READ_COMMITTED, 100);
        };
    }
}
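How a value such as maxRetry gets applied is left open by the policy record. One possible reading, sketched below in plain Java, is "one initial attempt plus up to maxRetry further attempts", so that 0 means no automatic retry. The `RetryRunner` class is hypothetical and not part of Spring Batch; a real job would more likely rely on the framework's own retry support and add backoff and logging.

```java
import java.util.function.Supplier;

// Hypothetical helper: applies a policy's maxRetry as a plain retry loop.
final class RetryRunner {

    private RetryRunner() {}

    /**
     * Runs the task once and, on failure, up to maxRetry more times.
     * maxRetry = 0 therefore means a single attempt with no retry.
     * The last failure is rethrown when every attempt fails.
     */
    static <T> T runWithRetry(Supplier<T> task, int maxRetry) {
        if (maxRetry < 0) {
            throw new IllegalArgumentException("maxRetry must be >= 0");
        }
        RuntimeException last = null;
        for (int attempt = 0; attempt <= maxRetry; attempt++) {
            try {
                return task.get();
            } catch (RuntimeException e) {
                last = e; // a real job would log the attempt and back off here
            }
        }
        throw last; // the loop ran at least once, so last is set
    }
}
```

Under this reading, the MANUAL policy above (maxRetry 0) fails immediately and hands the decision back to the operator, while EVENT_DRIVEN (maxRetry 5) absorbs transient failures on its own.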

Example B: Execution history table and query SQL

CREATE TABLE batch_job_execution (
    id BIGSERIAL PRIMARY KEY,
    job_name VARCHAR(100) NOT NULL,
    batch_type VARCHAR(30) NOT NULL,
    requested_by VARCHAR(100) NULL,
    status VARCHAR(20) NOT NULL,
    started_at TIMESTAMP NOT NULL,
    ended_at TIMESTAMP NULL,
    error_code VARCHAR(50) NULL,
    created_at TIMESTAMP NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_bje_job_started ON batch_job_execution (job_name, started_at DESC);
CREATE INDEX idx_bje_status_started ON batch_job_execution (status, started_at DESC);

-- Check batches that failed in the last 24 hours
SELECT id, job_name, batch_type, status, started_at, ended_at, error_code
FROM batch_job_execution
WHERE status = 'FAILED'
  AND started_at >= NOW() - INTERVAL '24 hour'
ORDER BY started_at DESC
LIMIT 200;

Example C: Keyset-based query for reprocessing targets

SELECT id, job_name, started_at
FROM batch_job_execution
WHERE status = 'FAILED'
  AND id > :last_id
ORDER BY id ASC
LIMIT 500;
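The query above returns a single page, so the caller has to feed the last id it saw back in as :last_id. A minimal sketch of that loop follows. The `KeysetScanner` class and its `fetchPage` parameter are hypothetical stand-ins for whatever data access code actually runs the SQL, and the starting value of 0 assumes ids begin above zero, as they do with BIGSERIAL.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical driver for a keyset-paginated query.
final class KeysetScanner {

    private KeysetScanner() {}

    /**
     * Collects all ids by repeatedly requesting the page after lastId.
     * fetchPage stands in for the SQL: given (lastId, limit) it must return
     * ids greater than lastId, in ascending order, at most limit of them.
     */
    static List<Long> scanAll(BiFunction<Long, Integer, List<Long>> fetchPage, int limit) {
        List<Long> all = new ArrayList<>();
        long lastId = 0L; // assumes ids start above 0
        while (true) {
            List<Long> page = fetchPage.apply(lastId, limit);
            if (page.isEmpty()) {
                break;
            }
            all.addAll(page);
            lastId = page.get(page.size() - 1); // resume after the last id seen
            if (page.size() < limit) {
                break; // short page: nothing left to read
            }
        }
        return all;
    }
}
```

Because each page is located by id rather than by OFFSET, rows that leave the FAILED set while the scan is running do not shift later pages, which is what makes this pattern suit reprocessing.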

4) Real-world failure and operational scenario

Situation: At 00:00 at month-end, the settlement batch (@Scheduled) and the operator's manual reprocessing API updated the same order table at the same time. Because both paths selected rows with the same "not yet settled today" condition, the same order was settled twice.

Cause:

The execution models differed, but there was no idempotency key on the shared business key.
The transaction isolation level was READ COMMITTED, and the reprocessing query had no FOR UPDATE SKIP LOCKED.
The batch history was recorded, but the input parameters of each run were not saved.

Recovery:

Add UNIQUE(order_id, settlement_date) to the settlement table.
Change the reprocessing query to SELECT ... FOR UPDATE SKIP LOCKED.
Add a request_hash column to the execution history to block duplicate requests.
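How the request_hash value is derived is not specified here. One plausible approach, sketched below, is to hash the job name together with the input parameters in a canonical (sorted) order, so that the same request always maps to the same value regardless of parameter ordering. The `RequestHash` class, the choice of SHA-256, and the newline-separated canonical form are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper that derives a request_hash value for a batch request.
final class RequestHash {

    private RequestHash() {}

    /** SHA-256 over the job name and the parameters in sorted key order, hex encoded. */
    static String of(String jobName, Map<String, String> params) {
        StringBuilder canonical = new StringBuilder(jobName);
        // TreeMap sorts the keys, so parameter order does not change the hash.
        for (Map.Entry<String, String> e : new TreeMap<>(params).entrySet()) {
            canonical.append('\n').append(e.getKey()).append('=').append(e.getValue());
        }
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            byte[] digest = sha256.digest(canonical.toString().getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(digest);
        } catch (NoSuchAlgorithmException ex) {
            // SHA-256 is a required algorithm on every Java platform.
            throw new IllegalStateException(ex);
        }
    }
}
```

Stored in a column with a unique constraint, such a value would let the database reject a second identical request. The canonical form is deliberately simple: keys or values that themselves contain '=' or newlines would need escaping before this could be relied on.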

5) Design Checklist

Did you first classify the batch as one of the 5 types (schedule/event/manual/bulk/NRT)?
Are the SLA, allowable delay, and retry policy for each type documented?
Has the transaction isolation level been chosen to suit the purpose of the batch?
Is there an idempotency key design (business key + date, etc.) for each batch?
Are parameters, requester, number of retries, and error codes recorded in the execution history?
For bulk batches, have you calculated the chunk size and JVM heap usage together?

6) Summary

The starting point for batch design is not "which library to use." You must first decide which failures to handle and which consistency guarantees to provide. Only once that standard is set do the roles of @Scheduled, Quartz, Spring Batch, and manual execution become clear.

7) Next episode preview

The next part covers how to use @Scheduled safely in production. In particular, it walks through duplicate execution across multiple instances, distributed locking (Redis/DB), and handling duplicate restarts after a failure, with a focus on real-world patterns.

Reference links

Previous post: None (this is the first post in the series)
Next post: Part 2. @Scheduled in production: the price of simplicity and the multi-instance trap