Part 10. Performance Optimization: Batch Size, Commit Interval, JVM Memory, Backpressure


Based on versions

Java 21
Spring Boot 3.3.x
Spring Batch 5.2.x
Quartz 2.3.x
PostgreSQL 15
OpenSearch 2.x

1) Problem statement

The most dangerous habit in batch performance tuning is the one-size-fits-all prescription: “It’s slow, so increase the chunk size.” In reality, CPU, DB IOPS, network, and external API limits each saturate at a different point, so pushing one knob harder just makes another resource collapse first.

The goal of performance optimization is not the highest TPS but a stable operating range:

Completion time stays within the SLA
Recovery time after a failure is predictable
The batch can coexist with OLTP traffic

2) Summary of key concepts

Criteria for determining batch size

Measure the processing time per item, t (Reader + Processor + Writer).
Calculate the target throughput R (total item count / allowed time).
Account for the concurrency C that the DB and external systems allow.
Start with chunk = (R / C) * safety factor, then adjust based on P95 metrics.

Commit interval tradeoffs

Larger commit interval: lower transaction overhead, higher rollback cost.
Smaller commit interval: higher stability, lower TPS.
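
The tradeoff above can be made concrete with a little arithmetic. The following is a minimal sketch, not a Spring Batch API: `CommitIntervalModel` and its method names are hypothetical, and it assumes one transaction per chunk and that a failure rolls back only the chunk in flight.

```java
/** Hypothetical helper: expresses the commit-interval tradeoff as plain arithmetic. */
final class CommitIntervalModel {

    private CommitIntervalModel() {}

    /** Number of transactions needed, assuming exactly one commit per chunk. */
    static long commitCount(long totalItems, int commitInterval) {
        if (totalItems < 0 || commitInterval <= 0) {
            throw new IllegalArgumentException("totalItems must be >= 0 and commitInterval > 0");
        }
        // Ceiling division: the last chunk may be only partially filled.
        return (totalItems + commitInterval - 1) / commitInterval;
    }

    /** Worst-case number of items that must be redone when one chunk rolls back. */
    static int worstCaseReworkItems(int commitInterval) {
        if (commitInterval <= 0) {
            throw new IllegalArgumentException("commitInterval must be > 0");
        }
        // Assumption: a failure discards the whole in-flight chunk and nothing else.
        return commitInterval;
    }
}
```

For 1,000,000 items, an interval of 100 means 10,000 commits with at most 100 items redone per rollback, while an interval of 5,000 means 200 commits with up to 5,000 items redone.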

JVM memory calculation example

Average record object size: 6 KB
Chunk size: 1,000
Concurrent threads: 6
Expected object memory: 6 KB x 1,000 x 6 = about 36 MB
In practice, overhead (arrays, strings, serialization buffers, caches) brings the requirement to 2 to 3 times that figure.

In other words, it is safe to budget at least 100 MB for the batch working set and to keep it within 30-40% of the heap.
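
The same estimate can be written as a small calculator so it is easy to rerun with measured values. This is a rough sketch under stated assumptions: `WorkingSetEstimator` and its methods are hypothetical names, sizes use decimal units (1 KB = 1,000 bytes) to match the figures above, and the overhead factor is an input you would calibrate from heap dumps or GC logs rather than a constant.

```java
/** Hypothetical helper: back-of-the-envelope heap sizing for a chunked batch step. */
final class WorkingSetEstimator {

    private WorkingSetEstimator() {}

    /** Raw payload held in memory: record size x chunk size x concurrent threads. */
    static long rawObjectBytes(long avgRecordBytes, int chunkSize, int threads) {
        // multiplyExact fails loudly on overflow instead of silently wrapping.
        return Math.multiplyExact(Math.multiplyExact(avgRecordBytes, (long) chunkSize), (long) threads);
    }

    /** Working set after overhead for arrays, strings, buffers, and caches. */
    static long workingSetBytes(long rawBytes, double overheadFactor) {
        if (overheadFactor < 1.0) {
            throw new IllegalArgumentException("overheadFactor must be >= 1.0");
        }
        return (long) Math.ceil(rawBytes * overheadFactor);
    }

    /** Smallest heap for which the working set stays within the given share (in percent). */
    static long minHeapBytes(long workingSetBytes, int maxHeapSharePercent) {
        if (maxHeapSharePercent <= 0 || maxHeapSharePercent > 100) {
            throw new IllegalArgumentException("maxHeapSharePercent must be in 1..100");
        }
        // Integer ceiling division avoids floating-point rounding surprises.
        long scaled = Math.multiplyExact(workingSetBytes, 100L);
        return (scaled + maxHeapSharePercent - 1) / maxHeapSharePercent;
    }
}
```

With the numbers above (6,000 bytes, chunk 1,000, 6 threads), the raw size is 36,000,000 bytes and a factor of 3 gives 108,000,000 bytes. Keeping that within 30% of the heap implies a heap of at least 360,000,000 bytes; at 40% the floor is 270,000,000 bytes.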

Backpressure control loop

[Diagram not rendered: backpressure control loop]


3) Code example

Example A: Throughput-based dynamic chunk calculation

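import java.time.Duration;

public class ChunkSizer {

    // Safety factor applied to the per-worker target; 2 is a starting point, not a measured value.
    private static final int SAFETY_FACTOR = 2;

    public int calculate(long totalCount, Duration deadline, int maxConcurrency) {
        // Guard against zero or negative inputs so the divisions below are safe.
        long seconds = Math.max(deadline.getSeconds(), 1);
        // R: required throughput in items per second.
        long targetPerSecond = Math.max(totalCount / seconds, 1);
        // R / C: share of the throughput each worker has to carry.
        long perWorker = Math.max(targetPerSecond / Math.max(maxConcurrency, 1), 1);

        // Clamp to [100, 5000] so the chunk is neither trivially small nor memory-heavy.
        return (int) Math.min(Math.max(perWorker * SAFETY_FACTOR, 100), 5000);
    }
}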

Example B: TaskExecutor applying backpressure

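@Bean
public ThreadPoolTaskExecutor batchTaskExecutor() {
    ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
    // Threads grow past the core size only after the queue is full.
    executor.setCorePoolSize(4);
    executor.setMaxPoolSize(8);
    // A bounded queue makes saturation visible instead of buffering without limit.
    executor.setQueueCapacity(100);
    // When pool and queue are both full, the submitting thread runs the task itself,
    // which slows the producer down: this is the backpressure.
    executor.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy());
    executor.initialize();
    return executor;
}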
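
To see why `CallerRunsPolicy` acts as backpressure, the sketch below reproduces the saturation case with a plain `java.util.concurrent.ThreadPoolExecutor`, which Spring's `ThreadPoolTaskExecutor` wraps. The pool is deliberately tiny (one worker, one queue slot) so the rejection path is easy to trigger; `CallerRunsDemo` is a hypothetical name and the sizes are illustrative only.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical demo: shows which thread runs a task once pool and queue are saturated. */
final class CallerRunsDemo {

    private CallerRunsDemo() {}

    /** Submits three tasks and returns "task:threadName" entries in completion order. */
    static List<String> run() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(1),
                new ThreadPoolExecutor.CallerRunsPolicy());
        CountDownLatch release = new CountDownLatch(1);
        List<String> ranOn = new CopyOnWriteArrayList<>();

        // Task 1 occupies the only worker until the latch is released.
        pool.execute(() -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            ranOn.add("t1:" + Thread.currentThread().getName());
        });
        // Task 2 fills the single queue slot.
        pool.execute(() -> ranOn.add("t2:" + Thread.currentThread().getName()));
        // Task 3 is rejected, so CallerRunsPolicy executes it on the submitting thread.
        pool.execute(() -> ranOn.add("t3:" + Thread.currentThread().getName()));

        release.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return ranOn;
    }
}
```

Task 3 completes first and on the submitting thread, before the worker is even released. While the caller is busy running it, the caller cannot submit more work, which is how the producer gets throttled without any explicit rate limiter.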

Example C: Performance Metric Query SQL

SELECT job_name,
       date_trunc('minute', started_at) AS minute_bucket,
       AVG(duration_ms) AS avg_duration_ms,
       PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY duration_ms) AS p95_duration_ms,
       SUM(processed_count) AS total_processed
FROM batch_execution_metrics
WHERE started_at >= NOW() - INTERVAL '1 hour'
GROUP BY job_name, date_trunc('minute', started_at)
ORDER BY minute_bucket DESC;

Example D: Keyset-based processing window query

SELECT id, payload
FROM event_queue
WHERE id > :last_id
ORDER BY id ASC
LIMIT :window_size;
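
The query in Example D is meant to be called in a loop that carries the last seen id forward. The sketch below shows that loop shape with an in-memory `NavigableMap` standing in for the `event_queue` table, so it runs without a database. `KeysetWindowLoop`, `fetchWindow`, and `drain` are hypothetical names, and it assumes ids are positive and strictly increasing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

/** Hypothetical sketch: the driver loop around a keyset (seek) window query. */
final class KeysetWindowLoop {

    private KeysetWindowLoop() {}

    record Event(long id, String payload) {}

    /** Stand-in for: WHERE id > :last_id ORDER BY id ASC LIMIT :window_size. */
    static List<Event> fetchWindow(NavigableMap<Long, String> table, long lastId, int windowSize) {
        List<Event> window = new ArrayList<>();
        // tailMap(lastId, false) yields keys strictly greater than lastId, in ascending order.
        for (Map.Entry<Long, String> row : table.tailMap(lastId, false).entrySet()) {
            if (window.size() == windowSize) {
                break;
            }
            window.add(new Event(row.getKey(), row.getValue()));
        }
        return window;
    }

    /** Reads the whole table window by window and returns the ids in processing order. */
    static List<Long> drain(NavigableMap<Long, String> table, int windowSize) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        List<Long> processedIds = new ArrayList<>();
        long lastId = 0L; // assumption: all ids are positive
        while (true) {
            List<Event> window = fetchWindow(table, lastId, windowSize);
            if (window.isEmpty()) {
                break; // nothing left beyond lastId
            }
            for (Event event : window) {
                processedIds.add(event.id()); // real processing would happen here
            }
            // Advance the cursor to the last id of this window, not by an offset.
            lastId = window.get(window.size() - 1).id();
        }
        return processedIds;
    }
}
```

Because the cursor is an id rather than an offset, gaps in the id sequence do not cause skipped or repeated rows, and each window costs about the same regardless of how far into the table the job has progressed.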

4) Real-world failure/operational scenarios

Situation: To shorten batch completion time, the chunk size was raised from 1,000 to 10,000 and the worker count from 4 to 12. TPS rose for the first 10 minutes, but after that Full GC ran repeatedly and DB write latency climbed, so the overall completion time actually got longer.

Cause:

The heap working set grew sharply, increasing GC frequency and pause time.
Oversized commit units lengthened lock holding time.
The rise in downstream (OpenSearch Bulk) 429 responses was ignored, and requests kept being pushed at the same rate.

Improvements:

Scale back to a chunk size of 2,000 and 6 workers.
When the 429 rate exceeds 2%, lower concurrency by one level.
Monitor P95 processing time and GC pause on the same dashboard.
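
The 429 rule in the improvements above can be sketched as a small controller that is fed one observation window at a time. This is a minimal illustration under assumptions: `ConcurrencyGovernor` is a hypothetical class, a "level" is taken to mean one worker, and the source does not say how concurrency is restored, so stepping back up is deliberately left out.

```java
/** Hypothetical sketch: lowers concurrency by one when the 429 rate in a window exceeds 2%. */
final class ConcurrencyGovernor {

    private static final int THRESHOLD_PERCENT = 2;

    private final int minConcurrency;
    private int current;

    ConcurrencyGovernor(int initialConcurrency, int minConcurrency) {
        if (minConcurrency < 1 || initialConcurrency < minConcurrency) {
            throw new IllegalArgumentException("require 1 <= minConcurrency <= initialConcurrency");
        }
        this.current = initialConcurrency;
        this.minConcurrency = minConcurrency;
    }

    /**
     * Feeds one observation window and returns the concurrency to use next.
     * The rate check uses integer math: throttled / total > 2% is the same as throttled * 100 > total * 2.
     */
    synchronized int onWindow(long totalRequests, long throttled429) {
        if (totalRequests < 0 || throttled429 < 0 || throttled429 > totalRequests) {
            throw new IllegalArgumentException("invalid window counts");
        }
        boolean overThreshold = totalRequests > 0
                && throttled429 * 100 > totalRequests * THRESHOLD_PERCENT;
        if (overThreshold && current > minConcurrency) {
            current--; // step down one level, never below the floor
        }
        return current;
    }

    synchronized int current() {
        return current;
    }
}
```

Starting from 6 workers, a window with exactly 2% of responses being 429 leaves concurrency at 6, while a window at 2.1% drops it to 5. A production version would also need a recovery rule and some smoothing across windows to avoid oscillation.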

5) Design Checklist

Have you calculated the required throughput from the target completion time and the total item count?
Did you compare P95 metrics before and after increasing the chunk size or concurrency?
Have you estimated the heap working set and reserved a free buffer?
Does the commit interval keep lock holding time within acceptable bounds?
Does backpressure react to the downstream error rate (429, timeout)?
Does the tuned batch stay within the SLAs for API traffic?

6) Summary

Performance optimization is not about “bigger, more” but about “reliable within limits.” The throughput model, memory model, and backpressure model must be designed together to achieve sustainable performance in production.

7) Next episode preview

The next part covers failure-response architecture. It rounds out the idea of a “batch that can recover even when it fails” through partial failure, poison data, DLQ, and retry/idempotency design.