
Part 10. Performance Optimization: Batch Size, Commit Interval, JVM Memory, Backpressure

We present practical tuning criteria that optimize batch throughput stably within system limits, rather than blindly increasing it.

Series: Complete Guide to Spring Boot Batch Strategy

12 parts in total. You are viewing Part 10.

Thumbnail: large-scale server infrastructure (Source: Pexels, server racks in a data center)

Versions used

  • Java 21
  • Spring Boot 3.3.x
  • Spring Batch 5.2.x
  • Quartz 2.3.x
  • PostgreSQL 15
  • OpenSearch 2.x

1) Problem statement

The most dangerous habit in batch performance tuning is the single prescription, "It's slow, so increase the chunk size." In reality, CPU, DB IOPS, network, and external API limits differ from system to system, so pushing harder on one point only makes another point collapse first.

The goal of performance optimization is not maximum TPS but a stable operating range:

  • Meet completion time within SLA
  • Predictable recovery time in case of failure
  • Can coexist with OLTP traffic

2) Summary of key concepts

Criteria for determining the batch size

  1. Measure the single-item processing time t (Reader + Processor + Writer).
  2. Calculate the target throughput R (total items / allowed time).
  3. Factor in the allowed concurrency C of the DB/external systems.
  4. Start with chunk = (R / C) × safety factor, then adjust against P95 metrics.
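As a worked example of the four steps above (all numbers are illustrative assumptions, not measurements from the series):

```java
public class ChunkEstimate {

    // chunk = (R / C) × safety factor, where R = totalCount / allowedSeconds.
    static long estimate(long totalCount, long allowedSeconds,
                         int allowedConcurrency, double safetyFactor) {
        long targetPerSecond = totalCount / allowedSeconds;   // required throughput R
        return Math.round((double) targetPerSecond / allowedConcurrency * safetyFactor);
    }

    public static void main(String[] args) {
        // 3.6M items, 1-hour SLA, downstream concurrency limit C = 5, safety factor 0.8
        long chunk = estimate(3_600_000L, 3_600L, 5, 0.8);
        System.out.println(chunk); // starting point; tune against P95 afterwards
    }
}
```

This only produces a starting point; step 4 says to keep adjusting it against the P95 metrics once the job is running.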

Commit interval tradeoffs

  • Increasing the commit interval: lower transaction overhead, but higher rollback cost on failure.
  • Decreasing the commit interval: more stability, but lower TPS.
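In Spring Batch 5's chunk-oriented step, the commit interval is the chunk size passed to the step builder: one transaction wraps each chunk. A minimal configuration sketch — the step name, the Event type, and the reader/writer beans are placeholder assumptions, not part of this series' actual job:

```java
@Bean
public Step eventSyncStep(JobRepository jobRepository,
                          PlatformTransactionManager txManager,
                          ItemReader<Event> reader,
                          ItemWriter<Event> writer) {
    // chunk(1_000, txManager): commit every 1,000 items.
    // A larger interval means fewer commits (less overhead),
    // but a failure rolls back up to 1,000 in-flight items.
    return new StepBuilder("eventSyncStep", jobRepository)
            .<Event, Event>chunk(1_000, txManager)
            .reader(reader)
            .writer(writer)
            .build();
}
```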

JVM memory calculation example

  • Average record object size: 6 KB
  • chunk: 1,000
  • Concurrent threads: 6
  • Estimated object memory: 6 KB × 1,000 × 6 ≈ 36 MB
  • In practice, expect 2–3× overhead on top (arrays, strings, serialization buffers, caches)

In other words, it is safe to reserve at least 100 MB as the batch working set and keep it within 30–40% of the heap.
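The arithmetic above can be checked mechanically. A small helper, using the illustrative numbers from this section (the 3× overhead factor is the upper end of the 2–3× range mentioned above):

```java
public class WorkingSetEstimator {

    /** Estimated batch working set: itemBytes × chunk × threads × overheadFactor. */
    static long estimateBytes(long itemBytes, int chunk, int threads, double overheadFactor) {
        return Math.round(itemBytes * chunk * threads * overheadFactor);
    }

    public static void main(String[] args) {
        // 6 KiB per record, chunk 1,000, 6 threads, 3x overhead → roughly 105 MiB,
        // matching the "secure at least 100 MB" guideline above.
        long bytes = estimateBytes(6 * 1024, 1_000, 6, 3.0);
        System.out.printf("working set ≈ %d MiB%n", bytes / (1024 * 1024));
    }
}
```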

Backpressure control loop

(Diagram: backpressure control loop)

Image: operations/performance monitoring (Source: Pexels, multi-monitor workspace)

3) Code example

Example A: Throughput-based dynamic chunk calculation

import java.time.Duration;

public class ChunkSizer {

    /**
     * Derives a starting chunk size from the total item count, the completion
     * deadline, and the maximum allowed concurrency. The result is a starting
     * point to be tuned against P95 metrics, not a final value.
     */
    public int calculate(long totalCount, Duration deadline, int maxConcurrency) {
        long seconds = Math.max(deadline.getSeconds(), 1);        // guard against zero deadline
        long targetPerSecond = Math.max(totalCount / seconds, 1); // required throughput R
        long perWorker = Math.max(targetPerSecond / Math.max(maxConcurrency, 1), 1);

        // Double the per-worker rate as headroom, then clamp to a safe range.
        return (int) Math.min(Math.max(perWorker * 2, 100), 5000);
    }
}
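Under illustrative assumptions (3.6M items, a one-hour deadline, 4 workers — none of these figures are from the series), Example A can be exercised like this; the calculation logic is repeated inline so the snippet runs standalone:

```java
import java.time.Duration;

public class ChunkSizerDemo {

    // Same logic as Example A, inlined so this snippet is self-contained.
    static int calculate(long totalCount, Duration deadline, int maxConcurrency) {
        long seconds = Math.max(deadline.getSeconds(), 1);
        long targetPerSecond = Math.max(totalCount / seconds, 1);
        long perWorker = Math.max(targetPerSecond / Math.max(maxConcurrency, 1), 1);
        return (int) Math.min(Math.max(perWorker * 2, 100), 5000);
    }

    public static void main(String[] args) {
        // 3.6M items / 3,600 s = 1,000/s; 1,000 / 4 workers = 250; ×2 headroom = 500
        System.out.println(calculate(3_600_000L, Duration.ofHours(1), 4)); // prints 500
    }
}
```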

Example B: TaskExecutor applying backpressure

@Bean
public ThreadPoolTaskExecutor batchTaskExecutor() {
    ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
    executor.setCorePoolSize(4);
    executor.setMaxPoolSize(8);
    executor.setQueueCapacity(100); // bounded queue: submissions beyond this trigger backpressure
    // CallerRunsPolicy makes the submitting thread execute the task itself when the
    // queue is full, naturally slowing the producer instead of dropping work.
    executor.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy());
    executor.initialize();
    return executor;
}

Example C: Performance Metric Query SQL

SELECT job_name,
       date_trunc('minute', started_at) AS minute_bucket,
       AVG(duration_ms) AS avg_duration_ms,
       PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY duration_ms) AS p95_duration_ms,
       SUM(processed_count) AS total_processed
FROM batch_execution_metrics
WHERE started_at >= NOW() - INTERVAL '1 hour'
GROUP BY job_name, date_trunc('minute', started_at)
ORDER BY minute_bucket DESC;

Example D: Keyset-based processing window query

SELECT id, payload
FROM event_queue
WHERE id > :last_id
ORDER BY id ASC
LIMIT :window_size;
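The keyset window above is typically consumed in a loop that carries last_id forward until a window comes back empty. A minimal sketch — the in-memory "table" stands in for the SQL query so the snippet runs standalone; names are assumptions:

```java
import java.util.ArrayList;
import java.util.List;

public class KeysetPager {

    // Equivalent of the SQL window: ids strictly greater than lastId,
    // ascending, at most windowSize items.
    static List<Long> fetchWindow(List<Long> table, long lastId, int windowSize) {
        List<Long> window = new ArrayList<>();
        for (long id : table) {
            if (id > lastId) {
                window.add(id);
                if (window.size() == windowSize) break;
            }
        }
        return window;
    }

    static int processAll(List<Long> table, int windowSize) {
        long lastId = 0;
        int processed = 0;
        List<Long> window;
        while (!(window = fetchWindow(table, lastId, windowSize)).isEmpty()) {
            processed += window.size();             // process the window here
            lastId = window.get(window.size() - 1); // advance the keyset cursor
        }
        return processed;
    }
}
```

Because the cursor is the last seen id rather than an OFFSET, each window costs the same regardless of how far into the table the job has progressed.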

4) Real-world failure/operational scenarios

Situation: To shorten batch completion time, the chunk size was raised from 1,000 to 10,000 and the workers from 4 to 12. TPS increased for the first 10 minutes, but then Full GC cycles repeated and DB write latency grew, so the overall completion time actually got longer.

Cause:

  • The heap working set grew rapidly, increasing GC frequency and pause times.
  • Oversized commit units lengthened lock hold times.
  • The rising 429 rate from the downstream (OpenSearch Bulk) was ignored, and data kept being pushed at the same speed.

Improvements:

  1. Reduce the chunk size to 2,000 and scale down to 6 workers.
  2. When the 429 rate exceeds 2%, lower concurrency by one level.
  3. Monitor P95 processing time and GC pauses on the same dashboard.
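The "lower concurrency by one level when the 429 rate exceeds 2%" rule can be expressed as a small controller. A sketch under the scenario's assumptions — the 0.5% recovery threshold and the bounds are illustrative additions, not from the incident:

```java
public class ConcurrencyController {

    private int concurrency;
    private final int min;
    private final int max;

    public ConcurrencyController(int initial, int min, int max) {
        this.concurrency = initial;
        this.min = min;
        this.max = max;
    }

    /** Called once per monitoring window with the downstream 429 ratio (0.0–1.0). */
    public int adjust(double rate429) {
        if (rate429 > 0.02) {
            concurrency = Math.max(concurrency - 1, min); // back off one level
        } else if (rate429 < 0.005) {
            concurrency = Math.min(concurrency + 1, max); // cautiously recover
        }
        return concurrency; // between the thresholds, hold steady
    }
}
```

Hysteresis between the back-off and recovery thresholds keeps the controller from oscillating when the error rate hovers near the limit.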

5) Design Checklist

  • Have you derived the required throughput from the target completion time and total item count?
  • Did you compare P95 metrics before and after increasing the chunk size/concurrency?
  • Have you estimated the heap working set and secured free headroom?
  • Does the commit interval avoid excessively long lock hold times?
  • Does backpressure react to downstream error rates (429, timeouts)?
  • Does the optimization stay within the API traffic SLAs?

6) Summary

Performance optimization is not about "bigger and more" but about "staying reliable within limits." The throughput model, memory model, and backpressure model must be designed together to achieve sustainable performance in operation.

7) Next episode preview

The next part covers failure-response architecture: Partial Failure, Poison Data, DLQ, and retry/idempotency design complete a batch that "can recover even when it fails."
