
Part 2. @Scheduled in action: The price of simplicity and the multi-instance pitfalls

This part organizes, by architecture, the trade-offs you inherit when running @Scheduled in production: redundant execution across instances, distributed locking, and failure recovery.

Series: The Complete Guide to Spring Boot Batch Strategies

A 12-part series. You are reading Part 2.

Thumbnail: schedule and time

Source: Pexels - Modern Workspace with Digital Clock

Versions used

  • Java 21
  • Spring Boot 3.3.x
  • Spring Batch 5.2.x
  • Quartz 2.3.x
  • PostgreSQL 15
  • OpenSearch 2.x

1) The problem

@Scheduled has the lowest implementation cost. It runs inside the Spring Boot application itself, and there is virtually no learning curve. So most teams start with @Scheduled. However, as soon as the number of service instances grows from 1 to 2, the same cron almost inevitably fires twice at the same time.

The important questions in operations are simple.

  • Is it safe to execute this task repeatedly?
  • Can the execution time be longer than the schedule interval?
  • After an instance fails and restarts, can its run overlap with work that is already in flight?

@Scheduled itself answers none of these questions. The answers come from the locking, idempotency, and execution-history architecture built around it.

2) Summary of key concepts

Advantages of @Scheduled

  • Code readability is high and adoption speed is fast.
  • Can use the same DI/transaction model as the application context.
  • Efficient for a small team operating one to five jobs.

Operating Limits

  • There is no leader concept in a cluster environment: every instance fires every schedule.
  • No misfire policy (e.g. compensating for a trigger that failed to fire on time) is provided by default.
  • Execution history, retries, and concurrency control must all be implemented by hand.
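Execution history in particular has to be built by hand. A minimal in-memory sketch of such a recorder (class, record, and method names are illustrative, not from any library) could look like:

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Wraps a task, capturing start/end time and outcome for every run.
// A real implementation would write to a table like batch_job_execution.
public class ExecutionHistoryRecorder {

    public record Execution(String jobName, Instant startedAt, Instant endedAt, String status) {}

    private final List<Execution> sink = new ArrayList<>();

    public void run(String jobName, Runnable task) {
        Instant start = Instant.now();
        String status = "SUCCESS";
        try {
            task.run();
        } catch (RuntimeException e) {
            status = "FAILED";
            throw e; // record the failure, then let the caller see it
        } finally {
            sink.add(new Execution(jobName, start, Instant.now(), status));
        }
    }

    public List<Execution> history() {
        return List.copyOf(sink);
    }
}
```

Even this much makes "did last night's run actually happen, and how long did it take?" answerable, which @Scheduled alone never is.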

Lock Strategy Comparison (Summary)

  • DB lock (FOR UPDATE / advisory lock)
    Advantages: no additional infrastructure, strong consistency. Disadvantages: increases DB load. Recommended when: operations are already RDBMS-centered.
  • Redis lock (SET NX PX)
    Advantages: fast lock acquisition, fits distributed environments. Disadvantages: TTL and network partitions must be handled. Recommended when: high-frequency scheduling across many app instances.
  • No lock + idempotency
    Advantages: simple to implement, highly fault tolerant. Disadvantages: duplicate executions still occur. Recommended when: aggregation/synchronization without side effects.

Transaction isolation usually starts at READ COMMITTED; if the job re-reads rows under the same condition, combine it with REPEATABLE READ or an explicit lock hint.

Execution flow diagram


Image: operations console

Source: Pexels - A Man Looking at Multiple Monitors

3) Code example

Example A: @Scheduled + distributed lock wrapper

import java.time.Duration;
import java.time.LocalDate;
import java.util.UUID;

import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

import lombok.RequiredArgsConstructor;

@Component
@RequiredArgsConstructor
public class SettlementScheduler {

    private final DistributedLockService lockService;
    private final SettlementService settlementService;

    // Fires every 5 minutes on every instance; only the lock winner runs.
    @Scheduled(cron = "0 */5 * * * *")
    public void run() {
        String lockKey = "batch:settlement:5m";
        String lockToken = UUID.randomUUID().toString();

        // TTL shorter than the schedule interval: if this instance dies,
        // the lock expires before the next tick.
        if (!lockService.tryLock(lockKey, lockToken, Duration.ofMinutes(4))) {
            return; // another instance holds the lock
        }

        try {
            settlementService.processPending(LocalDate.now());
        } finally {
            // Release with the token so we never delete another holder's lock.
            lockService.unlock(lockKey, lockToken);
        }
    }
}

Example B: Redis lock implementation core

import java.time.Duration;
import java.util.List;

import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.data.redis.core.script.DefaultRedisScript;
import org.springframework.stereotype.Service;

import lombok.RequiredArgsConstructor;

@Service
@RequiredArgsConstructor
public class RedisDistributedLockService implements DistributedLockService {

    // Compare-and-delete must be atomic. A separate GET followed by DELETE
    // can delete a lock that another instance acquired in between, so the
    // check and delete are combined in one Lua script.
    private static final DefaultRedisScript<Long> UNLOCK_SCRIPT = new DefaultRedisScript<>(
            "if redis.call('get', KEYS[1]) == ARGV[1] then return redis.call('del', KEYS[1]) else return 0 end",
            Long.class);

    private final StringRedisTemplate redisTemplate;

    @Override
    public boolean tryLock(String key, String token, Duration ttl) {
        // SET key token NX PX ttl: succeeds only if the key is absent.
        Boolean result = redisTemplate.opsForValue().setIfAbsent(key, token, ttl);
        return Boolean.TRUE.equals(result);
    }

    @Override
    public void unlock(String key, String token) {
        redisTemplate.execute(UNLOCK_SCRIPT, List.of(key), token);
    }
}
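For unit tests and single-instance local runs, the same tryLock/unlock contract can be satisfied in memory, with no Redis at all (a sketch; the class name is illustrative):

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// In-memory stand-in for the Redis lock: same token + TTL semantics,
// backed by a ConcurrentHashMap instead of a Redis key.
public class InMemoryLockService {

    private record Entry(String token, Instant expiresAt) {}

    private final Map<String, Entry> locks = new ConcurrentHashMap<>();

    public boolean tryLock(String key, String token, Duration ttl) {
        Instant now = Instant.now();
        // Atomically claim the key if it is absent or its TTL has expired.
        Entry winner = locks.compute(key, (k, cur) ->
                (cur == null || cur.expiresAt().isBefore(now))
                        ? new Entry(token, now.plus(ttl))
                        : cur);
        return winner.token().equals(token);
    }

    public void unlock(String key, String token) {
        // Compare-and-delete: only the token holder may release.
        locks.computeIfPresent(key, (k, cur) ->
                cur.token().equals(token) ? null : cur);
    }
}
```

Because compute/computeIfPresent are atomic per key, this preserves the same compare-and-delete guarantee the Lua script gives you on Redis, which keeps test behavior honest.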

Example C: SQL using DB advisory lock (PostgreSQL)

-- Use an advisory lock keyed on the hashed job name.
-- Note: pg_try_advisory_lock is session-scoped, so the release below
-- must run on the same connection that acquired the lock.
SELECT pg_try_advisory_lock(hashtext('batch:settlement:5m')) AS acquired;

-- Release after processing completes.
SELECT pg_advisory_unlock(hashtext('batch:settlement:5m'));

Example D: Keyset-based failed execution history query

SELECT id, job_name, started_at, ended_at, error_code
FROM batch_job_execution
WHERE job_name = 'settlement'
  AND status = 'FAILED'
  AND id > :last_id
ORDER BY id ASC
LIMIT 100;
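The keyset pattern above avoids OFFSET by carrying the last seen id as a cursor. The paging logic can be sketched in memory (class and method names are illustrative):

```java
import java.util.List;

// In-memory sketch of the keyset idea behind Example D: instead of
// OFFSET, remember the last id seen and fetch the next page with
// "id > lastId ORDER BY id LIMIT n".
public class KeysetPager {

    public static List<Long> page(List<Long> ids, long lastId, int limit) {
        return ids.stream()
                .filter(id -> id > lastId) // keyset predicate: strictly after the cursor
                .sorted()                  // ORDER BY id ASC
                .limit(limit)              // LIMIT n
                .toList();
    }
}
```

The caller repeats with lastId set to the last element of the previous page until an empty page comes back; unlike OFFSET, the cost per page stays flat as the table grows.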

4) Real-world failure/operational scenarios

Situation: a settlement batch on a 5-minute cycle normally finishes in 4 minutes 50 seconds. A GC pause of about 20 seconds interrupted the lock-renewal heartbeat; when the Redis TTL expired, another instance re-executed the same task, producing duplicate settlements.

Cause:

  • The lock TTL was sized from the average execution time only.
  • Long stop-the-world pauses (G1 mixed GC) were not accounted for in the design.
  • There was no business-level idempotency constraint (e.g. UNIQUE(order_id, settlement_date)).

Response:

  1. Recalculate the TTL as P99 execution time plus GC slack (e.g. 4m50s -> 8m).
  2. On heartbeat failure, stop the job immediately and persist a re-entry prevention flag.
  3. Add an idempotency-enforcing unique index to the domain table.
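The TTL rule in step 1 can be written as a small helper. The 1.5 safety factor and round-up-to-a-minute policy are illustrative assumptions, chosen so that a 4m50s P99 with 30s of GC slack maps to the 8m in the example:

```java
import java.time.Duration;

// TTL sizing sketch: pad the P99 runtime by a safety factor for pauses
// such as GC, add explicit GC slack, then round up to a whole minute.
public class LockTtlCalculator {

    private static final double SAFETY_FACTOR = 1.5; // illustrative assumption

    public static Duration lockTtl(Duration p99, Duration gcSlack) {
        long paddedSeconds = (long) Math.ceil(p99.getSeconds() * SAFETY_FACTOR)
                + gcSlack.getSeconds();
        long roundedUpMinutes = (paddedSeconds + 59) / 60; // round up to the next minute
        return Duration.ofMinutes(roundedUpMinutes);
    }
}
```

The exact factor matters less than the principle: the TTL must survive the worst observed run plus the worst observed pause, never just the average.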

5) Design Checklist

  • Have you chosen an explicit strategy (locking or idempotency) against duplicate execution across multiple instances?
  • Did you size the lock TTL from P99 execution time plus GC margin, not the average?
  • Is it safe to re-run the same time window after a failure?
  • Is schedule execution history (success/failure/duration) recorded?
  • Have you deliberately chosen the DB isolation level and lock hints (SKIP LOCKED, etc.)?
  • Have you measured how JVM heap usage and GC pauses affect the schedule interval?

6) Summary

@Scheduled is very powerful for small, simple deployments, but cluster-level stability requires deliberate design on top of it. Locking alone is not enough: idempotency, execution history, and failure recovery must be combined to reach an operable setup.

7) Next episode preview

In the next part we cover Quartz: cluster mode, the JobStore options (RAM vs JDBC), misfire policies, and practical criteria for choosing Quartz when operating hundreds to thousands of schedules.
