Part 7. DB Bulk Search Strategy: OFFSET/LIMIT Limits and Keyset, ID Range, Covering Index
We present paging/index/isolation level design standards to simultaneously secure DB query performance and consistency in mass deployment.
Series: Spring Boot 배치 전략 완전 정복
총 12편 구성. 현재 7편을 보고 있습니다.
- 01Part 1. Nature and classification of batches: schedule, event, manual, bulk, near-real-time
- 02Part 2. @Scheduled in action: The price of simplicity and the multi-instance pitfalls
- 03Part 3. Quartz cluster architecture: JobStore, Misfire, large-scale schedule management
- 04Part 4. Spring Batch core: Chunk, transaction boundary, restartable job design
- 05Part 5. Spring Batch Extension: Tradeoff between Partition and Multi-threaded Step
- 06Part 6. Manual Deployment Strategy: REST Triggers, Admin UI, Parameter Reprocessing, Rollback
- 07Part 7. DB Bulk Search Strategy: OFFSET/LIMIT Limits and Keyset, ID Range, Covering IndexCURRENT
- 08Part 8. OpenSearch/Elasticsearch Deployment Strategy: Scroll, Search After, PIT, Bulk, Rollover
- 09Part 9. Distributed environment deployment: Leader Election, Kubernetes CronJob, and lock strategy comparison
- 10Part 10. Performance Optimization: Batch Size, Commit Interval, JVM Memory, Backpressure
- 11Part 11. Failure response architecture: Partial Failure, Poison Data, DLQ, Retry, Idempotence
- 12Part 12. Integrated reference architecture and final selection guide

Source: Pexels - Computer server in data center room
Based on version
- Java 21
- Spring Boot 3.3.x
- Spring Batch 5.2.x
- Quartz 2.3.x
- PostgreSQL 15
- OpenSearch 2.x
1) Raise a problem
More than half of batch performance issues arise from query methods, not business logic. In particular, the pattern of repeating OFFSET/LIMIT with a fixed rule such as “1,000 per page” slows down non-linearly, not linearly, as the data grows.
In operations, the problem does not end with poor performance.
- When new data is inserted between pages, duplication/omission occurs.
- Long scans pollute the buffer cache and slow down the OLTP API.
- If you increase the isolation level, the lock wait increases rapidly.
2) Summary of key concepts
OFFSET/LIMIT vs Keyset comparison table
| Item | OFFSET/LIMIT | Keyset Pagination |
|---|---|---|
| Time Complexity | It gets worse as the page number gets bigger | relatively constant |
| Consistency | Duplication/omission is possible when inserting/deletion in the middle | Stable based on standard key |
| Index Utilization | Inefficiency due to bulk skip | sort key index efficient |
| Ease of Restart | Page number needs to be saved | Simple with last key storage |
| Recommended use | Admin UI Small Data | Batch/Streaming Bulk Processing |
ID Range Batch
If the ID increases monotonically, it is easy to divide it based on range and perform parallel processing.
- Advantages: High partition independence and clear reprocessing range designation.
- Disadvantage: Partition imbalance when ID distribution is skewed.
Covering Index
If the batch is read-only, table access can be reduced with an “index containing only necessary columns.” PostgreSQL uses the INCLUDE column to reduce heap access.
Read isolation levels and snapshots
READ COMMITTED: Typical deployment default.REPEATABLE READ: Ensuring consistency of repeated scans under the same conditions.- Snapshot-based reads (fixed transaction start time) are good for consistency over long scans, but can cause vacuum delays and version increases.
Lookup Pipeline Diagram

Source: Pexels - Program code on screen
3) Code example
Example A: OFFSET/LIMIT (not recommended pattern)
SELECT id, user_id, amount, created_at
FROM payment_events
WHERE created_at >= :from
ORDER BY created_at, id
OFFSET :offset
LIMIT :limit;
Example B: Keyset Pagination (recommended)
SELECT id, user_id, amount, created_at
FROM payment_events
WHERE created_at >= :from
AND (created_at, id) > (:last_created_at, :last_id)
ORDER BY created_at, id
LIMIT :limit;
Example C: ID Range Batch
SELECT id, user_id, amount
FROM payment_events
WHERE id BETWEEN :start_id AND :end_id
ORDER BY id ASC
LIMIT :limit;
Example D: Covering Index Design
CREATE INDEX idx_payment_events_batch
ON payment_events (created_at, id)
INCLUDE (user_id, amount);
Example E: Minimizing lock contention (SKIP LOCKED)
WITH cte AS (
SELECT id
FROM payout_queue
WHERE status = 'READY'
ORDER BY id
FOR UPDATE SKIP LOCKED
LIMIT 500
)
UPDATE payout_queue q
SET status = 'PROCESSING'
FROM cte
WHERE q.id = cte.id
RETURNING q.id, q.user_id, q.amount;
4) Real-world failure/operational scenarios
Situation: The payment settlement batch scanned 30 million items based on OFFSET. After two hours, the query response time increased from 50ms to 4s towards the latter part of the page, and some records were settled twice due to newly introduced events intervening.
Cause:
- We underestimated the cost of bulk skipping.
- Because the sort key was single
created_at, the order was unstable in simultaneous data. - The batch restart pointer was stored as a page number, making it vulnerable to data changes.
Improvements:
- Switch to
(created_at, id)composite keyset. - Change the index to
(created_at, id) INCLUDE (user_id, amount). - Save the restart pointer as
(last_created_at, last_id).
5) Design Checklist
- Have you removed OFFSET-based bulk scanning?
- Keyset Does the sort key guarantee uniqueness and monotonicity?
- Have you designed a covering index suitable for the batch query column?
- Did you apply a lock contention mitigation strategy such as [ ]
SKIP LOCKED? - Does the choice of isolation level satisfy both consistency and performance requirements?
- Is the restart pointer stored as a key value rather than a page number?
6) Summary
The DB mass query strategy determines the deployment success rate. OFFSET/LIMIT is convenient in the beginning, but as the scale increases, costs and consistency risks rapidly increase. The practical default is to store Keyset + appropriate index + restart pointer.
7) Next episode preview
The next part covers OpenSearch/Elasticsearch deployment strategy. Provides Scroll, Search After, PIT, Bulk API, Refresh policy, and rollover index operation standards and comparison tables.
Reference link
- Spring Batch Reference
- Quartz Scheduler Documentation
- PostgreSQL Transaction Isolation
- 블로그: Idempotency Key API 설계