
Part 7. DB Bulk Search Strategy: OFFSET/LIMIT Limits and Keyset, ID Range, Covering Index

We present paging, index, and isolation-level design standards that secure both query performance and consistency for bulk batch processing.

Series: Spring Boot Batch Strategy Complete Guide

A 12-part series. You are viewing Part 7.

[Thumbnail: data center infrastructure]

Source: Pexels - Computer server in data center room

Based on versions

  • Java 21
  • Spring Boot 3.3.x
  • Spring Batch 5.2.x
  • Quartz 2.3.x
  • PostgreSQL 15
  • OpenSearch 2.x

1) Problem statement

More than half of batch performance issues arise from how queries are written, not from business logic. In particular, the pattern of repeating OFFSET/LIMIT with a fixed rule such as "1,000 rows per page" slows down non-linearly, not linearly, as the data grows.

In operations, the problem does not end with poor performance.

  • When new rows are inserted between page fetches, duplicates or omissions occur.
  • Long scans pollute the buffer cache and slow down the OLTP API.
  • If you increase the isolation level, the lock wait increases rapidly.

2) Summary of key concepts

OFFSET/LIMIT vs Keyset comparison table

| Item | OFFSET/LIMIT | Keyset Pagination |
|---|---|---|
| Time complexity | Degrades as the page number grows | Roughly constant |
| Consistency | Duplicates/omissions possible when rows are inserted/deleted mid-scan | Stable relative to the cursor key |
| Index utilization | Inefficient due to bulk skips | Efficient use of the sort-key index |
| Ease of restart | Must persist a page number | Simple: persist the last key |
| Recommended use | Admin UI, small data sets | Batch/streaming bulk processing |

ID Range Batch

If IDs increase monotonically, the table is easy to split by range and process in parallel.

  • Advantages: High partition independence and clear reprocessing range designation.
  • Disadvantage: Partition imbalance when ID distribution is skewed.
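As a minimal sketch of the splitting step (the `IdRangePartitioner` class and its method are illustrative, not part of the series code), a near-equal range split for N workers can look like this:

```java
import java.util.ArrayList;
import java.util.List;

// Splits a monotonically increasing ID range into near-equal partitions
// so each worker can process an independent BETWEEN :start AND :end slice.
public class IdRangePartitioner {

    public record IdRange(long startId, long endId) {}

    public static List<IdRange> partition(long minId, long maxId, int workers) {
        List<IdRange> ranges = new ArrayList<>();
        long total = maxId - minId + 1;
        long base = total / workers;
        long remainder = total % workers;   // first `remainder` workers get one extra ID
        long start = minId;
        for (int i = 0; i < workers; i++) {
            long size = base + (i < remainder ? 1 : 0);
            if (size == 0) break;           // more workers than IDs
            ranges.add(new IdRange(start, start + size - 1));
            start += size;
        }
        return ranges;
    }
}
```

Note that this equal-width split is exactly where the skew disadvantage shows up: if deleted IDs cluster in one range, that worker finishes early while another lags.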

Covering Index

If the batch is read-only, table access can be reduced with an index that contains all the columns the query needs. In PostgreSQL, the INCLUDE clause adds non-key columns to the index so heap access can be skipped via index-only scans.

Read isolation levels and snapshots

  • READ COMMITTED: the typical database default.
  • REPEATABLE READ: Ensuring consistency of repeated scans under the same conditions.
  • Snapshot-based reads (fixed at transaction start) give strong consistency over long scans, but can delay vacuum and inflate row-version storage.

Lookup Pipeline Diagram


[Image: code and analysis]

Source: Pexels - Program code on screen

3) Code examples

Example A: OFFSET/LIMIT paging

SELECT id, user_id, amount, created_at
FROM payment_events
WHERE created_at >= :from
ORDER BY created_at, id
OFFSET :offset
LIMIT :limit;

Example B: Keyset pagination

SELECT id, user_id, amount, created_at
FROM payment_events
WHERE created_at >= :from
  AND (created_at, id) > (:last_created_at, :last_id)
ORDER BY created_at, id
LIMIT :limit;
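The keyset query needs a driver loop that carries the cursor forward between pages. Here is a minimal in-memory sketch (the `KeysetLoop` class is illustrative; a real job would run the SQL above via JDBC or a Spring Batch reader) whose filter mirrors the `(created_at, id) > (:last_created_at, :last_id)` tuple comparison:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// In-memory illustration of the keyset driver loop: the "query" returns
// rows strictly after the (createdAt, id) cursor, in cursor order.
public class KeysetLoop {

    public record Row(long createdAt, long id) {}

    static List<Row> fetchPage(List<Row> table, long lastCreatedAt, long lastId, int limit) {
        return table.stream()
                .filter(r -> r.createdAt() > lastCreatedAt
                        || (r.createdAt() == lastCreatedAt && r.id() > lastId))
                .sorted(Comparator.comparingLong(Row::createdAt).thenComparingLong(Row::id))
                .limit(limit)
                .collect(Collectors.toList());
    }

    // Drains the table page by page; the restart pointer is just the last key.
    public static List<Row> fetchAll(List<Row> table, int pageSize) {
        List<Row> all = new ArrayList<>();
        long lastCreatedAt = Long.MIN_VALUE;
        long lastId = Long.MIN_VALUE;
        while (true) {
            List<Row> page = fetchPage(table, lastCreatedAt, lastId, pageSize);
            if (page.isEmpty()) break;
            all.addAll(page);
            Row last = page.get(page.size() - 1);
            lastCreatedAt = last.createdAt();
            lastId = last.id();
        }
        return all;
    }
}
```

Because each page starts strictly after the previous page's last key, a row inserted behind the cursor is never re-read, which is where the no-duplicates property comes from.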

Example C: ID Range Batch

SELECT id, user_id, amount
FROM payment_events
WHERE id BETWEEN :start_id AND :end_id
ORDER BY id ASC
LIMIT :limit;

Example D: Covering Index Design

CREATE INDEX idx_payment_events_batch
ON payment_events (created_at, id)
INCLUDE (user_id, amount);

Example E: Minimizing lock contention (SKIP LOCKED)

WITH cte AS (
  SELECT id
  FROM payout_queue
  WHERE status = 'READY'
  ORDER BY id
  FOR UPDATE SKIP LOCKED
  LIMIT 500
)
UPDATE payout_queue q
SET status = 'PROCESSING'
FROM cte
WHERE q.id = cte.id
RETURNING q.id, q.user_id, q.amount;
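To see why SKIP LOCKED helps with contention, here is an in-memory analogue (the `SkipLockedQueue` class is hypothetical; the real behavior is a PostgreSQL row-lock feature, not application code): each worker walks the queue in order and simply skips rows another worker has already claimed, instead of blocking on them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// In-memory analogue of FOR UPDATE SKIP LOCKED: claimed rows are
// skipped atomically rather than waited on, so two workers calling
// claim() concurrently always receive disjoint batches.
public class SkipLockedQueue {

    private final List<Long> readyIds;
    private final Set<Long> claimed = ConcurrentHashMap.newKeySet();

    public SkipLockedQueue(List<Long> readyIds) {
        this.readyIds = readyIds;
    }

    // Claims up to `limit` unclaimed rows in queue order.
    public List<Long> claim(int limit) {
        List<Long> batch = new ArrayList<>();
        for (Long id : readyIds) {
            if (batch.size() >= limit) break;
            if (claimed.add(id)) {   // atomic: skip rows already taken
                batch.add(id);
            }
        }
        return batch;
    }
}
```

In the SQL version above, the claim step is the `FOR UPDATE SKIP LOCKED ... LIMIT 500` subquery and the status flip to `PROCESSING` plays the role of the claimed set.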

4) Real-world failure/operational scenarios

Situation: The payment settlement batch scanned 30 million rows using OFFSET paging. Two hours in, query response time on the later pages had risen from 50 ms to 4 s, and some records were settled twice because newly inserted events shifted the page boundaries.

Cause:

  • The cost of bulk skipping was underestimated.
  • Because the sort key was created_at alone, ordering was unstable for rows with identical timestamps.
  • The restart pointer was stored as a page number, making it vulnerable to data changes.

Improvements:

  1. Switch to (created_at, id) composite keyset.
  2. Change the index to (created_at, id) INCLUDE (user_id, amount).
  3. Save the restart pointer as (last_created_at, last_id).
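Improvement 3 can be sketched as a small value type (the `RestartPointer` record and its string format are illustrative; a Spring Batch job would typically persist such a value in the ExecutionContext or a checkpoint table):

```java
import java.time.Instant;

// Illustrative restart pointer: a composite key instead of a page number.
// Re-running the job resumes from (lastCreatedAt, lastId), which stays
// valid even if rows were inserted or deleted in the meantime.
public record RestartPointer(Instant lastCreatedAt, long lastId) {

    // Compact string form, e.g. "2024-01-01T00:00:00Z|42"
    public String serialize() {
        return lastCreatedAt + "|" + lastId;
    }

    public static RestartPointer deserialize(String value) {
        int sep = value.lastIndexOf('|');
        return new RestartPointer(
                Instant.parse(value.substring(0, sep)),
                Long.parseLong(value.substring(sep + 1)));
    }
}
```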

5) Design Checklist

  • Have you removed OFFSET-based bulk scans?
  • Does the keyset sort key guarantee uniqueness and monotonicity?
  • Have you designed a covering index that matches the batch query's columns?
  • Did you apply a lock-contention mitigation strategy such as SKIP LOCKED?
  • Does the chosen isolation level satisfy both consistency and performance requirements?
  • Is the restart pointer stored as a key value rather than a page number?

6) Summary

The bulk DB query strategy determines whether a batch succeeds. OFFSET/LIMIT is convenient at first, but as scale grows, both cost and consistency risk rise sharply. The practical default is keyset pagination + a suitable index + a key-based restart pointer.

7) Next episode preview

The next part covers bulk processing strategies for OpenSearch/Elasticsearch: operational standards and comparison tables for Scroll, Search After, PIT, the Bulk API, refresh policy, and rollover indexes.
