
Part 8. OpenSearch/Elasticsearch Batch Strategy: Scroll, Search After, PIT, Bulk, Rollover

We summarize OpenSearch batch design criteria and operational trade-offs, covering both bulk reads and mass indexing from a practical perspective.

Series: Spring Boot Batch Strategy, Complete Guide

A 12-part series. You are currently viewing Part 8.


Versions used

  • Java 21
  • Spring Boot 3.3.x
  • Spring Batch 5.2.x
  • Quartz 2.3.x
  • PostgreSQL 15
  • OpenSearch 2.x

1) Problem statement

Search-side batches fail in different ways than DB batches. A database is transaction-based, but in OpenSearch, segment, refresh, and shard state determine performance and consistency. The following problems are especially common during a full reindex:

  • Cluster memory pressure from leaked scroll contexts
  • Missing or duplicated documents caused by a poorly designed Search After sort key
  • Chains of 429 (Too Many Requests) responses caused by oversized bulk requests
  • Indexing throughput collapse caused by a misused refresh policy

2) Summary of key concepts

Comparison of query methods

| Method | Advantages | Disadvantages | Recommended situation |
| --- | --- | --- | --- |
| Scroll API | Familiar model for bulk sequential reads | High context-maintenance cost; context held for a long time | Short-lived batch extraction |
| Search After | Low state-maintenance cost; scales well | Requires careful sort key design | Continuous bulk processing |
| PIT + Search After | Consistent snapshot plus scalability | Requires PIT lifetime management | Recommended default for batch operations |

Scroll vs. Search After comparison

| Item | Scroll | Search After |
| --- | --- | --- |
| Server-side state | High (scroll context) | Low (client-held cursor) |
| Long-running batch stability | Context-expiration risk | Relatively stable |
| Sort requirements | Relatively simple | Stable, unique sort key required |
| Resource impact | Occupies memory/file handles | Relatively lightweight |
| Modern operational stance | Limited use | Default choice (especially combined with PIT) |

Indexing strategy essentials

  • Size bulk requests by payload bytes (e.g. 5~15MB), not by document count.
  • Increase or temporarily disable refresh_interval during bulk indexing.
  • Control segment/shard size with index rollover (max_age, max_docs, max_size).
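The byte-based sizing rule above can be sketched in plain Java (names are illustrative, not a client API): pre-serialized NDJSON action/source line pairs are grouped into bulk payloads capped by UTF-8 byte size instead of document count.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Groups pre-serialized NDJSON lines (action line + source line pairs)
// into bulk payloads capped by byte size rather than document count.
public class BulkChunker {

    public static List<List<String>> chunkByBytes(List<String> ndjsonLines, int maxBytes) {
        List<List<String>> chunks = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int currentBytes = 0;
        // Action and source lines travel in pairs, so step by 2.
        for (int i = 0; i + 1 < ndjsonLines.size(); i += 2) {
            int pairBytes = ndjsonLines.get(i).getBytes(StandardCharsets.UTF_8).length
                    + ndjsonLines.get(i + 1).getBytes(StandardCharsets.UTF_8).length
                    + 2; // the two trailing newlines
            // Flush the current chunk when adding this pair would exceed the cap
            // (an oversized single pair still gets its own chunk).
            if (!current.isEmpty() && currentBytes + pairBytes > maxBytes) {
                chunks.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(ndjsonLines.get(i));
            current.add(ndjsonLines.get(i + 1));
            currentBytes += pairBytes;
        }
        if (!current.isEmpty()) {
            chunks.add(current);
        }
        return chunks;
    }
}
```

With a cap of roughly 10MB, every request stays inside the 5~15MB band regardless of how individual document sizes vary.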

Pipeline diagram

Source DB (incremental query) → chunking → bulk index → controlled refresh → index rollover


3) Code examples

Example A: PIT + Search After query DSL

First open a PIT, then page with search_after. (The _pit endpoint below is Elasticsearch syntax; OpenSearch 2.4+ exposes the equivalent as POST /products/_search/point_in_time?keep_alive=2m.)

POST /products/_pit?keep_alive=2m
POST /_search
{
  "size": 1000,
  "pit": {
    "id": "${pit_id}",
    "keep_alive": "2m"
  },
  "sort": [
    { "updated_at": "asc" },
    { "_shard_doc": "asc" }
  ],
  "search_after": ["2026-03-03T00:00:00Z", 120341],
  "query": {
    "range": {
      "updated_at": {
        "gte": "2026-03-01T00:00:00Z"
      }
    }
  }
}
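Why the composite sort key above matters can be shown with a small in-memory simulation of search_after paging (illustrative Java, not a client API): the cursor is the full (updated_at, id) key of the previous page's last hit, so documents sharing a timestamp are neither skipped nor repeated.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// In-memory model of search_after paging over a composite sort key.
// Each page resumes strictly after the full sort key of the previous
// page's last hit; with a lone updated_at cursor, documents tied on
// the timestamp would be skipped or re-read across pages.
public class SearchAfterSim {

    record Doc(String updatedAt, long id) {}

    static final Comparator<Doc> SORT =
            Comparator.comparing(Doc::updatedAt).thenComparingLong(Doc::id);

    // Returns up to `size` docs strictly after the cursor `after`
    // (a null cursor means start from the beginning).
    public static List<Doc> page(List<Doc> index, Doc after, int size) {
        List<Doc> sorted = new ArrayList<>(index);
        sorted.sort(SORT);
        List<Doc> result = new ArrayList<>();
        for (Doc d : sorted) {
            if (after != null && SORT.compare(d, after) <= 0) continue;
            result.add(d);
            if (result.size() == size) break;
        }
        return result;
    }
}
```

Feeding the last hit of each page back in as the cursor visits every document exactly once, even when many documents share the same timestamp.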

Example B: Bulk API NDJSON

POST /products_v3/_bulk
{ "index": { "_id": "1001" } }
{ "product_id": 1001, "name": "Keyboard", "price": 49000, "updated_at": "2026-03-03T01:02:03Z" }
{ "index": { "_id": "1002" } }
{ "product_id": 1002, "name": "Mouse", "price": 29000, "updated_at": "2026-03-03T01:02:04Z" }
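One pitfall with bodies like the one above: the Bulk API expects newline-delimited JSON terminated by a final newline, and omitting it is a common source of request errors. A minimal builder sketch (plain Java; sources are passed as pre-rendered JSON strings for brevity):

```java
import java.util.List;
import java.util.Map;

// Builds a _bulk request body from (id, source JSON) pairs.
// Every line, including the last, is terminated with '\n' as the
// NDJSON format requires.
public class BulkBody {

    public static String build(List<Map.Entry<String, String>> docs) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> doc : docs) {
            sb.append("{\"index\":{\"_id\":\"").append(doc.getKey()).append("\"}}\n");
            sb.append(doc.getValue()).append('\n');
        }
        return sb.toString();
    }
}
```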

Example C: Source DB incremental query SQL

-- Composite keyset (updated_at, id) so rows that share the same
-- updated_at are not skipped between batches
SELECT id, name, price, updated_at
FROM products
WHERE (updated_at, id) > (:last_updated_at, :last_id)
ORDER BY updated_at, id
LIMIT 1000;

Example D: Rollover policy

(The _ilm endpoint below is Elasticsearch syntax; OpenSearch manages rollover through ISM policies under _plugins/_ism/policies with an equivalent rollover action.)

PUT _ilm/policy/products-rollover
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "40gb",
            "max_age": "7d",
            "max_docs": 50000000
          }
        }
      }
    }
  }
}

4) Real-world failure/operational scenarios

Situation: During nightly reindexing, the bulk size was set to 50MB and 8 workers indexed concurrently with refresh=true left on. After 10 minutes, 429 responses exploded, indexing lag accumulated, and batch completion time grew sixfold.

Causes:

  • Bulk payloads exceeded the shard write buffer.
  • Segment merge costs spiked because of the forced refresh.
  • Documents with identical timestamps were skipped because the Search After sort key was the single updated_at column.

Improvements:

  1. Cap the bulk payload at around 10MB and reduce concurrency from 8 to 3.
  2. Set refresh_interval=30s for the duration of the batch, then trigger a single manual refresh after completion.
  3. Extend the sort key to (updated_at, _id) or (updated_at, _shard_doc).
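The 429 handling these improvements imply can be sketched as capped exponential backoff (constants are illustrative; a production client would also add jitter and shrink concurrency, as in step 1):

```java
// Retry delay schedule for 429 / rejected_execution responses:
// each attempt doubles the wait, capped at a ceiling.
public class BulkBackoff {

    static final long BASE_MS = 500;    // delay before the first retry
    static final long MAX_MS = 30_000;  // ceiling on any retry delay

    // Delay before retry attempt n (0-based): min(BASE * 2^n, MAX).
    public static long delayMillis(int attempt) {
        long delay = BASE_MS << Math.min(attempt, 20); // clamp shift to avoid overflow
        return Math.min(delay, MAX_MS);
    }
}
```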

5) Design Checklist

  • Do you treat PIT + Search After, rather than Scroll, as the default for mass reads?
  • Does the Search After sort key guarantee uniqueness and stability?
  • Is bulk size controlled in bytes rather than document count?
  • Is there a backoff and concurrency-reduction strategy for 429s and rejections?
  • Is the refresh policy adjusted for bulk-indexing mode?
  • Are index rollover and shard size targets managed as operational metrics?

6) Summary

OpenSearch batch work is not a simple API call; it is a matter of cluster resource control. The practical defaults are PIT + Search After for reads, and small bulk payloads + controlled refresh + a rollover policy for indexing.

7) Next episode preview

The next part covers batch strategies for distributed environments: preventing duplicate execution across multiple instances, leader election, Kubernetes CronJob vs. in-app scheduling, and a comparison of DB/Redis/ZooKeeper locks from an architectural perspective.
