Part 8. OpenSearch/Elasticsearch Batch Strategy: Scroll, Search After, PIT, Bulk, Rollover

Versions used

Java 21
Spring Boot 3.3.x
Spring Batch 5.2.x
Quartz 2.3.x
PostgreSQL 15
OpenSearch 2.x

1) Problem statement

Search batch jobs fail in different ways than DB batch jobs. A DB is built around transactions, whereas in OpenSearch the state of segments, refreshes, and shards determines performance and consistency. A full reindex in particular tends to run into the following problems:

A Scroll context leak puts memory pressure on the cluster
A poorly designed Search After sort key causes missing or duplicated documents
An oversized bulk request setting triggers a cascade of 429 (Too Many Requests) responses
A misused refresh policy makes indexing throughput plummet

2) Summary of key concepts

Comparison of retrieval methods

Method	Advantages	Disadvantages	Recommended situation
Scroll API	Familiar model for large sequential reads	Context is expensive to keep open and is held for a long time	Short-lived batch extraction
Search After	Low state-maintenance cost, scales well	Requires a carefully designed sort key	Continuous bulk processing
PIT + Search After	Consistent snapshot plus scalability	PIT lifetime must be managed	Default recommendation for production batches

Scroll vs. Search After comparison table

Item	Scroll	Search After
Server-side state	High (scroll context)	Low (client-side cursor)
Stability of long-running batches	Risk of context expiration	Relatively stable
Sort requirements	Relatively simple	Stable sort key required
Resource impact	Holds memory and file handles	Relatively lightweight
Recommended for current operations	Limited	Default choice (especially combined with PIT)

Core indexing strategy

Tune bulk requests by payload bytes rather than by number of documents (e.g. 5 to 15 MB per request).
During bulk indexing, increase refresh_interval or temporarily disable periodic refresh (refresh_interval: -1).
Control segment and shard size with index rollover (max_age, max_docs, max_size).
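
Byte-based bulk sizing can be sketched as follows. This is a minimal illustration, not a client implementation: BulkBatcher and indexEntry are hypothetical names, and the byte limit is whatever target you settle on after measuring your own cluster.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Groups NDJSON bulk entries into request bodies capped by payload bytes (hypothetical helper). */
final class BulkBatcher {
    private final int maxBytes;

    BulkBatcher(int maxBytes) {
        if (maxBytes <= 0) throw new IllegalArgumentException("maxBytes must be positive");
        this.maxBytes = maxBytes;
    }

    /**
     * Each entry is one action line plus one source line, both newline-terminated.
     * An entry larger than maxBytes is sent in a body of its own rather than dropped.
     */
    List<String> batch(List<String> entries) {
        List<String> bodies = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int currentBytes = 0;
        for (String entry : entries) {
            int size = entry.getBytes(StandardCharsets.UTF_8).length;
            // Flush before the body would exceed the byte budget.
            if (currentBytes > 0 && currentBytes + size > maxBytes) {
                bodies.add(current.toString());
                current = new StringBuilder();
                currentBytes = 0;
            }
            current.append(entry);
            currentBytes += size;
        }
        if (currentBytes > 0) bodies.add(current.toString());
        return bodies;
    }

    /** Builds one "index" entry; sourceJson must be a single-line JSON document. */
    static String indexEntry(String id, String sourceJson) {
        return "{\"index\":{\"_id\":\"" + id + "\"}}\n" + sourceJson + "\n";
    }
}
```

Each returned string can be posted as one _bulk request body. Measuring UTF-8 bytes rather than characters matters when documents contain non-ASCII text.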

3) Code example

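Example A: PIT + Search After query DSL

# Elasticsearch 7.10+ endpoint. On OpenSearch 2.4+ the equivalent is POST /products/_search/point_in_time?keep_alive=2m
POST /products/_pit?keep_alive=2m

# search_after carries the sort values of the last hit from the previous page; omit it on the first page.
# _shard_doc is an Elasticsearch tiebreaker; where your version does not support it, sort on a unique field instead.
POST /_search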
{
  "size": 1000,
  "pit": {
    "id": "${pit_id}",
    "keep_alive": "2m"
  },
  "sort": [
    { "updated_at": "asc" },
    { "_shard_doc": "asc" }
  ],
  "search_after": ["2026-03-03T00:00:00Z", 120341],
  "query": {
    "range": {
      "updated_at": {
        "gte": "2026-03-01T00:00:00Z"
      }
    }
  }
}
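
The paging protocol around this request can be sketched as a small loop. The sketch below assumes a caller-supplied fetchPage function that performs the HTTP call; PitSearchAfterReader, Hit, and Page are hypothetical names, and no real client API is implied. It carries two pieces of state forward: the sort values of the last hit, and the most recent PIT id returned by the server.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

/** Drives PIT + search_after paging over a caller-supplied fetch function (hypothetical helper). */
final class PitSearchAfterReader {
    /** One hit: document id plus the "sort" array returned for it. */
    record Hit(String id, List<Object> sort) {}

    /** One response page: the PIT id reported by the server (may be null) and the hits. */
    record Page(String pitId, List<Hit> hits) {}

    /**
     * fetchPage receives (pitId, searchAfter); searchAfter is null for the first page.
     * Paging stops at the first empty page.
     */
    static List<Hit> readAll(String initialPitId,
                             BiFunction<String, List<Object>, Page> fetchPage) {
        List<Hit> all = new ArrayList<>();
        String pitId = initialPitId;
        List<Object> searchAfter = null;
        while (true) {
            Page page = fetchPage.apply(pitId, searchAfter);
            if (page.hits().isEmpty()) break;
            all.addAll(page.hits());
            // The cursor is the sort array of the last hit, passed back unchanged.
            searchAfter = page.hits().get(page.hits().size() - 1).sort();
            // Always continue with the most recently returned PIT id.
            if (page.pitId() != null) pitId = page.pitId();
        }
        return all;
    }
}
```

A real job would process each page as it arrives instead of collecting everything in memory, and would delete the PIT when finished so it does not linger until keep_alive expires.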

Example B: Bulk API NDJSON

POST /products_v3/_bulk
{ "index": { "_id": "1001" } }
{ "product_id": 1001, "name": "Keyboard", "price": 49000, "updated_at": "2026-03-03T01:02:03Z" }
{ "index": { "_id": "1002" } }
{ "product_id": 1002, "name": "Mouse", "price": 29000, "updated_at": "2026-03-03T01:02:04Z" }

Example C: Source DB incremental query SQL

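-- Keyset pagination: :last_updated_at and :last_id come from the last row of the previous page.
-- Filtering on updated_at alone would skip rows that share a timestamp across a page boundary.
SELECT id, name, price, updated_at
FROM products
WHERE (updated_at, id) > (:last_updated_at, :last_id)
ORDER BY updated_at, id
LIMIT 1000;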
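
Why the cursor needs a tiebreaker can be shown in memory. The sketch below mimics two WHERE clauses over a list of rows: one comparing updated_at only, and one comparing the (updated_at, id) pair. KeysetCursor and Row are hypothetical names used purely for illustration.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;

/** In-memory model of timestamp-only vs. composite keyset paging (hypothetical helper). */
final class KeysetCursor {
    record Row(long id, Instant updatedAt) {}

    static final Comparator<Row> ORDER =
            Comparator.comparing(Row::updatedAt).thenComparingLong(Row::id);

    /** Mirrors: WHERE updated_at > :ts ORDER BY updated_at, id LIMIT :limit */
    static List<Row> pageByTimestampOnly(List<Row> table, Instant ts, int limit) {
        return table.stream()
                .filter(r -> r.updatedAt().isAfter(ts))
                .sorted(ORDER)
                .limit(limit)
                .toList();
    }

    /** Mirrors: WHERE (updated_at, id) > (:ts, :id) ORDER BY updated_at, id LIMIT :limit */
    static List<Row> pageByCompositeKey(List<Row> table, Instant ts, long id, int limit) {
        return table.stream()
                .filter(r -> r.updatedAt().isAfter(ts)
                        || (r.updatedAt().equals(ts) && r.id() > id))
                .sorted(ORDER)
                .limit(limit)
                .toList();
    }
}
```

With three rows that share one timestamp and a page size of 2, the timestamp-only cursor returns nothing on the second page and the third row is never synced, while the composite cursor resumes at exactly the next row.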

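Example D: Rollover Policy

The policy below uses Elasticsearch ILM syntax. On OpenSearch the equivalent is an ISM policy (PUT _plugins/_ism/policies/products-rollover), whose rollover action takes min_size, min_index_age, and min_doc_count instead.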

PUT _ilm/policy/products-rollover
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "40gb",
            "max_age": "7d",
            "max_docs": 50000000
          }
        }
      }
    }
  }
}

4) Real-world failure/operational scenarios

Situation: In a nightly reindex, the bulk size was set to 50MB and 8 workers indexed concurrently with refresh=true on every request. After 10 minutes the number of 429 responses exploded, indexing delays piled up, and the batch took six times longer to complete.

Cause:

The bulk payloads exceeded what the shards' write buffers and queues could absorb.
The forced refresh on every request produced many small segments, so segment merge cost rose sharply.
The Search After sort key was the single updated_at column, so documents sharing the same timestamp were skipped.

Improvements:

Reduce the bulk payload to around 10MB and lower concurrency from 8 to 3.
Set refresh_interval=30s for the duration of the batch and run one manual refresh after it completes.
Extend the sort key to (updated_at, _id) or (updated_at, _shard_doc).
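
The refresh handling in the improvements above can be sketched as a wrapper around the load. This is an illustration under assumptions: RefreshWindow is a hypothetical helper, putSettings stands for whatever code issues PUT /<index>/_settings, refreshOnce stands for a POST /<index>/_refresh call, and the interval to restore afterwards is your index's normal setting.

```java
import java.util.function.Consumer;

/** Relaxes refresh_interval for the duration of a bulk load, then restores it (hypothetical helper). */
final class RefreshWindow {
    /** Request body for PUT /<index>/_settings; "-1" would disable periodic refresh. */
    static String settingsBody(String refreshInterval) {
        return "{\"index\":{\"refresh_interval\":\"" + refreshInterval + "\"}}";
    }

    static void run(Consumer<String> putSettings, Runnable bulkLoad, Runnable refreshOnce,
                    String intervalDuringLoad, String intervalAfterLoad) {
        putSettings.accept(settingsBody(intervalDuringLoad));
        try {
            bulkLoad.run();
        } finally {
            // Restore even when the load fails, so the index is not left on the relaxed interval.
            putSettings.accept(settingsBody(intervalAfterLoad));
        }
        // Reached only after a successful load: make the new documents searchable once.
        refreshOnce.run();
    }
}
```

The try/finally is the point of the design: a failed batch should not leave production search running with a 30s refresh delay.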

5) Design Checklist

Have you considered PIT + Search After, rather than Scroll, as the default for large reads?
Does the Search After sort key guarantee uniqueness and stability?
Is bulk size controlled by bytes?
Is there a backoff and concurrency-reduction strategy for 429 responses and rejections?
Is the refresh policy adjusted to the indexing mode (bulk load vs. steady state)?
Are index rollover and shard size targets tracked as operational metrics?
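
The backoff and concurrency-reduction item can be sketched as follows. BulkRetry is a hypothetical helper, sendBulk stands for code that sends one bulk request and returns its HTTP status, and the halving rule is just one plausible policy. Jitter is left out to keep the sketch deterministic; production code would normally add it.

```java
import java.util.function.IntSupplier;

/** Exponential backoff and concurrency reduction for rejected bulk requests (hypothetical helper). */
final class BulkRetry {
    /** Delay is baseMillis * 2^attempt, capped at maxMillis; attempt starts at 0. */
    static long backoffMillis(int attempt, long baseMillis, long maxMillis) {
        long delay = baseMillis;
        for (int i = 0; i < attempt && delay < maxMillis; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxMillis);
    }

    /** Halves the worker count after a rejection, never going below 1. */
    static int reduceConcurrency(int current) {
        return Math.max(1, current / 2);
    }

    /** Retries on HTTP 429 up to maxRetries times and returns the last status seen. */
    static int sendWithRetry(IntSupplier sendBulk, int maxRetries, long baseMillis, long maxMillis) {
        int status = sendBulk.getAsInt();
        for (int attempt = 0; status == 429 && attempt < maxRetries; attempt++) {
            try {
                Thread.sleep(backoffMillis(attempt, baseMillis, maxMillis));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop retrying
                return status;
            }
            status = sendBulk.getAsInt();
        }
        return status;
    }
}
```

A bulk response can also come back with HTTP 200 while individual items carry a 429 status, so real code would inspect the per-item results and resend only the rejected items.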

6) Summary

OpenSearch batch processing is not a matter of simple API calls but of cluster resource control. The practical defaults are PIT + Search After for retrieval, and small bulk requests, controlled refresh, and a rollover policy for indexing.

7) Next part preview

The next part covers batch strategies in distributed environments: preventing duplicate execution across multiple instances, leader election, Kubernetes CronJob versus in-app scheduling, and a comparison of DB, Redis, and ZooKeeper locks, all from an architectural perspective.