
RAG Chunking Strategy Field Guide

How to improve RAG quality by measuring the correlation between document segmentation rules and search accuracy


Introduction

When RAG quality is low, many teams reach for a different model or embedding first. In practice, the root cause is often the chunking rules. Even on the same dataset, retrieval accuracy varies widely with chunk size, overlap, and metadata policy.

This article explains how to treat chunking as an experimentable variable. The key is to choose a chunking strategy based on metrics, not intuition.

RAG Chunking Strategy Field Guide cover
Free image based on Wikimedia Commons

Problem definition

Frequent problems in the early days of RAG introduction:

  • Splitting documents into large chunks increases recall, but the evidence behind correct answers gets diluted.
  • Splitting them too small makes retrieval succeed, but answer quality drops from missing context.
  • The chunk policy is fixed to a single value regardless of document type (guide/FAQ/code docs).
  • Offline evaluation looks good, but the online experience quality is low.

The key is to design “document structure-based chunking” and “question type-based evaluation set” together.
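The "question type-based evaluation set" mentioned above can start as a small labeled list. A minimal sketch of such a set, with hypothetical type names and sources, so Recall@K can later be broken down per question type:

```typescript
// Hypothetical shape for a question-type-based evaluation set. Each sample
// records the question, the expected source document, and a question type
// label; the type names and file names here are illustrative only.
type QuestionType = "factoid" | "howto" | "troubleshooting";

interface EvalSample {
  question: string;
  expectedSource: string;
  type: QuestionType;
}

const EVAL_SET: EvalSample[] = [
  { question: "What is the default chunk size?", expectedSource: "guide.md", type: "factoid" },
  { question: "How do I reindex after changing overlap?", expectedSource: "guide.md", type: "howto" },
  { question: "Why did recall drop after a deploy?", expectedSource: "faq.md", type: "troubleshooting" },
];

// Group samples by type so each question type gets its own Recall@K number.
function byType(samples: EvalSample[]): Map<QuestionType, EvalSample[]> {
  const groups = new Map<QuestionType, EvalSample[]>();
  for (const s of samples) {
    const bucket = groups.get(s.type) ?? [];
    bucket.push(s);
    groups.set(s.type, bucket);
  }
  return groups;
}
```

Scoring recall per type, rather than one aggregate number, is what reveals that a chunk policy good for FAQs may be bad for long-form guides.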

Key concepts

| Variable | Description | Recommended starting value |
| --- | --- | --- |
| chunk_size | Token/character length of one chunk | 350–550 tokens |
| overlap | Overlap between adjacent chunks | 10–20% |
| splitter | Split on paragraph/header/code block boundaries | Structure-first splitting |
| metadata | section, source, updated_at | Always attach |

In production, a per-document-type policy is more effective than a single "optimal" value.
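Such a policy table can be a plain lookup keyed by document type. A minimal sketch; the type names and values are illustrative starting points drawn from the table above, not tuned results:

```typescript
// Hypothetical per-document-type chunking policy. The document types and
// numbers below are illustrative defaults to start experimenting from.
type DocType = "guide" | "faq" | "code";

interface ChunkPolicy {
  size: number;    // target chunk length in tokens
  overlap: number; // overlap as a fraction of size
}

const CHUNK_POLICIES: Record<DocType, ChunkPolicy> = {
  guide: { size: 500, overlap: 0.15 }, // long-form prose: larger chunks
  faq:   { size: 350, overlap: 0.10 }, // short Q&A pairs: smaller chunks
  code:  { size: 550, overlap: 0.20 }, // code docs: keep blocks together
};

export function policyFor(docType: DocType): ChunkPolicy {
  return CHUNK_POLICIES[docType];
}
```

Keeping the policy in one table makes it easy to change per type and re-run the evaluation loop, which is the whole point of treating chunking as an experimentable variable.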

Code Example 1: Structure-Based Chunker

type Chunk = {
  id: string;
  text: string;
  source: string;
  section: string;
};

export function chunkBySection(input: {
  source: string;
  markdown: string;
  size: number;    // chunk length in characters
  overlap: number; // overlap in characters (not a percentage)
}): Chunk[] {
  // Split on level-2 headers so each chunk stays inside a single section.
  const sections = input.markdown.split(/^##\s+/m).filter(Boolean);
  const chunks: Chunk[] = [];

  for (const raw of sections) {
    const [sectionTitle, ...rest] = raw.split("\n");
    const body = rest.join("\n").trim();
    let cursor = 0;

    while (cursor < body.length) {
      const end = Math.min(body.length, cursor + input.size);
      const text = body.slice(cursor, end);
      chunks.push({
        id: `${input.source}-${chunks.length + 1}`,
        text,
        source: input.source,
        section: sectionTitle.trim(),
      });
      // Advance by size minus overlap; the `cursor + 1` floor guarantees
      // forward progress even if overlap >= size.
      cursor = Math.max(end - input.overlap, cursor + 1);
    }
  }

  return chunks;
}

Code Example 2: Offline Evaluation (Recall@K)

type Sample = { question: string; expectedSource: string };

// `searchChunks` is assumed to be your pipeline's retrieval call (vector or
// keyword search); it must return items carrying the chunk's `source`.
export async function evaluateRecallAtK(samples: Sample[], k = 5) {
  let hit = 0;

  for (const sample of samples) {
    const retrieved = await searchChunks(sample.question, { topK: k });
    // A sample is a hit if any top-K result comes from the expected source.
    const ok = retrieved.some(
      (item: { source: string }) => item.source === sample.expectedSource,
    );
    if (ok) hit += 1;
  }

  return {
    total: samples.length,
    recallAtK: samples.length === 0 ? 0 : hit / samples.length,
  };
}
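To run the loop end to end without a real vector store, `searchChunks` can be stubbed. The sketch below uses a toy keyword-overlap scorer standing in for real retrieval; the index contents and file names are made up for illustration:

```typescript
// Toy stand-in for the real `searchChunks` retrieval call: scores each
// indexed chunk by how many query terms its text contains. The tiny index
// below is fabricated purely to exercise the evaluation loop.
type Hit = { source: string; text: string };

const INDEX: Hit[] = [
  { source: "guide.md", text: "how to configure chunk size and overlap" },
  { source: "faq.md", text: "why is my recall low after reindexing" },
];

async function searchChunks(query: string, opts: { topK: number }): Promise<Hit[]> {
  const terms = query.toLowerCase().split(/\s+/);
  return INDEX
    .map((hit) => ({
      hit,
      // Score = number of query terms found in the chunk text.
      score: terms.filter((t) => hit.text.includes(t)).length,
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, opts.topK)
    .map((s) => s.hit);
}

async function main() {
  const samples = [
    { question: "chunk size overlap", expectedSource: "guide.md" },
    { question: "recall low reindexing", expectedSource: "faq.md" },
  ];
  let hit = 0;
  for (const s of samples) {
    const retrieved = await searchChunks(s.question, { topK: 1 });
    if (retrieved.some((r) => r.source === s.expectedSource)) hit += 1;
  }
  console.log(hit / samples.length); // prints 1 for this toy index
}
main();
```

Swapping the stub for the production retrieval call turns this into the offline Recall@K harness from the example above.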

Architecture flow

Documents → structure-based chunking → embedding/indexing → retrieval → Recall@K evaluation → chunk policy adjustment (loop).

Tradeoffs

  • Large chunks preserve context well, but add noise to each retrieved passage.
  • Small chunks match queries precisely, but fragment the evidence behind an answer.
  • More overlap raises recall, but also raises storage cost and the chance of duplicated answer content.
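The storage side of the overlap tradeoff is easy to estimate: with a sliding window, each chunk advances the cursor by `size * (1 - overlap)`, so chunk count (and index size) grows by roughly `1 / (1 - overlap)` relative to no overlap. A quick sketch of that arithmetic:

```typescript
// Rough storage multiplier from overlap: with overlap fraction o, the
// cursor advances by size * (1 - o) per chunk, so the number of chunks
// grows by about 1 / (1 - o) compared to zero overlap.
function overlapCostMultiplier(overlap: number): number {
  if (overlap <= 0) return 1;
  if (overlap >= 1) throw new Error("overlap must be < 1");
  return 1 / (1 - overlap);
}

console.log(overlapCostMultiplier(0.15).toFixed(2)); // prints "1.18"
```

So the 10–20% overlap recommended above costs roughly 11–25% more chunks to store and embed, which is usually an acceptable price for the recall gain.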

Conclusion

In RAG quality tuning, chunking is the cheapest and most effective lever. Adding structure-aware segmentation and a Recall@K-based evaluation loop improves quality reliably without swapping the model.

Image source

  • Cover: source link
  • License: CC BY-SA 3.0 / Author: Jorge Royan
  • Note: Downloaded from Wikimedia Commons under a free license and optimized as a 1600px JPG.
