Generative AI & RAG Development

Topics

Chunking Strategies

Learning Objectives

By the end of this session, you will be able to:

Explain what chunking is in RAG systems
Explain why chunking is critical for retrieval quality
Compare different chunking strategies
Weigh chunk size trade-offs
Apply chunk overlap techniques
Identify common chunking mistakes
Select appropriate chunking approaches for different use cases

Introduction

In the previous session, we explored the Data Ingestion Pipeline and learned how documents are transformed into searchable knowledge.

One of the most important steps in that pipeline is:

Chunking

Many beginners assume chunking is simply splitting a document into smaller pieces.

In reality, chunking decides what the retriever is able to find, which makes it one of the most consequential decisions in RAG engineering.

A poorly designed chunking strategy can cause:

Poor retrieval
Missing context
Incorrect answers
Increased hallucinations

A well-designed chunking strategy can dramatically improve:

Search relevance
Context quality
Answer accuracy
User satisfaction

Many real-world RAG performance problems can be traced back to poor chunking decisions.

This is why experienced AI engineers spend significant time optimizing chunking strategies.

Why This Topic Matters

Imagine a university handbook containing:

Admission Rules
Examination Policies
Scholarship Guidelines
Hostel Regulations

A student asks:

What are the scholarship eligibility requirements?

If chunking is poor:

Scholarship Information
+
Hostel Rules
+
Exam Policies

may be combined into one chunk.

The retrieval system may return irrelevant information.

If chunking is well-designed:

Scholarship Policy Chunk

is retrieved directly.

The answer becomes significantly more accurate.

What Is Chunking?

Chunking is the process of dividing large documents into smaller pieces called chunks.

Example:

Original document:

100 Pages

After chunking:

Chunk 1
Chunk 2
Chunk 3
...
Chunk N

Each chunk becomes an independent unit for:

Embedding generation
Storage
Retrieval

The quality of these chunks directly affects the quality of the RAG system.

Why Not Store Entire Documents?

A common question is:

Why not embed the entire document?

Consider a 200-page handbook.

Problems:

Embedding Quality

Large documents contain multiple topics.

Example:

Leave Policy
Travel Policy
Benefits Policy
Security Policy

One embedding cannot accurately represent all topics.

Retrieval Precision

Users usually ask specific questions.

Example:

How many annual leave days are available?

Retrieving a 200-page document to answer a one-line question is inefficient, because the answer is buried in unrelated text.

Context Limits

LLMs have context window limits, so a very long document may not fit into a single prompt at all.

Even when it fits, sending entire documents increases token costs, latency, and complexity.

Chunking solves these issues.

How Chunking Fits into RAG

Workflow:

Documents
     ↓
Chunking
     ↓
Embeddings
     ↓
Vector Database
     ↓
Retrieval

Chunking serves as the foundation of semantic retrieval.

Characteristics of Good Chunks

Good chunks should be:

Meaningful

Contain a complete idea.

Focused

Cover a single topic when possible.

Searchable

Easy for retrieval systems to locate.

Contextual

Contain enough information to make sense independently.

A chunk should be understandable even when viewed alone.

Example of Poor Chunking

Document:

Leave Policy
Employees receive 24 annual leave days.

Remote Work Policy
Employees may work remotely twice per week.

Travel Policy
Travel expenses must be approved.

Poor chunk:

Leave Policy
Employees receive 24 annual leave days.

Remote Work Policy
Employees may work remotely twice per week.

Travel Policy
Travel expenses must be approved.

Three topics are mixed together.

Retrieval quality decreases, because a single embedding now blends three topics and matches none of them strongly.

Example of Better Chunking

Chunk 1:

Leave Policy
Employees receive 24 annual leave days.

Chunk 2:

Remote Work Policy
Employees may work remotely twice per week.

Chunk 3:

Travel Policy
Travel expenses must be approved.

Each chunk now focuses on a specific topic.

Fixed-Size Chunking

The simplest strategy.

Documents are divided based on character count or token count.

Example:

500 Tokens Per Chunk

Workflow:

Document
      ↓
500 Tokens
      ↓
500 Tokens
      ↓
500 Tokens

Advantages:

Easy implementation
Fast processing
Widely supported

Disadvantages:

May split important information
Ignores document structure

Fixed-Size Example

Document:

Page 1
Page 2
Page 3
...

Chunks:

Tokens 1–500
Tokens 501–1000
Tokens 1001–1500

This approach is common in early RAG implementations.
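
The fixed-size approach described above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: whitespace splitting stands in for a real tokenizer, and the function name fixed_size_chunks is chosen here for illustration only.

```python
def fixed_size_chunks(tokens, chunk_size=500):
    """Split a token list into consecutive chunks of at most chunk_size tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


# Assumption: whitespace-separated words approximate tokens.
tokens = ("word " * 1200).split()
chunks = fixed_size_chunks(tokens, chunk_size=500)
print([len(c) for c in chunks])  # [500, 500, 200]
```

Notice that the cut points depend only on position. A sentence that happens to cross token 500 is split in two, which is the main disadvantage listed above.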

Recursive Chunking

A more advanced strategy.

The system attempts to split documents using natural boundaries.

Example:

Priority:

Paragraph
 ↓
Sentence
 ↓
Word

Instead of splitting arbitrarily, the system preserves meaning.

Advantages:

Better context preservation
Improved retrieval quality

This is one of the most commonly used approaches today.
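
The paragraph, sentence, word priority can be sketched as a small recursive function. This is a simplified illustration and not the algorithm of any particular library: the separator list, the character-based limit, and the name recursive_split are assumptions made for this example.

```python
def recursive_split(text, max_chars=200, separators=("\n\n", ". ", " ")):
    """Split text at the coarsest natural boundary that keeps chunks within max_chars."""
    text = text.strip()
    if len(text) <= max_chars:
        return [text] if text else []
    if not separators:
        # No natural boundary left: fall back to a hard cut.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        if not piece.strip():
            continue
        candidate = current + sep + piece if current else piece
        if len(candidate) <= max_chars:
            current = candidate  # keep merging small neighbouring pieces
        else:
            if current:
                chunks.append(current)
            if len(piece) <= max_chars:
                current = piece
            else:
                # The piece alone is too large: retry with a finer separator.
                chunks.extend(recursive_split(piece, max_chars, finer))
                current = ""
    if current:
        chunks.append(current)
    return chunks


doc = (
    "Leave Policy. Employees receive 24 annual leave days.\n\n"
    "Remote Work Policy. Employees may work remotely twice per week.\n\n"
    "Travel Policy. Travel expenses must be approved."
)
policy_chunks = recursive_split(doc, max_chars=80)
print(len(policy_chunks))  # 3, one chunk per paragraph
```

In this sketch the separator is dropped wherever a cut is made, so a sentence that ends a chunk may lose its trailing period. Real splitters usually offer an option to keep it.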

Section-Based Chunking

Documents often contain:

Chapter 1
Chapter 2
Chapter 3

Policy A
Policy B
Policy C

Chunking follows document structure.

Example:

Scholarship Policy

becomes one chunk.

Advantages:

Preserves topic boundaries
Easy to understand

Ideal for:

Manuals
Policies
Documentation
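
Section-based chunking can be sketched as follows. The rule used to recognise a heading (a line ending in "Policy") is an assumption that fits the handbook example from earlier in this session. Real documents need their own rule, such as heading markup or numbering.

```python
def section_chunks(text, is_heading):
    """Group lines into chunks, starting a new chunk at every heading line."""
    chunks, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if is_heading(line) and current:
            chunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks


handbook = """Leave Policy
Employees receive 24 annual leave days.

Remote Work Policy
Employees may work remotely twice per week.

Travel Policy
Travel expenses must be approved."""

# Hypothetical heading rule for this sample document only.
sections = section_chunks(handbook, is_heading=lambda line: line.endswith("Policy"))
for section in sections:
    print(section, end="\n---\n")
```

Each heading stays attached to its own text, so every chunk names its topic.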

Semantic Chunking

Semantic chunking uses meaning rather than size.

Example:

Document:

Topic A
Topic A
Topic A

Topic B
Topic B
Topic B

The system detects topic changes and creates chunks accordingly.

Advantages:

High retrieval quality
Better contextual grouping

Disadvantages:

More computationally expensive

Semantic chunking is increasingly used in advanced RAG systems.
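
The idea of detecting topic changes can be illustrated with a toy example. Important assumption: real semantic chunking compares sentence embeddings produced by an embedding model. Here a simple word-count vector stands in for the embedding so that the sketch runs without external services. The threshold value is also illustrative.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Stand-in for an embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def semantic_chunks(sentences, threshold=0.2):
    """Start a new chunk whenever neighbouring sentences are not similar enough."""
    if not sentences:
        return []
    groups = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if cosine(bag_of_words(prev), bag_of_words(cur)) < threshold:
            groups.append([cur])      # topic change detected
        else:
            groups[-1].append(cur)    # same topic continues
    return [" ".join(group) for group in groups]


sentences = [
    "Scholarship applicants must hold a minimum grade average.",
    "Scholarship applicants must also submit a recommendation letter.",
    "Hostel rooms are allocated each semester.",
    "Hostel rooms must be vacated after exams.",
]
topic_chunks = semantic_chunks(sentences)
print(len(topic_chunks))  # 2: one scholarship chunk, one hostel chunk
```

The extra cost mentioned above comes from the similarity step: every sentence has to be embedded before any chunk can be created.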

Visual Comparison

Fixed Chunking

Document
 ↓
500 Tokens
 ↓
500 Tokens
 ↓
500 Tokens

Section Chunking

Policy A
Policy B
Policy C

Semantic Chunking

Meaning Group 1
Meaning Group 2
Meaning Group 3

Each strategy serves different needs.

Understanding Chunk Size

Chunk size is one of the most important tuning parameters.

Examples:

200 Tokens
500 Tokens
1000 Tokens

Different chunk sizes produce different retrieval behaviors.

Small Chunks

Example:

100–300 Tokens

Advantages:

Highly focused retrieval
Better precision

Disadvantages:

May lose context
Important information may be fragmented

Example:

Question:

Leave Policy

Retrieved:

Employees receive

Incomplete context.

Large Chunks

Example:

1000–2000 Tokens

Advantages:

More context
Better completeness

Disadvantages:

Lower retrieval precision
More irrelevant information

Example:

Question:

Leave Policy

Retrieved:

Leave Policy
Travel Policy
Benefits Policy
Security Policy

Too much information.

The Chunk Size Trade-Off

Small Chunks
      ↓
Higher Precision
Lower Context

Large Chunks
      ↓
Lower Precision
Higher Context

Finding the right balance is a key RAG engineering skill.

What Is Chunk Overlap?

Chunk overlap allows neighboring chunks to share content.

Example:

Without overlap:

Chunk 1
A B C D

Chunk 2
E F G H

With overlap:

Chunk 1
A B C D

Chunk 2
C D E F

Some information appears in both chunks.

Why Overlap Matters

Important information often spans boundaries.

Without overlap:

Sentence Start

may appear in one chunk.

Sentence End

may appear in another.

The meaning is lost.

Overlap helps preserve continuity.

Typical Overlap Values

Common values:

10%
20%
30%

Example:

500 Token Chunk
100 Token Overlap

This approach is widely used in production systems.
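
Overlap can be implemented as a sliding window whose step is the chunk size minus the overlap. The sketch below is a minimal version under that assumption. The function name chunks_with_overlap is illustrative.

```python
def chunks_with_overlap(tokens, chunk_size=500, overlap=100):
    """Slide a window of chunk_size tokens, advancing by chunk_size - overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the document
    return chunks


letters = list("ABCDEFGH")
overlapped = chunks_with_overlap(letters, chunk_size=4, overlap=2)
print(["".join(c) for c in overlapped])  # ['ABCD', 'CDEF', 'EFGH']
```

With a 500-token chunk and a 100-token overlap, each new chunk starts 400 tokens after the previous one, so the last 100 tokens of one chunk are repeated at the start of the next.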

Example Retrieval Problem

Question:

What are the eligibility criteria for scholarships?

Without overlap:

Eligibility

and

Requirements

may appear in separate chunks.

Retrieval becomes less effective.

With overlap:

Eligibility Requirements

are more likely to stay together in at least one chunk.

Search quality improves.

Chunking Strategies by Use Case

Legal Documents

Recommended:

Section-Based Chunking

Reason:

Legal documents already contain structured sections.

Technical Documentation

Recommended:

Recursive Chunking

Reason:

Preserves logical explanations.

Research Papers

Recommended:

Semantic Chunking

Reason:

Research topics often span multiple paragraphs.

FAQs

Recommended:

Question-Answer Chunking

Reason:

Each FAQ becomes a separate chunk.

Real-World Enterprise Example

A company stores:

Employee Handbook
Benefits Guide
Security Policies

Poor chunking:

Multiple policies mixed together

Result:

Poor retrieval quality

Optimized chunking:

Policy-Based Chunks

Result:

Higher retrieval accuracy

This significantly improves user experience.

Common Chunking Mistakes

Chunks Too Small

Important context lost.

Chunks Too Large

Too much irrelevant information.

No Overlap

Boundary information lost.

Ignoring Document Structure

Reduces retrieval quality.

One Strategy for Every Document

Different document types often require different approaches.

How Chunking Impacts Cost

Consider:

10,000 Documents

Small chunks create:

100,000 Chunks

Large chunks create:

20,000 Chunks

More chunks mean:

More embeddings
More storage
More processing

Chunk size influences infrastructure costs.
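
The figures above can be reproduced with simple arithmetic. The sketch assumes every document is about 2,000 tokens long, which is a hypothetical value chosen so that the numbers match. It also shows how overlap adds to the chunk count.

```python
import math


def chunks_per_document(doc_tokens, chunk_size, overlap=0):
    """Number of sliding-window chunks needed to cover one document."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")
    if doc_tokens <= 0:
        return 0
    if doc_tokens <= chunk_size:
        return 1
    step = chunk_size - overlap
    return math.ceil((doc_tokens - chunk_size) / step) + 1


def total_chunks(num_docs, doc_tokens, chunk_size, overlap=0):
    return num_docs * chunks_per_document(doc_tokens, chunk_size, overlap)


# Assumption: 10,000 documents of roughly 2,000 tokens each.
print(total_chunks(10_000, 2_000, chunk_size=200))              # 100000
print(total_chunks(10_000, 2_000, chunk_size=1_000))            # 20000
print(total_chunks(10_000, 2_000, chunk_size=200, overlap=40))  # 130000
```

Each chunk needs one embedding and one stored vector, so the chunk count is a rough proxy for embedding and storage cost.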

Production Chunking Workflow

Documents
      ↓
Cleaning
      ↓
Chunking
      ↓
Overlap
      ↓
Embeddings
      ↓
Vector Database

Every stage affects retrieval performance.
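
The workflow above can be wired together in a compact sketch. Several parts are placeholders and should be read as assumptions: embed returns word counts instead of calling a real embedding model, and a plain Python list stands in for the vector database.

```python
import re
from collections import Counter


def clean(text):
    """Cleaning: collapse whitespace left over from extraction."""
    return re.sub(r"\s+", " ", text).strip()


def chunk_with_overlap(words, size, overlap):
    """Chunking + overlap: sliding window over a word list."""
    if not words:
        return []
    step = size - overlap
    return [words[i:i + size] for i in range(0, max(len(words) - overlap, 1), step)]


def embed(text):
    """Placeholder embedding (word counts). A real pipeline calls an embedding model here."""
    return Counter(text.lower().split())


def ingest(documents, size=50, overlap=10):
    """Run every stage and return the records a vector database would store."""
    store = []
    for doc_id, raw in documents.items():
        words = clean(raw).split()
        for n, chunk_words in enumerate(chunk_with_overlap(words, size, overlap)):
            chunk_text = " ".join(chunk_words)
            store.append({"doc": doc_id, "chunk": n, "text": chunk_text, "vector": embed(chunk_text)})
    return store


documents = {"handbook": "Leave   Policy\n\nEmployees receive 24 annual leave days."}
records = ingest(documents, size=4, overlap=1)
print([r["text"] for r in records])
```

Storing the document id and chunk number next to each vector makes it possible to trace a retrieved chunk back to its source.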

Enterprise Best Practices

Start Simple

Begin with recursive chunking.

Measure Retrieval Quality

Evaluate actual search results.

Tune Chunk Size

Adjust based on document type.

Use Overlap

Preserve important context.

Test Frequently

Retrieval quality should be validated regularly.

Successful RAG systems evolve through experimentation.
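
One simple way to measure retrieval quality is a hit rate: for a set of test questions, check whether the chunk that contains the answer appears among the top results. The sketch below assumes you already have the retriever's ranked results per question. The chunk ids and questions are made up for illustration.

```python
def hit_rate_at_k(results, expected, k=3):
    """Fraction of questions whose expected chunk id appears in the top-k results."""
    if not expected:
        return 0.0
    hits = sum(
        1 for question, chunk_id in expected.items()
        if chunk_id in results.get(question, [])[:k]
    )
    return hits / len(expected)


# Hypothetical evaluation data: ranked chunk ids returned for each question.
results = {
    "How many annual leave days are available?": ["leave-1", "benefits-4", "travel-2"],
    "Who approves travel expenses?": ["security-3", "leave-1", "benefits-4"],
}
expected = {
    "How many annual leave days are available?": "leave-1",
    "Who approves travel expenses?": "travel-2",
}
print(hit_rate_at_k(results, expected, k=3))  # 0.5
```

Running the same question set after each change to chunk size, overlap, or strategy shows whether the change actually helped.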

.NET Perspective

Popular .NET tools include:

Semantic Kernel
Azure AI Search
Azure OpenAI

Many enterprise applications implement custom chunking strategies tailored to business documents.

Python Perspective

Popular Python frameworks include:

LangChain
LlamaIndex
ChromaDB
Unstructured

These frameworks provide built-in chunking utilities and support multiple chunking strategies.

Assignment

Practical Exercise

Take a 10-page PDF and create:

Fixed-size chunks
Section-based chunks
Semantic chunks

Compare:

Retrieval quality
Context preservation
Ease of implementation

Design Activity

Choose one domain:

University
Healthcare
Banking
E-Commerce

Recommend a chunking strategy and explain your reasoning.

Key Takeaways

Chunking divides documents into searchable units.
Chunk quality directly impacts retrieval quality.
Fixed-size chunking is simple but may lose context.
Recursive and semantic chunking often produce better results.
Chunk size involves a trade-off between precision and context.
Overlap helps preserve meaning across chunk boundaries.
Effective chunking is one of the most important aspects of RAG engineering.

Module 3 Complete

You have now completed:

What Is Retrieval-Augmented Generation (RAG)?
Why LLMs Hallucinate
How RAG Solves Knowledge Limitations
RAG Architecture Explained
Data Ingestion Pipeline
Chunking Strategies

You now understand the complete foundation of RAG systems and are ready to explore embeddings and vector databases in greater depth.

What's Next?

In Session 19, we begin Module 4: Embeddings and Vector Databases with:

Understanding Embeddings

You will learn what embeddings are, how they work, why they are essential for semantic search, and how they form the backbone of modern RAG systems.
