Query Transformation

Learning Objectives

By the end of this session, you will be able to:

  • Understand what Query Transformation is

  • Explain why user queries often need optimization

  • Explore query rewriting techniques

  • Understand query expansion strategies

  • Describe how modern RAG systems improve queries before searching

  • Design advanced retrieval pipelines

  • Improve answer quality through better query processing

Introduction

In the previous session, we explored Context Compression and learned how advanced RAG systems reduce unnecessary information before sending context to an LLM.

We learned about:

  • Token optimization

  • Context reduction

  • Query-aware compression

  • Enterprise retrieval architectures

However, another challenge exists even before retrieval begins.

Consider a user asking:

Tell me about leave.

What exactly does the user mean?

Possible interpretations:

Annual Leave

Medical Leave

Maternity Leave

Leave Approval Process

Leave Balance

The query is too vague.

Now consider:

Can I work from another city?

The relevant company document may use the term:

Remote Work Policy

instead of:

Work From Another City

A simple search may miss important information.

This problem introduces:

Query Transformation

Query Transformation improves user queries before retrieval begins.

Why This Topic Matters

Imagine a university assistant.

Student asks:

Scholarship

This query lacks detail.

The system doesn't know whether the student wants:

  • Eligibility criteria

  • Application deadlines

  • Scholarship amounts

  • Available programs

Instead of searching directly, modern systems first improve the query.

Result:

Better Query
      ↓
Better Retrieval
      ↓
Better Answer

This is why query transformation has become an important part of advanced RAG systems.

What Is Query Transformation?

Query Transformation is the process of modifying, rewriting, expanding, or clarifying a user's query before retrieval occurs.

Instead of:

User Query
      ↓
Search

the system performs:

User Query
      ↓
Transform Query
      ↓
Search

The transformed query is usually more descriptive and retrieval-friendly.
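
The difference between the two flows can be sketched in Python. This is a minimal illustration under stated assumptions, not a production retriever: `DOCUMENTS`, `search`, `transform_query`, and `transform_then_search` are hypothetical names introduced here, the corpus is two made-up sentences, and the scoring is a simple count of shared words.

```python
import re

# Toy corpus (made up for this example); a real system would query a search index.
DOCUMENTS = [
    "Remote Work Policy: employees may work outside their assigned office city with manager approval.",
    "Travel Reimbursement Policy: how to claim travel costs.",
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def search(query: str, documents: list[str]) -> list[tuple[int, str]]:
    """Return (score, document) pairs, best first, where score = shared word count."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(doc)), doc) for doc in documents]
    scored = [pair for pair in scored if pair[0] > 0]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def transform_query(query: str) -> str:
    """Hypothetical transformation: append policy terminology for known informal phrases."""
    rewrites = {"work from another city": "remote work policy assigned office city"}
    lowered = query.lower()
    for phrase, addition in rewrites.items():
        if phrase in lowered:
            return f"{query} {addition}"
    return query


def transform_then_search(query: str, documents: list[str]) -> list[tuple[int, str]]:
    """User Query -> Transform Query -> Search."""
    return search(transform_query(query), documents)
```

With this toy scoring, the transformed query shares more words with the remote work document than the raw query does, which is the effect the transformation step aims for.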

Why User Queries Are Often Difficult

Users rarely ask perfect questions.

Examples:

Too Short

Scholarship

Ambiguous

Leave Policy

Informal

Can I work from another city?

Missing Context

What about eligibility?

These queries may not retrieve the best information.

Query Transformation Workflow

User Query
      ↓
Query Transformation
      ↓
Optimized Query
      ↓
Retrieval
      ↓
Results

This extra step often improves retrieval quality, because the search runs on a query that better matches the vocabulary and scope of the indexed documents.

Example of Query Rewriting

User Query:

Can I work from another city?

Rewritten Query:

What does the company's remote work policy say about employees working from locations outside their assigned office city?

The rewritten query contains:

  • More context

  • Better terminology

  • Improved retrieval signals

Query Rewriting

Query Rewriting is one of the most common transformation techniques.

Goal:

Make The Query Clearer

Example:

Original Query:

Travel

Rewritten Query:

What is the company's travel reimbursement policy?

The search becomes more focused.
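
As a minimal sketch, rewriting can be approximated with a lookup table of fuller questions for known short queries. The table `REWRITE_TEMPLATES` and the function `rewrite_query` are hypothetical names for this example; real systems more often use an LLM for this step, since a fixed table only covers queries anticipated in advance.

```python
# Hypothetical lookup table: short query -> fuller question (examples from this session).
REWRITE_TEMPLATES = {
    "travel": "What is the company's travel reimbursement policy?",
    "expense claim": "What is the company's expense reimbursement and claim submission policy?",
}


def rewrite_query(query: str) -> str:
    """Return a fuller question for a known short query; otherwise return it unchanged."""
    key = query.strip().lower()
    return REWRITE_TEMPLATES.get(key, query)
```

Queries that are not in the table pass through untouched, so the rewriting step never makes an already specific question worse.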

Query Expansion

Query Expansion adds related terms to a query.

Example:

Original Query:

Remote Work

Expanded Query:

Remote Work
Work From Home
Hybrid Work
Distributed Work

This increases the chances of finding relevant documents.

Why Query Expansion Works

Documents may use different terminology.

User Query:

Work From Home

Document:

Remote Work Policy

Without expansion:

Potential Miss

With expansion:

Better Match

Retrieval quality improves.
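
The miss and the match described above can be reproduced with a small sketch. It assumes a hand-written table of related terms (`RELATED_TERMS`) and a deliberately strict phrase matcher (`matches`); both are hypothetical and only serve to make the vocabulary gap visible.

```python
# Hypothetical related-terms table; in practice this might come from a domain
# glossary, a thesaurus, or an LLM.
RELATED_TERMS = {
    "work from home": ["remote work", "hybrid work", "distributed work"],
    "remote work": ["work from home", "hybrid work", "distributed work"],
}


def expand_query(query: str) -> list[str]:
    """Return the original query followed by any related terms from the table."""
    key = query.strip().lower()
    return [query] + RELATED_TERMS.get(key, [])


def matches(document_title: str, queries: list[str]) -> bool:
    """True if any query phrase appears in the title (case-insensitive phrase match)."""
    title = document_title.lower()
    return any(q.lower() in title for q in queries)
```

Here `matches("Remote Work Policy", ["Work From Home"])` is false, while `matches("Remote Work Policy", expand_query("Work From Home"))` is true, because the expanded list includes "remote work".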

Example: University Assistant

Student Query:

Scholarship

Expanded Query:

Scholarship Eligibility
Scholarship Deadline
Scholarship Application Process
Financial Aid

The system retrieves more useful information.

Query Decomposition

Sometimes questions contain multiple sub-questions.

Example:

What scholarships are available and what hostel benefits do they include?

This question contains:

Question 1:
Available Scholarships

Question 2:
Hostel Benefits

The system may split the query into smaller parts.

This is called:

Query Decomposition

Query Decomposition Workflow

Complex Query
       ↓
Sub-Question 1

Sub-Question 2

Sub-Question 3
       ↓
Separate Retrieval
       ↓
Combined Answer

Many advanced RAG systems use this technique.
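
The workflow can be sketched as follows. The splitting rule here (break on the word "and") is a naive assumption chosen for readability; it would wrongly split phrases such as "terms and conditions", which is one reason an LLM is commonly used for this step. The names `decompose`, `retrieve`, `gather_evidence`, and the sample knowledge base are hypothetical.

```python
import re


def decompose(query: str) -> list[str]:
    """Split a question into sub-questions on the word 'and' (naive heuristic)."""
    body = query.strip().rstrip("?")
    parts = re.split(r"\s+and\s+", body, flags=re.IGNORECASE)
    return [part.strip() + "?" for part in parts if part.strip()]


def retrieve(sub_question: str, knowledge_base: dict[str, str]) -> list[str]:
    """Toy retrieval: return passages whose topic keyword occurs in the sub-question."""
    text = sub_question.lower()
    return [passage for topic, passage in knowledge_base.items() if topic in text]


def gather_evidence(query: str, knowledge_base: dict[str, str]) -> dict[str, list[str]]:
    """Run a separate retrieval per sub-question and keep the evidence grouped."""
    return {sub: retrieve(sub, knowledge_base) for sub in decompose(query)}


# Made-up knowledge base for illustration only.
KNOWLEDGE_BASE = {
    "scholarship": "Scholarship programs are listed in the financial aid guide.",
    "hostel": "Hostel benefits are described in the housing guide.",
}
```

Keeping the evidence grouped by sub-question lets the answering step address each part of the original question and notice when one part retrieved nothing.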

Example: Enterprise HR Assistant

Employee Query:

Can I work remotely while traveling internationally?

Decomposed Questions:

What is the remote work policy?

What is the international travel policy?

What compliance requirements exist?

Each question retrieves separate evidence.

The system combines results later.

Query Clarification

Sometimes a query is too ambiguous.

Example:

Leave

Possible meanings:

Annual Leave

Medical Leave

Parental Leave

The assistant may ask:

Which type of leave are you referring to?

This improves retrieval accuracy.
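
A minimal sketch of this behaviour, assuming the system keeps a table of terms known to be ambiguous. `AMBIGUOUS_TERMS` and `clarifying_question` are hypothetical names; a real assistant might instead detect ambiguity from retrieval results or with an LLM.

```python
from typing import Optional

# Hypothetical table of ambiguous single-term queries and their possible meanings.
AMBIGUOUS_TERMS = {
    "leave": ["Annual Leave", "Medical Leave", "Parental Leave"],
}


def clarifying_question(query: str) -> Optional[str]:
    """Return a question to ask the user if the query is ambiguous, else None."""
    term = query.strip().lower()
    options = AMBIGUOUS_TERMS.get(term)
    if options is None:
        return None  # Query is specific enough; proceed to retrieval.
    return f"Which type of {term} are you referring to? Options: {', '.join(options)}"
```

When the function returns `None`, the pipeline continues to retrieval; otherwise the assistant asks the returned question and waits for the user's reply.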

AI-Powered Query Transformation

Modern systems often use LLMs to transform queries.

Workflow:

User Query
      ↓
LLM
      ↓
Improved Query
      ↓
Retrieval

The model acts as a query optimizer.

This approach is increasingly common in enterprise RAG systems.
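
The workflow above can be sketched without committing to any particular model provider. In this example the model is passed in as a plain function from prompt text to completion text; `REWRITE_PROMPT`, `rewrite_with_llm`, and `fake_llm` are hypothetical names, the prompt wording is only one possible phrasing, and `fake_llm` is a stand-in that returns a canned answer so the sketch runs offline.

```python
from typing import Callable

# One possible prompt; the wording is an assumption, not a recommended standard.
REWRITE_PROMPT = (
    "Rewrite the user's question so that it is specific and uses formal "
    "terminology suitable for searching support documentation. "
    "Return only the rewritten question.\n\nUser question: {query}"
)


def rewrite_with_llm(query: str, complete: Callable[[str], str]) -> str:
    """Ask the model for a rewritten query; fall back to the original if it returns nothing."""
    rewritten = complete(REWRITE_PROMPT.format(query=query)).strip()
    return rewritten or query


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; always returns the same canned rewrite."""
    return "Troubleshooting intermittent internet connectivity issues and connection drops."
```

To use a real model, replace `fake_llm` with a function that sends the prompt to the chosen provider's SDK and returns the text of the response. The fallback to the original query guards against empty model output.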

Real-World Example: Customer Support

User Query:

My internet keeps dropping.

Transformed Query:

Troubleshooting intermittent internet connectivity issues and connection drops.

The system retrieves more relevant support documentation.

Real-World Example: University Assistant

Student Query:

Fees

Transformed Query:

What are the tuition fees, admission fees, and semester charges for MCA students?

Retrieval becomes more effective.

Real-World Example: Enterprise Knowledge Assistant

Employee Query:

Expense claim

Transformed Query:

What is the company's expense reimbursement and claim submission policy?

The system retrieves more precise information.

Query Transformation and Hybrid Search

Query transformation works especially well with hybrid retrieval.

Architecture:

Question
      ↓
Query Transformation
      ↓
Hybrid Search
      ↓
Results
      ↓
Re-Ranking
      ↓
Answer

Many enterprise systems follow this pattern.

Comparing Query Transformation Techniques

Technique              Purpose
Query Rewriting        Improve clarity
Query Expansion        Add related terms
Query Decomposition    Split complex questions
Query Clarification    Resolve ambiguity

Each technique solves a different retrieval challenge.

Multi-Query Retrieval

Advanced systems sometimes generate multiple queries.

Example:

Original Query:

Remote Work Policy

Generated Queries:

Remote Work Guidelines

Work From Home Policy

Hybrid Work Rules

Employee Location Policy

Each query performs retrieval independently.

Results are combined later.

This often improves recall.
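
This session does not specify how the results are combined. One widely used option is reciprocal rank fusion (RRF), sketched below as an assumption rather than as the required method. Each document earns 1 / (k + rank) from every result list it appears in, so documents returned by several query variants rise to the top. The function names and the constant `k=60` (a commonly used default) are choices made for this example.

```python
from collections import defaultdict
from typing import Callable


def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists of document IDs; higher fused score comes first."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


def multi_query_retrieve(
    queries: list[str],
    retrieve: Callable[[str], list[str]],
    k: int = 60,
) -> list[str]:
    """Run retrieval once per generated query, then fuse the rankings."""
    return reciprocal_rank_fusion([retrieve(query) for query in queries], k)
```

For example, if "Remote Work Guidelines" returns `["doc1", "doc2"]` and "Work From Home Policy" returns `["doc2", "doc3"]`, the fused order puts `doc2` first because it appears in both lists.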

Enterprise Retrieval Pipeline

Modern enterprise architecture:

User Query
      ↓
Query Transformation
      ↓
Multi-Query Generation
      ↓
Hybrid Search
      ↓
Re-Ranking
      ↓
Context Compression
      ↓
LLM
      ↓
Answer

This architecture is becoming increasingly common.
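
The stages of this architecture can be wired together as interchangeable functions. The sketch below is only a structural outline: `RetrievalPipeline` and its field names are hypothetical, and every stage is supplied by the caller, so nothing here implies how a specific search engine, re-ranker, or model must be called.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RetrievalPipeline:
    """Each field is one stage of the diagram, supplied as a plain function."""
    transform: Callable[[str], str]                 # Query Transformation
    generate_queries: Callable[[str], list[str]]    # Multi-Query Generation
    search: Callable[[str], list[str]]              # Hybrid Search
    rerank: Callable[[str, list[str]], list[str]]   # Re-Ranking
    compress: Callable[[str, list[str]], str]       # Context Compression
    answer: Callable[[str, str], str]               # LLM

    def run(self, user_query: str) -> str:
        query = self.transform(user_query)
        candidates: list[str] = []
        for variant in self.generate_queries(query):
            for passage in self.search(variant):
                if passage not in candidates:  # De-duplicate across query variants.
                    candidates.append(passage)
        ranked = self.rerank(query, candidates)
        context = self.compress(query, ranked)
        return self.answer(query, context)
```

Because each stage is just a function, a stage can be replaced (for example, swapping a rule-based `transform` for an LLM-based one) without touching the rest of the pipeline.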

Benefits of Query Transformation

Better Retrieval Accuracy

Queries become more descriptive.

Improved Recall

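More relevant documents are found.

Better User Experience

Users can ask questions naturally, without knowing the terminology used in the documents.

Reduced Retrieval Failures

Ambiguous queries are clarified or rewritten before the search runs.

Enterprise Readiness

The approach supports large knowledge bases with varied terminology.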

These benefits make query transformation highly valuable.

Challenges in Query Transformation

Over-Expansion

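Adding too many terms may introduce noise and retrieve irrelevant documents.

Incorrect Assumptions

The system may misunderstand the user's intent and rewrite the query in the wrong direction.

Increased Processing Time

The additional transformation stage adds latency, especially when it calls an LLM.

Complexity

The pipeline requires a more sophisticated architecture.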

These trade-offs must be managed carefully.
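
One simple way to manage the over-expansion trade-off is to cap how many expansion terms are kept. The sketch below assumes the candidate terms arrive already ordered from most to least relevant; `cap_expansion` and the default limit of three are hypothetical choices for this example.

```python
def cap_expansion(original: str, candidates: list[str], max_terms: int = 3) -> list[str]:
    """Keep the original query plus at most max_terms unique expansion terms, in order."""
    seen = {original.strip().lower()}
    kept: list[str] = []
    for term in candidates:
        key = term.strip().lower()
        if key in seen:
            continue  # Skip duplicates and the original query itself.
        if len(kept) >= max_terms:
            break
        seen.add(key)
        kept.append(term)
    return [original] + kept
```

A suitable limit depends on the corpus and should be tuned against retrieval results rather than fixed in advance.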

Query Transformation in Popular Frameworks

Many frameworks support query transformation.

Examples:

  • LangChain

  • LlamaIndex

  • Haystack

  • Semantic Kernel

These frameworks provide components for building advanced retrieval pipelines, into which a query transformation step can be added.

Future of Query Transformation

Industry trends include:

Personalized Queries

Transformation based on user roles.

Conversational Query Optimization

Using chat history.

Agentic Retrieval

AI agents selecting transformation strategies.

Self-Improving Retrieval

Systems learning from user behavior.

These developments are expected to further improve RAG performance.

Enterprise Use Cases

Knowledge Assistants

Policy retrieval.

Customer Support

Troubleshooting queries.

Research Systems

Literature discovery.

Legal Assistants

Regulation retrieval.

Educational Assistants

Student information retrieval.

All of these systems benefit from query transformation.

.NET Perspective

Popular technologies include:

  • Semantic Kernel

  • Azure AI Search

  • Azure OpenAI

  • ASP.NET Core

These tools support query rewriting and advanced retrieval workflows.

Python Perspective

Common frameworks include:

  • LangChain

  • LlamaIndex

  • Haystack

  • OpenAI SDK

The Python ecosystem provides extensive support for query transformation techniques.

Assignment

Design Exercise

Design a query transformation pipeline for:

University Knowledge Assistant

Include:

  • Query Rewriting

  • Query Expansion

  • Query Decomposition

  • Retrieval

Explain how each stage improves answer quality.

Research Activity

Compare:

  • Query Rewriting

  • Query Expansion

  • Query Decomposition

Analyze:

  • Accuracy

  • Complexity

  • Retrieval Impact

  • Enterprise Suitability

Key Takeaways

  • Query Transformation improves user queries before retrieval begins.

  • Query rewriting makes questions clearer and more descriptive.

  • Query expansion adds related terms to improve retrieval.

  • Query decomposition breaks complex questions into smaller parts.

  • Multi-query retrieval can improve recall and coverage.

  • Modern enterprise RAG systems often include a query transformation stage.

  • Better queries lead to better retrieval and better answers.

What's Next?

In Session 37, we will explore:

Multi-Step Retrieval

You will learn how advanced AI systems perform retrieval in multiple stages, gather information progressively, reason over retrieved evidence, and support complex questions that cannot be answered through a single retrieval operation.