
Generative AI & RAG Development

Vector Similarity Search

Learning Objectives

By the end of this session, you will be able to:

Understand what Vector Similarity Search is
Learn how vector retrieval works
Understand how embeddings are compared
Explore similarity scoring techniques
Learn about cosine similarity and distance metrics
Understand how RAG systems retrieve relevant information
Build a strong foundation for working with vector databases

Introduction

In the previous session, we learned how embedding models convert text into vectors.

Example:

Leave Policy

becomes:

[0.12, 0.85, -0.42, ...]

and

Vacation Policy

becomes:

[0.15, 0.82, -0.39, ...]

Because the meanings are similar, the vectors are also similar.

This naturally leads to an important question:

How does a computer determine that two vectors are similar?

The answer is:

Vector Similarity Search

This technology is one of the most important foundations of:

RAG systems
Semantic search
AI assistants
Recommendation engines
Enterprise knowledge platforms

Without similarity search, embeddings would simply be numbers stored in a database.

Similarity search transforms those numbers into meaningful retrieval results.

Why This Topic Matters

Imagine a company knowledge base containing:

100,000 Documents

An employee asks:

How many annual leave days do employees receive?

The system must quickly identify:

Leave Policy Document

from among all 100,000 documents.

Traditional keyword search may struggle when the question and the document use different wording.

Vector similarity search finds information based on meaning.

This capability makes modern AI search systems possible.

What Is Vector Similarity Search?

Vector Similarity Search is the process of finding vectors that are closest in meaning to a query vector.

Simple idea:

Question
    ↓
Embedding
    ↓
Find Similar Vectors
    ↓
Retrieve Documents

The retrieval system searches for vectors that are mathematically closest to the query.

The closer the vectors, the more likely the meanings are related.

Understanding the Concept with an Analogy

Imagine a map.

Cities that are physically close are considered neighbors.

Example:

Delhi
Noida
Gurugram

are geographically close.

Similarly:

Vacation Policy
Annual Leave Policy
Paid Time Off Policy

are semantically close.

Vector similarity search measures this semantic closeness.

How Retrieval Works

Suppose we have three document embeddings.

Document A:

Annual Leave Policy

Document B:

Remote Work Guidelines

Document C:

Football Tournament Schedule

User asks:

How much vacation time do employees receive?

The query is converted into an embedding.

The retrieval system compares the query vector against all document vectors.

Results:

Document A → Very Similar
Document B → Somewhat Similar
Document C → Not Similar

Document A is retrieved.
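
The comparison step above can be sketched as a simple linear scan. This is only an illustration under stated assumptions: the three-dimensional vectors are invented for the example (real embeddings have hundreds or thousands of dimensions), and `cosine_similarity` and `rank_documents` are hypothetical helper names, not part of any library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vector, documents):
    """Score every document against the query and sort best-first."""
    scored = [
        (name, cosine_similarity(query_vector, vector))
        for name, vector in documents.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Invented 3-dimensional embeddings, for illustration only.
documents = {
    "Annual Leave Policy": [0.9, 0.1, 0.0],
    "Remote Work Guidelines": [0.5, 0.8, 0.1],
    "Football Tournament Schedule": [0.0, 0.1, 0.9],
}

# Stands in for the embedding of
# "How much vacation time do employees receive?"
query_vector = [0.8, 0.2, 0.0]

for name, score in rank_documents(query_vector, documents):
    print(f"{name}: {score:.2f}")
```

With these made-up vectors, Annual Leave Policy ranks first, Remote Work Guidelines second, and Football Tournament Schedule last.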

High-Level Retrieval Workflow

User Question
        ↓
Generate Query Embedding
        ↓
Compare with Stored Vectors
        ↓
Calculate Similarity Scores
        ↓
Retrieve Top Matches
        ↓
Send Context to LLM

The similarity search step itself typically takes milliseconds; generating the query embedding and the LLM response usually take longer.

Why Similarity Search Is Better Than Keyword Search

Consider:

Document:

Employees receive annual leave benefits.

User search:

Vacation policy

Keyword search:

No exact match

Vector search:

Vacation
≈
Annual Leave

Result:

Correct document retrieved

This ability to understand meaning is the biggest advantage of vector search.
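
The keyword side of this comparison can be checked with a few lines of code. This is a deliberately naive matcher, assumed here for illustration; real keyword engines add stemming, synonym lists, and ranking functions. The helper name `keywords` is hypothetical.

```python
import re


def keywords(text):
    """Return the set of lowercase words in the text, ignoring punctuation."""
    return set(re.findall(r"[a-z]+", text.lower()))


document = "Employees receive annual leave benefits."
query = "Vacation policy"

# Words that appear in both the document and the query.
shared = keywords(document) & keywords(query)

if shared:
    print("Keyword match on:", sorted(shared))
else:
    print("No exact match")
```

The document and the query share no words, so a matcher like this finds nothing, even though both are about the same topic.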

Similarity Scores

Similarity search produces a score.

Example:

Document	Similarity Score
Annual Leave Policy	0.97
Employee Benefits	0.82
Remote Work Policy	0.55
Football Schedule	0.03

Higher score:

More Similar

Lower score:

Less Similar

The system typically retrieves the highest-ranked results.

What Is Cosine Similarity?

The most common similarity technique is:

Cosine Similarity

Instead of comparing exact values, cosine similarity compares vector direction.

Think of two arrows.

Similar Direction

↗ ↗

Similarity:

High

Different Direction

↑ →

Similarity:

Low

Opposite Direction

← →

Similarity:

Very Low (negative)

Cosine similarity measures how closely vectors point in the same direction.

Cosine Similarity Formula

The mathematical formula is:

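\text{Cosine Similarity}(A,B)=\frac{A\cdot B}{\|A\|\,\|B\|}

Here A · B is the dot product of the two vectors, and ‖A‖ and ‖B‖ are their lengths (magnitudes).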

Do not worry about memorizing the formula.

For AI engineers, the key idea is:

Higher Cosine Similarity
       ↓
More Similar Meaning
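
For readers who prefer code to notation, the formula can be written directly in Python. This is a minimal sketch using only the standard library; the function name and the small test vectors are chosen for illustration and do not come from any embedding model.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths.

    Assumes both vectors have the same number of components and that
    neither is all zeros (a zero vector would cause a division by zero).
    """
    dot = sum(x * y for x, y in zip(a, b))      # A · B
    norm_a = math.sqrt(sum(x * x for x in a))   # ||A||
    norm_b = math.sqrt(sum(y * y for y in b))   # ||B||
    return dot / (norm_a * norm_b)


# Same direction, different length: the score is (approximately) 1.
same_direction = cosine_similarity([1, 2, 3], [2, 4, 6])

# Perpendicular directions: the score is 0.
perpendicular = cosine_similarity([1, 0], [0, 1])

# Opposite directions: the score is (approximately) -1.
opposite = cosine_similarity([1, 2, 3], [-1, -2, -3])

print(round(same_direction, 6), perpendicular, round(opposite, 6))
```

The first case shows why cosine similarity is said to compare direction: doubling the length of a vector does not change the score.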

Understanding Similarity Values

Typical cosine similarity values:

Score	Interpretation
0.95 – 1.00	Extremely Similar
0.80 – 0.95	Highly Related
0.60 – 0.80	Moderately Related
0.30 – 0.60	Weakly Related
Below 0.30	Mostly Unrelated

These ranges are rough guidelines and vary depending on the embedding model. Cosine similarity can in principle range from -1 to 1.
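
The table can be turned into a small lookup helper. This is a sketch only: the boundaries are copied from the table, treating each lower bound as inclusive is an assumption, and the function name `interpret_score` is hypothetical. The thresholds would need tuning for any real embedding model.

```python
def interpret_score(score):
    """Map a cosine similarity score to the label used in the table above."""
    # Lower bounds are treated as inclusive (an assumption; the table
    # does not say which range a boundary value belongs to).
    if score >= 0.95:
        return "Extremely Similar"
    if score >= 0.80:
        return "Highly Related"
    if score >= 0.60:
        return "Moderately Related"
    if score >= 0.30:
        return "Weakly Related"
    return "Mostly Unrelated"


for score in (0.96, 0.72, 0.08):
    print(score, "->", interpret_score(score))
```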

Example

Query:

Remote Work Policy

Document:

Work From Home Guidelines

Score:

0.96

Document:

Annual Leave Policy

Score:

0.72

Document:

Cricket Match Schedule

Score:

0.08

The first document would be ranked highest.

Euclidean Distance

Another common similarity technique is:

Euclidean Distance

This measures the straight-line distance between two vectors.

Example:

Point A
Point B

Small Distance
      ↓
More Similar

Large Distance
      ↓
Less Similar

Although Euclidean distance is useful, cosine similarity is often preferred for text embeddings.
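
A short sketch can make the relationship between the two metrics concrete. It assumes nothing beyond the standard library, and the helper names and sample vectors are illustrative. For vectors scaled to length 1, the squared Euclidean distance equals 2 minus 2 times the cosine similarity, so both metrics rank results in the same order.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalize(v):
    """Scale a non-zero vector to length 1."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Classic 3-4-5 triangle: the distance from the origin to (3, 4) is 5.
print(euclidean_distance([0, 0], [3, 4]))

# Two unit-length vectors, for checking the identity d^2 = 2 - 2 * cos.
a = normalize([3, 4])
b = normalize([4, 3])
distance = euclidean_distance(a, b)
cosine = cosine_similarity(a, b)
print(round(distance ** 2, 6), round(2 - 2 * cosine, 6))
```

On vectors that are not normalized, the two metrics can disagree, because Euclidean distance also reacts to differences in vector length.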

Dot Product Similarity

Some vector databases also use:

Dot Product

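This measures how strongly vectors align, taking both direction and length into account. For vectors normalized to length 1, it gives the same result as cosine similarity.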

Advantages:

Fast computation
Efficient at scale

Many vector databases support multiple similarity methods.
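
The effect of vector length on the dot product can be seen in a small experiment. This is an illustrative sketch with made-up two-dimensional vectors and hypothetical helper names.

```python
import math


def dot(a, b):
    """Sum of the component-wise products of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a non-zero vector to length 1."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]


query = [1.0, 0.0]
short_vector = [1.0, 1.0]
long_vector = [10.0, 10.0]  # same direction as short_vector, ten times longer

# Raw dot product: the longer vector gets a much higher score.
print(dot(query, short_vector), dot(query, long_vector))

# After normalization both scores are equal (and equal to the cosine).
print(
    round(dot(normalize(query), normalize(short_vector)), 6),
    round(dot(normalize(query), normalize(long_vector)), 6),
)
```

This is why embeddings are often normalized before being stored when dot product is the chosen metric: the fast computation is kept, and the ranking matches cosine similarity.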

Similarity Search in RAG

Let's revisit the RAG workflow.

Documents
      ↓
Embeddings
      ↓
Vector Database

Question
      ↓
Embedding
      ↓
Similarity Search
      ↓
Relevant Chunks
      ↓
LLM
      ↓
Answer

The similarity search step determines which information reaches the LLM.

Poor retrieval often means poor answers.
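
The hand-off from retrieval to the LLM can be sketched as plain string assembly. The prompt wording, the function name `build_prompt`, and the second chunk text are assumptions made for this example; frameworks such as LangChain or Semantic Kernel provide their own prompt templates.

```python
def build_prompt(question, retrieved_chunks):
    """Combine the retrieved chunks and the question into one prompt string."""
    context = "\n\n".join(
        f"[{number}] {chunk}"
        for number, chunk in enumerate(retrieved_chunks, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
    )


# Only the chunks chosen by similarity search are passed in.
# The second chunk text is invented for illustration.
retrieved = [
    "Employees receive annual leave benefits.",
    "Leave requests must be approved by a manager.",
]

prompt = build_prompt(
    "How many annual leave days do employees receive?", retrieved
)
print(prompt)
```

Anything that similarity search does not return never appears in the prompt, so the LLM cannot use it.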

Top-K Retrieval

Most systems retrieve multiple results.

Example:

Top 3 Results

Top 5 Results

Top 10 Results

This is called:

Top-K Retrieval

Example:

K = 5

The five most relevant chunks are returned.
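
Top-K selection can be sketched with Python's standard `heapq` module. The chunk names and scores below are invented for the example, and `top_k` is a hypothetical helper name.

```python
import heapq


def top_k(scored_chunks, k):
    """Return the k (chunk, score) pairs with the highest scores, best first."""
    return heapq.nlargest(k, scored_chunks, key=lambda pair: pair[1])


# Invented similarity scores for seven chunks.
scored_chunks = [
    ("chunk-1", 0.91),
    ("chunk-2", 0.15),
    ("chunk-3", 0.88),
    ("chunk-4", 0.42),
    ("chunk-5", 0.07),
    ("chunk-6", 0.63),
    ("chunk-7", 0.79),
]

for chunk, score in top_k(scored_chunks, k=5):
    print(chunk, score)
```

`heapq.nlargest` avoids sorting the full list, which helps when K is small compared with the number of scored chunks.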

Why Multiple Results Matter

Suppose information is spread across several chunks.

Question:

What are the scholarship requirements?

Relevant information may exist in:

Chunk 1
Chunk 3
Chunk 7

Retrieving multiple chunks provides better context.

Similarity Search Example

Knowledge Base:

Admission Policy
Scholarship Policy
Hostel Policy
Examination Rules

Student asks:

What financial support is available?

Query embedding generated.

Similarity scores:

Scholarship Policy → 0.94
Admission Policy → 0.65
Hostel Policy → 0.12

Scholarship Policy is selected.

Real-World Example: HR Assistant

Employee asks:

Can I work remotely?

Similarity search finds:

Remote Work Policy
Hybrid Work Guidelines
Remote Access Rules

The LLM receives all relevant context.

The answer becomes more accurate.

Real-World Example: Customer Support

Customer asks:

How do I reset my account password?

Similarity search retrieves:

Password Reset Guide
Account Recovery Process
Security Verification Instructions

The customer receives a helpful response.

Real-World Example: University Assistant

Student asks:

When is the scholarship application deadline?

Similarity search retrieves:

Scholarship Application Policy
Financial Aid Guidelines
Student Funding Rules

The assistant generates an answer based on official information.

Challenges in Similarity Search

Poor Embeddings

Weak embeddings produce poor matches.

Poor Chunking

Bad chunking reduces retrieval quality.

Too Much Context

Retrieving too many chunks introduces noise.

Too Little Context

Important information may be missed.

Domain-Specific Terminology

Specialized industries may require specialized embedding models.
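
One common way to balance too much context against too little is to combine a cap on the number of chunks with a minimum score. The sketch below is an assumption-laden illustration: the function name `select_context` and the default values `k=5` and `min_score=0.5` are hypothetical and would need tuning for a real embedding model. The scores reuse the scholarship example from earlier in this session.

```python
def select_context(scored_chunks, k=5, min_score=0.5):
    """Keep at most k chunks, and only those scoring at least min_score."""
    ranked = sorted(scored_chunks, key=lambda pair: pair[1], reverse=True)
    return [(chunk, score) for chunk, score in ranked[:k] if score >= min_score]


scores = [
    ("Hostel Policy", 0.12),
    ("Scholarship Policy", 0.94),
    ("Admission Policy", 0.65),
]

# The low-scoring Hostel Policy is dropped as noise.
print(select_context(scores))

# A smaller k keeps only the single best match.
print(select_context(scores, k=1))
```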

Similarity Search at Scale

Consider:

10 Million Chunks

Comparing every vector individually would be slow.

Modern vector databases use:

Approximate Nearest Neighbor (ANN)

algorithms.

These algorithms:

Reduce search time
Improve scalability
Maintain high accuracy

ANN is one reason modern vector search remains fast even with millions of documents.
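
To give a feel for how approximate search avoids comparing every vector, here is a toy sketch of one ANN idea, random-hyperplane hashing (a form of locality-sensitive hashing). It is not how any particular vector database works; production systems commonly use other index types, such as graph-based or clustering-based indexes. All function names, the seed, and the sample vectors are assumptions made for the example.

```python
import random
from collections import defaultdict


def make_hyperplanes(num_planes, dim, seed=0):
    """Create random hyperplanes, each described by a normal vector."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_planes)]


def signature(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector lies on."""
    return tuple(
        sum(v * h for v, h in zip(vector, plane)) >= 0
        for plane in hyperplanes
    )


def build_index(vectors, hyperplanes):
    """Group item ids into buckets keyed by their signature."""
    buckets = defaultdict(list)
    for item_id, vector in vectors.items():
        buckets[signature(vector, hyperplanes)].append(item_id)
    return buckets


def candidates(query, buckets, hyperplanes):
    """Return only the items that share the query's bucket."""
    return buckets.get(signature(query, hyperplanes), [])


hyperplanes = make_hyperplanes(num_planes=8, dim=3)
vectors = {
    "a": [1.0, 2.0, 3.0],
    "b": [-1.0, -2.0, -3.0],  # points in the opposite direction to "a"
}
index = build_index(vectors, hyperplanes)

# The query points in the same direction as "a", so it lands in a's bucket.
# Only that bucket would then be scored exactly, not the whole collection.
print(candidates([2.0, 4.0, 6.0], index, hyperplanes))
```

The trade-off is visible in the design: vectors pointing in similar directions usually share a bucket, but a close neighbor can occasionally fall on the other side of a hyperplane and be missed. That is why the search is called approximate.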

Exact Search vs Approximate Search

Feature	Exact Search	Approximate Search
Accuracy	Highest	Very High
Speed	Slower	Faster
Scalability	Limited	Excellent
Production Usage	Rare	Common

Most large-scale enterprise systems use approximate search, while exact search remains practical for small collections.

Similarity Search Architecture

User Question
        ↓
Embedding Model
        ↓
Query Vector
        ↓
Vector Database
        ↓
Similarity Search
        ↓
Top Matching Chunks
        ↓
LLM
        ↓
Answer

This architecture powers most modern RAG applications.

Why Vector Search Changed AI

Before vector search:

Search
 ↓
Keywords
 ↓
Limited Understanding

After vector search:

Search
 ↓
Meaning
 ↓
Semantic Understanding

This shift enabled:

Modern RAG systems
Enterprise AI assistants
Intelligent search platforms
AI-powered recommendation systems

.NET Perspective

Common technologies include:

Azure AI Search
Semantic Kernel
Azure OpenAI Embeddings

Many enterprise .NET applications combine these services to implement semantic search and RAG solutions.

Python Perspective

Popular Python tools include:

LangChain
LlamaIndex
FAISS
ChromaDB
Pinecone
Weaviate

These tools simplify vector search implementation.

Assignment

Practical Exercise

Create 10 example questions and 10 example documents.

Manually identify:

Exact keyword matches
Semantic matches

Compare how keyword search and vector search would behave.

Research Activity

Investigate:

Cosine Similarity
Euclidean Distance
Dot Product Similarity

For each method, explain:

How it works
Advantages
Common use cases

Key Takeaways

Vector similarity search finds information based on meaning rather than exact keywords.
Query embeddings are compared against stored document embeddings.
Cosine similarity is one of the most commonly used similarity metrics for text embeddings.
Similarity scores help rank search results.
Top-K retrieval returns the most relevant chunks.
Vector search is a critical component of modern RAG systems.
Semantic retrieval enables more accurate and intelligent AI applications.

What's Next?

In Session 22, we will explore:

Introduction to Vector Databases

You will learn why traditional databases are not sufficient for semantic retrieval, how vector databases work internally, popular vector database platforms, indexing techniques, and how they power large-scale RAG applications.
