AI chatbots are everywhere—but most enterprise teams hesitate to use generic LLM-powered bots because of accuracy, hallucination, and data-sensitivity issues.
So instead of relying entirely on a model like Gemini or GPT, I built a custom AI chatbot architecture using a .NET Core Web API, ML.NET for embedding generation, a SQL-backed knowledge base of question–answer intents, and Gemini strictly as the final answer-generation layer.
This architecture ensures the bot answers strictly from internal knowledge, making it ideal for enterprise, education, ERP systems, CRM, HR, and support bots.
In this article, I’ll walk you through the entire system—from intent storage to embedding generation, similarity scoring, prompt engineering, and final answer generation.
💡 Why Not Use LLM Directly?
Businesses want:
❌ No hallucinations
❌ No internet-based answers
❌ No inconsistent responses
❌ No exposure of internal data
LLMs are powerful but unpredictable. For enterprise applications, we need:
✔ Controlled outputs
✔ Zero hallucinations
✔ Responses only from internal data
✔ Easy updates to knowledge base
✔ Security and traceability
This custom AI chatbot provides exactly that.
🧱 System Architecture Overview
Here is the simplified end-to-end workflow:
+-------------------------+
| User Asks a Question |
+-----------+-------------+
|
v
+------------------------------+
| Convert Question to Embedding|
| (ML.NET FeaturizeText) |
+-----------+------------------+
|
v
+-------------------------------------+
| Compare with Stored Intent Vectors |
| (Cosine Similarity Matching) |
+---------------+---------------------+
|
v
+--------------------------------------+
| Pick Top 3–4 Relevant Answers |
| Build Context Block |
+------------------+-------------------+
|
v
+--------------------------------------------------+
| Send Context + Question to Gemini (Strict Prompt)|
| LLM Generates Final Natural-Language Answer |
+---------------------+----------------------------+
|
v
+------------------------------+
| Return Answer to Client |
+------------------------------+
🗂️ Step 1. Storing Intents in the Database
Each knowledge item (intent) contains an Id, the question text it should match, the answer text to return, and a precomputed embedding stored as JSON.
This creates a structured knowledge base that does not depend on the LLM.
Example DB structure
CREATE TABLE ChatIntents (
    Id            NVARCHAR(200) PRIMARY KEY,
    QuestionText  NVARCHAR(MAX),
    AnswerText    NVARCHAR(MAX),
    EmbeddingJson NVARCHAR(MAX)
);
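On the application side, the table can map to a simple entity class. This is a sketch: the property names mirror the columns, and the `Embedding` accessor is a hypothetical convenience that deserializes `EmbeddingJson` on demand.

```csharp
using System;
using System.Text.Json;

// Maps one row of the ChatIntents table. EmbeddingJson stores the
// ML.NET feature vector serialized as a JSON float array.
public class ChatIntent
{
    public string Id { get; set; } = "";
    public string QuestionText { get; set; } = "";
    public string AnswerText { get; set; } = "";
    public string EmbeddingJson { get; set; } = "[]";

    // Convenience accessor: deserialize the stored vector when matching.
    public float[] Embedding =>
        JsonSerializer.Deserialize<float[]>(EmbeddingJson) ?? Array.Empty<float>();
}
```

Deserializing lazily like this keeps the repository layer simple; if profiling shows it matters, the parsed vectors can be cached alongside the intents.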
🧠 Step 2. Generating Embeddings with ML.NET
Before the chatbot can respond, we generate embeddings for each stored question.
var pipeline = _mlContext.Transforms.Text.FeaturizeText(
    outputColumnName: "Features",
    inputColumnName: nameof(TextData.Text));

_embeddingTransformer = pipeline.Fit(dataView);
Then for each intent:
var transformed = _embeddingTransformer.Transform(dataView);
var vectors = transformed.GetColumn<float[]>("Features");
These vectors are stored as JSON.
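The JSON round trip itself is a one-liner in each direction. A minimal sketch using the built-in System.Text.Json (the article's other snippets use Newtonsoft's JsonConvert, which works the same way here):

```csharp
using System;
using System.Text.Json;

public static class EmbeddingStore
{
    // Serialize a feature vector for the EmbeddingJson column.
    public static string ToJson(float[] vector) =>
        JsonSerializer.Serialize(vector);

    // Restore a vector when loading intents for similarity matching.
    public static float[] FromJson(string embeddingJson) =>
        JsonSerializer.Deserialize<float[]>(embeddingJson) ?? Array.Empty<float>();
}
```

Storing vectors as JSON text keeps the schema portable across databases; a binary or vector-native column type would be a later optimization.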
🔍 Step 3. Matching User Question with Intents
When a user asks a question:
1️⃣ Convert question → embedding
2️⃣ Compare against stored embeddings
3️⃣ Find most similar using cosine similarity
var topIntents = intents
    .Select(i => new {
        Intent = i,
        Score = CosineSimilarity(userEmbedding, i.Embedding)
    })
    .OrderByDescending(x => x.Score)
    .Take(4);
Cosine similarity ensures we pick the closest semantic matches.
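The `CosineSimilarity` helper used above is not shown in the snippet; a minimal implementation could look like this:

```csharp
using System;

public static class VectorMath
{
    // Cosine similarity: dot(a, b) / (|a| * |b|), ranging over [-1, 1].
    // Returns 0 when either vector has zero magnitude.
    public static double CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");

        double dot = 0, magA = 0, magB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot  += a[i] * b[i];
            magA += a[i] * a[i];
            magB += b[i] * b[i];
        }
        if (magA == 0 || magB == 0) return 0;
        return dot / (Math.Sqrt(magA) * Math.Sqrt(magB));
    }
}
```

Because cosine similarity normalizes by magnitude, longer stored questions don't outrank shorter ones just for having larger feature values.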
🔬 Step 4. Passing Context to Gemini AI
Instead of letting the model think freely, we use Strict Context Prompting:
🔐 Hard rule
Gemini can only answer using the context we give.
Example prompt
You are a helpful assistant.
Use ONLY the context below.
Do NOT add any information not present in the context.
Context:
{Top 3–4 answers}
Question: {UserQuestion}
Answer:
This prevents hallucination and ensures controlled responses.
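Assembling that prompt in code is straightforward string building. A sketch (the class and method names are my own, not from the original code):

```csharp
using System.Collections.Generic;
using System.Text;

public static class PromptBuilder
{
    // Assemble the strict-context prompt from the top-matching answers.
    public static string Build(IEnumerable<string> contextAnswers, string userQuestion)
    {
        var sb = new StringBuilder();
        sb.AppendLine("You are a helpful assistant.");
        sb.AppendLine("Use ONLY the context below.");
        sb.AppendLine("Do NOT add any information not present in the context.");
        sb.AppendLine();
        sb.AppendLine("Context:");
        foreach (var answer in contextAnswers)
            sb.AppendLine("- " + answer);
        sb.AppendLine();
        sb.AppendLine($"Question: {userQuestion}");
        sb.Append("Answer:");
        return sb.ToString();
    }
}
```

Keeping the rules at the top and the context in a clearly delimited block makes it easy for the model to separate instructions from reference material.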
🤖 Step 5. Generating Final Answer with Gemini
var requestBody = new {
    contents = new[] {
        new {
            parts = new[] { new { text = prompt } }
        }
    }
};

var response = await client.PostAsync(apiUrl,
    new StringContent(JsonConvert.SerializeObject(requestBody),
        Encoding.UTF8, "application/json"));
Gemini then returns a natural, clean, human-like answer—but only from context.
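The answer text then has to be pulled out of the response body. A sketch of that step, assuming the usual `generateContent` response shape of `candidates[0].content.parts[0].text` (verify against the current Gemini API docs before relying on it):

```csharp
using System.Text.Json;

public static class GeminiResponseParser
{
    // Extract the generated text from a generateContent response body.
    public static string ExtractText(string responseJson)
    {
        using var doc = JsonDocument.Parse(responseJson);
        return doc.RootElement
                  .GetProperty("candidates")[0]
                  .GetProperty("content")
                  .GetProperty("parts")[0]
                  .GetProperty("text")
                  .GetString() ?? "";
    }
}
```

In production this should also handle empty candidate lists and safety-blocked responses rather than assuming a happy path.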
⚡ Performance Enhancements
To make this production-ready:
✔ In-memory cache for intents
No need to hit DB every time.
✔ Cached ML.NET transformer
No repeated pipeline training.
✔ Async repository calls
Fast DB operations.
✔ On-demand embedding regeneration
You can trigger embedding refresh using this endpoint:
[HttpPost("update-embeddings")]
public async Task<IActionResult> GenerateEmbeddingsAsync()
{
    await _chatBotService.GenerateEmbeddingsAsync();
    return Ok(new { Answer = "ok" });
}
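The in-memory intent cache mentioned above can be a small thread-safe lazy wrapper. A sketch (the class name and shape are my own, not from the original code):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Thread-safe lazy cache: the first reader triggers the load, concurrent
// readers await the same task; Invalidate() forces a reload on next read.
public class AsyncLazyCache<T>
{
    private readonly Func<Task<T>> _load;
    private Lazy<Task<T>> _cache;

    public AsyncLazyCache(Func<Task<T>> load)
    {
        _load = load;
        _cache = NewLazy();
    }

    public Task<T> GetAsync() => _cache.Value;

    // Call after regenerating embeddings so stale intents are dropped.
    public void Invalidate() => Interlocked.Exchange(ref _cache, NewLazy());

    private Lazy<Task<T>> NewLazy() =>
        new(_load, LazyThreadSafetyMode.ExecutionAndPublication);
}
```

Wired in, the update-embeddings endpoint would call `Invalidate()` after regeneration so the next question reloads fresh vectors from the database.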
🛠️ Example .NET Core API (Minimal)
Ask a question
[HttpPost("ask")]
public async Task<IActionResult> Ask(string userQuestion)
{
    if (string.IsNullOrWhiteSpace(userQuestion))
        return BadRequest("Question cannot be empty.");

    var answer = await _chatBotService.GetAnswerAsync(userQuestion);
    return Ok(new { Answer = answer });
}
Update embeddings
[HttpPost("update-embeddings")]
public async Task<IActionResult> UpdateEmbeddings()
{
    await _chatBotService.GenerateEmbeddingsAsync();
    return Ok("Embeddings updated");
}
🎯 Final Outcome — What This Chatbot Can Do
✔ Zero hallucinations
✔ Answers only from your internal knowledge
✔ Fast and scalable
✔ Highly accurate intent matching
✔ Easy to update knowledge base
✔ Ideal for enterprise systems (ERP, CRM, HR, Education, Support)
✔ Secure & predictable
This solution gives you full control of your AI assistant—LLM power with enterprise trust.