Introduction
Search in enterprise applications is rarely just text matching; it is about understanding the business meaning behind the data. A search query like:
"Aircraft brake problem invoice last month"
should return:
Work orders related to aircraft components containing the keyword brake
Associated maintenance logs
Supplier invoices
Warranty and compliance documents
rather than every record in which those words happen to appear.
A Domain-Aware Search Indexer goes beyond keyword indexing by applying classification, semantic enrichment, and entity relationships, turning raw records into business-aware search assets.
This article provides an architectural blueprint, data models, tagging strategies, enrichment logic, storage options (Elasticsearch, Postgres JSONB, Azure Search, or OpenSearch), and implementation patterns using an Angular UI and a .NET backend.
Objectives
Primary goals
Auto-classify records based on schema, data patterns, and metadata.
Extract relationships (entities → documents → transactions → logs).
Maintain unified search index structures with contextual scoring.
Support multi-tenant, multilingual, and version-aware indexing.
Non-goals
Architecture Overview
┌────────────────────┐
│  Angular Admin UI  │
└─────────┬──────────┘
          │ config, review, replay
          ▼
┌──────────────────────┐
│ Ingestion API (.NET) │
└─────────┬────────────┘
          │ raw items
          ▼
┌───────────────────────┐
│   Classifier Engine   │
│ (rules + ML + regex)  │
└─────────┬─────────────┘
          │ enriched items
          ▼
┌───────────────────────────────┐
│  Relationship Graph Builder   │
│ (FK→entity links, similarity) │
└─────────┬─────────────────────┘
          │ final structured docs
          ▼
┌─────────────────────────────────┐
│      Search Index Storage       │
│ (Elasticsearch / Azure Search)  │
└─────────────────────────────────┘
Key Features
1. Auto-Tagging Using Rules and ML
Tagging sources
| Type | Method | Examples |
|---|---|---|
| Rule-based | Regex, keyword dictionaries | "FAA", "ISO-9001", "Airworthiness" |
| ML models | BERT classifier, NER, fastText | Detect "invoice", "part", "customer" |
| Structural inference | Column names, table meaning | "StockLine → Inventory → Category=Parts" |
Tagged metadata examples:
{
  "entityType": "Invoice",
  "tags": ["Finance", "Parts", "Supplier", "Compliance"],
  "confidence": 0.92
}
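The rule-based row of the table can be illustrated with a small dictionary of regular expressions. This is a minimal sketch in TypeScript, not the article's actual tagger: the names `TagRule`, `RULES`, and `generateTags`, the per-rule confidence values, and the choice to report the lowest matching confidence are all assumptions made for illustration.

```typescript
// Hypothetical rule shape; none of these names come from a real library.
interface TagRule {
  tag: string;
  pattern: RegExp;
  confidence: number; // assumed: a fixed confidence assigned to each rule
}

interface TagResult {
  tags: string[];
  confidence: number;
}

// Example dictionary; the keywords are illustrative only.
const RULES: TagRule[] = [
  { tag: "Compliance", pattern: /\b(FAA|ISO-9001|airworthiness)\b/i, confidence: 0.9 },
  { tag: "Finance", pattern: /\b(invoice|payment|credit note)\b/i, confidence: 0.8 },
  { tag: "Parts", pattern: /\b(part|component|brake|actuator)\b/i, confidence: 0.7 },
];

function generateTags(text: string, rules: TagRule[] = RULES): TagResult {
  // Patterns carry no "g" flag, so test() keeps no state between calls.
  const matched = rules.filter((rule) => rule.pattern.test(text));
  if (matched.length === 0) {
    return { tags: [], confidence: 0 };
  }
  // Report the weakest matching rule so the overall confidence stays conservative.
  const confidence = Math.min(...matched.map((rule) => rule.confidence));
  return { tags: matched.map((rule) => rule.tag), confidence };
}
```

In practice the rule output would likely be merged with ML predictions before the final tags and confidence are written to the index.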
2. Domain Relationship Detection
Use business logic to infer relationships:
| Entity | Relationship Logic |
|---|---|
| WorkOrder → Aircraft | Foreign key or part usage history |
| Invoice → PurchaseOrder | matching vendor + documentNo + date proximity |
| Warranty → Component → Part → Vendor | relational transitive chain |
Relationships are stored as a lightweight graph:
{
  "id": "INV-10045",
  "links": [
    { "type": "references", "target": "PO-5567" },
    { "type": "relatedTo", "target": "WO-782" }
  ]
}
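The Invoice → PurchaseOrder rule from the table (vendor, document number, and date proximity) might be combined into a single link score along these lines. The field names, the 0.4 / 0.4 / 0.2 weights, the 90-day window, and the 0.6 threshold are assumptions chosen for illustration, not values from the article.

```typescript
// Hypothetical record shapes; only the three signals come from the table above.
interface Invoice { id: string; vendorCode: string; documentNo: string; date: string; }
interface PurchaseOrder { id: string; vendorCode: string; documentNo: string; date: string; }
interface ScoredLink { type: string; target: string; score: number; }

const DAY_MS = 24 * 60 * 60 * 1000;

function scoreInvoiceToPo(inv: Invoice, po: PurchaseOrder, maxDays = 90): number {
  let score = 0;
  if (inv.vendorCode === po.vendorCode) score += 0.4; // same vendor
  if (inv.documentNo === po.documentNo) score += 0.4; // same document number
  // Date proximity: full 0.2 for the same day, fading linearly to 0 at maxDays.
  const days = Math.abs(Date.parse(inv.date) - Date.parse(po.date)) / DAY_MS;
  if (days <= maxDays) score += 0.2 * (1 - days / maxDays);
  return score;
}

function linkInvoice(inv: Invoice, candidates: PurchaseOrder[], threshold = 0.6): ScoredLink[] {
  return candidates
    .map((po) => ({ type: "references", target: po.id, score: scoreInvoiceToPo(inv, po) }))
    .filter((link) => link.score >= threshold) // drop weak, single-signal matches
    .sort((a, b) => b.score - a.score);
}
```

Requiring more than one signal to clear the threshold is what keeps a shared vendor alone from producing a link.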
3. Vector-Based Semantic Enrichment (Optional)
Use embeddings when exact keywords don’t exist (e.g., "tire" ≈ "wheel" ≈ "landing gear tire").
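Store the embedding as a vector field on the indexed document, such as the semanticVector field in the index schema shown later.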
This enables hybrid search: keyword + vector similarity + metadata filters.
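As a rough in-memory illustration of hybrid search, the sketch below applies a metadata filter first and then blends a keyword score with cosine similarity. It does not use any search engine API; the `VectorDoc` shape, the naive term-overlap keyword score, and the 50/50 blend are assumptions.

```typescript
interface VectorDoc { id: string; tenant: string; text: string; semanticVector: number[]; }

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("Vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // a zero vector has no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Fraction of query terms found in the text: a crude stand-in for a real keyword score.
function keywordScore(query: string, text: string): number {
  const terms = query.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
  if (terms.length === 0) return 0;
  const haystack = text.toLowerCase();
  const hits = terms.filter((t) => haystack.indexOf(t) !== -1).length;
  return hits / terms.length;
}

function hybridSearch(query: string, queryVector: number[], tenant: string, docs: VectorDoc[]) {
  return docs
    .filter((d) => d.tenant === tenant) // metadata filter runs before any scoring
    .map((d) => ({
      id: d.id,
      score: 0.5 * keywordScore(query, d.text) + 0.5 * cosineSimilarity(queryVector, d.semanticVector),
    }))
    .sort((a, b) => b.score - a.score);
}
```

A production index would delegate both scores to the search engine; the point here is only the order of operations: filter, score, rank.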
4. Contextual Scoring Strategy
Ranking score is a weighted function:
score = (TF-IDF * 0.3) +
        (EntityMatchBoost * 0.2) +
        (TagMatchBoost * 0.2) +
        (VectorSimilarity * 0.2) +
        (RecencyBoost * 0.1)
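The weighted function above translates directly into code. The sketch below assumes every signal has already been normalized to the range 0 to 1 (the article does not state this) and clamps inputs defensively; `ScoreSignals`, `WEIGHTS`, and `contextualScore` are illustrative names.

```typescript
interface ScoreSignals {
  tfIdf: number;
  entityMatchBoost: number;
  tagMatchBoost: number;
  vectorSimilarity: number;
  recencyBoost: number;
}

// Weights taken from the formula above; they sum to 1.0.
const WEIGHTS: ScoreSignals = {
  tfIdf: 0.3,
  entityMatchBoost: 0.2,
  tagMatchBoost: 0.2,
  vectorSimilarity: 0.2,
  recencyBoost: 0.1,
};

function clamp01(value: number): number {
  return Math.min(1, Math.max(0, value));
}

function contextualScore(signals: ScoreSignals, weights: ScoreSignals = WEIGHTS): number {
  // Assumption: each signal is normalized to [0, 1]; out-of-range values are clamped.
  return (
    clamp01(signals.tfIdf) * weights.tfIdf +
    clamp01(signals.entityMatchBoost) * weights.entityMatchBoost +
    clamp01(signals.tagMatchBoost) * weights.tagMatchBoost +
    clamp01(signals.vectorSimilarity) * weights.vectorSimilarity +
    clamp01(signals.recencyBoost) * weights.recencyBoost
  );
}
```

Because the weights sum to 1.0, the final score also stays between 0 and 1, which makes scores comparable across entity types.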
Example boosts
Records linked to "current aircraft" get +20%
Recently updated items receive decay-based scoring
Parent entities boost children entities (documents, line items)
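The three example boosts could be sketched as small helper functions. The exponential half-life form of the decay, the 30-day default, and the 10% parent contribution are assumptions; only the +20% figure for the current aircraft comes from the list above.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Decay-based recency: 1.0 for a record updated now, halving every halfLifeDays.
function recencyBoost(updatedAt: Date, now: Date, halfLifeDays = 30): number {
  const ageDays = Math.max(0, (now.getTime() - updatedAt.getTime()) / MS_PER_DAY);
  return Math.pow(0.5, ageDays / halfLifeDays);
}

// Records linked to the aircraft currently in context get +20%.
function applyCurrentAircraftBoost(score: number, linkedAircraftIds: string[], currentAircraftId: string): number {
  return linkedAircraftIds.indexOf(currentAircraftId) !== -1 ? score * 1.2 : score;
}

// A child (document, line item) inherits a fraction of its parent's score.
function applyParentBoost(childScore: number, parentScore: number, factor = 0.1): number {
  return childScore + factor * parentScore;
}
```

Note that the multiplicative and additive boosts can push a score above 1, so results should be compared by rank rather than by absolute value once boosts are applied.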
Index Structure
A universal schema for indexing all entities:
{
  "id": "WO-91235",
  "entityType": "WorkOrder",
  "tenant": "TenantA",
  "title": "Brake Assembly Replacement",
  "body": "Maintenance performed on Boeing 737 brake actuator module.",
  "tags": ["Maintenance", "Brake", "Aircraft"],
  "relatedEntities": ["PO-5551", "INV-948"],
  "timestamp": "2025-01-14T09:22:11Z",
  "semanticVector": [0.143, -0.551, ...]
}
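On the Angular side, the same schema could be expressed as a TypeScript interface with a runtime guard for responses coming back from the index. This is a sketch: the field names follow the JSON above, while treating `semanticVector` as optional (since vector enrichment is optional) and the `isIndexedDocument` guard itself are assumptions.

```typescript
interface IndexedDocument {
  id: string;
  entityType: string;
  tenant: string;
  title: string;
  body: string;
  tags: string[];
  relatedEntities: string[];
  timestamp: string; // ISO 8601, as in the example above
  semanticVector?: number[]; // assumed optional
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// Runtime check for untyped JSON before it is treated as an IndexedDocument.
function isIndexedDocument(value: unknown): value is IndexedDocument {
  if (typeof value !== "object" || value === null) return false;
  const doc = value as Record<string, unknown>;
  const requiredStrings = ["id", "entityType", "tenant", "title", "body", "timestamp"];
  if (!requiredStrings.every((key) => typeof doc[key] === "string")) return false;
  if (!isStringArray(doc["tags"]) || !isStringArray(doc["relatedEntities"])) return false;
  if (isNaN(Date.parse(doc["timestamp"] as string))) return false;
  const vector = doc["semanticVector"];
  if (vector === undefined) return true;
  return Array.isArray(vector) && vector.every((item) => typeof item === "number");
}
```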
Implementation (Backend .NET)
Classification Pipeline Skeleton
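public async Task<IndexedDocument> EnrichAsync(RawDocument doc)
{
    // Fail fast on a missing input (ArgumentNullException.ThrowIfNull requires .NET 6+).
    ArgumentNullException.ThrowIfNull(doc);

    // Copy the fields that map directly from the raw record.
    var result = new IndexedDocument
    {
        Id = doc.Id,
        Content = doc.Text,
        Title = ExtractTitle(doc),
        Tenant = doc.Tenant
    };

    // Enrichment steps: tags, entity links, then the optional embedding.
    result.Tags = _tagger.GenerateTags(doc);
    result.Relationships = await _relationshipService.DetectAsync(doc);
    result.Vector = _vectorService.GenerateEmbedding(doc.Text);

    return result;
}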
Logical Entity Relationship Detection Example
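public Task<IEnumerable<EntityLink>> DetectAsync(RawDocument doc)
{
    var links = new List<EntityLink>();

    // A "PO-" token in the document text suggests a purchase order reference.
    if (doc.Text?.Contains("PO-", StringComparison.Ordinal) == true)
        links.Add(new EntityLink("PurchaseOrder", ExtractId(doc, "PO")));

    // An explicit vendor code on the record is a direct link.
    if (doc.VendorCode != null)
        links.Add(new EntityLink("Vendor", doc.VendorCode));

    // Nothing is awaited here, so return a completed task rather than marking the method async.
    return Task.FromResult<IEnumerable<EntityLink>>(links);
}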
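The `ExtractId` helper is not shown in the article. One plausible shape for it, sketched here in TypeScript to match the other examples, is a regular expression that looks for the prefix followed by a hyphen and digits. The `PREFIX-digits` identifier format and the function name `extractId` are assumptions based on identifiers such as PO-5567 used elsewhere in the text.

```typescript
// Returns the first identifier of the form "<prefix>-<digits>" found in the text,
// or null when no such identifier is present.
function extractId(text: string, prefix: string): string | null {
  // Escape regex metacharacters so the prefix is matched literally.
  const escaped = prefix.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Word boundaries prevent a match inside a longer token such as "GPO-12".
  const match = new RegExp("\\b" + escaped + "-(\\d+)\\b").exec(text);
  return match === null ? null : prefix + "-" + match[1];
}
```

Returning null instead of an empty string lets the caller skip the link entirely when the text mentions "PO-" without a usable number.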
Angular UI Capabilities
Search Insights Dashboard
Relationship Graph Explorer (Neo4j visualization or force-graph)
Filter by: tenant, entity type, tags, compliance status, confidence score
Manual override tagging and relationship editing
Human feedback loop for model tuning: “Was this correct? Yes / No”
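The filter options listed above could be modeled in the Angular client as a plain filter object applied to search hits. This is a sketch with hypothetical names (`SearchHit`, `HitFilter`, `filterHits`); compliance status is left out because the article does not define how it is represented.

```typescript
interface SearchHit {
  id: string;
  tenant: string;
  entityType: string;
  tags: string[];
  confidence: number;
}

// Every field is optional; an omitted field means "do not filter on this".
interface HitFilter {
  tenant?: string;
  entityTypes?: string[];
  requiredTags?: string[];
  minConfidence?: number;
}

function filterHits(hits: SearchHit[], filter: HitFilter): SearchHit[] {
  return hits.filter((hit) => {
    if (filter.tenant !== undefined && hit.tenant !== filter.tenant) return false;
    if (filter.entityTypes !== undefined && filter.entityTypes.indexOf(hit.entityType) === -1) return false;
    // A hit must carry all required tags, not just one of them.
    if (filter.requiredTags !== undefined && !filter.requiredTags.every((tag) => hit.tags.indexOf(tag) !== -1)) return false;
    if (filter.minConfidence !== undefined && hit.confidence < filter.minConfidence) return false;
    return true;
  });
}
```

Client-side filtering like this is only suitable for refining a page of results; tenant restrictions in particular should also be enforced by the backend query.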
Operational Lifecycle
| Phase | Action |
|---|---|
| Ingestion | Monitor database changelogs / events |
| Enrichment | Apply tagging, classification, vectorization |
| Index update | Real-time incremental updates |
| Validation | Scoring, quality checks, drift detection |
| Governance | Audit logs, explainability, rejection queue |
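For the index update phase, incremental updates driven by change events need to tolerate replays and out-of-order delivery. The sketch below uses an in-memory object as a stand-in for the index and assumes each event carries a monotonically increasing version per record; the `ChangeEvent` shape and the tombstone approach are illustrative assumptions, not part of the article.

```typescript
type ChangeEvent =
  | { kind: "upsert"; id: string; version: number; body: string }
  | { kind: "delete"; id: string; version: number };

// A null body marks a tombstone: the record was deleted at that version.
interface IndexEntry {
  version: number;
  body: string | null;
}

// Applies one event and returns true if the index changed.
function applyChange(index: Record<string, IndexEntry>, event: ChangeEvent): boolean {
  const existing = index[event.id];
  // Ignore stale or replayed events so an old update cannot roll a record back
  // or resurrect a deleted one.
  if (existing !== undefined && existing.version >= event.version) {
    return false;
  }
  index[event.id] =
    event.kind === "delete"
      ? { version: event.version, body: null }
      : { version: event.version, body: event.body };
  return true;
}
```

Keeping tombstones costs some storage, but it is what makes replaying an event stream safe.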
Governance & Compliance
Store the classifier's confidence with each tag so that low-confidence results can go through a human approval workflow
Support redaction and index rebuilds (e.g., for GDPR right-to-erasure requests)
Ensure tenant isolation in multi-tenant indexing clusters
Version index schemas and allow rolling upgrades
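The confidence-based approval workflow might be reduced to a simple routing function. The thresholds of 0.9 and 0.5 and the three-way outcome are assumptions for illustration; the rejection queue corresponds to the one mentioned under Operational Lifecycle.

```typescript
type ReviewDecision = "auto-approve" | "needs-review" | "reject";

// Routes a classification by its stored confidence.
// Assumed defaults: >= 0.9 is indexed automatically, < 0.5 goes to the rejection queue,
// and everything in between waits for a human reviewer.
function routeByConfidence(confidence: number, approveAt = 0.9, rejectBelow = 0.5): ReviewDecision {
  if (!(confidence >= 0 && confidence <= 1)) {
    throw new RangeError("confidence must be between 0 and 1");
  }
  if (confidence >= approveAt) return "auto-approve";
  if (confidence < rejectBelow) return "reject";
  return "needs-review";
}
```

Logging the decision together with the confidence and thresholds in force at the time gives the audit trail something concrete to explain.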
Common Pitfalls and Solutions
| Pitfall | Fix |
|---|---|
| Over-indexing irrelevant fields | Use a field importance matrix |
| Incorrect relationships from weak FKs | Use multi-signal scoring (keyword + metadata) |
| Search feels random | Apply a weighted scoring model tuned to the domain |
| Index bloat | Use TTL and tiered storage for historical entries |
Summary
A Domain-Aware Search Indexer transforms enterprise data from raw, disconnected records into a meaningful, contextual, queryable knowledge system.
Key takeaways:
Use rules, ML, and metadata to auto-classify and enrich content.
Build relationship graphs to link business entities.
Combine keyword, vector, metadata, and time-aware scoring for ranking.
Offer human feedback loops to tune accuracy.
Use multi-tenant governance, versioning, and compliance controls.