Introduction
Large enterprise systems evolve over time. New modules are added, integrations change, and migrations happen. With such changes, data inconsistencies gradually appear. Examples include:
A child record exists without a parent (orphan data).
Totals stored for performance become incorrect compared to computed values.
Foreign key references point to deleted or invalid entries.
Duplicate reference relationships appear due to race conditions.
Data sync between microservices or integration pipelines leaves gaps.
Traditional SQL constraints can prevent some mistakes, but many enterprise systems allow soft deletes, asynchronous processing, queued workflows, and multi-system integration pipelines. In such cases, perfect consistency cannot always be enforced in real time.
A Data Auto-Repair Engine solves this problem by continuously scanning the database, detecting inconsistencies, and fixing them safely—either automatically or with human approval.
1. Goals of the System
The objective is to create a controlled, incremental, auditable repair process, not random updates.
Key outcomes:
Identify data mismatches with classification (severity-based).
Provide automated and semi-automated repair actions.
Allow approval workflows before executing fixes.
Track trends to find broken processes, not just broken records.
2. High-Level Architecture
┌──────────────────────────────┐
│ Metadata Catalog │
└─────────────┬────────────────┘
│
┌────────▼────────┐
│ Scanner Job │
└────────┬────────┘
│
┌─────▼─────┐
│ Detector │
└─────┬─────┘
│
┌───────▼────────┐
│ Repair Engine │
└───────┬────────┘
│
┌────────────────▼─────────────────┐
│ Execution Modes: Auto | Manual │
└───────────────┬──────────────────┘
│
┌───────────▼───────────┐
│ Audit + Alert System │
└────────────────────────┘
3. Types of Data Problems Handled
| Category | Example |
|---|
| Orphan data | InvoiceLines exists but Invoice deleted |
| Broken FK reference | ProductID refers to non-existing Product |
| Mismatched totals | Header totals differ from line sum |
| Range violations | DateReceived < DateCreated |
| Duplicate business key | Two active customers with same PAN |
| Referential cascades missing | Deleting a contract did not delete children |
Each issue should be assigned a severity level:
| Level | Action |
|---|
| Info | Report only |
| Warning | Fix after approval |
| Critical | Auto-fix immediately |
4. Metadata-Driven Rules Engine
Instead of writing rigid SQL scripts, define consistency rules in a rules table.
Example Rule Table
| RuleId | RuleType | TargetTable | Expression | Severity | RepairAction |
|---|
| 1 | MissingParent | InvoiceLine | InvoiceId NOT IN (SELECT Id FROM Invoice) | Critical | Delete Line |
| 2 | IncorrectTotal | Invoice | Total != SELECT SUM(LineAmount) | Warning | Recalculate Total |
| 3 | SoftDeleteMismatch | Customer | IsDeleted = 1 AND EXISTS(Orders) | Manual | Cancel Orders or Undelete |
This approach keeps logic flexible.
5. Detection Patterns
5.1 Orphan Detection Query
SELECT l.Id, l.InvoiceId
FROM InvoiceLine l
LEFT JOIN Invoice i ON l.InvoiceId = i.Id
WHERE i.Id IS NULL;
5.2 Incorrect Total Check
SELECT i.Id, i.Total AS StoredTotal, SUM(l.Amount) AS ActualTotal
FROM Invoice i
JOIN InvoiceLine l ON l.InvoiceId = i.Id
GROUP BY i.Id, i.Total
HAVING SUM(l.Amount) <> i.Total;
5.3 FK Violation Trend Tracking
INSERT INTO AnomalyLog (RuleId, RecordId, CreatedOn)
SELECT 10, CustomerId, GETDATE()
FROM Orders o
WHERE CustomerId NOT IN (SELECT Id FROM Customer);
6. Repair Strategies
| Strategy | When Used |
|---|
| Auto-correct | Deterministic fix available (recomputing totals) |
| Cascade delete | Child has no meaning without parent |
| Set default | Null safe fallback when allowed |
| Merge | When duplicates must be consolidated |
| Escalate to workflow | When human decision required |
7. .NET Repair Pipeline Example
public async Task RunRepairAsync()
{
var rules = await ruleRepo.GetActiveRules();
foreach (var rule in rules)
{
var records = await scanner.Execute(rule.Expression);
foreach (var record in records)
{
switch (rule.RepairAction)
{
case RepairAction.Recalculate:
await repairExecutor.RecalculateTotals(record);
break;
case RepairAction.Delete:
await repairExecutor.DeleteRecord(record);
break;
case RepairAction.FlagForManualReview:
await reviewQueue.Add(rule.RuleId, record.Id);
break;
}
}
}
}
8. Angular Admin UI
The UI should:
Show detected anomalies
Show confidence in repair logic
Allow bulk approval or reject
Show before/after preview
UI Flow
Dashboard → Anomaly List → Preview Impact → Approve/Reject → Execute → Audit Log
9. Scheduling and Execution Model
Use incremental scheduling to avoid large locks.
| Mode | Runs |
|---|
| Online | Lightweight constraints check |
| Nightly batch | Deep scans, large repair operations |
| Event-driven | Triggered after migration or integration |
10. Governance, Safety, and Compliance
Store before/after values for every repair
Use versioned snapshot table patterns
Integrate role-based access
Provide read-only dry-run preview mode
Conclusion
A Data Auto-Repair Engine is essential in large distributed systems where data eventually becomes inconsistent. Instead of running manual SQL patches or firefighting data defects, a controlled automatic system ensures the data stays clean.
By using metadata-driven rules, audit logs, and safe execution workflows, enterprises can maintain trustable data without disrupting live operations.