
Building a Universal Data Import Wizard — mapping columns, preview, validation

Table of contents

  1. Problem statement and goals

  2. High-level user flow (workflow diagram)

  3. Flowchart (runtime behaviour)

  4. Core concepts and UX considerations

  5. JSON model: mapping and template format

  6. Angular architecture and components

  7. Client-side implementation — parsing, mapping, preview, validation

  8. Server-side considerations — validation, transactions, idempotency

  9. Handling large files and streaming import

  10. Audit, error reporting, and reconciliation

  11. Security and compliance considerations

  12. Testing strategy and developer checklist

  13. Sequence diagram (import lifecycle)

  14. Example: production-ready code snippets (Angular + Node)

  15. Conclusion and recommended next steps

1. Problem statement and goals

Enterprises often need to import data from many external systems or ad-hoc Excel/CSV files. Each file can have different column names, different formats, and dirty data. Building a separate uploader for each format is slow and error-prone.

A Universal Data Import Wizard lets users:

  • upload a CSV/XLSX file,

  • map file columns to your domain fields (drag/drop or dropdown),

  • preview mapped rows with transformations,

  • run validation (client and server),

  • fix or ignore invalid rows, and

  • import data safely (transactionally) with audit logs and rollback.

Goals for this article:

  • Provide a robust Angular-focused implementation plan.

  • Give concrete code patterns and architecture.

  • Cover both small and large file strategies.

  • Give best practices for UX, security and operations.

2. High-level user flow (workflow diagram)

[User opens Wizard] --> [Upload file] --> [Parse file] --> [Auto-map columns]
       --> [User adjusts mapping] --> [Preview mapped rows]
       --> [Run client validations] --> [User fixes/filters rows]
       --> [Submit to server] --> [Server validation & transform]
       --> [Import transactionally] --> [Result / Audit report]

3. Flowchart (runtime behaviour)

Start
  |
  v
Select file
  |
  v
Parse file -> Parse success? --No--> Show parse error -> End
  |
 Yes
  |
Auto-detect columns -> Generate suggested mapping
  |
User reviews mapping -> Confirm mapping?
  |                 \
  Yes                No
  |                   |
Preview (first N rows) <- User edits mapping
  |
Run client-side validation
  |
Any critical errors?
  |           \
 No            Yes
  |             |
Submit          User fixes errors -> back to Preview
  |
Server receives mapped rows -> server validation
  |
Validation pass?
  |           \
 Yes           No
  |             |
Import (transaction)  Return validation errors + sample rows
  |
Show import summary + audit log
  |
End

4. Core concepts and UX considerations

Before jumping into code, design is critical.

1. Mapping UX

  • Auto-detect columns using header matching (case-insensitive, fuzzy match).

  • Provide a list of target fields grouped by domain (required vs optional).

  • Allow drag-and-drop or dropdown to map each source column.

  • Save mappings as named templates (for repeat imports).

2. Preview and sample size

  • Show top 100 rows by default; allow sampling from random offsets for larger files.

  • Show original value, transformed value, and validation status per cell.

  • Allow inline editing for quick fixes.

3. Validation levels

  • Soft validation: warnings (e.g., inconsistent casing) — user can proceed.

  • Hard validation: errors that block import (e.g., missing required field, foreign key not found).

  • Support row-level and cell-level validations.

4. Transformations

  • Simple inline transforms: trim, uppercase, date parsing with format, number parse, lookup (map text to id).

  • Support expression-based transforms but restrict to a safe expression language or pre-approved functions.

5. Error handling & recovery

  • Allow users to download failed rows as CSV for offline correction.

  • Offer “import partial data” when errors are limited and user accepts.

6. Reusability

  • Mapping templates must be exportable/importable.

  • Provide per-tenant templates in multi-tenant products.

5. JSON model: mapping and template format

Define a compact mapping model you’ll persist:

{
  "templateName": "Employee Import v1",
  "fileSettings": {
    "delimiter": ",",
    "hasHeader": true,
    "encoding": "utf-8"
  },
  "mappings": [
    {
      "source": "First Name",
      "target": "employee.firstName",
      "transform": "trim",
      "required": true
    },
    {
      "source": "DOB",
      "target": "employee.dateOfBirth",
      "transform": "date:DD/MM/YYYY",
      "required": false
    },
    {
      "source": "Manager Email",
      "target": "employee.managerId",
      "transform": "lookup:users.email->users.id",
      "required": false
    }
  ],
  "options": {
    "skipHeaderRows": 0,
    "previewRows": 100
  }
}

Notes

  • target can be a dotted path for nested objects.

  • transform is either a named transform (trim), a date:FORMAT directive, or a lookup:ENTITY.field->ENTITY.id lookup. Only allow pre-approved transforms.
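For reference, a TypeScript shape for the same model (field names mirror the JSON above; treating transform as a free-form string is an assumption, and you may prefer a stricter union type):

interface FileSettings {
  delimiter: string;
  hasHeader: boolean;
  encoding: string;
}

interface ColumnMapping {
  source: string;      // column header in the uploaded file
  target: string;      // dotted path into the domain object, e.g. "employee.firstName"
  transform?: string;  // "trim" | "date:FORMAT" | "lookup:ENTITY.field->ENTITY.id"
  required: boolean;
}

interface ImportTemplate {
  templateName: string;
  fileSettings: FileSettings;
  mappings: ColumnMapping[];
  options: {
    skipHeaderRows: number;
    previewRows: number;
  };
}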

6. Angular architecture and components

Use a modular structure:

  • ImportWizardModule (lazy-loaded)

    • ImportWizardComponent — host and stepper control

    • FileUploadComponent — file input, basic validation

    • ParsingService — uses PapaParse / SheetJS to parse files client-side

    • MappingEditorComponent — auto-detect headers, map UI, save templates

    • PreviewTableComponent — show preview with validation badges and inline editing

    • TransformService — apply transforms safely on client preview

    • ImportService — handle uploads and server interactions

    • ValidationService — client-side validators and rules

Prefer standalone components (Angular 17+). Use Signals (or RxJS streams) to share reactive state between the wizard steps.
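A minimal sketch of shared wizard state using Angular signals (the service name and fields are assumptions, not a prescribed API):

import { Injectable, computed, signal } from '@angular/core';

// Holds state shared by the wizard steps; components read the signals and call
// the setters as the user moves through upload -> mapping -> preview.
@Injectable({ providedIn: 'root' })
export class ImportWizardStateService {
  readonly headers = signal<string[]>([]);
  readonly rows = signal<string[][]>([]);
  readonly mappings = signal<Record<string, string | null>>({});

  // Enable the next step once at least one column is mapped (extend to check required targets).
  readonly canContinue = computed(() =>
    Object.values(this.mappings()).some(target => target !== null)
  );

  setParsedFile(headers: string[], rows: string[][]) {
    this.headers.set(headers);
    this.rows.set(rows);
  }

  setMapping(source: string, target: string | null) {
    this.mappings.update(m => ({ ...m, [source]: target }));
  }
}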

7. Client-side implementation — parsing, mapping, preview, validation

7.1 Parsing files (CSV and Excel)

Use papaparse for CSV and xlsx (SheetJS) for Excel. Example parsing service skeleton:

import { Injectable } from '@angular/core';
import * as Papa from 'papaparse';
import * as XLSX from 'xlsx';

@Injectable({ providedIn: 'root' })
export class ParsingService {
  parseCsv(file: File, hasHeader = true): Promise<{ headers: string[]; rows: string[][] }> {
    return new Promise((resolve, reject) => {
      Papa.parse(file, {
        header: false,
        dynamicTyping: false,
        skipEmptyLines: true,
        complete: (res) => {
          const rows = res.data as string[][];
          const headers = hasHeader ? (rows.shift() || []) : [];
          resolve({ headers, rows });
        },
        error: (err) => reject(err)
      });
    });
  }

  async parseExcel(file: File, sheetIndex = 0) {
    const arrayBuffer = await file.arrayBuffer();
    const workbook = XLSX.read(arrayBuffer, { type: 'array' });
    const sheetName = workbook.SheetNames[sheetIndex];
    const sheet = workbook.Sheets[sheetName];
    const json = XLSX.utils.sheet_to_json(sheet, { header: 1 }) as any[][]; // rows as arrays
    const headers = (json.shift() || []) as string[];
    return { headers, rows: json };
  }
}

7.2 Auto-mapping (fuzzy header match)

Implement simple fuzzy matching: strip non-alphanumeric characters, lowercase, remove common words (id, name), then apply exact and startsWith matches. For better results, fall back to Levenshtein distance, as sketched below.

function normalize(s: string) {
  return (s || '').toLowerCase().replace(/[^a-z0-9]/g,'');
}

Map target fields list (from API metadata) into normalized keys and match.
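For the Levenshtein fallback, a small sketch (the two-edit threshold is an assumption to tune per dataset):

function levenshtein(a: string, b: string): number {
  const dp = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0))
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,                                   // deletion
        dp[i][j - 1] + 1,                                   // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1)  // substitution
      );
    }
  }
  return dp[a.length][b.length];
}

function fuzzyMatch(header: string, normalizedTargetKeys: string[], maxDistance = 2): string | null {
  const n = normalize(header);
  let best: { key: string; dist: number } | null = null;
  for (const key of normalizedTargetKeys) {
    const dist = levenshtein(n, key);
    if (dist <= maxDistance && (!best || dist < best.dist)) best = { key, dist };
  }
  return best?.key ?? null;
}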

7.3 Preview rendering and transforms

Create a preview pipeline:

  1. For each preview row, build an object by applying mapping.target and transform.

  2. Build validation results for each field using client validators.

  3. Render a table with columns: Source value → Mapped field → Transformed value → Status.

Transforms example:

function applyTransform(value: any, transform: string) {
  if (!transform) return value;
  if (transform === 'trim') return ('' + value).trim();
  if (transform.startsWith('date:')) {
    const fmt = transform.split(':')[1];
    return parseDate(value, fmt); // implement parseDate carefully (see sketch below)
  }
  if (transform.startsWith('lookup:')) {
    // resolve from a client-side lookup cache if small; otherwise mark the cell for server-side lookup
    return value;
  }
  // unknown/unapproved transforms fall through unchanged; only allow pre-approved transforms
  return value;
}
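The snippet leaves parseDate to the implementer. Here is a minimal sketch for simple token formats such as DD/MM/YYYY (it assumes only DD, MM and YYYY tokens with non-regex separator characters; a library such as date-fns or dayjs is safer in production):

function parseDate(value: string, format: string): Date | null {
  const order: string[] = [];
  // Turn the format into a regex, remembering token order (DD/MM/YYYY -> (\d{1,2})/(\d{1,2})/(\d{4})).
  const pattern = format.replace(/DD|MM|YYYY/g, (token) => {
    order.push(token);
    return token === 'YYYY' ? '(\\d{4})' : '(\\d{1,2})';
  });
  const match = ('' + (value ?? '')).trim().match(new RegExp(`^${pattern}$`));
  if (!match) return null;
  const parts: Record<string, number> = {};
  order.forEach((token, i) => (parts[token] = Number(match[i + 1])));
  const date = new Date(parts['YYYY'], (parts['MM'] ?? 1) - 1, parts['DD'] ?? 1);
  // Reject impossible dates like 31/02/2024 that Date would otherwise roll over silently.
  const valid =
    date.getFullYear() === parts['YYYY'] &&
    date.getMonth() === (parts['MM'] ?? 1) - 1 &&
    date.getDate() === (parts['DD'] ?? 1);
  return valid ? date : null;
}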

7.4 Client-side validation

Implement common validators:

  • required

  • type checks (number, date)

  • range checks (min/max)

  • regexp checks (email)

  • referential existence (for lookups cache)

Return validation object:

interface CellValidation {
  ok: boolean;
  level: 'info'|'warning'|'error';
  messages?: string[];
}
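A small sketch of cell-level validators returning this shape (the rule model is an assumption):

type CellRule =
  | { kind: 'required' }
  | { kind: 'number'; min?: number; max?: number }
  | { kind: 'regex'; pattern: RegExp; message: string };

function validateCell(value: any, rules: CellRule[]): CellValidation {
  const messages: string[] = [];
  let level: 'info' | 'warning' | 'error' = 'info';

  for (const rule of rules) {
    if (rule.kind === 'required' && (value === null || value === undefined || value === '')) {
      messages.push('Value is required');
      level = 'error';
    }
    if (rule.kind === 'number' && value !== null && value !== undefined && value !== '') {
      const n = Number(value);
      if (Number.isNaN(n)) { messages.push('Not a number'); level = 'error'; }
      else if (rule.min !== undefined && n < rule.min) { messages.push(`Below minimum ${rule.min}`); level = 'error'; }
      else if (rule.max !== undefined && n > rule.max) { messages.push(`Above maximum ${rule.max}`); level = 'error'; }
    }
    if (rule.kind === 'regex' && value && !rule.pattern.test(String(value))) {
      messages.push(rule.message);
      level = 'error';
    }
  }

  return { ok: level !== 'error', level, messages: messages.length ? messages : undefined };
}

// Example: validateCell('not-an-email', [{ kind: 'regex', pattern: /^\S+@\S+\.\S+$/, message: 'Invalid email' }])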

Show UI badges and allow “ignore validation” toggles.

8. Server-side considerations — validation, transactions, idempotency

Client-side validation reduces noise but server must be authoritative.

1. Strong validation

  • Schema validation (JSON Schema or FluentValidation).

  • Referential integrity checks (FKs).

  • Business rules (duplicate detection, uniqueness).

2. Transactional imports

  • Wrap import operations in a transaction.

  • For very large imports, use staged tables: load raw rows into staging_import_rows, validate with SQL, then merge into production tables in small batches inside a transaction (see the sketch at the end of this section).

3. Idempotency and dedup

  • Each import should have an importId and unique importRequestId to avoid double processing.

  • Offer “dry-run” mode: server validates and returns errors without committing.

4. Partial imports

  • Allow importing only valid rows. Mark invalid rows and provide report. Require explicit opt-in for partial import.

5. Performance

  • Bulk upserts using COPY (Postgres), BULK INSERT (SQL Server) or batch inserts. Avoid row-by-row inserts.

6. Audit

  • Persist import metadata: user, file name, template used, row counts, time, errors. Keep failed rows for audit.
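To make the staged-table approach concrete, a minimal Node sketch using node-postgres (the table names, columns, and status flags are assumptions):

import { Pool } from 'pg';

const pool = new Pool();

// Merge validated staging rows into the production table in small batches,
// all inside one transaction so a failure rolls the whole import back.
async function mergeStagedRows(importId: string, batchSize = 1000): Promise<number> {
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    let merged = 0;
    for (;;) {
      const { rowCount } = await client.query(
        `INSERT INTO employees (first_name, date_of_birth, manager_id)
         SELECT first_name, date_of_birth, manager_id
         FROM staging_import_rows
         WHERE import_id = $1 AND status = 'valid'
         ORDER BY row_number
         OFFSET $2 LIMIT $3`,
        [importId, merged, batchSize]
      );
      if (!rowCount) break;
      merged += rowCount;
    }
    await client.query(
      `UPDATE imports SET status = 'completed', rows_imported = $2 WHERE id = $1`,
      [importId, merged]
    );
    await client.query('COMMIT');
    return merged;
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  } finally {
    client.release();
  }
}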

9. Handling large files and streaming import

For files > 50 MB, client-side parsing may be slow or impossible.

Strategies:

  1. Upload first, parse server-side

    • Upload file directly to server or object store (S3).

    • Server streams and parses in chunks. Use streaming CSV parsers (node streams, fast-csv) or SheetJS node API.

  2. Chunked parsing and validation

    • Parse row-by-row or in batches (e.g., 1000 rows). Validate each batch and write staging rows; keep memory usage low (see the streaming sketch after this list).

  3. Progress and resumability

    • Provide progress UI (percentage uploaded, rows processed).

    • Support resumable uploads (tus or chunked with MD5 checks).

  4. Worker offload

    • For heavy validation, dispatch a background worker or job queue (RabbitMQ, AWS SQS). Return an importJobId so users can check status.
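A minimal Node sketch of chunked streaming with fast-csv, as referenced above (the persistBatch helper and batch size are assumptions):

import { createReadStream } from 'fs';
import { parse } from 'fast-csv';

// Stream a large CSV and hand rows to persistBatch in fixed-size batches so memory stays flat.
// persistBatch is a hypothetical helper that validates a batch and writes it to staging.
function streamImport(
  filePath: string,
  persistBatch: (rows: Record<string, string>[]) => Promise<void>,
  batchSize = 1000
): Promise<number> {
  let batch: Record<string, string>[] = [];
  let total = 0;

  return new Promise((resolve, reject) => {
    const stream = createReadStream(filePath).pipe(parse({ headers: true }));

    stream
      .on('data', (row: Record<string, string>) => {
        batch.push(row);
        total++;
        if (batch.length >= batchSize) {
          stream.pause();                    // back-pressure: stop reading while the batch is persisted
          const toPersist = batch;
          batch = [];
          persistBatch(toPersist)
            .then(() => stream.resume())
            .catch(err => stream.destroy(err));
        }
      })
      .on('error', reject)
      .on('end', () => {
        const flush = batch.length ? persistBatch(batch) : Promise.resolve();
        flush.then(() => resolve(total)).catch(reject);
      });
  });
}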

10. Audit, error reporting, and reconciliation

1. Import result summary

  • Rows processed, inserted, updated, skipped, failed.

  • Downloadable CSV of failed rows with error column.

2. Error categories

  • Parse error (bad CSV)

  • Validation error (business rule)

  • Referential error (FK missing)

  • System error (timeout)

3. Reconciliation tools

  • View imported rows with sourceRowNumber and importId.

  • Provide “undo” feature: track imported row ids and allow rollback within retention window.
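A compact shape for the result summary and report (field names are assumptions):

interface ImportSummary {
  importId: string;
  processed: number;
  inserted: number;
  updated: number;
  skipped: number;
  failed: number;
  failedRowsCsvUrl?: string;   // downloadable CSV of failed rows with an error column
  startedAt: string;
  completedAt?: string;
}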

11. Security and compliance considerations

  • File scanning: scan uploaded files for viruses/malware.

  • Access control: only authorized users can import to target tenant/environment.

  • Data governance: ensure PII handling, masking and logging follow policy.

  • Validation on server: never trust client-side validation.

  • Rate limits and quotas: protect system from large batch abuse.

12. Testing strategy and developer checklist

Unit tests

  • Parser tests for CSV/Excel edge cases.

  • Mapping logic tests (fuzzy matching).

  • Transform functions unit tests.
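For example, a Jasmine-style spec for the transform helpers from section 7.3 (assuming applyTransform and parseDate as sketched earlier):

describe('applyTransform', () => {
  it('trims whitespace for the trim transform', () => {
    expect(applyTransform('  Jane ', 'trim')).toBe('Jane');
  });

  it('parses dates with the declared format', () => {
    const d = applyTransform('31/01/2024', 'date:DD/MM/YYYY') as Date;
    expect(d.getFullYear()).toBe(2024);
    expect(d.getMonth()).toBe(0);
    expect(d.getDate()).toBe(31);
  });

  it('returns the original value for unknown transforms', () => {
    expect(applyTransform('x', 'not-a-transform')).toBe('x');
  });
});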

Integration tests

  • End-to-end import flow using test files (happy path + errors).

  • Server-side validation integration with DB.

Performance tests

  • Large file ingestion tests in staging to measure RAM/CPU, DB bulk loads.

Developer checklist

  • Mapping templates persisted with version.

  • Dry-run available for all imports.

  • Import job audit and rollback.

  • Error report downloadable.

  • Limits and quotas enforced.

  • Security scanning enabled.

13. Sequence diagram (import lifecycle)

User -> UI: Upload file + select template
UI -> ParsingService: parse file
ParsingService -> UI: headers + rows sample
UI -> User: show mapping UI
User -> UI: confirm mapping
UI -> API: POST /imports/dry-run (mapping + file store reference)
API -> DB/ValidateService: validate mapped sample rows
DB -> API: validation result
API -> UI: validation result (errors/warnings)
User -> UI: confirm import
UI -> API: POST /imports/commit
API -> JobQueue: enqueue import job
JobWorker -> Storage: stream file
JobWorker -> ValidateService: full validation
JobWorker -> DB: staging inserts -> merge -> commit
JobWorker -> API: update job status
UI -> API: poll job status
API -> UI: final report (success/failure)

14. Example: production-ready code snippets (Angular + Node)

Below are compact practical examples — adapt to your stack.

14.1 Mapping editor (component skeleton)

import { Component, Input, OnInit } from '@angular/core';
import { CommonModule } from '@angular/common';
import { FormsModule } from '@angular/forms';

@Component({
  selector: 'app-mapping-editor',
  standalone: true,
  imports: [CommonModule, FormsModule],
  template: `
    <div class="mapping-grid">
      <table>
        <thead><tr><th>Source Column</th><th>Map To</th></tr></thead>
        <tbody>
          <tr *ngFor="let col of headers">
            <td>{{col}}</td>
            <td>
              <select [ngModel]="mappings[col]" (ngModelChange)="onMap(col, $event)">
                <option [ngValue]="null">-- ignore --</option>
                <option *ngFor="let t of targets" [ngValue]="t.name">{{t.display}}</option>
              </select>
            </td>
          </tr>
        </tbody>
      </table>
    </div>
  `
})
export class MappingEditorComponent implements OnInit {
  @Input() headers: string[] = [];
  @Input() targets: { name: string; display: string }[] = []; // target fields metadata
  mappings: Record<string, string | null> = {};

  ngOnInit() {
    this.autoMap();
  }

  // Suggest an initial mapping by normalized exact/startsWith matching on target names.
  autoMap() {
    const normalizedTargets = this.targets.map(t => ({ key: normalize(t.name), val: t }));
    for (const h of this.headers) {
      const n = normalize(h);
      const match = normalizedTargets.find(t => t.key === n || t.key.startsWith(n) || n.startsWith(t.key));
      this.mappings[h] = match?.val?.name ?? null;
    }
  }

  onMap(source: string, target: string | null) {
    this.mappings[source] = target;
  }
}

14.2 Server: upload endpoint (Node/Express sketch)

import express from 'express';
import multer from 'multer';

const app = express();
const upload = multer({ storage: multer.memoryStorage() }); // use disk or object-store streaming for large files

app.post('/imports', upload.single('file'), async (req, res) => {
  const user = req.user;
  const mapping = JSON.parse(req.body.mapping);
  // store file in object store and return reference
  const fileRef = await storeFile(req.file);
  // start dry-run or commit based on flag
  if (req.body.mode === 'dry-run') {
    const sampleRows = await parseSample(fileRef, 100);
    const validation = await validateRows(sampleRows, mapping);
    return res.json({ ok: true, validation });
  } else {
    const importId = await createImportRecord(user.id, fileRef, mapping);
    enqueueImportJob(importId);
    return res.json({ ok: true, importId });
  }
});

15. Conclusion and recommended next steps

A Universal Data Import Wizard is an essential tooling feature for enterprise apps. With careful UX, safe transforms, strong server validation, and a scalable import pipeline, you can reduce manual effort and errors dramatically. Recommended next steps: start with one high-value import (such as the employee example above), persist versioned mapping templates, add dry-run and audit/rollback support early, and layer in streaming imports once large files appear.