
Building Your Own ETL Pipeline Using SQL Server and .NET Background Worker

Introduction

Every modern application needs a way to move data between systems, clean it, transform it, and store it in a structured format. This is where an ETL (Extract, Transform, Load) pipeline becomes useful.
Instead of adopting heavyweight tools such as SSIS, Informatica, or Azure Data Factory, small and mid-sized teams can build their own lightweight ETL pipeline using:

  • SQL Server for storage, staging, and transformation

  • .NET Background Worker / Hosted Service for scheduled processing

  • Stored procedures for optimized data operations

This article explains how to build a complete ETL pipeline from scratch.

ETL Workflow Diagram

          +------------------+
          |   Source Data    |
          | (API/File/DB)    |
          +---------+--------+
                    |
               Extract (C#)
                    |
          +---------v---------+
          |   Staging Table   |
          |       (SQL)       |
          +---------+---------+
                    |
             Transform (SQL)
                    |
                Load (C#)
                    |
          +---------v---------+
          |   Final Target    |
          |  Reporting Table  |
          +-------------------+

1. Designing the ETL Structure

Tables Required

a) Staging Table

Used to temporarily store raw data.

CREATE TABLE Staging_Orders
(
    StageId BIGINT IDENTITY(1,1) PRIMARY KEY,
    OrderId VARCHAR(50),
    CustomerName VARCHAR(200),
    Amount DECIMAL(18,2),
    RawJson NVARCHAR(MAX),
    Processed TINYINT NOT NULL DEFAULT 0, -- 0 = new, 1 = transformed, 2 = loaded
    CreatedDate DATETIME DEFAULT GETDATE()
);
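
The transform step (Processed = 0) and the load step (Processed = 1) both filter on the status column, so as the staging table grows it is worth indexing it. A minimal sketch (the index name and included columns are illustrative):

-- Hypothetical supporting index for the Processed-state scans
CREATE NONCLUSTERED INDEX IX_Staging_Orders_Processed
    ON Staging_Orders (Processed)
    INCLUDE (OrderId, CustomerName, Amount);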

b) Final Table

CREATE TABLE Fact_Orders
(
    OrderId VARCHAR(50) PRIMARY KEY,
    CustomerName VARCHAR(200),
    CleanedAmount DECIMAL(18,2),
    LoadedDate DATETIME DEFAULT GETDATE()
);

2. Architecture Diagram

+-----------------------+
|   Source System       |
| (API/File/Database)   |
+-----------+-----------+
            |
            ↓
+-----------------------+
| .NET Worker Service   |
|  Extract Module       |
+-----------+-----------+
            |
            ↓
+-----------------------+
| SQL Staging Table     |
+-----------+-----------+
            |
            ↓
+-----------------------+
| SQL Transform SP      |
+-----------+-----------+
            |
            ↓
+-----------------------+
| Target Reporting DB   |
+-----------------------+

3. Sequence Diagram

C# Worker → Source API: Fetch orders
Source API → C# Worker: Return JSON
C# Worker → SQL: Insert into Staging_Orders
C# Worker → SQL: Execute sp_Transform_Orders
SQL → C# Worker: Success response
C# Worker → SQL: Insert into Fact_Orders

4. Step-by-Step ETL Implementation

Step 1: Extract

The .NET worker fetches raw data from the source API or a file.

public async Task ExtractAsync()
{
    // Pull the raw order feed from the source API
    var data = await _http.GetStringAsync("https://example.com/orders");

    // Hand the raw JSON to the repository for staging
    await _repo.InsertStagingAsync(data);
}
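
Here _http and _repo are assumed to be constructor-injected. A minimal sketch of the surrounding service (the type names are illustrative, not from this article):

public class EtlService : IEtlService
{
    // HttpClient is typically supplied by IHttpClientFactory;
    // IOrderRepository is a hypothetical wrapper exposing InsertStagingAsync.
    private readonly HttpClient _http;
    private readonly IOrderRepository _repo;

    public EtlService(HttpClient http, IOrderRepository repo)
    {
        _http = http;
        _repo = repo;
    }
}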

Step 2: Load Raw Data into Staging

public async Task InsertStagingAsync(string json)
{
    // Newtonsoft.Json: deserialize the raw payload into DTOs
    var orders = JsonConvert.DeserializeObject<List<OrderDto>>(json)
                 ?? new List<OrderDto>();

    foreach (var order in orders)
    {
        _context.Staging_Orders.Add(new StagingOrders {
            OrderId = order.Id,
            CustomerName = order.Name,
            Amount = order.Amount,
            RawJson = JsonConvert.SerializeObject(order) // keep the raw record for auditing
        });
    }

    await _context.SaveChangesAsync();
}
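
The OrderDto shape is implied by the mapping above; a minimal matching class (adjust the property names to the real API payload):

public class OrderDto
{
    public string Id { get; set; }
    public string Name { get; set; }
    public decimal Amount { get; set; }
}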

Step 3: SQL Transformation Logic (Stored Procedure)

CREATE PROCEDURE sp_Transform_Orders
AS
BEGIN
    SET NOCOUNT ON;

    -- Clean unprocessed staging rows in place and flag them as transformed
    UPDATE S
    SET
        S.CustomerName = UPPER(S.CustomerName),
        S.Amount = ISNULL(S.Amount, 0),
        S.Processed = 1
    FROM Staging_Orders S
    WHERE S.Processed = 0;
END
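
The worker in section 5 also calls a TransformAsync method, which only needs to execute this procedure. One way to do that with EF Core (a sketch, assuming the same DbContext as the other snippets):

public async Task TransformAsync()
{
    // Run the set-based cleanup inside SQL Server instead of in C#
    await _context.Database.ExecuteSqlRawAsync("EXEC sp_Transform_Orders");
}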

Step 4: Load Final Data

public async Task LoadAsync()
{
    // Pick up rows the transform step has flagged (Processed = 1)
    var list = await _context.Staging_Orders
        .Where(x => x.Processed == 1)
        .ToListAsync();

    foreach (var s in list)
    {
        _context.Fact_Orders.Add(new FactOrders {
            OrderId = s.OrderId,
            CustomerName = s.CustomerName,
            CleanedAmount = s.Amount
        });

        s.Processed = 2; // 2 = loaded into Fact_Orders
    }

    await _context.SaveChangesAsync();
}
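
Note that Fact_Orders keys on OrderId, so re-running the load after a partial failure can raise duplicate-key errors. A defensive, set-based alternative is to skip rows that already exist (a sketch, not code from the steps above):

-- Insert only staging rows whose OrderId is not yet in Fact_Orders
INSERT INTO Fact_Orders (OrderId, CustomerName, CleanedAmount)
SELECT S.OrderId, S.CustomerName, S.Amount
FROM Staging_Orders S
WHERE S.Processed = 1
  AND NOT EXISTS (SELECT 1 FROM Fact_Orders F WHERE F.OrderId = S.OrderId);

UPDATE S
SET S.Processed = 2
FROM Staging_Orders S
WHERE S.Processed = 1;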

5. Creating the .NET Background Worker

public class ETLWorker : BackgroundService
{
    // _service wraps the three pipeline steps (interface name illustrative)
    private readonly IEtlService _service;

    public ETLWorker(IEtlService service) => _service = service;

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            await _service.ExtractAsync();
            await _service.TransformAsync();
            await _service.LoadAsync();

            // Simple built-in scheduler: run every five minutes,
            // and stop waiting promptly when shutdown is requested
            await Task.Delay(TimeSpan.FromMinutes(5), stoppingToken);
        }
    }
}
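
The worker runs only after it is registered with the host. A typical Program.cs for a .NET worker service looks like this (using the illustrative IEtlService/EtlService names from the earlier sketches):

using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

var builder = Host.CreateApplicationBuilder(args);

// Register the ETL service (with a typed HttpClient) and the background worker;
// the repository and DbContext would be registered the same way
builder.Services.AddHttpClient<IEtlService, EtlService>();
builder.Services.AddHostedService<ETLWorker>();

await builder.Build().RunAsync();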

6. Exception Logging Table

CREATE TABLE ETL_ErrorLog
(
    Id BIGINT IDENTITY(1,1) PRIMARY KEY,
    Step VARCHAR(50),
    ErrorMessage NVARCHAR(MAX),
    CreatedDate DATETIME DEFAULT GETDATE()
);
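
Each step can then wrap its work in a try/catch and record failures. A minimal sketch, assuming an EF Core entity (EtlErrorLog here is a hypothetical name) mapped to the table above:

try
{
    await _service.TransformAsync();
}
catch (Exception ex)
{
    // Persist the failure so the dashboard and email alert can surface it
    _context.ETL_ErrorLog.Add(new EtlErrorLog
    {
        Step = "Transform",
        ErrorMessage = ex.ToString()
    });
    await _context.SaveChangesAsync();
}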

7. Handling Large Datasets

  • Use table partitioning

  • Use BULK INSERT for large JSON files

  • Use batch processing (500–5000 rows); a batched-load sketch follows this list

  • Use SQL Server Agent for scheduling if not using .NET Worker
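
For the batch-processing point, one common pattern is to claim a fixed-size chunk of transformed rows, copy it, and repeat until nothing is left, so no single statement touches millions of rows. A T-SQL sketch using the tables from this article (in production, wrap each iteration in a transaction):

DECLARE @Moved TABLE (StageId BIGINT PRIMARY KEY);

WHILE 1 = 1
BEGIN
    DELETE FROM @Moved;

    -- Claim one batch of transformed rows and remember which ones they were
    UPDATE TOP (1000) S
    SET S.Processed = 2
    OUTPUT inserted.StageId INTO @Moved
    FROM Staging_Orders S
    WHERE S.Processed = 1;

    IF @@ROWCOUNT = 0 BREAK;

    -- Copy just that batch into the reporting table
    INSERT INTO Fact_Orders (OrderId, CustomerName, CleanedAmount)
    SELECT S.OrderId, S.CustomerName, S.Amount
    FROM Staging_Orders S
    INNER JOIN @Moved M ON M.StageId = S.StageId;
END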

8. Dashboard Mockup (Text Version)

------------------------------------------------------------
| ETL JOB DASHBOARD                                        |
------------------------------------------------------------
| Last Run Time: 12:45 PM                                  |
| Status: Success                                          |
| Extracted Records: 450                                   |
| Transformed Records: 448                                 |
| Loaded Records: 448                                      |
| Errors: 2  (View Details)                                |
------------------------------------------------------------
| [Run ETL Now] [View Logs] [Staging Data] [Final Data]    |
------------------------------------------------------------
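
The record counts on such a dashboard fall straight out of the Processed states and the error log; a sketch of the underlying query:

-- Processed: 0 = extracted only, 1 = transformed, 2 = loaded
SELECT
    COUNT(*)                                        AS ExtractedRecords,
    SUM(CASE WHEN Processed >= 1 THEN 1 ELSE 0 END) AS TransformedRecords,
    SUM(CASE WHEN Processed = 2 THEN 1 ELSE 0 END)  AS LoadedRecords,
    (SELECT COUNT(*) FROM ETL_ErrorLog
     WHERE CreatedDate >= CAST(GETDATE() AS DATE))  AS Errors
FROM Staging_Orders;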

9. Adding Email Notification (Optional)

// After each run, alert the team if any step logged errors
if (errors > 0)
{
    await _email.SendAsync("ETL Errors Found", "Please check the ETL_ErrorLog table.");
}

10. Enhancements for Production

  • Implement retry logic (a sketch follows this list)

  • Maintain audit trail

  • Add data quality checks

  • Add CDC (Change Data Capture) integration

  • Use Message Queue (RabbitMQ / Azure Service Bus)
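
For the retry item, a small hand-rolled helper is enough to start with (a sketch; libraries such as Polly provide richer policies):

// Hypothetical helper: runs one ETL step, retrying with a growing delay
private static async Task RunWithRetryAsync(Func<Task> step, int maxAttempts = 3)
{
    for (var attempt = 1; ; attempt++)
    {
        try
        {
            await step();
            return;
        }
        catch when (attempt < maxAttempts)
        {
            // Wait before the next attempt; the final failure propagates
            await Task.Delay(TimeSpan.FromSeconds(30 * attempt));
        }
    }
}

// Usage inside the worker loop:
// await RunWithRetryAsync(() => _service.ExtractAsync());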