Databases & DBA  

ETL Design Pattern to Implement SCD Type 2 Using SQL, SSIS, or dbt

Introduction

In this article, we’ll walk through the ETL design pattern for implementing SCD Type 2 using SQL logic, SSIS (SQL Server Integration Services), and dbt (Data Build Tool).

Objective of SCD Type 2

To maintain historical versions of dimension records by creating new rows when attribute values change.

Typical dimension fields include:

  • Surrogate_Key
  • Natural_Key (e.g., Customer_ID)
  • Attribute_1, Attribute_2, etc.
  • Effective_Date, End_Date
  • Is_Current (Y/N)

✍️ SCD Type 2 Logic in SQL

Here's a simplified SQL-based pattern:

-- Step 1: Detect Changes
SELECT src.*
FROM staging_table src
JOIN dimension_table tgt
  ON src.Customer_ID = tgt.Customer_ID
WHERE src.City <> tgt.City AND tgt.Is_Current = 'Y';

-- Step 2: Mark old row as expired
UPDATE dimension_table
SET End_Date = CURRENT_DATE - 1,
    Is_Current = 'N'
WHERE Customer_ID IN (...);

-- Step 3: Insert new row with new surrogate key
INSERT INTO dimension_table
(Customer_ID, City, Effective_Date, End_Date, Is_Current)
SELECT Customer_ID, City, CURRENT_DATE, NULL, 'Y'
FROM staging_table
WHERE ...;

⚖️ Implementing with SSIS

  1. Source & Lookup: Load source (staging) data and perform a lookup against the target dimension
  2. Change Detection: Use Conditional Split to identify new, unchanged, and changed records
  3. Handle Expired Rows: Update the End_Date and Is_Current using an OLE DB Command
  4. Insert New Rows: Add a Derived Column for Effective_Date, set Is_Current to 'Y', and insert
  5. Surrogate Keys: Use an Identity column or a Sequence

🔧 Using dbt for SCD Type 2

In dbt, this logic is done in a dbt model using incremental strategies:

{{ config(materialized='incremental', unique_key='customer_id') }}

WITH staged AS (
    SELECT * FROM {{ ref('stg_customer') }}
),
changes AS (
    SELECT *
    FROM staged
    WHERE NOT EXISTS (
        SELECT 1 FROM {{ this }}
        WHERE customer_id = staged.customer_id
          AND city = staged.city
          AND is_current = 'Y'
    )
)
SELECT *,
    CURRENT_DATE AS effective_date,
    NULL AS end_date,
    'Y' AS is_current
FROM changes

You also need a separate model or logic to expire current rows (is_current = 'Y')

📊 Best Practices

  • Always define a business key (e.g., Customer_ID)
  • Use audit columns like load_date, batch_id
  • Be mindful of performance; avoid full table scans
  • Keep surrogate keys separate from business logic

🟦 Conclusion

Implementing SCD Type 2 ensures your dimensional data supports full historical reporting. Whether you're using SQL scripts, SSIS workflows, or dbt models, the pattern is similar: detect changes, expire old rows, and insert new ones.

Up Next: We'll explore how to build a Data Mart from scratch, covering dimensional design, fact/dimension identification, ETL planning, and how it all fits into your BI ecosystem.