Introduction
In this article, we’ll walk through the ETL design pattern for implementing SCD Type 2 using SQL logic, SSIS (SQL Server Integration Services), and dbt (Data Build Tool).
Objective of SCD Type 2
To maintain historical versions of dimension records by creating new rows when attribute values change.
Typical dimension fields include:
Surrogate_Key
Natural_Key
(e.g., Customer_ID)
Attribute_1
, Attribute_2
, etc.
Effective_Date
, End_Date
Is_Current
(Y/N)
✍️ SCD Type 2 Logic in SQL
Here's a simplified SQL-based pattern:
-- Step 1: Detect Changes
SELECT src.*
FROM staging_table src
JOIN dimension_table tgt
ON src.Customer_ID = tgt.Customer_ID
WHERE src.City <> tgt.City AND tgt.Is_Current = 'Y';
-- Step 2: Mark old row as expired
UPDATE dimension_table
SET End_Date = CURRENT_DATE - 1,
Is_Current = 'N'
WHERE Customer_ID IN (...);
-- Step 3: Insert new row with new surrogate key
INSERT INTO dimension_table
(Customer_ID, City, Effective_Date, End_Date, Is_Current)
SELECT Customer_ID, City, CURRENT_DATE, NULL, 'Y'
FROM staging_table
WHERE ...;
⚖️ Implementing with SSIS
- Source & Lookup: Load source (staging) data and perform a lookup against the target dimension
- Change Detection: Use Conditional Split to identify new, unchanged, and changed records
- Handle Expired Rows: Update the
End_Date
and Is_Current
using an OLE DB Command
- Insert New Rows: Add a Derived Column for
Effective_Date
, set Is_Current
to 'Y', and insert
- Surrogate Keys: Use an Identity column or a Sequence
🔧 Using dbt for SCD Type 2
In dbt, this logic is done in a dbt model
using incremental strategies:
{{ config(materialized='incremental', unique_key='customer_id') }}
WITH staged AS (
SELECT * FROM {{ ref('stg_customer') }}
),
changes AS (
SELECT *
FROM staged
WHERE NOT EXISTS (
SELECT 1 FROM {{ this }}
WHERE customer_id = staged.customer_id
AND city = staged.city
AND is_current = 'Y'
)
)
SELECT *,
CURRENT_DATE AS effective_date,
NULL AS end_date,
'Y' AS is_current
FROM changes
You also need a separate model or logic to expire current rows (is_current = 'Y'
)
📊 Best Practices
- Always define a business key (e.g., Customer_ID)
- Use audit columns like
load_date
, batch_id
- Be mindful of performance; avoid full table scans
- Keep surrogate keys separate from business logic
🟦 Conclusion
Implementing SCD Type 2 ensures your dimensional data supports full historical reporting. Whether you're using SQL scripts, SSIS workflows, or dbt models, the pattern is similar: detect changes, expire old rows, and insert new ones.
Up Next: We'll explore how to build a Data Mart from scratch, covering dimensional design, fact/dimension identification, ETL planning, and how it all fits into your BI ecosystem.