Prep #07 - Understanding Slowly Changing Dimensions (SCD) in Data Warehousing, and How to Implement Them Right
What is a slowly changing dimension (SCD), and what are different ways to implement it in a data warehouse?
In every data warehouse, change is inevitable. But change is tricky.
When your dimensions change slowly and sporadically, like when a customer moves to a new city or changes their last name, how do you keep track? Do you overwrite the existing data? Do you preserve the old values for historical reporting? That’s where Slowly Changing Dimensions (SCD) come into play.
In this post, we'll break down what SCDs are, why they matter, and how to implement them using different strategies like Type 1, Type 2, and beyond.
Why SCD Matters
Imagine analyzing sales data. If you don’t preserve historical customer addresses, you may inaccurately attribute a sale to the wrong location. Understanding what was true at the time is critical.
SCD lets you manage how dimension data changes over time, whether you care about:
Current state only (Type 1)
Full historical record (Type 2)
Recent change only (Type 3)
The 5 Types of SCD (with Examples)
Type 0: No Change
Immutable attributes (e.g., Date of Birth)
Never updated once loaded
Type 1: Overwrite (No History)
Data is updated in-place
No record of previous value
Useful for corrections (like fixing typos)
UPDATE customer_dimension
SET address = 'New Address'
WHERE customer_id = 123;
Type 2: Full History (New Row)
Adds a new row for each change
Preserves historical data with dates/versioning
Ideal for audit trails and time travel
-- Close old record
UPDATE customer_dimension
SET end_date = CURRENT_DATE, is_current = 'N'
WHERE customer_id = 123 AND is_current = 'Y';
-- Insert new record
INSERT INTO customer_dimension (customer_id, address, start_date, end_date, is_current)
VALUES (123, 'New Address', CURRENT_DATE, NULL, 'Y');
Type 3: Limited History (New Column)
Adds a new column for the previous value
Only the immediate past value is available
Simple, but limited
UPDATE customer_dimension
SET previous_address = address, address = 'New Address'
WHERE customer_id = 123;
Type 6 (Hybrid)
Combines Type 1, 2, and 3
Supports overwrite, history, and previous value
Powerful, but complex to implement
SCD in Modern Data Platforms (Bonus)
Modern tools like Delta Lake make Type 2 easier using MERGE
statements, which automate a lot of the heavy lifting.
MERGE INTO customer_dimension target
USING staging_customer source
ON target.customer_id = source.customer_id AND target.is_current = 'Y'
WHEN MATCHED AND target.address <> source.address THEN
UPDATE SET end_date = CURRENT_DATE, is_current = 'N'
WHEN NOT MATCHED THEN
INSERT (customer_id, address, start_date, end_date, is_current)
VALUES (source.customer_id, source.address, CURRENT_DATE, NULL, 'Y');
Final Thoughts
SCD is a fundamental concept in data warehousing that bridges data integrity with business requirements.
If history doesn’t matter: Use Type 1
If full history matters: Use Type 2
If only recent changes matter: Use Type 3
If flexibility matters: Combine them with Type 6
Knowing when and how to use each makes you a far more thoughtful (and hireable) data engineer.