While building star schemas in a data warehouse, the dimension tables are joined with the fact table. This ensures the tracking of attributes of the dimension.
An example could be customers and their sales activities. We can track sales information using various attributes of a customer such as a city, state, zip code, etc. However, what if the dimensions keep changing over time? Say, if the customer moves from Los Angeles to New York, how would you associate this new city with that customer’s sales and other activities?
It is precisely to tackle this problem that we have 3 common types of SCD techniques that can be implemented while designing the dimension table in the data warehouse.
Achieve business success with our data analytics consulting services
What are the Types of Slowly Changing Dimensions, Actions?
The most popular approaches of how to deal with SCD are as follows
- Type 0 SCD – The Fixed Method
- Type 1 SCD – Overwriting the old value by new values
- Type 2 SCD – Creating a new additional record by row versioning
- Type 3 SCD – Adding a new column to show the previous value
- Type 4 SCD – Using historical table
- Type 6 SCD – Combine approaches of types 1,2,3 (1+2+3=6) or Hybrid SCD
- Type 1 SCD (Overwriting)
The first type of SCD action possible is overwriting. Here, dimension values are overwritten by new values. For example, if our customer Mike moves from Los Angeles to New York, then his city will be updated with the latest value, i.e New York
Original Record
Customer ID | Name | City |
1111 | Mike | Los Angeles |
2222 | Jane | San Francisco |
Updated Record
Customer ID | Name | City |
1111 | Mike | New York |
2222 | Jane | San Francisco |
2. Type 2 SCD (Row versioning)
The second type of SCD action is row versioning. In this type 2 SCD, when values for a current record change, the current record is marked as closed and a new record gets inserted. Then, there will be 2 records associated with Mike in the updated table, but only the latest version will be marked “open”.
Original Record
Customer ID | Master Customer ID | Name | City | IsActive |
1111 | 1111 | Mike | Los Angeles | 1 |
2222 | 2222 | Jane | San Francisco | 1 |
Updated Record
Customer ID | Master Customer ID | Name | City | IsActive |
1111 | 3333 | Mike | Los Angeles | 0 |
2222 | 2222 | Jane | San Francisco | 1 |
3333 | 3333 | Mike | New York | 1 |
3. Type 3 SCD (Adding previous value column)
The third common SCD type is adding a previous value column. Here, the previous and current versions are maintained in a single row. In the customer MIke moving address example we would copy Mike’s ‘Current City’ record into the ‘Previous City’ and overwrite the Current City record with the new city.
Original Record
Customer ID | Name | City |
1111 | Mike | Los Angeles |
2222 | Jane | San Francisco |
Updated Record
Customer ID | Name | Current City | Previous City |
1111 | Mike | New York | Los Angeles |
2222 | Jane | San Francisco |
Other Types
In addition to these three types of SCDs, there are also Type 4 and Type 6. In Type 4, the dimension table has the latest value while its history is maintained in a separate table. In Type 6, a combination of Type 1, 2 & 3 are used to track changes in dimension. Generally, Type 6 is adopted in scenarios where multiple parts of a record are slowly changing dimensions, but using multiple implementations of a single type could lead to issues with rapid inflation of table size.
That is part 1 of the blog post. In part 2, I will be focusing on Type 2 SCD and the issues that are likely to crop up during implementation.