top of page

SCD Types in Data Warehousing

  • Writer: vsowmiya28
    vsowmiya28
  • Nov 21, 2022
  • 4 min read

Updated: Jan 11, 2023

What are Slowly Changing Dimensions?

  1. Dimensions that change slowly over time.

  2. Dimensions that store and manage both current and historical data over time in a Data Warehouse.

  3. Techniques to manage history within data warehouse.

  4. Historical change of data over time becomes important in DWH as it used for analytical purposes.

Example: User - A

Chennai → Bangalore → Mumbai → Delhi

(9th Nov 19) (10th Nov 20) (20th Dec 21) (5th Oct 22)


Suppose User - A has stayed in city Chennai on 9th Nov 19. Then User - A switches to Bangalore on 10th Nov 20, then Mumbai on 20th Dec 21 and currently User - A is in Delhi as on 5th Oct 22.

Here Dimension is the column name - city, date.

Both City and date has been slowly changed over time.


SCD Types in Data Warehouse:


  1. Type 0 - Fixed Dimension.

  2. Type 1 - Overwrite old data and no history retention.

  3. Type 2 - Maintain all history versions of data.

  4. Type 3 - Maintain limited history.

  5. Type 4 - Combination of Type 1 and Type 2.

  6. Type 6 - Hybrid.


Type 0 SCD:

  1. Data in these dimensions are fixed and never changes.

  2. Once the data is loaded into these tables, it remains fixated.

Drawbacks:

  1. There is no history data tracking available.

  2. Data remains static and is not suitable for analytics.

Examples:

  1. Country, TimeZone

  2. Country, Currency

  3. Company, Employee


Type 1 SCD:

  1. Data in these dimensions are over-written i.e., old value is replaced with new value. So, recent and updated information alone is available.

  2. It is mainly useful for correcting errors.

  3. This type is widely used in real-time projects.

Drawbacks:

  1. History won’t be retained.

  2. Auditing of data is not possible.

Example:

Table A - Initial Data





Table A - Updated Data





In this table, if we want to change the city from Bangalore to Chennai, then old city value Bangalore will be deleted and new value Chennai will be written.


Type 2 SCD:

  1. The data column value which is updated is stored as new row and the old value also exists in a new row.

  2. Reporting and analytics can be done if data is stored in this type.

  3. New columns are generated to identify the updated and historical record.

  4. This is widely used in real-time projects.

Drawbacks:

  1. Huge storage is required as all the history versions of data is retained.

  2. Additional column fields needs to be included to identify the updated and historical data.


Here, three columns CHANGE FLAG, EFFECTIVE START DATE, EFFECTIVE END DATE is inserted to track the updated and historical records.

CHANGE FLAG - Indicates whether the particular column value is active or inactive.

Active indicates the current/updated/recent value.

Inactive indicates the historical value.

EFFECTIVE START DATE - Indicates from which date a particular record is active or inactive.

EFFECTIVE END DATE - Indicates till which date a particular record is active or inactive.


TYPE 3 SCD:

  1. Add a new column with updated value instead of new row to reflect the changes in the data.

  2. In this method, only limited history is maintained.

  3. It is easy for reporting as it enables back and forth switching of data.

Drawbacks:

  1. It is not suitable for changing column values like Place, Country, Address, Pincode.

  2. It is suitable for use cases where changes are limited.

Example: User - A Table


In above table, city for Sunit got changed from Chennai to Bangalore and both the versions of city are tracked in the same table.


Type 4 SCD:

  1. Current and historical data are kept in separate tables so that history data & updated data can be captured.

  2. It is a combination of Type 1 and Type 2 SCD as we are overwriting the data in both current and historical tables as well as history is maintained by keeping different tables based on the year/month depending on the use case.

  3. This method is used in real-time projects.

Drawbacks:

  1. More Storage and Maintenance is required for both current and historical tables.

Example:

Current Year - 2022

Table A - Initial Data





Table A - Updated Data





Historical Year - 2021

Table B - Initial Data





Table B - Updated Data





Here, current year 2022 data is stored in Table A and data is overwritten every time in order to keep only the updated data.

Historical year 2021 data is stored in Table B and data is overwritten every time in order to keep only the updated data.

Since, historical data is kept in a separate table, history versions are tracked effectively.


Type 6 SCD:

  1. It is hybrid type which is a combination of Type 1, Type 2 and Type 3.

  2. New column will be inserted to track the updated value - Type 3.

  3. CHANGE FLAG, EFFECTIVE START DATE, EFFECTIVE END DATE columns are inserted to track the current and historical records - Type 2.

  4. Newly created column values are over-written every time to keep only the recent history versions - Type 1.

  5. This is not widely used in real-time projects.

Drawbacks:

  1. More complex to implement and stores a lot of redundant data.

Example:


Here, NEW CITY column will have the updated city values.

Sunit has changed two cities Chennai and Kolkata.

So, for rows containing Chennai and Kolkata as CITY column values, NEW CITY column value has to be updated with Mumbai.

Raju has changed two cities Bangalore and Mumbai.

So, for rows containing Bangalore and Mumbai as CITY column values, NEW CITY column value has to be updated with Kolkata.

CHANGE FLAG, EFFECTIVE START DATE and EFFECTIVE END DATA columns are inserted to track the current and historic records.

Comments


Contact Me

  • LinkedIn
  • Instagram
  • Facebook

Thanks for contacting me😊
I will reach you soon🙂

bottom of page