A B2B sales startup is creating a customer dimension table as part of its star schema model.
The table is used to track customer relations.
The startup collects details such as Tax Registration number, mobile phone number, email-id, and pincode.
There are some instances where the email-id and mobile phone numbers change.
Whenever the Customer updates the data at source, it should overwrite the value in this table.
Which of the following options describes the most suitable method?
A. B. C. D.
Correct Answer: B.
In a dimension table like this, some columns hold data that does not change once entered.
Here, the focus is on the columns whose data can change later.
In the question, attributes such as email-id and mobile phone number must be updated whenever the customer changes them.
This clearly points to the use of a Slowly Changing Dimension (SCD).
Because the new value should overwrite the old one, Type 1 fits: it reflects the latest value in the table once it is updated in the source.
Option A is incorrect: Type 2 SCDs are mainly for versioning, that is, keeping a history of changes.
Option B is correct: Type 1 overwrites the existing value, so the table always holds the latest values from the source.
Option C is incorrect: its main purpose is to store data versions in different columns, which the scenario does not require.
Option D is incorrect: this is a combination of Type 1, 2, and 3, which adds history tracking that the scenario does not require.
The most suitable method for creating the customer dimension table in this scenario is to use a Type 1 Slowly Changing Dimension (SCD1) approach.
Explanation: Slowly Changing Dimensions (SCD) refer to the way a data warehouse handles dimension attributes that change over time. In other words, the SCD type determines whether a change overwrites the old value or is tracked as history. The main types are Type 1, Type 2, and Type 3; Type 6 is a hybrid of these three.
Type 1 SCD approach involves overwriting the old data with new data, and only the most recent version is kept. This method is useful when the history of the data is not important.
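The overwrite behaviour can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table name dim_customer, its columns, the sample values, and the helper apply_type1 are assumptions made for the example and do not come from the question.

```python
import sqlite3

# Hypothetical customer dimension; all names and values are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE dim_customer (
           customer_id TEXT PRIMARY KEY,  -- identifier from the source system
           tax_reg_no  TEXT,
           email       TEXT,
           mobile      TEXT
       )"""
)
conn.execute(
    "INSERT INTO dim_customer VALUES "
    "('C001', 'TAX-123', 'old@example.com', '555-0100')"
)


def apply_type1(conn, customer_id, email, mobile):
    """Type 1: overwrite the changed attributes in place; no history is kept."""
    conn.execute(
        "UPDATE dim_customer SET email = ?, mobile = ? WHERE customer_id = ?",
        (email, mobile, customer_id),
    )


# The customer changes email and mobile number at the source.
apply_type1(conn, "C001", "new@example.com", "555-0199")

rows = conn.execute(
    "SELECT customer_id, email, mobile FROM dim_customer"
).fetchall()
print(rows)  # still one row for C001, now holding only the latest values
```

After the update the old email and mobile number are gone, which is acceptable only when history is not needed.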
Type 2 SCD approach, on the other hand, keeps a full history of all changes made to the data. In this method, a new record is created for each change. Each new record has its own unique surrogate (primary) key but shares the same natural (business) key as the original record, which links the versions together; effective dates or a current-row flag typically identify the active version. This method is useful when the history of the data is important.
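A minimal sketch of this versioning pattern, again using sqlite3, might look as follows. The columns customer_sk (a row-level primary key, unique per version), customer_id (the customer identifier from the source, shared by all versions), valid_from, valid_to, and is_current, as well as the helper apply_type2, are assumptions chosen for illustration.

```python
import sqlite3

# Hypothetical versioned customer dimension; all names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE dim_customer (
           customer_sk INTEGER PRIMARY KEY AUTOINCREMENT,  -- unique per version
           customer_id TEXT,      -- shared by all versions of one customer
           email       TEXT,
           valid_from  TEXT,
           valid_to    TEXT,      -- NULL while the version is current
           is_current  INTEGER
       )"""
)


def apply_type2(conn, customer_id, email, change_date):
    """Type 2: close the current version (if any) and add a new row."""
    conn.execute(
        "UPDATE dim_customer SET valid_to = ?, is_current = 0 "
        "WHERE customer_id = ? AND is_current = 1",
        (change_date, customer_id),
    )
    conn.execute(
        "INSERT INTO dim_customer "
        "(customer_id, email, valid_from, valid_to, is_current) "
        "VALUES (?, ?, ?, NULL, 1)",
        (customer_id, email, change_date),
    )


apply_type2(conn, "C001", "old@example.com", "2023-01-01")  # initial load
apply_type2(conn, "C001", "new@example.com", "2024-06-01")  # email changes

history = conn.execute(
    "SELECT customer_sk, customer_id, email, valid_to, is_current "
    "FROM dim_customer ORDER BY customer_sk"
).fetchall()
for row in history:
    print(row)  # one row per version; only the last one has is_current = 1
```

Here the old email stays in the table as a closed version instead of being overwritten.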
Type 3 SCD approach stores both old and new data in a single record, typically by adding a column that holds the previous value. This method is useful when only the most recent prior value of a few attributes needs to be kept.
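One common way to keep old and new data in the same record is an extra column for the previous value. The sketch below assumes that layout; the column previous_email and the helper apply_type3 are hypothetical names used only for this example.

```python
import sqlite3

# Hypothetical customer dimension with a "previous value" column.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE dim_customer (
           customer_id    TEXT PRIMARY KEY,
           email          TEXT,   -- current value
           previous_email TEXT    -- value before the latest change
       )"""
)
conn.execute("INSERT INTO dim_customer VALUES ('C001', 'first@example.com', NULL)")


def apply_type3(conn, customer_id, new_email):
    """Type 3: shift the current value into previous_email, then overwrite.

    In SQL the right-hand sides of SET are evaluated against the row's
    values before the update, so previous_email receives the old email.
    """
    conn.execute(
        "UPDATE dim_customer SET previous_email = email, email = ? "
        "WHERE customer_id = ?",
        (new_email, customer_id),
    )


apply_type3(conn, "C001", "second@example.com")
apply_type3(conn, "C001", "third@example.com")

row = conn.execute(
    "SELECT email, previous_email FROM dim_customer WHERE customer_id = 'C001'"
).fetchone()
print(row)  # current and immediately preceding value only
```

After two changes, the first email is no longer available: this layout keeps only one step of history per attribute.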
Type 6 SCD approach is a hybrid that combines Types 1, 2, and 3 (1 + 2 + 3 = 6). In this method, a new record is created for each change, just like in Type 2 SCD approach. In addition, each record carries a current-value column, as in Type 3, and that column is overwritten on all of the entity's records whenever the value changes, as in Type 1. This method is useful when both the full history and the current value are needed on every record.
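Type 6 is commonly described as a hybrid of Types 1, 2, and 3, as the explanation of option D also notes. The following is a rough sketch of one such hybrid layout, not a definitive design; the columns email_at_version and current_email and the helper apply_type6 are assumptions made for the example.

```python
import sqlite3

# Hypothetical hybrid dimension; all names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE dim_customer (
           customer_sk      INTEGER PRIMARY KEY AUTOINCREMENT,
           customer_id      TEXT,
           email_at_version TEXT,    -- value when this row was created
           current_email    TEXT,    -- latest value, repeated on every row
           is_current       INTEGER
       )"""
)


def apply_type6(conn, customer_id, new_email):
    """Hybrid: add a version row and refresh current_email on all rows."""
    # Type 2 part: retire the current row and insert a new version.
    conn.execute(
        "UPDATE dim_customer SET is_current = 0 "
        "WHERE customer_id = ? AND is_current = 1",
        (customer_id,),
    )
    conn.execute(
        "INSERT INTO dim_customer "
        "(customer_id, email_at_version, current_email, is_current) "
        "VALUES (?, ?, ?, 1)",
        (customer_id, new_email, new_email),
    )
    # Type 1 part: overwrite the current-value column on every version.
    conn.execute(
        "UPDATE dim_customer SET current_email = ? WHERE customer_id = ?",
        (new_email, customer_id),
    )


apply_type6(conn, "C001", "old@example.com")
apply_type6(conn, "C001", "new@example.com")

versions = conn.execute(
    "SELECT customer_sk, email_at_version, current_email, is_current "
    "FROM dim_customer ORDER BY customer_sk"
).fetchall()
for v in versions:
    print(v)  # every row shows the latest email next to its own version
```

The extra bookkeeping only pays off when history is required, which is not the case when the table should simply hold the latest value.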
In this scenario, the requirement is that a change to customer data like email-id or mobile phone number at the source should overwrite the value in the dimension table. The Type 1 SCD approach does exactly this. The historical values are lost, but the scenario does not ask for them.
Type 2 SCD approach is not suitable, as it creates a new record for each change to preserve history instead of overwriting the value.
Type 3 SCD approach is not suitable either, as it keeps the old value alongside the new one in a single record, which is not required here.
Type 6 SCD approach is also not required for the given scenario, as it builds on Type 2 history tracking.
Therefore, the most suitable method for this scenario is to use Type 1 SCD approach, as it overwrites the old email-id and mobile phone number with the latest values from the source.