What’s wrong with this data mart model?
The sales fact table has an associated dimension which describes the sale record. There is a surrogate key relationship between the fact and dimension tables but there’s also the business key (SaleNumber) from the source system.
Why is this bad?
- There will be a one-to-one relationship between the fact table and the dimension, which means the dimension is potentially going to be huge. Joining between fact and this dimension will be slow at query time.
- The dimension is very poorly built for Analysis Services. If Status only has three possible values but those three are being repeated on every line instead of normalised into their own dimension, SSAS build times will be bigger than they should be.
- If you have facts that get updated in the source, you are going to have to manage this in two places in the data warehouse. A type 2 SCD with millions of rows in it like this one is going to be painfully slow to populate on your ETL runs.
- Some of the attributes should have their own dimensions – status, payment method, order method, carrier – or maybe a junk dimension. This will compress the dimension to the unique values of each of these fields rather than making it as big as the fact table itself.
- The CarriageWeight field is meaningless as an attribute because no-one would ever want to analyse orders by their carriage weight. So it should be included on the fact table, though in an SSAS cube its AggregationType would be set to None.
- Delivery address fields could go in several places:
- Into a delivery address dimension
- Onto the fact table if they aren’t used for analysis purposes
- Into the customer dimension if each customer only has one delivery address
So – dimensions which describe facts in a one-to-one way should be avoided.