Wednesday, February 26, 2014

Some facts are factless !

Even though the entire point of creating a fact table is to store some sort of measures, there do exist some fact tables without measures. These are called as Factless Fact table. There are situations in which an intersection of dimensions makes sense in data modelling. They are used to capture information and not perform any calculations. Factless fact tables do not capture any numeric or textual facts.

They are usually used in two kind of situations:                    


                To record an event

Factless fact tables are used to track some sort of event such as student attendance or registration or tracking accident related events. These are events or activities that we wish to track but there are no measurements to record. In these scenarios, the grain of the fact table will be the event that occurred. If we track student attendance, the fact tables and dimension tables will look like this:


This dimensional schema can be used to answer many questions such as:

Which classes were the most heavily attended? 
Which teachers taught classes in facilities belonging to other departments? 
Which facilities were the most lightly used? 
What was the average total walking distance of a student in a given day

To record certain conditions

These are mainly for negative analysis report. It is easy to track products that were sold and revenue generated on those products because there is an easily identifiable grain or transaction. But if we want to track the products that were not sold for promotion purposes, we might need Factless tables. It is often referred to as coverage tables. 



The fact table can help us identify these details:

Number of products that have promotions
Number of products that have promotions that sell
Number of products that have promotions that did not sell

These are mostly for administrative purposes. 

References:

http://www.kimballgroup.com/1996/09/02/factless-fact-tables/
http://dwhlaureate.blogspot.com/2012/08/factless-fact-table.html
The Data Warehouse Toolkit Third Edition - Ralph Kimball & Margy Ross

Tuesday, February 11, 2014

Slowly Changing !

After having gone through the basics of dimensional modelling, it is time to get into a bit of details. It’s hard to sequentially go over all the topics of this course because the volume is enormous. Hence I am going to pick one topic (somewhat related to what I spoke about in my last blog) and then go deep into it.

There are certain dimensions that change over time and they are referred to as Slowly Changing Dimensions. Some of the dimensions that could change over time are selling prices, names of people, cities and so on. There are different techniques used to accommodate these changes into our dimensional model. They are called as different types of slowly changing dimensions.

Type 0: Retain original
With type 0, the dimension value never changes and hence the original always remains the same. An example of a type 0 is date dimension. Hence there is no need to change anything in the model.

Type 1: Overwrite
In this method, the new value overwrites the old value. It always reflects the most recent data. Hence no history is maintained. This method is easy to implement.

For example, if the state of a customer changes:

Customer ID
Name
State
C001
James
Tucson

After update:

Customer ID
Name
State
C001
James
Phoenix

Type 2: Add New Row
Type 2 requires us to add an additional row with the updated information. The old information is retained in the table. This helps us to track the history of the data. Three new columns will be added to indicate the latest information.
  • Row effective date or date timestamp
  • Row expiration date or date timestamp
  • Current row indicator

Customer ID
Name
State
C001
James
Tucson

After update:

Customer ID
Name
State
Row effective date
Row expiration date
Current row indicator
C001
James
Tucson
2/2/2014
2/11/2014
0
C002
James
Phoenix
2/11/2014

1


Type 3: Add new attribute
A new attribute is added to the dimension to preserve the old attribute values; the new value overwrites the old value just like in type 1. This is also referred to as alternate reality and this technique is used infrequently.

Customer ID
Name
State
C001
James
Tucson

After update:

Customer ID
Name
Original state
Current state
C001
James
Tucson
Phoenix

Kimball talks about another 4 types of slowly changing dimensions in his book.

Type 4: Mini-dimension
 When a group of attributes are changed suddenly and they kind of form another group from the dimension, it is referred to as mini-dimension. Using this we capture the base dimension and the mini-dimension. Also called as rapidly changing monster dimension. Mini-dimension also requires its own primary key.

Type 5: Add mini-dimension and Type 1 Outrigger
Type 5 builds on type 4 by embedding a current type 1 reference to the mini-dimension in the base table. This technique is used to preserve the historical attribute values.

Type 6:  Add Type 1 Attributes to Type 2 dimension
Similar to type 5, type 6 builds on the type 2 technique by embedding the current type 1 versions of the same attributes in the dimension row so that fact rows can be filtered or grouped.

Type 7: Dual Type 1 and Type 2 Dimensions
This is the final hybrid technique used to support both history and current attribute values. A fact table can be accessed from the type 1 dimension model which will give the current values or from the type 2 dimension model which will give the historical profiles. 

References:

The Data Warehouse Toolkit Third Edition - Ralph Kimball & Margy Ross
http://dwhlaureate.blogspot.com/2012/09/

Tuesday, February 4, 2014

And the fact continues..!

So after about 3 weeks I think I am getting a hang of Business Intelligence. I am going to continue from where I left in my last blog. So as far as I understand, it all comes down to identifying grain, facts and dimensions.

       Grain: Identifying the grain pretty much defines the type of data that we are modelling. The grain establishes exactly what a single fact row table represents. The grain have to be identified before capturing the facts and dimensions required for the modelling. Atomic grain refers to the lowest level at which data is captured by the business.

         Facts: Facts are the things we measure as a result of a business process and is almost always a numeric value. Within a fact table, only facts consistent with the declared grain are allowed. It is the fact table that gives us the insight into all the aggregation and reports. The measures stored in the fact table is used for the calculations. Fact table mostly consists of foreign keys.

        Dimensions: it is a collection of reference information about a measurable fact. The data that doesn’t change very often are stored in the dimension table. Dimension table will have one primary key and all the related data.

Star schema typically consists of a fact table and all the related dimension tables. The facts and dimensions are joined together in a star schema. Below is an example of a star schema. Retail sales is the fact table with dimensions product, promotion, clerk, customer, store and date.



Four-step Dimensional Design Process
The major steps involved during the design phase for dimensional modelling are:
  •  Select the business process: Understanding of the business process is required for us to understand the information that we are trying to model.
  • Declare the grain: After understanding the business, we should be able to identify and declare the grain.
  • Identify the dimensions: Model the data which is going to be either constant or going to vary after a long time. This will be stored in the dimension table. Also assign the primary key.
  • Identify the facts: Understand and identify the measurable data which goes as measures and dimensions which needs to be stored as foreign keys in the fact table.
References:
The Data Warehouse Toolkit Third Edition - Ralph Kimball, Margy Ross