BIntelligence: 2014

Thursday, March 13, 2014

Business Intelligence in HealthCare

One of the industries that seem to be adopting Business Intelligence at a fast rate is the Healthcare industry. Almost all of the large hospitals are using Business Intelligence platforms or tools and/or using information management services for several years. Even though health care industry is very well advanced technically, it is facing some serious issues related to finance. The diagnosis and treatment options available to patients are the most sophisticated in the world. New therapies and life saving pharmaceuticals are coming onto the market every year.

All of the databases in healthcare organizations contain a vast store of clinical, operational and insurance data, including patient histories, supply inventories, vendor invoices, admissions data and unstructured information such as patient verbatims, radiology images and scanned documents.With this data, hospitals are looking to find business solutions related to costs, revenues, quality of patient care and so on. A sample snippet of a healthcare organization's bus matrix.

Standard Architecture of Health-Care Information Systems

Some of the topics that hospitals look to BI tools to answer are:

Reduce costs
Methods to increase revenue
Manage supply chain
Analysis on effective vs non effective patient care
Claims and bill payment

A star schema can be useful in answering the Business related questions pertaining to Health Care. An example of a star schema is as follows. This is an accumulating snapshot fact table for Claims and bill payment.

There are instances where a patient might be diagnosed with multiple issues. Rather than having multiple foreign key references to an "issue" table it is preferred if we had a single diagnosis key which contains information about a patients specific diagnosis.

We model this by using bridge tables

Diagnosis group Key: Many-to many join to a diagnosis group bridge table which contains a separate row for each individual diagnosis in a particular group.

Some of the popular tools used currently in the health care industry are:

IBM Cognos
Microstrategy
Oracle
SAS
Tableau

Wednesday, February 26, 2014

Some facts are factless !

Even though the entire point of creating a fact table is to store some sort of measures, there do exist some fact tables without measures. These are called as Factless Fact table. There are situations in which an intersection of dimensions makes sense in data modelling. They are used to capture information and not perform any calculations. Factless fact tables do not capture any numeric or textual facts.

They are usually used in two kind of situations:

To record an event

Factless fact tables are used to track some sort of event such as student attendance or registration or tracking accident related events. These are events or activities that we wish to track but there are no measurements to record. In these scenarios, the grain of the fact table will be the event that occurred. If we track student attendance, the fact tables and dimension tables will look like this:

This dimensional schema can be used to answer many questions such as:

• Which classes were the most heavily attended?
• Which teachers taught classes in facilities belonging to other departments?
• Which facilities were the most lightly used?
• What was the average total walking distance of a student in a given day

To record certain conditions

These are mainly for negative analysis report. It is easy to track products that were sold and revenue generated on those products because there is an easily identifiable grain or transaction. But if we want to track the products that were not sold for promotion purposes, we might need Factless tables. It is often referred to as coverage tables.

The fact table can help us identify these details:

• Number of products that have promotions
• Number of products that have promotions that sell
• Number of products that have promotions that did not sell

These are mostly for administrative purposes.

References:

http://www.kimballgroup.com/1996/09/02/factless-fact-tables/
http://dwhlaureate.blogspot.com/2012/08/factless-fact-table.html
The Data Warehouse Toolkit Third Edition - Ralph Kimball & Margy Ross

Tuesday, February 11, 2014

Slowly Changing !

After having gone through the basics of dimensional modelling, it is time to get into a bit of details. It’s hard to sequentially go over all the topics of this course because the volume is enormous. Hence I am going to pick one topic (somewhat related to what I spoke about in my last blog) and then go deep into it.

There are certain dimensions that change over time and they are referred to as Slowly Changing Dimensions. Some of the dimensions that could change over time are selling prices, names of people, cities and so on. There are different techniques used to accommodate these changes into our dimensional model. They are called as different types of slowly changing dimensions.

Type 0: Retain original

With type 0, the dimension value never changes and hence the original always remains the same. An example of a type 0 is date dimension. Hence there is no need to change anything in the model.

Type 1: Overwrite

In this method, the new value overwrites the old value. It always reflects the most recent data. Hence no history is maintained. This method is easy to implement.

For example, if the state of a customer changes:

Customer ID	Name	State
C001	James	Tucson

After update:

Customer ID	Name	State
C001	James	Phoenix

Type 2: Add New Row

Type 2 requires us to add an additional row with the updated information. The old information is retained in the table. This helps us to track the history of the data. Three new columns will be added to indicate the latest information.

Row effective date or date timestamp
Row expiration date or date timestamp
Current row indicator

Customer ID	Name	State
C001	James	Tucson

After update:

Customer ID	Name	State	Row effective date	Row expiration date	Current row indicator
C001	James	Tucson	2/2/2014	2/11/2014	0
C002	James	Phoenix	2/11/2014		1

Type 3: Add new attribute

A new attribute is added to the dimension to preserve the old attribute values; the new value overwrites the old value just like in type 1. This is also referred to as alternate reality and this technique is used infrequently.

Customer ID	Name	State
C001	James	Tucson

After update:

Customer ID	Name	Original state	Current state
C001	James	Tucson	Phoenix

Kimball talks about another 4 types of slowly changing dimensions in his book.

Type 4: Mini-dimension

When a group of attributes are changed suddenly and they kind of form another group from the dimension, it is referred to as mini-dimension. Using this we capture the base dimension and the mini-dimension. Also called as rapidly changing monster dimension. Mini-dimension also requires its own primary key.

Type 5: Add mini-dimension and Type 1 Outrigger

Type 5 builds on type 4 by embedding a current type 1 reference to the mini-dimension in the base table. This technique is used to preserve the historical attribute values.

Type 6: Add Type 1 Attributes to Type 2 dimension

Similar to type 5, type 6 builds on the type 2 technique by embedding the current type 1 versions of the same attributes in the dimension row so that fact rows can be filtered or grouped.

Type 7: Dual Type 1 and Type 2 Dimensions

This is the final hybrid technique used to support both history and current attribute values. A fact table can be accessed from the type 1 dimension model which will give the current values or from the type 2 dimension model which will give the historical profiles.

References:

The Data Warehouse Toolkit Third Edition - Ralph Kimball & Margy Ross
http://dwhlaureate.blogspot.com/2012/09/

Tuesday, February 4, 2014

And the fact continues..!

So after about 3 weeks I think I am getting a hang of Business Intelligence. I am going to continue from where I left in my last blog. So as far as I understand, it all comes down to identifying grain, facts and dimensions.

Grain: Identifying the grain pretty much defines the type of data that we are modelling. The grain establishes exactly what a single fact row table represents. The grain have to be identified before capturing the facts and dimensions required for the modelling. Atomic grain refers to the lowest level at which data is captured by the business.

Facts: Facts are the things we measure as a result of a business process and is almost always a numeric value. Within a fact table, only facts consistent with the declared grain are allowed. It is the fact table that gives us the insight into all the aggregation and reports. The measures stored in the fact table is used for the calculations. Fact table mostly consists of foreign keys.

Dimensions: it is a collection of reference information about a measurable fact. The data that doesn’t change very often are stored in the dimension table. Dimension table will have one primary key and all the related data.

Star schema typically consists of a fact table and all the related dimension tables. The facts and dimensions are joined together in a star schema. Below is an example of a star schema. Retail sales is the fact table with dimensions product, promotion, clerk, customer, store and date.

Four-step Dimensional Design Process

The major steps involved during the design phase for dimensional modelling are:

Select the business process: Understanding of the business process is required for us to understand the information that we are trying to model.
Declare the grain: After understanding the business, we should be able to identify and declare the grain.
Identify the dimensions: Model the data which is going to be either constant or going to vary after a long time. This will be stored in the dimension table. Also assign the primary key.
Identify the facts: Understand and identify the measurable data which goes as measures and dimensions which needs to be stored as foreign keys in the fact table.

References:

http://searchdatamanagement.techtarget.com/definition/dimension

The Data Warehouse Toolkit Third Edition - Ralph Kimball, Margy Ross

Friday, January 24, 2014

MIS 587 Simplified .. In my head

There are hundred things going over my head when I think of Data Warehousing and Business Intelligence (DW/BI). But where to start and how to put it all down, I am still not sure.

Anyway, everything starts and ends with data. Data is growing rapidly every minute. How can we convert this data into useful information and use this data to add value to any business? This is the basic purpose of any kind of data analysis. Role of DW/BI is to extract the historical data, compare it with current data and perform any kind of analysis that will answer critical questions for various businesses. I am going to try and put some context here. If I am analyzing the accounting data of a firm, it will give me insights into things like,

Is there a way to reduce the expenses of the firm?
If a firm has given away discounts for a week, has it really made a profit for the firm?
If a subscription has been given away for customers, do they really stay after the subscription has expired?

And these are the questions I am trying to answer for my project as well. So I am pretty much surrounded by data and analytics for a while. These kind of questions can be easily answered using data analytics, DW/BI.

Now that I have explained what is “my understanding” of DW/BI, let us get a bit technical. We need the data in a form that is easy to understand and provides fast query performance. This is achieved using a technique called as “Dimensional Modelling”. It is quite different from Relational Database Management Systems in the sense that, RDBMS looks to remove redundancy by normalizing the data whereas dimensional modelling looks at simpler and faster retrieval of data.

One of the ways Dimensional Models are implemented is using Star Schema. Quite honestly, I used to think star schema is the most complex thing in the world. But once I understood what it was, it changed my perception of BI to being slightly simpler than I thought. Though it is way too early to pass this judgment. Star schema is all about facts and dimensions.

I am going to keep the discussion of grains, facts and dimensions for the next time.

References:

The Data Warehouse Toolkit - Third Edition, Ralph Kimball and Margy Ross

http://www.greenbookblog.org/2012/03/21/big-data-opportunity-or-threat-for-market-research/