Tuesday, May 24, 2011

DWH CONCEPTS


DIMENSION TABLE VS FACT TABLE
 
Ans
DIMENSION TABLE
FACT TABLE
It provides the context /descriptive information for a fact table measurements.
It provides measurement of an enterprise.
Structure of Dimension - Surrogate key , one or more other fields that compose the natural key (nk) and set of Attributes.
Measurement is the amount determined by observation.
Size of Dimension Table is smaller than Fact Table.
Structure of Fact Table - foreign key (fk), Degenerated Dimension and Measurements.
. In a schema more number of dimensions are presented than Fact Table.
Size of Fact Table is larger than Dimension Table.
Surrogate Key is used to prevent the primary key (pk) violation(store historical data).
In a schema less number of Fact Tables observed compared to Dimension Tables.
Provides entry points to data.
Compose of Degenerate Dimension fields act as Primary Key.
Values of fields are in numeric and text representation.
Values of the fields always in numeric or integer form.

What is Normalization?

Normalization is the process of efficiently organizing data in a database. There are two goals of the normalizaton process::

Eliminating redundant data.
Ensuring data dependencies.

First Normal Form

First Normal form(1 NF) sets the very basic rules for an organized database.
  • Eliminate duplicative columns from the same table
  • Create seperate tables for each group of related data and identify each row with a unique column or set of columns(the promary key)
Second Normal Form                                                                                                
Second Normal form(2 NF) further addresses the concept of removing duplicative data.
  • Meet all the requirements of teh first normal form.
  • Remove subsets of data that apply to multiple rows of a table and place them in seperate tables.
  • Create relationships between these new tables and their predecessors through the use of foreign keys.
Third Normal Form

Third Normal form(3 NF) remove columns which are not dependent upon the primary key.
  • Meet all the requirements of the second normal form.
  • Remove columns that are not dependent upon the primary key.

Junk dimension
A Junk Dimension is a dimension table consisting of attributes that do not belong in the fact table or in any of the existing dimension tables. The nature of these attributes is usually text or various flags, e.g. non-generic comments or just simple yes/no or true/false indicators.

Junk dimensions are also appropriate for placing attributes like non-generic comments from the fact table. Such attributes might consist of data from an optional comment field when a customer places an order and as a result will probably be blank in many cases. Therefore the junk dimension should contain a single row representing the blanks as a surrogate key that will be used in the fact table for every row returned with a blank comment field

Degenerate dimension

A dimension key, such as a transaction number, invoice number, ticket number, or bill-of-lading number, that has no attributes and hence does not join to an actual dimension table. Degenerate dimensions are very common when the grain of a fact table represents a single transaction item or line item because the degenerate dimension represents the unique identifier of the parent. Degenerate dimensions often play an integral role in the fact table's primary key.

Role-playing dimensions

Dimensions are often recycled for multiple applications within the same database. For instance, a "Date" dimension can be used for "Date of Sale", as well as "Date of Delivery", or "Date of Hire". This is often referred to as a "role-playing dimension".

STAR SCHEMA

Star schema architecture is the simplest data warehouse design. The main feature of a star schema is a table at the center, called the fact table and the dimension tables which allow browsing of specific categories, summarizing, drill-downs and specifying criteria.
Typically, most of the fact tables in a star schema are in database third normal form, while dimensional tables are de-normalized (second normal form).

Fact table
   The fact table is not a typical relational database table as it is de-normalized on purpose - to enhance query response times. The fact table typically contains records that are ready to explore, usually with ad hoc queries. Records in the fact table are often referred to as events, due to the time-variant nature of a data warehouse environment.
The primary key for the fact table is a composite of all the columns except numeric values / scores (like QUANTITY, TURNOVER, exact invoice date and time).
Typical fact tables in a global enterprise data warehouse are (apart for those, there may be some company or business specific fact tables)
Dimension table
Nearly all of the information in a typical fact table is also present in one or more dimension tables. The main purpose of maintaining Dimension Tables is to allow browsing the categories quickly and easily.
The primary keys of each of the dimension tables are linked together to form the composite primary key of the fact table. In a star schema design, there is only one de-normalized table for a given dimension.
Typical dimension tables in a data warehouse are:

time dimension table
customers dimension table
products dimension table
key account managers (KAM) dimension table
sales office dimension table

4 comments: