Modeling the data. Arrangement of data inside database.
If we don't arrange properly data inside database, it may not be useful up to the mark.
So, arranging data helps to retrive faster (reporting and analysis happen faster)
and reduces storage time (load time decreases).
Textual attributes do not support aggregate operations, where asmeasrable attributes support.
There are types of models
a) E-R Model (Normalization model)
Usage: OLTP systems follow Normalization methodology
Terminology: Entity, attribute, primary key, foreign key etc...
Entity-->Is an object which is differentiated to another object.
Entity holds textual and measuable information.
Ex: Product: prid(int) pname(char) pcode(int) pcost (decimal) psales(decimal)
Organization: orid(int),orname(varchar),orincome (decimal),orgrowthper(int)
Textual info: prid,pname,pcode, orid,orname
Measurable info: pcost,psales,orincome,orgrowthper
Relationships: 1-1: One entity related to another
1-Many : One entity related to many other
Many-Many: Many entities related to many other
b) Denormalization Model (Dimensional model)
Usage: OLAP, DSS and BI applications
Terminology: Dimension, Fact, [composite] primary key, surrogate key etc...
Dimension: Holds only Textual info
Fact: Holds only Measurable info
Relation Ships: Star schema,snow flake schema, Hybrid scheme and Galaxy schema.
Dimension is a textual attribute. If you keep set of textual attributes, then it is a dimension table.
Features:
a) Master table
b) Contains primary key
c) Answer for textual questions such as What, Where, When, Why etc...
d) Rare change table
e) Direct load table
f) More columns [wide table]
g) Less rows
Fact is a measurable attribute.
It has two sections
a) Foreign key references from dimension
b) Measures
Features:
a) Child/ Trnsaction table
b) Contains composite primary key (with set of foreign key combination)
c) Answer for Measurable questions such as How much, How many etc...
d) Frequently change table [based on the business]
e) InDirect load table [a) Dimensions verification first b) Fact load]
f) Less columns
g) More rows and years and years [Deep table]
It is a unique values key which is generated by the technical people in BI projects.
So, we call it as BI Key / DWH Key.
Features:
a) Contain unqiue values
b) Values may be populated from database sequence generation method or manual algorithm
c) Non change of values
Real time usage:
a) In fact table to reduce record identification complexity
b) In dimension table to reduce the natural primary key changes affect in the fact table.
a) Star schema : Fact table surrounded by dimensions
b) Snow flake schema : fact surrounded by dimensions and dimensions having sub dimensions.
c) Mixed schema / Star Flake schema: Combination of star and snow flake
d) Galaxy schema / fact constellation schema : Multiple fact tables share common dimensions.
a) Conformed
b) Junk
c) Degenrated
d) Role playing
e) Dirty
f) SCD--Slowly Changing Dimension
a) Fully additive
b) Semi additive
c) Non additive
A Fact Table without any value added facts.
Situations: Event Covering, Event Recording etc.
Example: Student attendance fact table does not hold any value added facts.
a) Fully additive
b) Semi additive
c) Non additive
My project DWH model is Galaxy or star flake, where my application is in Star schema/ snow flake.
10-15 dimensions.More than this need more joins and decreases performance at the time of retrieval [ reporrting and analysis takes time]
Fact table basis.
If the fact table holds sales data, then it is Sales analytical system.
If it holds shipping data, then it is Shipping analytical system.
Master Dimensions--> Child dimensions-->Fact tables
Master dimension load-->Direct load
Child dimension load--> Validating aginst master and load
Fact load--> Validate against all dimensions and load.
Dimensions data change is rare change.
Fact data change is frequnt change [ Daily load]