DATA MODELING FREQUENTLY ASKED QUESTIONS
Data Modeling Questions
What is Data Model?
Modeling the data. Arrangement of data inside database.
What is the purpose of data model?
If we don't arrange properly data inside database, it may not be useful up to the mark.
So, arranging data helps to retrive faster (reporting and analysis happen faster)
and reduces storage time (load time decreases).
How do you differentiate whether a field is textual / measurable?
Textual attributes do not support aggregate operations, where asmeasrable attributes support.
How many types of data models available?
There are types of models
a) E-R Model (Normalization model)
Usage: OLTP systems follow Normalization methodology
Terminology: Entity, attribute, primary key, foreign key etc...
Entity-->Is an object which is differentiated to another object.
Entity holds textual and measuable information.
Ex: Product: prid(int) pname(char) pcode(int) pcost (decimal) psales(decimal)
Organization: orid(int),orname(varchar),orincome (decimal),orgrowthper(int)
Textual info: prid,pname,pcode, orid,orname
Measurable info: pcost,psales,orincome,orgrowthper
Relationships: 1-1: One entity related to another
1-Many : One entity related to many other
Many-Many: Many entities related to many other
b) Denormalization Model (Dimensional model)
Usage: OLAP, DSS and BI applications
Terminology: Dimension, Fact, [composite] primary key, surrogate key etc...
Dimension: Holds only Textual info
Fact: Holds only Measurable info
Relation Ships: Star schema,snow flake schema, Hybrid scheme and Galaxy schema.
What is dimension and explain dimension table features?
Dimension is a textual attribute. If you keep set of textual attributes, then it is a dimension table.
a) Master table
b) Contains primary key
c) Answer for textual questions such as What, Where, When, Why etc...
d) Rare change table
e) Direct load table
f) More columns [wide table]
g) Less rows
What is Fact and explain Fact table features?
Fact is a measurable attribute.
It has two sections
a) Foreign key references from dimension
a) Child/ Trnsaction table
b) Contains composite primary key (with set of foreign key combination)
c) Answer for Measurable questions such as How much, How many etc...
d) Frequently change table [based on the business]
e) InDirect load table [a) Dimensions verification first b) Fact load]
f) Less columns
g) More rows and years and years [Deep table]
What is surrogate key and where do we use?
It is a unique values key which is generated by the technical people in BI projects.
So, we call it as BI Key / DWH Key.
a) Contain unqiue values
b) Values may be populated from database sequence generation method or manual algorithm
c) Non change of values
Real time usage:/li><
a) In fact table to reduce record identification complexity
b) In dimension table to reduce the natural primary key changes affect in the fact table.
What kind of relationships we can establish in dimensonal model?
a) Star schema : Fact table surrounded by dimensions
b) Snow flake schema : fact surrounded by dimensions and dimensions having sub dimensions.
c) Mixed schema / Star Flake schema: Combination of star and snow flake
d) Galaxy schema / fact constellation schema : Multiple fact tables share common dimensions.
How many types of dimensions available ?
d) Role playing
f) SCD--Slowly Changing Dimension
How many types of facts / measures available?
a) Fully additively
b) Semi additive
c) Non additive
What is Factless fact table?
What is cube and explain the advantages?.
Which model you are using in the project?
My project DWH model is Galaxy or star flake, where my application is in Star schema/ snow flake.
Usually how many dimensions connected to a fact table?
10-15 dimensions.More than this need more joins and decreases performance at the
time of retrieval [ reporrting and analysis takes time]
On what basis you identify the business analysis?
Fact table basis.
If the fact table holds sales data, then it is Sales analytical system.
If it holds shipping data, then it is Shipping analytical system.
Tell me the tables order of load in data model projectts?
Are there any hiccups?
Master Dimensions--> Child dimensions-->Fact tables
Master dimension load-->Direct load
Child dimension load-->Validatingaginst master and load
Fact load--> Validate against all dimensions and load.
Which tables data change freqently and tell me load process in realtime?
Dimensions data change is rare change.
Fact data change is frequnt change [ Daily load]