DATA WAREHOUSE FREQUENTLY ASKED QUESTIONS
Data Warehouse Qestions
Differences between OLTP and OLAP?
Purpose: Day-Day business storage
Process: Front-End, Back-End process
a) Data: Less data
b) Model: Noramlized model (more tables,smaller tables)
c) Joins: More joins (So retrieval it is slower)
d) Access Frequency: High [Milliseconds to seconds]
e) Volatality: Volatile [Read, Ins, update, Delete]
Purpose: For analysis and mining
Process: Layered architecture
a) Data: More data
b) Model: DeNoramlized model (Less tables, complex tables)
c) Joins: Less joins (So retrieval it is faster)
d) Access Frequency: Low [Day/week/month...]
e) Volatality: NonVolatile [Read]
What is Datawarehouse [Single line] and warehouses available in the market?
Huge storage area which is suitable for decision making.
Ex: Teradata,Oracle,Sqlserver,Sybase,Netezza, GreenPlum etc...
Difference between OLAP and BI?
BI is a process which uses OLAP approach to gather, convert and presents the data.
Difference between Analysis and Mining?
Analysis talks about 'current understanding of data' and 'past analysis
of data',where as mining talks about 'future prediction of data'.
Note: Datamining is a knowledge analysis and discovery method.
How many types of decision making available in IT?
a) Strategic-- Past analysis [OLAP data]
b)Tactical--Current and recent [OLTP]
Difference between OLTP and ODS?
OLTP holds realtime data where as ODS holds near real time data[daily 4 times from OLTP]
ODS helps in two ways
a) Input to DWH
b) Incase enterprise database [OLTP] fails, it acts like enterprise db and continue
Note: All top companies use ODS in the ETL flow.
OLTP similar terminologies?
c)Business Processing Systems
d) Transactional systems
Can you describe an End-End typical ETL Flow?
How many life cycles does a DWH project need?
Three life cycles a)ETL b) Semnatic c) Reporting
Famous DWH life cycles: Agile, iterative incremental [Block approach], V model
What is granularity and which grain your project has?
Granularity talks about the level of detailed data maintaining in the project.
Dwh has multiple tables with multiple grains.
Always the lowest grain has lot of burden but improves analytical performance.
Ex: Day/ week/month etc..grains
Day grain is recommended for better analysis
Choose the below are required to implement a DWH Project?
a) Life Cycle
c) Data Model
d) Type of OLAP
f) Historical data
Correct; a, b, c, e
Explain DWH characterstics?
a) Subject oriented: Store data based on operation but not application.
Ex: Savings coount subject area
Application2 : a)Withdraw b)deposit
Subject Area: withdraw+deposit single area
b) Integrated: Integrate from diverse applications by eliminating inconsistencies
and following standards.
c) Nonvolatile: No change of data. [Don't modify the data]
Helps to maintain history.
d) Timevarient: Data store base on timeframes [granularity load / grain]
We can perform the below operations easily.
a) Current understanding
What is datamart and how many types available?
DataMart--Data sub store for specific business/ operation / functionality
May or not not be a subset of DWH.
Three types of datamarts
a)Dependent-- DWH created first and then datamart [subset of DWH]
Ex: ICICI DWH, Savings account dependent datamart
b) Independent-- Directly created from source systems [we don't use DWH here]
Ex: ICICI employee payroll System [No history, not much analysis required]
c) Logical Datamrt--It is replica of another data mart [Holds structure of datamart]
How do we create datamarts practically?
a) By using Complex Views [Materialized views] --Dependent DM
b) By using a complex table --Depedent DM
c) By using a seperate physical storage area --Independent DM
How many types of DWH approaches available, which approach your company following?
a) KIMBall approach--Datamarts-> DWH [Bottom up approach]
b) Inmon approach-- DWH -->Datamarts [Top down approach]
My company using KImbal approach for DWH implementation because of dynamic
decisions and adhod requests.
How many types of OLAP available, which is effective?
OLAP--Online Analytical Processing
It create aggregates(sum,avg, max,minstddev,covarience etc...) for decision making.
ROLAP: Relational OLAP--Data and aggrgates in the relational area [OLTP area]
Latency time less [fresh data anlysis]
Minimal set up
Ex: Small, Medium
MLOAP: Multidimensional OLAP-- Data and aggregates in mutidimensionalarea[ Cube]
Adv: More and detailed analysis this is helpful
Latency time is high [ fresh data analysis is not possible]
Complex set up
HOLAP: Data in relational area and aggreagates in multidimensional area [cube]
Ex: Latency time avg
Avg set up
DOLAP: Desktop OLAP: Here the data and aggregates on PC based desk top applications
Ex: Excel, LOtus, FoxPro,VISICALC etc...
Explain Time hierarchy?
Hour-->Day-->Week-->Fort Night-->Month-->Quarter-->half Year(Semister)-->Year
You have two databases with same size, then dow do you identify which is OLTP or OLAP?
Normalized model is OLTP, Denormalized model is OLAP.