Data Huge Storage.
Only Databases used for DWHs.
Teradata, Oracle, SQL Server , DB2 etc...
See the model of the database, if it is normalized model then OLTP, otherwise OLAP.
OLTP DB:
- Structure / Model: Normalized structure
-More tables available, Small sized tables.
-No duplicates [ no redundancy]
-Suitable for Select, Insert, Update, Delete etc... operations [ Volatality]
-To work with a specific business operation, -multiple tables requred, hence multiple joins required.
-Access frequency is high
Ex: Trains tickets booking store database
-Sizes in real-time: GBs-TB only [<1 TB]
OLAP DB:
-Structure / Model: De-Normalized structure
-Less tables available [but more data volumed tables]
-More duplicates [redundant data], hence we are able to maintain History
-Suitable for Retrieval or Analysis of Data [ Non-Volatality]
-To work with a specific business operations, less tables required, hence less joins sufficient.
-Access frequency is low
Ex: Train tickets historical bookings data store
-Sizes in real-time: TBs-PBs.
-BOA Production DWH size: 3 PB
-DBS Bank: 10 PB
1 Byte= 8 Bits
1 KB= 1024 Bytes
1 MB=1024 KB
1 GB=1024 MB
1 TB=1024 GB
1 PB=1024 TB
etc...
a) Characterstics / Principles [Inmon, Kimball, Kelly etc...]
b) Approach [Topdown Inmon approach / Bottomup Kimball approach]
c) Life Cycle [ Agile Waterfall, Iterative Incremental etc...]
Inmon Characterstics, so we called him with gratitude as "Father of DWH".
He suggested four principles to construct a DWH strongly
a)Subject Oriented
Keep the data in subject area based or business functionality based. So that analysis will be easier and respective team can go for it.
These technically called as "Datamarts".
Ex:
DWH Example: ICICI DWH
Datamart example: Savings account datamart, Current account datamart
b) Integrated
While loading data from different systems into DWH, we should follow some standard rules or protocols.
c) Non-Volatile
Do not change the data and always accumulate [increment] data, so that history maintained, and support detailed analysis.
d) Time varient
If you store data based on timeframes, then it is easier for the below types of analysis.
a) Current understanding
b) Past analysis
c) Future prediction
There are two types of approaches
a) Top-Down approach: DWH first----> Datamart [Inmon approach]
b) Bottom-up approach: Datamarts----> DWH [Kimball approach]
Most of the companies following KIMBALL approach because it is cost / resource / memory effective.
Mostly organizations using "Agile methodology".
Agile: Agility -->Quickly Changing
--Small teams [5-8 members]
--Fastest development and deployment approach
--Cross Functional Teams available
Requirements Gathering
Analysis
Design
Code
Test
Peer-Review
Deploy
--Daily Scrum meeting [ 15 minutes and Standup meeting]
DWH Projects require 3 Life cycles
a) ETL Life cycle [above steps]
b) Semantic Life Cycle [above steps]
c) Reporting Lfe Cycle [above steps]
Few tables loaded daily, few tables weekly, few tables monthly....
This is called "granularity". The level of data you are maintaining in DWH projects.
Low Grain and Recommended: "Day" [So that all upper levels of analysis we can do easily]
Hierarchy contain multiple levels and members. Minimum two levels required.
We usually create on textual columns.
Ex:
Location hierarchy
Continent-->Country-->States-->Districts-->Mandals-->Villages-->Wards
Date hieararchy:
Day-->Week-->Fortnight-->Month-->Quarter--> Semister--> Year-->Quadyear (leap year)--> Decade-->Century etc...
Time Hierarchy:
Milliseconds-->Seconds-->Minutes-->Hours
It is the combination of date and time value. Based on the milliseconds we would recognize the timestamp type.
YYYY-MM-DD HH:MI:SS.NNNNNN
As we have 6 Ns, it is timestmap (6).
Online Analytical processing.
There are 4 types of OLAP
a) ROLAP (Relational OLAP): Data and Aggregated information in relational format.
Ex:
Location wise total business value.
b) MOLAP (Multidimensional OLAP): Data and aggreated information in multidimensional format (cube area)
c) HOLAP (Hybrid OLAP): Data in relational format and aggregated data in multidimensional format
d) DOLAP (Desktop OLAP): Data and aggregated data in the form of documents [excel, foxpro,lotus etc...]
In real-time
DWH and BI projects mostly use MOLAP
Small and medium analytical applications in ROLAP and DOLAP (HR team, Executives, Managers etc...)
Here the warehouse support all types of data [structured, semi structured, and unstructured] and possible for all storages [cloud, on-premises].
Companies using along with regular warehouse, Data lakes for this purpose.
Data Lake: Which stores all types of data, structured data in "relational format", and semi structured and unstructured data in "file stream format"
Popular lakes are a) Azure Data Lake b) Hadoop Data Lake
Simply insight talks about indetailed data presentation.
Quick Insights are automatically generated visuals in Power BI.
What are the famous big data storages in Azure?
Blob storage, Data Lake, AZURE SQL Data warehouse, Snow Flake etc.
Now a days in two places
a) On-Premises
SQL Database
Datawarehouse
Analysis Services Databases [tabular model and multidimensional model]
b) Cloud [Azure]
SQL Database
Datawarehouse
Analysis Services Databases [Tabular Model]
Data Lakes
Blob storage [Binary Large Objects]