You have two databases with same size, then dow do you identify which is OLTP or OLAP?
Normalized model is OLTP, Denormalized model is OLAP
Need of Data warehouse?
1)To Analysis of data and history maintenance.
2)Companies require Strategic information to face the competition in market.
3)The Operation system are not designed for strategic information.
4)To maintain history of data for whole Organization and to have a single place where the entire data stored.
What is data warehousing and Explain Approaches?
Many companies follow either characteristic defined by W.H.Inmon or Sean kelly.
Subjected Oriented, Integrated, Non Volatile,Time Variant
b)Sean Kelly definition
Seperate, Available, Integrated,TimeStamped,Suject Oriented, Non Volatile, Accessible
There are two Approches
1.Top Down by Inmon
2.Bottom Up by Ralph kimbal
Inmon approach:Enterprise datawarehouse structured first and next Datamart created.(TopDown).
Ralph kimbal:Datamart designed first, later Datamarts to Datawarehouse designed.(BottomUp).
What are the responsibilities of a data warehouse consultant/professional?
The basic responsibility of a data warehouse consultant is to ‘publish the right data’.
Some of the other responsibilities of a data warehouse consultant are:
1. Understand the end users by their business area, job responsibilities, and computer
2. Find out the decisions the end users want to make with the help of the data warehouse.
3. Identify the ‘best’ users who will make effective decisions using the data warehouse
4. Find the potential new users and make them aware of the data warehouse.
5. Determining the grain of the data.
6. Make the end user screens and applications much simpler and more template driven.
What are fundamental stages of Data Warehousing?
a)Offline Operational Databases - Data warehouses in this initial stage are developed by simply copying the database of an operational system to an off-line server where the processing load of reporting does not impact on the operational system's performance.
b)Offline Data Warehouse - Data warehouses in this stage of evolution are updated on a regular time cycle (usually daily, weekly or monthly) from the operational systems and the data is stored in an integrated reporting-oriented data structure.
c)Real Time Data Warehouse - Data warehouses at this stage are updated on a transaction or event basis, every time an operational system performs a transaction (e.g an order or a delivery or a booking etc.)
d)Integrated Data Warehouse - Data warehouses at this stage are used to generate activity or transactions that are passed back into the operational systems for use in the daily activity of the organization.
What is Datamart Explain Types?
It is a specific Subject area or Functionality or Task.
It is Designed to facilitate end user Analysis.
Wrong Answer-- It is a subset of warehouse
Types of Datamarts
Dependent, Independent, Logical.
Dependent: Created from Datawarehouse as a seperate physical store.
Independent :Created directly from operational systems to a seperate physical store.
Logical: Exists as a subset of existing Datawarehouse.
SSRS Interview Questions ( Simple, Medium and Complex)
What is Report?
Data display component with visuals is called report
Show the column headings in all pages,by going to advanced.
RepeatOn Newpage: True
It will show table header in every page]
Displaying Fixed Number of Rows per SSRS Report Page
Report, Page--->InteractiveSize---> Height
Change the height properties by 5.5 and you will be able to show 20 rows per page.
PageBreaks: Between each instance of a group
One option is to generate group rows in sql query, for this you can use row_number() or to evenly devide you can use NTILE() in sql query.
Based on that group number, you can create a group and set a page break after each group. So, only 10 rowes will get displayed in each page.
Report, Page--->InteractiveSize---> Height
Change the height properties by 5.5 and you will be able to show 20 rows per page.
Scenario: Display Male when true is choose, Display Female info When False choosen.
a) Create parameter
b) Goto Dataset
Where p.Gender= (Case when @ParBool='True' then 'Male' else 'Female' End)
c) Preview--> True-->View report displays only Male data
Q: How many reports generated in your 3 years of exp?
First check interviewer mood and knowledge [Presence of mind]
I have created and manipulated many reports in the last three years.
It could be around 40-50.
Q: How many days you took to create a report?
We have different types of reporting situations and took time accordingly.
Some of them in a day, in a week, in a month and dash board kind of stuff months.
SSIS Interview Questions ( Simple, Medium and Complex)
Q: Which edition of MSBI is used by the companies?
Ans: Business Intelligence Edition
Q: What is SSIS?
Ans: It is Integration of three services[ETL, Windows Admin,Sqlserver Admin]
Q: What is the previous component of SSIS?
Ans: DTS[Data Tranmsformation Services]
Q: SSIS version differences.
Q: How many engines does SSIS use and explain the operations?
Ans: Two engines SSIS use a) SSIS Runtime Engine--Starts when pkg starts
2) Data Pipeline engine--Starts when DataFlow Task Starts
Q: What is inmemory process?
Ans: DataFlow opertation is Inmemory process. It retrives source data into
SSIS internal memory (inmemory) and perform operations (options) and loads
from the memory to Target.
Q: What is Solution, Project and Package?
Solution--Is a container for one or more projects
Project--Collection of packages and related files
Package--Collection of tasks and containers.
Q: Whare the components a package has and explain in single line?
It has five components.
a) Control flow-- controls various tasks and containers with precedence constraints
b)Data Flow--contains sources, targets and transformations
c) Event Handler--Perfoms action based on event
d) Parameters--Similar to variables but take user interaction, helpful in catalog deployment
e) Package Explorer--Single view of all components inside package
Q: Difference between control flow and data flow?
Ans: COntrol flow is the brain of apckage and controls various tasks and containers.
It has ETL,WindowsAdmin and SqlServer Admin tasks.
Data flow is a component inside control flow and it is heart of ETL process.
It transfors data from sources to destinations by doing operations.
Q: Difference between Variable and Parameter?
a)Take value and support user interaction at runtime
b)Scope is across package
a) Take value and does not support user interaction
b) Scope at Package/ Container/ Task Level
c) Variables of two types 1. System Defined 2. User Defined variables
Q: What is the difference between System defined and User defined variables?
System defined variables crated by the system at the time of package creation
Store under System namespace.
Generally they hold Audit information
Ex: Execution Time,Package Name, User Name etc
User defined variables crated by the user.
Store under User namespace.
Generally they hold user information
Three scopes a)Package b)Container c) Task
Ex: Numeric value 10
String Value MSBIClass etc...
Q: What are the extensions of Solution, Project,Package, Paramter and connection Strings?
Connection Manager: .conmgr
Q: When do we Miscellaneous folder in SSIS?
Ans: Generally Business Docs, LLDs, and other credential docs are maintained here.
Q: Difference between COntainer and Group?
Group is logical object which hold tasks and containers.
No properties associated to group.
Groups are ungrouped.
Container is physical object which hold tasks and containers.
Properties associated to container.
Containers support only deletion.
Q: How do yopu identify package and task functionality easily?
Ans: By using naming conventions and annotations[detailed descriptions]
Q: How many ways we can execute package ?
Ans: 8 ways [Materail]
Q: How many colors a package uses?
Ans: 5 colors
White--Ready to execute
Q: How many types of files SSIS support
Ans: 4 types a) Flatfile b)Excel c) XML d) Raw file
Note: Refer to material for explanation
Q: What is CSV file and how do you open it?
Ans: Comma Seperated Value file, we open in two ways
1. Flatfile 2. Excel
Q: How do you use csv file in SSIS?
Ans: Flatfile source
Q: Which extension of flatfile data you get mostly?
a) txt--text file
c) out--output file
d)csv--comma seperated value file
Q: What is delimiter and explain some dlimiters?
Ans: Data seperator is delimiter.
Ex: , | : $ & @ LF CRLF etc...
Q: What is CRLF?
Ans: Carriage return[First char of a line] line feed/ forward [New line]
New line first character--Enter key
Q: How many types of flatfiles avaulable?
Ans: Two types a) Fixed length b) Delimiter
Q: What is TextQualifier, how is it helpful?
Ans: Qulifies the text with some indicators.
Ex: 'vinay'--single quote is text qualifier
Q: How many settings we need to know before processing flatfile?
Ans: 7 settings [ refer to material]
Q: What is codepage, which code page is useful in realtime to process all types of data?
Ans: Encrypted charset is codepage,in realtime Unicode or UCS [2/4/8] suitable.
Q: Data loading how many ways in real time?
a) Truncate and load
b) Incremental load [it is of three types 1.Direct incremental 2. SCD load 3. CDC Load]
Q: What is Delta Load?
Ans: Differentiated records load is delta load.
Q: What is SCD?
Ans: Slowly changing dimension for implementung actions for new and modified records.
Q: What is CDC?
And: Change Data Capture for taking sources changes and doing some actions.
Source changes are Inserts,updates and deletes.
Q: Can you give alternate terminologies for source, target, import and export?
Ans: Source--> Data producer/provider/feed/input
Target-->Desintation/ Data consumer/ Data output
Q: Can yopu explain me Dataflow task features briefly?
Ans: Moves data from source to destination.
Uses internal buffer while processing [in memory process]
Each data flow task initiates one Execution Tree internally.
Q: Why DataFlow task runs faster than all other tasks in SSSI?
Ans: Seperate process [in memory process] and seperate engine.
Q: What type of default datatypes dows a Flatfile and Excel take?
Ans: Flatfile take : DT_STR Excel ake:DT_Wstr
Q: What is FastParse option in flatfile source?
Ensures faster compilation or validation incase of date and numeric columns.
If we don't mention this option at flatfile source, but you specified to one of the
column as DT_I4, then every value validated for numeric evalution.
Q: What is the diff between External and Output cols?
External cols are source cols, where the propertie should not be modified.
Output cols are columns passed to the next component,where the properties could
Q: Why SQL command option is helpful to process source data?
Ans: We will get required columns and rows from single or multiple tables.
Reduces network traffic, burden and load.
Q: What types of data loadng you have in the project?
Ans: In my DWH, there are hundreds of tables and multiple types of loading.
a) Direct Lod (Truncate and Load)
b) Incrmental Load (Direct b/ Delta Load)
c) Incrmental Load (SCD)
d) Incremental Load (CDC)
Q: How SCD is different from CDC?
Source changes carry forwarding is called as CDC [Inserts,updates and deletions]
SCD handle new and old records by doing Insert and Update operations.
Q: Which SCD method you use most in real time?
Depending on table situation we go for SCD type.
Some tables are Non history, Some tables are full history and some are partial history.
But I found mosty SCD date process for analytical reporting purpose.
Q: Explain about SSIS Isolation Levels?
There are 7 options for this setting.
Unspecified: A different isolation level than the one specified is being used, but the level cannot be
determined. Huh? I think that’s just another way of saying that the isolation level isn’t set for this
container but that it is set for the package, or container that the task resides in.
Chaos: The same as ReadUncommitted but it checks the isolation level of pending transactions during a write
operation so that transactions with more restrictive isolation levels are not overwritten.
ReadUncommitted: Does not lock so a transaction can read uncommitted rows (dirty read).
ReadCommitted: Can only read records from committed transactions and issues a lock during the read. However
it does not prevent records from being added, deleted or changed which can allow for a repeatable read (phantom
RepeatableRead: Locks are held until the transaction completes. It also means that the transaction cannot read
data that has changed but has not been committed (ReadCommitted).
Serializable: Locks the entire data set until the transaction completes. It guarantees that the result set
does not change during the transaction
Snapshot: The transaction takes a snapshot of the read at the beginning of the transaction. As such, data will
not change. This property cannot be set for a package. It is an available option but it throws an error.