Ab initio - Data
Warehouse
A data warehouse is a relational
database that is designed for query and analysis rather than for transaction
processing.
Data Warehouse is a collection of
data to support management’s decision.
Characteristics of a data
warehouse as below:
•
Subject-Oriented -> to help you
analyze data
•
Integrated -> put data from disparate
sources into a consistent format
•
Time-Variant -> focus on change over
time
•
Non-volatile -> once entered into the
warehouse, data should not change
Advantages of Data Warehouses :
• Data warehouses store large volumes of data
which are frequently used by decision Support Systems
•
It is maintained separately from the
organization’s operational databases
•
Data warehouses are relatively static with only
infrequent updates
• A data warehouse is a stand-alone repository of
information, integrated from several, possibly heterogeneous operational
databases
•
High query performance
•
Local processing at source unaffected
•
can operate when source unavailable
•
can query data not stored in a DBMS
Data warehousing process overview:
•
Extract from Source Systems
•
Transform to required data (Staging area)
•
Transfer to Data warehouse
•
Produce reports from Data warehouse
·
Centralized
•
Best for small and mid-size data warehouses in
organization.
•
Maximum size is 40+ Tb
• A Centralized Data Warehouse is a data
warehousing implementation wherein a single data warehouse serves the needs of
several separate business unites simultaneously using a single data model that
spans the needs of multiple business divisions.
• Federated
• Best for very large organization
where development is separated by geography, organizational boundaries, or
where multiple data warehouses exists due to mergers & acquisitions.
• The
logical data warehouse is only virtual.
• Tiered
• The
central data warehouse is physical
• There exist local data marts on
different tiers which store copies or summarization of the previous tier.
- Enterprise
warehouse
Collects all of the
information about subjects spanning the entire organization
- Data Mart
A subset of
corporate-wide data that is of value to a specific groups of users. Its scope
is confined to specific, selected groups, such as marketing data mart
- Virtual
warehouse
A
set of views over operational databases
Only
some of the possible summary views may be materialized
Data Warehouse vs. Operational DBMS
- OLTP (on-line transaction processing)
•
Major task of traditional relational DBMS
•
Day-to-day operations: purchasing, inventory,
banking, manufacturing, payroll,
registration, accounting, etc.
- OLAP (on-line analytical processing)
•
Major task of data warehouse system
•
Data analysis and decision making
- Distinct features (OLTP vs. OLAP):
•
User and system orientation: customer vs. market
•
Data contents: current, detailed vs. historical,
consolidated
•
Database design: ER + application vs. star +
subject
•
View: current, local vs. evolutionary,
integrated
•
Access patterns: update vs. read-only but
complex queries


No comments:
Post a Comment