Thursday, November 27, 2014

Data Warehouse


Ab initio - Data Warehouse

A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing.
Data Warehouse is a collection of data to support management’s decision.
Characteristics of a data warehouse as below:
           Subject-Oriented -> to help you analyze data
           Integrated -> put data from disparate sources into a consistent format
           Time-Variant -> focus on change over time
           Non-volatile -> once entered into the warehouse, data should not change


Advantages of Data Warehouses :
    Data warehouses store large volumes of data which are frequently used by decision Support Systems
       It is maintained separately from the organization’s operational databases
       Data warehouses are relatively static with only infrequent updates
      A data warehouse is a stand-alone repository of information, integrated from several, possibly heterogeneous operational databases
       High query performance
       queries not visible outside warehouse
       Local processing at source unaffected
       can operate when source unavailable
       can query data not stored in a DBMS



Data warehousing process overview:
          Extract from Source Systems
          Transform to required data (Staging area)
          Transfer to Data warehouse
          Produce reports from Data warehouse



Data warehouse Architecture:



There are three basic architectures for constructing a data warehouse:



·         Centralized

          Best for small and mid-size data warehouses in organization.
          Maximum size is 40+ Tb
     A Centralized Data Warehouse is a data warehousing implementation wherein a single data warehouse serves the needs of several separate business unites simultaneously using a single data model that spans the needs of multiple business divisions.

•             Federated

•          Best for very large organization where development is separated by geography, organizational boundaries, or where multiple data warehouses exists due to mergers & acquisitions.
•          The logical data warehouse is only virtual.


•             Tiered

•           The central data warehouse is physical
•           There exist local data marts on different tiers which store copies or  summarization of the previous tier.


Data Warehouse Model:

  • Enterprise warehouse
Collects all of the information about subjects spanning the entire organization
  • Data Mart
A subset of corporate-wide data that is of value to a specific groups of users. Its scope is confined to specific, selected groups, such as marketing data mart
  • Virtual warehouse
A set of views over operational databases
Only some of the possible summary views may be materialized


Data Warehouse vs. Operational DBMS
  • OLTP (on-line transaction processing)
          Major task of traditional relational DBMS
          Day-to-day operations: purchasing, inventory, banking, manufacturing, payroll,
registration, accounting, etc.

  • OLAP (on-line analytical processing)
          Major task of data warehouse system
          Data analysis and decision making

  • Distinct features (OLTP vs. OLAP):
          User and system orientation: customer vs. market
          Data contents: current, detailed vs. historical, consolidated
          Database design: ER + application vs. star + subject
          View: current, local vs. evolutionary, integrated

          Access patterns: update vs. read-only but complex queries