What is Data Warehouse?
Ø
Defined in many different ways, but not
rigorously
-
A decision support database that is maintained
separately from the organization’s operational database.
-
A consistent database source that bring together
information from multiple sources for decision support queries.
-
Support information processing by providing a
solid platform of consolidated, historical data for analysis.
History of Data Warehousing
Ø
In the 1990’s executives became less concerned
with the day-to-day business operations and more concerned with overall
business functions
Ø
The data warehouse provided the ability to
support decision making without disrupting the day-to-day operations, because;
-
Operational information is mainly current – does
not include the history for better decision making
-
Issues of quality information
-
Without information history, it is difficult to
tell how and why things change over time
Data warehouse fundamentals
Ø
Data warehouse – A logical collection of
information – gathered from many different operational databases – that
supports business analysis activities and decision-making takes
Ø
The primary purpose of a data warehouse is to
combined information throughout an organization into a single repository for
decision-making purposes – data warehouse support only analytical processing
Data warehouse model
Ø
Extraction, transformation and loading (ETL) – A
process that extracts information from internal and external databases, transforms
the information using a common set of enterprise definitions, and loads the
information into a data warehouse.
Ø
Data warehouse then send subsets of the
information to data mart.
Ø
Data mart – contains a subset of data warehouse
information.
Multidimensional Analysis and Data Mining
Ø
Relational Database contains information in a
series of two-dimensional tables.
Ø
In a data warehouse and data mart, information
is multidimensional, it contains layers of columns and rows
-
Dimension – A particular attribute of
information
Ø
Cube – common term for the representation of
multidimensional information
Ø
Once a cube of information is created, users can
begin to slice and dice the cube to drill down into the information.
Ø
Users can analyze information in a number of
different ways and with number of different dimensions.
Ø
Data Mining – the process of analyzing data to
extract information not offered by the raw data alone. Also known as “knowledge
discovery” – computer-assisted tools and techniques for sifting through and
analyzing vast data stores in order to finds trends, patterns and correlations
that can guide decision making and increase understanding
Ø
To perform data mining users need data-mining
tools
-
Data-mining tool – uses a variety of techniques
to finds patterns and relationships in large volumes of information. Eg:
retailers and use knowledge of these patterns to improve the placement of items
in the layout of a mail-order catalog page or Web page.
Information Cleansing or Scrubbing
Ø
An organization must maintain high-quality data
in the data warehouse
Ø
Information cleansing or scrubbing – A process
that weeds out and fixes or discards inconsistent, incorrect or incomplete
information
Ø
Occurs during ETL process and second on the
information once if is in the data warehouse
Ø
Contract information in an operational system
Ø
Standardizing Customer name from Operational Systems
Ø
Information cleansing activities
-
Missing Records or Attributes
-
Redundant Records
-
Missing Keys or Other Required Data
-
Erroneous Relationships or References
-
Inaccurate Data
Ø
Accurate and complete information
Business Intelligence
Ø
Business Intelligence – refers to applications
and technologies that are used to gather, provides access, analyze data and
information to support decision making efforts
Ø
These systems will illustrate business
intelligence in the areas of customer profiling, customer support, market
research, market segmentation, product profitability, statistical analysis, and
inventory and distribution analysis to name a few
Ø
Eg; Excel, Access
Question?
1.
Describe the roles and purposes of data
warehouse and data marts in an organization
2.
Compare the multidimensional nature of data
warehouses (and data marts) with the
two-dimensional nature of databases
3.
Identify the information of ensuring the
cleanliness of information throughout an organization
4.
Explain the relationship between business
intelligence and a data warehouse