Categories of OLAP
Categories of OLAP tools
MOLAP, ROLAP, HOLAP, DOLAP
OLAP extension to SQL
ROLLUP, CUBE, RANK() OVER, Windowing
2
Categories of OLAP Tools
OLAP tools are categorized
according to the architecture used to store and process
multi-dimensional data.
There are four main categories:
Multi-dimensional OLAP (MOLAP) Relational OLAP (ROLAP)
Hybrid OLAP (HOLAP) Desktop OLAP (DOLAP)
Multi-dimensional OLAP
(MOLAP)
Use specialized data structures and
multi-dimensional Database
Management Systems (MDDBMSs) to organize, navigate, and analyze data.
Data is typically aggregated and
stored according to predicted usage to enhance query performance.
4
Multi-dimensional OLAP
(MOLAP)
Use array technology and efficient
storage techniques that minimize the disk space requirements
through sparse data management.
Provides excellent performance
when data is used as designed, and the focus is on data for a specific
Multi-dimensional OLAP
(MOLAP)
Traditionally, require a tight
coupling with the application layer and presentation layer.
Recent trends segregate the OLAP
from the data structures through the use of published application programming interfaces (APIs).
6
Typical Architecture for
MOLAP Tools
Query on multidimensional database, cubes More storage, less time
MOLAP Tools - Development
Issues
Underlying data structures are
limited in their ability to support multiple subject areas and to
provide access to detailed data.
Navigation and analysis of data is
limited because the data is
designed according to previously determined requirements.
8
MOLAP Tools - Development
Issues
MOLAP products require a different
set of skills and tools to build and maintain the database, thus
increasing the cost and complexity of support.
Relational OLAP (ROLAP)
Fastest-growing style of OLAP
technology due to requirements to analyze ever-increasing amounts of data and the realization that
users cannot store all the data they require in MOLAP databases.
10
Relational OLAP (ROLAP)
Supports RDBMS products using a
metadata layer - avoids need to create a static multi-dimensional data structure - facilitates the
creation of multiple
multi-dimensional views of the two-dimensional relation.
Relational OLAP (ROLAP)
To improve performance, some
products use SQL engines to
support the complexity of
multi-dimensional analysis, while others recommend, or require, the use of highly denormalized database
12
Typical Architecture for
ROLAP Tools
Query on fact and dimension tables Less storage, more time
ROLAP Tools - Development
Issues
Performance problems associated
with the processing of complex
queries that require multiple passes through the relational data.
Middleware to facilitate the
development of multi-dimensional
applications. (Software that converts the two-dimensional relation into a multi-dimensional structure).
14
ROLAP Tools - Development
Issues
Development of an option to create
persistent, multi-dimensional
structures with facilities to assist in the administration of these
Hybrid OLAP (HOLAP)
Provide limited analysis capability,
either directly against RDBMS products, or by using an intermediate MOLAP
server.
Deliver selected data directly from the DBMS or via a MOLAP server to the
desktop (or local server) in the form of a data cube, where it is stored, analyzed, and maintained locally.
16
Hybrid OLAP (HOLAP)
Promoted as being relatively
simple to install and administer with reduced cost and
Typical Architecture for
HOLAP Tools
18
HOLAP Tools - Development
Issues
Architecture results in significant data redundancy and may cause problems for networks that support many users.
Ability of each user to build a custom data cube may cause a lack of data consistency among users.
Only a limited amount of data can be efficiently maintained.
Desktop OLAP (DOLAP)
Store the OLAP data in
client-based files and support multi-dimensional processing using a client multi-dimensional engine.
Requires that relatively small
extracts of data are held on client machines. They may be distributed in advance, or created on demand (possibly through the Web).
20
Desktop OLAP (DOLAP)
As with multi-dimensional
databases on the server, OLAP data may be held on disk or in RAM, however, some DOLAP
products allow only read access.
Most vendors of DOLAP exploit the
power of desktop PC to perform some, if not most,
Desktop OLAP (DOLAP)
The administration of a DOLAP
database is typically performed by a central server or processing
routine that prepares data cubes or sets of data for each user.
Once the basic processing is done,
each user can then access their portion of the data.
22
Typical Architecture for
DOLAP Tools
DOLAP Tools
-Development Issues
Provision of appropriate security controls to support all parts of the
DOLAP environment. Since the data is physically extracted from the system, security is generally implemented by limiting the information compiled into each cube. Once each cube is uploaded to the user's desktop, all additional
meta data becomes the property of the local user.
24
DOLAP Tools
-Development Issues
Reduction in the effort involved in
deploying and maintaining the DOLAP tools. Some DOLAP vendors now
provide a range of alternative ways of deploying OLAP data such as through e-mail, the Web or using traditional
client/server architecture.
Current trends are towards thin client machines.
OLAP Extensions to SQL
Advantages of SQL include that it is easy to learn, non-procedural,
free-format, DBMS-independent, and that it is a recognized international standard. However, major limitation of SQL is the
inability to answer routinely asked
business queries such as computing the percentage change in values between this month and a year ago or to
26
OLAP Extensions to SQL
Answer is ANSI adopted a set of
OLAP functions as an extension to SQL to enable these calculations as well as many others that used to
be impossible or even impractical within SQL.
IBM and Oracle jointly proposed
these extensions early in 1999 and they now form part of the current SQL standard, namely SQL: 2003.
OLAP Extensions to SQL
-RISQL
The extensions are collectively referred to as the ‘OLAP package’ and are described as follows:
Feature T431, ‘Extended Grouping capabilities’
SELECT..GROUP BY ROLLUP(columnlist) SELECT..GROUP BY CUBE(columnlist)
Feature T611, ‘Extended OLAP operators’
RANK() OVER (ORDER BY columnlist)
DENSE_RANK() OVER (ORDER BY columnlist)
28
Extended Grouping
Capabilities
Aggregation is a fundamental part of
OLAP. To improve aggregation capabilities the SQL standard
provides extensions to the GROUP BY clause such as the ROLLUP and CUBE functions.
Extended Grouping
Capabilities
ROLLUP supports calculations using
aggregations such as SUM, COUNT, MAX, MIN, and AVG at increasing levels of
aggregation, from the most detailed up to a grand total.
CUBE is similar to ROLLUP, enabling a
single statement to calculate all possible combinations of aggregations. CUBE can generate the information needed in cross-tabulation reports with a single query.
30
Extended Grouping
Capabilities
ROLLUP and CUBE extensions specify
exactly the groupings of interest in the GROUP BY clause and produces a single result set that is equivalent to a UNION ALL of differently grouped rows.
Extended Grouping
Capabilities
ROLLUP Extension to GROUP BY
enables a SELECT statement to calculate
multiple levels of subtotals across a specified group of dimensions. ROLLUP appears in the GROUP BY clause in a SELECT statement using the following format:
32
Extended Grouping
Capabilities
ROLLUP creates subtotals that roll up from the
most detailed level to a grand total, following a column list specified in the ROLLUP clause.
ROLLUP first calculates the standard
aggregate values specified in the GROUP BY clause and then creates progressively higher level subtotals, moving from right to left
through the column list until finally completing with a grand total.
Extended Grouping
Capabilities
ROLLUP creates subtotals at n + 1 levels, where n is the number of
grouping columns. For instance, if a query specifies ROLLUP on grouping columns of propertyType, yearMonth, and city (n = 3), the result set will
34
Example - Using the ROLLUP
Group Function
Show the totals for sales of flats or
houses by branch offices located in
Aberdeen, Edinburgh, or Glasgow
for the months of September and
October of 2004.
Example - Using the
ROLLUP Group Function
SELECT propertyType, yearMonth, city, SUM(saleAmount) AS sales
FROM Branch, PropertyFor Sale, PropertySale
WHERE Branch.branchNo = PropertySale.branchNo AND PropertyForSale.propertyNo =
PropertySale.propertyNo
AND PropertySale.yearMonth IN ('2004-08', '2004-09') AND Branch.city IN (‘Aberdeen’, ‘Edinburgh’,
‘Glasgow’)
36
Example - Using the
Extended Grouping
Capabilities
CUBE Extension to GROUP BY
CUBE takes a specified set of grouping
columns and creates subtotals for all of the possible combinations. CUBE
appears in the GROUP BY clause in a SELECT statement using the following format:
38
Extended Grouping
Capabilities
CUBE generates all the subtotals that
could be calculated for a data cube with the specified dimensions.
CUBE can be used in any situation
requiring cross-tabular reports. The data needed for cross-tabular reports can be generated with a single SELECT using
CUBE. Like ROLLUP, CUBE can be helpful in generating summary tables.
Extended Grouping
Capabilities
CUBE is typically most suitable
in queries that use columns
from multiple dimensions rather than columns representing
different levels of a single dimension.
40
Example - Using the CUBE
Group Function
Show all possible subtotals for
sales of properties by branches
offices in Aberdeen, Edinburgh,
and Glasgow for the months of
September and October of 2004.
Example - Using the CUBE
Group Function
SELECT propertyType, yearMonth, city, SUM(saleAmount) AS sales
FROM Branch, PropertyFor Sale, PropertySale WHERE Branch.branchNo = PropertySale.branchNo AND PropertyForSale.propertyNo = PropertySale.propertyNo AND PropertySale.yearMonth IN ('2004-08', '2004-09')
AND Branch.city IN (‘Aberdeen’, ‘Edinburgh’, ‘Glasgow’)
42
Example - Using the CUBE Group Function
Elementary OLAP Operators
Supports a variety of operations such as rankings and window calculations. Ranking functions include
cumulative distributions, percent rank, and N-tiles.
Windowing allows the calculation of cumulative and moving aggregations using functions such as SUM, AVG, MIN, and COUNT.
44
Elementary OLAP Operators
Ranking Functions
Computes the rank of a record compared
to other records in the dataset based on the values of a set of measures. There are various types of ranking functions, including RANK and DENSE_RANK. The syntax for each ranking function is:
RANK( ) OVER (ORDER BY columnList) DENSE_RANK( ) OVER (ORDER BY
Elementary OLAP Operators
The difference between RANK
and DENSE_RANK is that
DENSE_RANK leaves no gaps in the sequential ranking sequence when there are ties for a
46
Example - Using the RANK
and DENSE_RANK Functions
Rank the total sales of properties for
branch offices in Edinburgh.
SELECT branchNo, SUM(saleAmount) AS sales,
RANK() OVER (ORDER BY SUM(saleAmount)) DESC AS ranking,
DENSE_RANK() OVER (ORDER BY
SUM(saleAmount)) DESC AS dense_ranking
FROM Branch, PropertySale WHERE Branch.branchNo =
PropertySale.branchNo
AND Branch.city = ‘Edinburgh’
Example - Using the RANK
and DENSE_RANK Functions
48
Elementary OLAP Operators
Supports a variety of operations such as rankings and window calculations. Ranking functions include
cumulative distributions, percent rank, and N-tiles.
Windowing allows the calculation of cumulative and moving aggregations using functions such as SUM, AVG, MIN, and COUNT.
Elementary OLAP Operators
Windowing Calculations
Can be used to compute
cumulative, moving, and centered aggregates. They return a value for each row in the table, which depends on other rows in the
50
Elementary OLAP Operators
Windowing Calculations
Can be used to compute cumulative,
moving, and centered aggregates. They return a value for each row in the table, which depends on other rows in the
corresponding window.
These aggregate functions provide
access to more than one row of a table without a self-join and can be used only in the SELECT and ORDER BY clauses of the query.
Example - Using Windowing
Calculations
Show the monthly figures and
three-month moving averages
and sums for property sales at
branch office B003 for the first
six months of 2004.
52
Example - Using Windowing
Calculations
SELECT yearMonth, SUM(saleAmount) AS monthlySales, AVG(SUM(saleAmount)) OVER (ORDER BY yearMonth, ROWS 2
PRECEDING) AS 3-month moving avg,
SUM(SUM(salesAmount)) OVER (ORDER BY yearMonth ROWS 2 PRECEDING)
AS 3-month moving sum FROM PropertySale
WHERE branchNo = ‘B003’
AND yearMonth BETWEEN ('2004-01' AND '2004-06’)
GROUP BY yearMonth ORDER BY yearMonth;
Example - Using
54
Storage Comparison of
MOLAP & ROLAP
CubeItems
Locations
Tim es
Sales Tim_ID Item_ID L_ID Sales
1 1 1 1795 1 2 1 1006 1 3 1 993 1 1 2 1237 1 2 2 1059 1 3 2 998 1 1 3 987 1 2 3 689 1 3 3 589 2 1 1 963 2 2 1 789 Time_ID Year Item_ID ItemName Location_ID Location 1 2005 2 2006 3 2007 1 Game 2 PC 3 Phone 1 DC 2 NY 3 SF Dimension tables
Storage Comparison of
MOLAP & ROLAP
Time_ID Year Quarter month
Dimension tables ROLAP MOLAP 50 rows Year 2005 2006 Quarter 1 1 Month 1 2
Class Elect Elect Subclass Home Home
Name PC PC
Country USA USA
L_name NY NY
Sales 1003 987
Item_ID Class subclass Name 50
L_ID Country L_Name
50
Fact table
Time_ID Item_ID L_ID Sales