Database myths and legends (Part 7)
In this series we're looking at the myths and
legends of the database world; some turn out to be true,
others false. This myth is about why we use OLAP.
If you follow the Inmon model, you use a relational data
warehouse for flexibility and OLAP cubes in the data marts for
the speed. On the other hand, if you follow Kimball, you
simply use OLAP in the core data warehouse. Either way, OLAP
is where you get the incredible query response time that we
need for a good Business Intelligence system. So OLAP is all
about speed.
OK, let's get back to basics for a moment. The term OLAP, which
stands for Online Analytical Processing, is surprisingly new and
was originally very well defined. It first appeared a
mere 14 years ago, in a paper entitled Providing OLAP to
User-Analysts: An IT Mandate by E F Codd, S B Codd and C
T Salley, Computerworld, July 26 1993.
And yes, E F Codd is the Ted Codd, the Father of
the relational database. After the paper was published it
gained some notoriety because Codd had undertaken consulting
work for Arbor Software (now Hyperion). This was unfortunate
because the paper actively discussed one of Arbor's products,
Essbase. In the end, Computerworld took the unusual step of
retracting the article; nevertheless this paper clearly marks
the start of the term's use. A copy is available online from
Hyperion. The paper defines 12 rules for evaluating OLAP
products:
- Multi-dimensional conceptual view
- Transparency
- Accessibility
- Consistent reporting performance
- Client-server architecture
- Generic dimensionality
- Dynamic sparse matrix handling
- Multi-user support
- Unrestricted cross-dimensional operations
- Intuitive data manipulation
- Flexible reporting
- Unlimited dimensions and aggregation levels
While Codd never directly says OLAP systems should be fast,
he is clearly very interested in their performance (see rule
4, consistent reporting performance). In addition, almost all
OLAP systems do provide a phenomenal increase in performance
over relational systems for analytical queries. So
we can argue from this that the myth is true: OLAP is about
performance.
But it is clear from reading the paper that Codd also sees
the multi-dimensional component of OLAP as essential. Early on
in the paper he says: "This...multi-dimensional conceptual
view appears to be the way most business persons naturally
view their enterprise." And, as you can see, four of the 12
rules directly refer to dimensions, so OLAP is also about the
way users think about, and are allowed to visualise, their
data.
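To make the idea of a multi-dimensional conceptual view a little more concrete, here is a minimal Python sketch. It is not taken from Codd's paper or from any OLAP product; the sample figures, the dimension names (product, region, quarter) and the roll_up helper are all invented for illustration. It shows only the shape of the user's question: name the dimensions you want to see and, optionally, the slice you care about.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, region, quarter, sales).
# The data and the dimension names are invented for this example.
DIMENSIONS = ("product", "region", "quarter")
FACTS = [
    ("Widgets", "North", "Q1", 100),
    ("Widgets", "South", "Q1", 150),
    ("Widgets", "North", "Q2", 120),
    ("Gadgets", "North", "Q1", 80),
    ("Gadgets", "South", "Q2", 60),
]


def roll_up(facts, by, **slicer):
    """Total sales along the dimensions named in `by`.

    `slicer` restricts the rows first, e.g. product="Widgets".
    """
    for name in (*by, *slicer):
        if name not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {name}")

    totals = defaultdict(int)
    for *members, sales in facts:
        coords = dict(zip(DIMENSIONS, members))
        # Keep only the rows that fall inside the requested slice.
        if all(coords[dim] == value for dim, value in slicer.items()):
            totals[tuple(coords[dim] for dim in by)] += sales
    return dict(totals)


# "Sales by region" and "Widget sales by quarter", phrased purely
# in terms of dimensions.
by_region = roll_up(FACTS, by=("region",))
widgets_by_quarter = roll_up(FACTS, by=("quarter",), product="Widgets")

print(by_region)           # {('North',): 300, ('South',): 210}
print(widgets_by_quarter)  # {('Q1',): 250, ('Q2',): 120}
```

Notice that neither question mentions tables, keys or joins; the user works purely in terms of the dimensions.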
We know that speed is important to OLAP, but exactly how
important is this multi-dimensional aspect?
One easy test of the importance of a property to the
definition of an object is to imagine the object minus that
property. Does it remain essentially the same object without
the property or does the loss turn it into something else? Is
a robin without a red breast still a robin? Is a Christian who
loses their faith still a Christian? Is OLAP without
multi-dimensionality still OLAP?
Well, imagine a relational data warehouse that is magically
very, very fast. Users can perform any query they like against
it and expect a response time of one second. Would this still
be OLAP? We can be certain that the answer here is "no" for
the simple reason that there is no need for a new term like
OLAP to describe this; what we have here is simply a very fast
relational database. Apart from the speed, it will suffer all
the joys and pains of normal relational databases. It will be
very flexible (it puts no constraints on the queries that can
be posed) but the users will still find it very difficult to
query because, in order to formulate the query, they have to
understand the data structure. Experience suggests that
business users find this very difficult.
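As a rough illustration of that difficulty, the sketch below uses Python's built-in sqlite3 module and a small normalised schema. The tables, columns and figures are hypothetical, invented for this example. To ask for Widget sales by region, the user has to know which tables exist, which keys link them and how to join them.

```python
import sqlite3

# An in-memory database with a hypothetical normalised schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE region  (region_id  INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sale (
        product_id INTEGER REFERENCES product(product_id),
        region_id  INTEGER REFERENCES region(region_id),
        quarter    TEXT,
        amount     INTEGER
    );
    INSERT INTO product VALUES (1, 'Widgets'), (2, 'Gadgets');
    INSERT INTO region  VALUES (1, 'North'), (2, 'South');
    INSERT INTO sale VALUES
        (1, 1, 'Q1', 100), (1, 2, 'Q1', 150), (1, 1, 'Q2', 120),
        (2, 1, 'Q1', 80),  (2, 2, 'Q2', 60);
""")

# The business question is simple, but writing it down requires
# knowledge of three tables and the two foreign keys between them.
widget_sales_by_region = conn.execute("""
    SELECT r.name, SUM(s.amount)
    FROM sale AS s
    JOIN product AS p ON p.product_id = s.product_id
    JOIN region  AS r ON r.region_id  = s.region_id
    WHERE p.name = 'Widgets'
    GROUP BY r.name
    ORDER BY r.name
""").fetchall()

print(widget_sales_by_region)  # [('North', 220), ('South', 150)]
```

Making this query return in a millisecond would not make it any easier to write.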
So, OLAP without the multi-dimensional structure isn't
OLAP. This is true in the real world of 2007 and it was also
true in Codd's original definition of OLAP. In the paper he
says: "OLAP is the name given to the dynamic enterprise
analysis required to create, manipulate, animate, and
synthesise information from exegetical, contemplative, and
formulaic data analysis models." In other words, OLAP is more
about the data model than the speed.
The problem with the myth is that by focusing on speed it
loses sight of what we are trying to achieve in Business
Intelligence (of which OLAP is a subset). We are trying to
find information in a mass of data. Speed alone (while
eminently desirable) does not provide this; we also need to
layer a framework over the data (Codd's multi-dimensional
conceptual view) to provide an interpretation that users can
understand.
So, the myth is busted. OLAP certainly is about speed but
it isn't all about speed. There is much more to analysis than
rows per second.
Incidentally, this focus within OLAP on the way in which we
think about and view the data is highly relevant to some of
the recent discussions on The Register about
novel approaches to business intelligence.
Take, for example, Kognitio's WX2.
Kognitio has developed technology that allows very rapid
access to relational data (just like the example data
warehouse discussed above). The technology is fascinating and
provides us with another tool that we can add to our armoury
of techniques. It is a great solution for certain classes of
problem. But since it doesn't provide the multi-dimensional
conceptual view, it can never be considered a substitute for
OLAP.
And, as a final point, albino robins do exist in nature and
are still considered to be robins. As for the non-ecumenical
question; that is probably better left to Father Ted. ®