== Overview of OLAP systems ==
At the core of any OLAP system is an [[OLAP cube]] (also called a 'multidimensional cube' or a [[hypercube]]). It consists of numeric facts called ''measures'' that are categorized by ''[[Dimension (data warehouse)|dimensions]]''. The measures are placed at the intersections of the hypercube, which is spanned by the dimensions as a [[vector space]]. The usual interface to manipulate an OLAP cube is a matrix interface, like [[pivot table]]s in a spreadsheet program, which performs projection operations along the dimensions, such as aggregation or averaging.

The cube metadata is typically created from a [[star schema]], [[snowflake schema]], or [[fact constellation]] of tables in a [[relational database]]. Measures are derived from the records in the [[fact table]] and dimensions are derived from the [[dimension table]]s.

Each ''measure'' can be thought of as having a set of ''labels'', or metadata, associated with it. A ''dimension'' is what describes these ''labels''; it provides information about the ''measure''.

A simple example would be a cube that contains a store's sales as a ''measure'', and Date/Time as a ''dimension''. Each sale has a Date/Time ''label'' that describes more about that sale.

For example:
 Sales Fact Table
 +-------------+----------+
 | sale_amount | time_id  |
 +-------------+----------+          Time Dimension
 |      930.10 |     1234 |----+     +---------+-------------------+
 +-------------+----------+    |     | time_id | timestamp         |
                               |     +---------+-------------------+
                               +---->|    1234 | 20080902 12:35:43 |
                                     +---------+-------------------+

=== Multidimensional databases ===
Multidimensional structure is defined as "a variation of the relational model that uses multidimensional structures to organize data and express the relationships between data".<ref name="OBrien"/>{{rp|177}} The structure is broken into cubes and the cubes are able to store and access data within the confines of each cube. "Each cell within a multidimensional structure contains aggregated data related to elements along each of its dimensions".<ref name="OBrien"/>{{rp|178}} Even when data is manipulated, it remains easy to access and continues to constitute a compact database format. The data still remains interrelated. Multidimensional structure is quite popular for analytical databases that use online analytical processing (OLAP) applications.<ref name="OBrien"/> Analytical databases use this structure because of its ability to deliver answers to complex business queries swiftly. Data can be viewed from different angles, which gives a broader perspective on a problem than other models.<ref>Williams, C., Garza, V.R., Tucker, S., Marcus, A.M. (1994, January 24). Multidimensional models boost viewing options. InfoWorld, 16(4)</ref>

=== Aggregations ===
It has been claimed that for complex queries OLAP cubes can produce an answer in around 0.1% of the time required for the same query on [[OLTP]] relational data.<ref>{{cite web | author=MicroStrategy, Incorporated | year=1995 | title=The Case for Relational OLAP | url=http://www.cs.bgu.ac.il/~onap052/uploads/Seminar/Relational%20OLAP%20Microstrategy.pdf | access-date=2008-03-20 }}</ref><ref>{{cite journal |author1=Surajit Chaudhuri |author2=Umeshwar Dayal |name-list-style=amp | title = An overview of data warehousing and OLAP technology | journal = SIGMOD Rec. | volume = 26 | issue = 1 | year = 1997 | pages = 65 | doi = 10.1145/248603.248616 |citeseerx=10.1.1.211.7178 |s2cid=8125630 }}</ref> The most important mechanism in OLAP which allows it to achieve such performance is the use of ''aggregations''. Aggregations are built from the fact table by changing the granularity on specific dimensions and aggregating up data along these dimensions, using an [[aggregate function]] (or ''aggregation function''). The number of possible aggregations is determined by every possible combination of dimension granularities.
The combination of all possible aggregations and the base data contains the answers to every query which can be answered from the data.<ref>{{cite journal | last1 = Gray | first1 = Jim | author1-link = Jim Gray (computer scientist) | last2 = Chaudhuri | first2 = Surajit | last3 = Layman | first3 = Andrew | last4 = Reichart | first4 = Don | last5 = Venkatrao | first5 = Murali | last6 = Pellow | first6 = Frank | last7 = Pirahesh | first7 = Hamid | title = Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals | journal = J. Data Mining and Knowledge Discovery | volume = 1 | issue = 1 | pages = 29–53 | year = 1997 | url = http://citeseer.ist.psu.edu/gray97data.html | access-date=2008-03-20 | doi = 10.1023/A:1009726021843 | arxiv = cs/0701155 | s2cid = 12502175 }}</ref>

Because there are usually many aggregations that can be calculated, often only a predetermined number are fully calculated; the remainder are solved on demand. The problem of deciding which aggregations (views) to calculate is known as the view selection problem. View selection can be constrained by the total size of the selected set of aggregations, the time to update them from changes in the base data, or both. The objective of view selection is typically to minimize the average time to answer OLAP queries, although some studies also minimize the update time. View selection is [[NP-complete]]. Many approaches to the problem have been explored, including [[greedy algorithm]]s, randomized search, [[genetic algorithm]]s and the [[A* search algorithm]].

Some aggregation functions can be computed for the entire OLAP cube by [[precomputing]] values for each cell, and then computing the aggregation for a roll-up of cells by aggregating these aggregates, applying a [[divide and conquer algorithm]] to the multidimensional problem to compute them efficiently.{{sfn|Zhang|2017|p=1}} For example, the overall sum of a roll-up is just the sum of the sub-sums in each cell.
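
The sum example can be sketched as follows, using made-up daily sales figures: the per-day sub-sums are computed once, and a monthly roll-up is then obtained from those sub-sums without revisiting the individual sales. All names and values here are illustrative assumptions, not taken from any particular OLAP product.

```python
from collections import defaultdict

# Made-up base data: (date as YYYYMMDD string, sale amount in cents).
sales = [
    ("20080901", 1000),
    ("20080901", 250),
    ("20080902", 93010),
    ("20081001", 500),
]

# Precompute one sub-sum per cell at day granularity.
daily_sum = defaultdict(int)
for day, amount in sales:
    daily_sum[day] += amount

# Roll up to month granularity by summing the daily sub-sums only.
monthly_sum = defaultdict(int)
for day, subtotal in daily_sum.items():
    monthly_sum[day[:6]] += subtotal

print(dict(monthly_sum))  # {'200809': 94260, '200810': 500}
```

The monthly totals agree with what a direct sum over the base data would give, which is the property that lets a cube answer coarse queries from stored finer-grained aggregates.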
Functions that can be decomposed in this way are called [[decomposable aggregation function]]s, and include <code>COUNT, MAX, MIN,</code> and <code>SUM</code>, which can be computed for each cell and then directly aggregated; these are known as self-decomposable aggregation functions.{{sfn|Jesus|Baquero|Almeida|2011|loc=2.1 Decomposable functions, pp. 3–4}} In other cases, the aggregate function can be computed by computing auxiliary numbers for cells, aggregating these auxiliary numbers, and finally computing the overall number at the end; examples include <code>AVERAGE</code> (tracking sum and count, dividing at the end) and <code>RANGE</code> (tracking max and min, subtracting at the end). In still other cases, the aggregate function cannot be computed without analyzing the entire set at once, though in some cases approximations can be computed; examples include <code>DISTINCT COUNT, MEDIAN,</code> and <code>MODE</code>. For instance, the median of a set is not the median of medians of subsets. Functions of this last kind are difficult to implement efficiently in OLAP, as they require computing the aggregate function on the base data, either computing them online (slow) or precomputing them for possible roll-ups (large space).
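
A minimal sketch of the auxiliary-number approach, with hypothetical cell contents: each cell stores a sum, count, minimum and maximum, cells are merged for a roll-up, and <code>AVERAGE</code> and <code>RANGE</code> are derived only at the end. The class and method names are assumptions made for this illustration. The last two lines show why <code>MEDIAN</code> does not fit this pattern.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Partial:
    """Auxiliary numbers kept per cell (illustrative names)."""
    total: float
    count: int
    low: float
    high: float

    @classmethod
    def of(cls, values):
        return cls(sum(values), len(values), min(values), max(values))

    def merge(self, other):
        # SUM, COUNT, MIN and MAX combine directly across cells.
        return Partial(
            self.total + other.total,
            self.count + other.count,
            min(self.low, other.low),
            max(self.high, other.high),
        )

    def average(self):
        # AVERAGE: divide only after the sums and counts are merged.
        return self.total / self.count

    def value_range(self):
        # RANGE: subtract only after the extremes are merged.
        return self.high - self.low


cell_a = [1, 2, 9]
cell_b = [3, 4, 5, 6, 20]

rolled = Partial.of(cell_a).merge(Partial.of(cell_b))
print(rolled.average())      # 6.25
print(rolled.value_range())  # 19

# MEDIAN is not decomposable: the two results below differ.
print(median(cell_a + cell_b))                    # 4.5
print(median([median(cell_a), median(cell_b)]))   # 3.5
```

Averaging the two per-cell averages (4.0 and 7.6) would give 5.8 rather than 6.25, which is why the sum and count are carried separately until the final division.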