Online analytical processing
==Types==
OLAP systems have been traditionally categorized using the following taxonomy.<ref name=Pendse2006>{{cite web|url=http://www.olapreport.com/Architectures.htm |title=OLAP architectures |publisher=OLAP Report |author=Nigel Pendse |date=2006-06-27 |access-date=2008-03-17 |url-status=usurped |archive-url=https://web.archive.org/web/20080124155954/http://www.olapreport.com/Architectures.htm |archive-date=January 24, 2008 }}</ref>

===Multidimensional OLAP (MOLAP)===
MOLAP (multi-dimensional online analytical processing) is the classic form of OLAP and is sometimes referred to as just OLAP. MOLAP stores data in optimized multi-dimensional array storage, rather than in a relational database.

Some MOLAP tools require the [[pre-computation]] and storage of derived data, such as consolidations – the operation known as processing. Such MOLAP tools generally utilize a pre-calculated data set referred to as a [[data cube]]. The data cube contains all the possible answers to a given range of questions. As a result, these tools respond to queries very quickly. On the other hand, updating can take a long time depending on the degree of pre-computation. Pre-computation can also lead to what is known as data explosion.

Other MOLAP tools, particularly those that implement the [[Functional Database Model|functional database model]], do not pre-compute derived data but make all calculations on demand, other than those that were previously requested and stored in a cache.

'''Advantages of MOLAP'''
* Fast query performance due to optimized storage, multidimensional indexing and caching.
* Smaller on-disk size of data compared to data stored in a [[relational database]], due to compression techniques.
* Automated computation of higher-level aggregates of the data.
* Very compact for low-dimensional data sets.
* Array models provide natural indexing.
* Effective data extraction achieved through the pre-structuring of aggregated data.
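The pre-computation ("processing") step described above can be illustrated with a minimal Python sketch. It is not how any particular MOLAP product works: the fact rows, the dimension names and the <code>ALL</code> marker are assumptions made for illustration. Every fact row is folded into every aggregate cell it belongs to, so later queries become plain lookups.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (region, product, year, sales)
FACTS = [
    ("EU", "widget", 2023, 10),
    ("EU", "gadget", 2023, 5),
    ("US", "widget", 2023, 7),
    ("US", "widget", 2024, 3),
]
DIMENSIONS = ("region", "product", "year")
ALL = "*"  # marker meaning "aggregated over this dimension"


def build_cube(facts, n_dims):
    """Pre-compute SUM(sales) for every combination of dimension values."""
    cube = defaultdict(int)
    for *dims, measure in facts:
        # A fact contributes to 2**n_dims cells: one per subset of kept dimensions.
        for k in range(n_dims + 1):
            for kept in combinations(range(n_dims), k):
                key = tuple(dims[i] if i in kept else ALL for i in range(n_dims))
                cube[key] += measure
    return dict(cube)


cube = build_cube(FACTS, len(DIMENSIONS))

# Queries are now dictionary lookups instead of scans over the facts.
print(cube[("EU", ALL, ALL)])       # 15
print(cube[(ALL, "widget", 2023)])  # 17
print(cube[(ALL, ALL, ALL)])        # 25
print(len(FACTS), "facts ->", len(cube), "stored cells")  # 4 facts -> 20 stored cells
```

Even in this toy case four fact rows produce twenty stored cells, which hints at why processing time and storage grow with the degree of pre-computation.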
'''Disadvantages of MOLAP'''
* Within some MOLAP systems the processing step (data load) can be quite lengthy, especially on large data volumes. This is usually remedied by doing only incremental processing, i.e., processing only the data that have changed (usually new data) instead of reprocessing the entire data set.
* Some MOLAP methodologies introduce data redundancy.

====Products====
Examples of commercial products that use MOLAP are [[Cognos]] Powerplay, [[Oracle OLAP|Oracle Database OLAP Option]], [[MicroStrategy]], [[Microsoft Analysis Services]], [[Essbase]], [[Applix|TM1]], [[Jedox]], and icCube.

===Relational OLAP (ROLAP)===
'''ROLAP''' works directly with relational databases and does not require pre-computation. The base data and the dimension tables are stored as relational tables, and new tables are created to hold the aggregated information. It depends on a specialized schema design. This methodology relies on manipulating the data stored in the relational database to give the appearance of traditional OLAP's slicing and dicing functionality. In essence, each action of slicing and dicing is equivalent to adding a "WHERE" clause in the SQL statement.

ROLAP tools do not use pre-calculated data cubes but instead pose the query to the standard relational database and its tables in order to bring back the data required to answer the question. ROLAP tools can answer any question because the methodology is not limited to the contents of a cube. ROLAP can also drill down to the lowest level of detail in the database.

While ROLAP uses a relational database source, generally the database must be carefully designed for ROLAP use. A database that was designed for [[OLTP]] will not function well as a ROLAP database. Therefore, ROLAP still involves creating an additional copy of the data. However, since it is a database, a variety of technologies can be used to populate it.
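The remark that slicing and dicing amounts to adding a "WHERE" clause can be sketched as follows. This is an illustrative example only, using Python's standard <code>sqlite3</code> module; the star schema (<code>fact_sales</code>, <code>dim_product</code>, <code>dim_date</code>) and its rows are hypothetical and not taken from any particular ROLAP tool.

```python
import sqlite3

# Hypothetical star schema: one fact table and two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
CREATE TABLE fact_sales  (product_id INTEGER, date_id INTEGER, region TEXT, amount REAL);
INSERT INTO dim_product VALUES
    (1, 'widget', 'hardware'), (2, 'gadget', 'hardware'), (3, 'manual', 'print');
INSERT INTO dim_date VALUES (1, 2023, 'Q1'), (2, 2023, 'Q2'), (3, 2024, 'Q1');
INSERT INTO fact_sales VALUES
    (1, 1, 'EU', 10), (2, 1, 'EU', 5), (1, 2, 'US', 7), (3, 3, 'US', 2), (1, 3, 'EU', 4);
""")

BASE = """
SELECT p.category, SUM(f.amount)
FROM fact_sales f
JOIN dim_product p ON p.product_id = f.product_id
JOIN dim_date d ON d.date_id = f.date_id
{where}
GROUP BY p.category
ORDER BY p.category
"""


def query(where="", params=()):
    """Run the same aggregate query; only the WHERE clause changes."""
    return conn.execute(BASE.format(where=where), params).fetchall()


total = query()                                                    # no restriction
slice_2023 = query("WHERE d.year = ?", (2023,))                    # slice: fix one dimension
dice = query("WHERE d.year = ? AND f.region = ?", (2023, "EU"))    # dice: fix several

print(total)       # [('hardware', 26.0), ('print', 2.0)]
print(slice_2023)  # [('hardware', 22.0)]
print(dice)        # [('hardware', 15.0)]
```

Because nothing is pre-aggregated here, every call scans the detail rows in <code>fact_sales</code>, which is the trade-off the following subsections discuss.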
==== Advantages of ROLAP ====
<!--Note to editors: Please review the discussion page before making changes to the advantages or disadvantages. Thank you. -->
* ROLAP is considered to be more scalable in handling large data volumes, especially models with [[Dimension (data warehouse)|dimensions]] with very high [[cardinality]] (i.e., millions of members).
* With a variety of data loading tools available, and the ability to fine-tune the [[extract, transform, load]] (ETL) code to the particular data model, load times are generally much shorter than with the automated [[#Multidimensional_OLAP_.28MOLAP.29|MOLAP]] loads.
* The data are stored in a standard [[relational database]] and can be accessed by any [[SQL]] reporting tool (the tool does not have to be an OLAP tool).
* ROLAP tools are better at handling ''non-aggregable facts'' (e.g., textual descriptions). [[#Multidimensional_OLAP_.28MOLAP.29|MOLAP]] tools tend to suffer from slow performance when querying these elements.
* By [[Decoupling (electronics)|decoupling]] the data storage from the multi-dimensional model, it is possible to successfully model data that would not otherwise fit into a strict dimensional model.
* The ROLAP approach can leverage [[database]] authorization controls such as [[row-level security]], whereby the query results are filtered depending on preset criteria applied, for example, to a given user or group of users ([[SQL]] WHERE clause).

==== Disadvantages of ROLAP ====
<!--Note to editors: Please review the discussion page before making changes to the advantages or disadvantages. Thank you. -->
* There is a consensus in the industry that ROLAP tools have slower performance than MOLAP tools. However, see the discussion below about ROLAP performance.
* The loading of ''aggregate tables'' must be managed by custom [[Extract, transform, load|ETL]] code. The ROLAP tools do not help with this task. This means additional development time and more code to support.
* When the step of creating aggregate tables is skipped, query performance suffers because the larger detailed tables must be queried. This can be partially remedied by adding aggregate tables; however, it is still not practical to create aggregate tables for all combinations of dimensions and attributes.
* ROLAP relies on the general-purpose database for querying and caching, and therefore several special techniques employed by [[MOLAP]] tools are not available (such as special hierarchical indexing). However, modern ROLAP tools take advantage of the latest improvements in the [[SQL]] language, such as the CUBE and ROLLUP operators, DB2 Cube Views, and other SQL OLAP extensions. These SQL improvements can narrow the advantage of [[MOLAP]] tools.
* Since ROLAP tools rely on [[SQL]] for all of the computations, they are not suitable when the model is heavy on calculations that do not translate well into [[SQL]]. Examples of such models include budgeting, allocations, financial reporting and other scenarios.

==== Performance of ROLAP ====
In the OLAP industry, ROLAP is usually perceived as being able to scale to large data volumes but as suffering from slower query performance than [[#Multidimensional_OLAP_.28MOLAP.29|MOLAP]]. The {{usurped|1=[https://web.archive.org/web/20040602192842/http://www.olapreport.com/survey.htm OLAP Survey]}}, the largest independent survey across all major OLAP products, conducted over six years (2001 to 2006), consistently found that companies using ROLAP report slower performance than those using MOLAP, even when data volumes were taken into consideration.

However, as with any survey, there are a number of subtle issues that must be taken into account when interpreting the results.
* The survey shows that ROLAP tools have 7 times more users than [[#Multidimensional_OLAP_.28MOLAP.29|MOLAP]] tools within each company. Systems with more users will tend to suffer more performance problems at peak usage times.
* There is also a question about the complexity of the model, measured both in the number of dimensions and the richness of calculations. The survey does not offer a good way to control for these variations in the data being analyzed.

==== Downside of flexibility ====
Some companies select ROLAP because they intend to re-use existing relational database tables; these tables are frequently not optimally designed for OLAP use. The superior flexibility of ROLAP tools allows this less-than-optimal design to work, but performance suffers. [[#Multidimensional_OLAP_.28MOLAP.29|MOLAP]] tools, in contrast, would force the data to be re-loaded into an optimal OLAP design.

===Hybrid OLAP (HOLAP)===
The undesirable trade-off between additional [[Extract, transform, load|ETL]] cost and slow query performance has ensured that most commercial OLAP tools now use a "Hybrid OLAP" (HOLAP) approach, which allows the model designer to decide which portion of the data will be stored in [[#Multidimensional_OLAP_.28MOLAP.29|MOLAP]] and which portion in ROLAP.

There is no clear agreement across the industry as to what constitutes "Hybrid OLAP", except that a database will divide data between relational and specialized storage.<ref name="ieee_cite">{{cite journal | last1 = Bach Pedersen | first1 = Torben | last2 = S. Jensen | title = Multidimensional Database Technology | journal = Distributed Systems Online | volume = 34 | issue = 12 | issn = 0018-9162 | pages = 40–46 | date = December 2001 | doi = 10.1109/2.970558 | first2 = Christian }} </ref> For example, for some vendors, a HOLAP database will use relational tables to hold the larger quantities of detailed data and use specialized storage for at least some aspects of the smaller quantities of more-aggregate or less-detailed data. HOLAP addresses the shortcomings of [[#Multidimensional_OLAP_.28MOLAP.29|MOLAP]] and [[#Relational_OLAP_.28ROLAP.29|ROLAP]] by combining the capabilities of both approaches.
HOLAP tools can utilize both pre-calculated cubes and relational data sources.

==== Vertical partitioning ====
In this mode HOLAP stores ''aggregations'' in [[#Multidimensional_OLAP_.28MOLAP.29|MOLAP]] for fast query performance, and detailed data in [[#Relational_OLAP_.28ROLAP.29|ROLAP]] to optimize the time of cube ''processing''.

==== Horizontal partitioning ====
In this mode HOLAP stores some slice of data, usually the more recent one (i.e., sliced by the Time dimension), in [[#Multidimensional_OLAP_.28MOLAP.29|MOLAP]] for fast query performance, and older data in [[#Relational_OLAP_.28ROLAP.29|ROLAP]]. Moreover, some dices can be stored in [[#Multidimensional_OLAP_.28MOLAP.29|MOLAP]] and others in [[#Relational_OLAP_.28ROLAP.29|ROLAP]], leveraging the fact that in a large cuboid, there will be dense and sparse subregions.<ref>{{cite journal|arxiv=cs/0702143|doi=10.1016/j.ins.2005.09.005 |title=Attribute value reordering for efficient hybrid OLAP |year=2006 |last1=Kaser |first1=Owen |last2=Lemire |first2=Daniel |journal=Information Sciences |volume=176 |issue=16 |pages=2304–2336 }}</ref>

==== Products ====
The first product to provide HOLAP storage was [[Holos (software)|Holos]], but the technology also became available in other commercial products such as [[Microsoft Analysis Services]], [[Oracle OLAP|Oracle Database OLAP Option]], [[MicroStrategy]] and [[SAP AG]] BI Accelerator. The hybrid OLAP approach combines ROLAP and MOLAP technology, benefiting from the greater scalability of ROLAP and the faster computation of MOLAP. For example, a HOLAP server may store large volumes of detailed data in a relational database, while aggregations are kept in a separate MOLAP store. Microsoft SQL Server 7.0 OLAP Services supports a hybrid OLAP server.

===Comparison===
Each type has certain benefits, although there is disagreement about the specifics of the benefits between providers.
* Some MOLAP implementations are prone to database explosion, a phenomenon causing vast amounts of storage space to be used by MOLAP databases when certain common conditions are met: high number of dimensions, pre-calculated results and sparse multidimensional data.
* MOLAP generally delivers better performance due to specialized indexing and storage optimizations. MOLAP also needs less storage space compared to ROLAP because the specialized storage typically includes [[Data compression|compression]] techniques.<ref name="ieee_cite"/>
* ROLAP is generally more scalable.<ref name="ieee_cite"/> However, large volume pre-processing is difficult to implement efficiently so it is frequently skipped. ROLAP query performance can therefore suffer tremendously.
* Since ROLAP relies more on the database to perform calculations, it has more limitations in the specialized functions it can use.
* HOLAP attempts to mix the best of ROLAP and MOLAP. It can generally pre-process swiftly, scale well, and offer good function support.
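The database explosion mentioned in the list above can be made concrete with a small back-of-the-envelope calculation. The sketch below is a simplification under stated assumptions: dimensions are flat (no hierarchies), each dimension gets one extra "all" member for its aggregate, and the cardinalities and row count are invented for illustration.

```python
from math import prod


def cube_size(cardinalities, fact_rows):
    """Return (cuboids, dense_cells, sparse_upper_bound) for a full data cube.

    Assumes flat dimensions (no hierarchies) and one extra "all" member
    per dimension to hold the aggregate over that dimension.
    """
    n = len(cardinalities)
    cuboids = 2 ** n                                  # one group-by per subset of dimensions
    dense_cells = prod(c + 1 for c in cardinalities)  # every addressable cell, base and aggregate
    # Each fact row touches at most one cell per cuboid, so sparse storage
    # never needs more than fact_rows * cuboids cells.
    sparse_upper_bound = min(dense_cells, fact_rows * cuboids)
    return cuboids, dense_cells, sparse_upper_bound


# Hypothetical model: 4 dimensions with 100, 50, 12 and 20 members, 10,000 fact rows.
cuboids, dense_cells, sparse_bound = cube_size([100, 50, 12, 20], fact_rows=10_000)
print(cuboids)       # 16
print(dense_cells)   # 1406223
print(sparse_bound)  # 160000
```

Under these assumptions, 10,000 input rows can require up to 160,000 pre-calculated cells with sparse storage and about 1.4 million addressable cells with a dense array. Each added dimension doubles the number of cuboids, which is why a high number of dimensions combined with sparse data and full pre-calculation drives the effect.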
===Other types===
The following acronyms are also sometimes used, although they are not as widespread as the ones above:
* '''WOLAP''' – Web-based OLAP
* '''DOLAP''' – [[Desktop computer|Desktop]] OLAP
* '''RTOLAP''' – Real-time OLAP
* '''GOLAP''' – Graph OLAP<ref>{{Cite news|url=https://www.datanami.com/2016/12/07/week-graph-entity-analytics/|title=This Week in Graph and Entity Analytics|date=2016-12-07|work=Datanami|access-date=2018-03-08|language=en-US}}</ref><ref>{{Cite news|url=http://www.dbta.com/Editorial/News-Flashes/Cambridge-Semantics-Announces-AnzoGraph-Support-for-Amazon-Neptune-and-Graph-Databases-123280.aspx|title=Cambridge Semantics Announces AnzoGraph Support for Amazon Neptune and Graph Databases|date=2018-02-15|work=Database Trends and Applications|access-date=2018-03-08|language=en-US}}</ref>
* '''CaseOLAP''' – Context-aware Semantic OLAP,<ref name = "textcubes">{{cite web |title=Multi-Dimensional, Phrase-Based Summarization in Text Cubes |url=http://sites.computer.org/debull/A16sept/p74.pdf |last1=Tao|first1=Fangbo | last2=Zhuang|first2=Honglei | last3=Yu|first3=Chi Wang| first4=Qi|last4=Wang | first5=Taylor|last5=Cassidy | first6=Lance|last6=Kaplan | first7=Clare|last7=Voss| last8=Han | first8=Jiawei | date=2016}}</ref> developed for biomedical applications.<ref>{{Cite journal|last1=Liem|first1=David A.|last2=Murali|first2=Sanjana|last3=Sigdel|first3=Dibakar|last4=Shi|first4=Yu|last5=Wang|first5=Xuan|last6=Shen|first6=Jiaming|last7=Choi|first7=Howard|last8=Caufield|first8=John H.|last9=Wang|first9=Wei|last10=Ping|first10=Peipei|last11=Han|first11=Jiawei|date=2018-10-01|title=Phrase mining of textual data to analyze extracellular matrix protein patterns across cardiovascular disease|journal=American Journal of Physiology. Heart and Circulatory Physiology|volume=315|issue=4|pages=H910–H924|doi=10.1152/ajpheart.00175.2018|issn=1522-1539|pmid=29775406|pmc=6230912}}</ref> The CaseOLAP platform includes data preprocessing (e.g., downloading, extraction, and parsing text documents), indexing and searching with Elasticsearch, creating a functional document structure called Text-Cube,<ref>{{cite book |last1=Lee |first1=S. |last2=Kim |first2=N. |last3=Kim |first3=J. |title=2014 IEEE Fourth International Conference on Big Data and Cloud Computing |chapter=A Multi-dimensional Analysis and Data Cube for Unstructured Text and Social Media |date=2014 |pages=761–764 |doi=10.1109/BDCloud.2014.117|isbn=978-1-4799-6719-3 |s2cid=229585 }}</ref><ref>{{cite journal |last1=Ding |first1=B. |last2= Lin|first2= X.C.|last3=Han|first3=J.|last4=Zhai| first4=C.|last5=Srivastava|first5= A.|last6=Oza|first6= N.C.|title=Efficient Keyword-Based Search for Top-K Cells in Text Cube |journal=IEEE Transactions on Knowledge and Data Engineering |date=December 2011 |volume=23 |issue=12 |pages=1795–1810 |doi=10.1109/TKDE.2011.34|s2cid=13960227 }}</ref><ref>{{cite book |last1=Ding |first1=B. |last2=Zhao |first2=B. |last3=Lin |first3=C.X. |last4=Han |first4=J. |last5=Zhai |first5=C. |title=2010 IEEE 26th International Conference on Data Engineering (ICDE 2010) |chapter=TopCells: Keyword-based search of top-k aggregated documents in text cube |date=2010 |pages=381–384 |doi=10.1109/ICDE.2010.5447838|isbn=978-1-4244-5445-7 |citeseerx=10.1.1.215.7504 |s2cid=14649087 }}</ref><ref>{{cite book |last1=Lin |first1=C.X. |last2=Ding |first2=B. |last3=Han |first3=K. |last4=Zhu |first4=F. |last5=Zhao |first5=B. |title=2008 Eighth IEEE International Conference on Data Mining |chapter=Text Cube: Computing IR Measures for Multidimensional Text Database Analysis |date=2008 |pages=905–910 |doi=10.1109/icdm.2008.135|isbn=978-0-7695-3502-9 |s2cid=1522480 |url=https://ink.library.smu.edu.sg/sis_research/1008 }}</ref><ref>{{cite book |last1=Liu |first1=X. |last2=Tang |first2=K. |last3=Hancock |first3=J. |last4=Han |first4=J. |last5=Song |first5=M. |last6=Xu |first6=R. |last7=Pokorny |first7=B. |editor1-last=Greenberg |editor1-first=A.M. |editor2-last=Kennedy |editor2-first=W.G. |editor3-last=Bos |editor3-first=N.D. |title= Social Computing, Behavioral-Cultural Modeling and Prediction: 6th International Conference, SBP 2013, Washington, DC, USA, April 2-5, 2013, Proceedings|publisher=Springer |location=Berlin, Heidelberg |isbn=978-3-642-37209-4 |pages=321–330 |edition=7812 |chapter= |date=2013-03-21 }}</ref> and quantifying user-defined phrase-category relationships using the core CaseOLAP algorithm.