Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
Data warehouse
(section)
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
===Dimensional versus normalized approach for storage of data=== The two most important approaches to store data in a warehouse are dimensional and normalized. The dimensional approach uses a [[star schema]] as proposed by [[Ralph Kimball]]. The normalized approach, also called the [[third normal form]] (3NF) is an entity-relational normalized model proposed by Bill Inmon.<ref>{{Cite journal|last1=Golfarelli|first1=Matteo|last2=Maio|first2=Dario|last3=Rizzi|first3=Stefano|date=1998-06-01|title=The dimensional fact model: a conceptual model for data warehouses|url=https://www.worldscientific.com/doi/abs/10.1142/S0218843098000118|journal=International Journal of Cooperative Information Systems|volume=07|issue=2n03|pages=215–247|doi=10.1142/S0218843098000118|issn=0218-8430|url-access=subscription}}</ref> ====Dimensional approach==== In a [[Star schema|dimensional approach]], [[transaction data]] is partitioned into "facts", which are usually numeric transaction data, and "[[dimension (data warehouse)|dimensions]]", which are the reference information that gives context to the facts. For example, a sales transaction can be broken up into facts such as the number of products ordered and the total price paid for the products, and into dimensions such as order date, customer name, product number, order ship-to and bill-to locations, and salesperson responsible for receiving the order. This dimensional approach makes data easier to understand and speeds up data retrieval.<ref name=":0" /> Dimensional structures are easy for business users to understand because the structure is divided into measurements/facts and context/dimensions. Facts are related to the organization's business processes and operational system, and dimensions are the context about them (Kimball, Ralph 2008). Another advantage is that the dimensional model does not involve a relational database every time. Thus, this type of modeling technique is very useful for end-user queries in data warehouse. The model of facts and dimensions can also be understood as a [[data cube]],<ref>{{cite web| url = http://www2.cs.uregina.ca/~dbd/cs831/notes/dcubes/dcubes.html| title = Introduction to Data Cubes}}</ref> where dimensions are the categorical coordinates in a multi-dimensional cube, the fact is a value corresponding to the coordinates. The main disadvantages of the dimensional approach are: # It is complicated to maintain the integrity of facts and dimensions, loading the data warehouse with data from different operational systems # It is difficult to modify the warehouse structure if the organization changes the way it does business. ====Normalized approach==== In the normalized approach, the data in the warehouse are stored following, to a degree, [[database normalization]] rules. Normalized relational database tables are grouped into ''subject areas'' (for example, customers, products and finance). When used in large enterprises, the result is dozens of tables linked by a web of joins.(Kimball, Ralph 2008). The main advantage of this approach is that it is straightforward to add information into the database. Disadvantages include that, because of the large number of tables, it can be difficult for users to join data from different sources into meaningful information and access the information without a precise understanding of the date sources and the [[data structure]] of the data warehouse. Both normalized and dimensional models can be represented in entity–relationship diagrams because both contain joined relational tables. The difference between them is the degree of normalization. These approaches are not mutually exclusive, and there are other approaches. Dimensional approaches can involve normalizing data to a degree (Kimball, Ralph 2008). In ''Information-Driven Business'',<ref>{{cite book|last=Hillard|first=Robert|title=Information-Driven Business|year=2010|publisher=Wiley|isbn=978-0-470-62577-4}}</ref> [[Robert Hillard (writer)|Robert Hillard]] compares the two approaches based on the information needs of the business problem. He concludes that normalized models hold far more information than their dimensional equivalents (even when the same fields are used in both models) but at the cost of usability. The technique measures information quantity in terms of [[Entropy (information theory)|information entropy]] and usability in terms of the Small Worlds data transformation measure.<ref>{{cite web|url=http://mike2.openmethodology.org/wiki/Small_Worlds_Data_Transformation_Measure |title=Information Theory & Business Intelligence Strategy - Small Worlds Data Transformation Measure - MIKE2.0, the open source methodology for Information Development |publisher=Mike2.openmethodology.org |access-date=2013-06-14}}</ref>
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)