== Topics ==
=== Data models ===
{{main|Data model}}
[[File:3-4 Data model roles.svg|thumb|320px|How data models deliver benefit.<ref name="MW99"/>]]
Data models provide a framework for [[data]] to be used within [[information system]]s by providing specific definitions and formats. If a data model is used consistently across systems, then compatibility of data can be achieved. If the same data structures are used to store and access data, then different applications can share data seamlessly. The results of this are indicated in the diagram. However, systems and interfaces are often expensive to build, operate, and maintain. They may also constrain the business rather than support it. This may occur when the quality of the data models implemented in systems and interfaces is poor.<ref name="MW99">Matthew West and Julian Fowler (1999). [https://sites.google.com/site/drmatthewwest/publications/princ03.pdf Developing High Quality Data Models] {{Webarchive|url=https://web.archive.org/web/20200909121755/https://d2024367-a-62cb3a1a-s-sites.googlegroups.com/site/drmatthewwest/publications/princ03.pdf?attachauth=ANoY7crjITgBSUdEyb3UlEOS2OxXk3r-iJk0-S4EfbK3PtqCZvEgcZwvpBiF3VGC7M0IMhTWLZoERz8Otd2Tu5Bquzo4NmuOxyeAzvQa0DZlSIea0KlbnoKFHPK9zM3Pg1p7f2b_OcaIv3_J8mkFK8rMoR_UABqsAM_Pa9wd6qHK1by_hBvYNRPKQZpTM4-rqh1D4x68mcRDzADCED8sFixAn4Nezq0zd_hunEOcJ8m7FSTyRa2xnOA%3D&attredirects=0 |date=September 9, 2020 }}. The European Process Industries STEP Technical Liaison Executive (EPISTLE).</ref>

Some common problems found in data models are:
* Business rules, specific to how things are done in a particular place, are often fixed in the structure of a data model. This means that small changes in the way business is conducted lead to large changes in computer systems and interfaces. Business rules therefore need to be implemented in a flexible way that does not result in complicated dependencies; the data model should be flexible enough that changes in the business can be implemented within it relatively quickly and efficiently.
* Entity types are often not identified, or are identified incorrectly. This can lead to replication of data, data structure and functionality, together with the attendant costs of that duplication in development and maintenance. Therefore, data definitions should be made as explicit and easy to understand as possible to minimize misinterpretation and duplication.
* Data models for different systems are arbitrarily different. The result of this is that complex interfaces are required between systems that share data. These interfaces can account for between 25 and 70% of the cost of current systems. Required interfaces should be considered as an inherent part of designing a data model, since a data model on its own is not usable without interfaces to different systems.
* Data cannot be shared electronically with customers and suppliers, because the structure and meaning of data have not been standardised. To obtain optimal value from an implemented data model, it is very important to define standards that will ensure that data models will both meet business needs and be consistent.<ref name="MW99"/>

=== Conceptual, logical and physical schemas ===
[[File:4-2 ANSI-SPARC three level architecture.svg|thumb|320px|The ANSI/SPARC three-level architecture. This shows that a data model can be an external model (or view), a conceptual model, or a physical model. This is not the only way to look at data models, but it is a useful way, particularly when comparing models.<ref name="MW99"/>]]
In 1975 [[American National Standards Institute|ANSI]] described three kinds of data-model ''instance'':<ref>American National Standards Institute. 1975. ''ANSI/X3/SPARC Study Group on Data Base Management Systems; Interim Report''. FDT (Bulletin of ACM SIGMOD) 7:2.</ref>
* [[Conceptual schema]]: describes the semantics of a domain (the scope of the model). For example, it may be a model of the interest area of an organization or of an industry. This consists of entity classes, representing kinds of things of significance in the domain, and relationship assertions about associations between pairs of entity classes. A conceptual schema specifies the kinds of facts or propositions that can be expressed using the model. In that sense, it defines the allowed expressions in an artificial "language" with a scope that is limited by the scope of the model. Simply described, a conceptual schema is the first step in organizing the data requirements.
* [[Logical schema]]: describes the structure of some domain of information. This consists of descriptions of (for example) tables, columns, object-oriented classes, and XML tags. The logical schema and conceptual schema are sometimes implemented as one and the same.<ref name="RS001"/>
* [[Physical schema]]: describes the physical means used to store data. This is concerned with partitions, CPUs, [[tablespace]]s, and the like.

According to ANSI, this approach allows the three perspectives to be relatively independent of each other. Storage technology can change without affecting either the logical or the conceptual schema. The table/column structure can change without (necessarily) affecting the conceptual schema. In each case, of course, the structures must remain consistent across all schemas of the same data model.

=== Data modelling process ===
{{further|Database design}}
[[File:Data modeling context.svg|thumb|360px|Data modeling in the context of [[business process]] integration.<ref name="SS93">Paul R. Smith & Richard Sarfaty (1993). [http://www.osti.gov/energycitations/purl.cover.jsp;jsessionid=6192EDBFBAB7DCED13883C55F221221A?purl=/10160331-YhIRrY/ Creating a strategic plan for configuration management using Computer Aided Software Engineering (CASE) tools.] Paper For 1993 National DOE/Contractors and Facilities CAD/CAE User's Group.</ref>]]
In the context of [[Business process modeling#Business process integration|business process integration]] (see figure), data modeling complements [[business process modeling]], and ultimately results in database generation.<ref name="SS93"/>

The process of designing a database involves producing the previously described three types of schemas – conceptual, logical, and physical. The database design documented in these schemas is converted through a [[Data Definition Language]], which can then be used to generate a database. A fully attributed data model contains detailed attributes (descriptions) for every entity within it.

The term "database design" can describe many different parts of the design of an overall [[database system]]. Principally, and most correctly, it can be thought of as the logical design of the base data structures used to store the data. In the [[relational model]] these are the [[table (database)|tables]] and [[view (database)|views]]. In an [[object database]] the entities and relationships map directly to object classes and named relationships. However, the term "database design" can also apply to the overall process of designing not just the base data structures, but also the forms and queries used as part of the overall database application within the [[Database management system|Database Management System]] (DBMS).

In the process, system [[Interface (computer science)|interface]]s account for 25% to 70% of the development and support costs of current systems. The primary reason for this cost is that these systems do not share a [[common data model]].
If data models are developed on a system-by-system basis, then not only is the same analysis repeated in overlapping areas, but further analysis must be performed to create the interfaces between them. Most systems within an organization contain the same basic data, redeveloped for a specific purpose. Therefore, an efficiently designed basic data model can minimize rework, requiring only minimal modifications for the purposes of different systems within the organization.<ref name="MW99"/>

=== Modeling methodologies ===
{{See also|Model-driven engineering}}
Data models represent information areas of interest. While there are many ways to create data models, according to [[Len Silverston]] (1997)<ref name="SIG97">Len Silverston, W.H.Inmon, Kent Graziano (1997). ''The Data Model Resource Book''. Wiley, 1997. {{ISBN|0-471-15364-8}}. Reviewed by [http://www.tdan.com/view-book-reviews/5593 Van Scott on tdan.com]. Accessed November 1, 2008.</ref> only two modeling methodologies stand out, top-down and bottom-up:
* Bottom-up models or View Integration models are often the result of a [[reengineering (software)|reengineering]] effort. They usually start with existing data structures, forms, fields on application screens, or reports. These models are usually physical, application-specific, and incomplete from an [[enterprise architecture|enterprise perspective]]. They may not promote data sharing, especially if they are built without reference to other parts of the organization.<ref name="SIG97"/>
* Top-down [[logical data model]]s, on the other hand, are created in an abstract way by getting information from people who know the subject area. A system may not implement all the entities in a logical model, but the model serves as a reference point or template.<ref name="SIG97"/>

Sometimes models are created in a mixture of the two methods: by considering the data needs and structure of an application and by consistently referencing a subject-area model.
In many environments, the distinction between a logical data model and a physical data model is blurred. In addition, some [[Computer-aided software engineering|CASE]] tools do not make a distinction between logical and [[physical data model]]s.<ref name="SIG97"/>

=== Entity–relationship diagrams ===
{{main|Entity–relationship model}}
[[File:B 5 1 IDEF1X Diagram.jpg|thumb|550px|right|Example of an [[IDEF1X]] entity–relationship diagram used to model IDEF1X itself. The name of the view is mm. The domain hierarchy and constraints are also given. The constraints are expressed as sentences in the formal theory of the meta model.<ref name="FIPS184">[http://www.itl.nist.gov/fipspubs/idef1x.doc FIPS Publication 184] {{Webarchive|url=https://web.archive.org/web/20131203223034/http://www.itl.nist.gov/fipspubs/idef1x.doc |date=December 3, 2013 }} release of IDEF1X by the Computer Systems Laboratory of the National Institute of Standards and Technology (NIST). December 21, 1993.</ref>]]
There are several notations for data modeling. The actual model is frequently called an "entity–relationship model", because it depicts data in terms of the entities and relationships described in the [[data]].<ref name="WBD04"/> An entity–relationship model (ERM) is an abstract conceptual representation of structured data. Entity–relationship modeling is a relational schema [[database model]]ing method, used in [[software engineering]] to produce a type of [[conceptual schema|conceptual data model]] (or [[semantic data model]]) of a system, often a [[relational database]], and its requirements in a [[Top-down and bottom-up design|top-down]] fashion.

These models are used in the first stage of [[information system]] design, during the [[requirements analysis]], to describe information needs or the type of [[information]] that is to be stored in a [[database]]. The [[data model]]ing technique can be used to describe any [[Ontology (computer science)|ontology]] (i.e. an overview and classification of the terms used and their relationships) for a certain [[Domain of discourse|universe of discourse]], i.e. the area of interest.

Several techniques have been developed for the design of data models. While these methodologies guide data modelers in their work, two different people using the same methodology will often come up with very different results. Most notable are:
* [[Bachman diagram]]s
* [[Barker's notation]]
* [[Entity–relationship model|Chen's notation]]
* [[Data Vault Modeling]]
* [[Extended Backus–Naur form]]
* [[IDEF1X]]
* [[Object-relational mapping]]
* [[Object-Role Modeling]] and [[FCO-IM|Fully Communication Oriented Information Modeling]]
* [[Relational Model]]
* [[Relational Model/Tasmania]]

=== Generic data modeling ===
{{main|Generic data model}}
[[File:HL7 Reference Information Model.jpg|thumb|320px|Example of a Generic data model.<ref>Amnon Shabo (2006). [http://healthit.hhs.gov/portal/server.pt?open=512&objID=1263&mode=2 Clinical genomics data standards for pharmacogenetics and pharmacogenomics] {{Webarchive|url=https://web.archive.org/web/20090722232240/http://healthit.hhs.gov/portal/server.pt?open=512&objID=1263&mode=2 |date=July 22, 2009 }}.</ref>]]
Generic data models are generalizations of conventional [[data model]]s. They define standardized general relation types, together with the kinds of things that may be related by such a relation type. The definition of the generic data model is similar to the definition of a natural language. For example, a generic data model may define relation types such as a 'classification relation', being a [[binary relation]] between an individual thing and a kind of thing (a class), and a 'part-whole relation', being a binary relation between two things, one with the role of part, the other with the role of whole, regardless of the kind of things that are related.
Given an extensible list of classes, this allows the classification of any individual thing and the specification of part-whole relations for any individual object. By standardization of an extensible list of relation types, a generic data model enables the expression of an unlimited number of kinds of facts and will approach the capabilities of natural languages. Conventional data models, on the other hand, have a fixed and limited domain scope, because the instantiation (usage) of such a model only allows expressions of kinds of facts that are predefined in the model.

=== Semantic data modeling ===
{{main|Semantic data model}}
The logical data structure of a DBMS, whether hierarchical, network, or relational, cannot totally satisfy the requirements for a conceptual definition of data because it is limited in scope and biased toward the implementation strategy employed by the DBMS. That is, unless the semantic data model is implemented in the database on purpose, a choice which may slightly impact performance but generally vastly improves productivity.
[[File:A2 4 Semantic Data Models.jpg|thumb|320px|Semantic data models.<ref name="FIPS184"/>]]
Therefore, the need to define data from a conceptual view has led to the development of [[semantic data model]]ing techniques, that is, techniques to define the meaning of data within the context of its interrelationships with other data. As illustrated in the figure, the real world, in terms of resources, ideas, events, etc., is symbolically defined by its description within physical data stores. A semantic data model is an [[Abstraction (computer science)|abstraction]] which defines how the stored symbols relate to the real world. Thus, the model must be a true representation of the real world.<ref name="FIPS184"/>

The purpose of semantic data modeling is to create a structural model of a piece of the real world, called the "universe of discourse".
For this, three fundamental structural relations are considered:
* Classification/instantiation: Objects with some structural similarity are described as instances of classes
* Aggregation/decomposition: Composed objects are obtained by joining their parts
* Generalization/specialization: Distinct classes with some common properties are reconsidered in a more generic class with the common attributes

A semantic data model can be used to serve many purposes, such as:<ref name="FIPS184"/>
* Planning of data resources
* Building of shareable databases
* Evaluation of vendor software
* Integration of existing databases

The overall goal of semantic data models is to capture more meaning of data by integrating relational concepts with more powerful [[Abstraction (computer science)|abstraction]] concepts known from the [[artificial intelligence]] field. The idea is to provide high-level modeling primitives as integral parts of a data model in order to facilitate the representation of real-world situations.<ref>"Semantic data modeling" In: ''Metaclasses and Their Application''. Book Series Lecture Notes in Computer Science. Publisher Springer Berlin / Heidelberg. Volume 943/1995.</ref>
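
The three structural relations listed above (classification/instantiation, aggregation/decomposition, and generalization/specialization) can be illustrated with a minimal Python sketch. The class names <code>Part</code>, <code>Vehicle</code>, <code>Car</code> and <code>Truck</code>, and all of their attributes, are hypothetical and chosen only for illustration; they are not taken from the cited sources, and object-oriented classes are only one possible way to express these relations.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Part:
    """A component that can be aggregated into a composed object."""
    name: str


@dataclass
class Vehicle:
    """Generalization: the attributes shared by Car and Truck live here."""
    identifier: str
    # Aggregation/decomposition: a vehicle is composed by joining its parts.
    parts: List[Part] = field(default_factory=list)

    def decompose(self) -> List[str]:
        """Return the names of the parts this object is composed of."""
        return [part.name for part in self.parts]


@dataclass
class Car(Vehicle):
    """Specialization of Vehicle with a car-specific attribute."""
    door_count: int = 4


@dataclass
class Truck(Vehicle):
    """Specialization of Vehicle with a truck-specific attribute."""
    payload_kg: float = 0.0


# Classification/instantiation: individual objects are instances of classes.
car = Car(identifier="C-1", parts=[Part("engine"), Part("chassis")], door_count=2)
truck = Truck(identifier="T-1", parts=[Part("engine"), Part("trailer")], payload_kg=12000.0)
```

In this sketch, <code>car</code> and <code>truck</code> are both classified as instances of the more generic <code>Vehicle</code> class, while <code>decompose()</code> recovers the parts from which each composed object was built.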