Distributed database
{{Short description|A database in which data is stored across different physical locations.}}
{{multiple issues|
{{more citations needed|date=August 2010}}
{{more footnotes needed|date=April 2013}}
}}
A '''distributed database''' is a [[database]] in which data is stored across different physical locations.<ref>{{cite web|url=http://www.its.bldrdoc.gov/fs-1037/dir-012/_1750.htm|title=Definition: distributed database|author=|date=|website=www.its.bldrdoc.gov}}</ref> It may be stored in multiple [[computers]] located in the same physical location (e.g. a data centre), or it may be dispersed over a [[computer network|network]] of interconnected computers. Unlike [[Parallel computing|parallel systems]], in which the processors are tightly coupled and constitute a single database system, a distributed database system consists of loosely coupled sites that share no physical components.

System administrators can distribute collections of data (e.g. in a database) across multiple physical locations. A distributed database can reside on organised [[network servers]] or [[blockchain (database)|decentralised independent computers]] on the [[Internet]], on corporate [[intranets]] or [[extranets]], or on other organisation [[Computer network|networks]]. Because distributed databases store data across multiple computers, they may improve performance at [[end-user]] worksites by allowing transactions to be processed on many machines, instead of being limited to one.<ref name="O'Brien"> O'Brien, J. & Marakas, G.M.(2008) Management Information Systems (pp. 185-189). New York, NY: McGraw-Hill Irwin</ref>

Two processes ensure that the distributed databases remain up to date: [[Replication (computing)|replication]]<ref>{{Cite journal |last1=Ozsu |first1=M.T. |last2=Valduriez |first2=P. |date=1991 |title=Distributed database systems: where are we now? |url=https://ieeexplore.ieee.org/document/84879 |journal=Computer |volume=24 |issue=8 |pages=68–78 |doi=10.1109/2.84879 |s2cid=5898169 |issn=1558-0814|url-access=subscription }}</ref> and [[Data transmission|duplication]].
# Replication involves using specialized software that looks for changes in the distributed database. Once the changes have been identified, the replication process makes all the databases look the same. Depending on the size and number of the distributed databases, replication can be complex and time-consuming, and can require considerable computing resources.
# Duplication, on the other hand, is less complex. It identifies one database as a [[master/slave (technology)|master]] and then duplicates that database. Duplication is normally done at a set time after hours, so that each distributed location has the same data. In the duplication process, users may change only the master database. This ensures that local data will not be overwritten.

Both replication and duplication can keep the data current in all distributed locations.<ref name="O'Brien" />

Besides distributed database replication and fragmentation, there are many other distributed database design technologies. Examples include local autonomy and synchronous and asynchronous distributed database technologies. The implementation of these technologies depends on the needs of the business, on the sensitivity/[[confidentiality]] of the data stored in the database, and on the amount the business is willing to spend on ensuring [[data security]], [[data consistency|consistency]] and [[data integrity|integrity]].
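The difference between replication and duplication described above can be illustrated with a minimal Python sketch. It is not taken from any particular database product: the <code>Site</code> class, the per-key version counter, the <code>writable</code> flag and the "highest version wins" rule are all assumptions made for illustration, and real systems use considerably more elaborate change detection and conflict handling.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """One physical location, modelled as an in-memory key-value store.

    Hypothetical model: each key maps to a (version, value) pair.
    """
    name: str
    rows: dict = field(default_factory=dict)
    writable: bool = True  # assumption: duplication targets are read-only

    def write(self, key, value):
        if not self.writable:
            raise PermissionError(f"site {self.name} accepts no local changes")
        version = self.rows.get(key, (0, None))[0] + 1
        self.rows[key] = (version, value)


def replicate(sites):
    """Replication: look for changes at every site, then make all sites equal.

    Assumed conflict rule: the highest version number wins; on a tie the
    site that appears first in `sites` wins.
    """
    merged = {}
    for site in sites:
        for key, (version, value) in site.rows.items():
            if key not in merged or version > merged[key][0]:
                merged[key] = (version, value)
    for site in sites:
        site.rows = dict(merged)


def duplicate(master, copies):
    """Duplication: copy the master wholesale to every other location."""
    for copy in copies:
        copy.rows = dict(master.rows)


if __name__ == "__main__":
    # Replication: both sites accept writes and are reconciled afterwards.
    a, b = Site("A"), Site("B")
    a.write("x", 1)
    b.write("y", 2)
    replicate([a, b])
    print(a.rows == b.rows)

    # Duplication: only the master accepts writes; copies are overwritten.
    master, branch = Site("master"), Site("branch", writable=False)
    master.write("k", "v")
    duplicate(master, [branch])
    print(branch.rows == master.rows)
```

In this sketch, replication has to inspect every site to find changes, which is why its cost grows with the size and number of databases, whereas duplication only reads the master and overwrites the copies.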
When discussing access to distributed databases, [[Microsoft]] favors the term '''distributed query''', which it defines in a protocol-specific manner as "[a]ny SELECT, INSERT, UPDATE, or DELETE statement that references tables and rowsets from one or more external OLE DB data sources".<ref>{{cite web |url = https://technet.microsoft.com/en-us/library/cc966484.aspx |title = TechNet Glossary |date = 28 January 2010 |publisher = Microsoft |accessdate = 2013-07-16 |quote = distributed query[:] Any SELECT, INSERT, UPDATE, or DELETE statement that references tables and rowsets from one or more external OLE DB data sources. }}</ref> [[Oracle Database|Oracle]] provides a more language-centric view in which distributed queries and [[distributed transaction]]s form part of '''distributed SQL'''.<ref>{{cite web |url = http://docs.oracle.com/cd/E11882_01/server.112/e25789/toc.htm |title = Oracle Database Concepts, 11g Release 2 (11.2) |last1 = Ashdown |first1 = Lance |last2 = Kyte |first2 = Tom |date = September 2011 |publisher = Oracle Corporation |accessdate = 2013-07-17 |quote = Distributed SQL synchronously accesses and updates data distributed among multiple databases. [...] Distributed SQL includes distributed queries and distributed transactions. |url-status = dead |archiveurl = https://web.archive.org/web/20130715001716/http://docs.oracle.com/cd/E11882_01/server.112/e25789/toc.htm |archivedate = 2013-07-15 }}</ref>

== Architecture ==
There are three main architecture types for distributed databases:
* [[Shared-memory architecture|Shared-memory]]: very rarely used<ref name=":0">{{Cite web |last=Garrod |first=Charlie |date=2023 |title=Lecture #21: Introduction to Distributed Databases |url=https://15445.courses.cs.cmu.edu/spring2023/notes/21-distributed.pdf |access-date=2023-03-12 |website=Carnegie Mellon University - School of Computer Science}}</ref>
* [[Shared-disk architecture|Shared-disk]]
* [[Shared-nothing architecture|Shared-nothing]]
In the shared-memory and shared-disk architectures, the data is not [[Partition (database)|partitioned]], whereas in a shared-nothing architecture it must be. Shared-disk architecture is more common for [[Cloud database|cloud databases]] than for on-premises deployments.<ref name=":0" /> Historically, shared-nothing was the first architecture to be implemented on the cloud, before the advent of shared cloud storage made shared-disk possible.

In practice, different layers of the database can have different architectures. It is now common to have a compute layer with a shared-nothing architecture and a storage layer with a shared-disk architecture.
This is for instance the case of [[Snowflake Inc.|Snowflake]]<ref>{{Cite web |last=Kaushik |first=Arun |date=2020-02-14 |title=What Makes Snowflake So Powerful – It's the Hybrid of Shared Disk and Shared Nothing Architecture |url=https://medium.com/@a.kaushik5587/what-makes-snowflake-so-powerful-its-the-hybrid-of-shared-disk-and-shared-nothing-architecture-5b4fa8f039fa |access-date=2024-03-12 |website=Medium |language=en}}</ref> and [[Amazon Aurora|AWS Aurora]].<ref>{{Cite web |last1=Brahmadesam |first1=Murali |last2=Ternstrom |first2=Tobias |date=2019 |title=Amazon Aurora storage demystified: How it all works |url=https://d1.awsstatic.com/events/reinvent/2019/REPEAT_Amazon_Aurora_storage_demystified_How_it_all_works_DAT309-R.pdf |access-date=2024-03-12}}</ref>

=== List of shared-nothing databases ===
* [[IBM Db2]]
* [[Greenplum]]
* [[Netezza]]
* [[Teradata]]
* [[TiDB]]
* [[Vertica]]

=== List of shared-disk databases ===
* [[Amazon Aurora|AWS Aurora]]
* Neon
* [[Snowflake Inc.|Snowflake]]

==See also==
*[[Centralized database]]
*[[Data grid]]
*[[Distributed cache]]
*[[Distributed data store]]
*[[Distributed hash table]]
*[[Routing protocol]]
*[[Distributed SQL]]

==References==
{{Reflist|30em}}

== Further reading ==
*M. T. Özsu and P. Valduriez, ''Principles of Distributed Databases'' (3rd edition) (2011), Springer, {{ISBN|978-1-4419-8833-1}}
*Elmasri and Navathe, ''Fundamentals of database systems'' (3rd edition), Addison-Wesley Longman, {{ISBN|0-201-54263-3}}
*''Oracle Database Administrator's Guide 10g'' (Release 1), http://docs.oracle.com/cd/B14117_01/server.101/b10739/ds_concepts.htm

{{Databases}}
{{Authority control}}

[[Category:Data management]]
[[Category:Types of databases]]
[[Category:Distributed computing architecture]]
[[Category:Applications of distributed computing]]
[[Category:Database management systems]]