== Structure ==
The structure of a DHT can be decomposed into several main components.<ref>Moni Naor and Udi Wieder. [http://www.wisdom.weizmann.ac.il/~naor/PAPERS/dh.pdf Novel Architectures for P2P Applications: the Continuous-Discrete Approach] {{Webarchive|url=https://web.archive.org/web/20191209032152/http://www.wisdom.weizmann.ac.il/~naor/PAPERS/dh.pdf |date=2019-12-09 }}. Proc. SPAA, 2003.</ref><ref>Gurmeet Singh Manku. [http://www-db.stanford.edu/~manku/phd/index.html Dipsea: A Modular Distributed Hash Table] {{webarchive|url=https://web.archive.org/web/20040910154927/http://www-db.stanford.edu/~manku/phd/index.html |date=2004-09-10 }}. Ph.D. thesis, Stanford University, August 2004.</ref> The foundation is an abstract [[Keyspace (distributed data store)|keyspace]], such as the set of 160-bit [[string (computer science)|string]]s. A keyspace [[Partition (database)|partitioning]] scheme splits ownership of this keyspace among the participating nodes. An [[overlay network]] then connects the nodes, allowing them to find the owner of any given key in the keyspace.

Once these components are in place, a typical use of the DHT for storage and retrieval might proceed as follows. Suppose the keyspace is the set of 160-bit strings. To index a file with a given {{mvar|filename}} and {{mvar|data}} in the DHT, the [[SHA-1]] hash of {{mvar|filename}} is generated, producing a 160-bit key {{mvar|k}}, and a message {{math|''put''(''k'', ''data'')}} is sent to any node participating in the DHT. The message is forwarded from node to node through the overlay network until it reaches the single node responsible for key {{mvar|k}} as specified by the keyspace partitioning. That node then stores the key and the data. Any other client can then retrieve the contents of the file by again hashing {{mvar|filename}} to produce {{mvar|k}} and asking any DHT node to find the data associated with {{mvar|k}} with a message {{math|''get''(''k'')}}. The message will again be routed through the overlay to the node responsible for {{mvar|k}}, which will reply with the stored {{mvar|data}}.
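The following is a minimal, illustrative Python sketch of this put/get flow. The <code>ToyDHT</code> class and its <code>_responsible_node</code> helper are hypothetical stand-ins that collapse the keyspace partitioning and overlay routing described below into a single local lookup; a real DHT would forward each message hop by hop through the overlay.

<syntaxhighlight lang="python">
import hashlib

def dht_key(filename: str) -> int:
    """Hash the filename with SHA-1 to obtain the 160-bit key k."""
    return int.from_bytes(hashlib.sha1(filename.encode()).digest(), "big")

class ToyDHT:
    """Illustrative stand-in for a DHT; not a real distributed system."""

    def __init__(self):
        self.nodes = {}  # node ID -> that node's local key/value store

    def join(self, node_id: int):
        self.nodes[node_id] = {}

    def _responsible_node(self, k: int) -> int:
        # Stand-in for keyspace partitioning: here, the node whose ID is
        # numerically closest to k owns the key.
        return min(self.nodes, key=lambda i: abs(i - k))

    def put(self, filename: str, data: bytes):
        k = dht_key(filename)
        self.nodes[self._responsible_node(k)][k] = data

    def get(self, filename: str) -> bytes:
        k = dht_key(filename)
        return self.nodes[self._responsible_node(k)][k]
</syntaxhighlight>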
The keyspace partitioning and overlay network components are described below with the goal of capturing the principal ideas common to most DHTs; many designs differ in the details.

=== Keyspace partitioning ===
Most DHTs use some variant of [[consistent hashing]] or [[rendezvous hashing]] to map keys to nodes. The two algorithms appear to have been devised independently and simultaneously to solve the distributed hash table problem.

Both consistent hashing and rendezvous hashing have the essential property that removal or addition of one node changes only the set of keys owned by the nodes with adjacent IDs, and leaves all other nodes unaffected. Contrast this with a traditional [[hash table]], in which addition or removal of one bucket causes nearly the entire keyspace to be remapped. Since any change in ownership typically corresponds to bandwidth-intensive movement of objects stored in the DHT from one node to another, minimizing such reorganization is required to efficiently support high rates of [[Churn rate|churn]] (node arrival and failure).

==== Consistent hashing ====
{{further|Consistent hashing}}
Consistent hashing employs a function <math>\delta(k_1, k_2)</math> that defines an abstract notion of the distance between the keys <math>k_1</math> and <math>k_2</math>, which is unrelated to geographical distance or [[network latency]]. Each node is assigned a single key called its ''identifier'' (ID). A node with ID <math>i_x</math> owns all the keys <math>k_m</math> for which <math>i_x</math> is the closest ID, measured according to <math>\delta(k_m, i_x)</math>.

For example, the [[Chord (peer-to-peer)|Chord DHT]] uses consistent hashing, which treats nodes as points on a circle, and <math>\delta(k_1, k_2)</math> is the distance traveling clockwise around the circle from <math>k_1</math> to <math>k_2</math>. Thus, the circular keyspace is split into contiguous segments whose endpoints are the node identifiers. If <math>i_1</math> and <math>i_2</math> are two adjacent IDs, with a shorter clockwise distance from <math>i_1</math> to <math>i_2</math>, then the node with ID <math>i_2</math> owns all the keys that fall between <math>i_1</math> and <math>i_2</math>.
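A minimal Python sketch of this Chord-style scheme, assuming a circular 160-bit keyspace:

<syntaxhighlight lang="python">
KEYSPACE = 2 ** 160  # size of the circular 160-bit keyspace

def delta(k1: int, k2: int) -> int:
    """Distance traveling clockwise around the circle from k1 to k2."""
    return (k2 - k1) % KEYSPACE

def owner(node_ids: list[int], k: int) -> int:
    """Key k is owned by the node minimizing the clockwise distance from
    k to its ID, i.e. the first node met going clockwise from k."""
    return min(node_ids, key=lambda i: delta(k, i))
</syntaxhighlight>

Note that removing an ID from <code>node_ids</code> changes the owner only of the keys the removed node held (they pass to the next node clockwise); all other keys keep their owner, which is exactly the churn-friendly property described above.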
==== Rendezvous hashing ====
{{further|Rendezvous hashing}}
In rendezvous hashing, also called highest random weight (HRW) hashing, all clients use the same hash function <math>h()</math> (chosen ahead of time) to associate a key with one of the ''n'' available servers. Each client has the same list of identifiers {{math|{{mset|''S''<sub>1</sub>, ''S''<sub>2</sub>, ..., ''S''<sub>''n''</sub> }}}}, one for each server. Given some key {{mvar|k}}, a client computes the ''n'' hash weights {{math|1=''w''<sub>1</sub> = ''h''(''S''<sub>1</sub>, ''k''), ''w''<sub>2</sub> = ''h''(''S''<sub>2</sub>, ''k''), ..., ''w''<sub>''n''</sub> = ''h''(''S''<sub>''n''</sub>, ''k'')}}. The client associates the key with the server corresponding to the highest hash weight for that key. A server with ID <math>S_x</math> owns all the keys <math>k_m</math> for which the hash weight <math>h(S_x, k_m)</math> is higher than the hash weight of any other node for that key.
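A minimal Python sketch of HRW hashing; SHA-1 is used here purely as an illustrative choice of <math>h()</math>:

<syntaxhighlight lang="python">
import hashlib

def h(server_id: str, k: str) -> int:
    """Hash weight h(S, k) for a (server, key) pair."""
    return int.from_bytes(
        hashlib.sha1(f"{server_id}:{k}".encode()).digest(), "big")

def rendezvous_owner(servers: list[str], k: str) -> str:
    """Key k is assigned to the server with the highest hash weight."""
    return max(servers, key=lambda s: h(s, k))
</syntaxhighlight>

As with consistent hashing, removing a server reassigns only the keys it owned, since the weights of the remaining servers are unchanged.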
==== Locality-preserving hashing ====
{{further|Locality-preserving hashing}}
Locality-preserving hashing maps similar keys to the same or neighbouring nodes. This can enable more efficient execution of [[Range query (data structures)|range queries]]; however, in contrast to consistent hashing, there is no longer any assurance that the keys (and thus the load) are uniformly and randomly distributed over the key space and the participating peers. DHT protocols such as Self-Chord and Oscar<ref>{{Cite journal|last1=Girdzijauskas|first1=Šarūnas|last2=Datta|first2=Anwitaman|last3=Aberer|first3=Karl|date=2010-02-01|title=Structured overlay for heterogeneous environments|journal=ACM Transactions on Autonomous and Adaptive Systems|volume=5|issue=1|pages=1–25|doi=10.1145/1671948.1671950|s2cid=13218263|issn=1556-4665|url=http://infoscience.epfl.ch/record/134972|access-date=2020-03-12|archive-date=2020-07-12|archive-url=https://web.archive.org/web/20200712230838/https://infoscience.epfl.ch/record/134972|url-status=live}}</ref> address such issues.

Self-Chord decouples object keys from peer IDs and sorts keys along the ring with a statistical approach based on the [[swarm intelligence]] paradigm.<ref>{{cite journal |last1=Forestiero |first1=Agostino |last2=Leonardi |first2=Emilio |last3=Mastroianni |first3=Carlo |last4=Meo |first4=Michela |title=Self-Chord: A Bio-Inspired P2P Framework for Self-Organizing Distributed Systems |journal=IEEE/ACM Transactions on Networking |date=October 2010 |volume=18 |issue=5 |pages=1651–1664 |doi=10.1109/TNET.2010.2046745 |s2cid=14797120 |url=http://porto.polito.it/2370172/ |access-date=2019-07-28 |archive-date=2012-07-01 |archive-url=https://web.archive.org/web/20120701163158/http://porto.polito.it/2370172/ |url-status=live }}</ref> Sorting ensures that similar keys are stored by neighbouring nodes and that discovery procedures, including range queries, can be performed in logarithmic time. Oscar constructs a navigable [[small-world network]] based on [[random walk]] sampling, also ensuring logarithmic search time.

=== Overlay network ===
Each node maintains a set of [[Data link|link]]s to other nodes (its ''neighbors'' or [[routing table]]). Together, these links form the overlay network.<ref>{{Citation|last1=Galuba|first1=Wojciech|last2=Girdzijauskas|first2=Sarunas|title=Peer to Peer Overlay Networks: Structure, Routing and Maintenance|date=2009|encyclopedia=Encyclopedia of Database Systems|pages=2056–2061|editor-last=Liu|editor-first=Ling|editor-link=Ling Liu (computer scientist)|editor2-last=Özsu|editor2-first=M. Tamer|publisher=Springer US|language=en|doi=10.1007/978-0-387-39940-9_1215|isbn=9780387399409}}</ref> A node picks its neighbors according to a certain structure, called the [[network topology|network's topology]].

All DHT topologies share some variant of the most essential property: for any key {{mvar|k}}, each node either has a node ID that owns {{mvar|k}} or has a link to a node whose node ID is ''closer'' to {{mvar|k}}, in terms of the keyspace distance defined above. It is then easy to route a message to the owner of any key {{mvar|k}} using the following [[greedy algorithm]] (which is not necessarily globally optimal): at each step, forward the message to the neighbor whose ID is closest to {{mvar|k}}. When there is no such neighbor, we must have arrived at the closest node, which is the owner of {{mvar|k}} as defined above. This style of routing is sometimes called [[key-based routing]].
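A minimal Python sketch of this greedy forwarding, assuming a hypothetical <code>neighbors</code> function that returns each node's neighbor list and the keyspace distance <code>delta</code> from the consistent-hashing sketch above:

<syntaxhighlight lang="python">
def greedy_route(start: int, k: int, neighbors, delta) -> int:
    """Route toward key k: at each hop, forward to the neighbor whose ID
    is closest to k; stop when no neighbor is closer than the current
    node, which then owns k.

    neighbors(node_id) -> iterable of neighbor IDs (assumed helper)
    delta(k1, k2)      -> keyspace distance between two keys
    """
    current = start
    while True:
        closer = [n for n in neighbors(current)
                  if delta(k, n) < delta(k, current)]
        if not closer:
            return current  # no closer neighbor: current node owns k
        current = min(closer, key=lambda n: delta(k, n))
</syntaxhighlight>

Each hop strictly decreases the distance to {{mvar|k}}, so the procedure terminates at the owner.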
Beyond basic routing correctness, two important constraints on the topology are to guarantee that the maximum number of [[Hop (networking)|hops]] in any route (route length) is low, so that requests complete quickly, and that the maximum number of neighbors of any node (maximum node [[Degree (graph theory)|degree]]) is low, so that maintenance overhead is not excessive. Of course, having shorter routes requires a higher [[maximum degree]]. Some common choices for maximum degree and route length are as follows, where {{mvar|n}} is the number of nodes in the DHT, using [[Big O notation]]:

{| class="wikitable"
|-
! Max. degree !! Max. route length !! Used in !! Note
|-
| <math>O(1)</math> || <math>O(n)</math> || || Worst lookup lengths; lookups are likely to be much slower
|-
| <math>O(1)</math> || <math>O(\log n)</math> || [[Koorde]] (with constant degree) || More complex to implement, but an acceptable lookup time is achieved with a fixed number of connections
|-
| <math>O(\log n)</math> || <math>O(\log n)</math> || [[Chord (peer-to-peer)|Chord]]<br/>[[Kademlia]]<br/>[[Pastry (DHT)|Pastry]]<br/>[[Tapestry (DHT)|Tapestry]] || Most common choice, though not optimal in the degree/route-length tradeoff. Chord is the most basic version; Kademlia appears to be the most popular optimized variant, with improved average lookup time
|-
| <math>O(\log n)</math> || <math>O(\log n/\log (\log n))</math> || [[Koorde]] (with optimal lookup) || More complex to implement, but lookups may be faster (lower worst-case bound)
|-
| <math>O(\sqrt{n})</math> || <math>O(1)</math> || || Largest local routing state, with heavy communication whenever any node joins or leaves
|}

The most common choice, <math>O(\log n)</math> degree/route length, is not optimal in terms of the degree/route-length tradeoff, but such topologies typically allow more flexibility in the choice of neighbors. Many DHTs use that flexibility to pick neighbors that are close in terms of latency in the physical underlying network. In general, all DHTs construct navigable small-world network topologies, which trade off route length against network degree.<ref>{{Cite thesis|url=https://infoscience.epfl.ch/record/130838?ln=en|title=Designing peer-to-peer overlays a small-world perspective|last=Girdzijauskas|first=Sarunas|date=2009|website=epfl.ch|publisher=EPFL|doi=10.5075/epfl-thesis-4327 |access-date=2019-11-11|archive-date=2020-03-03|archive-url=https://web.archive.org/web/20200303182938/https://infoscience.epfl.ch/record/130838?ln=en|url-status=live}}</ref>

Maximum route length is closely related to [[Diameter (graph theory)|diameter]]: the maximum number of hops in any shortest path between nodes. Clearly, the network's worst-case route length is at least as large as its diameter, so DHTs are limited by the degree/diameter tradeoff<ref>{{citation |url=http://maite71.upc.es/grup_de_grafs/table_g.html |title=The (Degree, Diameter) Problem for Graphs |publisher=Maite71.upc.es |access-date=2012-01-10 |archive-url=https://web.archive.org/web/20120217054532/http://maite71.upc.es/grup_de_grafs/table_g.html/ |archive-date=2012-02-17 |url-status=dead }}</ref> that is fundamental in [[graph theory]]. Route length can be greater than diameter, since the greedy routing algorithm may not find shortest paths.<ref>Gurmeet Singh Manku, Moni Naor, and Udi Wieder. [http://citeseer.ist.psu.edu/naor04know.html "Know thy Neighbor's Neighbor: the Power of Lookahead in Randomized P2P Networks"] {{Webarchive|url=https://web.archive.org/web/20080420030133/http://citeseer.ist.psu.edu/naor04know.html |date=2008-04-20 }}. Proc. STOC, 2004.</ref>

=== Algorithms for overlay networks ===
Aside from routing, there exist many algorithms that exploit the structure of the overlay network for sending a message to all nodes, or a subset of nodes, in a DHT.<ref>{{cite web|author=[[Ali Ghodsi]]|url=http://www.sics.se/~ali/thesis/|title=Distributed k-ary System: Algorithms for Distributed Hash Tables |archive-url=https://web.archive.org/web/20070522060750/http://www.sics.se/~ali/thesis/ |archive-date=22 May 2007|date=22 May 2007 |url-status=dead}}. KTH Royal Institute of Technology, 2006.</ref> These algorithms are used by applications to do [[overlay multicast]], range queries, or to collect statistics. Two systems that are based on this approach are Structella,<ref>{{cite journal |last1=Castro |first1=Miguel |last2=Costa |first2=Manuel |last3=Rowstron |first3=Antony |title=Should we build Gnutella on a structured overlay? |journal=ACM SIGCOMM Computer Communication Review |date=1 January 2004 |volume=34 |issue=1 |pages=131 |doi=10.1145/972374.972397 |citeseerx=10.1.1.221.7892 |s2cid=6587291 |url=http://nms.lcs.mit.edu/HotNets-II/papers/structella.pdf |access-date=25 September 2019 |archive-date=14 February 2021 |archive-url=https://web.archive.org/web/20210214011937/http://nms.lcs.mit.edu/HotNets-II/papers/structella.pdf |url-status=live }}</ref> which implements flooding and random walks on a Pastry overlay, and DQ-DHT, which implements a dynamic querying search algorithm over a Chord network.<ref>{{cite journal |last1=Talia |first1=Domenico |last2=Trunfio |first2=Paolo |title=Enabling Dynamic Querying over Distributed Hash Tables |journal=Journal of Parallel and Distributed Computing |date=December 2010 |volume=70 |issue=12 |pages=1254–1265 |doi=10.1016/j.jpdc.2010.08.012 }}</ref>
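As an illustration of one such algorithm, here is a minimal Python sketch of bounded flooding over overlay links, in the spirit of the Gnutella-style search that Structella implements; the <code>neighbors</code> and <code>deliver</code> callbacks are hypothetical:

<syntaxhighlight lang="python">
def flood(start: int, neighbors, deliver, ttl: int = 4) -> None:
    """Deliver a message to every node within ttl overlay hops of start,
    visiting each node at most once (breadth-first flooding).

    neighbors(node_id) -> iterable of neighbor IDs (assumed helper)
    deliver(node_id)   -> process the message at that node
    """
    seen = {start}
    frontier = [start]
    deliver(start)
    for _ in range(ttl):
        next_frontier = []
        for node in frontier:
            for nb in neighbors(node):
                if nb not in seen:
                    seen.add(nb)
                    deliver(nb)
                    next_frontier.append(nb)
        frontier = next_frontier
</syntaxhighlight>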