==Implementation==
For dictionaries with very few mappings, it may make sense to implement the dictionary using an [[association list]], which is a [[linked list]] of mappings. With this implementation, the time to perform the basic dictionary operations is linear in the total number of mappings. However, it is easy to implement, and the constant factors in its running time are small.<ref name="gt"/><ref>{{cite web |url=http://www.faqs.org/faqs/lisp-faq/part2/section-2.html |title=When should I use a hash table instead of an association list? |publisher=lisp-faq/part2 |date=1996-02-20}}</ref>

Another very simple implementation technique, usable when the keys are restricted to a narrow range, is direct addressing into an array: the value for a given key ''k'' is stored at the array cell ''A''[''k''], or if there is no mapping for ''k'' then the cell stores a special [[sentinel value]] that indicates the lack of a mapping. This technique is simple and fast, with each dictionary operation taking constant time. However, the space requirement for this structure is the size of the entire keyspace, making it impractical unless the keyspace is small.<ref name="clrs" />

The two major approaches for implementing dictionaries are [[hash table]]s and [[search tree]]s.<ref name="gt"/><ref name="ms"/><ref name="clrs"/><ref name="dietzfelbinger"/>

=== Hash table implementations ===
{{main|Hash table}}
[[File:Hash table average insertion time.png|thumb|right|362px|This graph compares the average number of [[CPU cache]] misses required to look up elements in large hash tables (far exceeding the size of the cache) with chaining and [[linear probing]].
Linear probing performs better due to better [[locality of reference]], though as the table gets full, its performance degrades drastically.]]

The most frequently used general-purpose implementation of an associative array is with a [[hash table]]: an [[Array data structure|array]] combined with a [[hash function]] that maps each key to a separate "bucket" of the array. The basic idea behind a hash table is that accessing an element of an array via its index is a simple, constant-time operation. Therefore, the average overhead of an operation for a hash table is only the computation of the key's hash, combined with accessing the corresponding bucket within the array. As such, hash tables usually perform in O(1) time, and usually outperform alternative implementations.

Hash tables must be able to handle [[hash collision|collisions]]: the mapping by the hash function of two different keys to the same bucket of the array. The two most widespread approaches to this problem are [[separate chaining]] and [[open addressing]].<ref name="gt"/><ref name="ms"/><ref name="clrs"/><ref name="fklm">{{citation|contribution=Pathfinders for associative maps|title=Ext. Abstracts GIS-l 2006|last1=Klammer|first1=F.|author1-link=F. Klammer|last2=Mazzolini|first2=L.|author2-link=L. Mazzolini|publisher=GIS-I|year=2006|pages=71–74}}.</ref> In separate chaining, the array does not store the value itself but stores a [[Pointer (computer programming)|pointer]] to another container, usually an [[association list]], that stores all the values matching the hash. By contrast, in open addressing, if a hash collision occurs, the table seeks an empty spot in the array to store the value in a deterministic manner, usually by looking at the next immediate position in the array.

Open addressing has a lower [[cache miss]] ratio than separate chaining when the table is mostly empty. However, as the table fills, open addressing's performance degrades sharply.
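The separate-chaining scheme described above can be sketched as a small Python class. This is an illustrative toy, not how production hash tables are implemented; the fixed bucket count and all names are assumptions made for the example:

```python
# Toy hash table with separate chaining: each bucket holds a list
# of (key, value) pairs whose keys hash to the same index.
class ChainedHashTable:
    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # hash() maps the key to an integer; modulo picks a bucket.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # new key (or collision): chain it

    def get(self, key):
        for k, v in self._bucket(key):   # linear scan of one short chain
            if k == key:
                return v
        raise KeyError(key)
```

The cost of a lookup is the hash computation plus a linear scan of a single chain; with a good hash function the chains stay short, which is the source of the O(1) average-case behavior discussed above.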
Additionally, separate chaining uses less memory in most cases, unless the entries are very small (less than four times the size of a pointer).

=== Tree implementations ===
{{main|Search tree}}

==== Self-balancing binary search trees ====
Another common approach is to implement an associative array with a [[self-balancing binary search tree]], such as an [[AVL tree]] or a [[red–black tree]].<ref>Joel Adams and Larry Nyhoff. [http://cs.calvin.edu/books/c++/intro/3e/WebItems/Ch15-Web/STL-Trees.pdf "Trees in STL"]. Quote: "The Standard Template library ... some of its containers -- the set<T>, map<T1, T2>, multiset<T>, and multimap<T1, T2> templates -- are generally built using a special kind of ''self-balancing binary search tree'' called a ''red–black tree''."</ref>

Compared to hash tables, these structures have both strengths and weaknesses. The worst-case performance of self-balancing binary search trees is significantly better than that of a hash table, with a time complexity in [[big O notation]] of O(log ''n''). This is in contrast to hash tables, whose worst-case performance involves all elements sharing a single bucket, resulting in O(''n'') time complexity. In addition, like all binary search trees, self-balancing binary search trees keep their elements in order. Thus, traversing their elements follows a least-to-greatest pattern, whereas traversing a hash table can yield elements in a seemingly random order. Because they are in order, tree-based maps can also satisfy range queries (finding all values between two bounds), whereas a hash map can only find exact values. However, hash tables have a much better average-case time complexity of O(1) than self-balancing binary search trees, and their worst-case performance is highly unlikely when a good [[hash function]] is used.

A self-balancing binary search tree can be used to implement the buckets for a hash table that uses separate chaining.
This allows for average-case constant lookup, but assures a worst-case performance of O(log ''n''). However, this introduces extra complexity into the implementation, and may cause even worse performance for smaller hash tables, where the time spent inserting into and balancing the tree is greater than the time needed to perform a [[linear search]] on all elements of a linked list or similar data structure.<ref name="knuth">{{cite book |first=Donald |last=Knuth |author1-link=Donald Knuth |title=The Art of Computer Programming |volume=3: ''Sorting and Searching'' |edition=2nd |publisher=Addison-Wesley |year=1998 |isbn=0-201-89685-0 |pages=513–558}}</ref><ref>{{cite web |url=https://schani.wordpress.com/2010/04/30/linear-vs-binary-search/ |title=Linear vs Binary Search |last=Probst |first=Mark |date=2010-04-30 |access-date=2016-11-20}}</ref>

==== Other trees ====
Associative arrays may also be stored in unbalanced [[binary search tree]]s or in data structures specialized to a particular type of keys, such as [[radix tree]]s, [[trie]]s, [[Judy array]]s, or [[van Emde Boas tree]]s, though the relative performance of these implementations varies.
For instance, Judy trees have been found to perform less efficiently than hash tables, while carefully selected hash tables generally perform more efficiently than adaptive radix trees, with potentially greater restrictions on the data types they can handle.<ref>{{Cite book|last1=Alvarez|first1=Victor|last2=Richter|first2=Stefan|last3=Chen|first3=Xiao|last4=Dittrich|first4=Jens|title=2015 IEEE 31st International Conference on Data Engineering |chapter=A comparison of adaptive radix trees and hash tables |date=April 2015|location=Seoul, South Korea|publisher=IEEE|pages=1227–1238|doi=10.1109/ICDE.2015.7113370|isbn=978-1-4799-7964-6|s2cid=17170456}}</ref> The advantages of these alternative structures come from their ability to handle additional associative array operations, such as finding the mapping whose key is closest to a queried key when that key is absent from the set of mappings.

=== Comparison ===
{| class="wikitable"
! rowspan="2" | Underlying data structure
! colspan="2" | Lookup or removal
! colspan="2" | Insertion
! rowspan="2" | Ordered
|-
! average
! worst case
! average
! worst case
|-
! [[Hash table]]
| style="background:#ddffdd" | O(1)
| style="background:#ffdddd" | O(''n'')
| style="background:#ddffdd" | O(1)
| style="background:#ffdddd" | O(''n'')
| {{no}}
|-
! [[Self-balancing binary search tree]]
| style="background:#ffffdd" | O(log ''n'')
| style="background:#ffffdd" | O(log ''n'')
| style="background:#ffffdd" | O(log ''n'')
| style="background:#ffffdd" | O(log ''n'')
| {{yes}}
|-
! Unbalanced [[binary search tree]]
| style="background:#ffffdd" | O(log ''n'')
| style="background:#ffdddd" | O(''n'')
| style="background:#ffffdd" | O(log ''n'')
| style="background:#ffdddd" | O(''n'')
| {{yes}}
|-
! Sequential container of [[attribute–value pair|key–value pair]]s<br />(e.g. [[association list]])
| style="background:#ffdddd" | O(''n'')
| style="background:#ffdddd" | O(''n'')
| style="background:#ddffdd" | O(1)
| style="background:#ddffdd" | O(1)
| {{no}}
|}
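The ordering property of tree-based maps discussed above — in-order traversal and support for range queries — can be sketched with a minimal binary search tree in Python. For brevity this sketch is unbalanced (a production version would rebalance, e.g. as an AVL or red–black tree, to guarantee O(log ''n'') operations), and all names are illustrative:

```python
# Toy associative array backed by an (unbalanced) binary search tree.
# Keys stay sorted, so traversal yields them least-to-greatest and
# range queries (all keys between two bounds) are possible.
class TreeMap:
    class Node:
        def __init__(self, key, value):
            self.key, self.value = key, value
            self.left = self.right = None

    def __init__(self):
        self.root = None

    def put(self, key, value):
        def insert(node):
            if node is None:
                return TreeMap.Node(key, value)
            if key < node.key:
                node.left = insert(node.left)
            elif key > node.key:
                node.right = insert(node.right)
            else:
                node.value = value       # existing key: overwrite
            return node
        self.root = insert(self.root)

    def range_query(self, lo, hi):
        # Yield (key, value) pairs with lo <= key <= hi, in key order,
        # pruning subtrees that cannot contain keys in the range.
        def walk(node):
            if node is None:
                return
            if node.key > lo:
                yield from walk(node.left)
            if lo <= node.key <= hi:
                yield (node.key, node.value)
            if node.key < hi:
                yield from walk(node.right)
        yield from walk(self.root)
```

A hash table cannot answer `range_query` without scanning every entry, which is the trade-off the comparison table above summarizes: ordered operations in exchange for O(log ''n'') rather than O(1) average-case access.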