==Examples of persistent data structures==
Perhaps the simplest persistent data structure is the [[Linked list|singly linked list]] or ''cons''-based list, a simple list of objects formed by each carrying a [[reference]] to the next in the list. This is persistent because the ''tail'' of the list can be taken, meaning the last ''k'' items for some ''k'', and new nodes can be added in front of it. The tail will not be duplicated, instead becoming shared between both the old list and the new list. So long as the contents of the tail are immutable, this sharing will be invisible to the program.

Many common reference-based data structures, such as [[red–black tree]]s,<ref name="sarnak2">{{cite journal |author=Neil Sarnak |author2=Robert E. Tarjan |year=1986 |title=Planar Point Location Using Persistent Search Trees |url=http://www.link.cs.cmu.edu/15859-f07/papers/point-location.pdf |journal=Communications of the ACM |volume=29 |issue=7 |pages=669–679 |doi=10.1145/6138.6151 |s2cid=8745316 |author2-link=Robert Tarjan |access-date=2011-04-06 |archive-url=https://web.archive.org/web/20151010204956/http://www.link.cs.cmu.edu/15859-f07/papers/point-location.pdf |archive-date=2015-10-10 |url-status=dead }}</ref> [[Stack (data structure)|stacks]],<ref name="okasaki2">{{Cite journal|author=Chris Okasaki|title=Purely Functional Data Structures (thesis)|url=https://www.cs.cmu.edu/~rwh/theses/okasaki.pdf}}</ref> and [[treap]]s,<ref>{{cite journal |last=Liljenzin |first=Olle |title=Confluently Persistent Sets and Maps|arxiv=1301.3388|bibcode=2013arXiv1301.3388L|year=2013 }}</ref> can easily be adapted to create a persistent version.
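The tail-sharing described above can be sketched in Python (an illustrative translation, not code from any cited source; the <code>Cons</code> class and <code>prepend</code> function are hypothetical names):

```python
class Cons:
    """An immutable singly linked list node ('cons' cell)."""
    __slots__ = ("head", "tail")

    def __init__(self, head, tail=None):
        self.head = head
        self.tail = tail  # None represents the empty list


def prepend(x, lst):
    # Adding to the front allocates one node; the entire old list
    # becomes the shared tail of the new list, with no copying.
    return Cons(x, lst)


old = prepend(2, prepend(3, None))  # the list [2, 3]
new = prepend(1, old)               # the list [1, 2, 3]

assert new.tail is old              # the tail is shared, not duplicated
assert old.head == 2                # the old list persists unchanged
```

Because the nodes are never mutated after construction, the sharing is invisible to any code that only reads the lists.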
Some others need slightly more effort, for example: [[Queue (data structure)|queues]], [[Double-ended queue|deques]], and extensions including [[min-deque]]s (which have an additional ''O''(1) operation ''min'' returning the minimal element) and [[random access deque]]s (which have an additional operation of random access with sub-linear, most often logarithmic, complexity).

There also exist persistent data structures which use destructive{{clarify|date=June 2013}} operations, making them impossible to implement efficiently in purely functional languages (like Haskell outside specialized monads like state or IO), but possible in languages like C or Java. These types of data structures can often be avoided with a different design. One primary advantage of using purely persistent data structures is that they often behave better in multi-threaded environments.

===Linked lists===
Singly [[linked list]]s are the bread-and-butter data structure in functional languages.<ref name="auto">''This example is taken from Okasaki. See the bibliography.''</ref> Some [[ML programming language|ML]]-derived languages, like [[Haskell (programming language)|Haskell]], are purely functional: once a node in the list has been allocated, it cannot be modified, only copied, referenced, or destroyed by the [[Garbage collection (computer science)|garbage collector]] when nothing refers to it. (Note that ML itself is '''not''' purely functional, but supports a non-destructive subset of list operations; the same is true of [[Lisp (programming language)|Lisp]] (LISt Processing) dialects like [[Scheme (programming language)|Scheme]] and [[Racket (programming language)|Racket]].)

Consider the two lists:

 xs = [0, 1, 2]
 ys = [3, 4, 5]

These would be represented in memory by:

[[File:Purely_functional_list_before.svg]]

where a circle indicates a node in the list (the arrow out representing the second element of the node, which is a pointer to another node).
Now concatenating the two lists:

 zs = xs ++ ys

results in the following memory structure:

[[File:Purely_functional_list_after.svg]]

Notice that the nodes in list <code>xs</code> have been copied, but the nodes in <code>ys</code> are shared. As a result, the original lists (<code>xs</code> and <code>ys</code>) persist and have not been modified. The reason for the copy is that the last node in <code>xs</code> (the node containing the original value <code>2</code>) cannot be modified to point to the start of <code>ys</code>, because that would change the value of <code>xs</code>.

===Trees===
Consider a [[binary search tree]],<ref name="auto"/> where every node in the tree has the [[Recursion|recursive]] [[Invariant (computer science)|invariant]] that all subnodes contained in the left subtree have a value less than or equal to the value stored in the node, and all subnodes contained in the right subtree have a value greater than the value stored in the node.

For instance, the set of data

 xs = [a, b, c, d, f, g, h]

might be represented by the following binary search tree:

[[File:Purely_functional_tree_before.svg]]

A function which inserts data into the binary tree and maintains the invariant is:<syntaxhighlight lang="ocaml">
fun insert (x, E) = T (E, x, E)
  | insert (x, s as T (a, y, b)) =
    if x < y then T (insert (x, a), y, b)
    else if x > y then T (a, y, insert (x, b))
    else s
</syntaxhighlight>

After executing

 ys = insert ("e", xs)

the following configuration is produced:

[[File:Purely_functional_tree_after.svg]]

Notice two points: first, the original tree (<code>xs</code>) persists. Second, many common nodes are shared between the old tree and the new tree.
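The <code>insert</code> function above copies only the nodes along the search path and shares every other subtree between the old and new trees, a technique known as path copying. A hypothetical Python translation of the same idea (the <code>Node</code> class and variable names are illustrative, not from any cited source):

```python
class Node:
    """An immutable binary search tree node."""
    __slots__ = ("left", "value", "right")

    def __init__(self, left, value, right):
        self.left, self.value, self.right = left, value, right


def insert(x, t):
    """Return a new tree containing x; nodes off the search path are shared."""
    if t is None:
        return Node(None, x, None)
    if x < t.value:
        # Copy this node; the entire right subtree is shared, not copied.
        return Node(insert(x, t.left), t.value, t.right)
    if x > t.value:
        # Copy this node; the entire left subtree is shared, not copied.
        return Node(t.left, t.value, insert(x, t.right))
    return t  # x is already present: share the whole subtree


# Build the tree {b, d, f} rooted at "d", then persistently insert "e".
xs = insert("f", insert("b", insert("d", None)))
ys = insert("e", xs)

assert ys.right.left.value == "e"  # the new element is in the new tree
assert ys.left is xs.left          # the untouched "b" subtree is shared
assert xs.right.left is None       # the original tree persists unmodified
```

Only the nodes on the root-to-leaf path for <code>"e"</code> (here <code>"d"</code> and <code>"f"</code>) are copied, so an insertion allocates ''O''(depth) new nodes.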
Such persistence and sharing are difficult to manage without some form of [[Garbage collection (computer science)|garbage collection]] (GC) to automatically free up nodes which have no live references, which is why GC is a feature commonly found in [[Functional programming|functional programming languages]].

====Code====
A GitHub repository contains implementations of persistent BSTs using the fat node, copy-on-write, and path copying techniques. To use these implementations, clone the repository and follow the instructions in its README file.<ref>{{github|DesaultierMAKK/PersistentBST}}</ref>

===Persistent hash array mapped trie===
A persistent hash array mapped trie is a specialized variant of a [[hash array mapped trie]] that preserves previous versions of itself on any update. It is often used to implement a general-purpose persistent map data structure.<ref name=":1">{{Citation|last=BoostCon|title=C++Now 2017: Phil Nash "The Holy Grail!? A Persistent Hash-Array-Mapped Trie for C++"|date=2017-06-13|url=https://www.youtube.com/watch?v=WT9kmIE3Uis |archive-url=https://ghostarchive.org/varchive/youtube/20211221/WT9kmIE3Uis |archive-date=2021-12-21 |url-status=live|access-date=2018-10-22}}{{cbignore}}</ref>

Hash array mapped tries were originally described in a 2001 paper by [[Phil Bagwell]] entitled "Ideal Hash Trees". This paper presented a mutable [[hash table]] where "Insert, search and delete times are small and constant, independent of key set size, operations are O(1).
Small worst-case times for insert, search and removal operations can be guaranteed and misses cost less than successful searches".<ref>{{Cite journal|last=Phil|first=Bagwell|date=2001|title=Ideal Hash Trees|url=https://infoscience.epfl.ch/record/64398|language=en}}</ref> This data structure was then modified by [[Rich Hickey]] to be fully persistent for use in the [[Clojure]] programming language.<ref>{{Cite web|url=https://www.infoq.com/presentations/Are-We-There-Yet-Rich-Hickey|title=Are We There Yet?|website=InfoQ|access-date=2018-10-22}}</ref>

Conceptually, hash array mapped tries work similarly to any generic [[Tree (data structure)|tree]] in that they store nodes hierarchically and retrieve them by following a path down to a particular element. The key difference is that hash array mapped tries first use a [[hash function]] to transform their lookup key into a (usually 32- or 64-bit) integer. The path down the tree is then determined by using slices of the binary representation of that integer to index into a [[Sparse matrix|sparse array]] at each level of the tree. The leaf nodes of the tree behave similarly to the buckets used to construct [[hash table]]s and may or may not contain multiple candidates depending on [[hash collision]]s.<ref name=":1" />

Most implementations of persistent hash array mapped tries use a branching factor of 32. This means that in practice, while insertions, deletions, and lookups into a persistent hash array mapped trie have a computational complexity of ''O''(log ''n''), for most applications they are effectively constant time, as it would require an extremely large number of entries to make any operation take more than a dozen steps.<ref>{{Cite journal |last1=Steindorfer |first1=Michael J. |last2=Vinju |first2=Jurgen J.
|date=2015-10-23 |title=Optimizing hash-array mapped tries for fast and lean immutable JVM collections |journal=ACM SIGPLAN Notices |volume=50 |issue=10 |pages=783–800 |doi=10.1145/2814270.2814312 |s2cid=10317844 |issn=0362-1340|url=https://ir.cwi.nl/pub/24029 }}</ref>
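The bit-slicing step described above can be sketched as follows (an illustrative Python fragment, not Clojure's or Bagwell's actual code; the function name and example hash value are made up). With a branching factor of 32, each tree level consumes 5 bits of the hash:

```python
BITS = 5                 # log2(32): a branching factor of 32
MASK = (1 << BITS) - 1   # 0b11111, selects one 5-bit slice


def path_indices(h, levels):
    """Return the child index (0-31) used at each level for hash value h."""
    return [(h >> (BITS * level)) & MASK for level in range(levels)]


# A made-up 15-bit hash fragment, written as three 5-bit slices:
h = 0b10110_00011_01101
print(path_indices(h, 3))  # [13, 3, 22]: child indices at levels 0, 1, 2
```

A 32-bit hash therefore yields at most seven levels (⌈32/5⌉), which is why lookups are effectively constant time in practice.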