==Techniques for preserving previous versions==

=== Copy-on-write ===
One method for creating a persistent data structure is to store the data in a platform-provided ephemeral data structure, such as an [[Mutable array|array]], and to copy the entire structure using [[Copy-on-write|copy-on-write semantics]] on every update. This technique is inefficient because the entire backing data structure must be copied for each write, leading to worst-case O(n·m) performance for m modifications of an array of size n.{{citation needed|date=May 2019}}
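As a minimal sketch of this approach, the following Python class copies its whole backing store on every write. The class name <code>CowArray</code> and its methods are hypothetical, chosen only for illustration.

```python
class CowArray:
    """Persistent array sketch: each update copies the entire backing store."""

    def __init__(self, items=()):
        self._items = tuple(items)  # immutable backing store for this version

    def __len__(self):
        return len(self._items)

    def get(self, index):
        return self._items[index]

    def set(self, index, value):
        # O(n) work per write: the whole array is copied before the change.
        copy = list(self._items)
        copy[index] = value
        return CowArray(copy)


v0 = CowArray([1, 2, 3])
v1 = v0.set(1, 20)   # v0 is left untouched
v2 = v1.set(2, 30)   # v1 is left untouched
```

Every call to <code>set</code> returns a new version and leaves the old ones readable, at the cost of copying all n elements each time.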

===Fat node===
The fat node method records all changes made to node fields in the nodes themselves, without erasing old values of the fields. This requires that nodes be allowed to become arbitrarily “fat”. In other words, each fat node contains the same information and [[Pointer (computer programming)|pointer]] fields as an ephemeral node, along with space for an arbitrary number of extra field values. Each extra field value has an associated field name and a version stamp, which indicates the version in which the named field was changed to have the specified value. In addition, each fat node has its own version stamp, indicating the version in which the node was created. The only purpose of nodes having version stamps is to make sure that each node contains only one value per field name per version. So that the structure can be navigated, each original field value in a node has a version stamp of zero.

====Complexity of fat node====
The fat node method requires O(1) space for every modification: only the new data is stored. Each modification takes O(1) additional time to append it to the end of the modification history. This is an [[Amortized analysis|amortized time]] bound, assuming the modification history is stored in a growable [[Array data structure|array]]. At [[access time]], the right version at each node must be found as the structure is traversed. If ''m'' modifications have been made, each access operation incurs an O(log ''m'') slowdown, which is the cost of finding the nearest modification in the array.
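The following Python sketch illustrates a fat node under some simplifying assumptions: the class name <code>FatNode</code> and its methods are hypothetical, the node's own creation stamp is omitted, and each field's history is kept in a pair of growable lists so that a write is an append and a read is a binary search.

```python
import bisect


class FatNode:
    """Fat node sketch: every field keeps its full, version-stamped history."""

    def __init__(self, **fields):
        # Original field values carry version stamp 0.
        self._stamps = {name: [0] for name in fields}
        self._values = {name: [value] for name, value in fields.items()}

    def set(self, name, value, version):
        stamps = self._stamps[name]
        # At most one value per field per version, appended in increasing order.
        if version <= stamps[-1]:
            raise ValueError("version stamps must increase for each field")
        stamps.append(version)
        self._values[name].append(value)

    def get(self, name, version):
        stamps = self._stamps[name]
        # Binary search for the latest stamp that is not newer than `version`.
        i = bisect.bisect_right(stamps, version) - 1
        if i < 0:
            raise KeyError("no value recorded at or before this version")
        return self._values[name][i]


node = FatNode(key=5)
node.set("key", 7, version=3)
node.set("key", 9, version=6)
```

Reading <code>key</code> at version 2 still yields the original value, while versions 3 to 5 see the first change and version 6 onward sees the second.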

===Path copying===
With the path copying method, a copy is made of every node on the path to any node that is about to be modified. These changes must then be [[Fractional cascading|cascaded]] back through the data structure: all nodes that pointed to the old node must be modified to point to the new node instead. These modifications cause more cascading changes, and so on, until the root node is reached.

====Complexity of path copying====
With m modifications, this costs O(log m) additive [[lookup]] time. Modification time and space are bounded by the size of the longest path in the data structure and the cost of the update in the ephemeral data structure. In a balanced binary search tree without parent pointers, the worst-case modification time complexity is O(log n + update cost). However, in a linked list the worst-case modification time complexity is O(n + update cost).
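Path copying can be sketched on a binary search tree as follows. This is an illustrative example, not a reference implementation: the tree is not rebalanced, the names <code>Node</code>, <code>insert</code>, <code>contains</code> and <code>roots</code> are hypothetical, and versions are numbered consecutively so that the list of roots can be indexed directly (with arbitrary timestamps, a search over the roots would be needed instead).

```python
class Node:
    """Immutable-by-convention binary search tree node."""

    __slots__ = ("key", "left", "right")

    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def insert(root, key):
    """Return the root of a new version; the old version stays intact."""
    if root is None:
        return Node(key)
    if key < root.key:
        # Copy this node; the right subtree is shared with the old version.
        return Node(root.key, insert(root.left, key), root.right)
    if key > root.key:
        # Copy this node; the left subtree is shared with the old version.
        return Node(root.key, root.left, insert(root.right, key))
    return root  # key already present: share the whole subtree


def contains(root, key):
    while root is not None:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False


# roots[i] is the root of version i; version 0 is the empty tree.
roots = [None]
for key in [5, 2, 8, 1]:
    roots.append(insert(roots[-1], key))
```

Only the nodes on the search path are copied; in the last insertion, for example, the subtree holding 8 is shared between versions 3 and 4.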

===A combination===
Driscoll, Sarnak, [[Daniel Sleator|Sleator]], and [[Robert Tarjan|Tarjan]] came up<ref name=Driscoll /> with a way to combine the techniques of fat nodes and path copying, achieving O(1) access slowdown and O(1) modification space and time complexity.

In each node, one modification box is stored. This box can hold one modification to the node—either a modification to one of the pointers, or to the node's key, or to some other piece of node-specific data—and a timestamp for when that modification was applied. Initially, every node's modification box is empty.

Whenever a node is accessed, the modification box is checked, and its timestamp is compared against the access time. (The access time specifies the version of the data structure being considered.) If the modification box is empty, or the access time is before the modification time, then the modification box is ignored and only the normal part of the node is considered. On the other hand, if the access time is after the modification time, then the value in the modification box is used, overriding that value in the node.

Modifying a node works as follows. (It is assumed that each modification touches one pointer or similar field.) If the node's modification box is empty, it is filled with the modification. Otherwise, the modification box is full: a copy of the node is made, using only the latest values, and the modification is performed directly on the new node, without using the modification box. (One of the new node's fields is overwritten and its modification box stays empty.) Finally, this change is cascaded to the node's parent, just as in path copying. (This may involve filling the parent's modification box, or recursively making a copy of the parent. If the node has no parent because it is the root, the new root is added to a [[sorted array]] of roots.)
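The access and modification rules above can be sketched in Python on a singly linked list, where a node's "parent" is its predecessor. This is a simplified illustration under several assumptions that are not part of the original description: the names <code>Node</code>, <code>PersistentList</code>, <code>read</code> and <code>set_value</code> are hypothetical, versions are consecutive integers, a modification applies to accesses at or after its own version, and back pointers are maintained only for the latest version.

```python
import bisect


class Node:
    """List node with one modification box (hypothetical layout)."""

    def __init__(self, value, next_node=None, parent=None):
        self.fields = {"value": value, "next": next_node}
        self.mod = None        # (version, field name, new value) or None
        self.parent = parent   # back pointer, kept for the latest version only

    def read(self, name, version):
        # Use the box only if it holds this field and is not newer than `version`.
        if self.mod is not None:
            mod_version, mod_name, mod_value = self.mod
            if mod_name == name and version >= mod_version:
                return mod_value
        return self.fields[name]


class PersistentList:
    def __init__(self, values):
        self.version = 0
        head = None
        for value in reversed(values):
            head = Node(value, next_node=head)
            if head.fields["next"] is not None:
                head.fields["next"].parent = head
        self.root_versions = [0]   # sorted version stamps of the roots
        self.roots = [head]

    def root(self, version):
        # Binary search for the newest root created at or before `version`.
        return self.roots[bisect.bisect_right(self.root_versions, version) - 1]

    def to_list(self, version):
        out, node = [], self.root(version)
        while node is not None:
            out.append(node.read("value", version))
            node = node.read("next", version)
        return out

    def set_value(self, index, value):
        node = self.roots[-1]
        for _ in range(index):            # walk the latest version
            node = node.read("next", self.version)
        self.version += 1
        self._modify(node, "value", value)
        return self.version

    def _modify(self, node, name, value):
        if node.mod is None:              # empty box: record the change in place
            node.mod = (self.version, name, value)
            return
        # Full box: copy the node using only its latest values.
        latest = {f: node.read(f, self.version) for f in node.fields}
        latest[name] = value
        copy = Node(latest["value"], latest["next"], node.parent)
        if latest["next"] is not None:
            latest["next"].parent = copy
        if node.parent is None:           # copied the root: register a new root
            self.root_versions.append(self.version)
            self.roots.append(copy)
        else:                             # cascade, as in path copying
            self._modify(node.parent, "next", copy)


lst = PersistentList([1, 2, 3])
v1 = lst.set_value(1, 20)    # fills the second node's empty box
v2 = lst.set_value(1, 200)   # box is full: copy the node, fill the head's box
v3 = lst.set_value(0, 10)    # head's box is full: copy it and add a new root
```

In this run, three modifications produce only two node copies and one additional root, and every earlier version remains readable through <code>to_list</code>.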

With this [[algorithm]], given any time t, at most one modification box exists in the data structure with time t. Thus, a modification at time t splits the tree into three parts: one part contains the data from before time t, one part contains the data from after time t, and one part was unaffected by the modification.

====Complexity of the combination====
Time and space for modifications require amortized analysis. A modification takes O(1) amortized space and O(1) amortized time. To see why, use a [[Potential method|potential function]] ''ϕ'', where ''ϕ''(T) is the number of full live nodes in T. The live nodes of T are just the nodes that are reachable from the current root at the current time (that is, after the last modification). The full live nodes are the live nodes whose modification boxes are full.

Each modification involves some number of copies, say ''k'', followed by one change to a modification box. Consider each of the ''k'' copies. Each costs O(1) space and time, but decreases the potential function by one. (First, the node to be copied must be full and live, so it contributes to the potential function. The potential function will only drop, however, if the old node is not reachable in the new tree. It is known that the old node is not reachable in the new tree, because the next step in the algorithm is to modify the node's parent to point at the copy. Finally, it is known that the copy's modification box is empty. Thus, a full live node has been replaced with an empty live node, and ''ϕ'' goes down by one.) The final step fills a modification box, which costs O(1) time and increases ''ϕ'' by one.

Putting it all together, the change in ''ϕ'' is Δ''ϕ'' = 1 − ''k''. Since ''k'' + Δ''ϕ'' = 1, the algorithm takes O(''k'' + Δ''ϕ'') = O(1) space and O(''k'' + Δ''ϕ'' + 1) = O(1) time.