== Examples ==

=== Counters ===

To demonstrate the power and necessity of linearizability we will consider a simple counter which different processes can increment.

We would like to implement a counter object which multiple processes can access. Many common systems make use of counters to keep track of the number of times an event has occurred. The counter object can be accessed by multiple processes and has two available operations.

# Increment - adds 1 to the value stored in the counter, returns an acknowledgement
# Read - returns the current value stored in the counter without changing it.

We will attempt to implement this counter object using [[shared register]]s. Our first attempt, which we will see is non-linearizable, has the following implementation using one shared register among the processes.

==== Non-atomic ====

The naive, non-atomic implementation:

'''Increment:'''
# Read the value in the register R
# Add one to the value
# Write the new value back into register R

'''Read:'''

Read register R

This simple implementation is not linearizable, as is demonstrated by the following example. Imagine two processes are running, accessing the single counter object initialized to have value 0:

# The first process reads the value in the register as 0.
# The first process adds one to the value; the counter's value should be 1, but before it has finished writing the new value back to the register it may become suspended. Meanwhile the second process runs:
# The second process reads the value in the register, which is still equal to 0;
# The second process adds one to the value;
# The second process writes the new value into the register; the register now has value 1.

The second process is finished running and the first process continues running from where it left off:

# The first process writes 1 into the register, unaware that the other process has already updated the value in the register to 1.

In the above example, two processes invoked an increment command; however, the value of the object only increased from 0 to 1, instead of 2 as it should have. One of the increment operations was lost as a result of the system not being linearizable. The above example shows the need for carefully thinking through implementations of data structures, and how linearizability can affect the correctness of the system.

==== Atomic ====

To implement a linearizable or atomic counter object, we will modify our previous implementation so that '''each process P<sub>i</sub> will use its own register R<sub>i</sub>'''. Each process increments and reads according to the following algorithm:

'''Increment:'''
# Read the value in register R<sub>i</sub>.
# Add one to the value.
# Write the new value back into R<sub>i</sub>.

'''Read:'''
# Read registers R<sub>1</sub>, R<sub>2</sub>, ..., R<sub>n</sub>.
# Return the sum of all registers.

This implementation solves the problem with our original implementation. In this system the increment operations are linearized at the write step. The linearization point of an increment operation is when that operation writes the new value in its register R<sub>i</sub>. The read operations are linearized to a point in the system when the value returned by the read is equal to the sum of all the values stored in each register R<sub>i</sub>.
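To make the contrast concrete, the following sketch runs both implementations side by side in C. This is hypothetical illustrative code, not taken from any particular system: the thread count, the function names, and the use of C11 atomics with POSIX threads are all assumptions made for the example. Note that every individual register access here is atomic, yet the single-register increment still loses updates, while the per-register increment never does.

<syntaxhighlight lang="c">
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define NTHREADS    2
#define NINCREMENTS 100000

/* One register shared by every process: the non-atomic implementation. */
atomic_long shared_counter = 0;

/* One register per process: the linearizable implementation. */
atomic_long slot[NTHREADS];

/* Non-atomic increment: read, add one, write back.  Each individual
   access is atomic, but another thread can run between the load and
   the store, so a whole increment can be lost. */
void racy_increment(void) {
    long v = atomic_load(&shared_counter); /* 1. read the register R  */
    v = v + 1;                             /* 2. add one to the value */
    atomic_store(&shared_counter, v);      /* 3. write back into R    */
}

/* Linearizable increment: thread i is the only writer of slot[i], so
   its read-add-write sequence cannot race with any other thread.
   The linearization point is the store to slot[i]. */
void increment(int i) {
    atomic_store(&slot[i], atomic_load(&slot[i]) + 1);
}

/* Linearizable read: return the sum of the registers R_1 ... R_n. */
long read_counter(void) {
    long sum = 0;
    for (int i = 0; i < NTHREADS; i++)
        sum += atomic_load(&slot[i]);
    return sum;
}

void *worker(void *arg) {
    int id = *(int *)arg;
    for (int n = 0; n < NINCREMENTS; n++) {
        racy_increment();  /* may lose updates      */
        increment(id);     /* never loses an update */
    }
    return NULL;
}

int main(void) {
    pthread_t t[NTHREADS];
    int id[NTHREADS];
    for (int i = 0; i < NTHREADS; i++) {
        id[i] = i;
        pthread_create(&t[i], NULL, worker, &id[i]);
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    /* The racy total typically falls short of the expected value; the
       per-register total is always exactly NTHREADS * NINCREMENTS. */
    printf("racy counter:         %ld\n", atomic_load(&shared_counter));
    printf("per-register counter: %ld (expected %d)\n",
           read_counter(), NTHREADS * NINCREMENTS);
    return 0;
}
</syntaxhighlight>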
This is a trivial example. In a real system, the operations can be more complex and the errors introduced extremely subtle. For example, reading a [[64-bit]] value from memory may actually be implemented as two [[sequence|sequential]] reads of two [[32-bit]] memory locations. If a process has only read the first 32 bits, and the value in memory changes before it reads the second 32 bits, it will have neither the original value nor the new value but a mixed-up value. Furthermore, the specific order in which the processes run can change the results, making such an error difficult to detect, reproduce and [[debug]].

=== Compare-and-swap ===

{{Main|Compare-and-swap}}

Most systems provide an atomic compare-and-swap instruction that reads from a memory location, compares the value with an "expected" one provided by the user, and writes out a "new" value if the two match, returning whether the update succeeded. We can use this to fix the non-atomic counter algorithm as follows:

:# Read the value in the memory location;
:# add one to the value;
:# use compare-and-swap to write the incremented value back;
:# retry if the value read in by the compare-and-swap did not match the value we originally read.

Since the compare-and-swap occurs (or appears to occur) instantaneously, if another process updates the location while we are in progress, the compare-and-swap is guaranteed to fail.

=== Fetch-and-increment ===

{{Main|Fetch-and-increment}}

Many systems provide an atomic fetch-and-increment instruction that reads from a memory location, unconditionally writes a new value (the old value plus one), and returns the old value. We can use this to fix the non-atomic counter algorithm as follows:

:# Use fetch-and-increment to read the old value and write the incremented value back.

For some algorithms, such as the one shown here, using fetch-and-increment is always better than compare-and-swap, since it requires fewer memory references,<ref name="cond-sync">{{cite book| last1=Fich | first1=Faith | title=Proceedings of the twenty-third annual ACM symposium on Principles of distributed computing – PODC '04 | last2=Hendler | first2=Danny | last3=Shavit | first3=Nir | year=2004 | publisher=ACM | location=New York, NY | pages=80–87 | isbn=978-1-58113-802-3 |doi=10.1145/1011767.1011780| chapter=On the inherent weakness of conditional synchronization primitives | s2cid=9313205 }}</ref> even though Herlihy earlier proved that compare-and-swap is better for certain other algorithms that cannot be implemented at all using only fetch-and-increment. So [[CPU design]]s with both fetch-and-increment and compare-and-swap (or equivalent instructions) may be a better choice than ones with only one or the other.<ref name="cond-sync" />

=== Locking ===

{{Main|Lock (computer science)}}

Another approach is to turn the naive algorithm into a [[critical section]], preventing other threads from disrupting it, using a [[lock (computer science)|lock]]. Once again fixing the non-atomic counter algorithm:

:# Acquire a lock, excluding other threads from running the critical section (steps 2–4) at the same time;
:# read the value in the memory location;
:# add one to the value;
:# write the incremented value back to the memory location;
:# release the lock.

This strategy works as expected; the lock prevents other threads from updating the value until it is released. However, when compared with direct use of atomic operations, it can suffer from significant overhead due to lock contention.
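The three fixes can be compared side by side. The sketch below is a hypothetical C11 rendering; the function names and the choice of a POSIX mutex are illustrative assumptions, not part of any of the algorithms. The compare-and-swap version retries on conflict, the fetch-and-increment version completes in a single read-modify-write, and the locking version serializes the naive read-add-write sequence behind a mutex.

<syntaxhighlight lang="c">
#include <pthread.h>
#include <stdatomic.h>

atomic_long counter = 0;

/* Compare-and-swap: read, add one, then publish the new value only if
   the counter still holds what we read.  On failure (another thread
   got there first, or a spurious failure of the weak variant),
   atomic_compare_exchange_weak reloads `expected` and we retry. */
void increment_cas(void) {
    long expected = atomic_load(&counter);
    while (!atomic_compare_exchange_weak(&counter, &expected,
                                         expected + 1)) {
        /* retry with the freshly loaded `expected` */
    }
}

/* Fetch-and-increment: one atomic read-modify-write does the whole
   job, so no retry loop is needed. */
void increment_fai(void) {
    atomic_fetch_add(&counter, 1);
}

/* Locking: a plain variable guarded by a mutex; the read-add-write
   sequence becomes a critical section that no other thread can
   interleave with. */
long locked_counter = 0;
pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

void increment_lock(void) {
    pthread_mutex_lock(&counter_lock);   /* 1. acquire the lock   */
    locked_counter = locked_counter + 1; /* 2-4. read, add, write */
    pthread_mutex_unlock(&counter_lock); /* 5. release the lock   */
}
</syntaxhighlight>

Under contention the compare-and-swap loop may iterate many times, while fetch-and-increment always completes in one atomic operation; this is the fewer-memory-references advantage cited above.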
To improve program performance, it may therefore be a good idea to replace simple critical sections with atomic operations for [[non-blocking synchronization]] (as we have just done for the counter with compare-and-swap and fetch-and-increment), instead of the other way around. Unfortunately, a significant improvement is not guaranteed, and lock-free algorithms can easily become too complicated to be worth the effort.