
== Relaxed memory consistency models ==
Some consistency models are defined by relaxing one or more of the requirements of [[sequential consistency]]; these are called relaxed consistency models.<ref name="RelaxedModel">{{Citation |last=Mankin |first=Jenny |title=CSG280: Parallel Computing Memory Consistency Models: A Survey in Past and Present Research |date=2007}}</ref> These models do not provide memory consistency at the hardware level. Instead, programmers are responsible for implementing memory consistency by applying synchronization techniques. The models are classified according to four criteria, which are detailed below.

Relaxed consistency models can be compared along four dimensions:
; Relaxation: One way to categorize relaxed consistency models is by which sequential consistency requirements they relax. Less strict models are obtained by relaxing either the program order or the write atomicity requirement, as defined by Adve and Gharachorloo, 1996.<ref name="sharedmemory"/> Program order guarantees that each process issues its memory requests in the order specified by its program, and write atomicity means that memory requests are serviced in the order of a single FIFO queue. When program order is relaxed, the ordering of any or all operation pairs (write-after-write, read-after-write, or read/write-after-read) can be relaxed. In a model with relaxed write atomicity, a process can view its own writes before any other processor can.
; Synchronizing vs. non-synchronizing: A synchronizing model divides memory accesses into two groups and assigns different consistency restrictions to each, so that one group can follow a weak consistency model while the other follows a more restrictive one. In contrast, a non-synchronizing model assigns the same consistency model to all memory access types.
; Issue vs. view-based:<ref name="unified">
{{cite journal
 | author1 = Steinke, Robert C.
 | author2 = Gary J. Nutt
 |   title = A unified theory of shared memory consistency.
 |    date = 2004
 | journal = Journal of the ACM
 |  volume = 51
 |   issue = 5
 |   pages = 800–849
 |     doi = 10.1145/1017460.1017464
 |   arxiv = cs/0208027
| s2cid = 3206071
 }}</ref> The issue method simulates sequential consistency by restricting when processes may issue memory operations, whereas the view method restricts the order in which events become visible to processes.
; Relative model strength: Some consistency models are more restrictive than others; that is, stricter models enforce more constraints as consistency requirements. The strength of a model can be defined by its program order or atomicity relaxations, and models can be compared on that basis. Two models are directly related if one applies the same relaxations as the other, or more. Models that relax different requirements are not directly related.
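
The notion of relative strength can be sketched as a subset test on sets of relaxations. The snippet below is only an illustration: the label strings and the <code>compare</code> helper are hypothetical names, and the relaxations listed for each model follow the descriptions given later in this section rather than any formal specification.

```python
# Relaxations per model, as described in this section (labels are illustrative).
W_R = "write->read program order"
W_W = "write->write program order"
OWN_EARLY = "read own write early"
OTHERS_EARLY = "read another processor's write early"

RELAXATIONS = {
    "SC": frozenset(),
    "IBM 370": frozenset({W_R}),
    "TSO": frozenset({W_R, OWN_EARLY}),
    "PC": frozenset({W_R, OWN_EARLY, OTHERS_EARLY}),
    "PSO": frozenset({W_R, W_W, OWN_EARLY}),
}


def compare(a, b):
    """Compare two models by the relaxations they apply (hypothetical helper)."""
    ra, rb = RELAXATIONS[a], RELAXATIONS[b]
    if ra == rb:
        return "equivalent"
    if ra < rb:  # a applies a strict subset of b's relaxations
        return f"{a} is stricter than {b}"
    if rb < ra:
        return f"{b} is stricter than {a}"
    return "not directly related"


print(compare("TSO", "PSO"))
print(compare("PC", "PSO"))
```

Under these assumptions, TSO and PSO are directly related because PSO applies every relaxation of TSO plus one more, whereas PC and PSO each relax something the other does not.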

Sequential consistency has two requirements: program order and write atomicity. Different relaxed consistency models are obtained by relaxing these requirements. Relaxing the constraints improves performance, but the programmer becomes responsible for implementing memory consistency by applying synchronisation techniques, and must have a good understanding of the hardware.

Potential relaxations:
* Write to read program order
* Write to write program order
* Read to read and read to write program orders
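
As a baseline for the relaxations listed above, the following sketch enumerates every sequentially consistent execution of a small two-processor program. It assumes a toy encoding that is not part of the source: each instruction is a tuple, memory is a dictionary, and the variable names <code>x</code>, <code>y</code>, <code>r1</code> and <code>r2</code> are chosen for illustration. Program order is kept within each processor, and each write takes effect atomically.

```python
def interleavings(p1, p2):
    """Yield every merge of two instruction lists that keeps each list's own order."""
    if not p1 or not p2:
        yield list(p1) + list(p2)
        return
    for rest in interleavings(p1[1:], p2):
        yield [p1[0]] + rest
    for rest in interleavings(p1, p2[1:]):
        yield [p2[0]] + rest


def run(schedule):
    """Execute one interleaving against a single shared memory."""
    memory = {"x": 0, "y": 0}
    regs = {}
    for op, target, source in schedule:
        if op == "write":
            memory[target] = source        # ("write", address, value)
        else:
            regs[target] = memory[source]  # ("read", register, address)
    return regs["r1"], regs["r2"]


# Each processor writes one location and then reads the other.
P1 = [("write", "x", 1), ("read", "r1", "y")]
P2 = [("write", "y", 1), ("read", "r2", "x")]

sc_outcomes = {run(s) for s in interleavings(P1, P2)}
print(sorted(sc_outcomes))  # prints [(0, 1), (1, 0), (1, 1)]
```

The outcome <code>r1 = r2 = 0</code> never appears in this enumeration. Observing it on real hardware would indicate that a write has been reordered with a following read.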

=== Relaxed write to read ===

One approach to improving performance at the hardware level is to relax the program order (PO) of a write followed by a read, which effectively hides the latency of write operations. The optimisation that this relaxation relies on is that subsequent reads may be performed in a relaxed order with respect to earlier writes from the same processor. Because of this relaxation, some programs like XXX may fail to give sequentially consistent (SC) results, whereas programs like YYY are still expected to give consistent results because the remaining program order constraints are enforced.

Three models fall under this category. The IBM 370 model is the strictest: a read can complete before an earlier write to a different address, but it is prohibited from returning the value of a write until all processors have seen that write. The SPARC V8 total store ordering (TSO) model partially relaxes the IBM 370 model by allowing a read to return the value of its own processor's write before other processors see it. As in the IBM 370 model, a read cannot return the value of another processor's write until all processors have seen that write. The processor consistency (PC) model is the most relaxed of the three: it relaxes both constraints, so that a read can complete before an earlier write and can return the value of a write even before that write is visible to all other processors.

In Example A, the result is possible in TSO and PC because they allow the reads of the flags to occur before the writes of the flags in a single processor. It is not possible in IBM 370 because read(A) is not issued until the write(A) in that processor has completed.

In Example B, the result is possible only with PC, as it allows P2 to return the value of a write even before it is visible to P3. This is not possible in the other two models.

To ensure sequential consistency in the above models, safety nets or fences are used to enforce the constraint manually. The IBM 370 model has specialised ''serialisation instructions'' which are manually placed between operations. These instructions can be memory instructions or non-memory instructions such as branches. The TSO and PC models, on the other hand, do not provide explicit safety nets, but programmers can still use read-modify-write operations to make it appear that program order is maintained between a write and a following read. In TSO, program order appears to be maintained if either the read or the write is already part of a read-modify-write or is replaced by one. Replacing a read in this way requires that the write in the read-modify-write be a "dummy" that writes back the value just read. Similarly, in PC, program order appears to be maintained if the read is replaced by a read-modify-write or is already part of one.

However, this relaxation alone does not permit compiler optimisations. Compiler optimisations require the full flexibility of reordering any two operations in program order, so the ability to reorder a write with respect to a following read is not sufficient.

{| class="wikitable"
|+ Example A
|-
! {{abbr|P|Processor}}1
! {{abbr|P|Processor}}2
|-
| colspan=2 style="text-align:center;" | A = flag1 = flag2 = 0
|-
| flag1 = 1    || flag2 = 1
|-
| A = 1        || A = 2
|-
| reg1 = A     || reg3 = A
|- 
| reg2 = flag2 || reg4 = flag1
|-
| colspan=2 style="text-align:center;" | reg1 = 1, reg3 = 2, reg2 = reg4 = 0
|}
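
The behaviour of Example A can be explored with a small exhaustive simulation. The sketch below is a toy model under stated assumptions, not a description of any real processor: each processor has a FIFO write buffer, <code>"IBM 370"</code> blocks a read while a write to the same address is still buffered, <code>"TSO"</code> forwards the processor's own buffered write to the read, and <code>"SC"</code> uses no buffer. The function name <code>reachable_outcomes</code> and the instruction encoding are hypothetical.

```python
def reachable_outcomes(programs, model):
    """Return every final (reg1, reg2, ...) tuple reachable in the toy machine."""
    addresses = {op[1] if op[0] == "write" else op[2] for prog in programs for op in prog}
    start = ((0,) * len(programs), ((),) * len(programs),
             frozenset((a, 0) for a in addresses), frozenset())
    seen, stack, finals = {start}, [start], set()
    while stack:
        pcs, bufs, mem, regs = stack.pop()
        memory = dict(mem)
        successors = []
        for p, prog in enumerate(programs):
            buf = bufs[p]
            if buf:  # the memory system may retire the oldest buffered write
                addr, val = buf[0]
                new_mem = frozenset({**memory, addr: val}.items())
                successors.append((pcs, bufs[:p] + (buf[1:],) + bufs[p + 1:], new_mem, regs))
            if pcs[p] == len(prog):
                continue
            op = prog[pcs[p]]
            next_pcs = pcs[:p] + (pcs[p] + 1,) + pcs[p + 1:]
            if op[0] == "write":  # ("write", address, value)
                _, addr, val = op
                if model == "SC":
                    new_mem = frozenset({**memory, addr: val}.items())
                    successors.append((next_pcs, bufs, new_mem, regs))
                else:
                    new_bufs = bufs[:p] + (buf + ((addr, val),),) + bufs[p + 1:]
                    successors.append((next_pcs, new_bufs, mem, regs))
            else:  # ("read", register, address)
                _, reg, addr = op
                pending = [v for a, v in buf if a == addr]
                if pending and model == "IBM 370":
                    continue  # read waits until its own write to addr has completed
                val = pending[-1] if pending else memory[addr]
                successors.append((next_pcs, bufs, mem, regs | {(reg, val)}))
        if not successors:  # all programs finished and all buffers drained
            finals.add(tuple(v for _, v in sorted(regs)))
        for state in successors:
            if state not in seen:
                seen.add(state)
                stack.append(state)
    return finals


P1 = [("write", "flag1", 1), ("write", "A", 1), ("read", "reg1", "A"), ("read", "reg2", "flag2")]
P2 = [("write", "flag2", 1), ("write", "A", 2), ("read", "reg3", "A"), ("read", "reg4", "flag1")]
RESULT = (1, 0, 2, 0)  # reg1 = 1, reg2 = 0, reg3 = 2, reg4 = 0

for model in ("SC", "IBM 370", "TSO"):
    print(model, RESULT in reachable_outcomes([P1, P2], model))
```

In this toy model the result of Example A is reachable only when a read may take its value from the processor's own write buffer, which is the TSO behaviour described above.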

{| class="wikitable"
|+ Example B
|-
! {{abbr|P|Processor}}1
! {{abbr|P|Processor}}2
! {{abbr|P|Processor}}3
|-
| colspan=3 style="text-align:center;" | A = B = 0
|-
| A = 1 ||             ||
|-
|       || if (A == 1) ||
|-
|       || B = 1       || if (B == 1)
|-
|       ||             || reg1 = A
|-
| colspan=3 style="text-align:center;" | B = 1, reg1 = 0
|}
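
Example B depends on write atomicity rather than on reordering within a processor, which the following sketch tries to make concrete. It is a toy model under stated assumptions: every processor holds its own copy of memory, and a non-atomic write updates the writer's copy immediately and reaches each other processor later through an in-order channel. The function names, the guard-based encoding of the <code>if</code> statements, and the register names <code>r_a</code> and <code>r_b</code> are hypothetical.

```python
def store(copy, addr, val):
    """Return a new per-processor memory copy with addr set to val."""
    return tuple(sorted({**dict(copy), addr: val}.items()))


def explore(programs, atomic_writes):
    """Return the set of final register files reachable in the toy machine."""
    n = len(programs)
    blank = (("A", 0), ("B", 0))
    start = ((0,) * n, (blank,) * n, (), frozenset())
    seen, stack, finals = {start}, [start], set()
    while stack:
        pcs, copies, inflight, regs = stack.pop()
        successors = []
        pairs_seen = set()
        for i, (src, dst, addr, val) in enumerate(inflight):
            if (src, dst) in pairs_seen:
                continue  # only the oldest write per (source, destination) may arrive
            pairs_seen.add((src, dst))
            new_copies = copies[:dst] + (store(copies[dst], addr, val),) + copies[dst + 1:]
            successors.append((pcs, new_copies, inflight[:i] + inflight[i + 1:], regs))
        for p, prog in enumerate(programs):
            if pcs[p] == len(prog):
                continue
            guard, kind, x, y = prog[pcs[p]]
            next_pcs = pcs[:p] + (pcs[p] + 1,) + pcs[p + 1:]
            if guard is not None and dict(regs).get(guard) != 1:
                successors.append((next_pcs, copies, inflight, regs))  # guard false: skip
            elif kind == "write":  # x is the address, y the value
                if atomic_writes:
                    new_copies = tuple(store(c, x, y) for c in copies)
                    new_inflight = inflight
                else:
                    new_copies = copies[:p] + (store(copies[p], x, y),) + copies[p + 1:]
                    new_inflight = inflight + tuple((p, q, x, y) for q in range(n) if q != p)
                successors.append((next_pcs, new_copies, new_inflight, regs))
            else:  # read: x is the register, y the address
                value = dict(copies[p])[y]
                successors.append((next_pcs, copies, inflight, regs | {(x, value)}))
        if not successors:
            finals.add(regs)
        for state in successors:
            if state not in seen:
                seen.add(state)
                stack.append(state)
    return finals


def shows_result(outcomes):
    """True if some execution has P3 seeing B == 1 and then reading A == 0."""
    return any(("r_b", 1) in o and ("reg1", 0) in o for o in outcomes)


# Instructions are (guard register or None, kind, x, y).
P1 = [(None, "write", "A", 1)]
P2 = [(None, "read", "r_a", "A"), ("r_a", "write", "B", 1)]
P3 = [(None, "read", "r_b", "B"), ("r_b", "read", "reg1", "A")]

print(shows_result(explore([P1, P2, P3], atomic_writes=False)))
print(shows_result(explore([P1, P2, P3], atomic_writes=True)))
```

With non-atomic writes, the write to A can reach P2 before it reaches P3, so P3 may observe <code>B == 1</code> and still read the old value of A. When every write becomes visible to all processors at once, that outcome is not reachable in this model.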

=== Relaxed write to read and write to write ===

Some models relax program order further by also relaxing the ordering constraints between writes to different locations. The SPARC V8 partial store ordering model (PSO) is the only example of such a model. The key hardware optimisation enabled by PSO is the ability to pipeline and overlap writes to different locations from the same processor. PSO is similar to TSO in terms of atomicity requirements: it allows a processor to read the value of its own write, and it prevents a processor from reading another processor's write before that write is visible to all other processors. Program order between two writes is maintained in PSO by an explicit STBAR instruction. In implementations with FIFO write buffers, the STBAR is inserted into the write buffer. A counter is used to determine when all the writes before the STBAR instruction have completed: a write issued to the memory system increments the counter, a write acknowledgement decrements it, and when the counter reaches 0 all the previous writes have completed.
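
The counter mechanism described above can be sketched as a small simulation. This is a minimal illustration under stated assumptions rather than the SPARC implementation: the class <code>PsoWriteBuffer</code> and its method names are hypothetical, and acknowledgements are delivered by calling <code>acknowledge()</code> by hand.

```python
from collections import deque


class PsoWriteBuffer:
    """Toy FIFO write buffer with an STBAR marker and an outstanding-write counter."""

    def __init__(self):
        self.buffer = deque()  # entries: ("write", addr, value) or ("stbar",)
        self.outstanding = 0   # writes issued to memory but not yet acknowledged
        self.issued = []       # log of (addr, value) pairs sent to the memory system

    def write(self, addr, value):
        self.buffer.append(("write", addr, value))

    def stbar(self):
        self.buffer.append(("stbar",))

    def issue_ready(self):
        """Issue buffered writes until the buffer empties or an STBAR must wait."""
        while self.buffer:
            entry = self.buffer[0]
            if entry[0] == "stbar":
                if self.outstanding > 0:
                    break  # earlier writes have not all completed yet
                self.buffer.popleft()
                continue
            self.buffer.popleft()
            self.outstanding += 1  # issuing a write increments the counter
            self.issued.append(entry[1:])

    def acknowledge(self):
        """A write acknowledgement decrements the counter."""
        self.outstanding -= 1


wb = PsoWriteBuffer()
wb.write("data", 42)
wb.write("more_data", 7)
wb.stbar()
wb.write("flag", 1)

wb.issue_ready()   # both data writes are issued and may overlap
print(wb.issued, wb.outstanding)
wb.acknowledge()
wb.acknowledge()
wb.issue_ready()   # counter is 0, so the write after the STBAR can now be issued
print(wb.issued, wb.outstanding)
```

The two writes before the STBAR are issued back to back, which reflects the overlap that PSO permits, while the write to <code>flag</code> is held back until the counter returns to 0.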

In examples A and B, PSO allows both of these non-sequentially consistent results. The safety net that PSO provides is similar to TSO's: it imposes program order from a write to a read and enforces write atomicity.

As with the previous models, the relaxations allowed by PSO are not flexible enough to be useful for compiler optimisation, which requires much greater freedom to reorder operations.

=== Relaxing read and read to write program orders: Alpha, RMO, and PowerPC ===

In some models, program order is relaxed between all operations to different locations. A read or write may be reordered with respect to a different read or write to a different location. The ''weak ordering'' model falls under this category, as do the two types of release consistency models (RCsc and RCpc). Three commercial architectures also fall under this category of relaxation: the Digital Alpha, SPARC V9 relaxed memory order (RMO), and IBM PowerPC models.

These three commercial architectures provide explicit fence instructions as their safety nets. The Alpha model provides two types of fence instructions, ''memory barrier'' (MB) and ''write memory barrier'' (WMB). The MB operation can be used to maintain program order between any memory operation before the MB and any memory operation after it. Similarly, the WMB maintains program order only among writes. The SPARC V9 RMO model provides a MEMBAR instruction which can be customised to order previous reads and writes with respect to future read and write operations. Read-modify-writes are not needed to achieve this order, because the MEMBAR instruction can be used to order a write with respect to a succeeding read. The PowerPC model uses a single fence instruction called SYNC. It is similar to the MB instruction, with the small exception that reads can occur out of program order even if a SYNC is placed between two reads to the same location. This model also differs from Alpha and RMO in terms of atomicity: it allows a write to be seen early by another processor's read. A combination of read-modify-write operations may be required to create the illusion of write atomicity.
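
The difference between a full barrier, a write-only barrier, and a customisable barrier can be sketched with bit masks. The snippet below is illustrative only: the <code>Order</code> flag names echo the four ordering categories that a MEMBAR can select, but the numeric values are not the hardware encoding, and <code>fence_orders</code> is a hypothetical helper.

```python
from enum import Flag, auto


class Order(Flag):
    """Pairs of (earlier, later) operations that a fence can keep in program order."""
    LOAD_LOAD = auto()
    LOAD_STORE = auto()
    STORE_LOAD = auto()
    STORE_STORE = auto()


# Modelled after the descriptions above: MB orders everything, WMB orders only writes.
ALPHA_MB = Order.LOAD_LOAD | Order.LOAD_STORE | Order.STORE_LOAD | Order.STORE_STORE
ALPHA_WMB = Order.STORE_STORE

# A MEMBAR customised to order a write with respect to a succeeding read.
MEMBAR_WRITE_THEN_READ = Order.STORE_LOAD

_PAIR_TO_FLAG = {
    ("load", "load"): Order.LOAD_LOAD,
    ("load", "store"): Order.LOAD_STORE,
    ("store", "load"): Order.STORE_LOAD,
    ("store", "store"): Order.STORE_STORE,
}


def fence_orders(mask, earlier, later):
    """Return True if a fence with this mask keeps `earlier` before `later`."""
    return bool(mask & _PAIR_TO_FLAG[(earlier, later)])


print(fence_orders(ALPHA_WMB, "store", "store"))
print(fence_orders(ALPHA_WMB, "store", "load"))
print(fence_orders(MEMBAR_WRITE_THEN_READ, "store", "load"))
```

Representing the barrier as a mask makes it clear why a customisable fence removes the need for read-modify-writes here: the write-to-read ordering is simply one of the selectable bits.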

RMO and PowerPC allow reordering of reads to the same location. These models violate sequential consistency in examples A and B. An additional relaxation allowed in these models is that memory operations following a read operation can be overlapped and reordered with respect to the read. PowerPC allows a read to return the value of another processor's write early. Alpha and RMO, from a programmer's perspective, must maintain the illusion of write atomicity even though they allow a processor to read its own write early.