
== Dispatch and issue decoupling allows out-of-order issue ==
One difference introduced by the new paradigm is the use of queues that decouple the dispatch step from the issue step, and the graduation stage from the execute stage. An early name for the paradigm was ''decoupled architecture''. In the earlier ''in-order'' processors, these stages operated in a fairly [[Lockstep (computing)|lock-step]], pipelined fashion.
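The effect of placing a queue between dispatch and issue can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than a description of any real processor: the <code>Instr</code> record, the three-instruction program, the latencies, the one-issue-per-cycle limit, and the rule that each register is written only once (so no register renaming is modelled).

```python
from collections import namedtuple

# Hypothetical instruction record: destination register, source registers,
# and the number of cycles until the result becomes available.
Instr = namedtuple("Instr", "name dest srcs latency")

def simulate(program, in_order):
    """Return the cycle at which each instruction issues."""
    # A register with a pending producer is unavailable until that producer issues.
    ready_at = {ins.dest: float("inf") for ins in program}
    queue = list(program)  # dispatch has filled the queue in program order
    issued = {}
    cycle = 0
    while queue:
        # In-order issue may only look at the head of the queue;
        # out-of-order issue may pick any entry whose operands are ready.
        candidates = queue[:1] if in_order else queue
        for ins in candidates:
            if all(ready_at.get(r, 0) <= cycle for r in ins.srcs):
                issued[ins.name] = cycle
                ready_at[ins.dest] = cycle + ins.latency
                queue.remove(ins)
                break  # at most one issue per cycle
        cycle += 1
    return issued

program = [
    Instr("load", "r1", (), 3),           # slow load
    Instr("add", "r2", ("r1",), 1),       # depends on the load
    Instr("mul", "r3", ("r4", "r5"), 1),  # independent of both
]

print(simulate(program, in_order=True))   # mul waits behind the stalled add
print(simulate(program, in_order=False))  # mul issues while add waits
```

In this toy model the independent <code>mul</code> issues at cycle 1 when out-of-order issue is allowed, instead of at cycle 4 behind the stalled <code>add</code>.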

The [[Instruction cycle|fetch and decode stages]] are separated from the execute stage in a [[Pipeline (computing)|pipelined]] processor by using a [[Data buffer|buffer]]. The buffer's purpose is to partition the [[Memory access pattern|memory access]] and execute functions in a computer program and achieve high performance by exploiting the fine-grain [[parallel computing|parallelism]] between the two.<ref>{{cite journal |author-last1=Smith |author-first1=J. E. |title=Decoupled access/execute computer architectures |journal=ACM Transactions on Computer Systems |date=1984 |volume=2 |issue=4 |pages=289–308 |citeseerx=10.1.1.127.4475 |doi=10.1145/357401.357403 |s2cid=13903321 }}</ref> In doing so, it can effectively hide [[memory latency]] from the processor's perspective.
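How a buffer between the access and execute functions can hide memory latency may be sketched with a small cycle-count model. The model is an assumption made for illustration, not the design from the cited paper: memory is fully pipelined, the access side can start one load per cycle while a buffer slot is free, and the execute side consumes one operand per cycle. The function names and parameters are hypothetical.

```python
from collections import deque

def coupled_cycles(n_items, mem_latency):
    """No buffer: every item waits for memory, then executes for one cycle."""
    return n_items * (mem_latency + 1)

def decoupled_cycles(n_items, mem_latency, buffer_size):
    """Access side runs ahead of the execute side through a bounded buffer."""
    buffer = deque()  # arrival cycle of each load occupying a buffer slot
    requested = consumed = 0
    cycle = 0
    while consumed < n_items:
        # Execute side: consume the oldest operand once it has arrived.
        if buffer and buffer[0] <= cycle:
            buffer.popleft()
            consumed += 1
        # Access side: start a new load whenever a buffer slot is free.
        if requested < n_items and len(buffer) < buffer_size:
            buffer.append(cycle + mem_latency)
            requested += 1
        cycle += 1
    return cycle

print(coupled_cycles(8, 4))       # latency paid once per item
print(decoupled_cycles(8, 4, 4))  # latency paid roughly once in total
print(decoupled_cycles(8, 4, 1))  # a one-entry buffer hides little
```

Under these assumptions, a buffer at least as deep as the memory latency lets the execute side see a new operand every cycle after the initial wait, while a one-entry buffer leaves most of the latency exposed.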

A larger buffer can, in theory, increase throughput. However, if the processor suffers a [[branch misprediction]], the entire buffer may need to be flushed, wasting many [[clock cycle]]s and reducing the buffer's effectiveness. Furthermore, larger buffers generate more heat and use more [[Die (integrated circuit)|die]] space. For this reason, processor designers today favor a [[multi-threaded]] design approach.
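The flush cost can be made concrete with a rough upper-bound estimate. The model is deliberately crude and entirely an assumption: every mispredicted branch is taken to discard a completely full buffer, and the function name, parameters, and example figures are hypothetical rather than measurements of any processor.

```python
def discarded_entries(n_instructions, branch_every, mispredict_rate, buffer_size):
    """Upper bound on buffer entries thrown away by misprediction flushes.

    Assumes one branch every `branch_every` instructions and that each
    mispredicted branch flushes a completely full buffer.
    """
    branches = n_instructions // branch_every
    mispredictions = branches * mispredict_rate
    return mispredictions * buffer_size

# Same workload, two buffer sizes: the flush penalty scales with buffer size.
small = discarded_entries(1000, branch_every=5, mispredict_rate=0.05, buffer_size=16)
large = discarded_entries(1000, branch_every=5, mispredict_rate=0.05, buffer_size=64)
print(small, large)
```

In this estimate, quadrupling the buffer quadruples the work lost to flushes, which is one reason a larger buffer does not automatically translate into higher throughput.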

Decoupled architectures are generally considered poorly suited to general-purpose computing because they do not handle control-intensive code well.<ref>{{cite journal |author-last1=Kurian |author-first1=L. |author-last2=Hulina |author-first2=P. T. |author-last3=Coraor |author-first3=L. D. |title=Memory latency effects in decoupled architectures |journal=[[IEEE Transactions on Computers]] |volume=43 |issue=10 |date=1994 |pages=1129–1139 |doi=10.1109/12.324539 |s2cid=6913858 |url=https://pdfs.semanticscholar.org/6aa3/18cce633e3c2d86d970d6d50104d818d9407.pdf |archive-url=https://web.archive.org/web/20180612141055/https://pdfs.semanticscholar.org/6aa3/18cce633e3c2d86d970d6d50104d818d9407.pdf |url-status=dead |archive-date=2018-06-12 }}</ref> Control-intensive code includes such things as nested branches, which occur frequently in [[operating system kernel]]s. Decoupled architectures play an important role in scheduling in [[very long instruction word]] (VLIW) architectures.<ref>{{cite journal |author-first1=M. N. |author-last1=Dorojevets |author-first2=V. |author-last2=Oklobdzija |title=Multithreaded decoupled architecture |journal=International Journal of High Speed Computing |volume=7 |issue=3 |pages=465–480 |date=1995 |doi=10.1142/S0129053395000257 |url=https://www.researchgate.net/publication/220171480}}</ref>