Editing Loop unrolling (section)

{{Short description|Loop transformation technique}}

'''Loop unrolling''', also known as '''loop unwinding''', is a [[loop transformation]] technique that attempts to optimize a program's execution speed at the expense of its [[binary file|binary]] size, which is an approach known as  [[space–time tradeoff]]. The transformation can be undertaken manually by the programmer or by an [[optimizing compiler]]. On modern processors, loop unrolling is often counterproductive, as the increased code size can cause more cache misses; ''cf.'' [[Duff's device#Performance|Duff's device]].<ref name="lkml-0008.2/0171">{{cite web
 | url = http://lkml.indiana.edu/hypermail/linux/kernel/0008.2/0171.html
 | title = Re: [PATCH] Re: Move of input drivers, some word needed from you
 | date = August 22, 2000 | access-date = August 22, 2014
 | last = Tso | first = Ted | publisher = [[Linux kernel mailing list]] | website = lkml.indiana.edu
 | quote = Jim Gettys has a wonderful explanation of this effect in the X server. It turns out that with branch predictions and the relative speed of CPU vs. memory changing over the past decade, loop unrolling is pretty much pointless. In fact, by eliminating all instances of Duff's Device from the [[XFree86]] 4.0 server, the server shrunk in size by _half_ _a_ _megabyte_ (!!!), and was faster to boot, because the elimination of all that excess code meant that the X server wasn't thrashing the cache lines as much.
}}</ref>

The goal of loop unwinding is to increase a program's speed by reducing or eliminating instructions that control the loop, such as [[pointer arithmetic]] and "end of loop" tests on each iteration;<ref>{{cite book |author1=Ullman, Jeffrey D. |author2=Aho, Alfred V. |title=Principles of compiler design |url=https://archive.org/details/principlesofcomp0000ahoa |url-access=registration |publisher=Addison-Wesley Pub. Co |location=Reading, Mass |year=1977 |pages=[https://archive.org/details/principlesofcomp0000ahoa/page/471 471–2] |isbn=0-201-10073-8 }}</ref> reducing branch penalties; as well as hiding latencies, including the delay in reading data from memory.<ref>{{cite book |author=Petersen, W.P., Arbenz, P. |title=Introduction to Parallel Computing |url=https://archive.org/details/introductiontopa00wppe |url-access=registration |publisher=Oxford University Press |year=2004 |page=[https://archive.org/details/introductiontopa00wppe/page/n28 10] }}</ref> To eliminate this [[computational overhead]], loops can be re-written as a repeated sequence of similar independent statements.<ref>{{cite journal |author=Nicolau, Alexandru |title=Loop Quantization: Unwinding for Fine-Grain Parallelism Exploitation |series=Dept. of Computer Science Technical Report |publisher=Cornell University |location=Ithaca, NY |year=1985 |oclc=14638257 }}</ref>

Loop unrolling is also part of certain [[formal verification]] techniques, in particular bounded [[model checking]].<ref>[http://research.microsoft.com/pubs/144781/nfm11.pdf Model Checking Using SMT and Theory of Lists]</ref>