== Factors affecting optimization ==
{{Original research|section|date=August 2020}}
;Target machine: Whether particular optimizations can and should be applied may depend on the characteristics of the target machine. Some compilers such as [[GNU Compiler Collection|GCC]] and [[Clang]] parameterize machine-dependent factors so that they can be used to optimize for different machines.<ref>{{cite web|title=GCC – Machine-Dependent Options|url=https://gcc.gnu.org/onlinedocs/gcc/gcc-command-options/machine-dependent-options.html|publisher=[[GNU Project]]}}</ref>

;Target [[Central processing unit|CPU]] architecture
* Number of [[processor register|registers]]: Registers can be used to optimize for performance. [[Local variable]]s can be stored in registers instead of the [[Call stack|stack]]. Temporary/intermediate results can be accessed in registers instead of slower memory.
* [[RISC]] vs. [[Complex instruction set computer|CISC]]: CISC instruction sets often have variable instruction lengths,<ref>{{Cite web|title=RISC vs. CISC|url=https://cs.stanford.edu/people/eroberts/courses/soco/projects/risc/risccisc/|access-date=2024-10-15|website=cs.stanford.edu}}</ref> often have a larger number of possible instructions, and their instructions can take differing amounts of time to execute. RISC instruction sets attempt to limit the variability in each of these: instructions are usually constant in length, with few exceptions; there are usually fewer combinations of registers and memory operations; and the instruction issue rate (the number of instructions completed per time period, usually an integer multiple of the clock cycle) is usually constant in cases where memory latency is not a factor. There may be several ways of carrying out a certain task, with CISC usually offering more alternatives than RISC. Compilers have to know the relative costs of the various instructions and choose the best instruction sequence (see [[instruction selection]]).
* [[Instruction pipeline|Pipelines]]: A pipeline organizes the CPU like an [[assembly line]]. It allows different parts of the CPU to work on different instructions at the same time by breaking up the execution of each instruction into stages: instruction decode, address decode, memory fetch, register fetch, compute, register store, etc. One instruction could be in the register store stage while another is in the register fetch stage. A pipeline conflict occurs when an instruction in one stage of the pipeline depends on the result of another instruction that is ahead of it in the pipeline but has not yet completed. Pipeline conflicts can lead to [[pipeline stall]]s, in which the CPU wastes cycles waiting for a conflict to resolve. Compilers can ''schedule'', or reorder, instructions so that pipeline stalls occur less frequently.
* [[Superscalar|Number of functional units]]: Some CPUs have several [[arithmetic logic unit|ALUs]] and [[floating-point unit|FPUs]] that allow them to execute multiple instructions simultaneously. There may be restrictions on which instructions can pair with which other instructions ("pairing" is the simultaneous execution of two or more instructions), and on which functional unit can execute which instruction. Such CPUs also face issues similar to pipeline conflicts. Instructions can be scheduled so that the functional units are kept as fully loaded as possible.
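
The effect of instruction dependencies on pipelines and multiple functional units can be illustrated with a small, hypothetical C sketch; it is not taken from any particular compiler, and the function names <code>sum_one_chain</code> and <code>sum_two_chains</code> are invented for this example. Both functions compute the same sum, but the second splits the work into two independent chains of additions, which is the kind of rearrangement a scheduler may exploit. Unsigned integers are used so that regrouping the additions cannot change the result.

```c
#include <stddef.h>
#include <stdint.h>

/* One accumulator: every addition needs the result of the previous one,
   so the additions form a single serial dependency chain. */
uint32_t sum_one_chain(const uint32_t *a, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Two accumulators: the additions into s0 and s1 do not depend on each
   other, so a pipelined or superscalar CPU may be able to overlap them. */
uint32_t sum_two_chains(const uint32_t *a, size_t n)
{
    uint32_t s0 = 0, s1 = 0;
    size_t i = 0;

    for (; i + 1 < n; i += 2) {
        s0 += a[i];
        s1 += a[i + 1];
    }
    if (i < n)              /* leftover element when n is odd */
        s0 += a[i];

    return s0 + s1;
}
```

Whether the second form is actually faster depends on the target CPU and on what the compiler already does with the first form; some compilers perform a comparable transformation themselves when optimization is enabled.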

;Machine architecture
* [[CPU cache]] size and type (direct mapped, 2-/4-/8-/16-way associative, fully associative): Techniques such as [[inline expansion]] and [[loop unrolling]] may increase the size of the generated code and reduce code locality. The program may slow down drastically if a highly used section of code (like inner loops in various algorithms) no longer fits in the cache as a result of optimizations that increase code size. Also, caches that are not fully associative have higher chances of cache collisions even in an unfilled cache.
* Cache/memory transfer rates: These give the compiler an indication of the penalty for cache misses. This is used mainly in specialized applications.
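
The code-size cost of [[loop unrolling]] mentioned above can be seen in a minimal C sketch. The functions <code>scale_rolled</code> and <code>scale_unrolled4</code> are hypothetical names chosen for illustration, and the unroll factor of four is an arbitrary assumption. Both produce the same result, but the unrolled form contains several copies of the loop body plus a remainder loop.

```c
#include <stddef.h>

/* Rolled form: small loop body, one loop test per element. */
void scale_rolled(int *a, size_t n, int k)
{
    for (size_t i = 0; i < n; i++)
        a[i] *= k;
}

/* Unrolled by four: one loop test per four elements, but four copies
   of the body and an extra remainder loop, so the generated code is
   typically larger and occupies more of the instruction cache. */
void scale_unrolled4(int *a, size_t n, int k)
{
    size_t i = 0;

    for (; i + 3 < n; i += 4) {
        a[i]     *= k;
        a[i + 1] *= k;
        a[i + 2] *= k;
        a[i + 3] *= k;
    }
    for (; i < n; i++)      /* remaining 0 to 3 elements */
        a[i] *= k;
}
```

In practice a compiler weighs the saved loop overhead against the larger code, which is why the cache characteristics of the target machine can influence whether and how far a loop is unrolled.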

;Intended use
* [[Debugging]]: During development, optimizations are often disabled to speed compilation or to make the executable code easier to debug. Optimizing transformations, particularly those that reorder code, can make it difficult to relate the executable code to the source code.
* General-purpose use: Prepackaged software is often expected to run on a variety of machines that may share the same instruction set but have different performance characteristics. The code may not be optimized for any particular machine, or it may be tuned to work best on the most popular machine while performing less well on others.
* Special-purpose use: If the software is compiled for machines with uniform characteristics, then the compiler can heavily optimize the generated code for those machines.

:Notable cases include code designed for [[parallel computing|parallel]] and [[vector processor]]s, for which special [[parallelizing compiler]]s are used.

:Firmware for an [[embedded system]] can be optimized for the target CPU and memory. System cost or reliability may be more important than the code speed. For example, compilers for embedded software usually offer options that reduce code size at the expense of speed. The code's timing may need to be predictable, rather than as fast as possible, so code caching might be disabled, along with compiler optimizations that require it.
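
The trade-off between code size and speed can be sketched in C as follows. This is an illustrative example only: it assumes a GCC- or Clang-style compiler, which predefines the macro <code>__OPTIMIZE_SIZE__</code> when optimizing for size (for example with <code>-Os</code>), and the function <code>bit_count</code> and the table <code>nibble_bits</code> are hypothetical names. The size-oriented variant uses a compact loop, while the other spends 16 bytes on a lookup table to do less work per call.

```c
#include <stdint.h>

#ifdef __OPTIMIZE_SIZE__
/* Size-oriented build: compact loop, no lookup table. */
unsigned bit_count(uint8_t x)
{
    unsigned n = 0;
    while (x) {
        n += x & 1u;
        x >>= 1;
    }
    return n;
}
#else
/* Default build: a 16-entry table (number of set bits in each
   4-bit value) trades a little memory for two table lookups. */
static const uint8_t nibble_bits[16] = {
    0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4
};

unsigned bit_count(uint8_t x)
{
    return nibble_bits[x & 0x0F] + nibble_bits[x >> 4];
}
#endif
```

Both variants return the same values, so the choice affects only the size and speed of the generated code, not the program's behavior.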