Editing Strength reduction (section)

=== Many optimizations ===

The compiler will start doing many optimizations &ndash; not just strength reduction. Expressions that are constant (invariant) within a loop will be [[Loop-invariant code motion|hoisted]] out of the loop.  Constants can be loaded outside of both loops—such as floating point registers fr3 and fr4. Recognition that some variables don't change allows registers to be merged; n is constant, so r2, r4, r7, r12 can be hoisted and collapsed. The common value i*n is computed in (the hoisted) r8 and r13, so they collapse. The innermost loop (0120-0260) has been reduced from 11 to 7 intermediate instructions. The only multiply that remains in the innermost loop is line 0210's multiply by 8.

<syntaxhighlight lang="nasm">
0010  ; for (i = 0, i < n; i++)
0020    {
0030      r1 = #0                        ; i = 0
0050      load r2, n
0130 ;   load r4, n     killed; use r2
0180 ;   load r7, n     killed; use r2
0300 ;   load r12, n    killed; use r2
0220      fr3 = #0.0
0340      fr4 = #1.0
0040  G0000:
0060      cmp r1, r2                     ; i < n
0070      bge G0001
0080
0190      r8  = r1 * r2                  ; calculate subscript i * n
0310 ;   r13 = r1 * r2  killed; use r8   ; calculate subscript i * n
0090      ; for (j = 0; j < n; j++)
0100        {
0110           r3   = #0                 ; j = 0
0120  G0002:
0140           cmp r3, r2                ; j < n
0150           bge G0003
0160
0170           ; A[i,j] = 0.0;
0200           r9  = r8 + r3             ; calculate subscript i * n + j
0210           r10 = r9 * #8             ; calculate byte address
0230           fstore fr3, A[r10]
0240
0250           r3 = r3 + #1              ; j++
0260           br G0002
0270        }
0280  G0003:
0290      ; A[i,i] = 1.0;
0320      r14 = r8  + r1                 ; calculate subscript i * n + i
0330      r15 = r14 * #8                 ; calculate byte address
0350      fstore fr4, A[r15]
0360
0370      ;i++
0380      r1 = r1 + #1
0390      br G0000
0400    }
0410  G0001:
</syntaxhighlight>

There are more optimizations to do. Register r3 is the main variable in the innermost loop (0140-0260); it gets incremented by 1 each time through the loop. Register r8 (which is invariant in the innermost loop) is added to r3. Instead of using r3, the compiler can eliminate r3 and use r9. The loop, instead of being controlled by r3 = 0 to n-1, can be controlled by r9=r8+0 to r8+n-1. That adds four instructions and kills four instructions, but there's one fewer instruction inside the loop.

<syntaxhighlight lang="nasm">
0110  ;       r3   = #0     killed       ; j = 0
0115           r9   = r8                 ; new assignment
0117           r20  = r8 + r2            ; new limit
0120  G0002:
0140  ;       cmp r3, r2    killed       ; j < n
0145           cmp r9, r20               ; r8 + j < r8 + n = r20
0150           bge G0003
0160
0170           ; A[i,j] = 0.0;
0200  ;       r9  = r8 + r3 killed       ; calculate subscript i * n + j
0210           r10 = r9 * #8             ; calculate byte address
0230           fstore fr3, A[r10]
0240
0250  ;       r3 = r3 + #1  killed       ; j++
0255           r9 = r9 + #1              ; new loop variable
0260           br G0002
</syntaxhighlight>

Now r9 is the loop variable, but it interacts with the multiply by 8. Here we get to do some strength reduction. The multiply by 8 can be reduced to some successive additions of 8. Now there are no multiplications inside the loop.

<syntaxhighlight lang="nasm">
0115           r9   = r8                 ; new assignment
0117           r20  = r8 + r2            ; new limit
0118           r10  = r8 * #8            ; initial value of r10
0120  G0002:
0145           cmp r9, r20               ; r8 + j < r8 + n = r20
0150           bge G0003
0160
0170           ; A[i,j] = 0.0;
0210  ;       r10 = r9 * #8    killed    ; calculate byte address
0230           fstore fr3, A[r10]
0240
0245           r10 = r10 + #8            ; strength reduced multiply
0255           r9 = r9 + #1              ; loop variable
0260           br G0002
</syntaxhighlight>

Registers r9 and r10 (= 8*r9) aren't both needed; r9 can be eliminated in the loop. The loop is now 5 instructions.

<syntaxhighlight lang="nasm">
0115  ;       r9   = r8         killed
0117           r20  = r8 + r2            ; limit
0118           r10  = r8 * #8            ; initial value of r10
0119           r22  = r20 * #8           ; new limit
0120  G0002:
0145  ;       cmp r9, r20       killed   ; r8 + j < r8 + n = r20
0147           cmp r10, r22              ; r10 = 8*(r8 + j) < 8*(r8 + n) = r22
0150           bge G0003
0160
0170           ; A[i,j] = 0.0;
0230           fstore fr3, A[r10]
0240
0245           r10 = r10 + #8            ; strength reduced multiply
0255  ;       r9 = r9 + #1      killed   ; loop variable
0260           br G0002
</syntaxhighlight>