Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
Strength reduction
(section)
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
=== Many optimizations === The compiler will start doing many optimizations – not just strength reduction. Expressions that are constant (invariant) within a loop will be [[Loop-invariant code motion|hoisted]] out of the loop. Constants can be loaded outside of both loops—such as floating point registers fr3 and fr4. Recognition that some variables don't change allows registers to be merged; n is constant, so r2, r4, r7, r12 can be hoisted and collapsed. The common value i*n is computed in (the hoisted) r8 and r13, so they collapse. The innermost loop (0120-0260) has been reduced from 11 to 7 intermediate instructions. The only multiply that remains in the innermost loop is line 0210's multiply by 8. <syntaxhighlight lang="nasm"> 0010 ; for (i = 0, i < n; i++) 0020 { 0030 r1 = #0 ; i = 0 0050 load r2, n 0130 ; load r4, n killed; use r2 0180 ; load r7, n killed; use r2 0300 ; load r12, n killed; use r2 0220 fr3 = #0.0 0340 fr4 = #1.0 0040 G0000: 0060 cmp r1, r2 ; i < n 0070 bge G0001 0080 0190 r8 = r1 * r2 ; calculate subscript i * n 0310 ; r13 = r1 * r2 killed; use r8 ; calculate subscript i * n 0090 ; for (j = 0; j < n; j++) 0100 { 0110 r3 = #0 ; j = 0 0120 G0002: 0140 cmp r3, r2 ; j < n 0150 bge G0003 0160 0170 ; A[i,j] = 0.0; 0200 r9 = r8 + r3 ; calculate subscript i * n + j 0210 r10 = r9 * #8 ; calculate byte address 0230 fstore fr3, A[r10] 0240 0250 r3 = r3 + #1 ; j++ 0260 br G0002 0270 } 0280 G0003: 0290 ; A[i,i] = 1.0; 0320 r14 = r8 + r1 ; calculate subscript i * n + i 0330 r15 = r14 * #8 ; calculate byte address 0350 fstore fr4, A[r15] 0360 0370 ;i++ 0380 r1 = r1 + #1 0390 br G0000 0400 } 0410 G0001: </syntaxhighlight> There are more optimizations to do. Register r3 is the main variable in the innermost loop (0140-0260); it gets incremented by 1 each time through the loop. Register r8 (which is invariant in the innermost loop) is added to r3. Instead of using r3, the compiler can eliminate r3 and use r9. The loop, instead of being controlled by r3 = 0 to n-1, can be controlled by r9=r8+0 to r8+n-1. That adds four instructions and kills four instructions, but there's one fewer instruction inside the loop. <syntaxhighlight lang="nasm"> 0110 ; r3 = #0 killed ; j = 0 0115 r9 = r8 ; new assignment 0117 r20 = r8 + r2 ; new limit 0120 G0002: 0140 ; cmp r3, r2 killed ; j < n 0145 cmp r9, r20 ; r8 + j < r8 + n = r20 0150 bge G0003 0160 0170 ; A[i,j] = 0.0; 0200 ; r9 = r8 + r3 killed ; calculate subscript i * n + j 0210 r10 = r9 * #8 ; calculate byte address 0230 fstore fr3, A[r10] 0240 0250 ; r3 = r3 + #1 killed ; j++ 0255 r9 = r9 + #1 ; new loop variable 0260 br G0002 </syntaxhighlight> Now r9 is the loop variable, but it interacts with the multiply by 8. Here we get to do some strength reduction. The multiply by 8 can be reduced to some successive additions of 8. Now there are no multiplications inside the loop. <syntaxhighlight lang="nasm"> 0115 r9 = r8 ; new assignment 0117 r20 = r8 + r2 ; new limit 0118 r10 = r8 * #8 ; initial value of r10 0120 G0002: 0145 cmp r9, r20 ; r8 + j < r8 + n = r20 0150 bge G0003 0160 0170 ; A[i,j] = 0.0; 0210 ; r10 = r9 * #8 killed ; calculate byte address 0230 fstore fr3, A[r10] 0240 0245 r10 = r10 + #8 ; strength reduced multiply 0255 r9 = r9 + #1 ; loop variable 0260 br G0002 </syntaxhighlight> Registers r9 and r10 (= 8*r9) aren't both needed; r9 can be eliminated in the loop. The loop is now 5 instructions. <syntaxhighlight lang="nasm"> 0115 ; r9 = r8 killed 0117 r20 = r8 + r2 ; limit 0118 r10 = r8 * #8 ; initial value of r10 0119 r22 = r20 * #8 ; new limit 0120 G0002: 0145 ; cmp r9, r20 killed ; r8 + j < r8 + n = r20 0147 cmp r10, r22 ; r10 = 8*(r8 + j) < 8*(r8 + n) = r22 0150 bge G0003 0160 0170 ; A[i,j] = 0.0; 0230 fstore fr3, A[r10] 0240 0245 r10 = r10 + #8 ; strength reduced multiply 0255 ; r9 = r9 + #1 killed ; loop variable 0260 br G0002 </syntaxhighlight>
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)