Strength reduction
===The last multiply===
That leaves the two loops with only one multiplication operation (at 0330) within the outer loop and no multiplications within the inner loop.

<syntaxhighlight lang="nasm">
0010  ; for (i = 0; i < n; i++)
0020    {
0030      r1 = #0                        ; i = 0
0050      load r2, n
0220      fr3 = #0.0
0340      fr4 = #1.0
0055      r8  = r1 * r2                  ; set initial value for r8
0056      r40 = r8 * #8                  ; initial value for r8 * 8
0057      r30 = r2 * #8                  ; increment for r40
0058      r20 = r8 + r2                  ; copied from 0117
0058      r22 = r20 * #8                 ; initial value of r22
0040  G0000:
0060      cmp r1, r2                     ; i < n
0070      bge G0001
0080
0118      r10 = r40                      ; strength reduced expression to r40
0090      ; for (j = 0; j < n; j++)
0100        {
0120  G0002:
0147      cmp r10, r22                   ; r10 = 8*(r8 + j) < 8*(r8 + n) = r22
0150      bge G0003
0160
0170      ; A[i,j] = 0.0;
0230      fstore fr3, A[r10]
0240
0245      r10 = r10 + #8                 ; strength reduced multiply
0260      br G0002
0270        }
0280  G0003:
0290      ; A[i,i] = 1.0;
0320      r14 = r8  + r1                 ; calculate subscript i * n + i
0330      r15 = r14 * #8                 ; calculate byte address
0350      fstore fr4, A[r15]
0360
0370      ;i++
0380      r1  = r1 + #1
0385      r8  = r8 + r2                  ; strength reduce r8 = r1 * r2
0386      r40 = r40 + r30                ; strength reduce expression r8 * 8
0388      r22 = r22 + r30                ; strength reduce r22 = r20 * 8
0390      br G0000
0400    }
0410  G0001:
</syntaxhighlight>

At line 0320, r14 is the sum of r8 and r1, both of which are incremented in the outer loop: r8 by r2 (= n) and r1 by 1. Consequently, r14 increases by n+1 on each iteration, and the last loop multiply at 0330 can be strength reduced by adding (r2+1)*8 to r15 on each iteration instead.
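
The arithmetic behind this step can be checked with a small sketch. The Python below is an illustration only, not compiler output: the function names <code>diagonal_offsets_multiply</code> and <code>diagonal_offsets_reduced</code> are hypothetical, and the element size of 8 bytes is taken from the <code>#8</code> constant in the listings.

```python
ELEMENT_SIZE = 8  # bytes per element, matching the "#8" constant in the listings


def diagonal_offsets_multiply(n):
    """Byte offset of A[i,i] for each i, using a multiply per iteration
    (the form computed at lines 0320 and 0330)."""
    return [(i * n + i) * ELEMENT_SIZE for i in range(n)]


def diagonal_offsets_reduced(n):
    """The same offsets, obtained by repeatedly adding a loop-invariant
    increment instead of multiplying."""
    increment = (n + 1) * ELEMENT_SIZE  # plays the role of (r2+1)*8
    offset = 0                          # byte offset of A[0,0]
    offsets = []
    for _ in range(n):
        offsets.append(offset)
        offset += increment             # one add replaces the multiply
    return offsets
```

Both functions return the same list for any non-negative <code>n</code>. The next listing applies the same replacement to the register code.
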
<syntaxhighlight lang="nasm">
0010  ; for (i = 0; i < n; i++)
0020    {
0030      r1 = #0                        ; i = 0
0050      load r2, n
0220      fr3 = #0.0
0340      fr4 = #1.0
0055      r8  = r1 * r2                  ; set initial value for r8
0056      r40 = r8 * #8                  ; initial value for r8 * 8
0057      r30 = r2 * #8                  ; increment for r40
0058      r20 = r8 + r2                  ; copied from 0117
0058      r22 = r20 * #8                 ; initial value of r22
005A      r14 = r8 + r1                  ; copied from 0320
005B      r15 = r14 * #8                 ; initial value of r15 (0330)
005C      r49 = r2 + #1
005D      r50 = r49 * #8                 ; strength reduced increment
0040  G0000:
0060      cmp r1, r2                     ; i < n
0070      bge G0001
0080
0118      r10 = r40                      ; strength reduced expression to r40
0090      ; for (j = 0; j < n; j++)
0100        {
0120  G0002:
0147      cmp r10, r22                   ; r10 = 8*(r8 + j) < 8*(r8 + n) = r22
0150      bge G0003
0160
0170      ; A[i,j] = 0.0;
0230      fstore fr3, A[r10]
0240
0245      r10 = r10 + #8                 ; strength reduced multiply
0260      br G0002
0270        }
0280  G0003:
0290      ; A[i,i] = 1.0;
0320  ;   r14 = r8  + r1     killed      ; dead code
0330  ;   r15 = r14 * #8     killed      ; strength reduced
0350      fstore fr4, A[r15]
0360
0370      ;i++
0380      r1  = r1 + #1
0385      r8  = r8 + r2                  ; strength reduce r8 = r1 * r2
0386      r40 = r40 + r30                ; strength reduce expression r8 * 8
0388      r22 = r22 + r30                ; strength reduce r22 = r20 * 8
0389      r15 = r15 + r50                ; strength reduce r15 = r14 * 8
0390      br G0000
0400    }
0410  G0001:
</syntaxhighlight>

Further improvement is possible. Constant folding recognizes that r1 = 0 in the preamble, which simplifies several of the instructions there. Register r8 is not otherwise used in the loop, so it can be eliminated. Furthermore, r1 is now used only to control the loop, so it can be replaced by a different induction variable such as r40. Where i ranged over 0 <= i < n, register r40 ranges over 0 <= r40 < 8 * n * n.
<syntaxhighlight lang="nasm">
0010  ; for (i = 0; i < n; i++)
0020    {
0030  ;   r1 = #0                        ; i = 0, becomes dead code
0050      load r2, n
0220      fr3 = #0.0
0340      fr4 = #1.0
0055  ;   r8  = #0           killed      ; r8 no longer used
0056      r40 = #0                       ; initial value for r8 * 8
0057      r30 = r2 * #8                  ; increment for r40
0058  ;   r20 = r2           killed      ; r8 = 0, becomes dead code
0058      r22 = r2  * #8                 ; r20 = r2
005A  ;   r14 =       #0     killed      ; r8 = 0, becomes dead code
005B      r15 =       #0                 ; r14 = 0
005C      r49 = r2 + #1
005D      r50 = r49 * #8                 ; strength reduced increment
005D      r60 = r2 * r30                 ; new limit for r40
0040  G0000:
0060  ;   cmp r1, r2         killed      ; i < n; induction variable replaced
0065      cmp r40, r60                   ; i * 8 * n < 8 * n * n
0070      bge G0001
0080
0118      r10 = r40                      ; strength reduced expression to r40
0090      ; for (j = 0; j < n; j++)
0100        {
0120  G0002:
0147      cmp r10, r22                   ; r10 = 8*(r8 + j) < 8*(r8 + n) = r22
0150      bge G0003
0160
0170      ; A[i,j] = 0.0;
0230      fstore fr3, A[r10]
0240
0245      r10 = r10 + #8                 ; strength reduced multiply
0260      br G0002
0270        }
0280  G0003:
0290      ; A[i,i] = 1.0;
0350      fstore fr4, A[r15]
0360
0370      ;i++
0380  ;   r1  = r1 + #1      killed      ; dead code (r40 controls loop)
0385  ;   r8  = r8 + r2      killed      ; dead code
0386      r40 = r40 + r30                ; strength reduce expression r8 * 8
0388      r22 = r22 + r30                ; strength reduce r22 = r20 * 8
0389      r15 = r15 + r50                ; strength reduce r15 = r14 * 8
0390      br G0000
0400    }
0410  G0001:
</syntaxhighlight>
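
As a rough cross-check, the final listing can be modeled in a high-level language and compared with the original nested loop. The Python sketch below is an assumption-laden illustration rather than anything a compiler would emit: the function names <code>identity_naive</code> and <code>identity_reduced</code> are hypothetical, the array is modeled as a flat row-major list, and the integer division by 8 exists only to turn a byte offset back into a list index. The local variable names mirror the registers in the listing.

```python
def identity_naive(n):
    """Original loop nest: zero each row, then set the diagonal element."""
    a = [None] * (n * n)
    for i in range(n):
        for j in range(n):
            a[i * n + j] = 0.0
        a[i * n + i] = 1.0
    return a


def identity_reduced(n):
    """Model of the final listing: no multiplies inside either loop."""
    a = [None] * (n * n)
    r40 = 0              # byte offset of the start of row i (8*i*n)
    r30 = n * 8          # row stride in bytes, increment for r40 and r22
    r22 = n * 8          # byte offset just past the end of row i
    r15 = 0              # byte offset of A[i,i]
    r50 = (n + 1) * 8    # increment for r15
    r60 = n * r30        # loop limit 8*n*n for r40
    while r40 < r60:             # replaces the test i < n
        r10 = r40
        while r10 < r22:         # replaces the test j < n
            a[r10 // 8] = 0.0    # A[i,j] = 0.0 (index = byte offset / 8)
            r10 += 8
        a[r15 // 8] = 1.0        # A[i,i] = 1.0
        r40 += r30
        r22 += r30
        r15 += r50
    return a
```

For small values of <code>n</code> the two functions produce the same list, which is consistent with the transformations preserving the behavior of the loop.
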