Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
Locality of reference
(section)
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
=== Matrix multiplication === A common example is [[Matrix multiplication algorithm|matrix multiplication]]: <syntaxhighlight lang="pascal" line="1"> for i in 0..n for j in 0..m for k in 0..p C[i][j] = C[i][j] + A[i][k] * B[k][j]; </syntaxhighlight> By switching the looping order for <code>j</code> and <code>k</code>, the speedup in large matrix multiplications becomes dramatic, at least for languages that put contiguous array elements in the last dimension. This will not change the mathematical result, but it improves efficiency. In this case, "large" means, approximately, more than 100,000 elements in each matrix, or enough addressable memory such that the matrices will not fit in L1 and L2 caches. <syntaxhighlight lang="pascal" line="1"> for i in 0..n for k in 0..p for j in 0..m C[i][j] = C[i][j] + A[i][k] * B[k][j]; </syntaxhighlight> The reason for this speedup is that in the first case, the reads of <code>A[i][k]</code> are in cache (since the <code>k</code> index is the contiguous, last dimension), but <code>B[k][j]</code> is not, so there is a cache miss penalty on <code>B[k][j]</code>. <code>C[i][j]</code> is irrelevant, because it can be [[Loop-invariant_code_motion|hoisted]] out of the inner loop -- the loop variable there is <code>k</code>. <syntaxhighlight lang="pascal" line="1"> for i in 0..n for j in 0..m temp = C[i][j] for k in 0..p temp = temp + A[i][k] * B[k][j]; C[i][j] = temp </syntaxhighlight> In the second case, the reads and writes of <code>C[i][j]</code> are both in cache, the reads of <code>B[k][j]</code> are in cache, and the read of <code>A[i][k]</code> can be hoisted out of the inner loop. <syntaxhighlight lang="pascal" line="1"> for i in 0..n for k in 0..p temp = A[i][k] for j in 0..m C[i][j] = C[i][j] + temp * B[k][j]; </syntaxhighlight> Thus, the second example has no cache miss penalty in the inner loop while the first example has a cache penalty. On a year 2014 processor, the second case is approximately five times faster than the first case, when written in [[C (programming language)|C]] and compiled with <code>gcc -O3</code>. (A careful examination of the disassembled code shows that in the first case, [[GNU Compiler Collection|GCC]] uses [[SIMD]] instructions and in the second case it does not, but the cache penalty is much worse than the SIMD gain.){{Citation needed|date=September 2014}} Temporal locality can also be improved in the above example by using a technique called [[Loop blocking|blocking]]. The larger matrix can be divided into evenly sized sub-matrices, so that the smaller blocks can be referenced (multiplied) several times while in memory. Note that this example works for square matrices of dimensions SIZE x SIZE, but it can easily be extended for arbitrary matrices by substituting SIZE_I, SIZE_J and SIZE_K where appropriate. <syntaxhighlight lang="pascal" line="1"> for (ii = 0; ii < SIZE; ii += BLOCK_SIZE) for (kk = 0; kk < SIZE; kk += BLOCK_SIZE) for (jj = 0; jj < SIZE; jj += BLOCK_SIZE) maxi = min(ii + BLOCK_SIZE, SIZE); for (i = ii; i < maxi; i++) maxk = min(kk + BLOCK_SIZE, SIZE); for (k = kk; k < maxk; k++) maxj = min(jj + BLOCK_SIZE, SIZE); for (j = jj; j < maxj; j++) C[i][j] = C[i][j] + A[i][k] * B[k][j]; </syntaxhighlight> The temporal locality of the above solution is provided because a block can be used several times before moving on, so that it is moved in and out of memory less often. Spatial locality is improved because elements with consecutive memory addresses tend to be pulled up the memory hierarchy together.
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)