==Associativity==
[[File:Cache Fill.svg|thumb|upright=1.7|An illustration of different ways in which memory locations can be cached by particular cache locations]]
{{Main|Cache placement policies}}

The [[Cache placement policies|placement policy]] decides where in the cache a copy of a particular entry of main memory will go. If the placement policy is free to choose any entry in the cache to hold the copy, the cache is called ''fully associative''. At the other extreme, if each entry in main memory can go in just one place in the cache, the cache is ''direct-mapped''. Many caches implement a compromise in which each entry in main memory can go to any one of N places in the cache; these are described as N-way set associative.<ref>{{cite web |date=2010-12-02 |title=Cache design |url=https://cseweb.ucsd.edu/classes/fa10/cse240a/pdf/08/CSE240A-MBT-L15-Cache.ppt.pdf |access-date=2023-01-29 |website=ucsd.edu |pages=10–15}}</ref> For example, the level-1 data cache in an [[AMD Athlon]] is two-way set associative, which means that any particular location in main memory can be cached in either of two locations in the level-1 data cache.

Choosing the right value of associativity involves a [[trade-off]]. If there are ten places to which the placement policy could have mapped a memory location, then to check whether that location is in the cache, ten cache entries must be searched. Checking more places takes more power and chip area, and potentially more time. On the other hand, caches with more associativity suffer fewer misses (see [[Cache performance measurement and metric#Conflict misses|conflict misses]]), so the CPU wastes less time reading from the slow main memory. The general guideline is that doubling the associativity, from direct mapped to two-way, or from two-way to four-way, has about the same effect on raising the hit rate as doubling the cache size. However, increasing the associativity beyond four improves the hit rate much less,<ref>{{Cite conference |url=https://ieeexplore.ieee.org/document/5234663 |access-date=2023-10-18 |via=ieeexplore.ieee.org | date=2009 | doi=10.1109/ICCSIT.2009.5234663 | s2cid=18236635 |language=en-US | last1=Megalingam | first1=Rajesh Kannan | last2=Deepu | first2=K.B | last3=Joseph | first3=Iype P. | last4=Vikram | first4=Vandana |conference=2009 2nd IEEE International Conference on Computer Science and Information Technology |title=Phased set associative cache design for reduced power consumption | pages=551–556 | isbn=978-1-4244-4519-6 | url-access=subscription }}</ref> and higher associativities are generally chosen for other reasons (see [[#Virtual aliasing|virtual aliasing]]). Some CPUs can dynamically reduce the associativity of their caches in low-power states as a power-saving measure.<ref>{{cite web |last1=Jahagirdar |first1=Sanjeev |last2=George |first2=Varghese |last3=Sodhi |first3=Inder |last4=Wells |first4=Ryan |year=2012 |title=Power Management of the Third Generation Intel Core Micro Architecture formerly codenamed Ivy Bridge |url=http://hotchips.org/wp-content/uploads/hc_archives/hc24/HC24-1-Microprocessor/HC24.28.117-HotChips_IvyBridge_Power_04.pdf#page=18 |url-status=dead |archive-url=https://web.archive.org/web/20200729002711/http://hotchips.org/wp-content/uploads/hc_archives/hc24/HC24-1-Microprocessor/HC24.28.117-HotChips_IvyBridge_Power_04.pdf#page=18 |archive-date=2020-07-29 |access-date=2015-12-16 |website=hotchips.org |page=18 |format=PDF}}</ref>

<!-- where does "pseudo-associative cache" go in this spectrum? -->
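To make the trade-off concrete, the following minimal C sketch shows how an N-way set-associative lookup might proceed: the address is split into an offset, a set index, and a tag, and every way of the selected set is probed. The geometry (64-byte lines, 256 sets, four ways) and all names are illustrative assumptions, not a description of any particular processor.

<syntaxhighlight lang="c">
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative geometry only: 64-byte lines, 256 sets, 4 ways. */
#define LINE_SIZE 64
#define NUM_SETS  256
#define NUM_WAYS  4

struct cache_line {
    bool     valid;
    uint64_t tag;
    uint8_t  data[LINE_SIZE];
};

static struct cache_line cache[NUM_SETS][NUM_WAYS];

/* Split the address into offset, set index, and tag, then probe
   every way of the selected set. Returns the line on a hit. */
static struct cache_line *lookup(uint64_t addr)
{
    uint64_t block = addr / LINE_SIZE;   /* discard the offset bits */
    uint64_t set   = block % NUM_SETS;   /* set index bits          */
    uint64_t tag   = block / NUM_SETS;   /* remaining high bits     */

    for (int way = 0; way < NUM_WAYS; way++)   /* N places to check */
        if (cache[set][way].valid && cache[set][way].tag == tag)
            return &cache[set][way];
    return NULL;                               /* miss */
}
</syntaxhighlight>

Real hardware performs the N tag comparisons in parallel with dedicated comparators rather than in a loop, which is why higher associativity costs more power and chip area.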
In order from worst but simplest to best but most complex:
* Direct mapped cache{{snd}}good best-case time, but unpredictable in the worst case
* Two-way set associative cache
* Two-way skewed associative cache<ref name="Seznec">{{cite journal |last=Seznec |first=André |year=1993 |title=A Case for Two-Way Skewed-Associative Caches |journal=ACM SIGARCH Computer Architecture News |volume=21 |issue=2 |pages=169–178 |doi=10.1145/173682.165152 |doi-access=free}}</ref>
* Four-way set-associative cache
* Eight-way set-associative cache, a common choice for later implementations
* 12-way set associative cache, similar to eight-way
* Fully associative cache{{snd}}the best miss rates, but practical only for a small number of entries

===Direct-mapped cache===
In this cache organization, each location in main memory can go in only one entry in the cache. Therefore, a direct-mapped cache can also be called a "one-way set associative" cache. It does not have a placement policy as such, since there is no choice of which cache entry's contents to evict. This means that if two locations map to the same entry, they may continually knock each other out. Although simpler, a direct-mapped cache needs to be much larger than an associative one to give comparable performance, and it is more unpredictable. Let {{mvar|x}} be the block number in the cache, {{mvar|y}} the block number in memory, and {{mvar|n}} the number of blocks in the cache; then the mapping is given by the equation {{math|''x'' {{=}} ''y'' mod ''n''}}.
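As a minimal worked illustration of the mapping {{math|''x'' {{=}} ''y'' mod ''n''}}, the C fragment below computes the cache block number for an address, assuming a hypothetical cache of {{math|''n'' {{=}} 128}} blocks of 64 bytes each:

<syntaxhighlight lang="c">
#include <stdint.h>

/* Direct-mapped placement x = y mod n, with an assumed cache of
   n = 128 blocks of 64 bytes each. */
#define BLOCK_SIZE 64
#define NUM_BLOCKS 128

static uint64_t cache_block_index(uint64_t addr)
{
    uint64_t y = addr / BLOCK_SIZE;   /* memory block number y */
    return y % NUM_BLOCKS;            /* cache block number x  */
}
</syntaxhighlight>

Since the index is a pure function of the address, any two memory blocks whose block numbers differ by a multiple of {{mvar|n}} always collide, which is the source of the repeated evictions described above.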
===Two-way set associative cache===
If each location in main memory can be cached in either of two locations in the cache, one logical question is: ''which one of the two?'' The simplest and most commonly used scheme, shown in the right-hand diagram above, is to use the least significant bits of the memory location's index as the index for the cache memory, and to have two entries for each index. One benefit of this scheme is that the tags stored in the cache do not have to include that part of the main memory address which is implied by the cache memory's index. Since the cache tags have fewer bits, they require fewer transistors, take less space on the processor circuit board or on the microprocessor chip, and can be read and compared faster. Also, the [[Cache algorithms|LRU]] replacement algorithm is especially simple, since only one bit needs to be stored for each pair.

===Speculative execution===
One of the advantages of a direct-mapped cache is that it allows simple and fast [[speculative execution|speculation]]. Once the address has been computed, the one cache index which might have a copy of that location in memory is known. That cache entry can be read, and the processor can continue to work with that data before it finishes checking that the tag actually matches the requested address.

The idea of having the processor use the cached data before the tag match completes can be applied to associative caches as well. A subset of the tag, called a ''hint'', can be used to pick just one of the possible cache entries mapping to the requested address. The entry selected by the hint can then be used in parallel with checking the full tag. The hint technique works best when used in the context of address translation, as explained below.

===Two-way skewed associative cache===
Other schemes have been suggested, such as the ''skewed cache'',<ref name="Seznec" /> where the index for way 0 is direct, as above, but the index for way 1 is formed with a [[hash function]]. A good hash function has the property that addresses which conflict with the direct mapping tend not to conflict when mapped with the hash function, so it is less likely that a program will suffer an unexpectedly large number of conflict misses due to a pathological access pattern. The downside is extra latency from computing the hash function.<ref name="CK">{{cite web |last=Kozyrakis |first=C. |title=Lecture 3: Advanced Caching Techniques |url=http://www.stanford.edu/class/ee282/08_handouts/L03-Cache.pdf |url-status=dead |archive-url=https://web.archive.org/web/20120907012034/http://www.stanford.edu/class/ee282/08_handouts/L03-Cache.pdf |archive-date=September 7, 2012}}</ref> Additionally, when it comes time to load a new line and evict an old line, it may be difficult to determine which existing line was least recently used, because the new line conflicts with data at different indexes in each way; [[Cache algorithms|LRU]] tracking for non-skewed caches is usually done on a per-set basis. Nevertheless, skewed-associative caches have major advantages over conventional set-associative ones.<ref>{{cite web |title=Micro-Architecture |url=https://www.irisa.fr/caps/PROJECTS/Architecture/ |quote=Skewed-associative caches have been shown to have two major advantages over conventional set-associative caches.}}</ref>

===Pseudo-associative cache===
A true set-associative cache tests all the possible ways simultaneously, using something like a [[content-addressable memory]]. A pseudo-associative cache tests each possible way one at a time. A hash-rehash cache and a column-associative cache are examples of a pseudo-associative cache. In the common case of finding a hit in the first way tested, a pseudo-associative cache is as fast as a direct-mapped cache, but it has a much lower conflict miss rate than a direct-mapped cache, closer to the miss rate of a fully associative cache.<ref name="CK" />
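A minimal sketch of the pseudo-associative probe order, reusing the illustrative geometry from above and a hypothetical rehash function (flipping the top index bit, a common textbook choice for hash-rehash designs): the primary location is checked first, and the alternative location only on a miss.

<syntaxhighlight lang="c">
#include <stdbool.h>
#include <stdint.h>

/* Same illustrative geometry as above: 64-byte lines, 256 sets. */
#define LINE_SIZE 64
#define NUM_SETS  256

struct line {
    bool     valid;
    uint64_t block;   /* full block number kept as the tag, for simplicity */
};
static struct line cache[NUM_SETS];

/* Hypothetical rehash function: flip the top index bit. */
static uint64_t rehash(uint64_t set)
{
    return set ^ (NUM_SETS >> 1);
}

/* Check the primary location first; only on a miss there is the
   alternative location probed, costing a second sequential access. */
static bool pseudo_assoc_hit(uint64_t addr)
{
    uint64_t block = addr / LINE_SIZE;
    uint64_t set   = block % NUM_SETS;

    if (cache[set].valid && cache[set].block == block)
        return true;                   /* fast hit at direct-mapped speed */
    set = rehash(set);                 /* second, slower probe            */
    return cache[set].valid && cache[set].block == block;
}
</syntaxhighlight>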
===Multicolumn cache===
Compared with a direct-mapped cache, a set-associative cache has fewer bits in its cache set index, which maps to a cache set containing multiple ways or blocks, such as 2 blocks for a 2-way set-associative cache and 4 blocks for a 4-way set-associative cache. The index bits saved in this way become part of the tag bits. For example, a 2-way set-associative cache contributes 1 bit to the tag and a 4-way set-associative cache contributes 2 bits to the tag.

The basic idea of the multicolumn cache<ref name="Two-fast">{{Cite journal |last1=Zhang |first1=Chenxi |last2=Zhang |first2=Xiaodong |last3=Yan |first3=Yong |date=September–October 1997 |title=Two fast and high-associativity cache schemes |journal=IEEE Micro |volume=17 |issue=5 |pages=40–49 |doi=10.1109/40.621212}}</ref> is to use the set index to map to a cache set as a conventional set-associative cache does, and to use the added tag bits to index a way in the set. For example, in a 4-way set-associative cache, the two bits are used to index way 00, way 01, way 10, and way 11, respectively. This double cache indexing is called a "major location mapping", and its latency is equivalent to a direct-mapped access. Extensive experiments in multicolumn cache design<ref name="Two-fast" /> show that the hit ratio to major locations is as high as 90%. If the cache mapping conflicts with a block already in the major location, the existing block is moved to another way in the same set, called a "selected location". Because the newly indexed cache block is the most recently used (MRU) block, it is placed in the major location, exploiting temporal locality. Since the multicolumn cache is designed for high associativity, the number of ways in each set is high; thus, it is easy to find a selected location in the set. A selected-location index for the major location of a cache block is maintained by additional hardware.{{Citation needed|date=May 2023}}

A multicolumn cache retains a high hit ratio due to its high associativity, and has low latency comparable to a direct-mapped cache due to the high percentage of hits in major locations. The concepts of major locations and selected locations in the multicolumn cache have been used in several cache designs, including the ARM Cortex-R,<ref>{{Cite web |date=2014 |title=ARM Cortex-R Series Programmer's Guide |url=https://developer.arm.com/documentation/den0042/latest |access-date=2023-05-04 |website=developer.arm.com}}</ref> Intel's way-predicting cache memory,<ref>{{Cite web|url=https://patents.google.com/patent/US6425055B1/en|title=Way-predicting cache memory}}</ref> IBM's reconfigurable multi-way associative cache memory,<ref>{{Cite web |date=November 22, 1994 |title=Reconfigurable multi-way associative cache memory |url=https://patents.google.com/patent/US5367653A/en |access-date=2024-01-19}}</ref> and Oracle's dynamic cache replacement way selection based on address tag bits.<ref>{{Cite web |title=US Patent Application for DYNAMIC CACHE REPLACEMENT WAY SELECTION BASED ON ADDRESS TAG BITS Patent Application (Application #20160350229 issued December 1, 2016) – Justia Patents Search |url=https://patents.justia.com/patent/20160350229 |website=patents.justia.com}}</ref>
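A minimal sketch of major location mapping under the same assumed geometry: the added tag bits of a 4-way set-associative cache select the one way to probe first. The sequential fallback loop below merely stands in for the selected-location hardware described above, which avoids searching the remaining ways one by one.

<syntaxhighlight lang="c">
#include <stdbool.h>
#include <stdint.h>

/* Assumed geometry: 64-byte lines, 256 sets, 4 ways. */
#define LINE_SIZE 64
#define NUM_SETS  256
#define NUM_WAYS  4

struct line {
    bool     valid;
    uint64_t tag;
};
static struct line cache[NUM_SETS][NUM_WAYS];

/* The two lowest tag bits (the index bits a direct-mapped cache of
   equal size would have used) select the "major location" to probe
   first, so a major-location hit costs one direct-mapped access. */
static bool multicolumn_hit(uint64_t addr)
{
    uint64_t block = addr / LINE_SIZE;
    uint64_t set   = block % NUM_SETS;
    uint64_t tag   = block / NUM_SETS;
    int      major = (int)(tag % NUM_WAYS);   /* added tag bits pick a way */

    if (cache[set][major].valid && cache[set][major].tag == tag)
        return true;                          /* major-location hit */

    /* Stand-in for the selected-location mechanism: check the
       remaining ways of the set. */
    for (int way = 0; way < NUM_WAYS; way++)
        if (way != major && cache[set][way].valid && cache[set][way].tag == tag)
            return true;
    return false;
}
</syntaxhighlight>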