===Example: the K8===
To illustrate both specialization and multi-level caching, here is the cache hierarchy of the K8 core in the AMD [[Athlon 64]] CPU.<ref>{{cite web|url=http://www.sandpile.org/impl/k8.htm |title=AMD K8 |access-date=2007-06-02 |website=Sandpile.org |url-status=dead |archive-url=https://web.archive.org/web/20070515052223/http://www.sandpile.org/impl/k8.htm |archive-date=2007-05-15 }}</ref>

[[File:Cache,hierarchy-example.svg|thumb|center|upright=1.6|Cache hierarchy of the K8 core in the AMD Athlon 64 CPU]]

The K8 has four specialized caches: an instruction cache, an instruction [[translation lookaside buffer|TLB]], a data TLB, and a data cache. Each is tailored to its role:

* The instruction cache keeps copies of 64-byte lines of memory, and fetches 16 bytes each cycle. Each byte in this cache is stored in ten bits rather than eight, with the extra bits marking the boundaries of instructions (this is an example of predecoding). The cache has only [[parity bit|parity]] protection rather than [[Error-correcting code|ECC]], because parity is smaller and any damaged data can be replaced by fresh data fetched from memory (which always has an up-to-date copy of instructions).
* The instruction TLB keeps copies of page table entries (PTEs). Each cycle's instruction fetch has its virtual address translated through this TLB into a physical address. Each PTE occupies either four or eight bytes in memory. Because the K8 supports more than one page size, each of the TLBs is split into two sections, one to keep PTEs that map 4&nbsp;KiB pages, and one to keep PTEs that map 4&nbsp;MiB or 2&nbsp;MiB pages. The split allows the fully associative match circuitry in each section to be simpler. The operating system maps different sections of the virtual address space with PTEs of different page sizes.
* The data TLB has two copies which keep identical entries. The two copies allow two data accesses per cycle to translate virtual addresses to physical addresses. Like the instruction TLB, this TLB is split into two kinds of entries.
* The data cache keeps copies of 64-byte lines of memory. It is split into 8 banks (each storing 8&nbsp;KiB of data), and can fetch two 8-byte values each cycle as long as they are in different banks. There are two copies of the tags, because each 64-byte line is spread among all eight banks. Each tag copy handles one of the two accesses per cycle.
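The banking rule for the data cache can be illustrated with a small Python model. It assumes that each 64-byte line is interleaved across the eight banks in 8-byte chunks, so that address bits 3 to 5 select the bank; this mapping is inferred from the description above rather than taken from AMD documentation, and the function names <code>bank_of</code> and <code>can_dual_issue</code> are hypothetical. Only aligned 8-byte accesses are considered.

```python
# Toy model of the K8 L1 data cache banking described above.
# Assumption (not from AMD documentation): each 64-byte line is
# interleaved across the 8 banks in 8-byte chunks, so address
# bits 3-5 select the bank.
LINE_SIZE = 64                       # bytes per cache line
NUM_BANKS = 8                        # L1 data cache banks
BANK_WIDTH = LINE_SIZE // NUM_BANKS  # 8 bytes of each line live in each bank
BANK_CAPACITY = 8 * 1024             # 8 KiB of data per bank


def bank_of(addr: int) -> int:
    """Return the bank holding the 8-byte chunk that contains addr."""
    return (addr // BANK_WIDTH) % NUM_BANKS


def can_dual_issue(addr_a: int, addr_b: int) -> bool:
    """True if two aligned 8-byte accesses fall in different banks."""
    return bank_of(addr_a) != bank_of(addr_b)


if __name__ == "__main__":
    print(NUM_BANKS * BANK_CAPACITY // 1024, "KiB total")  # 64 KiB total
    print(can_dual_issue(0x1000, 0x1008))  # True: banks 0 and 1
    print(can_dual_issue(0x1000, 0x1040))  # False: both map to bank 0
```

In this model, two accesses exactly 64 bytes apart always collide, because they occupy the same position in different lines and therefore the same bank.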
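The split TLB organization can be sketched in Python with one fully associative table per page size. Here each section is a dictionary, so any virtual page number can occupy any entry, mirroring full associativity. The class name <code>SplitTLB</code> is hypothetical, entry counts and replacement are not modeled, and only 2&nbsp;MiB large pages are shown (4&nbsp;MiB pages would use a 22-bit offset instead of 21). Real hardware probes both sections in parallel; checking them one after another gives the same result as long as no address is mapped by both page sizes at once.

```python
# Sketch of a TLB split by page size, as described for the K8.
# Assumptions: 4 KiB and 2 MiB pages only; no capacity limits.
SMALL_SHIFT = 12  # 4 KiB pages: 12-bit page offset
LARGE_SHIFT = 21  # 2 MiB pages: 21-bit page offset (4 MiB would be 22)


class SplitTLB:
    def __init__(self) -> None:
        # Each section maps virtual page number -> physical page number.
        self.small = {}  # PTEs for 4 KiB pages
        self.large = {}  # PTEs for 2 MiB pages

    def insert(self, va: int, pa: int, large: bool) -> None:
        """Cache a translation in the section for its page size."""
        shift = LARGE_SHIFT if large else SMALL_SHIFT
        table = self.large if large else self.small
        table[va >> shift] = pa >> shift

    def translate(self, va: int):
        """Return the physical address for va, or None on a TLB miss."""
        vpn = va >> SMALL_SHIFT
        if vpn in self.small:
            offset = va & ((1 << SMALL_SHIFT) - 1)
            return (self.small[vpn] << SMALL_SHIFT) | offset
        vpn = va >> LARGE_SHIFT
        if vpn in self.large:
            offset = va & ((1 << LARGE_SHIFT) - 1)
            return (self.large[vpn] << LARGE_SHIFT) | offset
        return None  # miss: a page table walk would be needed


if __name__ == "__main__":
    tlb = SplitTLB()
    tlb.insert(0x1000, 0x5000, large=False)
    tlb.insert(0x40000000, 0x80000000, large=True)
    print(hex(tlb.translate(0x1ABC)))      # 0x5abc
    print(hex(tlb.translate(0x40123456)))  # 0x80123456
    print(tlb.translate(0x2000))           # None
```

Because each section only ever compares tags of one fixed width, its match logic does not need to handle a per-entry page size, which is the simplification the split buys.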

The K8 also has multiple-level caches. There are second-level instruction and data TLBs, which store only PTEs mapping 4&nbsp;KiB pages. Both instruction and data caches, and the various TLBs, can fill from the large '''unified''' L2 cache. This cache is exclusive to both the L1 instruction and data caches, which means that any 64-byte line can be in only one of the L1 instruction cache, the L1 data cache, or the L2 cache. It is, however, possible for a line in the data cache to hold a PTE that is also in one of the TLBs; the operating system is responsible for keeping the TLBs coherent by flushing portions of them when the page tables in memory are updated.
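The exclusivity property can be illustrated with a toy Python model in which a line lives in at most one of the L1 instruction cache, the L1 data cache, or the L2 cache. The class <code>ExclusiveHierarchy</code> and its policies are hypothetical: misses to memory fill the L1 directly, L1 victims move down into the L2 (the usual arrangement for an exclusive L2), and a line needed by the other L1 is simply migrated. None of these details are specified for the K8 in the text, and capacities are counted in lines with least-recently-used replacement.

```python
from collections import OrderedDict

LINE_SIZE = 64  # bytes per cache line, as on the K8


class ExclusiveHierarchy:
    """Toy model: each line is in at most one of L1I, L1D, or L2."""

    def __init__(self, l1_lines: int, l2_lines: int) -> None:
        # OrderedDicts act as LRU lists: oldest entry first.
        self.l1 = {"I": OrderedDict(), "D": OrderedDict()}
        self.l2 = OrderedDict()
        self.l1_lines = l1_lines
        self.l2_lines = l2_lines

    def access(self, kind: str, addr: int) -> str:
        """Access addr from the 'I' or 'D' side; report where it was found."""
        line = addr // LINE_SIZE
        l1 = self.l1[kind]
        if line in l1:
            l1.move_to_end(line)  # refresh LRU position
            return "L1 hit"
        other = self.l1["D" if kind == "I" else "I"]
        if line in self.l2:
            del self.l2[line]  # exclusive: the line leaves L2 as it moves up
            source = "L2 hit"
        elif line in other:
            del other[line]  # hypothetical policy: migrate between L1s
            source = "other L1"
        else:
            source = "memory"  # hypothetical: memory fills go straight to L1
        self._fill_l1(l1, line)
        return source

    def _fill_l1(self, l1: OrderedDict, line: int) -> None:
        l1[line] = None
        if len(l1) > self.l1_lines:
            victim, _ = l1.popitem(last=False)  # evict least recently used
            self.l2[victim] = None  # the victim moves down into L2
            if len(self.l2) > self.l2_lines:
                self.l2.popitem(last=False)  # oldest L2 line drops out

    def locations(self, addr: int) -> list:
        """List every cache level currently holding addr's line."""
        line = addr // LINE_SIZE
        levels = (("L1I", self.l1["I"]), ("L1D", self.l1["D"]), ("L2", self.l2))
        return [name for name, cache in levels if line in cache]
```

Whatever sequence of accesses is applied, <code>locations</code> never returns more than one level, which is the invariant the exclusive design guarantees.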

The K8 also caches information that is never stored in memory: prediction information. These caches are not shown in the above diagram. As is usual for this class of CPU, the K8 has fairly complex [[branch prediction]], with tables that help predict whether branches are taken and other tables that predict the targets of branches and jumps. Some of this information is associated with instructions, in both the level 1 instruction cache and the unified secondary cache.

The K8 uses a trick to store prediction information with instructions in the secondary cache. Lines in the secondary cache are protected from accidental data corruption (e.g. by an [[alpha particle]] strike) by either [[Error-correcting code|ECC]] or [[parity (telecommunication)|parity]], depending on whether those lines were evicted from the data or the instruction primary cache, respectively. Since parity takes fewer bits than ECC, lines from the instruction cache have a few spare bits. These bits are used to cache branch prediction information associated with those instructions. The net result is that the branch predictor has a larger effective history table, and so has better accuracy.
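The size of the saving can be estimated with a short Python calculation. It uses the standard Hamming SEC-DED bound (the smallest ''r'' with 2<sup>''r''</sup> ≥ ''k'' + ''r'' + 1 check bits for ''k'' data bits, plus one overall parity bit for double-error detection). The protection granularity is an assumption: the text does not state the K8's word size for ECC or parity, so this sketch assumes 64-bit words with one parity bit each. The helper names are hypothetical.

```python
def secded_check_bits(data_bits: int) -> int:
    """Check bits for a Hamming SEC-DED code over data_bits of data."""
    r = 0
    # Smallest r with 2**r >= data_bits + r + 1 (single-error correction).
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r + 1  # plus one overall parity bit (double-error detection)


PARITY_BITS = 1  # assumed: one parity bit per protected word


def spare_bits_per_line(line_bytes: int, word_bits: int) -> int:
    """Bits freed when a line is stored with parity instead of SEC-DED ECC."""
    words = line_bytes * 8 // word_bits
    return words * (secded_check_bits(word_bits) - PARITY_BITS)


if __name__ == "__main__":
    print(secded_check_bits(64))        # 8 check bits per 64-bit word
    print(spare_bits_per_line(64, 64))  # 56 spare bits per 64-byte line
```

Under these assumptions a 64-byte instruction line frees 56 bits, room that can be reused for branch prediction information without enlarging the L2 storage array.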