Memory hierarchy



[[File:ComputerMemoryHierarchy.svg|thumb|Diagram of the computer memory hierarchy]]


In computer architecture, the memory hierarchy separates computer storage into a hierarchy based on response time. Since response time, complexity, and capacity are related, the levels may also be distinguished by their performance and controlling technologies.<ref name="toyzee" /> Memory hierarchy affects performance in computer architectural design, algorithm predictions, and lower level programming constructs involving locality of reference.

Designing for high performance requires considering the restrictions of the memory hierarchy, i.e. the size and capabilities of each component. Each of the various components can be viewed as part of a hierarchy of memories (m1, m2, ..., mn) in which each member mi is typically smaller and faster than the next highest member mi+1 of the hierarchy. To limit waiting by higher levels, a lower level will respond by filling a buffer and then signaling to activate the transfer.
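
As a minimal C sketch of why locality of reference matters (the matrix size and function names below are illustrative assumptions, not material from this article), the two functions compute the same sum but traverse memory in different orders, so one is served largely by the fast upper levels of the hierarchy while the other repeatedly reaches further down:

<syntaxhighlight lang="c">
/* Illustrative sketch: the row-major loop walks memory sequentially, so most
 * accesses hit cache lines already fetched into the faster levels, while the
 * column-major loop strides across cache lines and reaches further down the
 * hierarchy far more often.  N is an arbitrary illustrative size. */
#include <stdio.h>
#include <stdlib.h>

#define N 4096

static long sum_row_major(const int *a)      /* elements visited in storage order */
{
    long s = 0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            s += a[i * N + j];
    return s;
}

static long sum_column_major(const int *a)   /* same elements, strided access */
{
    long s = 0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            s += a[i * N + j];
    return s;
}

int main(void)
{
    int *a = calloc((size_t)N * N, sizeof *a);
    if (!a) return 1;
    /* Both calls compute the same result; only the memory access pattern, and
     * therefore the interaction with the memory hierarchy, differs. */
    printf("%ld %ld\n", sum_row_major(a), sum_column_major(a));
    free(a);
    return 0;
}
</syntaxhighlight>

Compiled with optimization, the two functions typically differ in run time only because of how far down the hierarchy their accesses must travel.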

There are four major storage levels:<ref name="toyzee">Template:Cite book</ref>

  • Internal: processor registers and cache
  • Main: the system RAM and controller cards
  • On-line mass storage: secondary storage
  • Off-line bulk storage: tertiary and off-line storage

This is a general memory hierarchy structuring. Many other structures are useful. For example, a paging algorithm may be considered as a level for virtual memory when designing a computer architecture, and one can include a level of nearline storage between online and offline storage.

Properties of the technologies in the memory hierarchy

  • Adding complexity slows the memory hierarchy.<ref>Write-combining</ref>
  • CMOx memory technology stretches the flash space in the memory hierarchy.<ref>{{#invoke:citation/CS1|citation |CitationClass=web }}</ref>

  • One of the main ways to increase system performance is minimising how far down the memory hierarchy one has to go to manipulate data (see the sketch after this list).<ref>{{#invoke:citation/CS1|citation |CitationClass=web }}</ref>

  • Latency and bandwidth are two metrics associated with caches. Neither is uniform; each is specific to a particular component of the memory hierarchy.<ref name=sun>Template:Cite journal</ref>
  • Predicting where in the memory hierarchy the data resides is difficult.<ref name=sun />
  • The location in the memory hierarchy dictates the time required for the prefetch to occur.<ref name=sun />
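
As a concrete illustration of minimising how far down the hierarchy repeated accesses must go, the following C sketch applies loop tiling (cache blocking) to a matrix multiplication so that each tile is reused while it is still resident in a fast cache level; the matrix size, block size, and function name are illustrative assumptions, not figures from this article:

<syntaxhighlight lang="c">
/* Hedged sketch of loop tiling ("cache blocking"): the matrices are processed
 * in BLOCK x BLOCK tiles chosen small enough to stay resident in a fast cache
 * level while they are reused.  N and BLOCK are illustrative values; N is
 * assumed to be a multiple of BLOCK. */
#include <stddef.h>

#define N     1024
#define BLOCK 64   /* tune so the tiles being reused fit in the target cache level */

/* c += a * b, all N x N matrices stored in row-major order */
void matmul_tiled(const double *a, const double *b, double *c)
{
    for (size_t ii = 0; ii < N; ii += BLOCK)
        for (size_t kk = 0; kk < N; kk += BLOCK)
            for (size_t jj = 0; jj < N; jj += BLOCK)
                /* work on one tile; its operands are reused while still cached */
                for (size_t i = ii; i < ii + BLOCK; i++)
                    for (size_t k = kk; k < kk + BLOCK; k++) {
                        double aik = a[i * N + k];
                        for (size_t j = jj; j < jj + BLOCK; j++)
                            c[i * N + j] += aik * b[k * N + j];
                    }
}
</syntaxhighlight>

The key design decision is choosing BLOCK so that the working set of one tile fits in the cache level being targeted; too large a tile spills to the next, slower level and the benefit disappears.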

Examples

[[File:Hwloc.png|thumb|Memory hierarchy of an AMD Bulldozer server]]

The number of levels in the memory hierarchy and the performance at each level have increased over time. The types of memory and storage components have also changed historically.<ref>{{#invoke:citation/CS1|citation |CitationClass=web }}</ref> For example, the memory hierarchy of an Intel Haswell Mobile<ref>{{#invoke:citation/CS1|citation |CitationClass=web }}</ref> processor circa 2013 spans the following levels:

  • Processor registers
  • Processor caches (L1 instruction and data caches, a unified L2 cache, a shared L3 cache, and on some models an L4 eDRAM cache)
  • Main memory (primary storage)
  • Disk storage (secondary storage)
  • Nearline and offline storage (tertiary storage)

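How many cache levels a particular machine exposes, and how large each is, can be inspected at run time. As a minimal sketch, assuming Linux with glibc (the _SC_LEVEL*_CACHE_SIZE names are glibc extensions to sysconf() and are not guaranteed on other systems), the following C program prints the cache sizes the system reports:

<syntaxhighlight lang="c">
/* Illustrative sketch: query the sizes of the reported cache levels via
 * glibc's sysconf() extensions.  Where the system does not report a level,
 * sysconf() may return 0 or -1, so the values are advisory only. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    struct { const char *name; int key; } levels[] = {
        { "L1 data cache",        _SC_LEVEL1_DCACHE_SIZE },
        { "L1 instruction cache", _SC_LEVEL1_ICACHE_SIZE },
        { "L2 cache",             _SC_LEVEL2_CACHE_SIZE  },
        { "L3 cache",             _SC_LEVEL3_CACHE_SIZE  },
        { "L4 cache",             _SC_LEVEL4_CACHE_SIZE  },
    };
    for (size_t i = 0; i < sizeof levels / sizeof levels[0]; i++) {
        long bytes = sysconf(levels[i].key);
        if (bytes > 0)
            printf("%-21s %ld bytes\n", levels[i].name, bytes);
    }
    printf("page size             %ld bytes\n", sysconf(_SC_PAGESIZE));
    return 0;
}
</syntaxhighlight>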

The lower levels of the hierarchy, from mass storage downwards, are also known as tiered storage. The formal distinction between online, nearline, and offline storage is:<ref name="pearson2010">{{#invoke:citation/CS1|citation |CitationClass=web }}</ref>

  • Online storage is immediately available for I/O.
  • Nearline storage is not immediately available, but can be made online quickly without human intervention.
  • Offline storage is not immediately available, and requires some human intervention to bring online.

For example, always-on spinning disks are online, while spinning disks that spin down, such as massive arrays of idle disk (MAID), are nearline. Removable media such as tape cartridges that can be automatically loaded, as in a tape library, are nearline, while cartridges that must be manually loaded are offline.

Most modern CPUs are so fast that, for most program workloads, the bottleneck is the locality of reference of memory accesses and the efficiency of the caching and memory transfer between different levels of the hierarchy. As a result, the CPU spends much of its time idling, waiting for memory I/O to complete. This is sometimes called the space cost, as a larger memory object is more likely to overflow a small and fast level and require use of a larger, slower level. The resulting load on memory use is known as pressure (respectively register pressure, cache pressure, and (main) memory pressure). Terms for data being missing from a higher level and needing to be fetched from a lower level are, respectively: register spilling (due to register pressure: register to cache), cache miss (cache to main memory), and (hard) page fault (real main memory to virtual memory, i.e. mass storage, commonly referred to as disk regardless of the actual mass storage technology used).
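
The fault terminology above is observable from ordinary programs. As a hedged POSIX sketch (the allocation size and workload are illustrative assumptions), getrusage() reports how many minor page faults, satisfied from memory, and major (hard) page faults, requiring mass-storage I/O, the process has taken:

<syntaxhighlight lang="c">
/* Illustrative sketch: touch a large allocation, then ask the kernel how the
 * resulting faults were serviced.  ru_minflt counts minor faults resolved
 * without I/O; ru_majflt counts major ("hard") faults that required reading
 * from mass storage. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>

int main(void)
{
    size_t bytes = (size_t)256 * 1024 * 1024;   /* 256 MiB, illustrative */
    char *p = malloc(bytes);
    if (!p) return 1;
    memset(p, 1, bytes);                        /* force the pages to be touched */

    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) == 0)
        printf("minor faults: %ld, major (hard) faults: %ld\n",
               ru.ru_minflt, ru.ru_majflt);
    free(p);
    return 0;
}
</syntaxhighlight>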

Modern programming languages mainly assume two levels of memory, main (working) memory and mass storage, though in assembly language and inline assemblers in languages such as C, registers can be directly accessed. Taking optimal advantage of the memory hierarchy requires the cooperation of programmers, hardware, and compilers (as well as underlying support from the operating system):

  • Programmers are responsible for moving data between disk and memory through file I/O.
  • Hardware is responsible for moving data between memory and caches.
  • Optimizing compilers are responsible for generating code that, when executed, will cause the hardware to use caches and registers efficiently.
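
A minimal C sketch of this division of labour, assuming a hypothetical input file "data.bin" containing raw doubles (an assumption made only for illustration): the programmer moves data from disk into memory with explicit file I/O, the hardware moves it between memory and the caches as the loop runs, and an optimizing compiler keeps the running sum in a register.

<syntaxhighlight lang="c">
/* Illustrative sketch of the three responsibilities listed above.
 * "data.bin" is a hypothetical file of raw doubles, not from this article. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    FILE *f = fopen("data.bin", "rb");          /* disk -> memory: the programmer's job */
    if (!f) return 1;

    enum { COUNT = 1 << 20 };
    double *buf = malloc(COUNT * sizeof *buf);
    if (!buf) { fclose(f); return 1; }
    size_t n = fread(buf, sizeof *buf, COUNT, f);
    fclose(f);

    double sum = 0.0;                           /* kept in a register by the compiler */
    for (size_t i = 0; i < n; i++)
        sum += buf[i];                          /* memory <-> cache handled by the hardware */

    printf("%f\n", sum);
    free(buf);
    return 0;
}
</syntaxhighlight>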

Many programmers assume one level of memory. This works well until the application hits a performance wall, at which point the memory hierarchy is assessed during code refactoring.

See also

References

Template:Reflist