==Paged virtual memory==
{{See also|Memory paging}}
{{More citations needed section|date=December 2010}}
Nearly all current implementations of virtual memory divide a [[virtual address space]] into [[Page (computer memory)|page]]s, blocks of contiguous virtual memory addresses. Pages on contemporary{{efn|IBM [[DOS/360 and successors#DOS/VS|DOS/VS]] and [[OS/VS1]] only supported 2&nbsp;KB pages.}} systems are usually at least 4 [[kilobyte]]s in size; systems with large virtual address ranges or amounts of real memory generally use larger page sizes.<ref>{{cite book|last1=Quintero|first1=Dino |display-authors=et al.|title=IBM Power Systems Performance Guide: Implementing and Optimizing|date=1 May 2013|publisher=IBM Corporation|isbn=978-0738437668|page=138|url=https://books.google.com/books?id=lHTJAgAAQBAJ&pg=PA138|access-date=18 July 2017}}</ref>

===Page tables===
[[Page table]]s are used to translate the virtual addresses seen by the application into [[physical address]]es used by the [[Computer hardware|hardware]] to process instructions;<ref>{{cite book|last1=Sharma|first1=Dp|title=Foundation of Operating Systems|date=2009|publisher=Excel Books India|isbn=978-81-7446-626-6|page=62|url=https://books.google.com/books?id=AjWh-o7eICMC&pg=PA62|access-date=18 July 2017}}</ref> the hardware that performs this translation is often known as the [[memory management unit]]. Each entry in the page table holds a flag indicating whether the corresponding page is in real memory. If it is, the page table entry contains the real memory address at which the page is stored. When the hardware references a page whose page table entry indicates that it is not currently in real memory, the hardware raises a [[page fault]] [[trap (computing)|exception]], invoking the paging supervisor component of the [[operating system]].
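
The lookup described above can be illustrated with a minimal sketch. It assumes a single-level page table held in a dictionary and 4 KB pages; the names <code>PageTableEntry</code>, <code>PageFault</code> and <code>translate</code> are hypothetical and do not correspond to any particular hardware or operating system.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed 4 KB pages


class PageFault(Exception):
    """Raised when the referenced page is not in real memory."""


@dataclass
class PageTableEntry:
    present: bool = False  # flag: is the page in real memory?
    frame: int = 0         # page frame number, meaningful only if present


def translate(page_table, virtual_address):
    """Return the real address for virtual_address, or raise PageFault."""
    page_number, offset = divmod(virtual_address, PAGE_SIZE)
    entry = page_table.get(page_number)
    if entry is None or not entry.present:
        raise PageFault(page_number)
    return entry.frame * PAGE_SIZE + offset


# Page 0 is resident in frame 7; page 1 is known but not in real memory.
page_table = {0: PageTableEntry(present=True, frame=7), 1: PageTableEntry()}
real = translate(page_table, 0x0123)  # page 0, offset 0x123, frame 7
```

In this model a reference to address <code>0x1000</code> (page 1) raises <code>PageFault</code>, which stands in for the hardware exception that invokes the paging supervisor.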

Systems can have, e.g., one page table for the whole system, separate page tables for each address space or process, separate page tables for each segment; similarly, systems can have, e.g., no segment table, one segment table for the whole system, separate segment tables for each address space or process, separate segment tables for each ''region'' in a tree{{efn|On [[IBM Z]]<ref>{{cite book
 |       title = z/Architecture - Principles of Operation
 |          id = SA22-7832-13
 |     edition = Fourteenth
 |        date = May 2022
 |     section = Translation Tables
 | section-url = http://publibfp.dhe.ibm.com/epubs/pdf/a227832d.pdf#page=152
 |       pages = 3-46-3-53
 |         url = http://publibfp.dhe.ibm.com/epubs/pdf/a227832d.pdf
 |   publisher = [[IBM]]
 | access-date = January 18, 2023
 }}
</ref> there is a 3-level tree of regions for each address space.}} of region tables for each address space or process. If there is only one page table, different applications [[multiprogramming|running at the same time]] use different parts of a single range of virtual addresses. If there are multiple page or segment tables, there are multiple virtual address spaces and concurrent applications with separate page tables redirect to different real addresses.
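
As a rough illustration of the multiple-table case, the following sketch gives two processes their own page tables, so that the same virtual address resolves to different real addresses. The process names, frame numbers and the 4 KB page size are assumptions chosen for the example.

```python
PAGE_SIZE = 4096  # assumed page size

# Hypothetical per-process page tables: virtual page number -> page frame number.
page_tables = {
    "editor": {0: 12, 1: 13},
    "compiler": {0: 40, 1: 41},
}


def real_address(process, virtual_address):
    """Translate using the page table that belongs to the given process."""
    page, offset = divmod(virtual_address, PAGE_SIZE)
    return page_tables[process][page] * PAGE_SIZE + offset


# Both processes use virtual address 0x1010 but reach different real memory.
a = real_address("editor", 0x1010)
b = real_address("compiler", 0x1010)
```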

Some earlier systems with smaller real memory sizes, such as the [[SDS 940]], used ''[[Page address register|page registers]]'' instead of page tables in memory for address translation.

===Paging supervisor===
This part of the operating system creates and manages page tables and lists of free page frames. In order to ensure that there will be enough free page frames to quickly resolve page faults, the system may periodically steal allocated page frames, using a [[page replacement algorithm]], e.g., a [[least recently used]] (LRU) algorithm. Stolen page frames that have been modified are written back to auxiliary storage before they are added to the free queue. On some systems the paging supervisor is also responsible for managing translation registers that are not automatically loaded from page tables.
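
Page stealing with an LRU policy can be sketched as follows. This is a toy model under stated assumptions, not a description of any real paging supervisor: the <code>FramePool</code> class, its <code>low_water</code> threshold and the <code>written_back</code> list (a stand-in for writes to auxiliary storage) are all hypothetical.

```python
from collections import OrderedDict


class FramePool:
    """Toy model: resident pages in LRU order plus a free-frame queue."""

    def __init__(self, frames, low_water):
        self.free = list(range(frames))  # free page frames
        self.resident = OrderedDict()    # page -> (frame, dirty), oldest first
        self.low_water = low_water       # minimum number of free frames to keep
        self.written_back = []           # pages written to auxiliary storage

    def load(self, page):
        """Place a page into a free frame."""
        self.resident[page] = (self.free.pop(), False)

    def touch(self, page, write=False):
        """Record a reference to a resident page, making it most recently used."""
        frame, dirty = self.resident.pop(page)
        self.resident[page] = (frame, dirty or write)

    def steal(self):
        """Steal least recently used frames until the free queue is replenished."""
        while len(self.free) < self.low_water and self.resident:
            page, (frame, dirty) = self.resident.popitem(last=False)
            if dirty:
                self.written_back.append(page)  # modified pages are saved first
            self.free.append(frame)


pool = FramePool(frames=3, low_water=2)
for p in ("A", "B", "C"):
    pool.load(p)
pool.touch("A", write=True)  # A becomes most recently used and modified
pool.touch("B", write=True)  # LRU order is now C, A, B
pool.steal()                 # steals C (unmodified), then A (written back)
```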

Typically, a page fault that cannot be resolved results in an abnormal termination of the application. However, some systems allow the application to have exception handlers for such errors. The paging supervisor may handle a page fault exception in several different ways, depending on the details:
*If the virtual address is invalid, the paging supervisor treats it as an error.
*If the page is valid and the page information is not loaded into the MMU, the page information will be stored into one of the page registers.
*If the page is uninitialized, a new page frame may be assigned and cleared.
*If there is a stolen page frame containing the desired page, that page frame will be reused.
*For a fault due to a write attempt into a write-protected page, if it is a copy-on-write page then a free page frame will be assigned and the contents of the old page copied; otherwise it is treated as an error.
*If the virtual address is a valid page in a memory-mapped file or a paging file, a free page frame will be assigned and the page read in.
In most cases, there will be an update to the page table, possibly followed by purging the translation lookaside buffer (TLB), and the system restarts the instruction that caused the exception.
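
The cases listed above can be summarized as a decision procedure. The sketch below is illustrative only: the state flags (<code>valid</code>, <code>read_only</code>, <code>cow</code>, <code>resident</code>, <code>in_stolen_frame</code>, <code>initialized</code>) and the order in which they are tested are assumptions, and real systems differ in both.

```python
from enum import Enum, auto


class Action(Enum):
    ERROR = auto()
    LOAD_TRANSLATION = auto()
    ZERO_FILL = auto()
    RECLAIM = auto()
    COPY_ON_WRITE = auto()
    PAGE_IN = auto()


def classify_fault(page, is_write):
    """Pick a resolution for a fault on `page`, a dict of hypothetical flags."""
    if page is None or not page.get("valid", False):
        return Action.ERROR  # invalid virtual address
    if is_write and page.get("read_only", False):
        # A write to a protected page is legal only for copy-on-write pages.
        return Action.COPY_ON_WRITE if page.get("cow", False) else Action.ERROR
    if page.get("resident", False):
        return Action.LOAD_TRANSLATION  # in memory, but not loaded into the MMU
    if page.get("in_stolen_frame", False):
        return Action.RECLAIM  # the stolen frame still holds the page
    if not page.get("initialized", False):
        return Action.ZERO_FILL  # assign and clear a new page frame
    return Action.PAGE_IN  # read from a paging file or memory-mapped file
```

For example, <code>classify_fault({"valid": True, "read_only": True, "cow": True}, is_write=True)</code> selects the copy-on-write path, while the same write to a page without the <code>cow</code> flag is treated as an error.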

If the free page frame queue is empty then the paging supervisor must free a page frame using the same [[page replacement algorithm]] that is used for page stealing.

===Pinned pages===
Operating systems have memory areas that are ''pinned'' (never swapped to secondary storage). Other terms used are ''locked'', ''fixed'', or ''wired'' pages.  For example, [[interrupt]] mechanisms rely on an array of pointers to their handlers, such as [[I/O]] completion and [[page fault]]. If the pages containing these pointers or the code that they invoke were pageable, interrupt-handling would become far more complex and time-consuming, particularly in the case of page fault interruptions. Hence, some part of the page table structures is not pageable.

Some pages may be pinned for short periods of time, others may be pinned for long periods of time, and still others may need to be permanently pinned. For example:
* The paging supervisor code and drivers for secondary storage devices on which pages reside must be permanently pinned, as otherwise paging would not even work because the necessary code would not be available.
* Timing-dependent components may be pinned to avoid variable paging delays.
* [[Data buffer]]s that are accessed directly by peripheral devices that use [[direct memory access]] or [[I/O channel]]s must reside in pinned pages while the I/O operation is in progress because such devices and the [[Bus (computing)|buses]] to which they are attached expect to find data buffers located at physical memory addresses; regardless of whether the bus has a [[IOMMU|memory management unit for I/O]], transfers cannot be stopped if a page fault occurs and then restarted when the page fault has been processed. For example, the data could come from a measurement sensor unit, and real-time data lost because of a page fault cannot be recovered.

In IBM's operating systems for [[System/370]] and successor systems, the term is "fixed", and such pages may be long-term fixed, short-term fixed, or unfixed (i.e., pageable). System control structures are often long-term fixed (for periods measured in seconds or longer) whereas I/O buffers are usually short-term fixed (for much shorter periods, possibly tens of milliseconds). Indeed, the OS has a special facility for "fast fixing" these short-term fixed data buffers (fixing which is performed without resorting to a time-consuming [[Supervisor Call instruction]]).

[[Multics]] used the term "wired". [[OpenVMS]] and [[Microsoft Windows|Windows]] refer to pages temporarily made nonpageable (as for I/O buffers) as "locked", and use "nonpageable" for those that are never pageable. The [[Single UNIX Specification]] also uses the term "locked" in the specification for {{code|lang=c|mlock()}}, as do the {{code|lang=c|mlock()}} [[man pages]] on many [[Unix-like]] systems.
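
A process can request locking of its own pages through {{code|lang=c|mlock()}}. The following sketch calls it from Python through <code>ctypes</code>; it assumes a Unix-like system whose C library is reachable with <code>ctypes.CDLL(None)</code>, and the helper name <code>lock_one_page</code> is hypothetical. The call may fail, for instance when the process's locked-memory limit is exceeded, so the sketch reports failure rather than raising an error.

```python
import ctypes
import mmap


def lock_one_page():
    """Try to pin one anonymous page with mlock(); return True on success."""
    libc = ctypes.CDLL(None, use_errno=True)  # assumes a Unix-like C library
    libc.mlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
    libc.munlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]

    buf = mmap.mmap(-1, mmap.PAGESIZE)        # one anonymous, page-aligned page
    c_buf = ctypes.c_char.from_buffer(buf)    # view used to obtain the address
    addr = ctypes.addressof(c_buf)
    try:
        if libc.mlock(addr, mmap.PAGESIZE) != 0:
            return False                      # e.g. locked-memory limit reached
        libc.munlock(addr, mmap.PAGESIZE)     # make the page pageable again
        return True
    finally:
        del c_buf                             # release the view so mmap can close
        buf.close()


locked = lock_one_page()
```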

====Virtual-real operation====
In [[OS/VS1]] and similar OSes, some parts of system memory are managed in "virtual-real" mode, called "V=R". In this mode every virtual address corresponds to the same real address. This mode is used for [[interrupt]] mechanisms, for the paging supervisor and page tables in older systems, and for application programs using non-standard I/O management. For example, IBM's z/OS has three modes (virtual-virtual, virtual-real and virtual-fixed).{{citation needed|date=September 2022}}

===Thrashing===
When [[paging]] and [[Paging#Page stealing|page stealing]] are used, a problem called "[[thrashing (computer science)|thrashing]]"<ref name="Thrashing">{{cite web |title=Thrashing |url= https://public.support.unisys.com/aseries/docs/ClearPath-MCP-20.0/86000387-514/section-000023183.html |website=Unisys}}</ref> can occur, in which the computer spends an excessive amount of time transferring pages to and from a backing store, hence slowing down useful work. A task's [[working set]] is the minimum set of pages that should be in memory in order for it to make useful progress. Thrashing occurs when there is insufficient memory available to store the working sets of all active programs. Adding real memory is the simplest response, but improving application design, scheduling, and memory usage can also help. Another solution is to reduce the number of active tasks on the system. This reduces demand on real memory by swapping out the entire working set of one or more processes.
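
The effect of having fewer page frames than the working set requires can be seen in a small simulation. The sketch below assumes LRU replacement and a program that cycles repeatedly over five pages; the function name <code>count_faults</code> and the reference pattern are chosen for illustration only.

```python
from collections import OrderedDict


def count_faults(references, frames):
    """Count page faults for a reference string under LRU with `frames` frames."""
    resident = OrderedDict()  # pages in memory, least recently used first
    faults = 0
    for page in references:
        if page in resident:
            resident.move_to_end(page)  # mark as most recently used
        else:
            faults += 1
            if len(resident) >= frames:
                resident.popitem(last=False)  # evict the LRU page
            resident[page] = True
    return faults


working_set = [0, 1, 2, 3, 4]
references = working_set * 20                  # 100 references over 5 pages
enough = count_faults(references, frames=5)    # 5: one fault per page
too_few = count_faults(references, frames=4)   # 100: every reference faults
```

With five frames the program faults only while first loading its working set; with four frames, each page is evicted just before it is needed again, so every reference faults.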

System thrashing is often the result of a sudden spike in page demand from a small number of running programs. Swap-token<ref>{{Cite journal | author1=Song Jiang | author2=Xiaodong Zhang | title=Token-ordered LRU: an effective page replacement policy and its implementation in Linux systems |journal=Performance Evaluation |issn=0166-5316 |volume=60 |issue=1–4 |year=2005 |pages = 5–29 |doi=10.1016/j.peva.2004.10.002}}</ref> is a lightweight and dynamic thrashing protection mechanism. The basic idea is to set a token in the system, which is randomly given to a process that has page faults when thrashing happens. The process that has the token is given a privilege to allocate more physical memory pages to build its working set; it is then expected to finish its execution quickly and release the memory pages to other processes. A time stamp is used to hand over the token from one process to the next. The first version of swap-token was implemented in Linux 2.6.<ref name="swap-token-page">{{cite web|first=Xiaodong |last=Zhang<!--No authorship info on the page but it is under Zhang's personal site at OSU. Don't link [[Xiaodong Zhang]], that's someone else--> |publisher=Ohio State University |url=https://web.cse.ohio-state.edu/~zhang.574/swaptoken-PE-05.html|title=Swap Token effectively minimizes system thrasing effects and is adopted in OS kernels|url-status=dead|archive-url=https://web.archive.org/web/20231207203355/https://web.cse.ohio-state.edu/~zhang.574/swaptoken-PE-05.html|archive-date=2023-12-07}}</ref> The second version is called preempt swap-token and is also in Linux 2.6.<ref name="swap-token-page" /> In this updated swap-token implementation, a priority counter is set for each process to track the number of swap-out pages. The token is always given to the process with a high priority, which has a high number of swap-out pages. The length of the time stamp is not a constant but is determined by the priority: the higher the number of swap-out pages of a process, the longer the time stamp for it will be.
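
The priority rule of preempt swap-token can be sketched roughly as follows. This is not the Linux implementation: the function <code>grant_token</code>, the constant <code>BASE_HOLD_TICKS</code> and the linear scaling of the holding time are assumptions made to illustrate the idea that more swapped-out pages mean both a higher priority and a longer time stamp.

```python
BASE_HOLD_TICKS = 10  # hypothetical scaling constant


def grant_token(swap_out_counts):
    """Return (process, hold_ticks) for the process with the most swap-out pages."""
    # The process with the highest count has the highest priority.
    process = max(swap_out_counts, key=swap_out_counts.get)
    # Assumed linear rule: a higher count gives a longer holding time.
    return process, BASE_HOLD_TICKS * swap_out_counts[process]


holder, hold = grant_token({"db": 120, "shell": 3, "build": 45})
```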