Editing C dynamic memory allocation (section)

==Implementations==
The implementation of memory management depends greatly upon operating system and architecture.  Some operating systems supply an allocator for malloc, while others supply functions to control certain regions of data. The same dynamic memory allocator is often used to implement both <code>malloc</code> and the operator <code>new</code> in [[C++]].<ref>{{Cite book |title=Modern C++ Design: Generic Programming and Design Patterns Applied |first=Andrei |last=Alexandrescu |publisher=Addison-Wesley |year=2001 |page=78}}</ref>

===Heap-based===
{{see also|sbrk}}
Implementation of legacy allocators was commonly done using the [[Data_segment#Heap|heap segment]].  The allocator would usually expand and contract the heap to fulfill allocation requests.

The heap method suffers from a few inherent flaws:

* A linear allocator can only shrink if the last allocation is released. Even if largely unused, the heap can get "stuck" at a very large size because of a small but long-lived allocation at its tip which could waste any amount of address space, although some allocators on some systems may be able to release entirely empty intermediate pages to the OS.
* A linear allocator is sensitive to [[fragmentation (computer)|fragmentation]]. A good allocator will attempt to track and reuse free slots through the entire heap, but as allocation sizes and lifetimes get mixed it can be difficult and expensive to find or coalesce free segments large enough to hold new allocation requests.
* A linear allocator has extremely poor concurrency characteristics, as the heap segment is per-process every thread has to synchronise on allocation, and concurrent allocations from threads which may have very different work loads amplifies the previous two issues.

{{anchor|dlmalloc}}

===dlmalloc and ptmalloc{{anchor|dlmalloc}}===
[[Doug Lea]] has developed the [[public domain]] dlmalloc ("Doug Lea's Malloc") as a general-purpose allocator, starting in 1987. The [[GNU C library]] (glibc) is derived from Wolfram Gloger's ptmalloc ("pthreads malloc"), a fork of dlmalloc with threading-related improvements.<ref>{{cite web|url=http://malloc.de/en/|title=Wolfram Gloger's malloc homepage|website=malloc.de|access-date=1 April 2018}}</ref><ref name="phrack-57-8">{{cite journal|last=Kaempf |first=Michel |year=2001 |title=Vudo malloc tricks |journal=[[Phrack]] |issue=57 |page=8 |url=http://phrack.org/issues/57/8.html |access-date=29 April 2009 |url-status=live |archive-url=https://web.archive.org/web/20090122071923/http://www.phrack.org/issues.html?issue=57&id=8&mode=txt |archive-date=22 January 2009 }}</ref><ref>{{cite web |title=Glibc: Malloc Internals |url=https://sourceware.org/glibc/wiki/MallocInternals |website=sourceware.org Trac |access-date=1 December 2019}}</ref> As of November 2023, the latest version of dlmalloc is version 2.8.6 from August 2012.<ref name="dlmalloc">{{cite web |last1=Lee |first1=Doug |title=A Memory Allocator |url=http://gee.cs.oswego.edu/dl/html/malloc.html |access-date=1 December 2019}} [http://gee.cs.oswego.edu/pub/misc/malloc.c HTTP for Source Code]</ref>

dlmalloc is a boundary tag allocator. Memory on the [[heap memory|heap]] is allocated as "chunks", an 8-byte [[data structure alignment|aligned]] [[data structure]] which contains a header, and usable memory. Allocated memory contains an 8- or 16-byte overhead for the size of the chunk and usage flags (similar to a [[dope vector]]). Unallocated chunks also store pointers to other free chunks in the usable space area, making the minimum chunk size 16 bytes on 32-bit systems and 24/32 (depends on alignment) bytes on 64-bit systems.<ref name="phrack-57-8" /><ref name="dlmalloc"/>{{rp|at=2.8.6, Minimum allocated size}}

Unallocated memory is grouped into "[[bin (computational geometry)|bins]]" of similar sizes, implemented by using a double-linked list of chunks (with pointers stored in the unallocated space inside the chunk). Bins are sorted by size into three classes:<ref name="phrack-57-8" /><ref name="dlmalloc"/>{{rp|at=Overlaid data structures}}

* For requests below 256 bytes (a "smallbin" request), a simple two power best fit allocator is used. If there are no free blocks in that bin, a block from the next highest bin is split in two.
* For requests of 256 bytes or above but below the [[mmap]] threshold, dlmalloc since v2.8.0 use [[Trie#Bitwise tries|an in-place ''bitwise trie'' algorithm]] ("treebin"). If there is no free space left to satisfy the request, dlmalloc tries to increase the size of the heap, usually via the [[sbrk|brk]] system call. This feature was introduced way after ptmalloc was created (from v2.7.x), and as a result is not a part of glibc, which inherits the old best-fit allocator.
* For requests above the mmap threshold (a "largebin" request), the memory is always allocated using the [[mmap]] system call. The threshold is usually 128&nbsp;KB.<ref name="glibc-env">{{cite web |url=https://www.gnu.org/software/libc/manual/html_node/Malloc-Tunable-Parameters.html |title=Malloc Tunable Parameters |publisher=[[GNU]] |access-date=2 May 2009 }}</ref> The mmap method averts problems with huge buffers trapping a small allocation at the end after their expiration, but always allocates an entire [[page (computer memory)|page]] of memory, which on many architectures is 4096 bytes in size.<ref>{{cite web |last=Sanderson |first=Bruce |url=http://support.microsoft.com/kb/555223 |title=RAM, Virtual Memory, Pagefile and all that stuff |publisher=Microsoft Help and Support |date=12 December 2004 }}</ref>

Game developer Adrian Stone argues that {{code|dlmalloc}}, as a boundary-tag allocator, is unfriendly for console systems that have virtual memory but do not have [[demand paging]]. This is because its pool-shrinking and growing callbacks ({{code|sysmalloc}}/{{code|systrim}}) cannot be used to allocate and commit individual pages of virtual memory. In the absence of demand paging, fragmentation becomes a greater concern.<ref>{{cite web |last1=Stone |first1=Adrian |title=The Hole That dlmalloc Can't Fill |url=http://gameangst.com/?p=496 |website=Game Angst |access-date=1 December 2019}}</ref>

{{anchor|jemalloc}}

===FreeBSD's and NetBSD's jemalloc===
Since [[FreeBSD]] 7.0 and [[NetBSD]] 5.0, the old <code>malloc</code> implementation ({{code|phkmalloc}} by [[Poul-Henning Kamp]]) was replaced by [http://jemalloc.net/ jemalloc], written by Jason Evans.  The main reason for this was a lack of scalability of {{code|phkmalloc}} in terms of multithreading.  In order to avoid lock contention, {{code|jemalloc}} uses separate "arenas" for each [[central processing unit|CPU]].  Experiments measuring number of allocations per second in multithreading application have shown that this makes it scale linearly with the number of threads, while for both phkmalloc and dlmalloc performance was inversely proportional to the number of threads.<ref>{{cite web |url=http://people.freebsd.org/~jasone/jemalloc/bsdcan2006/jemalloc.pdf |title=A Scalable Concurrent malloc(3) Implementation for FreeBSD |first=Jason |last=Evans |date=16 April 2006 |access-date=18 March 2012 }}</ref>

{{anchor|openbsd}}

===OpenBSD's malloc===
[[OpenBSD]]'s implementation of the <code>malloc</code> function makes use of [[mmap]]. For requests greater in size than one page, the entire allocation is retrieved using <code>mmap</code>; smaller sizes are assigned from memory pools maintained by <code>malloc</code> within a number of "bucket pages", also allocated with <code>mmap</code>.<ref>{{cite web|url=http://bxr.su/OpenBSD/lib/libc/stdlib/malloc.c |title=libc/stdlib/malloc.c |website=BSD Cross Reference, OpenBSD src/lib/}}</ref>{{better source needed|date=November 2015}} On a call to <code>free</code>, memory is released and unmapped from the process [[address space]] using <code>munmap</code>. This system is designed to improve security by taking advantage of the [[address space layout randomization]] and gap page features implemented as part of OpenBSD's <code>mmap</code> [[system call]], and to detect use-after-free bugs—as a large memory allocation is completely unmapped after it is freed, further use causes a [[segmentation fault]] and termination of the program.

The GrapheneOS project initially started out by porting OpenBSD's memory allocator to Android's Bionic C Library.<ref>{{Cite web |title=History {{!}} GrapheneOS |url=https://grapheneos.org/history/ |access-date=2023-03-02 |website=grapheneos.org |language=en}}</ref>

{{anchor|hoard}}

===Hoard malloc===
{{main|Hoard memory allocator}}
Hoard is an allocator whose goal is scalable memory allocation performance. Like OpenBSD's allocator, Hoard uses <code>mmap</code> exclusively, but manages memory in chunks of 64 kilobytes called superblocks. Hoard's heap is logically divided into a single global heap and a number of per-processor heaps. In addition, there is a thread-local cache that can hold a limited number of superblocks. By allocating only from superblocks on the local per-thread or per-processor heap, and moving mostly-empty superblocks to the global heap so they can be reused by other processors, Hoard keeps fragmentation low while achieving near linear scalability with the number of threads.<ref>{{Cite conference | doi = 10.1145/378993.379232| title = Hoard: A Scalable Memory Allocator for Multithreaded Applications| work = Proceedings of the ninth international conference on Architectural support for programming languages and operating systems| conference = [[International Conference on Architectural Support for Programming Languages and Operating Systems|ASPLOS]]-IX| pages = 117–128| date=November 2000 | last1 = Berger | first1 = E. D. | last2 = McKinley | first2 = K. S. |author2-link = Kathryn S. McKinley| last3 = Blumofe | first3 = R. D. | last4 = Wilson | first4 = P. R. | isbn = 1-58113-317-0| url = http://www.cs.umass.edu/~emery/pubs/berger-asplos2000.pdf| citeseerx = 10.1.1.1.4174}}</ref>

===mimalloc===
{{main|mimalloc}}
An [[open source|open-source]] compact general-purpose [[memory allocator]] from [[Microsoft Research]] with focus on performance.<ref>{{Cite web |url=https://slashdot.org/firehose.pl?op=view&id=110928452 |title=Microsoft releases optimized malloc() as open source - Slashdot |access-date=2020-07-04  |archive-date=2023-06-14  |archive-url=https://web.archive.org/web/20230614154744/https://slashdot.org/firehose.pl?op=view&id=110928452 |url-status=dead }}</ref> The library is about 11,000 [[lines of code]].

{{anchor|tcmalloc}}

===Thread-caching malloc (tcmalloc)===
Every thread has a [[thread-local storage]] for small allocations. For large allocations mmap or [[sbrk]] can be used. [https://code.google.com/p/gperftools/ TCMalloc], a ''malloc'' developed by Google,<ref>[//code.google.com/p/gperftools/ TCMalloc homepage]</ref> has garbage-collection for local storage of dead threads. The TCMalloc is considered to be more than twice as fast as glibc's ptmalloc for multithreaded programs.<ref>Ghemawat, Sanjay; Menage, Paul; [http://goog-perftools.sourceforge.net/doc/tcmalloc.html ''TCMalloc : Thread-Caching Malloc'']</ref><ref>{{cite web |first=Mark |last=Callaghan |url=http://mysqlha.blogspot.com/2009/01/double-sysbench-throughput-with_18.html |title=High Availability MySQL: Double sysbench throughput with TCMalloc |work=Mysqlha.blogspot.com |date=18 January 2009 |access-date=18 September 2011 }}</ref>

===In-kernel===
Operating system [[kernel (operating system)|kernels]] need to allocate memory just as application programs do. The implementation of <code>malloc</code> within a kernel often differs significantly from the implementations used by C libraries, however. For example, memory buffers might need to conform to special restrictions imposed by [[direct memory access|DMA]], or the memory allocation function might be called from interrupt context.<ref>{{cite web |url=http://people.netfilter.org/rusty/unreliable-guides/kernel-hacking/routines-kmalloc.html |title=kmalloc()/kfree() include/linux/slab.h |work=People.netfilter.org |access-date=18 September 2011 }}</ref> This necessitates a <code>malloc</code> implementation tightly integrated with the [[virtual memory]] subsystem of the operating system kernel.