Fat binary
==Similar concepts==
The following approaches are similar to fat binaries in that multiple versions of machine code serving the same purpose are provided in the same file.

===Heterogeneous computing===
{{anchor|EXOCHI}}Since 2007, some specialized compilers for [[heterogeneous computing|heterogeneous platform]]s have produced code files for [[parallel computing|parallel execution]] on multiple types of processors. For example, the CHI ([[C (programming language)|C]] for Heterogeneous Integration) compiler from the [[Intel]] EXOCHI (Exoskeleton Sequencer) development suite extends the [[OpenMP]] [[directive (programming)|pragma]] concept for [[multithreading (software)|multithreading]] to produce fat binaries containing code sections for different [[instruction set architecture]]s (ISAs), from which the [[runtime system|runtime]] loader can dynamically initiate parallel execution on multiple available CPU and [[GPU]] cores in a heterogeneous system environment.<ref name="Collins-Chinya-Jiang-Tian-Girkar-Yang-Lueh-Wang_2007"/><ref name="Wang-Collins-Chinya-Jiang-Tian-Girkar-Pearce-Lueh-Yakoushkin-Wang_2007"/>

{{anchor|cubin|SASS|GPGPU-Sim}}Introduced in 2006, [[Nvidia]]'s parallel computing platform [[CUDA]] (Compute Unified Device Architecture) is a software platform for general-purpose computing on GPUs ([[GPGPU]]). Its [[LLVM]]-based compiler [[NVCC (compiler)|NVCC]] can create [[Executable and Linkable Format|ELF]]-based fat binaries containing so-called [[Parallel Thread Execution|PTX]] virtual [[assembly language|assembly]] (as text), which the CUDA runtime driver can later [[just-in-time compilation|just-in-time compile]] into SASS (Streaming Assembler<!-- sometimes incorrectly given as: Shader Assembly -->) binary executable code for the actually present target GPU.
The executables can also include so-called ''CUDA binaries'' (aka ''cubin'' files) containing dedicated executable code sections for one or more specific GPU architectures, from which the CUDA runtime can choose at load time.<ref name="Nvidia_2004"/><ref name="Harris_2014"/><ref name="CUDA_2014"/><ref name="CUDA_2016"/><ref name="CUDA_2022"/><ref name="Braun-Fröning_2019"/> Fat binaries are also supported by {{ill|GPGPU-Sim|de}}, a GPU [[microarchitecture simulation|simulator]] likewise introduced in 2007.<ref name="Fung-Sham-Yuan-Aamodt_2007"/><ref name="Bakhoda-Yuan-Fung-Wong-Aamodt_2009"/>

{{anchor|Multi2Sim}}Multi2Sim (M2S)<!-- v1 (2007) for MIPS, v2 (2008) for x86, v3 (2011) x86+Evergreen, v4 (2012) x86 + MIPS-32 + ARM, Evergreen, v5 -->, an [[OpenCL]] heterogeneous system simulator framework (originally only for either [[MIPS architecture|MIPS]] or x86 CPUs, but later extended to also support [[ARM architecture|ARM]] CPUs and GPUs such as the [[Advanced Micro Devices|AMD]]/[[ATI Technologies|ATI]] [[AMD Evergreen|Evergreen]] and [[AMD Southern Islands|Southern Islands]] as well as the [[Nvidia Fermi]] and [[Nvidia Kepler|Kepler]] families),<ref name="Multi2Sim_2013"/> supports ELF-based fat binaries as well.<ref name="Ubal-Jang_Mistry_Schaa_Kaeli_2012"/><ref name="Multi2Sim_2013"/>

===Fat objects===
[[GNU Compiler Collection]] (GCC) and LLVM do not have a fat binary format, but they do have fat ''object files'' for [[link-time optimization]] (LTO). Since LTO delays compilation until link time, the [[object file]]s must store the [[intermediate representation]] (IR), but machine code may need to be stored too (for speed or compatibility).
An LTO object containing both IR and machine code is known as a ''fat object''.<ref name="LTO"/>

===Function multi-versioning===
Even in a program or [[library (computing)|library]] intended for a single [[instruction set architecture]], a programmer may wish to make use of newer instruction set extensions while keeping compatibility with older CPUs. This can be achieved with ''function multi-versioning'' (FMV): several versions of the same function are written into the program, and a piece of code decides which one to use by detecting the CPU's capabilities (such as through [[CPUID]]). [[Intel C++ Compiler]], GCC, and LLVM can all automatically generate multi-versioned functions.<ref name="Wennborg_2018"/> This is a form of [[dynamic dispatch]] without any semantic effects. Many math libraries feature hand-written assembly routines that are automatically chosen according to CPU capability; examples include [[glibc]], [[Intel MKL]], and [[OpenBLAS]]. In addition, the library loader in glibc supports loading from alternative paths for specific CPU features.<ref name="Bahena_2018"/>

A similar but byte-level granular approach, originally devised by Matthias R. Paul and Axel C. Frinke, is to let a small self-discarding, [[instruction relaxation|relaxing]] and [[relocating loader]], embedded into the executable file alongside any number of alternative binary code snippets, conditionally build a size- or speed-optimized runtime image of a program or driver necessary to perform (or not perform) a particular function in a particular target environment at [[load-time]], through a form of [[dynamic dead code elimination]] (DDCE).<ref name="Paul_1997_FreeKEYB"/><ref name="Paul_2002_DDCE2"/><ref name="Paul_2001_FK"/><ref name="Paul_2001_DDCE"/>
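The fat-object behavior described in the "Fat objects" section can be reproduced with GCC's <code>-ffat-lto-objects</code> flag. The tiny C file below is only a vehicle for the compilation commands in its comments; the file and symbol names are illustrative, not taken from any particular project.

```c
/*
 * Sketch: producing a GCC "fat" LTO object.
 *
 * Slim LTO object (stores only the GIMPLE IR; a non-LTO-aware
 * linker cannot use it):
 *     gcc -O2 -flto -c square.c -o square.o
 *
 * Fat LTO object (stores BOTH the IR and ordinary machine code,
 * so it works with LTO-aware and plain linkers alike):
 *     gcc -O2 -flto -ffat-lto-objects -c square.c -o square.o
 *
 * The duplicated sections can be inspected with:
 *     objdump -h square.o     (look for .gnu.lto_* sections
 *                              alongside the usual .text)
 */
int square(int x)
{
    return x * x;   /* trivial function; content is irrelevant here */
}
```

The trade-off mirrors the one described above: a fat object is larger, but a consumer that cannot (or chooses not to) do link-time optimization can still link the pre-compiled machine code.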
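As a minimal sketch of compiler-generated FMV, GCC and Clang on x86 support the <code>target_clones</code> function attribute: the compiler emits one clone of the function per listed target plus a resolver that inspects the CPU once, at load time, and routes every call to the best available clone. The dot-product body here is purely illustrative.

```c
#include <stddef.h>

/* Function multi-versioning via GCC/Clang target_clones (x86):
 * one clone is generated for each listed target, and an ifunc
 * resolver selects among them at load time based on the CPU's
 * reported capabilities (CPUID), with no change in semantics. */
__attribute__((target_clones("avx2", "sse4.2", "default")))
double dot(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];   /* loop may auto-vectorize per clone */
    return sum;
}
```

Callers invoke <code>dot()</code> like any ordinary function; the dispatch cost is paid once by the dynamic loader, matching the "dynamic dispatch without any semantic effects" characterization above.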