
=== {{Anchor|Data manipulation instructions}}Memory instructions ===
The x86 processor also includes complex addressing modes for addressing memory with an immediate offset, a register, a register with an offset, a scaled register with or without an offset, and a register with an optional offset and another scaled register. For example, <code>mov eax, [Table + ebx + esi*4]</code> is encoded as a single instruction that loads 32 bits of data from the address computed as <code>(Table + ebx + esi * 4)</code>, taken as an offset within the segment selected by <code>ds</code>, and stores it in the <code>eax</code> register. In general, x86 processors can load and store memory operands that match the size of whichever register they are operating on. (The SIMD instructions also include half-load instructions.)
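
The addressing forms listed above can be sketched in NASM syntax as follows. This is an illustrative fragment for 32-bit mode; the contents of <code>Table</code> are hypothetical, and the registers are assumed to hold valid addresses or indices.
<syntaxhighlight lang="nasm">
bits 32

section .data
Table:  dd 10, 20, 30, 40                    ; hypothetical table of 32-bit values

section .text
        mov     eax, [Table]                 ; immediate offset only
        mov     eax, [ebx]                   ; register
        mov     eax, [ebx + 8]               ; register with an offset
        mov     eax, [esi*4]                 ; scaled register without an offset
        mov     eax, [Table + esi*4]         ; scaled register with an offset
        mov     eax, [Table + ebx + esi*4]   ; register, offset and another scaled register
        mov     al,  [ebx]                   ; operand size follows the register: 8 bits
        mov     ax,  [ebx]                   ; 16 bits
</syntaxhighlight>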

Most 2-operand x86 instructions, including integer ALU instructions,
use a standard "[[addressing mode]] byte"<ref>
Curtis Meadow.
[http://aturing.umcs.maine.edu/~meadow/courses/cos335/8086-instformat.pdf "Encoding of 8086 Instructions"].
</ref>
often called the [[modR/M|MOD-REG-R/M byte]].<ref>
Igor Kholodov.
[http://www.c-jump.com/CIS77/CPU/x86/X77_0060_mod_reg_r_m_byte.htm "6. Encoding x86 Instruction Operands, MOD-REG-R/M Byte"].
</ref><ref>

[http://www.cs.loyola.edu/~binkley/371/Encoding_Real_x86_Instructions.html "Encoding x86 Instructions"].
</ref><ref>
Michael Abrash.
"Zen of Assembly Language: Volume I, Knowledge".
"Chapter 7: Memory Addressing".
Section [http://www.jagregory.com/abrash-zen-of-asm/ "mod-reg-rm Addressing"].
</ref>
Many 32-bit x86 instructions also have a [[ModR/M|SIB addressing mode byte]] that follows the MOD-REG-R/M byte.<ref>
Intel 80386 Reference Programmer's Manual.
[https://www.scs.stanford.edu/05au-cs240c/lab/i386/s17_02.htm "17.2.1 ModR/M and SIB Bytes"]
</ref><ref>
[https://wiki.osdev.org/X86-64_Instruction_Encoding#ModR.2FM_and_SIB_bytes "X86-64 Instruction Encoding: ModR/M and SIB bytes"]
</ref><ref>
[http://uglyduck.vajn.icu/PDF/Intel/OPx86.pdf "Figure 2-1. Intel 64 and IA-32 Architectures Instruction Format"].
</ref><ref>
[https://paul.bone.id.au/blog/2018/09/26/more-x86-addressing/ "x86 Addressing Under the Hood"].
</ref><ref name="mccamant" >
Stephen McCamant.
[https://www-users.cse.umn.edu/~smccaman/courses/8980/spring2020/lectures/01-x86-intro-arith-8up.pdf "Manual and Automated Binary Reverse Engineering"].
</ref>
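
As an illustration of these bytes, the following 32-bit NASM fragment shows the machine code for a few instructions in the comments. The register choices are arbitrary; the encodings follow from the standard opcode, MOD-REG-R/M and SIB layouts.
<syntaxhighlight lang="nasm">
bits 32
        mov     eax, [ebx]                ; 8B 03        opcode, MOD-REG-R/M (mod=00 reg=eax r/m=ebx)
        mov     eax, [ebx + esi*4 + 8]    ; 8B 44 B3 08  opcode, MOD-REG-R/M (r/m=100 selects SIB), SIB, 8-bit offset
        add     eax, [ebx + esi*4 + 8]    ; 03 44 B3 08  different opcode, identical addressing bytes
        add     ecx, [ebx + esi*4 + 8]    ; 03 4C B3 08  only the REG field changes (eax -> ecx)
</syntaxhighlight>
In the second instruction the SIB byte <code>B3</code> encodes a scale of 4, index <code>esi</code> and base <code>ebx</code>.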

In principle, because the instruction opcode is separate from the addressing mode byte, those instructions are [[orthogonal instruction set|orthogonal]]: any of those opcodes can be combined with any addressing mode.
However, the x86 instruction set is generally considered non-orthogonal because many other opcodes have some fixed addressing mode (they have no addressing mode byte), and every register is special.<ref name="mccamant" /><ref>
[https://locklessinc.com/articles/instruction_wishlist/ "X86 Instruction Wishlist"].
</ref>
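
A few examples of such fixed or implicit operands, given here as an illustrative 32-bit NASM fragment rather than an exhaustive list:
<syntaxhighlight lang="nasm">
bits 32
        push    eax          ; 50: the register is encoded in the opcode itself, no addressing mode byte
        movsb                ; A4: operands are fixed as [ds:esi] and [es:edi]
        mul     ebx          ; one multiplicand (eax) and the result (edx:eax) are implicit
        shl     eax, cl      ; a variable shift count can only come from cl
</syntaxhighlight>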

The x86 instruction set includes string load, store, move, scan and compare instructions (<code>lods</code>, <code>stos</code>, <code>movs</code>, <code>scas</code> and <code>cmps</code>) which perform the operation on an element of a specified size (<code>b</code> for 8-bit byte, <code>w</code> for 16-bit word, <code>d</code> for 32-bit double word) and then increment or decrement (depending on DF, the direction flag) the implicit address register (<code>si</code> for <code>lods</code>, <code>di</code> for <code>stos</code> and <code>scas</code>, and both for <code>movs</code> and <code>cmps</code>). For the load, store and scan operations, the implicit target, source or comparison register is <code>al</code>, <code>ax</code> or <code>eax</code> (depending on size). The implicit segment registers used are <code>ds</code> for <code>si</code> and <code>es</code> for <code>di</code>. When the instruction carries a repeat prefix (<code>rep</code>, <code>repe</code> or <code>repne</code>), the <code>cx</code> or <code>ecx</code> register is used as a decrementing counter, and the operation stops when the counter reaches zero or, for scans and comparisons, when the comparison result no longer satisfies the prefix's condition (a mismatch for <code>repe</code>, a match for <code>repne</code>).

The performance of some of these instructions was neglected over the years, and in certain cases it is now faster to write out the equivalent loop explicitly. Intel and AMD have since improved some of them, and a few now perform very well, so programmers should consult recent, reputable benchmarks before choosing a particular instruction from this group.
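
A minimal sketch of two common uses of the string instructions, a block copy and a string-length scan, is shown below. It assumes 32-bit mode with a flat memory model in which <code>ds</code> and <code>es</code> refer to the same segment; the labels <code>src</code>, <code>dst</code>, <code>msg</code> and <code>copy_and_measure</code> are hypothetical.
<syntaxhighlight lang="nasm">
bits 32

section .data
src:    dd 1, 2, 3, 4            ; hypothetical source data
msg:    db "hello", 0            ; hypothetical NUL-terminated string

section .bss
dst:    resd 4                   ; hypothetical destination buffer

section .text
copy_and_measure:
        cld                      ; DF = 0, so esi/edi are incremented
        mov     esi, src         ; source address (ds:esi)
        mov     edi, dst         ; destination address (es:edi)
        mov     ecx, 4           ; number of double words to copy
        rep movsd                ; copy until ecx reaches zero

        mov     edi, msg         ; string to scan (es:edi)
        xor     eax, eax         ; al = 0, the byte to search for
        mov     ecx, -1          ; effectively unlimited count
        repne scasb              ; stop when a byte equals al (or ecx reaches zero)
        not     ecx              ; ecx = number of bytes scanned, including the NUL
        dec     ecx              ; ecx = string length, excluding the NUL
        ret
</syntaxhighlight>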

The stack is a region of memory and an associated ‘stack pointer’, which points to the top of the stack, that is, the most recently stored item. Because the x86 stack grows toward lower addresses, the stack pointer is decremented when items are added (‘push’) and incremented after items are removed (‘pop’). In 16-bit mode, this implicit stack pointer is addressed as SS:[SP], in 32-bit mode it is SS:[ESP], and in 64-bit mode it is [RSP]. The size of the stored value is assumed to match the operating mode of the processor (i.e., 16, 32, or 64 bits), which is the default width of the <code>push</code>/<code>pop</code>/<code>call</code>/<code>ret</code> instructions. Also included are the instructions <code>enter</code> and <code>leave</code>, which reserve and remove data from the top of the stack while setting up a stack frame pointer in <code>bp</code>/<code>ebp</code>/<code>rbp</code>. However, setting the <code>sp</code>/<code>esp</code>/<code>rsp</code> register directly, or adding to and subtracting from it, is also supported, so the <code>enter</code>/<code>leave</code> instructions are generally unnecessary.
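
The effect of <code>push</code> and <code>pop</code> on the stack pointer, and the alternative of adjusting it directly, can be sketched as follows for 64-bit mode. The comments describe approximate equivalents: unlike <code>sub</code> and <code>add</code>, <code>push</code> and <code>pop</code> do not modify the flags.
<syntaxhighlight lang="nasm">
bits 64
        push    rax              ; roughly: sub rsp, 8  then  mov [rsp], rax
        pop     rbx              ; roughly: mov rbx, [rsp]  then  add rsp, 8

        sub     rsp, 16          ; reserve 16 bytes by adjusting rsp directly
        mov     qword [rsp], 0   ; use the reserved space
        add     rsp, 16          ; release it again
</syntaxhighlight>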

This code is the beginning of a function typical for a high-level language when compiler optimisation is turned off for ease of debugging:
<syntaxhighlight lang="nasm">
 push    rbp       ; Save the calling function’s stack frame pointer (rbp register)
 mov     rbp, rsp  ; Make a new stack frame below our caller’s stack
 sub     rsp, 32   ; Reserve 32 bytes of stack space for this function’s local variables.
                   ; Local variables will be below rbp and can be referenced relative to rbp,
                   ; again best for ease of debugging, but for best performance rbp will not
                   ; be used at all, and local variables would be referenced relative to rsp
                   ; because, apart from the code saving, rbp then is free for other uses.
  …       …        ; However, if rbp is altered here, its value should be preserved for the caller.
 mov [rbp-8], rdx  ; Example of writing to a local variable (by its memory location) from register rdx
</syntaxhighlight>
The first three instructions (<code>push</code>, <code>mov</code> and <code>sub</code>) are functionally equivalent to just:
<syntaxhighlight lang="nasm"> enter   32, 0</syntaxhighlight>
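
The matching end of such a function can be sketched in the same way. This is an illustrative fragment that assumes the frame was set up with <code>rbp</code> as the frame pointer, as in the prologue above:
<syntaxhighlight lang="nasm">
 mov     rsp, rbp  ; Discard the local variables by moving rsp back to the frame pointer
 pop     rbp       ; Restore the calling function's stack frame pointer
 ret               ; Return to the caller
</syntaxhighlight>
Here the <code>mov</code>/<code>pop</code> pair has the same effect as a single <code>leave</code> instruction.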

Other instructions for manipulating the stack include <code>pushfd</code> (32-bit) / <code>pushfq</code> (64-bit) and <code>popfd</code> / <code>popfq</code>, which store and retrieve the EFLAGS (32-bit) / RFLAGS (64-bit) register.
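
One possible use, sketched here for 64-bit mode, is to preserve the direction flag around code that changes it, or to copy the flags into a general-purpose register:
<syntaxhighlight lang="nasm">
bits 64
        pushfq               ; save RFLAGS (including DF) on the stack
        std                  ; set DF so that string instructions run backwards
        ; ... string operations that need DF = 1 would go here ...
        popfq                ; restore the saved RFLAGS, and with it the original DF

        pushfq               ; push RFLAGS again
        pop     rax          ; rax now holds a copy of RFLAGS
</syntaxhighlight>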

Values for a SIMD load or store are assumed to be packed in adjacent memory positions, and they map to the elements of the SIMD register in sequential little-endian order. Some SSE load and store instructions require 16-byte alignment to function properly. The SIMD instruction sets also include "prefetch" instructions, which perform the load but do not target any register and are used to bring data into the cache. The SSE instruction sets also include non-temporal store instructions, which store straight to memory without allocating a cache line if the destination is not already cached (otherwise they behave like a regular store).
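
The following 64-bit NASM fragment sketches these kinds of SSE memory access. It assumes that <code>rsi</code> and <code>rdi</code> hold 16-byte-aligned addresses of arrays of single-precision floats; the offsets are arbitrary.
<syntaxhighlight lang="nasm">
bits 64
        movaps      xmm0, [rsi]        ; aligned load of four packed floats (faults if rsi is not 16-byte aligned)
        movups      xmm1, [rsi + 4]    ; unaligned load, no alignment requirement
        movlps      xmm2, [rsi]        ; half-load into the low 64 bits of xmm2
        movhps      xmm2, [rsi + 8]    ; half-load into the high 64 bits of xmm2
        prefetchnta [rsi + 64]         ; prefetch hint: no register is written
        movntps     [rdi], xmm0        ; non-temporal store (rdi must be 16-byte aligned)
        sfence                         ; order the non-temporal store before later stores
</syntaxhighlight>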

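Most generic integer and floating-point instructions can take one memory operand, specified with a complex address, as the second source operand; many SIMD instructions can as well, although legacy SSE encodings generally require the memory operand to be 16-byte aligned. Integer instructions can also accept one memory operand as the destination.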
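
For example, in 32-bit NASM syntax (an illustrative fragment; the registers are assumed to hold valid addresses):
<syntaxhighlight lang="nasm">
bits 32
        add     eax, [ebx + 4]         ; memory operand as the second source
        add     [ebx + 4], eax         ; memory operand as the destination (read-modify-write)
        inc     dword [ebx + esi*4]    ; the size must be stated when no register implies it
        fadd    dword [ebx]            ; x87: st0 = st0 + 32-bit float loaded from memory
        ; A form such as  add [eax], [ebx]  is not encodable: at most one operand may be in memory.
</syntaxhighlight>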