X86 assembly language
==Examples==
{{original research|date=March 2013}}
The following examples use the so-called ''Intel-syntax flavor'' as used by the assemblers Microsoft MASM, NASM and many others. (Note: There is also an alternative ''AT&T-syntax flavor'' in which the order of source and destination operands is swapped, among many other differences.)<ref>{{cite web| url=https://stackoverflow.com/questions/8549427/nasm-intel-versus-att-syntax-what-are-the-advantages| title=NASM (Intel) versus AT&T Syntax: what are the advantages?| date=18 December 2011| author=Peter Cordes| website=[[Stack Overflow]]}}</ref>

==="Hello world!" program for MS-DOS in MASM-style assembly===
This example uses the software [[DOS API#Interrupt vectors used by DOS|interrupt 21h]] instruction to call the MS-DOS operating system for output to the display, whereas other samples use [[libc]]'s C printf() routine to write to [[stdout]]. This first example is 16-bit code, as run on an Intel 8086. The second example is Intel 386 code in 32-bit mode. Most modern code runs in 64-bit mode.<ref>{{cite web |url=https://www.daniweb.com/programming/software-development/threads/85917/i-just-started-assembly# |title=I just started Assembly |year=2008 |website=daniweb.com}}</ref>
<syntaxhighlight lang="nasm">
.model small
.stack 100h

.data
msg     db      'Hello world!$'  ; Function 09h prints the string up to, not including, the '$'

.code
start:
        mov     ax, @DATA       ; Initializes Data segment
        mov     ds, ax
        mov     ah, 09h         ; Sets 8-bit register ‘ah’, the high byte of register ax, to 9, to
                                ; select a sub-function number of an MS-DOS routine called below
                                ; via the software interrupt int 21h to display a message
        lea     dx, msg         ; Takes the address of msg, stores the address in 16-bit register dx
        int     21h             ; Various MS-DOS routines are callable by the software interrupt 21h
                                ; Our required sub-function was set in register ah above

        mov     ax, 4C00h       ; Sets ah = 4Ch, the sub-function number of MS-DOS’s software
                                ; interrupt int 21h service ‘terminate program’, and al = 00h,
                                ; the program's return code.
        int     21h             ; Calling this MS-DOS service never returns, as it ends the program.
end start
</syntaxhighlight>

==="Hello world!" program for Windows in MASM and NASM style assembly===
{|class=wikitable
|-
! MASM !! NASM !! Description
|-
|<syntaxhighlight lang="nasm">
; requires /coff switch on 6.15 and earlier versions
.386
.model flat,c
.stack 1000h
</syntaxhighlight>
|<syntaxhighlight lang="nasm">
; Image base = 0x00400000
%define RVA(x) (x-0x00400000)
</syntaxhighlight>
|Preamble. MASM requires defining the memory model and stack size.
|-
|<syntaxhighlight lang="nasm">
.data
msg db "Hello world!",0
</syntaxhighlight>
|<syntaxhighlight lang="nasm">
section .data
msg db "Hello world!", 0
</syntaxhighlight>
|Data section. We use the db (define byte) pseudo-op to define a string, terminated by a zero byte because printf expects a C string.
|-
|<syntaxhighlight lang="nasm">
.code
includelib libcmt.lib
includelib libvcruntime.lib
includelib libucrt.lib
includelib legacy_stdio_definitions.lib

extrn printf:near
extrn exit:near

public main
main proc
        push    offset msg
        call    printf
        push    0
        call    exit
main endp

end
</syntaxhighlight>
|<syntaxhighlight lang="nasm">
section .text
        push    dword msg
        call    dword [printf]
        push    byte +0
        call    dword [exit]
        ret

section .idata
        dd      RVA(msvcrt_LookupTable)
        dd      -1
        dd      0
        dd      RVA(msvcrt_string)
        dd      RVA(msvcrt_imports)
        times 5 dd 0            ; ends the descriptor table

msvcrt_string   dd      "msvcrt.dll", 0
msvcrt_LookupTable:
                dd      RVA(msvcrt_printf)
                dd      RVA(msvcrt_exit)
                dd      0

msvcrt_imports:
printf          dd      RVA(msvcrt_printf)
exit            dd      RVA(msvcrt_exit)
                dd      0

msvcrt_printf:
                dw      1
                dw      "printf", 0

msvcrt_exit:
                dw      2
                dw      "exit", 0

                dd      0
</syntaxhighlight>
|The code (.text section) and the import table. In NASM the import table is constructed manually, while in the MASM example directives are used to simplify the process.
|}

==="Hello world!" program for Linux in AT&T and NASM assembly===
{|class=wikitable
! width=30%|AT&T (GNU as) !! width=30%|Intel (NASM) !! Description
|-
|<syntaxhighlight lang="gas">
.data
</syntaxhighlight>
|<syntaxhighlight lang="nasm">
section .data
</syntaxhighlight>
|Like in the Windows example, <code>.data</code> is the section for initialized data.
|-
|<syntaxhighlight lang="gas">
str: .ascii "Hello, world!\n"
</syntaxhighlight>
|<syntaxhighlight lang="nasm">
str: db 'Hello, world!', 0Ah
</syntaxhighlight>
|Define a string of text containing "Hello, world!" followed by a newline (<code>\n</code>, which is <code>0x0A</code>). Bind the label "str" to the address of the defined string.
|-
|<syntaxhighlight lang="gas">
str_len = . - str
</syntaxhighlight>
|<syntaxhighlight lang="nasm">
str_len: equ $ - str
</syntaxhighlight>
|Calculate the length of <code>str</code>. <code>.</code> means "here" in gas, and <code>$</code> means the same in NASM. Subtracting "str" from "here" gives the length of the previously defined string.
|-
|<syntaxhighlight lang="gas">
.text
</syntaxhighlight>
|<syntaxhighlight lang=nasm>
section .text
</syntaxhighlight>
|Like in the Windows example, <code>.text</code> is the section for program code.
|-
|<syntaxhighlight lang="gas">.globl _start</syntaxhighlight>
|<syntaxhighlight lang=nasm>global _start</syntaxhighlight>
|Export the <code>_start</code> symbol to the global scope so that the linker can "see" it.
|-
|<syntaxhighlight lang="gas">_start:</syntaxhighlight>
|<syntaxhighlight lang="nasm">_start:</syntaxhighlight>
|Define a label called <code>_start</code>, where our code begins. By Linux convention, the name <code>_start</code> marks the entry point.
|-
|<syntaxhighlight lang="gas">
movl $4, %eax
movl $1, %ebx
movl $str, %ecx
movl $str_len, %edx
</syntaxhighlight>
|<syntaxhighlight lang="nasm">
mov eax, 4
mov ebx, 1
mov ecx, str
mov edx, str_len
</syntaxhighlight>
|Prepare a system call. EAX=4 requests the "sys_write" call on 32-bit x86 Linux. EBX=1 means "stdout" for sys_write. ECX holds the address of the string to write, and EDX holds the number of bytes to write. This is equivalent to the libc-wrapped call <code>write(1, str, str_len)</code>.
|-
|<syntaxhighlight lang="gas">int $0x80</syntaxhighlight>
|<syntaxhighlight lang="nasm">int 80h</syntaxhighlight>
|On 32-bit x86 Linux, the software interrupt 80h invokes a system call according to the values of EAX, EBX, ECX, and EDX.
|-
|<syntaxhighlight lang="gas">
movl $1, %eax
movl $0, %ebx
int $0x80
</syntaxhighlight>
|<syntaxhighlight lang="nasm">
mov eax, 1
mov ebx, 0
int 80h
</syntaxhighlight>
|Prepare another system call, then invoke it with INT 80h: EAX=1 is sys_exit, and EBX holds its exit status. A status of 0 means a normal exit. In C syntax, <code>_exit(0);</code>.
|}
Note for NASM:
<pre>
; This program runs in 32-bit protected mode.
; build: nasm -f elf -F stabs name.asm
; link:  ld -o name name.o
;        (with a 64-bit toolchain: ld -m elf_i386 -o name name.o)
;
; A 64-bit long-mode version is not just a matter of using 64-bit registers (rax instead of eax, etc.):
; native 64-bit Linux code uses the syscall instruction, with different call numbers and argument
; registers, as in the 64-bit example below. Assemble it with "-f elf64" instead of "-f elf".
; In 64-bit long mode, "lea rsi, [rel str]" loads the address of the message using RIP-relative addressing.
</pre>

==="Hello world!" program for Linux in NASM style assembly using the C standard library===
{{see also|Libc}}
<syntaxhighlight lang="nasm">
;
; This program runs in 32-bit protected mode.
; gcc links the standard C library by default
;
; build: nasm -f elf -F stabs name.asm
; link:  gcc -o name name.o
;        (with a 64-bit toolchain: gcc -m32 -o name name.o)
;
; 64-bit code follows a different calling convention (the System V AMD64 ABI passes the first
; arguments in registers, not on the stack), so this program cannot be ported just by renaming
; registers. Assemble 64-bit code with "-f elf64".
;
        global  main            ; ‘main’ must be defined, as it is being compiled
                                ; against the C Standard Library
        extern  printf          ; declares the use of external symbol, as printf
                                ; printf is declared in a different object-module.
                                ; The linker resolves this symbol later.

segment .data                   ; section for initialized data
        string db 'Hello world!', 0Ah, 0    ; message string ending with a newline char (10
                                            ; decimal) and the zero byte ‘NUL’ terminator
                                            ; ‘string’ now refers to the starting address
                                            ; at which 'Hello world!' is stored.

segment .text
main:
        push    string          ; Push the address of ‘string’ onto the stack.
                                ; This reduces esp by 4 bytes before storing
                                ; the 4-byte address ‘string’ into memory at
                                ; the new esp, the new top of the stack.
                                ; This will be an argument to printf()
        call    printf          ; calls the C printf() function.
        add     esp, 4          ; Increases the stack-pointer by 4 to put it back
                                ; to where it was before the ‘push’, which
                                ; reduced it by 4 bytes.
        xor     eax, eax        ; main's return value (the exit status) is 0; otherwise eax
                                ; would still hold printf's return value
        ret                     ; Return to our caller.
</syntaxhighlight>
Because the C runtime is used, we define a main() function as the C runtime expects. Instead of calling exit, we simply return from the main function and let the runtime perform the clean-up.

==="Hello world!" program for 64-bit mode Linux in NASM style assembly===
This example is in modern 64-bit mode.
<syntaxhighlight lang="nasm">
; build: nasm -f elf64 -F dwarf hello.asm
; link:  ld -o hello hello.o

DEFAULT REL                     ; use RIP-relative addressing modes by default, so [foo] = [rel foo]

SECTION .rodata                 ; read-only data should go in the .rodata section on GNU/Linux, like .rdata on Windows
Hello:          db "Hello world!", 10   ; Ending with a byte 10 = newline (ASCII LF)
len_Hello:      equ $-Hello             ; Get NASM to calculate the length as an assembly-time constant
                                        ; the ‘$’ symbol means ‘here’. write() takes a length so that
                                        ; a zero-terminated C-style string isn't needed.
                                        ; C puts(), by contrast, would need one.

SECTION .text

global _start
_start:
        mov eax, 1              ; __NR_write syscall number from Linux asm/unistd_64.h (x86_64)
        mov edi, 1              ; int fd = STDOUT_FILENO
        lea rsi, [rel Hello]    ; x86-64 uses RIP-relative LEA to put static addresses into regs
        mov rdx, len_Hello      ; size_t count = len_Hello
        syscall                 ; write(1, Hello, len_Hello); call into the kernel to actually do the system call
     ;; return value in RAX. RCX and R11 are also overwritten by syscall

        mov eax, 60             ; __NR_exit call number (x86_64) is stored in register eax.
        xor edi, edi            ; This zeros edi and also rdi.
                                ; The xor-self idiom is the preferred way to zero a register:
                                ; it is short, and modern CPUs recognize it as independent
                                ; of the register's old value.
                                ; Writing any 32-bit register such as edi also zeroes bits 63:32
                                ; of the full 64-bit register, so no extra instruction is needed
                                ; to fill a 64-bit register with a 32-bit value.
                                ; This sets our routine’s exit status = 0 (exit normally)
        syscall                 ; _exit(0)
</syntaxhighlight>
Running it under <kbd>strace</kbd> verifies that no extra system calls are made in the process. The printf version would make many more system calls to initialize libc and do [[dynamic linking]]. This one, however, is a static executable, because we linked using ld without -pie or any shared libraries; the only instructions that run in user-space are the ones you provide.
<syntaxhighlight lang="console">
$ strace ./hello > /dev/null    # without a redirect, your program's stdout is mixed with strace's logging on stderr, which is normally fine
execve("./hello", ["./hello"], 0x7ffc8b0b3570 /* 51 vars */) = 0
write(1, "Hello world!\n", 13)          = 13
exit(0)                                 = ?
+++ exited with 0 +++
</syntaxhighlight>

===Using the flags register===
Flags are heavily used for comparisons in the x86 architecture. When two operands are compared, the CPU sets the relevant flag or flags.
Conditional jump instructions can then test the flags and branch to the code that should run, e.g.:
<syntaxhighlight lang="nasm">
        cmp     eax, ebx
        jne     do_something
        ; ...
do_something:
        ; do something here
</syntaxhighlight>
Besides compare instructions, a great many arithmetic and other instructions set bits in the flags register; examples include sub, test and add. Common combinations such as cmp + conditional jump are internally ‘fused’ (‘[[macro fusion]]’) into a single [[micro-instruction]] (μ-op). They are fast provided the processor correctly predicts which way the conditional jump will go (jump or fall through).

The flags register is also used in the x86 architecture to turn certain features or execution modes on and off. For example, code running with sufficient privilege, such as an operating system kernel, can disable all maskable interrupts by clearing the interrupt flag with the instruction:
<syntaxhighlight lang="asm">
cli
</syntaxhighlight>
The flags register can also be accessed directly. The low 8 bits of the flags register can be loaded into <code>ah</code> using the <code>lahf</code> instruction. The entire flags register can also be moved on and off the stack using the instructions <code>pushfd/pushfq</code>, <code>popfd/popfq</code>, <code>int</code> (including <code>into</code>) and <code>iret</code>.

The x87 floating point maths subsystem also has its own independent ‘flags’-type register, the FP status word. On early processors, accessing its condition bits was an awkward and slow procedure (typically <code>fstsw ax</code> followed by <code>sahf</code>). Since the Pentium Pro, however, there are ‘compare two floating point values’ instructions (<code>fcomi</code>, <code>fucomi</code>) that set the normal flags directly, so the normal conditional jump/branch instructions can be used without any intervening steps.

===Using the instruction pointer register===
The [[instruction pointer]] is called <code>ip</code> in 16-bit mode, <code>eip</code> in 32-bit mode, and <code>rip</code> in 64-bit mode.
The instruction pointer register holds the address of the next instruction that the processor will attempt to execute. It cannot be directly accessed in 16-bit or 32-bit mode, but a sequence like the following can be written to put the address of <code>next_line</code> into <code>eax</code> (32-bit code). This works because <code>call</code> pushes the address of the instruction that follows it (here, <code>next_line</code>) as its return address, and <code>pop</code> then moves that value into <code>eax</code>:
<syntaxhighlight lang="asm">
call next_line
next_line:
pop eax
</syntaxhighlight>
Writing to the instruction pointer is simple: a <code>jmp</code> instruction stores the given target address into the instruction pointer. For example, the following instruction puts the contents of <code>rax</code> into <code>rip</code> (64-bit code):
<syntaxhighlight lang="asm">
jmp rax
</syntaxhighlight>
In 64-bit mode, instructions can reference data relative to the instruction pointer (RIP-relative addressing), so there is less need to copy the value of the instruction pointer to another register.
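The <code>call</code>/<code>pop</code> idiom relies only on <code>call</code> recording the address of the following instruction. As a rough C analogue, assuming GCC or Clang, the compiler builtin <code>__builtin_return_address(0)</code> reads that recorded return address from inside the called function. On x86 this is the value <code>call</code> pushed; other architectures may keep it in a link register instead. The function names below are hypothetical. The empty <code>asm</code> statements exist only to stop the optimizer from removing the call or turning it into a tail jump, either of which would change the recorded address.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns the address that the caller's `call`
   recorded as the return address, i.e. the address of the instruction
   just after the call site. Conceptually this is `pop eax` right after
   `call next_line`, except that the value is read rather than popped. */
__attribute__((noinline)) static void *next_instruction_of_caller(void) {
    __asm__ volatile("" ::: "memory"); /* side effect: keeps the call from being optimized away */
    return __builtin_return_address(0);
}

/* Hypothetical caller: the value it gets back points into its own code. */
__attribute__((noinline)) void *where_am_i(void) {
    void *here = next_instruction_of_caller();
    __asm__ volatile("" ::: "memory"); /* runs after the call, so the call cannot become a tail jump */
    return here;
}
```

The address returned by <code>where_am_i()</code> lies a few bytes past the start of <code>where_am_i</code> itself, just after its call instruction. In 64-bit assembly the same kind of address can be produced in one instruction with RIP-relative <code>lea</code>.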