{{Short description|Small piece of code used as a payload to exploit a software vulnerability}}
{{Redirect|Shell code|code written in a shell's command language|Shell script}}
{{Redirect|Alphanumeric executable|executable code presented in hexadecimal format|Hex file (disambiguation)}}

In [[hacker (computer security)|hacking]], a '''shellcode''' is a small piece of code used as the [[Payload (computing)|payload]] in the [[exploit (computer security)|exploitation]] of a software [[Vulnerability (computing)|vulnerability]].  It is called "shellcode" because it typically starts a [[Shell (computing)|command shell]] from which the attacker can control the compromised machine, but any piece of code that performs a similar task can be called shellcode.  Because the function of a payload is not limited to merely spawning a shell, some have suggested that the name shellcode is insufficient.<ref>{{cite book 
|title=Sockets, Shellcode, Porting, & Coding: Reverse Engineering Exploits and Tool Coding for Security Professionals 
|author-first1=James C. |author-last1=Foster |author-first2=Mike |author-last2=Price |publisher=Elsevier Science & Technology Books |date=2005-04-12 |isbn=1-59749-005-9 |url=https://books.google.com/books?id=ZNI5dvBSfZoC}}</ref> However, attempts at replacing the term have not gained wide acceptance. Shellcode is commonly written in [[machine code]].

When creating shellcode, it is generally desirable to make it both small and executable, which allows it to be used in as wide a variety of situations as possible.<ref name="anley_koziol_2007">{{Cite book |title=The shellcoder's handbook: discovering and exploiting security holes |date=2007 |publisher=Wiley |author-first1=Chris |author-last1=Anley |author-first2=Jack |author-last2=Koziol |isbn=978-0-470-19882-7 |edition=2 |location=Indianapolis, Indiana, USA |oclc=173682537}}</ref> In assembly code, the same function can be performed in a multitude of ways, and the opcodes that can be used for this purpose vary in length; good shellcode writers choose the shorter opcodes to create more compact shellcode.<ref>{{Cite book |title=Buffer overflow attacks: detect, exploit, prevent |date=2005 |publisher=Syngress |author-last=Foster |author-first=James C. |isbn=1-59749-022-9 |location=Rockland, MA, USA |oclc=57566682}}</ref> Some have reached the smallest possible size while maintaining stability.<ref>{{Cite web |title=Tiny Execve sh - Assembly Language - Linux/x86 |url=https://github.com/geyslan/SLAE/blob/master/4th.assignment/tiny_execve_sh.asm |access-date=2021-02-01 |website=GitHub}}</ref>

== Types of shellcode ==
Shellcode can either be ''local'' or ''remote'', depending on whether it gives an attacker control over the machine it runs on (local) or over another machine through a network (remote).

=== Local ===
''Local'' shellcode is used by an attacker who has limited access to a machine but can exploit a vulnerability, for example a [[buffer overflow]], in a higher-privileged process on that machine. If successfully executed, the shellcode will provide the attacker access to the machine with the same higher privileges as the targeted process.

=== Remote ===
''Remote'' shellcode is used when an attacker wants to target a vulnerable process running on another machine on a [[local area network|local network]], [[intranet]], or a [[internet|remote network]]. If successfully executed, the shellcode can provide the attacker access to the target machine across the network. Remote shellcodes normally use standard [[Internet protocol suite|TCP/IP]] [[Stream socket|socket]] connections to allow the attacker access to the shell on the target machine. Such shellcode can be categorized based on how this connection is set up: if the shellcode establishes the connection, it is called a ''reverse shell'' or ''connect-back'' shellcode, because the shellcode ''connects back'' to the attacker's machine. On the other hand, if the attacker establishes the connection, the shellcode is called a ''bindshell'' because the shellcode ''binds'' to a certain port on the victim's machine. A variant named ''bindshell random port'' skips the binding step and listens on a random port made available by the [[operating system]]. Because it omits that step, the ''[https://github.com/geyslan/SLAE/blob/master/improvements/tiny_shell_bind_tcp_random_port_x86_64.asm bindshell random port]'' has been presented as the smallest stable bindshell shellcode for [[X86-64|x86_64]]. A third, much less common type is ''socket-reuse'' shellcode. This type of shellcode is sometimes used when an exploit establishes a connection to the vulnerable process that is not closed before the shellcode is run. The shellcode can then ''re-use'' this connection to communicate with the attacker. Socket re-using shellcode is more elaborate, since the shellcode needs to find out which connection to re-use and the machine may have many connections open.<ref>{{cite web |url=http://www.blackhatlibrary.net/Shellcode/Socket-reuse |title=Shellcode/Socket-reuse |author=BHA |date=2013-06-06 |access-date=2013-06-07}}</ref>
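The operating-system behaviour that the ''random port'' variant relies on can be observed without any shellcode. The following is a minimal sketch in Python, assuming a Linux-like TCP stack in which calling <code>listen</code> on an unbound socket makes the kernel assign a free ephemeral port; the function name <code>listen_without_bind</code> is illustrative only, and other platforms may reject the call.

```python
import socket


def listen_without_bind() -> int:
    """Listen on an unbound TCP socket and return the port the OS picked.

    Assumption: the TCP stack auto-binds an unbound socket to an
    ephemeral port when listen() is called (as Linux does).
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.listen(1)                   # no bind() call beforehand
        _addr, port = sock.getsockname() # ask the OS which port it chose
        return port


if __name__ == "__main__":
    print("OS-assigned port:", listen_without_bind())
```

Because the port is not known in advance, whoever wants to connect has to discover it, for example by scanning; that is the price paid for leaving out the bind step.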

[[Firewall (computer)|Firewalls]] can be used to detect outgoing connections made by connect-back shellcode as well as incoming connections made by bindshells. They can, therefore, offer some protection against an attacker, even if the system is vulnerable, by preventing the attacker from connecting to the shell created by the shellcode. One reason why socket re-using shellcode is sometimes used is that it does not create new connections and, therefore, is harder to detect and block.

=== Download and execute ===
''Download and execute'' is a type of remote shellcode that ''[[downloads]] and [[Execution (computers)|executes]]'' some form of malware on the target system. This type of shellcode does not spawn a shell, but rather instructs the machine to download a certain executable file off the network, save it to disk and execute it. Nowadays, it is commonly used in [[drive-by download]] attacks, where a victim visits a malicious webpage that in turn attempts to run such a download and execute shellcode in order to install software on the victim's machine. A variation of this type of shellcode downloads and [[Dynamic loading|loads]] a [[Library (computing)|library]].<ref>{{cite web |url=http://skypher.com/index.php/2010/01/11/download-and-loadlibrary-shellcode-released/ |title=Download and LoadLibrary shellcode released |author=SkyLined |date=2010-01-11 |access-date=2010-01-19 |url-status=dead |archive-url=https://web.archive.org/web/20100123014637/http://skypher.com/index.php/2010/01/11/download-and-loadlibrary-shellcode-released/ |archive-date=2010-01-23}}</ref><ref>{{cite web |url=http://code.google.com/p/w32-dl-loadlib-shellcode/ |title=Download and LoadLibrary shellcode for x86 Windows |date=2010-01-11 |access-date=2010-01-19}}</ref> Advantages of this technique are that the code can be smaller, that it does not require the shellcode to spawn a new process on the target system, and that the shellcode does not need code to clean up the targeted process as this can be done by the library loaded into the process.

=== Staged ===
When the amount of data that an attacker can inject into the target process is too limited to execute useful shellcode directly, it may be possible to execute it in stages. First, a small piece of shellcode (stage 1) is executed. This code then downloads a larger piece of shellcode (stage 2) into the process's memory and executes it.

=== Egg-hunt ===
This is another form of ''staged'' shellcode, which is used if an attacker can inject a larger shellcode into the process but cannot determine where in the process it will end up. Small ''egg-hunt'' shellcode is injected into the process at a predictable location and executed. This code then searches the process's address space for the larger shellcode (the ''egg'') and executes it.<ref>{{cite web |url=http://www.hick.org/code/skape/papers/egghunt-shellcode.pdf 
|title=Safely Searching Process Virtual Address Space 
|author=Skape |publisher=nologin |date=2004-03-09 |access-date=2009-03-19}}</ref>
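The search step can be illustrated with a harmless simulation over a byte buffer instead of a real address space. This sketch assumes the common convention that the egg is prefixed with the tag repeated twice, so that the hunter, which itself contains one copy of the tag, does not find itself; the tag value and the name <code>find_egg</code> are hypothetical.

```python
from typing import Optional


def find_egg(memory: bytes, tag: bytes) -> Optional[int]:
    """Return the offset just past the doubled tag, or None if absent.

    A real egg hunter walks virtual memory and must skip unmapped pages;
    here 'memory' is simply a bytes object, so no such checks are needed.
    """
    marker = tag + tag                  # doubled tag marks the real egg
    pos = memory.find(marker)
    if pos == -1:
        return None
    return pos + len(marker)            # the egg body starts after the marker


# Simulated address space: a lone tag (as found inside the hunter itself),
# some filler, then the doubled tag followed by the egg body.
TAG = b"W00T"
MEMORY = b"\x00" * 16 + TAG + b"\x11" * 8 + TAG + TAG + b"EGG-BODY"

if __name__ == "__main__":
    start = find_egg(MEMORY, TAG)
    print(start, MEMORY[start:])
```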

=== Omelette ===
This type of shellcode is similar to ''egg-hunt'' shellcode, but looks for multiple small blocks of data (''eggs'') and recombines them into one larger block (the ''omelette'') that is subsequently executed. This is used when an attacker can only inject a number of small blocks of data into the process.<ref>{{cite web |url=http://skypher.com/wiki/index.php?title=Shellcode/w32_SEH_omelet_shellcode |title=w32 SEH omelet shellcode |author=SkyLined |publisher=Skypher.com |date=2009-03-16 |access-date=2009-03-19 |url-status=dead |archive-url=https://web.archive.org/web/20090323030636/http://skypher.com/wiki/index.php?title=Shellcode%2Fw32_SEH_omelet_shellcode |archive-date=2009-03-23}}</ref>
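The recombination step can be sketched as a simulation over a byte buffer. The block layout used here (a marker, a one-byte index, a one-byte length, then the data) is an assumption made for illustration and is not taken from any particular omelette implementation; <code>make_egg</code> and <code>cook_omelette</code> are hypothetical names.

```python
def make_egg(marker: bytes, index: int, data: bytes) -> bytes:
    """Build one block: marker, 1-byte index, 1-byte length, data."""
    return marker + bytes([index, len(data)]) + data


def cook_omelette(memory: bytes, marker: bytes) -> bytes:
    """Find every marked block in 'memory' and join them in index order."""
    eggs = {}
    pos = memory.find(marker)
    while pos != -1:
        header_end = pos + len(marker) + 2
        if header_end <= len(memory):
            index = memory[pos + len(marker)]
            length = memory[pos + len(marker) + 1]
            eggs[index] = memory[header_end:header_end + length]
        pos = memory.find(marker, pos + 1)   # keep scanning for more blocks
    return b"".join(eggs[i] for i in sorted(eggs))


MARKER = b"EGG"
# Blocks are scattered out of order between filler bytes.
MEMORY = (
    b"\xcc" * 5 + make_egg(MARKER, 1, b"lo wor")
    + b"\xcc" * 9 + make_egg(MARKER, 0, b"hel")
    + b"\xcc" * 3 + make_egg(MARKER, 2, b"ld")
)

if __name__ == "__main__":
    print(cook_omelette(MEMORY, MARKER))
```

The sketch does not guard against the marker appearing inside block data; a real scheme has to choose a marker that cannot occur there.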

==Shellcode execution strategy==
An exploit will commonly inject a shellcode into the target process before or at the same time as it exploits a vulnerability to gain control over the [[program counter]]. The program counter is adjusted to point to the shellcode, after which it gets executed and performs its task. Injecting the shellcode is often done by storing the shellcode in data sent over the network to the vulnerable process, by supplying it in a file that is read by the vulnerable process or through the command line or environment in the case of local exploits.

==Shellcode encoding==
Because most processes filter or restrict the data that can be injected, shellcode often needs to be written to allow for these restrictions. This includes making the code small, null-free or [[alphanumeric code|alphanumeric]]. Various solutions have been found to get around such restrictions, including:
* Design and implementation optimizations to decrease the size of the shellcode.
* Implementation modifications to get around limitations in the range of bytes used in the shellcode.
* [[Self-modifying code]] that modifies a number of the bytes of its own code before executing them to re-create bytes that are normally impossible to inject into the process.

Since [[intrusion detection]] can detect signatures of simple shellcodes being sent over the network, shellcode is often encoded, made self-decrypting or [[polymorphic code|polymorphic]] to avoid detection.
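One of the simplest encodings of this kind is a single-byte XOR, shown here only on the data side. The sketch below assumes the restriction is a list of forbidden byte values and searches for a key under which the encoded bytes avoid all of them; the function names are illustrative, and the decoder stub that would undo the XOR at run time is deliberately not shown.

```python
from typing import Optional


def find_xor_key(data: bytes, bad_bytes: bytes) -> Optional[int]:
    """Return the smallest key whose XOR output avoids every bad byte."""
    for key in range(1, 256):
        if key in bad_bytes:            # the key itself must be injectable
            continue
        if all((b ^ key) not in bad_bytes for b in data):
            return key
    return None                         # no single-byte key works


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with 'key'; applying it twice restores the input."""
    return bytes(b ^ key for b in data)


if __name__ == "__main__":
    original = bytes.fromhex("B801000000")   # contains null bytes
    key = find_xor_key(original, b"\x00")
    encoded = xor_bytes(original, key)
    print(key, encoded.hex(), xor_bytes(encoded, key) == original)
```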

===Percent encoding===
Exploits that target browsers commonly encode shellcode in a JavaScript string using [[percent-encoding]], escape sequence encoding "{{mono|\uXXXX}}" or [[Character encodings in HTML|entity encoding]].<ref>{{cite web |url=http://www.iss.net/security_center/reference/vuln/JavaScript_Large_Unescape.htm |title=JavaScript large number of unescape patterns detected |archive-url=https://web.archive.org/web/20150403203325/http://www.iss.net/security_center/reference/vuln/JavaScript_Large_Unescape.htm |archive-date=2015-04-03 |url-status=dead}}</ref> Some exploits also obfuscate the encoded shellcode string further to prevent detection by [[intrusion detection|IDS]].

For example, on the [[IA-32]] architecture, two <code>[[NOP (code)|NOP]]</code> (no-operation) instructions look as follows when unencoded:
 90             NOP
 90             NOP

{|class=wikitable
|+ Encoded double-NOPs:
|-
! scope=row | percent-encoding
| {{code|unescape("%u9090")}}
|-
! scope=row | Unicode literal
| {{code|"\u9090"}}
|-
! scope=row | HTML/XML entity
| {{code|"&#x9090;"}} or {{code|"&#37008;"}}
|}

This instruction is used in [[NOP slide]]s.
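The encodings in the table can be reproduced with a short sketch. It assumes that each pair of bytes is combined into one 16-bit code unit in little-endian order, which is why the two bytes <code>90 90</code> become the single unit <code>9090</code>; the function names are illustrative and the input must have an even length.

```python
from typing import List


def code_units(data: bytes) -> List[int]:
    """Pair bytes into 16-bit little-endian code units."""
    if len(data) % 2:
        raise ValueError("need an even number of bytes")
    return [data[i] | (data[i + 1] << 8) for i in range(0, len(data), 2)]


def percent_u(data: bytes) -> str:
    """Format as the %uXXXX sequences accepted by JavaScript unescape()."""
    return "".join(f"%u{unit:04X}" for unit in code_units(data))


def html_entities(data: bytes) -> str:
    """Format as hexadecimal HTML/XML character references."""
    return "".join(f"&#x{unit:04X};" for unit in code_units(data))


if __name__ == "__main__":
    two_nops = b"\x90\x90"
    print(percent_u(two_nops))
    print(html_entities(two_nops))
```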

===Null-free shellcode===
Most shellcodes are written without the use of [[Null character|null]] bytes because they are intended to be injected into a target process through [[null-terminated string]]s. When a null-terminated string is copied, it will be copied up to and including the first null but subsequent bytes of the shellcode will not be processed. When shellcode that contains nulls is injected in this way, only part of the shellcode would be injected, making it incapable of running successfully.

To produce null-free shellcode from shellcode that contains [[Null character|null]] bytes, one can substitute machine instructions that contain zeroes with instructions that have the same effect but are free of nulls. For example, on the [[IA-32]] architecture one could replace this instruction:
 B8 01000000    [[MOV (x86 instruction)|MOV]] EAX,1          // Set the register EAX to 0x00000001
which contains zeroes as part of the literal (<code>1</code> expands to <code>0x00000001</code>) with these instructions:
 33C0           [[XOR (x86 instruction)|XOR]] EAX,EAX        // Set the register EAX to 0x00000000
 40             [[INC (x86 instruction)|INC]] EAX            // Increase EAX to 0x00000001
which have the same effect but take fewer bytes to encode and are free of nulls.
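The effect of a null byte on a string copy can be checked with a small simulation. The sketch below models a C-style string copy by cutting the data at the first zero byte (the terminator itself is not returned) and applies it to the two encodings shown above; <code>copy_like_strcpy</code> is a hypothetical helper, not a real library function.

```python
def copy_like_strcpy(data: bytes) -> bytes:
    """Return the bytes a null-terminated string copy would carry over."""
    end = data.find(b"\x00")
    return data if end == -1 else data[:end]


MOV_EAX_1 = bytes.fromhex("B801000000")   # MOV EAX,1
XOR_INC = bytes.fromhex("33C040")         # XOR EAX,EAX ; INC EAX

if __name__ == "__main__":
    for name, code in (("MOV EAX,1", MOV_EAX_1), ("XOR/INC", XOR_INC)):
        copied = copy_like_strcpy(code)
        print(name, len(code), "bytes,", len(copied), "survive the copy")
```

Only the first two bytes of the <code>MOV</code> encoding survive, whereas the three-byte replacement is copied in full.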

==={{anchor|Alphanumeric|Multi-architecture}}Alphanumeric and printable shellcode===
An '''alphanumeric shellcode''' is a shellcode that consists of or assembles itself on execution into entirely [[alphanumeric]] [[ASCII]] or [[Unicode]] characters such as 0–9, A–Z and a–z.<ref name="Rix_2001">{{cite journal |title=Writing ia32 alphanumeric shellcodes |author=rix |volume=0x0b |issue=57 |id=#0x0f of 0x12 |journal=Phrack |publisher=Phrack Inc. |date=2001-08-11 |url=http://www.phrack.org/issues.html?issue=57&id=15#article |access-date=2022-05-26 |url-status=live |archive-url=https://web.archive.org/web/20220308045645/http://phrack.org/issues/57/15.html#article |archive-date=2022-03-08}}</ref><ref name="Obscou_2003">{{cite journal |title=Building IA32 'Unicode-Proof' Shellcodes |author=obscou |date=2003-08-13 |volume=11 |issue=61 |id=#0x0b of 0x0f |journal=Phrack |publisher=Phrack Inc. |url=http://www.phrack.org/issues.html?issue=61&id=11#article |access-date=2008-02-29 |url-status=live |archive-url=https://web.archive.org/web/20220526165740/http://phrack.org/issues/61/11.html#article |archive-date=2022-05-26}}</ref> This type of encoding was created by [[Hacker (computer security)|hacker]]s to hide working [[machine code]] inside what appears to be text. This can be useful to avoid detection of the code and to allow the code to pass through filters that scrub non-alphanumeric characters from strings (in part, such filters were a response to non-alphanumeric shellcode exploits). A similar type of encoding is called ''printable code'' and uses all [[printable character|printable]] characters (0–9, A–Z, a–z, !@#%^&*() etc.). A similarly restricted variant is ''ECHOable code'', which contains only characters that are accepted by the [[ECHO (command)|ECHO]] command.
It has been shown that it is possible to create shellcode that looks like normal text in English.<ref name="Mason-Small-Monrose-MacManus_2009">{{cite conference |title=English Shellcode |author-first1=Joshua |author-last1=Mason |author-first2=Sam |author-last2=Small |author-first3=Fabian |author-last3=Monrose |author-first4=Greg |author-last4=MacManus |date=November 2009 |conference=Proceedings of the 16th ACM conference on Computer and Communications Security |location=New York, NY, USA |pages=524–533 |url=http://www.cs.jhu.edu/~sam/ccs243-mason.pdf |access-date=2010-01-10 |url-status=live |archive-url=https://web.archive.org/web/20220526164459/https://www.cs.jhu.edu/~sam/ccs243-mason.pdf |archive-date=2022-05-26}} (10 pages)</ref>
Writing alphanumeric or printable code requires good understanding of the [[instruction set architecture]] of the machine(s) on which the code is to be executed. It has been demonstrated that it is possible to write alphanumeric code that is executable on more than one machine,<ref>{{cite web |title=Multi-architecture (x86) and 64-bit alphanumeric shellcode explained |publisher=Blackhat Academy |url=http://www.blackhatlibrary.net/Alphanumeric_shellcode |url-status=dead |archive-url=https://web.archive.org/web/20120621124443/http://www.blackhatlibrary.net/Alphanumeric_shellcode |archive-date=2012-06-21}}</ref> thereby constituting [[multi-architecture executable]] code.

In certain circumstances, a target process will filter any byte from the injected shellcode that is not a [[printable character|printable]] or [[alphanumeric]] character. Under such circumstances, the range of instructions that can be used to write a shellcode becomes very limited. A solution to this problem was published by Rix in [[Phrack]] 57<ref name="Rix_2001"/> in which he showed it was possible to turn any code into alphanumeric code. A technique often used is to create self-modifying code, because this allows the code to modify its own bytes to include bytes outside of the normally allowed range, thereby expanding the range of instructions it can use. Using this trick, a self-modifying decoder can be created that initially uses only bytes in the allowed range. The main code of the shellcode is encoded, also only using bytes in the allowed range. When the output shellcode is run, the decoder can modify its own code to be able to use any instruction it requires to function properly and then continues to decode the original shellcode. After decoding the shellcode the decoder transfers control to it, so it can be executed as normal. It has been shown that it is possible to create arbitrarily complex shellcode that looks like normal text in English.<ref name="Mason-Small-Monrose-MacManus_2009"/>
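Whether a given byte string would pass such a filter can be tested mechanically. This is a minimal sketch, assuming that "alphanumeric" means the ASCII characters 0–9, A–Z and a–z and that "printable" means the visible ASCII range 0x21 to 0x7E; real filters may draw the boundaries differently, and <code>classify</code> is an illustrative name.

```python
import string

# Assumed character classes; actual filters vary.
ALNUM = set((string.ascii_letters + string.digits).encode("ascii"))
PRINTABLE = set(range(0x21, 0x7F))


def classify(data: bytes) -> str:
    """Name the most restrictive class that every byte of 'data' fits."""
    if data and all(b in ALNUM for b in data):
        return "alphanumeric"
    if data and all(b in PRINTABLE for b in data):
        return "printable"
    return "unrestricted"


if __name__ == "__main__":
    for sample in (b"PYIIII", b"%u9090", bytes.fromhex("33C040")):
        print(sample, classify(sample))
```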

===Unicode proof shellcode===
Modern programs use [[Unicode]] strings to allow internationalization of text. Often, these programs will convert incoming [[ASCII]] strings to Unicode before processing them. Unicode strings encoded in [[UTF-16]] use two bytes to encode each character (or four bytes for some special characters). When an [[ASCII]] (or, more generally, [[Latin-1]]) string is transformed into little-endian UTF-16, a zero byte is inserted after each byte in the original string. Obscou proved in [[Phrack]] 61<ref name="Obscou_2003"/> that it is possible to write shellcode that can run successfully after this transformation. Programs that can automatically encode any shellcode into alphanumeric UTF-16-proof shellcode exist, based on the same principle of a small self-modifying decoder that decodes the original shellcode.
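The transformation itself is easy to reproduce. The sketch below assumes the input is interpreted as Latin-1 and the output is little-endian UTF-16; a program that converts through a different code page may map bytes in the range 0x80 to 0x9F differently. The name <code>widen</code> is illustrative.

```python
def widen(data: bytes) -> bytes:
    """Convert a Latin-1 byte string to UTF-16LE, as a widening copy would."""
    return data.decode("latin-1").encode("utf-16-le")


if __name__ == "__main__":
    for sample in (b"AB", bytes.fromhex("33C040")):
        print(sample.hex(), "->", widen(sample).hex())
```

Every original byte ends up followed by a zero byte, so instructions that were adjacent before the conversion are no longer adjacent afterwards.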

==Platforms==
Most shellcode is written in [[machine code]] because of the low level at which the vulnerability being exploited gives an attacker access to the process. Shellcode is therefore often created to target one specific combination of [[Central processing unit|processor]], [[operating system]] and [[service pack]], called a [[Platform (computing)|platform]]. For some exploits, due to the constraints put on the shellcode by the target process, a very specific shellcode must be created. However, it is not impossible for one shellcode to work for multiple exploits, service packs, operating systems and even processors.<ref name="Eugene_2001">{{cite web |title=Architecture Spanning Shellcode |author=eugene |publisher=Phrack Inc. |work=Phrack |date=2001-08-11 |volume=0x0b |issue=57 |id=#0x0e of 0x12 |url=http://www.phrack.org/issues.html?issue=57&id=14#article |access-date=2008-02-29 |url-status=live |archive-url=https://web.archive.org/web/20211109173710/http://phrack.org/issues/57/14.html#article |archive-date=2021-11-09}}</ref><ref name="Nemo_2005">{{cite web |title=OSX - Multi arch shellcode. 
|author=nemo |work=[[Full disclosure (mailing list)|Full disclosure]] |date=2005-11-13 |url=https://seclists.org/fulldisclosure/2005/Nov/387 |access-date=2022-05-26 |url-status=live |archive-url=https://web.archive.org/web/20220526191616/https://seclists.org/fulldisclosure/2005/Nov/387 |archive-date=2022-05-26}}</ref><ref name="Cha-Pak-Brumley-Lipton_2010">{{cite conference |title=Platform-Independent Programs |author-first1=Sang Kil |author-last1=Cha |author-first2=Brian |author-last2=Pak |author-first3=David |author-last3=Brumley |author-link3=David Brumley |author-first4=Richard Jay |author-last4=Lipton |author-link4=Richard Jay Lipton |conference=Proceedings of the 17th ACM conference on Computer and Communications Security (CCS'10) |location=Chicago, Illinois, USA |date=2010-10-08 |orig-date=2010-10-04 |publisher=[[Carnegie Mellon University]], Pittsburgh, Pennsylvania, USA / [[Georgia Institute of Technology]], Atlanta, Georgia, USA |isbn=978-1-4503-0244-9 |doi=10.1145/1866307.1866369 |pages=547–558 |url=https://softsec.kaist.ac.kr/~sangkilc/papers/cha-ccs10.pdf |access-date=2022-05-26 |url-status=live |archive-url=https://web.archive.org/web/20220526153147/https://softsec.kaist.ac.kr/~sangkilc/papers/cha-ccs10.pdf |archive-date=2022-05-26}} [https://web.archive.org/web/20220526182333/http://users.ece.cmu.edu/~sangkilc/papers/ccs10-cha.pdf] (12 pages) (See also: [https://security.ece.cmu.edu/pip/index.html])</ref> Such versatility is commonly achieved by creating multiple versions of the shellcode that target the various platforms and creating a header that branches to the correct version for the platform the code is running on. When executed, the code behaves differently for different platforms and executes the right part of the shellcode for the platform it is running on.

==Shellcode analysis==
Shellcode is not a standalone executable and cannot be run directly; to analyze what it attempts to do, it must be loaded into another process. One common analysis technique is to write a small C program that holds the shellcode as a byte buffer and then uses a function pointer or inline assembler to transfer execution to it. Another technique is to use an online tool, such as shellcode_2_exe, to embed the shellcode into a pre-made executable husk, which can then be analyzed in a standard debugger. Specialized shellcode analysis tools also exist, such as the iDefense sclog project, which was originally released in 2005 as part of the Malcode Analyst Pack. Sclog is designed to load external shellcode files and execute them within an API logging framework. Emulation-based shellcode analysis tools also exist, such as the {{Mono|sctest}} application, which is part of the cross-platform libemu package. Another emulation-based shellcode analysis tool, built around the libemu library, is {{Mono|scdbg}}, which includes a basic debug shell and integrated reporting features.

==See also==
* [[Alphanumeric code]]
* [[Computer security]]
* [[Buffer overflow]]
* [[Exploit (computer security)]]
* [[Heap overflow]]
* [[Metasploit Project]]
* [[Shell (computing)]]
* [[Shell shoveling]]
* [[Stack buffer overflow]]
* [[Vulnerability (computing)]]

==References==
{{reflist}}

==External links==
* [http://www.shell-storm.org/shellcode/ Shell-Storm] A multi-platform database of shellcodes.
* [http://www.phrack.org/issues.html?issue=49&id=14#article An introduction to buffer overflows and shellcode]
* [http://www.infosecwriters.com/text_resources/pdf/basics_of_shellcoding.pdf The Basics of Shellcoding] (PDF) An overview of [[x86]] shellcoding by [http://www.rosiello.org/ Angelo Rosiello]
* [https://web.archive.org/web/20120109070051/http://goodfellas.shellcode.com.ar/docz/bof/Writing_shellcode.html An introduction to shellcode development]
* [https://web.archive.org/web/20080302111910/http://www.metasploit.com/shellcode/ Contains x86 and non-x86 shellcode samples and an online interface for automatic shellcode generation and encoding, from the Metasploit Project]
* [https://web.archive.org/web/20060619025456/http://www.linux-secure.com/endymion/shellcodes/ A shellcode archive, sorted by operating system].
* [https://web.archive.org/web/20061112203748/http://www.milw0rm.com/papers/11 Microsoft Windows and Linux shellcode design tutorial going from basic to advanced].
* [http://www.vividmachines.com/shellcode/shellcode.html Windows and Linux shellcode tutorial containing step by step examples].
* {{usurped|1=[https://web.archive.org/web/20210322094322/http://www.enderunix.org/docs/en/sc-en.txt Designing shellcode demystified]}}
* [http://code.google.com/p/alpha3/ ALPHA3] A shellcode encoder that can turn any shellcode into both Unicode and ASCII, uppercase and mixedcase, alphanumeric shellcode.
* [https://web.archive.org/web/20061115040739/http://www.ngssoftware.com/research/papers/WritingSmallShellcode.pdf Writing Small shellcode by Dafydd Stuttard] A whitepaper explaining how to make shellcode as small as possible by optimizing both the design and implementation.
* [http://skypher.com/wiki/index.php?title=Www.edup.tudelft.nl/~bjwever/whitepaper_shellcode.html.php Writing IA32 Restricted Instruction Set Shellcode Decoder Loops by SkyLined] {{Webarchive|url=https://web.archive.org/web/20150403114315/http://skypher.com/wiki/index.php?title=Www.edup.tudelft.nl%2F~bjwever%2Fwhitepaper_shellcode.html.php |date=2015-04-03}} A whitepaper explaining how to create shellcode when the bytes allowed in the shellcode are very restricted.
* [http://code.google.com/p/beta3/ BETA3] A tool that can encode and decode shellcode using a variety of encodings commonly used in exploits.
* [http://sandsprite.com/shellcode_2_exe.php Shellcode 2 Exe] - Online converter to embed shellcode in exe husk
* [https://github.com/dzzie/sclog Sclog] - Updated build of the iDefense sclog shellcode analysis tool (Windows)
* [https://archive.today/20130219020328/http://libemu.carnivore.it/ Libemu] - emulation based shellcode analysis library (*nix/Cygwin)
* [http://sandsprite.com/blogs/index.php?uid=7&pid=152 Scdbg] - shellcode debugger built around libemu emulation library (*nix/Windows)

{{Information security}}

[[Category:Injection exploits]]