
==Shellcode encoding==
Because most processes filter or restrict the data that can be injected, shellcode often needs to be written to allow for these restrictions. This includes making the code small, null-free or [[alphanumeric code|alphanumeric]]. Various solutions have been found to get around such restrictions, including:
* Design and implementation optimizations to decrease the size of the shellcode.
* Implementation modifications to get around limitations in the range of bytes used in the shellcode.
* [[Self-modifying code]] that modifies a number of the bytes of its own code before executing them to re-create bytes that are normally impossible to inject into the process.
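
As a rough illustration of the last point, the following Python sketch searches for a one-byte XOR key that removes a forbidden byte value from a byte sequence, and then restores the original bytes. It models only the data transformation; in real shellcode the restoring step is performed by machine code, which is not shown. The helper names <code>xor_bytes</code> and <code>find_key</code> and the choice of XOR are assumptions made for the example, not a method described by the sources.

```python
from typing import Iterable, Optional

def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with a one-byte key; applying it twice restores the input."""
    return bytes(b ^ key for b in data)

def find_key(payload: bytes, forbidden: Iterable[int]) -> Optional[int]:
    """Return the smallest key whose encoded output avoids all forbidden bytes."""
    banned = set(forbidden)
    for key in range(1, 256):
        if key in banned:
            continue  # assumption: the key is injected too, so it must be allowed
        if banned.isdisjoint(xor_bytes(payload, key)):
            return key
    return None  # no single-byte key works for this payload

payload = bytes.fromhex("b801000000")      # example bytes containing three nulls
key = find_key(payload, forbidden=[0x00])
encoded = xor_bytes(payload, key)
print(hex(key), encoded.hex())             # 0x2 ba03020202
print(xor_bytes(encoded, key) == payload)  # True
```

A single-byte key cannot satisfy every restriction; when <code>find_key</code> returns <code>None</code>, a different encoding scheme would be needed.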

Since [[intrusion detection]] systems can detect signatures of simple shellcode sent over the network, shellcode is often encoded, made self-decrypting or [[polymorphic code|polymorphic]] to avoid detection.

===Percent encoding===
Exploits that target browsers commonly encode shellcode in a JavaScript string using [[percent-encoding]], escape sequence encoding "{{mono|\uXXXX}}" or [[Character encodings in HTML|entity encoding]].<ref>{{cite web |url=http://www.iss.net/security_center/reference/vuln/JavaScript_Large_Unescape.htm |title=JavaScript large number of unescape patterns detected |archive-url=https://web.archive.org/web/20150403203325/http://www.iss.net/security_center/reference/vuln/JavaScript_Large_Unescape.htm |archive-date=2015-04-03 |url-status=dead}}</ref> Some exploits also obfuscate the encoded shellcode string further to prevent detection by [[intrusion detection|IDS]].

For example, on the [[IA-32]] architecture, two <code>[[NOP (code)|NOP]]</code> (no-operation) instructions look as follows when unencoded:
 90             NOP
 90             NOP

{|class=wikitable
|+ Encoded double-NOPs:
|-
! scope=row | percent-encoding
| {{code|unescape("%u9090")}}
|-
! scope=row | unicode literal
| {{code|"\u9090"}}
|-
! scope=row | HTML/XML entity
| {{code|"&#x9090;"}} or {{code|"&#37008;"}}
|}

The NOP instruction is used in [[NOP slide]]s.
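
The three encodings in the table can be reproduced with a short Python sketch. It assumes that each 16-bit code unit holds two bytes in little-endian order, as on IA-32, and that odd-length input is padded with a NOP byte; the function names are hypothetical.

```python
import struct

def to_code_units(data: bytes) -> list:
    """Pack bytes into 16-bit little-endian code units (two bytes per unit)."""
    if len(data) % 2:
        data += b"\x90"  # assumption: pad odd-length input with a NOP byte
    return [unit for (unit,) in struct.iter_unpack("<H", data)]

def percent_u(data: bytes) -> str:
    """Render as the %uXXXX form accepted by JavaScript's unescape()."""
    return "".join("%u{:04X}".format(u) for u in to_code_units(data))

def unicode_literal(data: bytes) -> str:
    """Render as \\uXXXX escape sequences for a JavaScript string literal."""
    return "".join("\\u{:04X}".format(u) for u in to_code_units(data))

def html_entities(data: bytes, decimal: bool = False) -> str:
    """Render as hexadecimal or decimal HTML/XML character references."""
    template = "&#{};" if decimal else "&#x{:04X};"
    return "".join(template.format(u) for u in to_code_units(data))

nops = bytes([0x90, 0x90])
print(percent_u(nops))                    # %u9090
print(unicode_literal(nops))              # \u9090
print(html_entities(nops))                # &#x9090;
print(html_entities(nops, decimal=True))  # &#37008;
```

Because of the little-endian packing, the byte pair <code>33 C0</code> would be rendered as <code>%uC033</code>, with the two bytes swapped in the textual form.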

===Null-free shellcode===
Most shellcode is written without [[Null character|null]] bytes because it is intended to be injected into a target process through [[null-terminated string]]s. When a null-terminated string is copied, it is copied up to and including the first null, but subsequent bytes of the shellcode are not processed. When shellcode that contains nulls is injected in this way, only part of the shellcode is injected, making it incapable of running successfully.
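
The truncation can be simulated in Python. The sketch below is a simplified model of a C-style string copy, not the behaviour of any particular library function, and the helper name <code>c_string_copy</code> is hypothetical.

```python
def c_string_copy(source: bytes) -> bytes:
    """Simulate a C string copy: stop after the first null byte.

    Simplification: if the source has no null, the whole input is returned
    and the terminator that a real copy would append is not modelled.
    """
    end = source.find(b"\x00")
    return source if end == -1 else source[: end + 1]

with_nulls = bytes.fromhex("b801000000" "40")  # five bytes with nulls, then one more
null_free = bytes.fromhex("33c0" "40")         # three bytes, no nulls

print(c_string_copy(with_nulls).hex())  # b80100 (only 3 of 6 bytes survive)
print(c_string_copy(null_free).hex())   # 33c040 (copied in full)
```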

To produce null-free shellcode from shellcode that contains [[Null character|null]] bytes, one can substitute machine instructions that contain zeroes with instructions that have the same effect but are free of nulls. For example, on the [[IA-32]] architecture one could replace this instruction:
 B8 01000000    [[MOV (x86 instruction)|MOV]] EAX,1          // Set the register EAX to 0x00000001
which contains zeroes as part of the literal (<code>1</code> expands to <code>0x00000001</code>) with these instructions:
 33C0           [[XOR (x86 instruction)|XOR]] EAX,EAX        // Set the register EAX to 0x00000000
 40             [[INC (x86 instruction)|INC]] EAX            // Increase EAX to 0x00000001
which have the same effect but take fewer bytes to encode and are free of nulls.
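
The two encodings above can be compared byte by byte. The following Python sketch uses only the opcode bytes listed in the example; the helper name <code>null_offsets</code> is hypothetical.

```python
def null_offsets(code: bytes) -> list:
    """Return the offsets of all null bytes in an instruction sequence."""
    return [i for i, b in enumerate(code) if b == 0]

mov_eax_1 = bytes.fromhex("b801000000")  # MOV EAX,1
xor_inc = bytes.fromhex("33c0" "40")     # XOR EAX,EAX followed by INC EAX

for name, code in (("MOV EAX,1", mov_eax_1), ("XOR/INC", xor_inc)):
    print(name, len(code), "bytes, nulls at", null_offsets(code))
```

The first sequence is five bytes long with nulls at offsets 2, 3 and 4; the replacement is three bytes long and contains none.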

==={{anchor|Alphanumeric|Multi-architecture}}Alphanumeric and printable shellcode===
An '''alphanumeric shellcode''' is a shellcode that consists of or assembles itself on execution into entirely [[alphanumeric]] [[ASCII]] or [[Unicode]] characters such as 0–9, A–Z and a–z.<ref name="Rix_2001">{{cite journal |title=Writing ia32 alphanumeric shellcodes |author=rix |volume=0x0b |issue=57 |id=#0x0f of 0x12 |journal=Phrack |publisher=Phrack Inc. |date=2001-08-11 |url=http://www.phrack.org/issues.html?issue=57&id=15#article |access-date=2022-05-26 |url-status=live |archive-url=https://web.archive.org/web/20220308045645/http://phrack.org/issues/57/15.html#article |archive-date=2022-03-08}}</ref><ref name="Obscou_2003">{{cite journal |title=Building IA32 'Unicode-Proof' Shellcodes |author=obscou |date=2003-08-13 |volume=11 |issue=61 |id=#0x0b of 0x0f |journal=Phrack |publisher=Phrack Inc. |url=http://www.phrack.org/issues.html?issue=61&id=11#article |access-date=2008-02-29 |url-status=live |archive-url=https://web.archive.org/web/20220526165740/http://phrack.org/issues/61/11.html#article |archive-date=2022-05-26}}</ref> This type of encoding was created by [[Hacker (computer security)|hacker]]s to hide working [[machine code]] inside what appears to be text. This can be useful to avoid detection of the code and to allow the code to pass through filters that scrub non-alphanumeric characters from strings (in part, such filters were a response to non-alphanumeric shellcode exploits). A similar type of encoding is called ''printable code'' and uses all [[control character|printable]] characters (0–9, A–Z, a–z, !@#%^&*() etc.). A similarly restricted variant is ''ECHOable code'' not containing any characters which are not accepted by the [[ECHO (command)|ECHO]] command. 
It has been shown that it is possible to create shellcode that looks like normal text in English.<ref name="Mason-Small-Monrose-MacManus_2009">{{cite conference |title=English Shellcode |author-first1=Joshua |author-last1=Mason |author-first2=Sam |author-last2=Small |author-first3=Fabian |author-last3=Monrose |author-first4=Greg |author-last4=MacManus |date=November 2009 |conference=Proceedings of the 16th ACM conference on Computer and Communications Security |location=New York, NY, USA |pages=524–533 |url=http://www.cs.jhu.edu/~sam/ccs243-mason.pdf |access-date=2010-01-10 |url-status=live |archive-url=https://web.archive.org/web/20220526164459/https://www.cs.jhu.edu/~sam/ccs243-mason.pdf |archive-date=2022-05-26}} (10 pages)</ref>
Writing alphanumeric or printable code requires a good understanding of the [[instruction set architecture]] of the machine(s) on which the code is to be executed. It has been demonstrated that it is possible to write alphanumeric code that is executable on more than one machine architecture,<ref>{{cite web |title=Multi-architecture (x86) and 64-bit alphanumeric shellcode explained |publisher=Blackhat Academy |url=http://www.blackhatlibrary.net/Alphanumeric_shellcode |url-status=dead |archive-url=https://web.archive.org/web/20120621124443/http://www.blackhatlibrary.net/Alphanumeric_shellcode |archive-date=2012-06-21}}</ref> thereby constituting [[multi-architecture executable]] code.
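
Whether a given byte sequence meets such a restriction can be checked mechanically. The Python sketch below assumes that "printable" means the visible ASCII range 0x21 to 0x7E with the space excluded; actual filters may draw the line differently, and the names used are hypothetical.

```python
import string

# Byte values of 0-9, A-Z and a-z.
ALPHANUMERIC = frozenset((string.ascii_letters + string.digits).encode("ascii"))
# Assumption: "printable" means visible ASCII, excluding the space character.
PRINTABLE = frozenset(range(0x21, 0x7F))

def disallowed(code: bytes, allowed: frozenset) -> list:
    """Return (offset, byte) pairs for every byte outside the allowed set."""
    return [(i, b) for i, b in enumerate(code) if b not in allowed]

code = bytes.fromhex("33c040")          # '3', 0xC0, '@'
print(disallowed(code, ALPHANUMERIC))   # [(1, 192), (2, 64)]
print(disallowed(code, PRINTABLE))      # [(1, 192)]
```

An empty list indicates that the sequence would pass the corresponding filter unchanged.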

In certain circumstances, a target process will filter any byte from the injected shellcode that is not a [[printable character|printable]] or [[alphanumeric]] character. Under such circumstances, the range of instructions that can be used to write shellcode becomes very limited. A solution to this problem was published by rix in [[Phrack]] 57,<ref name="Rix_2001"/> showing that any code can be turned into alphanumeric code. A commonly used technique is self-modifying code, which lets the code rewrite its own bytes to include values outside the normally allowed range, thereby expanding the range of instructions it can use. Using this trick, a self-modifying decoder can be created that initially uses only bytes in the allowed range. The main body of the shellcode is encoded so that it, too, uses only bytes in the allowed range. When the resulting shellcode runs, the decoder first modifies its own code so that it can use any instruction it needs, and then decodes the original shellcode. After decoding, the decoder transfers control to the original shellcode so that it executes as normal. It has been shown that it is possible to create arbitrarily complex shellcode that looks like normal text in English.<ref name="Mason-Small-Monrose-MacManus_2009"/>
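
The data side of such a scheme can be sketched in Python: every byte is split into two 4-bit halves, and each half is written as one letter, so that arbitrary bytes become purely alphabetic text. The 16-letter alphabet is a hypothetical choice for this example and is not the encoding used in the cited article; the machine-code decoder that would reverse the transformation in the target process is not shown.

```python
ALPHABET = "ABCDEFGHIJKLMNOP"  # hypothetical: one letter per 4-bit value

def encode(data: bytes) -> str:
    """Write each byte as two letters: high nibble first, then low nibble."""
    return "".join(ALPHABET[b >> 4] + ALPHABET[b & 0x0F] for b in data)

def decode(text: str) -> bytes:
    """Reverse encode(): combine each pair of letters back into one byte."""
    values = [ALPHABET.index(c) for c in text]
    return bytes((hi << 4) | lo for hi, lo in zip(values[0::2], values[1::2]))

original = bytes.fromhex("33c040")
text = encode(original)
print(text)                      # DDMAEA
print(decode(text) == original)  # True
```

The encoded form is twice as long as the input, which illustrates the size cost that restricted character sets typically impose.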

===Unicode proof shellcode===
Many modern programs use [[Unicode]] strings to allow internationalization of text. Often, these programs will convert incoming [[ASCII]] strings to Unicode before processing them. Unicode strings encoded in [[UTF-16]] use two bytes to encode each character (or four bytes for some special characters). When an [[ASCII]] (or, more generally, [[Latin-1]]) string is converted to little-endian UTF-16, a zero byte is inserted after each byte of the original string. obscou showed in [[Phrack]] 61<ref name="Obscou_2003"/> that it is possible to write shellcode that can run successfully after this transformation. Programs exist that can automatically encode any shellcode into alphanumeric UTF-16-proof shellcode, based on the same principle of a small self-modifying decoder that decodes the original shellcode.
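
The effect of the conversion on a byte sequence can be reproduced with Python's standard codecs. The sketch assumes a Latin-1 interpretation of the input and little-endian UTF-16 output; the function name <code>widen</code> is hypothetical.

```python
def widen(data: bytes) -> bytes:
    """Interpret bytes as Latin-1 text and re-encode it as little-endian UTF-16."""
    return data.decode("latin-1").encode("utf-16-le")

original = bytes.fromhex("33c040")
widened = widen(original)
print(widened.hex())                      # 3300c0004000
print(widened[0::2] == original)          # True: original bytes at even offsets
print(all(b == 0 for b in widened[1::2])) # True: a zero byte after each one
```

Shellcode that survives this transformation must therefore remain valid when every second byte is zero.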