
== Escape sequences ==
{{main|Escape sequence}}
Escape sequences are a general technique for representing characters that are otherwise difficult to represent directly, including delimiters, nonprinting characters (such as backspaces), newlines, and whitespace characters (which are otherwise impossible to distinguish visually). They have a long history and are widely used in string literals. Adding an escape sequence (either to a single character or throughout a string) is known as '''escaping'''.

One character is chosen as a prefix to give encodings for characters that are difficult or impossible to include directly. Most commonly this is the [[backslash]]. A key point is that the backslash itself can be encoded as a double backslash <code>\\</code>, and in delimited strings the delimiter can be encoded by escaping it, for example as <code>\"</code> for ". A regular expression for such escaped strings can be given as follows, as found in a Lex grammar for [[ANSI C]]:<ref>{{cite web|url=http://www.lysator.liu.se/c/ANSI-C-grammar-l.html|title=ANSI C grammar (Lex)|work=liu.se|access-date=22 June 2016}}</ref>{{efn|The regex given here is not itself quoted or escaped, to reduce confusion.}}
 "<syntaxhighlight lang="c" style="background:none; border:none; color:inherit; padding: 0px 0px;" inline>(\\.|[^\\"])*</syntaxhighlight>"
meaning "a quote; followed by zero or more of either an escaped character (backslash followed by something, possibly backslash or quote), or a non-escape, non-quote character; ending in a quote" – the only issue is distinguishing the terminating quote from a quote preceded by a backslash, which may itself be escaped. Multiple characters can follow the backslash, such as <code>\uFFFF</code>, depending on the escaping scheme.
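
The behaviour of this pattern can be checked with a short sketch. Python is used here purely for illustration, and the helper name <code>first_literal</code> is hypothetical; the pattern is written as a Python raw string so that its backslashes reach the regex engine unchanged.

```python
import re

# The pattern from the text: a quote, then any number of escaped characters
# or non-backslash, non-quote characters, then a closing quote.
STRING_LITERAL = re.compile(r'"(\\.|[^\\"])*"')


def first_literal(source):
    """Return the first double-quoted literal found in source, or None."""
    match = STRING_LITERAL.search(source)
    return match.group(0) if match else None


# The escaped quotes inside the first literal do not terminate it.
print(first_literal(r'say("a \"quoted\" word", "next")'))

# An escaped backslash directly before the closing quote is handled too.
print(first_literal(r'"C:\\" + "x"'))
```

The first call prints the whole literal including its escaped quotes, and the second stops at the quote that follows the doubled backslash.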

An escaped string must then itself be [[lexical analysis|lexically analyzed]], converting the escaped string into the unescaped string that it represents. This is done during the evaluation phase of lexing: the evaluator of the language's lexer runs a separate, smaller lexer over each escaped string literal.
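
A minimal sketch of such an unescaping step is shown below, in Python for illustration. The function name <code>unescape</code>, the small table of supported escapes, and the restriction of <code>\x</code> to exactly two hexadecimal digits are all simplifying assumptions, not the rules of any particular language.

```python
import string

# Hypothetical, deliberately small table of single-character escapes.
SIMPLE_ESCAPES = {"n": "\n", "t": "\t", "b": "\b", "\\": "\\", '"': '"', "'": "'"}


def unescape(body):
    """Convert the text between the quotes into the string it represents."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            # Ordinary character: copy it through unchanged.
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash at end of literal")
        nxt = body[i + 1]
        if nxt in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[nxt])
            i += 2
        elif nxt == "x":
            # Assumption: exactly two hex digits follow \x.
            digits = body[i + 2:i + 4]
            if len(digits) != 2 or any(d not in string.hexdigits for d in digits):
                raise ValueError("\\x must be followed by two hex digits")
            out.append(chr(int(digits, 16)))
            i += 4
        else:
            raise ValueError("unknown escape sequence \\" + nxt)
    return "".join(out)


print(repr(unescape(r'a\tb\\c\x22')))
```

Each escape sequence in the input contributes exactly one character to the result, while the number of source characters consumed differs from one escape to another.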

Among other things, it must be possible to encode the character that normally terminates the string constant, and there must be some way to specify the escape character itself. Escape sequences are not always attractive or easy to use, so many compilers also offer other means of solving the common problems. Escape sequences do, however, solve every delimiter problem, and most compilers interpret them. Inside a string literal, the escape character means "this is the start of an escape sequence". Every escape sequence specifies one character that is to be placed directly into the string, and the number of characters required to write an escape sequence varies. The Escape key, usually at the top left of the keyboard, is not suitable for this purpose: editors typically intercept it, so the character it produces cannot be typed directly into a string. The backslash is therefore used as the escape character in string literals.

Many languages support the use of [[metacharacter]]s inside string literals. Metacharacters have varying interpretations depending on the context and language, but are generally a kind of 'processing command' for representing printing or nonprinting characters.

For instance, in a [[C string handling|C string]] literal, if the backslash is followed by a letter such as "b", "n" or "t", then this represents a nonprinting ''backspace'', ''newline'' or ''tab'' character respectively. Or if the backslash is followed by 1-3 [[octal]] digits, then this sequence is interpreted as representing the arbitrary code unit with the specified value in the literal's encoding (for example, the corresponding [[ASCII]] code for an ASCII literal). This was later extended to allow more modern [[hexadecimal]] character code notation. Because a C hexadecimal escape consumes every hexadecimal digit that follows it, the example splits the literal after the first <code>\x22</code> so that the "C" and "a" of "Can" are not read as part of the escape; adjacent string literals are concatenated by the compiler:

<syntaxhighlight lang="c">"I said,\t\t\x22" "Can you hear me?\x22\n"</syntaxhighlight>
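
The equivalence of the octal, hexadecimal and plain spellings can be checked in a language with similar escapes. The sketch below uses Python for illustration only; unlike C, Python's <code>\x</code> escape reads exactly two hexadecimal digits, so the following letters are never absorbed into it. The variable names are hypothetical.

```python
# Three spellings of the same string. Python's \x escape reads exactly two
# hex digits and its octal escape reads at most three octal digits.
hex_form = "I said,\t\t\x22Can you hear me?\x22\n"
octal_form = "I said,\t\t\42Can you hear me?\42\n"
plain_form = 'I said,\t\t"Can you hear me?"\n'

print(hex_form == octal_form == plain_form)

# Both numeric escapes denote code 34, the double quote.
print(ord("\x22"), ord("\42"))
```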

{| class="wikitable"
|-
! Escape Sequence !! Unicode !! Literal Characters placed into string
|-
| {{mono|\0}} || U+0000 || [[null character]]<ref name="haskell">{{cite web|url=http://book.realworldhaskell.org/read/characters-strings-and-escaping-rules.html|title=Appendix B. Characters, strings, and escaping rules|work=realworldhaskell.org|access-date=22 June 2016}}</ref><ref name="javascript">{{cite web|url=https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String|title=String|work=mozilla.org|access-date=22 June 2016}}</ref><br/>(typically as a special case of \ooo octal notation)
|-
| {{mono|\a}} || U+0007 || alert<ref name="msdn">{{cite web|url=http://msdn.microsoft.com/en-us/library/h21280bw(v=vs.80).aspx|title=Escape Sequences (C)|work=microsoft.com|access-date=22 June 2016}}</ref><ref name="Rationale_2003_C">{{cite web |title=Rationale for International Standard - Programming Languages - C |version=5.10 |date=April 2003 |pages=52, 153–154, 159 |url=http://www.open-std.org/jtc1/sc22/wg14/www/C99RationaleV5.10.pdf |access-date=2010-10-17 |url-status=live |archive-url=https://web.archive.org/web/20160606072228/http://www.open-std.org/jtc1/sc22/wg14/www/C99RationaleV5.10.pdf |archive-date=2016-06-06}}</ref>
|-
| {{mono|\b}} || U+0008 || backspace<ref name="msdn" />
|-
| {{mono|\f}} || U+000C || form feed<ref name="msdn" />
|-
| {{mono|\n}} || U+000A || line feed<ref name="msdn" /> (or newline in POSIX)
|-
| {{mono|\r}} || U+000D || carriage return<ref name="msdn" /> (or newline in Mac OS 9 and earlier)
|-
| {{mono|\t}} || U+0009 || horizontal tab<ref name="msdn" />
|-
| {{mono|\v}} || U+000B || vertical tab<ref name="msdn" />
|-
| {{mono|\e}} || U+001B || [[escape character]]<ref name="Rationale_2003_C"/> ([[GNU Compiler Collection|GCC]],<ref>{{citation |title=GCC 4.8.2 Manual |chapter=6.35 The Character <ESC> in Constants |chapter-url=https://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/Character-Escapes.html#Character-Escapes |access-date=2014-03-08}}</ref> [[clang]] and [[Tiny C Compiler|tcc]])
|-
| {{mono|\u####}} || U+#### || 16-bit [[Unicode]] character where #### are four hex digits<ref name="javascript" />
|-
| {{mono|\U########}} || U+###### || 32-bit Unicode character where ######## are eight hex digits (Unicode character space is currently only 21 bits wide, so the first two hex digits will always be zero)
|-
| {{mono|\u{######}}} || U+###### || 21-bit Unicode character where ###### is a variable number of hex digits
|-
| {{mono|\x##}} || Depends on encoding{{efn|name=encoding|Since this escape sequence represents a specific [[code unit]] instead of a specific character, what code point (if any) it represents depends on the encoding of the string literal it is found in.}} || 8-bit character specification where # is a hex digit. In C, a hex escape sequence is not limited to two digits and may be of arbitrary length.<ref name="msdn" />
|-
| {{mono|\ooo}} || Depends on encoding{{efn|name=encoding}} || 8-bit character specification where o is an octal digit<ref name="msdn" />
|-
| {{mono|\"}} || U+0022 || double quote (")<ref name="msdn" />
|-
| {{mono|\&}} || || non-character used to delimit numeric escapes in Haskell<ref name="haskell" />
|-
| {{mono|\'}} || U+0027 || single quote (')<ref name="msdn" />
|-
| {{mono|\\}} || U+005C || backslash (\)<ref name="msdn" />
|-
| {{mono|\?}} || U+003F || question mark (?)<ref name="msdn" />
|}

Note: Not all sequences in the table are supported by all parsers, and a parser may support other escape sequences that are not listed.
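
As an illustration of the two fixed-width Unicode escapes, the following sketch uses Python, which supports both <code>\u####</code> and <code>\U########</code>. It also shows why a code point and its code units are distinct: the same character occupies a different number of bytes in different encodings. The variable names are hypothetical.

```python
euro_short = "\u20ac"      # four hex digits
euro_long = "\U000020ac"   # eight hex digits, same code point

print(euro_short == euro_long)

# One code point, three code units in UTF-8.
print(euro_short.encode("utf-8"))

# A code point above U+FFFF needs the eight-digit form in Python.
grin = "\U0001F600"
print(len(grin))                      # one code point
print(len(grin.encode("utf-16-be")))  # four bytes: a surrogate pair
```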

===Nested escaping===
When code in one programming language is embedded inside another, embedded strings may require multiple levels of escaping. This is particularly common for regular expressions and SQL queries embedded in other languages, and for other languages embedded in shell scripts. This double-escaping is often difficult to read and write.

Incorrect quoting of nested strings can present a security vulnerability. Untrusted data, such as the data fields of an SQL query, should be passed through [[prepared statement]]s to prevent a [[code injection]] attack. In [[PHP]] 2 through 5.3, a feature called [[magic quotes]] automatically escaped strings (for convenience and security), but because of the problems it caused it was removed from version 5.4 onward.
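
The idea can be sketched with Python's standard <code>sqlite3</code> module, whose <code>?</code> placeholders pass values to the database separately from the query text. The table, the sample data and the helper name <code>find_user</code> are hypothetical and exist only for this illustration.

```python
import sqlite3

# In-memory database with one hypothetical table and row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")


def find_user(connection, name):
    """Look up a user by name using a parameterized query."""
    # The ? placeholder sends name as data, never as SQL text, so quotes
    # inside it need no escaping and cannot change the query's structure.
    cursor = connection.execute(
        "SELECT name FROM users WHERE name = ?", (name,)
    )
    return [row[0] for row in cursor.fetchall()]


print(find_user(conn, "alice"))

# A classic injection attempt is treated as an ordinary (unmatched) name.
print(find_user(conn, "x' OR '1'='1"))
```

Had the query been built by string concatenation, the second input would have altered the <code>WHERE</code> clause; with the placeholder it matches no row.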

=== Raw strings ===
A few languages provide a method of specifying that a literal is to be processed without any language-specific interpretation. This avoids the need for escaping, and yields more legible strings.

Raw strings are particularly useful when a common character needs to be escaped, notably in regular expressions (nested as string literals), where backslash <code>\</code> is widely used, and in DOS/Windows [[Path (computing)|paths]], where backslash is used as a path separator. The profusion of backslashes is known as [[leaning toothpick syndrome]], and can be reduced by using raw strings. Compare escaped and raw pathnames in C#:
<syntaxhighlight lang="csharp">
 "The Windows path is C:\\Foo\\Bar\\Baz\\"
 @"The Windows path is C:\Foo\Bar\Baz\"
</syntaxhighlight>
Extreme examples occur when these are combined – [[Uniform Naming Convention]] paths begin with <code>\\</code>, and thus an escaped regular expression matching a UNC name begins with 8 backslashes, <code>"\\\\\\\\"</code>, due to needing to escape the string and the regular expression. Using raw strings reduces this to 4 (escaping in the regular expression), as in C# <code>@"\\\\"</code>.
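
The same reduction can be observed in other languages with raw strings. The sketch below uses Python, where the prefix <code>r</code> marks a raw string; the helper name <code>is_unc</code> and the sample paths are hypothetical.

```python
import re

# Eight backslashes in an ordinary literal become four characters, which the
# regex engine reads as two literal backslashes.
escaped_pattern = "\\\\\\\\"

# A raw string needs only the four backslashes required by the regex itself.
raw_pattern = r"\\\\"

print(escaped_pattern == raw_pattern)


def is_unc(path):
    """Return True if path starts with two backslashes."""
    return re.match(raw_pattern, path) is not None


print(is_unc(r"\\server\share"))
print(is_unc(r"C:\Foo\Bar"))
```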

In XML documents, [[CDATA#CDATA sections in XML|CDATA]] sections allow the use of characters such as & and &lt; without an XML parser attempting to interpret them as part of the structure of the document itself. This can be useful when including literal text and scripting code, to keep the document [[Well-formed XML document|well formed]].
<syntaxhighlight lang="xml">
<![CDATA[  if (path!=null && depth<2) { add(path); }  ]]>
</syntaxhighlight>
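
The effect on a parser can be sketched with Python's standard <code>xml.etree.ElementTree</code> module. The enclosing <code>script</code> element is a hypothetical wrapper added here so that the fragment forms a complete document.

```python
import xml.etree.ElementTree as ET

# The CDATA section from the text, wrapped in a hypothetical root element.
with_cdata = (
    "<script><![CDATA[  if (path!=null && depth<2) { add(path); }  ]]></script>"
)
element = ET.fromstring(with_cdata)

# The parser returns the section's content verbatim as character data.
print(repr(element.text))

# Without CDATA, the bare & makes the document ill-formed.
try:
    ET.fromstring("<script>path!=null && depth</script>")
    parsed_without_cdata = True
except ET.ParseError:
    parsed_without_cdata = False

print(parsed_without_cdata)
```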