{{short description|Characters that specify the boundary between regions in a data stream}}
{{hatnote|This article is about delimiters in computing. For delimiters in human use, see [[Word divider]] and [[digit grouping]].}}
[[File:Csv delimited000.svg|thumb|A stylistic depiction of values inside of a so-named [[comma-separated values]] (CSV) text file. The commas (shown in red) are used as field delimiters.|alt=]]
A '''delimiter''' is a sequence of one or more [[Character (computing)|character]]s for specifying the boundary between separate, independent regions in [[plain text]], [[Expression (mathematics)|mathematical expressions]] or other [[Data stream|data streams]].<ref>{{Cite web |url=https://www.its.bldrdoc.gov/fs-1037/dir-011/_1544.htm |title=Definition: delimiter|work=Federal Standard 1037C - Telecommunications: Glossary of Telecommunication Terms |access-date=2019-11-25 |archive-url=https://web.archive.org/web/20130305032313/https://www.its.bldrdoc.gov/fs-1037/dir-011/_1544.htm |archive-date=2013-03-05 |url-status=live}}</ref><ref>{{Cite web|title=What is a Delimiter?|url=https://www.computerhope.com/jargon/d/delimite.htm|access-date=2020-08-09|website=www.computerhope.com|language=en}}</ref> An example of a delimiter is the [[comma]] character, which acts as a ''field delimiter'' in a sequence of [[comma-separated values]]. Another example of a delimiter is the time gap used to separate letters and words in the transmission of [[Morse code]].{{Citation needed|date=February 2024|reason=time gap is not a character}}

In [[mathematics]], delimiters are often used to specify the scope of an [[Operation (mathematics)|operation]], and can occur both as isolated symbols (e.g., [[Colon (punctuation)|colon]] in "<math>1 : 4</math>") and as a pair of opposing-looking symbols (e.g., [[Angled bracket|angled brackets]] in <math>\langle a, b \rangle</math>).

Delimiters represent one of various means of specifying boundaries in a [[data stream]].
[[String literal#Declarative notation|Declarative notation]], for example, is an alternate method (without the use of delimiters) that uses a length field at the start of a data stream to specify the number of characters that the data stream contains.<ref name="hollerity">{{cite book | last = Rohl | first = Jeffrey S. | title = Programming in Fortran | publisher = Oxford University Press | location = Oxford Oxfordshire | year = 1973 | isbn = 978-0-7190-0555-8 }} describing the method in Hollerith notation under the Fortran programming language.</ref>

== Overview ==
Delimiters may be characterized as field and record delimiters, or as bracket delimiters.

===Field and record delimiters===
Field delimiters separate data fields. Record delimiters separate groups of fields.<ref name="FldDelm">{{cite book | last = de Moor | first = Georges J. | title = Progress in Standardization in Health Care Informatics | publisher =IOS Press | year = 1993 | isbn =90-5199-114-2}} p. 141</ref>

For example, the [[Comma-separated values|CSV format]] uses a comma as the delimiter between [[Field (computer science)|fields]], and an [[end-of-line]] indicator as the delimiter between [[Row (database)|records]]:

<pre>
fname,lname,age,salary
nancy,davolio,33,$30000
erin,borakova,28,$25250
tony,raphael,35,$28700
</pre>

This specifies a simple [[flat-file database]] [[Table (information)|table]] using the CSV file format.

===Bracket delimiters===
Bracket delimiters, also called block delimiters, region delimiters, or balanced delimiters, mark both the start and end of a region of text.<ref name="BalaDelm">{{cite book | last = Friedl | first = Jeffrey E. F. | title = Mastering Regular Expressions: Powerful Techniques for Perl and Other Tools | publisher = O'Reilly | year = 2002| isbn = 0-596-00289-0}} p.
319</ref><ref name="Scott000">{{cite book | title = Programming Language Pragmatics | first = Michael Lee | last = Scott | publisher = Morgan Kaufmann | year = 1999 | isbn = 1-55860-442-1 }}</ref>

Common examples of bracket delimiters include:<ref name="programmingperl">{{cite book | title=Programming Perl |edition=Third | publisher=O'Reilly |date=July 2000 | isbn=0-596-00027-8 | last1=Wall | first1=Larry | first2=Jon |last2=Orwant | author-link1=Larry Wall | author-link3=Jon Orwant }}</ref>

{| class="wikitable"
|-
! Delimiters
! style="text-align:left" | Description
|-
! <code>(</code> <code>)</code>
| [[Bracket#Parentheses|Parentheses]]. The [[Lisp (programming language)|Lisp]] programming language syntax is cited as recognizable primarily by its use of parentheses.<ref name="Kaufmann000">{{cite book | title = Computer-Aided Reasoning: An Approach | first = Matt | last = Kaufmann | publisher = Springer | year = 2000 | isbn = 0-7923-7744-3 }}p. 3</ref>
|-
! <code>{</code> <code>}</code>
| Braces (also called [[Bracket#Curly brackets|curly brackets]]<ref name="curly_brace_cstyle">{{cite book | last = Meyer | first = Mark | title = Explorations in Computer Science | publisher = Oxford University Press | location = Oxford Oxfordshire | year = 2005 | isbn = 978-0-7637-3832-7 }} references C-style programming languages prominently featuring curly brackets and semicolons.</ref>).
|-
! <code>[</code> <code>]</code>
| Brackets (commonly used to denote a subscript).
|-
! <code><</code> <code>></code>
| [[Bracket#Angle brackets|Angle brackets]].<ref name="id_1268443793898_27">{{cite book | last = Dilligan | first = Robert | title = Computing in the Web Age | publisher = Oxford University Press | location = Oxford Oxfordshire | year = 1998 | isbn = 978-0-306-45972-6 }}Describes syntax and delimiters used in HTML.</ref>
|-
! <code>"</code> <code>"</code>
| Commonly used to denote [[string literal]]s.<ref name="id_1268443910269_75">{{cite book | last = Schwartz | first = Randal |author-link=Randal Schwartz | title = Learning Perl | publisher = Oxford University Press | location = Oxford Oxfordshire | year = 2005 | isbn = 978-0-596-10105-3 | url = https://archive.org/details/isbn_9780596101053 }}Describes [[string literal]]s.</ref>
|-
! <code>'</code> <code>'</code>
| Commonly used to denote [[character literal]]s.<ref name="id_1268443910269_75"/>
|-
! <code><?</code> <code>?></code>
| Used to indicate XML [[processing instruction]]s.<ref name="id_1268443998814_32">{{cite book | last = Watt | first = Andrew | title = Sams Teach Yourself Xml in 10 Minutes | publisher = Oxford University Press | location = Oxford Oxfordshire | year = 2003 | isbn = 978-0-672-32471-0 | url-access = registration | url = https://archive.org/details/samsteachyoursel0000watt }} Describes XML processing instruction. p. 21.</ref>
|-
! <code>/*</code> <code>*/</code>
| Used to denote [[comment (computer programming)|comment]]s in some programming languages.<ref name="id_1268444112328_77">{{cite book | last = Cabrera | first = Harold | title = C# for Java Programmers | publisher = Oxford University Press | location = Oxford Oxfordshire | year = 2002 | isbn = 978-1-931836-54-8 }} Describes single-line and multi-line comments. p. 72.</ref>
|-
! <code><%</code> <code>%></code>
| Used in some [[web template]]s to specify language boundaries.<ref>{{cite web|url=https://github.com/jakartaee/pages/blob/master/spec/src/main/asciidoc/ServerPages.adoc#jakarta-server-pages-specification-version-40|title=Jakarta Server Pages Specification, Version 4.0|website=[[GitHub]] |access-date=2023-02-10}}</ref>
|}

===Conventions===
Historically, computing platforms have used certain delimiters by convention.<ref>{{cite iso-ir|date=December 1, 1975|number=1|title=The set of control characters for ISO 646|sponsor=ISO/TC 97/SC 2 |sponsor-link=ISO/IEC JTC 1/SC 2#History}}</ref><ref>{{cite iso-ir|date=December 1, 1975|number=6|title=ASCII graphic character set|sponsor=[[American National Standards Institute]]}}</ref> The following tables depict a few examples for comparison.

'''Programming languages''' (''See also'', [[Comparison of programming languages (syntax)]]).

{| class="wikitable"
! !! String Literal !! End of Statement
|-
! Pascal
| singlequote || semicolon
|-
! Python
| doublequote, singlequote || [[end of line]] (EOL)
|}

'''Field and Record delimiters''' (''See also'', [[ASCII]], [[Control character]]).

{| class="wikitable"
! !! End of Field !! End of Record !! End of File
|-
! [[Unix-like]] systems including [[macOS]], [[AmigaOS]]
| [[Tab key|Tab]] || [[Line feed|LF]] || none
|-
! [[Windows]], [[MS-DOS]], [[OS/2]], [[CP/M]]
| [[Tab key|Tab]] || [[CRLF]] || none (except in CP/M), [[Control-Z]]<ref name="id_1268444696385_5">{{cite book | last = Lewine | first = Donald | title = Posix Programmer's Guide | publisher = Oxford University Press | location = Oxford Oxfordshire | year = 1991 | isbn = 978-0-937175-73-6 | url = https://archive.org/details/posixprogrammers00lewi }} Describes use of control-z. p. 156.</ref>
|-
! [[Classic Mac OS]], [[Apple DOS]], [[ProDOS]], [[GS/OS]]
| [[Tab key|Tab]] || [[Carriage return|CR]] || none
|-
!
ASCII/Unicode
| [[C0 and C1 control codes#Field_separators|UNIT SEPARATOR]]<br>Position 31 (U+001F) || RECORD SEPARATOR<br>Position 30 (U+001E) || FILE SEPARATOR<br>Position 28 (U+001C)
|}

==Delimiter collision==<!-- This section is linked from several articles; [[Special:WhatLinksHere/Delimiter_collision]] -->
'''Delimiter collision''' is a problem that occurs when an author or programmer introduces delimiters into text without actually intending them to be interpreted as boundaries between separate regions.<ref name="FldDelm"/><ref name="mre_embed_problem">{{cite book | last = Friedl | first = Jeffrey | title = Mastering Regular Expressions | publisher = Oxford University Press | location = Oxford Oxfordshire | year = 2006 | isbn = 978-0-596-52812-6 }} describing solutions for embedded-delimiter problems p. 472.</ref> In the case of XML, for example, this can occur whenever an author attempts to specify an [[angle bracket]] character.

In most file types there is both a field delimiter and a record delimiter, both of which are subject to collision. In the case of [[comma-separated values]] files, for example, field collision can occur whenever an author attempts to include a comma as part of a field value (e.g., salary = "$30,000"), and record delimiter collision would occur whenever a field contained multiple lines. Both record and field delimiter collision occur frequently in text files.

In some contexts, a malicious user or attacker may seek to exploit this problem intentionally. Consequently, delimiter collision can be the source of security [[Vulnerability (computing)|vulnerabilities]] and [[Exploit (computer security)|exploits]]. Malicious users can take advantage of delimiter collision in languages such as [[SQL]] and [[HTML]] to deploy such well-known attacks as [[SQL injection]] and [[cross-site scripting]], respectively.

===Solutions===
Because delimiter collision is a very common problem, various methods for avoiding it have been invented.
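The field collision described for CSV files can be reproduced in a few lines. The following is only an illustrative sketch in [[Python (programming language)|Python]]; the sample record and the variable names are invented for this example. Splitting on every comma breaks a quoted salary value apart, whereas a parser that applies the CSV quoting convention keeps it whole:

<syntaxhighlight lang="python">
import csv
import io

# Hypothetical record: the quoted salary field itself contains a comma.
line = 'nancy,davolio,33,"$30,000"'

# Naive splitting treats every comma as a field delimiter.
naive_fields = line.split(",")

# The csv module applies the quoting convention, so the embedded comma survives.
parsed_fields = next(csv.reader(io.StringIO(line)))

print(len(naive_fields))   # five pieces: the salary was split in two
print(parsed_fields)       # four fields, salary intact
</syntaxhighlight>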
Some authors may attempt to avoid the problem by choosing a delimiter character (or sequence of characters) that is not likely to appear in the data stream itself. This ''ad hoc'' approach may be suitable, but it necessarily depends on a correct guess of what will appear in the data stream, and offers no security against malicious collisions. Other, more formal conventions are therefore applied as well.

====ASCII delimited text====
The ASCII and Unicode character sets were designed to solve this problem by the provision of non-printing characters that can be used as delimiters. These occupy ASCII positions 28 to 31.

{| class="wikitable"
|-
! ASCII [[Decimal|Dec]]
! Symbol
! Unicode Name
! Common Name
! Usage
|-
! 28
! {{resize|200%|␜}}
| INFORMATION SEPARATOR FOUR
| [[file separator]]
| End of file. Or between a concatenation of what might otherwise be separate files.
|-
! 29
! {{resize|200%|␝}}
| INFORMATION SEPARATOR THREE
| [[group separator]]
| Between sections of data. Not needed in simple data files.
|-
! 30
! {{resize|200%|␞}}
| INFORMATION SEPARATOR TWO
| [[record separator]]
| End of a record or row.
|-
! 31
! {{resize|200%|␟}}
| INFORMATION SEPARATOR ONE
| [[unit separator]]
| Between fields of a record, or members of a row.
|}

The use of ASCII 31 [[Unit separator]] as a field separator and ASCII 30 [[Record separator]] as a record separator avoids collisions with the commas, tabs, and line breaks that commonly appear in a text data stream.<ref>[http://ronaldduncan.wordpress.com/2009/10/31/text-file-formats-ascii-delimited-text-not-csv-or-tab-delimited-text/ Discussion on ASCII Delimited Text vs CSV and Tab Delimited]</ref>

====Escape character====
One method for avoiding delimiter collision is to use [[escape character]]s.
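A minimal sketch of the idea in Python follows. The helper names <code>escape_field</code> and <code>split_escaped</code>, and the choice of the backslash as the escape character, are assumptions made for this illustration rather than part of any standard:

<syntaxhighlight lang="python">
ESC = "\\"   # assumed escape character (a single backslash)
SEP = ","    # field delimiter

def escape_field(field):
    """Prefix every escape character and delimiter in a field with ESC."""
    return field.replace(ESC, ESC + ESC).replace(SEP, ESC + SEP)

def split_escaped(line):
    """Split on SEP, treating the character after ESC as literal data."""
    fields, current, chars = [], [], iter(line)
    for ch in chars:
        if ch == ESC:
            current.append(next(chars, ""))  # take the escaped character verbatim
        elif ch == SEP:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields

# Hypothetical record containing both a delimiter and an escape character.
record = ["nancy", "$30,000", "C:\\temp"]
encoded = SEP.join(escape_field(f) for f in record)
decoded = split_escaped(encoded)
</syntaxhighlight>

Note that the escape character itself has to be escaped first, otherwise the encoded text could not be decoded unambiguously.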
From a language design standpoint, these are adequate, but they have drawbacks:
* text can be rendered unreadable when littered with numerous escape characters, a problem referred to as [[leaning toothpick syndrome]] (due to use of \ to escape / in [[Perl]] [[regular expression]]s, leading to sequences such as "\/\/");
* text becomes difficult to parse with regular expressions;
* they require a mechanism to "escape the escapes" when not intended as escape characters;
* although easy to type, they can be cryptic to someone unfamiliar with the language;<ref name="Kahrel000">{{cite book | title = Automating InDesign with Regular Expressions | first = Peter | last = Kahrel | publisher = O'Reilly | year = 2006 | isbn = 0-596-52937-6 | page = 11 }}</ref> and
* they do not protect against injection attacks.{{citation needed|date=March 2014}}

====Escape sequence====
Escape sequences are similar to escape characters, except they usually consist of some kind of mnemonic instead of just a single character. One use is in [[string literal]]s that include a doublequote (") character. For example, in [[Perl]], the code:

<syntaxhighlight lang="perl">
print "Nancy said \x22Hello World!\x22 to the crowd.";     ### use \x22
</syntaxhighlight>

produces the same output as:

<syntaxhighlight lang="perl">
print "Nancy said \"Hello World!\" to the crowd.";         ### use escape char
</syntaxhighlight>

One drawback of escape sequences, when used by people, is the need to memorize the codes that represent individual characters (see also: [[character entity reference]], [[numeric character reference]]).

====Dual quoting delimiters====
In contrast to escape sequences and escape characters, dual delimiters provide yet another way to avoid delimiter collision. Some languages, for example, allow the use of either a single quote (') or a double quote (") to specify a string literal. For example, in [[Perl]]:

<syntaxhighlight lang="perl">
print 'Nancy said "Hello World!" to the crowd.';
</syntaxhighlight>

produces the desired output without requiring escapes. This approach, however, only works when the string does not contain ''both'' types of quotation marks.

====Padding quoting delimiters====
In contrast to escape sequences and escape characters, padding delimiters provide yet another way to avoid delimiter collision. [[Visual Basic]], for example, uses double quotes as string delimiters and represents a literal double quote inside a string by doubling it. This is similar to escaping the delimiter.

<syntaxhighlight lang="basic">
print "Nancy said ""Hello World!"" to the crowd."
</syntaxhighlight>

produces the desired output without requiring a separate escape character. Like regular escaping it can, however, become confusing when many quotes are used. The code to print the above source code would look more confusing:

<syntaxhighlight lang="basic">
print "print ""Nancy said """"Hello World!"""" to the crowd."""
</syntaxhighlight>

==== Configurable alternative quoting delimiters ====
Compared with dual delimiters, configurable delimiters are even more flexible for avoiding delimiter collision.<ref name="programmingperl" />{{rp|63}}

For example, in [[Perl]]:

<syntaxhighlight lang="perl">
print qq^Nancy doesn't want to say "Hello World!" anymore.^;
print qq@Nancy doesn't want to say "Hello World!" anymore.@;
print qq(Nancy doesn't want to say "Hello World!" anymore.);
</syntaxhighlight>

all produce the desired output through use of [http://perldoc.perl.org/perlop.html#Quote-and-Quote-like-Operators quote operators], which allow any convenient character to act as a delimiter. Although this method is more flexible, few languages support it. Perl and [[Ruby (programming language)|Ruby]] are two that do.<ref name="programmingperl" />{{rp|62}}<ref name="Ruby000">{{cite book |last = Yukihiro |first = Matsumoto |title = Ruby in a Nutshell |publisher = O'Reilly |year = 2001 |isbn = 0-596-00214-9 |url = https://archive.org/details/rubyinnutshellde00mats }} In Ruby, these are indicated as ''general delimited strings''. p.
11</ref>

====Content boundary====
A '''content boundary''' is a special type of delimiter that is specifically designed to resist delimiter collision. It works by allowing the author to specify a sequence of characters that is guaranteed to always indicate a boundary between parts in a multi-part message, with no other possible interpretation.<ref name="Mime000">{{cite book | title = Network Protocols Handbook | publisher = Javvin Technologies Inc. | year = 2005 | isbn = 0-9740945-2-8 }} p. 26</ref>

The delimiter is frequently generated from a random sequence of characters that is statistically improbable to occur in the content. This may be followed by an identifying mark such as a [[UUID]], a [[timestamp]], or some other distinguishing mark. Alternatively, the content may be scanned to guarantee that a delimiter does not appear in the text. This may allow the delimiter to be shorter or simpler, and increase the human readability of the document. (''See e.g.'', [[MIME#Multipart messages|MIME]], [[Here document]]s).

====Whitespace or indentation====
Some programming and computer languages allow the use of [[String literal#Whitespace delimiters|whitespace delimiters]] or [[Indent style|indentation]] as a means of specifying boundaries between independent regions in text.<ref name="id_1268444524465_10">{{cite book | title = Computational Linguistics and Intelligent Text Processing | publisher = Oxford University Press | location = Oxford Oxfordshire | year = 2001 | isbn = 978-3-540-41687-6 }} Describes whitespace delimiters. p.
258.</ref>

==== Regular expression syntax ====
{{see also|Regular expression examples}}
In specifying a [[regular expression]], alternate delimiters may also be used to simplify the syntax for '''match''' and '''substitution''' operations in [[Perl]].<ref name="Friedl000">{{cite book | last = Friedl | first = Jeffrey | title = Mastering Regular Expressions | publisher = Oxford University Press | location = Oxford Oxfordshire | year = 2006 | isbn = 978-0-596-52812-6 }} page 472.</ref>

For example, a simple match operation may be specified in Perl with the following syntax:

<syntaxhighlight lang="perl">
$string1 = 'Nancy said "Hello World!" to the crowd.';    # specify a target string
print $string1 =~ m/[aeiou]+/;                           # match one or more vowels
</syntaxhighlight>

The syntax is flexible enough to specify match operations with alternate delimiters, making it easy to avoid delimiter collision:

<syntaxhighlight lang="perl">
$string1 = 'Nancy said "http://Hello/World.htm" is not a valid address.'; # target string

print $string1 =~ m@http://@;       # match using alternate regular expression delimiter
print $string1 =~ m{http://};       # same as previous, but different delimiter
print $string1 =~ m!http://!;       # same as previous, but different delimiter.
</syntaxhighlight>

==== Here document ====
A [[here document]] allows the inclusion of arbitrary content by describing a special end sequence. Many languages support this, including [[PHP]], [[Bash (Unix shell)|Bash]], [[Ruby (programming language)|Ruby]] and [[Perl]]. A here document starts by describing what the end sequence will be and continues until that sequence is seen at the start of a new line.<ref>[http://perldoc.perl.org/perlop.html Perl operators and precedence]</ref>

Here is an example in Perl:

<syntaxhighlight lang="perl">
print <<ENDOFHEREDOC;
It's very hard to encode a string with "certain characters".
Newlines, commas, and other characters can cause delimiter collisions.
ENDOFHEREDOC
</syntaxhighlight>

This code would print:

<pre>
It's very hard to encode a string with "certain characters".
Newlines, commas, and other characters can cause delimiter collisions.
</pre>

By using a special end sequence, all manner of characters are allowed in the string.

====ASCII armor====
Although principally used as a mechanism for text encoding of binary data, [[ASCII armoring]] is a programming and systems administration technique that also helps to avoid delimiter collision in some circumstances.<ref name="Rhee000">{{cite book | title = Internet Security: Cryptographic Principles, Algorithms and Protocols | first = Man | last = Rhee | publisher = John Wiley and Sons | year = 2003 | isbn = 0-470-85285-2 }}(an example usage of ASCII armoring in encryption applications)</ref><ref name="Gross000">{{cite book | title = Open Source for Windows Administrators | url = https://archive.org/details/opensourceforwin0000gros | url-access = registration | first = Christian | last = Gross | publisher = Charles River Media | year = 2005 | isbn = 1-58450-347-5 }}(an example usage of ASCII armoring in encryption applications)</ref> This technique differs from the other approaches described above in that it is more complicated, and is therefore less suitable for small applications and simple data storage formats. The technique employs a special encoding scheme, such as [[base64]], to ensure that delimiters or other significant characters do not appear in transmitted data. The purpose is to prevent multilayered [[Escape character|escaping]], e.g. of [[Quoting escape|doublequotes]].

This technique is used, for example, in [[Microsoft]]'s [[ASP.NET]] web development technology, and is closely associated with the "VIEWSTATE" component of that system.<ref name="Kalani000">{{cite book | title = Developing and Implementing Web Applications with Visual C# . NET and Visual Studio .
NET | first = Amit | last = Kalani | publisher = Que | year = 2004 | isbn = 0-7897-2901-6 }}(describes the use of Base64 encoding and VIEWSTATE inside HTML source code)</ref>

===== Example =====
The following simplified example demonstrates how this technique works in practice.

The first code fragment shows a simple [[HTML tag]] in which the VIEWSTATE value contains characters that are incompatible with the delimiters of the HTML tag itself:

<syntaxhighlight lang="xml">
<input type="hidden" name="__VIEWSTATE" value="BookTitle:Nancy doesn't say "Hello World!" anymore." />
</syntaxhighlight>

This first code fragment is not [[Well-formed element|well-formed]], and would therefore not work properly in a "real world" deployed system.

To store arbitrary text in an HTML attribute, [[List of XML and HTML character entity references|HTML entities]] can be used. In this case "&quot;" stands in for the double-quote:

<syntaxhighlight lang="xml">
<input type="hidden" name="__VIEWSTATE" value="BookTitle:Nancy doesn't say &quot;Hello World!&quot; anymore." />
</syntaxhighlight>

Alternatively, any encoding could be used that doesn't include characters that have special meaning in the context, such as base64:

<syntaxhighlight lang="xml">
<input type="hidden" name="__VIEWSTATE" value="Qm9va1RpdGxlOk5hbmN5IGRvZXNuJ3Qgc2F5ICJIZWxsbyBXb3JsZCEiIGFueW1vcmUu" />
</syntaxhighlight>

Or [[percent-encoding]]:

<syntaxhighlight lang="xml">
<input type="hidden" name="__VIEWSTATE" value="BookTitle:Nancy%20doesn%27t%20say%20%22Hello%20World!%22%20anymore." />
</syntaxhighlight>

This prevents delimiter collision and ensures that incompatible characters will not appear inside the HTML code, regardless of what characters appear in the original (decoded) text.<ref name="Kalani000" />

==See also==
* [[CDATA]]
* [[Decimal separator]]
* [[Delimiter-separated values]]
* [[Escape sequence]]
* [[String literal]]
* [[Tab-separated values]]

==References==
{{reflist}}

==External links==
* [http://www.catb.org/esr/writings/taoup/html/ch05s02.html Data File Metaformats] from [[The Art of Unix Programming]] by [[Eric Steven Raymond]]

[[Category:Markup languages]]
[[Category:Pattern matching]]
[[Category:Programming constructs]]
[[Category:String (computer science)]]