Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
S-expression
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
{{Short description|Data serialization format}} [[File:Corrected_S-expression_tree_2.svg|thumb|[[Tree (data structure)|Tree]] data structure representing the S-expression <code>(* 2 (+ 3 4))</code> ]] In [[computer programming]], an '''S-expression''' (or '''symbolic expression''', abbreviated as '''sexpr''' or '''sexp''') is an expression in a like-named notation for nested [[List (computing)|list]] ([[Tree (data structure)|tree]]-structured) data. S-expressions were invented for, and popularized by, the programming language [[Lisp (programming language)|Lisp]], which uses them for [[source code]] as well as data. == Characteristics == In the usual parenthesized [[Syntax (programming languages)|syntax]] of Lisp, an S-expression is classically defined<ref name="McCarthy1960">John McCarthy (1960/2006). [http://www-formal.stanford.edu/jmc/recursive/recursive.html Recursive functions of symbolic expressions] {{webarchive|url=https://web.archive.org/web/20040202215021/http://www-formal.stanford.edu/jmc/recursive/recursive.html |date=2004-02-02 }}. Originally published in [[Communications of the ACM]].</ref> as # an atom of the form <code>''x''</code>, or # an [[Expression (computer science)|expression]] of the form <code>(''x'' . ''y'')</code> where ''x'' and ''y'' are S-expressions. This definition reflects LISP's representation of a list as a series of "cells", each one an [[ordered pair]]. In plain lists, ''y'' points to the next cell (if any), thus forming a [[Linked list|list]]. The [[Recursion|recursive]] clause of the definition means that both this representation and the S-expression notation can represent any [[binary tree]]. However, the representation can in principle allow [[circular reference]]s, in which case the structure is not a tree at all, but a [[cyclic graph]], and cannot be represented in classical S-expression notation unless a convention for cross-reference is provided (analogous to SQL [[foreign key]]s, [[SGML]]/[[XML]] IDREFs, etc.). Modern Lisp dialects such as [[Common Lisp]]<ref>{{cite web|title=Common Lisp HyperSpec: 22.4 - The Printer Dictionary: *PRINT-CIRCLE*|url=http://clhs.lisp.se/Body/v_pr_cir.htm|date=2018-12-28}}</ref> and [[Scheme (programming language)|Scheme]]<ref>{{cite web|title=Revised<sup>7</sup> Report on the Algorithmic Language Scheme: Section 2.4: Datum Labels|url=https://small.r7rs.org/attachment/r7rs.pdf#section.2.4|date=2013-07-06}}</ref> provide such syntax via ''datum labels'', with which objects can be marked, which can then recur elsewhere, indicating shared rather than duplicated structure, enabling the [[Lisp reader|reader]] or [[Format (Common Lisp)|printer]] to detect and thus trigger evaluation or display of cycles without infinitely recursing : <code>#n=(''x'' ''y'' . #n#)</code> The definition of an atom varies per context; in the original definition by [[John McCarthy (computer scientist)|John McCarthy]],<ref name="McCarthy1960"/> it was assumed that there existed "an infinite set of distinguishable [[Symbol (programming)|atomic symbols]]" represented as "strings of capital [[Latin script|Latin letter]]s and digits with single embedded blanks" (a subset of [[String (computing)|character string]] and numeric [[Literal (computer programming)|literal]]s). Most modern sexpr notations allow more general quoted strings (for example including punctuation or full [[Unicode]]), and use an abbreviated notation to represent lists with more than 2 members, so that : <code>(''x'' ''y'' ''z'')</code> stands for : <code>(''x'' . (''y'' . (''z'' . NIL)))</code> <code>NIL</code> is the special end-of-list [[Object (computer science)|object]] (alternatively written <code>()</code>, which is the only representation in [[Scheme (programming language)|Scheme]]<ref>{{Cite web|url=https://schemers.org/Documents/Standards/R5RS/HTML/r5rs-Z-H-9.html|title=Revised^5 Report on the Algorithmic Language Scheme|website=schemers.org}}</ref>). In the Lisp family of programming languages, S-expressions are used to represent both source code and data. Other uses of S-expressions are in Lisp-derived languages such as [[Document Style Semantics and Specification Language|DSSSL]], and as [[Markup language|mark-up]] in [[communication protocol]]s like [[Internet Message Access Protocol|IMAP]] and [[John McCarthy (computer scientist)|John McCarthy]]'s [[Common Business Communication Language|CBCL]]. It is also used as text representation of [[WebAssembly]]. The details of the syntax and supported [[data type]]s vary in the different languages, but the most common feature among these languages is the use of S-expressions and prefix notation. ==Datatypes and syntax== There are many variants of the S-expression format, supporting a variety of different syntaxes for different datatypes. The most widely supported are: * ''Lists and pairs'': <code>(1 () (2 . 3) (4))</code> * ''Symbols'': <code>with-hyphen</code> <code>?@!$</code> <code>|a symbol with spaces|</code> * ''Strings'': <code>"Hello, world!"</code> * ''Integers'': <code>-9876543210</code> * ''Floating-point numbers'': <code>-0.0</code> <code>6.28318</code> <code>6.022e23</code> The character <code>#</code> is often used to prefix extensions to the syntax, e.g. <code>#x10</code> for hexadecimal integers, or <code>#\C</code> for characters. ==Use in Lisp== When representing source code in Lisp, the first element of an S-expression is commonly an operator or function name and any remaining elements are treated as arguments. This is called "prefix notation" or "[[Polish notation]]". As an example, the [[Boolean logic|Boolean]] expression written {{nowrap begin}}<code>4 == (2 + 2)</code>{{nowrap end}} in [[C (programming language)|C]], is represented as {{nowrap begin}}<code>(= 4 (+ 2 2))</code>{{nowrap end}} in Lisp's s-expr-based prefix notation. As noted above, the precise definition of "atom" varies across LISP-like languages. A quoted string can typically contain anything but a quote, while an unquoted identifier atom can typically contain anything but quotes, whitespace characters, parentheses, brackets, braces, backslashes, and semicolons. In either case, a prohibited character can typically be included by escaping it with a preceding backslash. [[Unicode]] support varies. The recursive case of the s-expr definition is traditionally implemented using [[cons cell]]s. S-expressions were originally intended only for data to be manipulated by [[M-expression]]s, but the first implementation of Lisp was an interpreter of S-expression encodings of M-expressions, and Lisp programmers soon became accustomed to using S-expressions for both code and data. This means that Lisp is [[homoiconic]]; that is, the primary representation of programs is also a data structure in a primitive type of the language itself. Nested lists can be written as S-expressions: <code>((milk juice) (honey marmalade))</code> is a two-element S-expression whose elements are also two-element S-expressions. The whitespace-separated notation used in Lisp (and this article) is typical. Line breaks (newline characters) usually qualify as separators. This is a simple [[context-free grammar]] for a tiny subset of English written as an S-expression (Gazdar/Melish, Natural Language Processing in Lisp), where S=sentence, NP=Noun Phrase, VP=Verb Phrase, V=Verb: <syntaxhighlight lang="lisp"> (((S) (NP VP)) ((VP) (V)) ((VP) (V NP)) ((V) died) ((V) employed) ((NP) nurses) ((NP) patients) ((NP) Medicenter) ((NP) "Dr Chan")) </syntaxhighlight> Program code can be written in S-expressions, usually using prefix notation. Example in [[Common Lisp]]: <syntaxhighlight lang="lisp"> (defun factorial (x) (if (zerop x) 1 (* x (factorial (- x 1))))) </syntaxhighlight> S-expressions can be read in Lisp using the function READ. READ reads the textual representation of an S-expression and returns Lisp data. The function PRINT can be used to output an S-expression. The output then can be read with the function READ, when all printed data objects have a readable representation. Lisp has readable representations for numbers, strings, symbols, lists and many other data types. Program code can be formatted as pretty printed S-expressions using the function PPRINT (note: with two Ps, short for ''pretty''-print). Lisp programs are valid S-expressions, but not all S-expressions are valid Lisp programs. <code>(1.0 + 3.1)</code> is a valid S-expression, but not a valid Lisp program, since Lisp uses prefix notation and a floating point number (here 1.0) is not valid as an operation (the first element of the expression). An S-expression preceded by a single quotation mark, as in <code>'x</code>, is [[syntactic sugar]] for a [[Lisp (programming language)#Self-evaluating forms and quoting|quoted S-expression]], in this case <code>(quote x)</code>. == Relation to XML == S-expressions are often compared to [[XML]]: one key difference is that S-expressions have just one form of containment, the dotted pair, while XML tags can contain simple attributes, other tags, or [[CDATA]], each using different syntax. Another is that S-expressions do not define a reference mechanism, whereas XML provides a notion of unique identifiers and references to them. For simple use cases, S-expressions are simpler than XML, but for more advanced use cases, XML has a query language called [[XPath]] which many tools and third party libraries use to simplify the handling of XML data. == Standardization == {{See also|Canonical S-expressions}} Standards for some Lisp-derived programming languages include a specification for their S-expression syntax. These include [[Common Lisp]] (ANSI standard document ANSI INCITS 226-1994 (R2004)), [[Scheme (programming language)|Scheme]] (R5RS and [[R6RS]]<ref>{{Cite journal|title=Revised6 Report on the Algorithmic Language Scheme|date=Aug 12, 2009|journal=Journal of Functional Programming|volume=19|issue=S1|pages=1–301|doi=10.1017/S0956796809990074|last1=Sperber|first1=Michael|last2=Dybvig|first2=R. Kent|last3=Flatt|first3=Matthew|last4=Van Straaten|first4=Anton|last5=Findler|first5=Robby|last6=Matthews|first6=Jacob|citeseerx=10.1.1.372.373|s2cid=267822156 }}</ref>), and [[ISLISP]]. In May 1997, [[Ron Rivest]] submitted an [[Internet Draft]]<ref>[https://web.archive.org/web/20230223024606/http://people.csail.mit.edu/rivest/Sexp.txt S-expressions], Network Working Group, Internet Draft, Expires November 4, 1997 - R. Rivest, May 4, 1997 draft-rivest-sexp-00.txt, Ronald L. Rivest, CSAIL MIT website</ref> to be considered for publication as an [[Request for Comments|RFC]]. The draft defined a syntax based on Lisp S-expressions but intended for general-purpose data storage and exchange (similar to [[XML]]) rather than specifically for programming. It was never approved as an RFC, but it has since been cited and used by other RFCs (e.g. RFC 2693) and several other publications.<ref>[https://scholar.google.com/scholar?hl=en&lr=&safe=off&q=rivest+sexp&btnG=Search rivest sexp], Google Scholar (search)</ref> It was originally intended for use in [[Simple public key infrastructure|SPKI]]. Rivest's format defines an S-expression as being either an octet-string (a series of [[byte]]s) or a finite list of other S-expressions. It describes three interchange formats for expressing this structure. One is the "advanced transport", which is very flexible in terms of formatting, and is syntactically similar to Lisp-style expressions, but they are not identical. The advanced transport, for example, allows octet-strings to be represented verbatim (the string's length followed by a colon and the entire raw string), a quoted form allowing escape characters, [[hexadecimal]], [[Base64]], or placed directly as a "token" if it meets certain conditions. (Rivest's tokens differ from Lisp tokens in that the former are just for convenience and aesthetics, and treated exactly like other strings, while the latter have specific syntactical meaning.) Rivest's draft defines a [[Canonical S-expressions|canonical representation]] "for digital signature purposes". It is intended to be compact, easier to parse, and unique for any abstract S-expression. It only allows verbatim strings, and prohibits whitespace as formatting outside strings. Finally there is the "basic transport representation", which is either the canonical form or the same encoded as Base64 and surrounded by [[Bracket|braces]], the latter intended to safely transport a canonically encoded S-expression in a system which might change spacing (e.g. an email system which has 80-character-wide lines and wraps anything longer than that). This format has not been widely adapted for use outside of SPKI (some of the users being [[GnuPG]], libgcrypt, [[Nettle (cryptographic library)|Nettle]], and [[GNU]] lsh). Rivest's S-expressions web page provides [[C (programming language)|C]] source code for a parser and generator (available under the [[MIT license]]), which could be adapted and embedded into other programs.<ref>{{Cite web|url=http://people.csail.mit.edu/rivest/Sexp.txt|title=SEXP (S-expressions)|website=people.csail.mit.edu|access-date=2023-05-05|archive-date=2023-02-23|archive-url=https://web.archive.org/web/20230223024606/http://people.csail.mit.edu/rivest/Sexp.txt|url-status=dead}}</ref> In addition, there are no restrictions on independently implementing the format. ==See also== * [[cons]] * [[CAR and CDR]] * [[Fexpr]] * [[Lambda calculus]] * [[M-expression]] * [[Canonical S-expressions]] * [[Comparison of data serialization formats]] ==References== {{Reflist}} ==External links== * [https://github.com/mjsottile/sfsexp sfsexp] the small, fast S-expression library for C/C++ on GitHub * [http://leon.bottou.org/projects/minilisp minilisp], by Léon Bottou. * [http://rosettacode.org/wiki/S-expressions S-expressions on Rosettacode] has implementations of readers and writers in many languages. {{Lisp programming language}} [[Category:Lisp (programming language)]] [[Category:Data serialization formats]]
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)
Pages transcluded onto the current version of this page
(
help
)
:
Template:Cite journal
(
edit
)
Template:Cite web
(
edit
)
Template:Lisp programming language
(
edit
)
Template:Nowrap begin
(
edit
)
Template:Nowrap end
(
edit
)
Template:Reflist
(
edit
)
Template:See also
(
edit
)
Template:Short description
(
edit
)
Template:Webarchive
(
edit
)