Open main menu
Home
Random
Recent changes
Special pages
Community portal
Preferences
About Wikipedia
Disclaimers
Incubator escapee wiki
Search
User menu
Talk
Dark mode
Contributions
Create account
Log in
Editing
String (computer science)
(section)
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
== Formal theory == {{See also|Tuple}} Let Σ be a [[finite set]] of distinct, unambiguous symbols (alternatively called characters), called the [[Alphabet (formal languages)|alphabet]]. A '''string''' (or '''word'''<ref>{{cite book |last1=Fletcher |first1=Peter |last2=Hoyle |first2=Hughes |last3=Patty |first3=C. Wayne |year=1991 |title=Foundations of Discrete Mathematics |publisher=PWS-Kent |isbn= 0-53492-373-9 |page=114 |quote=Let Σ be an alphabet. A '''nonempty word over Σ''' is a finite sequence with domain ''I<sub>n</sub>'' (for some ''n'' ∈ ℕ) and codomain Σ.}}</ref> or '''expression'''<ref>{{cite book |last=Shoenfield |first=Joseph R. |authorlink=Joseph R. Shoenfield |year=2010 |orig-date=1967 |title=Mathematical Logic |edition=Reprint |publisher=CRC Press |isbn=978-156881135-2 |page=2 |quote=Any finite sequence of symbols of a language is called an ''expression'' of that language.}}</ref>) over Σ is any finite [[sequence]] of symbols from Σ.<ref name="partee">{{cite book |author1=Barbara H. Partee |author2=Alice ter Meulen|author2-link=Alice ter Meulen |author3=Robert E. Wall |title=Mathematical Methods in Linguistics |publisher=Kluwer |year=1990}}</ref> For example, if Σ = {0, 1}, then ''01011'' is a string over Σ. The ''[[length]]'' of a string ''s'' is the number of symbols in ''s'' (the length of the sequence) and can be any [[non-negative integer]]; it is often denoted as |''s''|. The ''[[empty string]]'' is the unique string over Σ of length 0, and is denoted ''ε'' or ''λ''.<ref name="partee"/><ref>{{cite book| author=John E. Hopcroft, Jeffrey D. Ullman| title=Introduction to Automata Theory, Languages, and Computation| year=1979| publisher=Addison-Wesley| isbn=0-201-02988-X| url-access=registration| url=https://archive.org/details/introductiontoau00hopc}} Here: sect.1.1, p.1</ref> The set of all strings over Σ of length ''n'' is denoted Σ<sup>''n''</sup>. For example, if Σ = {0, 1}, then Σ<sup>2</sup> = {00, 01, 10, 11}. We have Σ<sup>0</sup> = {ε} for every alphabet Σ. The set of all strings over Σ of any length is the [[Kleene star|Kleene closure]] of Σ and is denoted Σ<sup>*</sup>. In terms of Σ<sup>''n''</sup>, :<math>\Sigma^{*} = \bigcup_{n \in \mathbb{N} \cup \{0\}} \Sigma^{n}</math> For example, if Σ = {0, 1}, then Σ<sup>*</sup> = {ε, 0, 1, 00, 01, 10, 11, 000, 001, 010, 011, ...}. Although the set Σ<sup>*</sup> itself is [[countably infinite]], each element of Σ<sup>*</sup> is a string of finite length. A set of strings over Σ (i.e. any [[subset]] of Σ<sup>*</sup>) is called a ''[[formal language]]'' over Σ. For example, if Σ = {0, 1}, the set of strings with an even number of zeros, {ε, 1, 00, 11, 001, 010, 100, 111, 0000, 0011, 0101, 0110, 1001, 1010, 1100, 1111, ...}, is a formal language over Σ. === Concatenation and substrings === ''[[Concatenation]]'' is an important [[binary operation]] on Σ<sup>*</sup>. For any two strings ''s'' and ''t'' in Σ<sup>*</sup>, their concatenation is defined as the sequence of symbols in ''s'' followed by the sequence of characters in ''t'', and is denoted ''st''. For example, if Σ = {a, b, ..., z}, ''s'' = {{code|bear}}, and ''t'' = {{code|hug}}, then ''st'' = {{code|bearhug}} and ''ts'' = {{code|hugbear}}. String concatenation is an [[associative]], but non-[[commutative]] operation. The empty string ε serves as the [[identity element]]; for any string ''s'', ε''s'' = ''s''ε = ''s''. Therefore, the set Σ<sup>*</sup> and the concatenation operation form a [[monoid]], the [[free monoid]] generated by Σ. In addition, the length function defines a [[monoid homomorphism]] from Σ<sup>*</sup> to the non-negative integers (that is, a function <math>L: \Sigma^{*} \mapsto \mathbb{N} \cup \{0\}</math>, such that <math>L(st)=L(s)+L(t)\quad \forall s,t\in\Sigma^*</math>). A string ''s'' is said to be a ''[[substring]]'' or ''factor'' of ''t'' if there exist (possibly empty) strings ''u'' and ''v'' such that ''t'' = ''usv''. The [[binary relation|relation]] "is a substring of" defines a [[partial order]] on Σ<sup>*</sup>, the [[least element]] of which is the empty string. === Prefixes and suffixes === A string ''s'' is said to be a [[Substring#Prefix|prefix]] of ''t'' if there exists a string ''u'' such that ''t'' = ''su''. If ''u'' is nonempty, ''s'' is said to be a ''proper'' prefix of ''t''. Symmetrically, a string ''s'' is said to be a [[Substring#Suffix|suffix]] of ''t'' if there exists a string ''u'' such that ''t'' = ''us''. If ''u'' is nonempty, ''s'' is said to be a ''proper'' suffix of ''t''. Suffixes and prefixes are substrings of ''t''. Both the relations "is a prefix of" and "is a suffix of" are [[prefix order]]s. === Reversal === The reverse of a string is a string with the same symbols but in reverse order. For example, if ''s'' = abc (where a, b, and c are symbols of the alphabet), then the reverse of ''s'' is cba. A string that is the reverse of itself (e.g., ''s'' = madam) is called a [[palindrome#Computation theory|palindrome]], which also includes the empty string and all strings of length 1. === Rotations === A string ''s'' = ''uv'' is said to be a rotation of ''t'' if ''t'' = ''vu''. For example, if Σ = {0, 1} the string 0011001 is a rotation of 0100110, where ''u'' = 00110 and ''v'' = 01. As another example, the string abc has three different rotations, viz. abc itself (with ''u''=abc, ''v''=ε), bca (with ''u''=bc, ''v''=a), and cab (with ''u''=c, ''v''=ab). === Lexicographical ordering === It is often useful to define an [[order theory|ordering]] on a set of strings. If the alphabet Σ has a [[total order]] (cf. [[alphabetical order]]) one can define a total order on Σ<sup>*</sup> called [[lexicographical order]]. The lexicographical order is [[total order|total]] if the alphabetical order is, but is not [[well-founded]] for any nontrivial alphabet, even if the alphabetical order is. For example, if Σ = {0, 1} and 0 < 1, then the lexicographical order on Σ<sup>*</sup> includes the relationships ε < 0 < 00 < 000 < ... < 0001 < ... < 001 < ... < 01 < 010 < ... < 011 < 0110 < ... < 01111 < ... < 1 < 10 < 100 < ... < 101 < ... < 111 < ... < 1111 < ... < 11111 ... With respect to this ordering, e.g. the infinite set { 1, 01, 001, 0001, 00001, 000001, ... } has no minimal element. See [[Shortlex]] for an alternative string ordering that preserves well-foundedness. For the example alphabet, the shortlex order is ε < 0 < 1 < 00 < 01 < 10 < 11 < 000 < 001 < 010 < 011 < 100 < 101 < 0110 < 111 < 0000 < 0001 < 0010 < 0011 < ... < 1111 < 00000 < 00001 ... === String operations === A number of additional operations on strings commonly occur in the formal theory. These are given in the article on [[string operations]]. === Topology === [[File:Hamming distance 3 bit binary.svg|thumb|150px|(Hyper)cube of binary strings of length 3]] Strings admit the following interpretation as nodes on a graph, where ''k'' is the number of symbols in Σ: * Fixed-length strings of length ''n'' can be viewed as the integer locations in an ''n''-dimensional [[hypercube]] with sides of length ''k''-1. * Variable-length strings (of finite length) can be viewed as nodes on a [[k-ary tree|perfect ''k''-ary tree]]. * [[Infinite string]]s (otherwise not considered here) can be viewed as infinite paths on a ''k''-node [[complete graph]]. The natural topology on the set of fixed-length strings or variable-length strings is the discrete topology, but the natural topology on the set of infinite strings is the [[limit topology]], viewing the set of infinite strings as the [[inverse limit]] of the sets of finite strings. This is the construction used for the [[p-adic|''p''-adic numbers]] and some constructions of the [[Cantor set]], and yields the same topology. [[Isomorphism]]s between string representations of topologies can be found by normalizing according to the [[lexicographically minimal string rotation]].
Edit summary
(Briefly describe your changes)
By publishing changes, you agree to the
Terms of Use
, and you irrevocably agree to release your contribution under the
CC BY-SA 4.0 License
and the
GFDL
. You agree that a hyperlink or URL is sufficient attribution under the Creative Commons license.
Cancel
Editing help
(opens in new window)