Regular expression
===Formal definition===
Regular expressions consist of constants, which denote sets of strings, and operator symbols, which denote operations over these sets. The following definition is standard, and found as such in most textbooks on formal language theory.<ref name="HopcroftMotwaniUllman01">{{harvtxt|Hopcroft|Motwani|Ullman|2000}}</ref><ref>{{harvtxt|Sipser|1998}}</ref> Given a finite [[alphabet (computer science)|alphabet]] Σ, the following constants are defined as regular expressions:
* (''empty set'') ∅ denoting the set ∅.
* (''[[empty string]]'') ε denoting the set containing only the "empty" string, which has no characters at all.
* (''[[string literal|literal character]]'') <code>a</code> in Σ denoting the set containing only the character ''a''.

Given regular expressions R and S, the following operations over them are defined to produce regular expressions:
* (''[[concatenation]]'') <code>(RS)</code> denotes the set of strings that can be obtained by concatenating a string in the set denoted by R and a string in the set denoted by S (in that order). For example, let R denote {"ab", "c"} and S denote {"d", "ef"}. Then, <code>(RS)</code> denotes {"abd", "abef", "cd", "cef"}.
* (''[[alternation (formal language theory)|alternation]]'') <code>(R|S)</code> denotes the [[set union]] of the sets denoted by R and S. For example, if R denotes {"ab", "c"} and S denotes {"ab", "d", "ef"}, the expression <code>(R|S)</code> denotes {"ab", "c", "d", "ef"}.
* (''[[Kleene star]]'') <code>(R*)</code> denotes the smallest [[subset|superset]] of the set denoted by R that contains ε and is [[closure (mathematics)|closed]] under string concatenation. This is the set of all strings that can be made by concatenating any finite number (including zero) of strings from the set denoted by R. For example, if R denotes {"0", "1"}, <code>(R*)</code> denotes the set of all finite [[binary string]]s (including the empty string). If R denotes {"ab", "c"}, <code>(R*)</code> denotes {ε, "ab", "c", "abab", "abc", "cab", "cc", "ababab", "abcab", ...}.

To avoid parentheses, it is assumed that the Kleene star has the highest priority, followed by concatenation, then alternation. If there is no ambiguity, then parentheses may be omitted. For example, <code>(ab)c</code> can be written as <code>abc</code>, and <code>a|(b(c*))</code> can be written as <code>a|bc*</code>. Many textbooks use the symbols ∪, +, or ∨ for alternation instead of the vertical bar.

'''Examples:'''
* <code>a|b*</code> denotes {ε, "a", "b", "bb", "bbb", ...}
* <code>(a|b)*</code> denotes the set of all strings with no symbols other than "a" and "b", including the empty string: {ε, "a", "b", "aa", "ab", "ba", "bb", "aaa", ...}
* <code>ab*(c|ε)</code> denotes the set of strings starting with "a", then zero or more "b"s and finally optionally a "c": {"a", "ac", "ab", "abc", "abb", "abbc", ...}
* <code>(0|(1(01*0)*1))*</code> denotes the set of binary numerals (leading zeros allowed) that are multiples of 3, together with the empty string: {ε, "0", "00", "11", "000", "011", "110", "0000", "0011", "0110", "1001", "1100", "1111", "00000", ...}
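
The inductive definition above can be read as a recursive data type with one case per constant or operation. The following Python sketch is illustrative only and is not drawn from the cited textbooks: the class names (<code>Empty</code>, <code>Eps</code>, <code>Lit</code>, <code>Cat</code>, <code>Alt</code>, <code>Star</code>) and the functions <code>cat</code> and <code>language</code> are hypothetical names chosen for this example. Because most of the denoted sets are infinite, <code>language</code> is assumed to enumerate only the strings up to a chosen maximum length.

```python
from dataclasses import dataclass
from functools import reduce
from typing import FrozenSet, Union


@dataclass(frozen=True)
class Empty:            # the constant ∅
    pass


@dataclass(frozen=True)
class Eps:              # the constant ε
    pass


@dataclass(frozen=True)
class Lit:              # a literal character a in Σ
    char: str


@dataclass(frozen=True)
class Cat:              # concatenation (RS)
    left: "Regex"
    right: "Regex"


@dataclass(frozen=True)
class Alt:              # alternation (R|S)
    left: "Regex"
    right: "Regex"


@dataclass(frozen=True)
class Star:             # Kleene star (R*)
    inner: "Regex"


Regex = Union[Empty, Eps, Lit, Cat, Alt, Star]


def cat(*parts: Regex) -> Regex:
    """Concatenate left to right: cat(a, b, c) builds ((ab)c)."""
    return reduce(Cat, parts)


def language(r: Regex, max_len: int) -> FrozenSet[str]:
    """Return every string of length <= max_len in the set denoted by r."""
    if isinstance(r, Empty):
        return frozenset()
    if isinstance(r, Eps):
        return frozenset({""})
    if isinstance(r, Lit):
        return frozenset({r.char}) if max_len >= 1 else frozenset()
    if isinstance(r, Alt):
        # Alternation is set union.
        return language(r.left, max_len) | language(r.right, max_len)
    if isinstance(r, Cat):
        left = language(r.left, max_len)
        right = language(r.right, max_len)
        return frozenset(x + y for x in left for y in right
                         if len(x) + len(y) <= max_len)
    if isinstance(r, Star):
        # Start from ε and keep appending non-empty strings of the inner
        # set until no new string within the length bound appears.
        base = language(r.inner, max_len) - {""}
        result = {""}
        frontier = {""}
        while frontier:
            frontier = {x + y for x in frontier for y in base
                        if len(x) + len(y) <= max_len} - result
            result |= frontier
        return frozenset(result)
    raise TypeError(f"not a regular expression: {r!r}")


# a|b*
a_or_b_star = Alt(Lit("a"), Star(Lit("b")))
# (0|(1(01*0)*1))*
mult3 = Star(Alt(Lit("0"),
                 cat(Lit("1"),
                     Star(cat(Lit("0"), Star(Lit("1")), Lit("0"))),
                     Lit("1"))))

print(sorted(language(a_or_b_star, 3), key=lambda s: (len(s), s)))
# → ['', 'a', 'b', 'bb', 'bbb']
```

In this sketch ε is represented by the empty Python string <code>""</code>. Bounding the length keeps every set finite, so the loop for the Kleene star terminates once no new string within the bound can be formed.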