===Expressive power and compactness===
The formal definition of regular expressions is minimal on purpose, and avoids defining <code>?</code> and <code>+</code>; these can be expressed as follows: <code>a+</code>=<code>aa*</code>, and <code>a?</code>=<code>(a|ε)</code>. Sometimes the [[set complement|complement]] operator is added, to give a ''generalized regular expression''; here ''R<sup>c</sup>'' matches all strings over Σ* that do not match ''R''. In principle, the complement operator is redundant, because it does not grant any more expressive power. However, it can make a regular expression much more concise: eliminating a single complement operator can cause a [[double exponential function|double exponential]] blow-up of its length.<ref>{{harvtxt|Gelade|Neven|2008|p=332|loc=Thm.4.1}}</ref><ref>{{harvtxt|Gruber|Holzer|2008}}</ref><ref>Based on {{harvtxt|Gelade|Neven|2008}}, a regular expression of length about 850 such that its complement has a length about 2<sup>32</sup> can be found at [[:File:RegexComplementBlowup.png]].</ref>

Regular expressions in this sense can express the regular languages, exactly the class of languages accepted by [[deterministic finite automata]]. There is, however, a significant difference in compactness. Some classes of regular languages can only be described by deterministic finite automata whose size grows [[exponential growth|exponentially]] in the size of the shortest equivalent regular expressions. The standard example here is the family of languages ''L<sub>k</sub>'' consisting of all strings over the alphabet {''a'',''b''} whose ''k''th-from-last letter equals ''a''. On the one hand, a regular expression describing ''L''<sub>4</sub> is given by <math>(a\mid b)^*a(a\mid b)(a\mid b)(a\mid b)</math>. Generalizing this pattern to ''L<sub>k</sub>'' gives the expression:
: <math>(a\mid b)^*a\underbrace{(a\mid b)(a\mid b)\cdots(a\mid b)}_{k-1\text{ times}}. \, </math>
On the other hand, it is known that every deterministic finite automaton accepting the language ''L<sub>k</sub>'' must have at least 2<sup>''k''</sup> states. However, there is a simple mapping from regular expressions to the more general [[nondeterministic finite automata]] (NFAs) that does not lead to such a blowup in size; for this reason NFAs are often used as alternative representations of regular languages. NFAs are a simple variation of the type-3 [[formal grammar|grammars]] of the [[Chomsky hierarchy]].<ref name="HopcroftMotwaniUllman01"/>

In the opposite direction, there are many languages easily described by a DFA that are not easily described by a regular expression. For instance, determining the validity of a given [[ISBN]] requires computing a remainder modulo 11, which can easily be implemented with an 11-state DFA. However, converting such a DFA to a regular expression results in a file of about 2.14 megabytes.<ref>{{cite web |title=Regular expressions for deciding divisibility |url=https://s3.boskent.com/divisibility-regex/divisibility-regex.html |access-date=2024-02-21 |website=s3.boskent.com}}</ref>

Given a regular expression, [[Thompson's construction algorithm]] computes an equivalent nondeterministic finite automaton. A conversion in the opposite direction is achieved by [[Kleene's algorithm]].

Finally, many real-world "regular expression" engines implement features that cannot be described by the regular expressions in the sense of formal language theory; rather, they implement ''regexes''. See [[#Patterns for non-regular languages|below]] for more on this.
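The gap between the regular expression for ''L<sub>k</sub>'' and its deterministic automata can be checked mechanically. The following is a minimal Python sketch, not a reference implementation. It builds the expression with Python's <code>re</code> module, using non-capturing groups <code>(?:a|b)</code> in place of <code>(a|b)</code>, and compares it against the definition on all short words. It then applies the subset construction to a (''k''+1)-state NFA for ''L<sub>k</sub>'' and counts the reachable DFA states. The NFA and the helper names <code>regex_for_L</code>, <code>in_L</code> and <code>subset_dfa_size</code> are illustrative choices that do not come from the cited sources. The count agrees with the 2<sup>''k''</sup> lower bound stated above, but the code does not itself prove minimality.

```python
import re
from itertools import product


def regex_for_L(k: int) -> "re.Pattern[str]":
    """Compile (a|b)*a followed by k-1 copies of (a|b) (illustrative helper)."""
    return re.compile("(?:a|b)*a" + "(?:a|b)" * (k - 1))


def in_L(word: str, k: int) -> bool:
    """Direct definition: the k-th letter from the end exists and is 'a'."""
    return len(word) >= k and word[-k] == "a"


def subset_dfa_size(k: int) -> int:
    """Count DFA states reachable by the subset construction from a (k+1)-state NFA.

    NFA (an assumption of this sketch): state 0 loops on a and b and also
    moves to 1 on 'a', guessing that this 'a' is the k-th letter from the end;
    state i (1 <= i < k) moves to i+1 on either letter; state k is accepting
    and has no outgoing transitions.
    """
    def step(states: frozenset, ch: str) -> frozenset:
        nxt = set()
        for s in states:
            if s == 0:
                nxt.add(0)
                if ch == "a":
                    nxt.add(1)
            elif s < k:
                nxt.add(s + 1)
        return frozenset(nxt)

    start = frozenset({0})
    seen = {start}
    stack = [start]
    while stack:  # depth-first search over reachable subsets
        current = stack.pop()
        for ch in "ab":
            target = step(current, ch)
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return len(seen)


for k in range(1, 6):
    pattern = regex_for_L(k)
    # The expression agrees with the definition on every word up to length 8.
    for n in range(9):
        for letters in product("ab", repeat=n):
            word = "".join(letters)
            assert bool(pattern.fullmatch(word)) == in_L(word, k)
    # Regex length grows by 7 per step (9, 16, 23, 30, 37);
    # the DFA doubles (2, 4, 8, 16, 32).
    print(k, len(pattern.pattern), subset_dfa_size(k))
```

The expression grows linearly in ''k'', while the number of subsets reachable from the NFA's start state doubles with every increment of ''k''.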
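The 11-state automaton mentioned above keeps only a remainder modulo 11 as its state. The Python sketch below illustrates this idea under a simplifying assumption: it reads a decimal numeral from left to right and tracks the remainder of the prefix read so far. This is the kind of divisibility automaton that the cited page turns into regular expressions. A complete ISBN-10 check would also weight each digit by its position and allow ''X'' as the final check digit, and this sketch models neither. The names <code>MOD11_DELTA</code>, <code>remainder_mod_11</code> and <code>divisible_by_11</code> are hypothetical.

```python
# Transition table of an 11-state DFA: in state r (the remainder mod 11 of
# the prefix read so far), reading digit d leads to state (10*r + d) mod 11.
MOD11_DELTA = {(r, d): (10 * r + d) % 11 for r in range(11) for d in range(10)}
DIGITS = "0123456789"


def remainder_mod_11(numeral: str) -> int:
    """Run the DFA on a decimal numeral and return its final state."""
    state = 0  # start state: the empty prefix has value 0
    for ch in numeral:
        if ch not in DIGITS:
            raise ValueError(f"not a decimal digit: {ch!r}")
        state = MOD11_DELTA[(state, int(ch))]
    return state


def divisible_by_11(numeral: str) -> bool:
    """Accept exactly when the DFA halts in state 0."""
    return remainder_mod_11(numeral) == 0


print(divisible_by_11("1331"), divisible_by_11("1332"))  # True False
```

The whole automaton is a table of 11 × 10 = 110 transitions, which is tiny compared with the multi-megabyte regular expression for the same language.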
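Kleene's algorithm, mentioned above for the conversion from automata to regular expressions, can be sketched in a few lines. The Python version below is an illustrative implementation under stated assumptions. A DFA is given as a dictionary mapping (state, symbol) to the next state, <code>None</code> stands for the empty language, and the empty string stands for ε. The code attempts no algebraic simplification, so its output is far longer than necessary. It converts a 3-state DFA for binary numerals divisible by 3 (a small analogue of the divisibility-by-11 automaton) and checks the result with Python's <code>re</code> module. The helper names <code>union</code>, <code>concat</code>, <code>star</code> and <code>kleene</code> are hypothetical.

```python
import re
from itertools import product
from typing import Dict, Iterable, List, Optional, Tuple

Regex = Optional[str]  # None = empty language, "" = epsilon


def union(a: Regex, b: Regex) -> Regex:
    if a is None:
        return b
    if b is None or a == b:
        return a
    return f"(?:{a}|{b})"  # grouped so it is safe to concatenate


def concat(a: Regex, b: Regex) -> Regex:
    if a is None or b is None:
        return None  # concatenating with the empty language gives nothing
    return a + b


def star(a: Regex) -> str:
    if a is None or a == "":
        return ""  # the star of the empty language or of epsilon is epsilon
    return f"(?:{a})*"


def kleene(n: int, delta: Dict[Tuple[int, str], int], start: int,
           accepting: Iterable[int]) -> Regex:
    # R[i][j] describes paths from i to j with no intermediate states yet.
    R: List[List[Regex]] = [[None] * n for _ in range(n)]
    for (i, symbol), j in delta.items():
        R[i][j] = union(R[i][j], re.escape(symbol))
    for i in range(n):
        R[i][i] = union(R[i][i], "")  # the empty path from i to itself
    for k in range(n):
        # Now also allow state k as an intermediate state:
        # R_ij | R_ik (R_kk)* R_kj, computed from the previous round's R.
        R = [[union(R[i][j], concat(concat(R[i][k], star(R[k][k])), R[k][j]))
              for j in range(n)] for i in range(n)]
    result: Regex = None
    for f in accepting:
        result = union(result, R[start][f])
    return result


# Binary numerals divisible by 3: the state is the remainder read so far.
MOD3_DELTA = {(r, b): (2 * r + int(b)) % 3 for r in range(3) for b in "01"}
regex = kleene(3, MOD3_DELTA, start=0, accepting=[0])
pattern = re.compile(regex)
for n in range(7):
    for bits in product("01", repeat=n):
        word = "".join(bits)
        expected = int(word, 2) % 3 == 0 if word else True  # empty word: state 0
        assert bool(pattern.fullmatch(word)) == expected
print(len(regex))  # over 200 characters for a 6-transition DFA
```

Even this 3-state automaton yields an expression of more than 200 characters. Simplification rules would shorten it, but they cannot avoid the worst-case growth that produces the multi-megabyte expression for the modulo-11 case.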