==Patterns for non-regular languages==

Many features found in virtually all modern regular expression libraries provide an expressive power that exceeds the [[regular language]]s. For example, many implementations allow grouping subexpressions with parentheses and recalling the value they match in the same expression (''{{visible anchor|backreferences}}''). This means that, among other things, a pattern can match strings of repeated words like "papa" or "WikiWiki", called ''squares'' in formal language theory. The pattern for these strings is <code>(.+)\1</code>. The language of squares is not regular, nor is it [[context-free language|context-free]], as can be shown with the [[pumping lemma for context-free languages|pumping lemma]]. However, [[pattern matching]] with an unbounded number of backreferences, as supported by numerous modern tools, is still [[context-sensitive language|context sensitive]].<ref>{{cite journal |author=Cezar Câmpeanu |author2=Kai Salomaa |author3=Sheng Yu |name-list-style=amp |title=A Formal Study of Practical Regular Expressions |journal=International Journal of Foundations of Computer Science |volume=14 |number=6 |pages=1007–1018 |url=http://137.149.157.5/Articles/index.php?aid=1<!---This url was taken from Câmpeanu's publications page http://www.csit.upei.ca/~ccampeanu/Research/RJ---> |date=Dec 2003 |doi=10.1142/S012905410300214X |access-date=2015-07-03 |archive-date=2015-07-04 |archive-url=https://web.archive.org/web/20150704141706/http://137.149.157.5/Articles/index.php?aid=1 |url-status=live}} Theorem 3 (p.9)</ref> The general problem of matching any number of backreferences is [[NP-complete]], and the execution time for known algorithms grows exponentially with the number of backreference groups used.<ref>{{cite web |title=Perl Regular Expression Matching is NP-Hard |url=https://perl.plover.com/NPC/ |website=perl.plover.com |access-date=2019-11-21 |archive-date=2020-10-07 
|archive-url=https://web.archive.org/web/20201007183205/https://perl.plover.com/NPC/ |url-status=live}}</ref>

However, many tools, libraries, and engines that provide such constructions still use the term ''regular expression'' for their patterns. This has led to a nomenclature where the term regular expression has different meanings in [[formal language|formal language theory]] and pattern matching. For this reason, some people have taken to using the term ''regex'', ''regexp'', or simply ''pattern'' to describe the latter. [[Larry Wall]], author of the Perl programming language, writes in an essay about the design of Raku:

{{Blockquote|1="Regular expressions" […] are only marginally related to real regular expressions. Nevertheless, the term has grown with the capabilities of our pattern matching engines, so I'm not going to try to fight linguistic necessity here. I will, however, generally call them "regexes" (or "regexen", when I'm in an Anglo-Saxon mood).<ref name="Apocalypse5"/>}}

===Assertions===
{| id="lookbehind" class="floatright wikitable"
|-
! Assertion !! Lookbehind !! Lookahead
|-
! Positive
| style="text-align:center;font-size:125%;"|<code>(?'''<='''{{box|inline=yes|span=yes|type=black|radius=1ex|padding=0 0.5ex|font size=80%|pattern}})</code>
| style="text-align:center;font-size:125%;"|<code>(?'''='''{{box|inline=yes|span=yes|type=black|radius=1ex|padding=0 0.5ex|font size=80%|pattern}})</code>
|-
! Negative
| style="text-align:center;font-size:125%;"|<code>(?'''<!'''{{box|inline=yes|span=yes|type=black|radius=1ex|font size=80%|padding=0 0.5ex|pattern}})</code>
| style="text-align:center;font-size:125%;"|<code>(?<span style="padding:1px;">'''!'''</span>{{box|inline=yes|span=yes|type=black|radius=1ex|padding=0 0.5ex|font size=80%|pattern}})</code>
|-
| colspan="3"|Lookbehind and lookahead assertions<br/>in [[Perl]] regular expressions
|}
Other features absent from the formal description of regular languages include assertions. 
These include the ubiquitous {{code|^}} and {{code|$}}, used since at least 1970,<ref>{{cite book |last1=Ritchie |first1=D. M. |last2=Thompson |first2=K. L. |title=QED Text Editor |url=http://cm.bell-labs.com/cm/cs/who/dmr/qedman.pdf |id=MM-70-1373-3 |date=June 1970 |access-date=2022-09-05 |archive-date=2015-02-03 |archive-url=https://wayback.archive-it.org/all/20150203071645/http://cm.bell-labs.com/cm/cs/who/dmr/qedman.pdf |url-status=dead}} Reprinted as "QED Text Editor Reference Manual", MHCC-004, Murray Hill Computing, Bell Laboratories (October 1972).</ref> as well as more sophisticated extensions such as lookaround, which appeared in 1994.{{r|perl5}} Lookarounds constrain the text surrounding a match without becoming part of the match itself, a feature relevant only to string searching.{{citation needed|date=June 2023}} Some of them can be simulated in a regular language by treating the surroundings as part of the language as well.<ref>{{cite web |author=Wandering Logic |title=How to simulate lookaheads and lookbehinds in finite state automata? 
|url=https://cs.stackexchange.com/a/40058 |website=Computer Science Stack Exchange |access-date=24 November 2019 |archive-date=7 October 2020 |archive-url=https://web.archive.org/web/20201007183206/https://cs.stackexchange.com/questions/2557/how-to-simulate-backreferences-lookaheads-and-lookbehinds-in-finite-state-auto/40058 |url-status=live}}</ref>

The {{visible anchor |look-ahead assertions}} {{nowrap|1=<code>(?=...)</code>}} and {{nowrap|<code>(?!...)</code>}} have been attested since at least 1994, starting with Perl 5.<ref name=perl5>{{cite web |last=Wall |first=Larry |title=Perl 5: perlre.pod |url=https://github.com/Perl/perl5/blob/a0d0e21ea6ea90a22318550944fe6cb09ae10cda/pod/perlre.pod |date=1994-10-18 |website=GitHub}}</ref> The lookbehind assertions {{nowrap|1=<code>(?<=...)</code>}} and {{nowrap|<code>(?<!...)</code>}} have been attested since 1997, in a commit by Ilya Zakharevich to Perl 5.005.<ref>{{cite web |last=Zakharevich |first=Ilya |title=Jumbo Regexp Patch Applied (with Minor Fix-Up Tweaks): Perl/perl5@c277df4 |url=https://github.com/Perl/perl5/commit/c277df42229d99fecbc76f5da53793a409ac66e1 |website=GitHub |date=1997-11-19}}</ref><!-- I emailed Ilya Zakharevich to confirm, and he confirmed that he came up with the notation. -->
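
As an illustrative sketch only, the square pattern <code>(.+)\1</code> from the start of this section and the four lookaround forms in the table above can be tried with Python's <code>re</code> module, which accepts the same notation for these constructs. The helper names <code>is_square</code> and <code>spans</code> and the sample string <code>TEXT</code> are hypothetical, chosen for this example and not taken from any cited source.

```python
import re

# Backreference: \1 must repeat exactly what group 1 captured,
# so the whole string has to be some non-empty word written twice.
SQUARE = re.compile(r"(.+)\1")


def is_square(s):
    """Return True if s consists of one non-empty string repeated twice."""
    return SQUARE.fullmatch(s) is not None


def spans(pattern, text):
    """Return the (start, end) span of every match of pattern in text."""
    return [m.span() for m in re.finditer(pattern, text)]


# Hypothetical sample input: "foo" occurs at offsets 0, 7 and 17.
TEXT = "foobar foobaz barfoo"

print(is_square("papa"), is_square("WikiWiki"), is_square("paper"))

# Each pattern matches only "foo"; the assertion inspects the
# neighbouring text without adding it to the match.
print(spans(r"foo(?=bar)", TEXT))   # positive lookahead: "bar" must follow
print(spans(r"foo(?!bar)", TEXT))   # negative lookahead: "bar" must not follow
print(spans(r"(?<=bar)foo", TEXT))  # positive lookbehind: "bar" must precede
print(spans(r"(?<!bar)foo", TEXT))  # negative lookbehind: "bar" must not precede
```

Every reported span covers only <code>foo</code>: the asserted text is checked but not consumed. Python's <code>re</code> module additionally requires a lookbehind pattern to match strings of a fixed length.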