==Syntax==
<!-- 'Google Codesearch FAQ' links here. -->
A regex ''pattern'' matches a target ''string''. The pattern is composed of a sequence of ''atoms''. An atom is a single point within the regex pattern which it tries to match to the target string. The simplest atom is a literal, but grouping parts of the pattern to match an atom requires using <code>( )</code> as metacharacters. Metacharacters help form: ''atoms''; ''quantifiers'' telling how many atoms (and whether it is a [[#Lazy matching|''greedy'' quantifier]] or not); a logical OR character, which offers a set of alternatives; a logical NOT character, which negates an atom's existence; and backreferences to refer to previous atoms of a completing pattern of atoms. A match is made, not when all the atoms of the string are matched, but rather when all the pattern atoms in the regex have matched. The idea is to make a small pattern of characters stand for a large number of possible strings, rather than compiling a large list of all the literal possibilities.

Depending on the regex processor there are about fourteen metacharacters: characters that may or may not have their [[literal (computer programming)|literal]] character meaning, depending on context or on whether they are "escaped", i.e. preceded by an [[escape sequence]], in this case the backslash <code>\</code>. Modern and POSIX extended regexes use metacharacters more often than their literal meaning, so to avoid "backslash-osis" or [[leaning toothpick syndrome]], they treat these characters as metacharacters by default and use the escape to obtain the literal meaning. Early regex syntaxes, however, instead treat the four bracketing characters <code>( )</code> and <code>{ }</code> as primarily literal and "escape" this usual meaning to turn them into metacharacters. Common standards implement both. The usual metacharacters are <code>{}[]()^$.|*+?</code> and <code>\</code>. The usual characters that become metacharacters when escaped are <code>dswDSW</code> and <code>N</code>.
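As a rough illustration of escaping in both directions, the following sketch uses Python's <code>re</code> module (a Perl-style engine, chosen here only for illustration; the sample strings are invented). It shows a metacharacter losing its special meaning when escaped, and a letter gaining one.

```python
import re

# An unescaped dot is a metacharacter: it matches any single character.
any_char = re.findall(r"a.c", "abc a.c axc")

# Preceded by a backslash, the dot is a literal dot.
literal_dot = re.findall(r"a\.c", "abc a.c axc")

# For some letters the escape works the other way round:
# a bare "d" is a literal, while \d stands for a digit.
digits = re.findall(r"\d+", "order 66, aisle 7")

# re.escape() inserts the backslashes needed to match a string literally.
escaped = re.escape("1+1=2?")
literal_match = re.fullmatch(escaped, "1+1=2?") is not None

print(any_char, literal_dot, digits, literal_match)
```

The exact output of <code>re.escape</code> differs between Python versions, so the sketch only checks that the escaped pattern matches the original text.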
===Delimiters===
When a regex is entered in a programming language, it may be represented as a usual string literal, hence usually quoted; this is common in C, Java, and Python for instance, where the regex <code>re</code> is entered as <code>"re"</code>. However, regexes are often written with slashes as [[delimiter]]s, as in <code>/re/</code> for the regex <code>re</code>. This originates in [[ed (text editor)|ed]], where <code>/</code> is the editor command for searching, and an expression <code>/re/</code> can be used to specify a range of lines (matching the pattern), which can be combined with other commands on either side, most famously <code>g/re/p</code> as in [[grep]] ("global regex print"), which is included in most [[Unix]]-based operating systems, such as [[Linux]] distributions. A similar convention is used in [[sed]], where search and replace is given by <code>s/re/replacement/</code> and patterns can be joined with a comma to specify a range of lines as in <code>/re1/,/re2/</code>. This notation is particularly well known due to its use in [[Perl]], where it forms part of the syntax distinct from normal string literals. In some cases, such as sed and Perl, alternative delimiters can be used to avoid collision with contents, and to avoid having to escape occurrences of the delimiter character in the contents. For example, in sed the command <code>s,/,X,</code> will replace a <code>/</code> with an <code>X</code>, using commas as delimiters.
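A minimal sketch of the string-literal convention, using Python's <code>re</code> module as the example language (the sample strings are invented). Because the pattern is an ordinary string, the slash has no special role, and a raw string avoids doubling backslashes.

```python
import re

# A regex is passed as an ordinary string literal. A raw string (r"...")
# avoids having to double every backslash.
raw_pattern = r"\d+/\d+"
plain_pattern = "\\d+/\\d+"
same_pattern = raw_pattern == plain_pattern

# "/" is not a delimiter here, so it needs no escaping.
fraction = re.search(raw_pattern, "about 3/4 full").group()

# Rough analogue of the sed command s,/,X, (first occurrence only).
replaced = re.sub(r"/", "X", "usr/local/bin", count=1)

print(same_pattern, fraction, replaced)
```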
===IEEE POSIX Standard <span class="anchor" id="POSIX"></span>===
The [[Institute of Electrical and Electronics Engineers|IEEE]] [[POSIX]] standard has three sets of compliance: '''BRE''' (Basic Regular Expressions),<ref>ISO/IEC 9945-2:1993 ''Information technology – Portable Operating System Interface (POSIX) – Part 2: Shell and Utilities'', successively revised as ISO/IEC 9945-2:2002 ''Information technology – Portable Operating System Interface (POSIX) – Part 2: System Interfaces'', ISO/IEC 9945-2:2003, and currently ISO/IEC/IEEE 9945:2009 ''Information technology – Portable Operating System Interface (POSIX) Base Specifications, Issue 7''</ref> '''ERE''' (Extended Regular Expressions), and '''SRE''' (Simple Regular Expressions). SRE is [[deprecation|deprecated]],<ref>The Single Unix Specification (Version 2)</ref> in favor of BRE, as both provide [[backward compatibility]]. The subsection below covering the ''character classes'' applies to both BRE and ERE.

BRE and ERE work together. ERE adds <code>?</code>, <code>+</code>, and <code>|</code>, and it removes the need to escape the metacharacters <code>( )</code> and <code>{ }</code>, which are ''required'' in BRE. Furthermore, as long as the POSIX standard syntax for regexes is adhered to, there can be, and often is, additional syntax to serve specific (yet POSIX-compliant) applications. Although POSIX.2 leaves some implementation specifics undefined, BRE and ERE provide a "standard" which has since been adopted as the default syntax of many tools, where the choice of BRE or ERE modes is usually a supported option. For example, [[GNU]] <code>grep</code> has the following options: "<code>grep -E</code>" for ERE, and "<code>grep -G</code>" for BRE (the default), and "<code>grep -P</code>" for [[Perl]] regexes.

Perl regexes have become a de facto standard, having a rich and powerful set of atomic expressions. Perl has no "basic" or "extended" levels.
As in POSIX EREs, <code>( )</code> and <code>{ }</code> are treated as metacharacters unless escaped; other metacharacters are known to be literal or symbolic based on context alone. Additional functionality includes [[#Lazy matching|lazy matching]], [[#backreferences|backreferences]], named capture groups, and [[recursion (computer science)|recursive]] patterns.

====POSIX basic and extended====
In the [[POSIX]] standard, Basic Regular Syntax ('''BRE''') requires that the [[metacharacter]]s <code>( )</code> and <code>{ }</code> be designated <code>\(\)</code> and <code>\{\}</code>, whereas Extended Regular Syntax ('''ERE''') does not.

{| class="wikitable"
|-
! Metacharacter
! Description
|- valign="top"
!<code>^</code>
|Matches the starting position within the string. In line-based tools, it matches the starting position of any line.
|- valign="top"
!<code>.</code>
|Matches any single character (many applications exclude [[newline]]s, and exactly which characters are considered newlines is flavor-, character-encoding-, and platform-specific, but it is safe to assume that the line feed character is included). Within POSIX bracket expressions, the dot character matches a literal dot. For example, <code>a.c</code> matches "abc", etc., but <code>[a.c]</code> matches only "a", ".", or "c".
|- valign="top"
!<code>[ ]</code>
|A bracket expression. Matches a single character that is contained within the brackets. For example, <code>[abc]</code> matches "a", "b", or "c". <code>[a-z]</code> specifies a range which matches any lowercase letter from "a" to "z". These forms can be mixed: <code>[abcx-z]</code> matches "a", "b", "c", "x", "y", or "z", as does <code>[a-cx-z]</code>.
The <code>-</code> character is treated as a literal character if it is the last or the first (after the <code>^</code>, if present) character within the brackets: <code>[abc-]</code>, <code>[-abc]</code>, <code>[^-abc]</code>. Backslash escapes are not allowed. The <code>]</code> character can be included in a bracket expression if it is the first (after the <code>^</code>, if present) character: <code>[]abc]</code>, <code>[^]abc]</code>.
|- valign="top"
!<code>[^ ]</code>
|Matches a single character that is not contained within the brackets. For example, <code>[^abc]</code> matches any character other than "a", "b", or "c". <code>[^a-z]</code> matches any single character that is not a lowercase letter from "a" to "z". Likewise, literal characters and ranges can be mixed.
|- valign="top"
!<code>$</code>
|Matches the ending position of the string or the position just before a string-ending newline. In line-based tools, it matches the ending position of any line.
|- valign="top"
!<code>( )</code>
|Defines a marked subexpression, also called a capturing group, which is essential for extracting the desired part of the text (See also the next entry, <code>\''n''</code>). ''BRE mode requires {{nowrap|<code>\( \)</code>}}.''
|- valign="top"
!<code>\''n''</code>
|Matches what the ''n''th marked subexpression matched, where ''n'' is a digit from 1 to 9. This construct is defined in the POSIX standard.<ref>{{cite book |section-url=https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_06 |publisher=The Open Group |title=The Open Group Base Specifications Issue 7, 2018 edition |section=9.3.6 BREs Matching Multiple Characters |year=2017 |access-date=December 10, 2023}}</ref> Some tools allow referencing more than nine capturing groups. Also known as a back-reference, this feature is supported in BRE mode.
|- valign="top"
!<code>*</code>
|Matches the preceding element zero or more times. For example, <code>ab*c</code> matches "ac", "abc", "abbbc", etc. <code>[xyz]*</code> matches "", "x", "y", "z", "zx", "zyx", "xyzzy", and so on. <code>(ab)*</code> matches "", "ab", "abab", "ababab", and so on.
|- valign="top"
!{{nowrap|<code>{''m'',''n''}</code>}}
|Matches the preceding element at least ''m'' and not more than ''n'' times. For example, <code>a{3,5}</code> matches only "aaa", "aaaa", and "aaaaa". This is not found in a few older instances of regexes. BRE mode requires <code>{{nowrap|\{''m'',''n''\}}}</code>.
|}

'''Examples:'''
* <code>.at</code> matches any three-character string ending with "at", including "hat", "cat", "bat", "4at", "#at" and " at" (starting with a space).
* <code>[hc]at</code> matches "hat" and "cat".
* <code>[^b]at</code> matches all strings matched by <code>.at</code> except "bat".
* <code>[^hc]at</code> matches all strings matched by <code>.at</code> other than "hat" and "cat".
* <code>^[hc]at</code> matches "hat" and "cat", but only at the beginning of the string or line.
* <code>[hc]at$</code> matches "hat" and "cat", but only at the end of the string or line.
* <code>\[.\]</code> matches any single character surrounded by "[" and "]" since the brackets are escaped, for example: "[a]", "[b]", "[7]", "[@]", "[]]", and "[ ]" (bracket space bracket).
* <code>s.*</code> matches s followed by zero or more characters, for example: "s", "saw", "seed", "s3w96.7", and "s6#h%(>>>m n mQ".

According to Russ Cox, the POSIX specification requires ambiguous subexpressions to be handled in a way different from Perl's. The committee replaced Perl's rules with ones that are simple to explain, but the new "simple" rules are actually more complex to implement: they were incompatible with pre-existing tooling and made it essentially impossible to define a "lazy match" (see below) extension.
As a result, very few programs actually implement the POSIX subexpression rules (even when they implement other parts of the POSIX syntax).<ref>{{cite web |title=Regular Expression Matching: the Virtual Machine Approach |url=https://swtch.com/~rsc/regexp/regexp2.html |author=Russ Cox |year=2009 |website=swtch.com |quote=Digression: POSIX Submatching}}</ref>

====Metacharacters in POSIX extended====
The meaning of metacharacters [[escape character|escaped]] with a backslash is reversed for some characters in the POSIX Extended Regular Expression ('''ERE''') syntax. With this syntax, a backslash causes the metacharacter to be treated as a literal character. So, for example, <code>\( \)</code> is now <code>( )</code> and <code>\{ \}</code> is now <code>{ }</code>. Additionally, support is removed for <code>\''n''</code> backreferences and the following metacharacters are added:

{| class="wikitable"
|-
! Metacharacter
! Description
|- valign="top"
! <code>?</code>
| Matches the preceding element zero or one time. For example, <code>ab?c</code> matches only "ac" or "abc".
|-
! <code>+</code>
| Matches the preceding element one or more times. For example, <code>ab+c</code> matches "abc", "abbc", "abbbc", and so on, but not "ac".
|-
! <code><nowiki>|</nowiki></code>
| The choice (also known as alternation or set union) operator matches either the expression before or the expression after the operator. For example, <code><nowiki>abc|def</nowiki></code> matches "abc" or "def".
|}

'''Examples:'''
* <code>[hc]?at</code> matches "at", "hat", and "cat".
* <code>[hc]*at</code> matches "at", "hat", "cat", "hhat", "chat", "hcat", "cchchat", and so on.
* <code>[hc]+at</code> matches "hat", "cat", "hhat", "chat", "hcat", "cchchat", and so on, but not "at".
* <code>cat|dog</code> matches "cat" or "dog".

POSIX Extended Regular Expressions can often be used with modern Unix utilities by including the [[command line]] flag <var>-E</var>.
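The examples above can be checked with a short script. This is only a sketch: Python's <code>re</code> module is a Perl-style engine rather than a POSIX ERE implementation, but <code>?</code>, <code>*</code>, <code>+</code>, and <code>|</code> behave the same way for these simple patterns. The helper <code>matches</code> and the word lists are illustrative names, not part of any standard.

```python
import re

def matches(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

words = ["at", "hat", "cat", "hhat", "chat", "bat"]

optional = matches(r"[hc]?at", words)   # zero or one of "h"/"c"
star = matches(r"[hc]*at", words)       # zero or more
plus = matches(r"[hc]+at", words)       # one or more
either = matches(r"cat|dog", ["cat", "dog", "cow"])

print(optional, star, plus, either)
```

<code>re.fullmatch</code> is used so that each candidate must match as a whole, which mirrors how the example lists above are phrased.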
====Character classes====
The character class is the most basic regex concept after a literal match. It makes one small sequence of characters match a larger set of characters. For example, <syntaxhighlight lang="ragel" inline>[A-Z]</syntaxhighlight> could stand for any uppercase letter in the English alphabet, and <syntaxhighlight lang="ragel" inline>\d</syntaxhighlight> could mean any digit. Character classes apply to both POSIX levels.

When specifying a range of characters, such as <syntaxhighlight lang="ragel" inline>[a-Z]</syntaxhighlight> (i.e. lowercase ''<syntaxhighlight lang="ragel" inline>a</syntaxhighlight>'' to uppercase ''<syntaxhighlight lang="ragel" inline>Z</syntaxhighlight>''), the computer's locale settings determine the contents by the numeric ordering of the character encoding. They could store digits in that sequence, or the ordering could be ''abc...zABC...Z'', or ''aAbBcC...zZ''. So the POSIX standard defines a character class, which will be known by the regex processor installed. Those definitions are in the following table:

{| class="wikitable sortable"
|-
! Description
! POSIX !! Perl/Tcl !! Vim !! Java !! ASCII
|-
| ASCII characters
|
|
|
| <syntaxhighlight lang="ragel" inline>\p{ASCII}</syntaxhighlight>
| <syntaxhighlight lang="ragel" inline>[\x00-\x7F]</syntaxhighlight>
|-
| Alphanumeric characters
| <syntaxhighlight lang="ragel" inline>[:alnum:]</syntaxhighlight>
|
|
| <syntaxhighlight lang="ragel" inline>\p{Alnum}</syntaxhighlight>
| <syntaxhighlight lang="ragel" inline>[A-Za-z0-9]</syntaxhighlight>
|-
| Alphanumeric characters plus "_"
|
| <syntaxhighlight lang="ragel" inline>\w</syntaxhighlight>
| <syntaxhighlight lang="ragel" inline>\w</syntaxhighlight>
| <syntaxhighlight lang="ragel" inline>\w</syntaxhighlight>
| <syntaxhighlight lang="ragel" inline>[A-Za-z0-9_]</syntaxhighlight>
|-
| Non-word characters
|
| <syntaxhighlight lang="ragel" inline>\W</syntaxhighlight>
| <syntaxhighlight lang="ragel" inline>\W</syntaxhighlight>
| <syntaxhighlight lang="ragel" inline>\W</syntaxhighlight>
| <syntaxhighlight lang="ragel" inline>[^A-Za-z0-9_]</syntaxhighlight>
|-
| Alphabetic characters
| <syntaxhighlight lang="ragel" inline>[:alpha:]</syntaxhighlight>
|
| <syntaxhighlight lang="ragel" inline>\a</syntaxhighlight>
| <syntaxhighlight lang="ragel" inline>\p{Alpha}</syntaxhighlight>
| <syntaxhighlight lang="ragel" inline>[A-Za-z]</syntaxhighlight>
|-
| Space and tab
| <syntaxhighlight lang="ragel" inline>[:blank:]</syntaxhighlight>
|
| <syntaxhighlight lang="ragel" inline>\s</syntaxhighlight>
| <syntaxhighlight lang="ragel" inline>\p{Blank}</syntaxhighlight>
| <syntaxhighlight lang="ragel" inline>[ \t]</syntaxhighlight>
|-
| Word boundaries
|
| <syntaxhighlight lang="ragel" inline>\b</syntaxhighlight>
| <code>\< \></code>
| <syntaxhighlight lang="ragel" inline>\b</syntaxhighlight>
| <syntaxhighlight lang="ragel" inline>(?<=\W)(?=\w)|(?<=\w)(?=\W)</syntaxhighlight>
|-
| Non-word boundaries
|
|
|
| <syntaxhighlight lang="ragel" inline>\B</syntaxhighlight>
| <syntaxhighlight lang="ragel" inline>(?<=\W)(?=\W)|(?<=\w)(?=\w)</syntaxhighlight>
|-
| [[Control character]]s
| <syntaxhighlight lang="ragel" inline>[:cntrl:]</syntaxhighlight>
|
|
| <syntaxhighlight lang="ragel" inline>\p{Cntrl}</syntaxhighlight>
| <syntaxhighlight lang="ragel" inline>[\x00-\x1F\x7F]</syntaxhighlight>
|-
| Digits
| <syntaxhighlight lang="ragel" inline>[:digit:]</syntaxhighlight>
| <syntaxhighlight lang="ragel" inline>\d</syntaxhighlight>
| <syntaxhighlight lang="ragel" inline>\d</syntaxhighlight>
| <syntaxhighlight lang="ragel" inline>\p{Digit}</syntaxhighlight> or <syntaxhighlight lang="ragel" inline>\d</syntaxhighlight>
| <syntaxhighlight lang="ragel" inline>[0-9]</syntaxhighlight>
|-
| Non-digits
|
| <syntaxhighlight lang="ragel" inline>\D</syntaxhighlight>
| <syntaxhighlight lang="ragel" inline>\D</syntaxhighlight>
| <syntaxhighlight lang="ragel" inline>\D</syntaxhighlight>
| <syntaxhighlight lang="ragel" inline>[^0-9]</syntaxhighlight>
|-
| Visible characters
| <syntaxhighlight lang="ragel" inline>[:graph:]</syntaxhighlight>
|
|
| <syntaxhighlight lang="ragel" inline>\p{Graph}</syntaxhighlight>
| <syntaxhighlight lang="ragel" inline>[\x21-\x7E]</syntaxhighlight>
|-
| Lowercase letters
| <syntaxhighlight lang="ragel" inline>[:lower:]</syntaxhighlight>
|
| <syntaxhighlight lang="ragel" inline>\l</syntaxhighlight>
| <syntaxhighlight lang="ragel" inline>\p{Lower}</syntaxhighlight>
| <syntaxhighlight lang="ragel" inline>[a-z]</syntaxhighlight>
|-
| Visible characters and the space character
| <syntaxhighlight lang="ragel" inline>[:print:]</syntaxhighlight>
|
| <syntaxhighlight lang="ragel" inline>\p</syntaxhighlight>
| <syntaxhighlight lang="ragel" inline>\p{Print}</syntaxhighlight>
| <syntaxhighlight lang="ragel" inline>[\x20-\x7E]</syntaxhighlight>
|-
| Punctuation characters
| <syntaxhighlight lang="ragel" inline>[:punct:]</syntaxhighlight>
|
|
| <syntaxhighlight lang="ragel" inline>\p{Punct}</syntaxhighlight>
| <syntaxhighlight lang="ragel" inline>[][!"#$%&'()*+,./:;<=>?@\^_`{|}~-]</syntaxhighlight>
|-
| [[Whitespace character]]s
| <syntaxhighlight lang="ragel" inline>[:space:]</syntaxhighlight>
| <syntaxhighlight lang="ragel" inline>\s</syntaxhighlight>
| <syntaxhighlight lang="ragel" inline>\_s</syntaxhighlight>
| <syntaxhighlight lang="ragel" inline>\p{Space}</syntaxhighlight> or <syntaxhighlight lang="ragel" inline>\s</syntaxhighlight>
| <code>[ [[\t]][[\r]][[\n]][[\v]][[\f]]]</code>
|-
| Non-whitespace characters
|
| <syntaxhighlight lang="ragel" inline>\S</syntaxhighlight>
| <syntaxhighlight lang="ragel" inline>\S</syntaxhighlight>
| <syntaxhighlight lang="ragel" inline>\S</syntaxhighlight>
| <syntaxhighlight lang="ragel" inline>[^ \t\r\n\v\f]</syntaxhighlight>
|-
| Uppercase letters
| <syntaxhighlight lang="ragel" inline>[:upper:]</syntaxhighlight>
|
| <syntaxhighlight lang="ragel" inline>\u</syntaxhighlight>
| <syntaxhighlight lang="ragel" inline>\p{Upper}</syntaxhighlight>
| <syntaxhighlight lang="ragel" inline>[A-Z]</syntaxhighlight>
|-
| Hexadecimal digits
| <syntaxhighlight lang="ragel" inline>[:xdigit:]</syntaxhighlight>
|
| <syntaxhighlight lang="ragel" inline>\x</syntaxhighlight>
| <syntaxhighlight lang="ragel" inline>\p{XDigit}</syntaxhighlight>
| <syntaxhighlight lang="ragel" inline>[A-Fa-f0-9]</syntaxhighlight>
|}

POSIX character classes can only be used within bracket expressions. For example, <syntaxhighlight lang="ragel" inline>[[:upper:]ab]</syntaxhighlight> matches the uppercase letters and lowercase "a" and "b".

An additional non-POSIX class understood by some tools is <syntaxhighlight lang="ragel" inline>[:word:]</syntaxhighlight>, which is usually defined as <syntaxhighlight lang="ragel" inline>[:alnum:]</syntaxhighlight> plus underscore. This reflects the fact that in many programming languages these are the characters that may be used in identifiers.
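Python's <code>re</code> module does not understand POSIX class names, so one way to experiment with them is to expand each name into the ASCII equivalent listed in the table. The sketch below does that for a subset of the classes; <code>POSIX_ASCII</code> and <code>expand_posix_classes</code> are hypothetical names introduced for this illustration, and the expansion ignores locale, which a real POSIX implementation would take into account.

```python
import re

# ASCII equivalents copied from the table above (a subset only).
POSIX_ASCII = {
    "alnum": "A-Za-z0-9",
    "alpha": "A-Za-z",
    "blank": r" \t",
    "digit": "0-9",
    "lower": "a-z",
    "upper": "A-Z",
    "xdigit": "A-Fa-f0-9",
}

def expand_posix_classes(pattern):
    """Replace each [:name:] with its ASCII range.

    Hypothetical helper: it assumes every class name already sits
    inside a bracket expression, as POSIX requires.
    """
    return re.sub(r"\[:(\w+):\]", lambda m: POSIX_ASCII[m.group(1)], pattern)

upper_ab = expand_posix_classes("[[:upper:]ab]")
identifier = expand_posix_classes("[[:alpha:]_][[:alnum:]_]*")

print(upper_ab, identifier)
```

Because the class name is only meaningful inside brackets, the expansion leaves the outer <code>[ ]</code> in place and substitutes just the inner <code>[:name:]</code> part.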
The editor [[Vim (text editor)|Vim]] further distinguishes ''word'' and ''word-head'' classes (using the notation <syntaxhighlight lang="ragel" inline>\w</syntaxhighlight> and <syntaxhighlight lang="ragel" inline>\h</syntaxhighlight>) since in many programming languages the characters that can begin an identifier are not the same as those that can occur in other positions: numbers are generally excluded, so an identifier would look like <syntaxhighlight lang="ragel" inline>\h\w*</syntaxhighlight> or <syntaxhighlight lang="ragel" inline>[[:alpha:]_][[:alnum:]_]*</syntaxhighlight> in POSIX notation.

Note that what the POSIX regex standards call ''character classes'' are commonly referred to as ''POSIX character classes'' in other regex flavors which support them. With most other regex flavors, the term ''character class'' is used to describe what POSIX calls ''bracket expressions''.

===Perl and PCRE===
{{See also|Perl Compatible Regular Expressions}}
Because of its expressive power and (relative) ease of reading, many other utilities and programming languages have adopted syntax similar to [[Perl]]'s, for example [[Java (programming language)|Java]], [[JavaScript]], [[Julia (programming language)|Julia]], [[Python (programming language)|Python]], [[Ruby (programming language)|Ruby]], [[Qt (software)|Qt]], Microsoft's [[.NET Framework]], and [[XML Schema (W3C)|XML Schema]]. Some languages and tools such as [[Boost C++ Libraries|Boost]] and [[PHP]] support multiple regex flavors. Perl-derivative regex implementations are not identical and usually implement a subset of features found in Perl 5.0, released in 1994. Perl sometimes does incorporate features initially found in other languages.
For example, Perl 5.10 implements syntactic extensions originally developed in PCRE and Python.<ref>{{cite web |title=Perl Regular Expression Documentation |publisher=perldoc.perl.org |url=https://perldoc.perl.org/perlre#PCRE%2fPython-Support |access-date=November 5, 2024 |url-status=live |archive-date=December 31, 2009 |archive-url=https://web.archive.org/web/20091231010052/http://perldoc.perl.org/perlre.html#PCRE%2fPython-Support}}</ref>

===Lazy matching===
In Python and some<!---should be 'many', or 'most', or 'a few', or ...?---> other implementations (e.g. Java), the three common quantifiers (<code>*</code>, <code>+</code> and <code>?</code>) are [[greedy algorithm|greedy]] by default: they match as many characters as possible.<ref name=py-re>{{cite web |title=Regular Expression Syntax |url=https://docs.python.org/3/library/re.html#regular-expression-syntax |website=Python 3.5.0 documentation |publisher=[[Python Software Foundation]] |access-date=10 October 2015 |archive-date=18 July 2018 |archive-url=https://web.archive.org/web/20180718132241/https://docs.python.org/3/library/re.html#regular-expression-syntax}}</ref> The regex <code>".+"</code> (including the double-quotes) applied to the string
 "Ganymede," he continued, "is the largest moon in the Solar System."
matches the entire line (because the entire line begins and ends with a double-quote) instead of matching only the first part, <code>"Ganymede,"</code>. The aforementioned quantifiers may, however, be made ''lazy'' or ''minimal'' or ''reluctant'', matching as few characters as possible, by appending a question mark: <code>".+?"</code> matches only <code>"Ganymede,"</code>.<ref name="py-re"/>

===Possessive matching===
In Java and Python 3.11+,<ref>[https://github.com/python/cpython/issues/34627/ SRE: Atomic Grouping (?>...) is not supported #34627]</ref> quantifiers may be made ''possessive'' by appending a plus sign, which disables backing off (in a backtracking engine), even if doing so would allow the overall match to succeed:<ref name=es-re>{{cite web |title=Essential classes: Regular Expressions: Quantifiers: Differences Among Greedy, Reluctant, and Possessive Quantifiers |url=https://docs.oracle.com/javase/tutorial/essential/regex/quant.html#difs |website=The Java Tutorials |publisher=[[Oracle Corporation|Oracle]] |access-date=23 December 2016 |archive-date=7 October 2020 |archive-url=https://web.archive.org/web/20201007183203/https://docs.oracle.com/javase/tutorial/essential/regex/quant.html#difs |url-status=live}}</ref> While the regex <code>".*"</code> applied to the string
 "Ganymede," he continued, "is the largest moon in the Solar System."
matches the entire line, the regex <code>".*+"</code> does {{em|not match at all}}, because <code>.*+</code> consumes the entire input, including the final <code>"</code>. Thus, possessive quantifiers are most useful with negated character classes, e.g. <code>"[^"]*+"</code>, which matches <code>"Ganymede,"</code> when applied to the same string.

Another common extension serving the same function is atomic grouping, which disables backtracking for a parenthesized group. The typical syntax is {{mono|1=(?>group)}}.
For example, while {{mono|1=^(wi{{!}}w)i$}} matches both {{mono|wi}} and {{mono|wii}}, {{mono|1=^(?>wi{{!}}w)i$}} only matches {{mono|wii}} because the engine is forbidden from backtracking and so cannot try setting the group to "w" after matching "wi".<ref>{{cite web |title=Atomic Grouping |url=https://www.regular-expressions.info/atomic.html |website=Regex Tutorial |access-date=24 November 2019 |archive-date=7 October 2020 |archive-url=https://web.archive.org/web/20201007183204/https://www.regular-expressions.info/atomic.html |url-status=live}}</ref>

Possessive quantifiers are easier to implement than greedy and lazy quantifiers, and are typically more efficient at runtime.<ref name=es-re/>

===IETF I-Regexp===
IETF RFC 9485 describes "I-Regexp: An Interoperable Regular Expression Format". It specifies a limited subset of regular-expression idioms designed to be interoperable, i.e. produce the same effect, in a large number of regular-expression libraries. I-Regexp is also limited to matching, i.e. providing a true or false match between a regular expression and a given piece of text. Thus, it lacks advanced features such as capture groups, lookahead, and backreferences.<ref>{{cite IETF |last1=Bormann |first1=Carsten |last2=Bray |first2=Tim |title=I-Regexp: An Interoperable Regular Expression Format |rfc=9485 |publisher=Internet Engineering Task Force |access-date=11 March 2024}}</ref>
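The greedy, lazy, and possessive behaviours described in the matching sections above can be compared side by side. The following is a sketch in Python's <code>re</code> module using the same "Ganymede" string; the possessive and atomic forms are guarded by a version check because, as noted above, they are only available from Python 3.11. The variable names are illustrative.

```python
import re
import sys

text = '"Ganymede," he continued, "is the largest moon in the Solar System."'

# Greedy: .+ runs to the last double-quote, so the whole line matches.
greedy = re.search(r'".+"', text).group()

# Lazy: .+? stops at the first closing double-quote.
lazy = re.search(r'".+?"', text).group()

# A negated character class gives the same result without backtracking.
negated = re.search(r'"[^"]*"', text).group()

if sys.version_info >= (3, 11):
    # Possessive .*+ swallows the final quote and never gives it back.
    possessive_dot = re.search(r'".*+"', text)
    # Combined with a negated class, the possessive form matches again.
    possessive_class = re.search(r'"[^"]*+"', text).group()
    # Atomic group: once "wi" is taken, "w" alone is never retried.
    atomic = [bool(re.search(r"^(?>wi|w)i$", s)) for s in ("wi", "wii")]
    print(possessive_dot, possessive_class, atomic)

print(greedy == text, lazy, negated)
```

On interpreters older than 3.11 the guarded patterns are never compiled, so the script still runs and shows the greedy, lazy, and negated-class results.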