====POSIX basic and extended====
In the [[POSIX]] standard, Basic Regular Expression ('''BRE''') syntax requires that the [[metacharacter]]s <code>( )</code> and <code>{ }</code> be designated <code>\(\)</code> and <code>\{\}</code>, whereas Extended Regular Expression ('''ERE''') syntax does not.

{| class="wikitable"
|-
! Metacharacter
! Description
|- valign="top"
!<code>^</code>
|Matches the starting position within the string. In line-based tools, it matches the starting position of any line.
|- valign="top"
!<code>.</code>
|Matches any single character. Many applications exclude [[newline]]s; exactly which characters count as newlines is flavor-, character-encoding-, and platform-specific, but it is safe to assume that the line feed character is included. Within POSIX bracket expressions, the dot character matches a literal dot. For example, <code>a.c</code> matches "abc", etc., but <code>[a.c]</code> matches only "a", ".", or "c".
|- valign="top"
!<code>[ ]</code>
|A bracket expression. Matches a single character that is contained within the brackets. For example, <code>[abc]</code> matches "a", "b", or "c". <code>[a-z]</code> specifies a range which matches any lowercase letter from "a" to "z". These forms can be mixed: <code>[abcx-z]</code> matches "a", "b", "c", "x", "y", or "z", as does <code>[a-cx-z]</code>. The <code>-</code> character is treated as a literal character if it is the last or the first (after the <code>^</code>, if present) character within the brackets: <code>[abc-]</code>, <code>[-abc]</code>, <code>[^-abc]</code>. Backslash escapes are not available: inside a bracket expression the backslash has no special meaning and matches a literal backslash. The <code>]</code> character can be included in a bracket expression if it is the first (after the <code>^</code>, if present) character: <code>[]abc]</code>, <code>[^]abc]</code>.
|- valign="top"
!<code>[^ ]</code>
|Matches a single character that is not contained within the brackets. For example, <code>[^abc]</code> matches any character other than "a", "b", or "c". <code>[^a-z]</code> matches any single character that is not a lowercase letter from "a" to "z". As with <code>[ ]</code>, literal characters and ranges can be mixed.
|- valign="top"
!<code>$</code>
|Matches the ending position of the string or the position just before a string-ending newline. In line-based tools, it matches the ending position of any line.
|- valign="top"
!<code>( )</code>
|Defines a marked subexpression, also called a capturing group, whose match can be extracted or referred to later (see also the next entry, <code>\''n''</code>). ''BRE mode requires {{nowrap|<code>\( \)</code>}}.''
|- valign="top"
!<code>\''n''</code>
|Matches what the ''n''th marked subexpression matched, where ''n'' is a digit from 1 to 9. This construct is defined in the POSIX standard.<ref>{{cite book |section-url=https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html#tag_09_03_06 |publisher=The Open Group |title=The Open Group Base Specifications Issue 7, 2018 edition |section=9.3.6 BREs Matching Multiple Characters |year=2017 |access-date=December 10, 2023}}</ref> Some tools allow referencing more than nine capturing groups. This feature, also known as a back-reference, is supported in BRE mode.
|- valign="top"
!<code>*</code>
|Matches the preceding element zero or more times. For example, <code>ab*c</code> matches "ac", "abc", "abbbc", etc. <code>[xyz]*</code> matches "", "x", "y", "z", "zx", "zyx", "xyzzy", and so on. <code>(ab)*</code> matches "", "ab", "abab", "ababab", and so on.
|- valign="top"
!{{nowrap|<code>{''m'',''n''}</code>}}
|Matches the preceding element at least ''m'' and not more than ''n'' times. For example, <code>a{3,5}</code> matches only "aaa", "aaaa", and "aaaaa". Some older regex implementations do not support this construct. BRE mode requires <code>{{nowrap|\{''m'',''n''\}}}</code>.
|}

'''Examples:'''
* <code>.at</code> matches any three-character string ending with "at", including "hat", "cat", "bat", "4at", "#at" and " at" (starting with a space).
* <code>[hc]at</code> matches "hat" and "cat".
* <code>[^b]at</code> matches all strings matched by <code>.at</code> except "bat".
* <code>[^hc]at</code> matches all strings matched by <code>.at</code> other than "hat" and "cat".
* <code>^[hc]at</code> matches "hat" and "cat", but only at the beginning of the string or line.
* <code>[hc]at$</code> matches "hat" and "cat", but only at the end of the string or line.
* <code>\[.\]</code> matches any single character surrounded by "[" and "]" since the brackets are escaped, for example: "[a]", "[b]", "[7]", "[@]", "[]]", and "[ ]" (bracket space bracket).
* <code>s.*</code> matches "s" followed by zero or more characters, for example: "s", "saw", "seed", "s3w96.7", and "s6#h%(>>>m n mQ".

According to Russ Cox, the POSIX specification requires ambiguous subexpressions to be handled differently from Perl. The committee replaced Perl's rules with rules that are simple to explain, but the new "simple" rules are actually more complex to implement: they were incompatible with pre-existing tooling and made it essentially impossible to define a "lazy match" (see below) extension. As a result, very few programs actually implement the POSIX subexpression rules (even when they implement other parts of the POSIX syntax).<ref>{{cite web |title=Regular Expression Matching: the Virtual Machine Approach |url=https://swtch.com/~rsc/regexp/regexp2.html |author=Russ Cox |year=2009 |website=swtch.com |quote=Digression: POSIX Submatching}}</ref>
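The escaping difference between BRE and ERE can be illustrated with a short translator. The following Python sketch rewrites a BRE into the syntax of Python's <code>re</code> module, which, like ERE, uses unescaped <code>( )</code> and <code>{ }</code>. The function name <code>bre_to_python</code> is hypothetical, and the translation is deliberately simplified. Following POSIX, it assumes that unescaped <code>+</code>, <code>?</code>, and <code>|</code> are ordinary characters in a BRE. GNU tools also accept <code>\+</code>, <code>\?</code>, and <code>\|</code> as extensions, but this sketch treats them as literals. It rejects POSIX character classes such as <code>[[:alpha:]]</code> instead of translating them. Python's <code>re</code> is a Perl-style backtracking engine, so submatch boundaries in ambiguous patterns need not follow the POSIX rules described by Russ Cox.

```python
import re

def bre_to_python(bre: str) -> str:
    r"""Rewrite a POSIX BRE as a Python `re` pattern (hypothetical, simplified sketch).

    Handles \( \), \{ \}, back-references \1-\9, bracket expressions, and the
    BRE rule that unescaped ( ) { } + ? | are ordinary characters.
    POSIX classes such as [[:alpha:]] are rejected, not translated.
    """
    out = []
    i, n = 0, len(bre)
    at_start = True  # positions where a '*' is an ordinary character in a BRE
    while i < n:
        c = bre[i]
        if c == "\\":
            if i + 1 == n:
                raise ValueError("trailing backslash")
            nxt = bre[i + 1]
            if nxt in "(){}":
                out.append(nxt)                  # \( \) \{ \} become ( ) { }
            elif nxt in "123456789":
                out.append("(?:\\" + nxt + ")")  # back-reference, kept apart from any following digit
            else:
                out.append(re.escape(nxt))       # any other escaped character is a literal
            at_start = nxt == "("
            i += 2
        elif c == "[":
            # Find the closing ']'; a ']' right after '[' or '[^' is a literal.
            j = i + 1
            if j < n and bre[j] == "^":
                j += 1
            if j < n and bre[j] == "]":
                j += 1
            while j < n and bre[j] != "]":
                if bre[j] == "[" and j + 1 < n and bre[j + 1] in ":.=":
                    raise ValueError("POSIX [: :], [. .] and [= =] are not supported")
                j += 1
            if j == n:
                raise ValueError("unterminated bracket expression")
            # POSIX treats '\' and '[' literally inside brackets; Python needs them escaped.
            body = bre[i + 1 : j].replace("\\", "\\\\").replace("[", "\\[")
            out.append("[" + body + "]")
            at_start = False
            i = j + 1
        elif c == "^" and i == 0:
            out.append("^")    # anchor only at the very start; a following '*' stays literal
            i += 1
        elif c == "$" and i == n - 1:
            out.append("$")    # anchor only at the very end
            i += 1
        elif c == "*" and at_start:
            out.append(r"\*")  # '*' with nothing to repeat is an ordinary character
            at_start = False
            i += 1
        elif c in "(){}+?|^$]":
            out.append(re.escape(c))  # ordinary characters in a BRE
            at_start = False
            i += 1
        else:
            out.append(c)      # '.', '*', and other characters mean the same in both
            at_start = False
            i += 1
    return "".join(out)

print(bre_to_python(r"\(ab\)*c"))  # (ab)*c
print(bre_to_python(r"a\{3,5\}"))  # a{3,5}
print(bre_to_python("a+(b)"))      # a\+\(b\)

pat = re.compile(bre_to_python(r"\(a*\)b\1"))
print(bool(pat.fullmatch("aabaa")), bool(pat.fullmatch("aaba")))  # True False
```

The back-reference is wrapped in <code>(?:...)</code> because a BRE such as <code>\10</code> means "group 1, then a literal 0", whereas Python would read <code>\10</code> as a reference to group 10.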
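Python's <code>re</code> module is a Perl-style engine rather than a POSIX one. For simple patterns like those in the examples list, however, it gives the same answers, so the list can be checked quickly. The word list, the sample text, and the helper <code>full</code> are illustrative. <code>re.MULTILINE</code> stands in for the way a line-based tool treats <code>^</code> and <code>$</code>.

```python
import re

# Candidate strings taken from the .at example (illustrative).
words = ["hat", "cat", "bat", "4at", "#at", " at"]

def full(pattern: str) -> list:
    """Return the words that the pattern matches in their entirety."""
    return [w for w in words if re.fullmatch(pattern, w)]

print(full(r".at"))      # all six words
print(full(r"[hc]at"))   # ['hat', 'cat']
print(full(r"[^b]at"))   # every word except 'bat'
print(full(r"[^hc]at"))  # every word except 'hat' and 'cat'

# With re.MULTILINE, ^ and $ anchor at each line, as in line-based tools.
text = "hat trick\nthe cat\ncat nap"
print(re.findall(r"^[hc]at", text, re.MULTILINE))  # ['hat', 'cat']
print(re.findall(r"[hc]at$", text, re.MULTILINE))  # ['cat']

# Escaped brackets are literal; the dot between them still matches any character.
print([s for s in ["[a]", "[]]", "[ ]", "ab"] if re.fullmatch(r"\[.\]", s)])  # ['[a]', '[]]', '[ ]']
```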