==Patterns==
The phrase ''regular expressions'', or ''regexes'', is often used to mean the specific, standard textual syntax for representing patterns for matching text, as distinct from the mathematical notation described below. Each character in a regular expression (that is, each character in the string describing its pattern) is either a [[metacharacter]], having a special meaning, or a regular character that has a literal meaning. For example, in the regex <code>b.</code>, 'b' is a literal character that matches just 'b', while '.' is a metacharacter that matches any single character except a newline. Therefore, this regex matches, for example, 'b%', 'bx', or 'b5'. Together, metacharacters and literal characters can be used to identify text of a given pattern or process a number of instances of it.

Pattern matches may vary from a precise equality to a very general similarity, as controlled by the metacharacters. For example, <code>.</code> is a very general pattern, <code><nowiki>[a-z]</nowiki></code> (matches any single lowercase letter from 'a' to 'z') is less general, and <code>b</code> is a precise pattern (matches just 'b'). The metacharacter syntax is designed specifically to represent prescribed targets in a concise and flexible way to direct the automation of text processing of a variety of input data, in a form easy to type using a standard [[ASCII]] [[computer keyboard|keyboard]].

A very simple use of a regular expression in this syntax is to locate a word spelled two different ways in a [[text editor]]: the regular expression <code>seriali[sz]e</code> matches both "serialise" and "serialize". [[Wildcard character]]s also achieve this, but are more limited in the patterns they can express, as they have fewer metacharacters and a simpler underlying language. The usual context of wildcard characters is in [[glob (programming)|globbing]] similar names in a list of files, whereas regexes are usually employed in applications that pattern-match text strings in general. 
For example, the regex <syntaxhighlight lang="ragel" inline>^[ \t]+|[ \t]+$</syntaxhighlight> matches excess whitespace at the beginning or end of a line. A more advanced regular expression that matches any numeral is <syntaxhighlight lang="ragel" inline>[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?</syntaxhighlight>.

[[File:Thompson-kleene-star.svg|right|thumb|[[Thompson's construction algorithm|Translating]] the [[Kleene star]]<br/>(''s''* means "zero or more of ''s''")]]
A '''regex processor''' translates a regular expression in the above syntax into an internal representation that can be executed and matched against a [[string (computing)|string]] representing the text being searched in. One possible approach is to use [[Thompson's construction algorithm]] to construct a [[nondeterministic finite automaton]] (NFA), which is then [[powerset construction|made deterministic]]; the resulting [[deterministic finite automaton]] (DFA) is run on the target text string to recognize substrings that match the regular expression. The picture shows the NFA scheme <code>''N''(''s''*)</code> obtained from the regular expression <code>''s''*</code>, where ''s'' denotes a simpler regular expression that has itself already been [[recursion (computer science)|recursively]] translated to the NFA ''N''(''s'').
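
The Kleene-star step described above can be illustrated with a minimal Python sketch. It is not the implementation of any particular regex processor: the names <code>State</code>, <code>literal</code>, <code>concat</code>, <code>star</code>, <code>closure</code> and <code>accepts</code> are hypothetical, chosen only for this illustration. As a simplifying assumption, the sketch simulates the NFA directly by tracking the set of reachable states, rather than first converting it to a DFA, and it tests whether the whole string matches instead of searching for matching substrings.

```python
class State:
    """One NFA state (hypothetical helper for this sketch)."""
    def __init__(self):
        self.edges = {}    # character -> list of target states
        self.epsilon = []  # targets reachable without consuming input


def literal(ch):
    """N(ch): a start state with a single ch-labelled edge to an accept state."""
    start, accept = State(), State()
    start.edges.setdefault(ch, []).append(accept)
    return start, accept


def concat(first, second):
    """N(st): link the accept state of N(s) to the start state of N(t)."""
    first[1].epsilon.append(second[0])
    return first[0], second[1]


def star(fragment):
    """N(s*): wrap the already-built N(s) in a new start and accept state."""
    s_start, s_accept = fragment
    start, accept = State(), State()
    start.epsilon += [s_start, accept]     # enter s, or skip it entirely
    s_accept.epsilon += [s_start, accept]  # repeat s, or leave the loop
    return start, accept


def closure(states):
    """All states reachable from `states` through epsilon edges alone."""
    stack, seen = list(states), set(states)
    while stack:
        for nxt in stack.pop().epsilon:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def accepts(fragment, text):
    """Simulate the NFA on the whole of `text` (no DFA conversion)."""
    start, accept = fragment
    current = closure({start})
    for ch in text:
        moved = {t for s in current for t in s.edges.get(ch, [])}
        current = closure(moved)
    return accept in current


a_star = star(literal("a"))                         # NFA for a*
ab_star = star(concat(literal("a"), literal("b")))  # NFA for (ab)*
print(accepts(a_star, "aaa"), accepts(ab_star, "aba"))
```

Here <code>star</code> adds the four empty-string transitions of the ''N''(''s''*) scheme around a fragment that was built beforehand, which mirrors the recursive translation: <code>ab_star</code> is obtained by first translating <code>ab</code> and then wrapping the result.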