RegExp class

A regular expression pattern.

Regular expressions (abbreviated as regex or regexp) consist of a sequence of characters that specify a match-checking algorithm for text inputs. Applying a regexp to an input text results either in the regexp matching, or accepting, the text, or the text being rejected. When the regexp matches the text, it further provides some information about how it matched the text.

Dart regular expressions have the same syntax and semantics as JavaScript regular expressions. To learn more about JavaScript regular expressions, see ecma-international.org/ecma-262/9.0/#sec-regexp-regular-expression-objects.

Dart provides the basic regexp matching algorithm as matchAsPrefix, which checks if the regexp matches a part of the input starting at a specific position. If the regexp matches, Dart returns the details of the match as a RegExpMatch.

You can build all the other methods of RegExp from that basic match check.
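As a rough illustration of that claim, a search can be sketched by calling matchAsPrefix at each start position in turn. The helper name firstMatchViaPrefix is hypothetical and is not part of the RegExp API. The sketch also ignores details such as surrogate pairs in Unicode mode.

```dart
/// Hypothetical helper (not part of dart:core): finds the first match by
/// trying matchAsPrefix at every start position, left to right.
Match? firstMatchViaPrefix(RegExp exp, String input) {
  for (var start = 0; start <= input.length; start++) {
    final match = exp.matchAsPrefix(input, start);
    if (match != null) return match;
  }
  return null;
}

void main() {
  final exp = RegExp(r'\d+');
  final match = firstMatchViaPrefix(exp, 'abc 123 def');
  print(match?[0]); // 123
  print(match?.start); // 4
}
```

In practice, prefer firstMatch. This sketch only shows how the search relates to the basic prefix check.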

The most common use of a regexp is to search for a match in the input. The firstMatch method provides this functionality. This method searches a string for the first position where the regexp matches. Again, if a match is found, Dart returns its details as a RegExpMatch.

The following example finds the first match of a regular expression in a string.

RegExp exp = RegExp(r'(\w+)');
String str = 'Parse my string';
RegExpMatch? match = exp.firstMatch(str);
print(match![0]); // "Parse"

Use allMatches to look for all matches of a regular expression in a string.

The following example finds all matches of a regular expression in a string.

RegExp exp = RegExp(r'(\w+)');
String str = 'Parse my string';
Iterable<RegExpMatch> matches = exp.allMatches(str);
for (final m in matches) {
  print(m[0]);
}

The output of the example is:

Parse
my
string

The preceding examples use a raw string, a specific string type that prefixes the string literal with r. Use a raw string to treat each character, including \ and $, in a string as a literal character. Each character then gets passed to the RegExp parser. You should use a raw string as the argument to the RegExp constructor.
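The difference can be seen in a small sketch that writes the same pattern both ways. The variable names are illustrative only.

```dart
// Raw string: backslashes and $ reach the RegExp parser unchanged.
final raw = RegExp(r'\d+\$');

// Non-raw string: every backslash must be doubled and $ must be escaped,
// otherwise Dart interprets them before RegExp ever sees the pattern.
final escaped = RegExp('\\d+\\\$');

void main() {
  print(raw.pattern == escaped.pattern); // true
  print(raw.hasMatch(r'Total: 42$')); // true
}
```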

Performance Notice: Regular expressions do not resolve issues magically. Anyone can write a regexp that performs inefficiently when applied to some string inputs. Often, such a regexp will perform well enough on small or common inputs, but have pathological performance on large and uncommon inputs. This inconsistent behavior makes performance issues harder to detect in testing.

A regexp might not find text any faster than using String operations to inspect a string. The strength of regexps comes from the ability to specify somewhat complicated patterns in very few characters, with reasonable efficiency in most common cases. This conciseness comes at a cost of readability. Due to their syntactic complexity, regexps cannot be considered self-documenting.
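As a small sketch of this trade-off, a simple suffix check reads more clearly as a String operation, while a structured pattern is where a regexp earns its conciseness. All names below are hypothetical.

```dart
// The same check written with a regexp and with a plain String method.
final dartFilePattern = RegExp(r'\.dart$');
bool isDartFileRegExp(String name) => dartFilePattern.hasMatch(name);
bool isDartFilePlain(String name) => name.endsWith('.dart');

// A more structured pattern, where the regexp is far shorter than the
// equivalent hand-written character checks.
final isoDate = RegExp(r'^\d{4}-\d{2}-\d{2}$');

void main() {
  print(isDartFileRegExp('main.dart')); // true
  print(isDartFilePlain('main.dart')); // true
  print(isoDate.hasMatch('2024-01-31')); // true
  print(isoDate.hasMatch('2024-1-31')); // false
}
```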

Dart regexps implement the ECMAScript RegExp specification. This specification provides regexp behavior that is both common and well known. When compiling Dart for the web, the compiled code can use the browser’s regexp implementation.

The specification defines ECMAScript regexp behavior using backtracking. When a regexp can choose between different ways to match, it tries each way in the order given in the pattern. For example, RegExp(r"(foo|bar)baz") wants to check for foo or bar, so it checks for foo first. If continuing along that path doesn't match the input, the regexp implementation backtracks. The implementation resets to the original state from before checking for foo, forgetting all the work it has done after that, and then tries the next choice, which is bar in this example.
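The observable result of that process can be sketched by checking which alternative the capture group ends up holding. The variable name is illustrative.

```dart
final fooOrBar = RegExp(r'(foo|bar)baz');

void main() {
  // "foo" is tried first and fails on "barbaz", so the implementation
  // backtracks and tries "bar", which then lets "baz" match.
  print(fooOrBar.firstMatch('barbaz')?[1]); // bar
  print(fooOrBar.firstMatch('foobaz')?[1]); // foo
  // Neither alternative is followed by "baz" here, so nothing matches.
  print(fooOrBar.hasMatch('foobar')); // false
}
```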

The specification defines these choices and the order in which they must be attempted. If a regexp could match an input in more than one way, the order of the choices decides which match the regexp returns. Commonly used regexps order their matching choices to ensure a specific result. The ECMAScript regexp specification limits how Dart can implement regular expressions. It must be a backtracking implementation which checks choices in a specific order. Dart cannot choose a different regexp implementation, because then regexp matching would behave differently.
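A minimal sketch of how the order of choices selects the result: both patterns below can match the input in more than one way, and the first alternative that succeeds wins. The variable names are illustrative.

```dart
final shortFirst = RegExp(r'a|ab');
final longFirst = RegExp(r'ab|a');

void main() {
  // Alternatives are tried left to right, not longest first.
  print(shortFirst.stringMatch('ab')); // a
  print(longFirst.stringMatch('ab')); // ab
}
```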

The backtracking approach works, but at a cost. For some regexps and some inputs, finding a correct match can take a lot of tries. It can take even more tries to reject an input that the regexp almost matches.

A well-known dangerous regexp pattern comes from nesting quantifiers like *:

var re = RegExp(r"^(a*|b)*c");
print(re.hasMatch("aaaaaaaaaaaaaaaaaaaaaaaaaaaaa"));

The regexp pattern doesn't match the input string, which consists only of "a" characters, because the input doesn’t contain the required c. There exists an exponential number of different ways for (a*|b)* to match all the "a" characters. The backtracking regexp implementation tries all of them before deciding that none of those can lead to a complete match. Each extra a added to the input doubles the time the regexp takes to return false. (When backtracking has this exponential potential, it is called “catastrophic backtracking”.)
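One possible way to sidestep the problem for this particular pattern is to remove the nesting. The sketch below assumes the capture group is not needed, in which case a single character class accepts the same strings as (a*|b)* with only one way to match them. The names risky and safer are illustrative.

```dart
final risky = RegExp(r'^(a*|b)*c'); // Nested quantifiers.
final safer = RegExp(r'^[ab]*c'); // Same accepted strings, no nesting.

void main() {
  // The rewritten pattern rejects the all-"a" input after one linear pass.
  print(safer.hasMatch('a' * 29)); // false
  // Both patterns agree on inputs that do match.
  print(safer.hasMatch('aabac')); // true
  print(risky.hasMatch('aabac')); // true
}
```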

Sequential quantifiers are another dangerous pattern, but they exhibit “only” polynomial complexity.

// Like `\w*-\d`, but check for `b` and `c` in that order.
var re = RegExp(r"^\w*(b)?\w*(c)?\w*-\d");
print(re.hasMatch("a" * 512));

Again the input doesn’t match, but RegExp must try n^3 ways to match the n "a" characters before deciding that. Doubling the input’s length increases the time to return false eightfold. This exponent increases with the number of sequential quantifiers.
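The growth can be observed with a rough timing sketch such as the following. The helper timeRejection is hypothetical, and the measured times depend on the machine and runtime, so no specific output is shown.

```dart
final sequential = RegExp(r'^\w*(b)?\w*(c)?\w*-\d');

/// Hypothetical helper: measures how long the regexp takes to reject
/// an input made of [length] copies of "a".
Duration timeRejection(int length) {
  final input = 'a' * length;
  final watch = Stopwatch()..start();
  sequential.hasMatch(input); // Always false: no "-" or digit in the input.
  watch.stop();
  return watch.elapsed;
}

void main() {
  // Each doubling of the length is expected to cost roughly eight times more.
  for (final n in [64, 128, 256]) {
    print('n=$n: ${timeRejection(n).inMicroseconds} us');
  }
}
```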

Both of these patterns look trivial when reduced to such simple regexps. However, these "trivial" patterns often arise as parts of more complicated regular expressions, where your ability to find the problem gets more difficult.

In general, if a regexp has potential for super-linear complexity, you can craft an input that takes an inordinate amount of time to search. These patterns can then be used for denial of service attacks if you apply vulnerable regexp patterns to user-provided inputs.

No guaranteed solution exists for this problem. Be careful to not use regexps with super-linear behavior where the program may match that regexp against inputs with no guaranteed match.

Rules of thumb to avoid regexps with super-linear execution time include:

  • Whenever the regexp has a choice, try to make sure that the choice can be made based on the next character (or very limited look-ahead). This limits the need to perform a lot of computation along both choices.
  • When using quantifiers, ensure that the same string cannot match both one and more-than-one iteration of the quantifier's regular expression. (For (a*|b)*, the string "aa" can match both (a*|b){1} and (a*|b){2}.)
  • Most uses of Dart regular expressions search for a match, for example using firstMatch. If you do not anchor the pattern to the start of a line or input using ^, this search acts as if the regexp began with an implicit [^]*. Starting your actual regular expression with .* then results in potential quadratic behavior for the search. Use anchors or matchAsPrefix where appropriate, or avoid starting the regexp with a quantified pattern.
  • For experts only: Neither Dart nor ECMAScript has general “atomic grouping”, which other regular expression dialects use to limit backtracking. Once an atomic group succeeds, the regexp cannot backtrack into the same match later. Because lookarounds also act as atomic groups, something similar can be achieved using a lookahead: var re = RegExp(r"^(?=((a*|b)*))\1d"); The preceding example still matches (a*|b)* in the same inefficient way. Once the regexp has matched as far as possible, it completes the positive lookahead. Then it skips what the lookahead matched using a back-reference. After that, it can no longer backtrack and try other combinations of "a" characters.
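The lookahead technique from the last rule can be sketched as follows. As an assumption for illustration, the pattern ends in c rather than d so that it lines up with the earlier (a*|b)*c example, and the variable name is hypothetical.

```dart
// The lookahead matches (a*|b)* once, greedily, and the back-reference \1
// then consumes exactly that text. The regexp cannot backtrack into a
// completed lookahead, so no other split of the "a" characters is tried.
final atomicLike = RegExp(r'^(?=((a*|b)*))\1c');

void main() {
  print(atomicLike.hasMatch('a' * 40)); // false
  print(atomicLike.hasMatch('aabac')); // true
}
```

Because the group can no longer give characters back, such a pattern may reject inputs that the plain backtracking version would accept, so check that the two agree for the inputs you care about.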

Try to reduce how many ways the regexp can match the same string. That reduces the number of possible backtracks performed when the regexp does not find a match. Several guides to improving the performance of regular expressions exist on the internet. Use these as inspirations, too.

Implemented types

Pattern

Constructors

RegExp(String source, {bool multiLine = false, bool caseSensitive = true, @Since("2.4") bool unicode = false, @Since("2.4") bool dotAll = false})
Constructs a regular expression.
factory
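A brief sketch of how the named constructor parameters change matching. The variable names are illustrative.

```dart
final caseInsensitive = RegExp(r'dart', caseSensitive: false);
final perLine = RegExp(r'^\w+$', multiLine: true);
final dotAll = RegExp(r'a.b', dotAll: true);
final unicode = RegExp(r'^.$', unicode: true);

void main() {
  print(caseInsensitive.hasMatch('DART')); // true
  // With multiLine, ^ and $ also match at line boundaries.
  print(perLine.allMatches('one\ntwo').length); // 2
  // With dotAll, "." also matches line terminators.
  print(dotAll.hasMatch('a\nb')); // true
  print(RegExp(r'a.b').hasMatch('a\nb')); // false
  // In Unicode mode, "." matches a whole code point, not a single
  // UTF-16 code unit.
  print(unicode.hasMatch('\u{1F600}')); // true
  print(RegExp(r'^.$').hasMatch('\u{1F600}')); // false
}
```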

Properties

hashCode int
The hash code for this object.
read-only, inherited
isCaseSensitive bool
Whether this regular expression is case sensitive.
read-only
isDotAll bool
Whether "." in this regular expression matches line terminators.
read-only
isMultiLine bool
Whether this regular expression matches multiple lines.
read-only
isUnicode bool
Whether this regular expression uses Unicode mode.
read-only
pattern String
The regular expression pattern source of this RegExp.
read-only
runtimeType Type
A representation of the runtime type of the object.
read-only, inherited

Methods

allMatches(String input, [int start = 0]) Iterable<RegExpMatch>
Matches this pattern against the string repeatedly.
override
firstMatch(String input) RegExpMatch?
Finds the first match of the regular expression in the string input.
hasMatch(String input) bool
Checks whether this regular expression has a match in the input.
matchAsPrefix(String string, [int start = 0]) Match?
Matches this pattern against the start of string.
inherited
noSuchMethod(Invocation invocation) dynamic
Invoked when a non-existent method or property is accessed.
inherited
stringMatch(String input) String?
Finds the string of the first match of this regular expression in input.
toString() String
A string representation of this object.
inherited
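A short sketch exercising several of the methods listed above on the same input. The variable name word is illustrative.

```dart
final word = RegExp(r'\w+');

void main() {
  const input = 'Parse my string';
  print(word.hasMatch(input)); // true
  print(word.stringMatch(input)); // Parse
  // The optional start argument skips matches before index 6.
  print(word.allMatches(input, 6).map((m) => m[0]).toList()); // [my, string]
  // matchAsPrefix only succeeds if a match begins exactly at the start
  // index. Index 5 holds a space, so there is no match.
  print(word.matchAsPrefix(input, 5)); // null
}
```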

Operators

operator ==(Object other) bool
The equality operator.
inherited

Static Methods

escape(String text) String
Creates regular expression syntax that matches the input text.
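As a sketch of a typical use, escape lets arbitrary text be searched for literally, even when it contains characters with special meaning in regexp syntax. The helper containsLiteral is hypothetical.

```dart
/// Hypothetical helper: checks whether [haystack] contains [needle],
/// treating every character of [needle] literally.
bool containsLiteral(String haystack, String needle) =>
    RegExp(RegExp.escape(needle)).hasMatch(haystack);

void main() {
  print(containsLiteral(r'cost: $5.00', r'$5.00')); // true
  // The escaped "." only matches a literal dot.
  print(containsLiteral(r'cost: $5x00', r'$5.00')); // false
  // Without escaping, "$" acts as an end anchor and the pattern never matches.
  print(RegExp(r'$5.00').hasMatch(r'cost: $5.00')); // false
}
```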