RE2 is a regular expression engine that runs in time linear in the size of the
input. RE2/J is a port of RE2 to pure Java.
.
Java's standard regular expression package, java.util.regex, and many other
widely used regular expression packages such as PCRE, Perl and Python use a
backtracking implementation strategy: when a pattern presents two alternatives
such as a|b, the engine will try to match subpattern a first, and if that
yields no match, it will reset the input stream and try to match b instead.
.
If such choices are deeply nested, this strategy requires an exponential
number of passes over the input data before it can detect whether the input
matches. If the input is large, it is easy to construct a pattern whose
running time would exceed the lifetime of the universe. This creates a
security risk when accepting regular expression patterns from untrusted
sources, such as users of a web application.
.
In contrast, the RE2 algorithm explores all matches simultaneously in a single
pass over the input data by using a nondeterministic finite automaton.
.
There are certain features of PCRE or Perl regular expressions that cannot be
implemented in linear time, for example, backreferences, but the vast majority
of regular expressions patterns in practice avoid such features.
Installed Size: 153.6 kB
Architectures: all