Add regex sets. #173
@alexcrichton This PR includes new additions to the API, but no breaking changes. The additions are completely separate from the primary …
Regex sets permit matching multiple (possibly overlapping) regular expressions in a single scan of the search text. This adds a few new types, with `RegexSet` being the primary one. All matching engines support regex sets, including the lazy DFA. This commit also refactors a lot of the code around handling captures into a central `Search`, which now also includes a set of matches that regex sets use to determine which regexes have matched. We also merged the `Program` and `Insts` types back together; they were split up when the lazy DFA was added, but the split seemed to make the code more complicated. Closes #156.
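To make the "single scan, set of matches" idea concrete, here is a toy sketch in Rust. It is not the `regex` crate's implementation or API: `ToyRegexSet`, `Inst`, and `add_thread` are hypothetical names, and the pattern language is cut down to literal characters, `.` (any character), and `*` applied to the single preceding character. Every pattern is compiled into one shared instruction list, a Pike-VM-style simulation advances all patterns' threads together in one left-to-right pass, and a per-pattern `matched` flag stands in for the set of matches.

```rust
// Toy single-pass "regex set" matcher (hypothetical; NOT the regex crate's code).
// Syntax: literal chars, `.` (any char), `*` on the preceding char. Unanchored.

#[derive(Clone, Copy, Debug)]
enum Inst {
    Char(char),          // consume exactly this character
    Any,                 // consume any single character
    Split(usize, usize), // epsilon: follow both branches
    Jmp(usize),          // epsilon: unconditional jump
    Match(usize),        // the pattern with this index has matched
}

pub struct ToyRegexSet {
    prog: Vec<Inst>,    // all patterns compiled into one shared program
    starts: Vec<usize>, // entry point of each pattern in `prog`
}

/// Follows epsilon transitions from `pc`, records any `Match` reached, and
/// queues character-consuming instructions in `list` (deduplicated by `seen`).
fn add_thread(prog: &[Inst], pc: usize, list: &mut Vec<usize>, seen: &mut [bool], matched: &mut [bool]) {
    if seen[pc] {
        return;
    }
    seen[pc] = true;
    match prog[pc] {
        Inst::Jmp(t) => add_thread(prog, t, list, seen, matched),
        Inst::Split(a, b) => {
            add_thread(prog, a, list, seen, matched);
            add_thread(prog, b, list, seen, matched);
        }
        Inst::Match(i) => matched[i] = true,
        Inst::Char(_) | Inst::Any => list.push(pc),
    }
}

impl ToyRegexSet {
    pub fn new(patterns: &[&str]) -> Self {
        let mut prog = Vec::new();
        let mut starts = Vec::new();
        for (index, pattern) in patterns.iter().enumerate() {
            starts.push(prog.len());
            let chars: Vec<char> = pattern.chars().collect();
            let mut i = 0;
            while i < chars.len() {
                let atom = if chars[i] == '.' { Inst::Any } else { Inst::Char(chars[i]) };
                if i + 1 < chars.len() && chars[i + 1] == '*' {
                    // Layout: s: Split(s+1, s+3); s+1: atom; s+2: Jmp(s); s+3: rest.
                    let s = prog.len();
                    prog.push(Inst::Split(s + 1, s + 3));
                    prog.push(atom);
                    prog.push(Inst::Jmp(s));
                    i += 2;
                } else {
                    prog.push(atom);
                    i += 1;
                }
            }
            prog.push(Inst::Match(index));
        }
        ToyRegexSet { prog, starts }
    }

    /// Indices of every pattern that matches somewhere in `text`, in one scan.
    pub fn matches(&self, text: &str) -> Vec<usize> {
        let chars: Vec<char> = text.chars().collect();
        let mut matched = vec![false; self.starts.len()];
        let mut clist = Vec::new();
        let mut cseen = vec![false; self.prog.len()];
        for pos in 0..=chars.len() {
            // Unanchored: every pattern may begin at every position.
            for &s in &self.starts {
                add_thread(&self.prog, s, &mut clist, &mut cseen, &mut matched);
            }
            if pos == chars.len() {
                break;
            }
            let c = chars[pos];
            let mut nlist = Vec::new();
            let mut nseen = vec![false; self.prog.len()];
            for &pc in &clist {
                match self.prog[pc] {
                    Inst::Char(x) if x == c => add_thread(&self.prog, pc + 1, &mut nlist, &mut nseen, &mut matched),
                    Inst::Any => add_thread(&self.prog, pc + 1, &mut nlist, &mut nseen, &mut matched),
                    _ => {}
                }
            }
            clist = nlist;
            cseen = nseen;
        }
        (0..matched.len()).filter(|&i| matched[i]).collect()
    }
}

fn main() {
    let set = ToyRegexSet::new(&["/users/.*", "/users/admin", "/posts/.*"]);
    // Both overlapping patterns are reported, not just the first one.
    println!("{:?}", set.matches("/users/admin")); // [0, 1]
}
```

Because the simulation only needs to know *which* patterns reached a `Match` instruction, it never has to rank competing threads, so overlapping patterns such as `/users/.*` and `/users/admin` are both reported, whereas a single regex built by alternation reports only one winning alternate.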
Nice! I couldn't immediately come up with a use case for these, but it sounds plausible to me!
URL routers, user agent matchers. Generally, any time you have lots of patterns you need to match. RE2 has it. :)
Sounds like a plan to me.
/// alternate can match at a time.
///
/// For example, consider regular expressions to match email addresses and
/// domains: `[a-z]+@[a-z]+\.(com|org|net)` and `[a-z]+\.(com|org|net)`. If a
Maybe it would make sense to use a different example here? There are already plenty of incomplete or inaccurate regexes around the internet that purport to match email addresses; the docs shouldn't add more. At the very least, they should call out that these patterns are grossly simplified and will fail to match many valid email addresses, so that nobody copy-pastes them without thinking.
I'm happy to indulge alternative examples.