[5.7] Fix anchor bugs, de-genericize processor, add ranges collection #531
^ and $ should match the start and end of the callee, even if that callee is a substring. Currently, ^ and $ instead match the start and end of the callee's base string. In addition, when replacing within a subrange, ^ and $ should match only the start and end of the callee, not the start and end of the subrange.
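A minimal sketch of the intended anchor behavior for a substring callee, assuming a toolchain that ships `Regex` (Swift 5.7 or later) and includes this fix. The sample strings and variable names are illustrative only and are not taken from the PR's tests.

```swift
// Illustrative example; strings and names are assumptions, not from the PR.
let base = "abc def"
let callee = base.dropFirst(4)   // Substring "def"; its base string is still "abc def"

// Parse the pattern at run time, so no regex-literal compiler flag is needed.
let anchored = try! Regex("^def$")

// The Substring is the callee, so ^ and $ should refer to its own bounds,
// not to the bounds of the base string.
let inSubstring = callee.firstMatch(of: anchored)

// For the whole String, "def" does not sit at the start, so ^ cannot match.
let inBase = base.firstMatch(of: anchored)

print(inSubstring != nil, inBase != nil)
```

With the behavior described above, the substring search is expected to succeed and the search over the whole string to fail.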
This prepares for adopting an opaque result type for matches(of:) and ranges(of:). The old, CollectionConsumer-based model moves index-by-index and isn't aware of the regex's semantic level, which produces inaccurate results for regexes that match at a mid-character index.
* Avoid double execution by avoiding Array init

* De-genericize processor, engine, etc.

  Provides only modest performance improvements (it was already getting specialized), but makes it possible to add String-specific specializations.

@swift-ci please test
* Allow CustomConsuming types to match w/ zero width

  We previously asserted if a custom consuming type matches with zero width, but that isn't necessary or good. A custom type can implement a lookaround assertion or act as a tracer.

* Rename Processor.advance(to:) to resume(at:)

  Since the given index doesn't need to advance, this name is less misleading.
This separates the two different ideas for boundaries in the base input:

- subjectBounds: These represent the actual subject in the input string. For a `String` callee, this will cover the entire bounds, while for a `Substring` these will represent the bounds of the substring in the base.
- searchBounds: These represent the current search range in the subject. These bounds can be the same as `subjectBounds` or a subrange when searching for subsequent matches or replacing only in a subrange of a string.

* firstMatch shouldn't update searchBounds on iteration

  When we move forward while searching for the first match, the search bounds should stay the same. Only the currentPosition needs to move forward. This will allow us to implement the \G start of match anchor, with which /\Gab/ matches "abab" twice, compared with /^ab/, which only matches once.

* Make matches(of:) and ranges(of:) boundary-aware

  With this change, RegexMatchesCollection keeps the subject bounds and search bounds separately, modifying the search bounds with each iteration. In addition, the replace methods that only operate on a subrange can specify that specifically, getting the correct anchor behavior while only matching within a portion of a string.
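The split between the two kinds of bounds can be sketched with a toy model. Everything here (`ToyInput`, `firstRange`, `allRanges`, and the literal-only matcher) is hypothetical and written only to illustrate the idea; it is not the PR's processor or its `RegexMatchesCollection`, and it does not model \G.

```swift
// Hypothetical toy model; not the PR's actual types or matching engine.
struct ToyInput {
    let base: String
    let subjectBounds: Range<String.Index>   // what ^ and $ refer to
    var searchBounds: Range<String.Index>    // where matching may currently happen
}

/// Finds a non-empty `literal` inside `searchBounds`. When `anchoredAtStart`
/// is true, a match is accepted only at `subjectBounds.lowerBound` (like ^).
/// Only the local position moves forward; `searchBounds` is left untouched.
func firstRange(of literal: String, anchoredAtStart: Bool,
                in input: ToyInput) -> Range<String.Index>? {
    var position = input.searchBounds.lowerBound
    let end = input.searchBounds.upperBound
    while position < end {
        let anchorOK = !anchoredAtStart || position == input.subjectBounds.lowerBound
        if anchorOK, input.base[position..<end].hasPrefix(literal) {
            return position..<input.base.index(position, offsetBy: literal.count)
        }
        position = input.base.index(after: position)
    }
    return nil
}

/// Collects every match, narrowing `searchBounds` after each one while
/// `subjectBounds` stays fixed.
func allRanges(of literal: String, anchoredAtStart: Bool,
               in input: ToyInput) -> [Range<String.Index>] {
    var input = input
    var result: [Range<String.Index>] = []
    while let range = firstRange(of: literal, anchoredAtStart: anchoredAtStart, in: input) {
        result.append(range)
        input.searchBounds = range.upperBound..<input.searchBounds.upperBound
    }
    return result
}

let text = "abab"
let whole = text.startIndex..<text.endIndex
let input = ToyInput(base: text, subjectBounds: whole, searchBounds: whole)

let unanchoredCount = allRanges(of: "ab", anchoredAtStart: false, in: input).count
let anchoredCount = allRanges(of: "ab", anchoredAtStart: true, in: input).count

// Searching only the second half: the anchor still refers to the subject start.
let secondHalf = text.index(text.startIndex, offsetBy: 2)..<text.endIndex
let subrangeInput = ToyInput(base: text, subjectBounds: whole, searchBounds: secondHalf)
let subrangeAnchoredCount = allRanges(of: "ab", anchoredAtStart: true, in: subrangeInput).count

print(unanchoredCount, anchoredCount, subrangeAnchoredCount)   // prints "2 1 0"
```

Because the anchor check consults `subjectBounds` rather than `searchBounds`, the anchored search finds "ab" once in "abab" and never inside the second-half subrange, which mirrors the /^ab/ and subrange-replacement behavior described above.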
@swift-ci please test
@@ -145,6 +145,10 @@ extension Regex where Output == AnyRegexOutput {
  public init(_ pattern: String) throws {
    self.init(ast: try parse(pattern, .semantic, .traditional))
  }

  internal init(_ pattern: String, syntax: SyntaxOptions) throws {
    self.init(ast: try parse(pattern, .semantic, syntax))
FYI this will need changing to drop the `.semantic` now that #519 has landed
Why didn't that pick up the change then?
It's because of the ordering of the changes: the parser recovery PR was written after this change but cherry-picked before it. I resolved the conflict in my cherry-pick, as I wasn't sure whether this one was going to be cherry-picked or not.