Skip to content

Fully generalize "whole match" in the engine and enable transforming custom types #470

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 9, 2022

Conversation

rxwei
Copy link
Contributor

@rxwei rxwei commented Jun 5, 2022

  • Track the whole match as an element of the "capture list" in the matching engine. Do so by emitting code as an implicit capture around the root node.
  • No longer handle matcher as a special case within capture lowering, because the matcher can be arbitrarily nested within "output-forwarding" nodes, such as a changeMatchingOptions non-capturing group. Instead, make the bytecode emitter carry a result value so that a custom output can be propagated through any forwarding nodes.
    Regex {
      Capture(
        SemanticVersionParser()
          .ignoringCase()
          .matchingSemantics(.unicodeScalar)
      ) // This would not work previously.
    }
  • Collapse DSLTree node transform into capture, because a transform can never be standalone (without a capture parent). This greatly simplifies capture lowering.
  • Make the bytecode's capture transform use type (Input, _StoredCapture) -> Any so that it can transform any whole match, not just Substring. This means you can now transform any captured value, including a custom-consuming regex component's result!
    Regex {
      "version:"
      OneOrMore(.whitespace)
      Capture {
        SemanticVersionParser() // Regex<SemanticVersion>
      } transform: {
        // (SemanticVersion) -> SomethingElse
      }
    }
    The transforms of Capture and TryCapture are now generalized from taking Substring to taking generic parameter W (the whole match).
  • Fix an issue where initial options were applied based solely on whether the bytecode had any instructions, failing examples such as ((?i:.)). It now checks whether the first matchable atom has been emitted.

@rxwei rxwei requested a review from milseman June 5, 2022 08:51
@rxwei rxwei force-pushed the whole-match-tracked branch from 5b413ac to 381825d Compare June 5, 2022 12:50
@rxwei rxwei changed the title Track the whole match as a capture in the matching engine. Fully generalize "whole match" in the engine and enable transforming custom types Jun 5, 2022
@rxwei rxwei requested review from natecook1000 and Azoy June 5, 2022 12:51
@rxwei rxwei force-pushed the whole-match-tracked branch 2 times, most recently from 99d01fc to 8437579 Compare June 6, 2022 02:41
@rxwei
Copy link
Contributor Author

rxwei commented Jun 6, 2022

@swift-ci please test

@rxwei
Copy link
Contributor Author

rxwei commented Jun 6, 2022

@swift-ci please test Linux

@rxwei
Copy link
Contributor Author

rxwei commented Jun 6, 2022

For some reason the CI failure only occurs on Linux, while both platforms have the same compiler.

[252/256] Compiling RegexBuilderTests RegexDSLTests.swift
/build/swift-experimental-string-processing/Tests/RegexBuilderTests/RegexDSLTests.swift:586:18: error: the compiler is unable to type-check this expression in reasonable time; try breaking up the expression into distinct sub-expressions
    let regex3 = Regex {
                 ^~~~~~~
Apple Swift version 5.8-dev (LLVM 278d67f38c6a910, Swift ee312bc1e20eb01)
Target: x86_64-apple-macosx11.0
Swift version 5.8-dev (LLVM 278d67f38c6a910, Swift ee312bc1e20eb01)
Target: x86_64-unknown-linux-gnu

@rxwei rxwei force-pushed the whole-match-tracked branch from 8437579 to fef3831 Compare June 6, 2022 05:03
@rxwei
Copy link
Contributor Author

rxwei commented Jun 6, 2022

@swift-ci please test

Copy link
Member

@milseman milseman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM. I'm not sure why transform is back on capture, it seems to complicate the model and make it harder to drop the capture enum. Otherwise it looks good.

return nil
}
assert(type(of: result) == resultType)
return result
}
}

func callAsFunction(_ input: Substring) throws -> Any? {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, why are these closures all union-ed together into a run-time value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose we can define a separate CaptureTransform for every capture type, e.g. CaptureTransform and FailableCaptureTransform. But they'd also end up getting unioned as different instruction cases.

@rxwei rxwei force-pushed the whole-match-tracked branch 2 times, most recently from 0adbff7 to 1d5db25 Compare June 9, 2022 21:31
@rxwei
Copy link
Contributor Author

rxwei commented Jun 9, 2022

I'm not sure why transform is back on capture, it seems to complicate the model and make it harder to drop the capture enum.

IMO it's the opposite. I don't think a standalone transform tree node is a good fit because transforms are only ever specified in a Capture or a TryCapture. Before this patch, transform was handled as an unreachable case in Processor.swift because it doesn't make sense when it's not part of a capture; that is, transform does not have a capture index to correspond to unless its parent is a capture.

@rxwei
Copy link
Contributor Author

rxwei commented Jun 9, 2022

@swift-ci please test

@rxwei rxwei force-pushed the whole-match-tracked branch 2 times, most recently from 8c35b1a to fc678d2 Compare June 9, 2022 21:44
@rxwei
Copy link
Contributor Author

rxwei commented Jun 9, 2022

@swift-ci please test

1 similar comment
@rxwei
Copy link
Contributor Author

rxwei commented Jun 9, 2022

@swift-ci please test

@rxwei rxwei force-pushed the whole-match-tracked branch from fc678d2 to d645c5f Compare June 9, 2022 22:15
@rxwei
Copy link
Contributor Author

rxwei commented Jun 9, 2022

@swift-ci please test

1 similar comment
@rxwei
Copy link
Contributor Author

rxwei commented Jun 9, 2022

@swift-ci please test

…custom types

* Track the whole match as an element of the "capture list" in the matching engine. Do so by emitting code as an implicit `capture` around the root node.
* No longer handle `matcher` as a special case within `capture` lowering, because the matcher can be arbitrarily nested within "output-forwarding" nodes, such as a `changeMatchingOptions` non-capturing group. Instead, make the bytecode emitter carry a result value so that a custom output can be propagated through any forwarding nodes.
  ```swift
  Regex {
    Capture(
      SemanticVersionParser()
        .ignoringCase()
        .matchingSemantics(.unicodeScalar)
    ) // This would not work previously.
  }
  ```
* Collapse DSLTree node `transform` into `capture`, because a transform can never be standalone (without a `capture` parent). This greatly simplifies `capture` lowering.
* Make the bytecode's capture transform use type `(Input, _StoredCapture) -> Any` so that it can transform any whole match, not just `Substring`. This means you can now transform any captured value, including a custom-consuming regex component's result!
  ```swift
   Regex {
    "version:"
    OneOrMore(.whitespace)
    Capture {
      SemanticVersionParser() // Regex<SemanticVersion>
    } transform: {
      // (SemanticVersion) -> SomethingElse
    }
  }
  ```
  The transforms of `Capture` and `TryCapture` are now generalized from taking `Substring` to taking generic parameter `W` (the whole match).
* Fix an issue where initial options were applied based solely on whether the bytecode had any instructions, failing examples such as `((?i:.))`. It now checks whether the first matchable atom has been emitted.
@rxwei rxwei force-pushed the whole-match-tracked branch from d645c5f to 9d5280e Compare June 9, 2022 22:57
@rxwei
Copy link
Contributor Author

rxwei commented Jun 9, 2022

@swift-ci please test

1 similar comment
@rxwei
Copy link
Contributor Author

rxwei commented Jun 9, 2022

@swift-ci please test

@rxwei rxwei merged commit fa72f5a into swiftlang:main Jun 9, 2022
@rxwei rxwei deleted the whole-match-tracked branch June 9, 2022 23:37
natecook1000 pushed a commit to natecook1000/swift-experimental-string-processing that referenced this pull request Jun 17, 2022
Fully generalize "whole match" in the engine and enable transforming custom types
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants