Regex Literal Pitch v2 #187
Yeah, I think I agree. I guess it ultimately depends on what the raw syntax rules regarding backslashes end up being. If they end up treating backslashes as literal, we may not want to recommend raw syntax, as that would require changing any escape sequences already written. We would likely want the user to use the alternative spellings, e.g. However, even if this is not something we support, I still feel there is some value in the compiler at least implementing the heuristic, as it only impacts invalid code and allows us to emit a diagnostic with a fix-it to change the regex. Though I don't have a good sense of how common the
A quick pass to flip `/.../` out of the alternatives and into the main syntax. Still needs a bunch of work. Also add some commentary on a regex with `]` as the starting character.
…n. Split out Proposed solution from Detailed design. Parallelize the structure a bit better.
More details and word smithing.
### Named typed captures

Regex literals have their capture types statically determined by the capture groups present. This follows the same inference behavior as [the DSL][regex-dsl], and is explored in more detail in *[Strongly Typed Captures][strongly-typed-captures]*. One aspect of this that is currently unique to the literal is the ability to infer labeled tuple elements for named capture groups. For example:
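A hedged sketch of the inference described above, not the proposal's own example: the group names `identifier` and `hex` and the input string are made up for illustration. It assumes a Swift 5.7 or later toolchain and uses the `#/.../#` delimiters so that no bare-slash compiler flag is needed.

```swift
// Illustrative only: the group names and the sample input are hypothetical.
let assignment = #/(?<identifier>[A-Za-z]\w*) = (?<hex>[0-9A-F]+)/#
// The named groups become labeled tuple elements in the output type:
// Regex<(Substring, identifier: Substring, hex: Substring)>

let sample = "count = 2A"
if let match = sample.wholeMatch(of: assignment) {
    // The output is an ordinary tuple, so it can be destructured...
    let (whole, identifier, hex) = match.output
    print(whole, identifier, hex)
    // ...or read through the labels inferred from the group names.
    print(match.output.hex)
}
```

Because the labels are part of the static type, a misspelled label such as `match.output.hexx` would be rejected at compile time rather than failing at run time.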
Strongly typed captures isn't a proposal itself. Should we link the DSL proposal or else talk about it more here?
We do link the DSL here (`[the DSL][regex-dsl]`). Should it be used as the main reference for the typed capture behavior? I was hesitant to talk more about typed captures here, as there's quite a bit to cover, and I believe most of it is shared behavior with the DSL. But maybe an overview would be reasonable?
Co-authored-by: Michael Ilseman <[email protected]>
```swift
let regex = #/
/usr/lib/modules/ # Prefix
```
Note that this is a slightly different regex from the single-line version, as it includes a `/` at the start. We probably ought to be consistent; what do you think? I mainly avoided it in the single-line version because GitHub syntax-colors it as a comment 😬
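For reference, a complete multi-line literal without the leading `/` might look like the sketch below. This is an illustration rather than the text under review: the `subpath` group name and the `vmlinuz` suffix are assumptions, and it relies on the multi-line form ignoring whitespace and treating `#` to end of line as a comment (Swift 5.7 or later).

```swift
// Illustrative sketch; the group name and trailing path component are hypothetical.
let kernelPath = #/
  usr/lib/modules/   # Prefix (no leading slash)
  (?<subpath>[^/]+)  # One path component
  /vmlinuz           # Trailing file name
  /#

if let match = "usr/lib/modules/5.15/vmlinuz".wholeMatch(of: kernelPath) {
    print(match.output.subpath)
}
```

Dropping the leading `/` keeps the pattern equivalent in shape to a single-line `#/usr/lib/modules/.../#` spelling, which is the consistency question raised above.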
Any extra slashes are unintentional and the natural outcome of using slash as both part of a delimiter and an interior character.
It's interesting that `#//usr/local/#` and `#/user/local//#` would sensibly be lexed as comments by source tools. That's a downside of `/` we should mention too. I suppose the workaround is to escape the slash if it's at the very beginning or end, which is another little wrinkle that `#/.../#` doesn't alleviate.
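A small sketch of that workaround, assuming Swift 5.7 or later; the constant name and the sample input are hypothetical. Escaping the slash at each edge means no `//` sequence sits next to a delimiter, so tools that treat `//` as a comment start have nothing to trip over.

```swift
// Illustrative only. The pattern matches the text "/usr/local/".
// The first and last slashes are written as `\/` so that neither `#//`
// nor `//#` appears in the source.
let escapedEdges = #/\/usr/local\//#

let hit = "/usr/local/".wholeMatch(of: escapedEdges)
print(hit != nil)
```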
### Extended delimiters `#/.../#`, `##/.../##`

Backslashes may be used to write forward slashes within the regex literal, e.g `/foo\/bar/`. However, this can be quite syntactically noisy and confusing. To avoid this, a regex literal may be surrounded by an arbitrary number of balanced octothorpes. This changes the delimiter of the literal, and therefore allows the use of forward slashes without escaping. For example:
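As an illustration of the trade-off described above (a sketch, not the example from the proposal text): the constant names, the capture group, and the inputs are hypothetical, and a Swift 5.7 or later toolchain is assumed. The bare `/.../` spelling is shown only in a comment because it may need bare-slash support to be enabled in the compiler.

```swift
// Bare delimiters would require escaping every slash in the pattern:
//   /usr\/lib\/modules\/([^\/]+)/

// One `#` on each side: interior slashes need no escaping.
let oneSign = #/usr/lib/modules/([^/]+)/#

// Two `#` on each side: the pattern may even contain the sequence `/#`.
let twoSigns = ##/tag/#([a-z]+)/##

if let match = "usr/lib/modules/5.15".wholeMatch(of: oneSign) {
    print(match.output.1)
}
if let match = "tag/#swift".wholeMatch(of: twoSigns) {
    print(match.output.1)
}
```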
We call `#` a pound symbol elsewhere in the proposal; should we be consistent?
FWIW Unicode calls it "number sign", which is less confusable with £ (whose official name is "POUND SIGN"). How does "number sign" consistently sound? Even in the US, we don't dial numbers that often anymore, so I expect "pound sign" to have limited life span, especially in the era of hashtags.
Sounds good to me!
Updated to use "number sign", though a couple of occurrences were phrased as "number of ...", which sounded a bit awkward, so I changed those to use the `#` character directly.
Standardize on "number signs" for mentions of `#` (though a couple of them read better as just the character). Also change the multi-line example to not include a `/` at the start, which matches the single-line version.
This is looking good; the only things I think might want tweaking:
- Add Source Compatibility section
- Condense comment syntax ambiguity section
- Mention `/.../` being less popular in some communities
And remove the old version of the pitch.
@swift-ci please test