@@ -543,8 +543,10 @@ scalar value, even when it is encoded using multiple bytes. When Unicode mode
 is disabled (e.g., `(?-u:.)`), then `.` will match a single byte in all cases.
 * The character classes `\w`, `\d` and `\s` are all Unicode-aware by default.
   Use `(?-u:\w)`, `(?-u:\d)` and `(?-u:\s)` to get their ASCII-only definitions.
-* Similarly, `\b` and `\B` use a Unicode definition of a "word" character. To
-  get ASCII-only word boundaries, use `(?-u:\b)` and `(?-u:\B)`.
+* Similarly, `\b` and `\B` use a Unicode definition of a "word" character.
+  To get ASCII-only word boundaries, use `(?-u:\b)` and `(?-u:\B)`. This also
+  applies to the special word boundary assertions. (That is, `\b{start}`,
+  `\b{end}`, `\b{start-half}`, `\b{end-half}`.)
 * `^` and `$` are **not** Unicode-aware in multi-line mode. Namely, they only
   recognize `\n` (assuming CRLF mode is not enabled) and not any of the other
   forms of line terminators defined by Unicode.
@@ -723,12 +725,16 @@ x{n}? exactly n x
 ### Empty matches

 <pre class="rust">
-^     the beginning of a haystack (or start-of-line with multi-line mode)
-$     the end of a haystack (or end-of-line with multi-line mode)
-\A    only the beginning of a haystack (even with multi-line mode enabled)
-\z    only the end of a haystack (even with multi-line mode enabled)
-\b    a Unicode word boundary (\w on one side and \W, \A, or \z on other)
-\B    not a Unicode word boundary
+^               the beginning of a haystack (or start-of-line with multi-line mode)
+$               the end of a haystack (or end-of-line with multi-line mode)
+\A              only the beginning of a haystack (even with multi-line mode enabled)
+\z              only the end of a haystack (even with multi-line mode enabled)
+\b              a Unicode word boundary (\w on one side and \W, \A, or \z on other)
+\B              not a Unicode word boundary
+\b{start}, \<   a Unicode start-of-word boundary (\W|\A on the left, \w on the right)
+\b{end}, \>     a Unicode end-of-word boundary (\w on the left, \W|\z on the right)
+\b{start-half}  half of a Unicode start-of-word boundary (\W|\A on the left)
+\b{end-half}    half of a Unicode end-of-word boundary (\W|\z on the right)
 </pre>

 The empty regex is valid and matches the empty string. For example, the
@@ -856,28 +862,32 @@ Note that this includes all possible escape sequences, even ones that are
 documented elsewhere.

 <pre class="rust">
-\*          literal *, applies to all ASCII except [0-9A-Za-z<>]
-\a          bell (\x07)
-\f          form feed (\x0C)
-\t          horizontal tab
-\n          new line
-\r          carriage return
-\v          vertical tab (\x0B)
-\A          matches at the beginning of a haystack
-\z          matches at the end of a haystack
-\b          word boundary assertion
-\B          negated word boundary assertion
-\123        octal character code, up to three digits (when enabled)
-\x7F        hex character code (exactly two digits)
-\x{10FFFF}  any hex character code corresponding to a Unicode code point
-\u007F      hex character code (exactly four digits)
-\u{7F}      any hex character code corresponding to a Unicode code point
-\U0000007F  hex character code (exactly eight digits)
-\U{7F}      any hex character code corresponding to a Unicode code point
-\p{Letter}  Unicode character class
-\P{Letter}  negated Unicode character class
-\d, \s, \w  Perl character class
-\D, \S, \W  negated Perl character class
+\*              literal *, applies to all ASCII except [0-9A-Za-z<>]
+\a              bell (\x07)
+\f              form feed (\x0C)
+\t              horizontal tab
+\n              new line
+\r              carriage return
+\v              vertical tab (\x0B)
+\A              matches at the beginning of a haystack
+\z              matches at the end of a haystack
+\b              word boundary assertion
+\B              negated word boundary assertion
+\b{start}, \<   start-of-word boundary assertion
+\b{end}, \>     end-of-word boundary assertion
+\b{start-half}  half of a start-of-word boundary assertion
+\b{end-half}    half of an end-of-word boundary assertion
+\123            octal character code, up to three digits (when enabled)
+\x7F            hex character code (exactly two digits)
+\x{10FFFF}      any hex character code corresponding to a Unicode code point
+\u007F          hex character code (exactly four digits)
+\u{7F}          any hex character code corresponding to a Unicode code point
+\U0000007F      hex character code (exactly eight digits)
+\U{7F}          any hex character code corresponding to a Unicode code point
+\p{Letter}      Unicode character class
+\P{Letter}      negated Unicode character class
+\d, \s, \w      Perl character class
+\D, \S, \W      negated Perl character class
 </pre>

 ### Perl character classes (Unicode friendly)