Commit b2ca9c1

compile: make Regex::new(r"(?-u:\B)") fail again
This regex failed to compile in `regex <1.8`, but the migration to regex-automata tweaked the rules in a subtle way that permitted it to compile, even though the old/status-quo matching engines can't handle it correctly. Specifically, they may permit the \B to match between the code units of a single codepoint, which in turn results in a panic when slicing a &str.

In `regex 1.9`, this regex will actually be able to be compiled, but the matching engines will correctly and robustly never report matches that split a UTF-8 encoded codepoint. For now, we just add code that causes `regex 1.8` to have the same behavior as previous releases.

Fixes #1006
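To illustrate the slicing panic mentioned in the commit message, here is a minimal std-only sketch. It does not use the regex crate; `checked_match_slice` is a hypothetical helper introduced for illustration, and the offsets stand in for a match span that an engine might report.

```rust
/// Hypothetical helper (not part of the regex crate): slice `haystack` at a
/// reported match span, returning `None` instead of panicking when either
/// offset falls inside a multi-byte codepoint.
fn checked_match_slice(haystack: &str, start: usize, end: usize) -> Option<&str> {
    haystack.get(start..end)
}

fn main() {
    // U+03B4 is encoded in UTF-8 as the two bytes 0xCE 0xB4.
    let haystack = "δ";
    assert_eq!(haystack.len(), 2);

    // Offset 1 sits between the two code units, so it is not a valid
    // place to split a `&str`.
    assert!(!haystack.is_char_boundary(1));

    // An empty match reported at 1..1 cannot be turned into a `&str`.
    // Indexing with `&haystack[1..1]` would panic here; `get` returns `None`.
    assert_eq!(checked_match_slice(haystack, 1, 1), None);

    // A span that respects codepoint boundaries slices without trouble.
    assert_eq!(checked_match_slice(haystack, 0, 2), Some("δ"));
}
```

The sketch only shows why a match offset inside a codepoint is a problem for `&str` APIs; it says nothing about how the matching engines compute offsets.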
1 parent a1a9ebe commit b2ca9c1

1 file changed: +9 -0 lines changed

src/compile.rs

+9
@@ -137,6 +137,15 @@ impl Compiler {
     }
 
     fn compile_one(mut self, expr: &Hir) -> result::Result<Program, Error> {
+        if self.compiled.only_utf8
+            && expr.properties().look_set().contains(Look::WordAsciiNegate)
+        {
+            return Err(Error::Syntax(
+                "ASCII-only \\B is not allowed in Unicode regexes \
+                 because it may result in invalid UTF-8 matches"
+                    .to_string(),
+            ));
+        }
         // If we're compiling a forward DFA and we aren't anchored, then
         // add a `.*?` before the first capture group.
         // Other matching engines handle this by baking the logic into the
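The check above rejects ASCII-only \B because that assertion looks at individual bytes. A minimal std-only sketch of that byte-level semantics follows, assuming the usual definition of an ASCII word byte as `[0-9A-Za-z_]`. Both functions are hypothetical helpers written for illustration, not the crate's implementation.

```rust
/// Hypothetical helper: is `b` an ASCII word byte, i.e. `[0-9A-Za-z_]`?
fn is_ascii_word_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// Sketch of the `(?-u:\B)` assertion at byte offset `at`: it holds when the
/// bytes on both sides agree on "word-ness". Positions outside the haystack
/// count as non-word.
fn is_ascii_not_word_boundary(haystack: &[u8], at: usize) -> bool {
    let before = at > 0 && is_ascii_word_byte(haystack[at - 1]);
    let after = at < haystack.len() && is_ascii_word_byte(haystack[at]);
    before == after
}

fn main() {
    // U+03B4 is the two bytes 0xCE 0xB4; neither is an ASCII word byte.
    let bytes = "δ".as_bytes();
    assert_eq!(bytes, &[0xCE_u8, 0xB4][..]);

    // The assertion holds at offset 1, which is in the middle of the
    // codepoint. A match reported there would split valid UTF-8.
    assert!(is_ascii_not_word_boundary(bytes, 1));

    // For plain ASCII input the same rule behaves as expected.
    assert!(!is_ascii_not_word_boundary(b"ab", 0)); // start of a word
    assert!(is_ascii_not_word_boundary(b"ab", 1)); // inside a word
}
```

Because the byte-level rule is satisfied at offset 1 of a two-byte codepoint, an engine that applies it naively can report an empty match there, which is the case the new error guards against.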
