Commit 88a2a62

syntax: fix 'is_match_empty' predicate
This was incorrectly defined for \b. Previously, I had erroneously made it return true only for \B since \B matches '' and \b does not match ''. However, \b does match the empty string. Like \B, it only matches a subset of empty strings, depending on what the surrounding context is. The important bit is that it can match *an* empty string, not that it matches *the* empty string.

We were not yet using this predicate anywhere in the regex crate, so we just fix the implementation and update the tests.

This does present a compatibility hazard for anyone who was using this function, but as of this time, I'm considering this a bug fix since \b clearly matches an empty string.

Fixes #859
1 parent 72f09f1 commit 88a2a62
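
To make the "*an* empty string versus *the* empty string" distinction concrete, here is a minimal std-only sketch that lists the spans where an ASCII-only `\b` would match. It is not the regex crate's implementation: the helper names `is_word_byte` and `ascii_word_boundary_spans` are hypothetical, and the sketch assumes word characters are exactly `[0-9A-Za-z_]`.

```rust
/// Hypothetical helper: true for ASCII word bytes, i.e. [0-9A-Za-z_].
fn is_word_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// Hypothetical helper: returns every [start, end) span at which an
/// ASCII-only `\b` would match in `haystack`. Every span is empty
/// (start == end) because a word boundary consumes no input.
fn ascii_word_boundary_spans(haystack: &str) -> Vec<(usize, usize)> {
    let bytes = haystack.as_bytes();
    let mut spans = Vec::new();
    for i in 0..=bytes.len() {
        // A position is a boundary when exactly one neighbor is a word byte.
        let before = i > 0 && is_word_byte(bytes[i - 1]);
        let after = i < bytes.len() && is_word_byte(bytes[i]);
        if before != after {
            spans.push((i, i));
        }
    }
    spans
}
```

For the haystack `a`, this yields the spans `[0, 0)` and `[1, 1)`, both of which are empty. For the empty haystack it yields no spans at all. So `\b` matches some empty strings, depending on context, but not every one.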

3 files changed: +17 -7 lines changed


Diff for: CHANGELOG.md

+3
@@ -1,8 +1,11 @@
 TBD
 ===
+The below are changes for the next release, which is to be determined.
 
 * [BUG #680](https://github.com/rust-lang/regex/issues/680):
   Fixes a bug where `[[:alnum:][:^ascii:]]` dropped `[:alnum:]` from the class.
+* [BUG #859](https://github.com/rust-lang/regex/issues/859):
+  Fixes a bug where `Hir::is_match_empty` returned `false` for `\b`.
 
 
 1.5.5 (2022-03-08)

Diff for: regex-syntax/src/hir/mod.rs

+9 -5
@@ -334,9 +334,13 @@ impl Hir {
         info.set_any_anchored_end(false);
         info.set_literal(false);
         info.set_alternation_literal(false);
-        // A negated word boundary matches the empty string, but a normal
-        // word boundary does not!
-        info.set_match_empty(word_boundary.is_negated());
+        // A negated word boundary matches '', so that's fine. But \b does not
+        // match '', so why do we say it can match the empty string? Well,
+        // because, if you search for \b against 'a', it will report [0, 0) and
+        // [1, 1) as matches, and both of those matches correspond to the empty
+        // string. Thus, only *certain* empty strings match \b, which similarly
+        // applies to \B.
+        info.set_match_empty(true);
         // Negated ASCII word boundaries can match invalid UTF-8.
         if let WordBoundary::AsciiNegate = word_boundary {
             info.set_always_utf8(false);
@@ -661,8 +665,8 @@ impl Hir {
     /// Return true if and only if the empty string is part of the language
     /// matched by this regular expression.
     ///
-    /// This includes `a*`, `a?b*`, `a{0}`, `()`, `()+`, `^$`, `a|b?`, `\B`,
-    /// but not `a`, `a+` or `\b`.
+    /// This includes `a*`, `a?b*`, `a{0}`, `()`, `()+`, `^$`, `a|b?`, `\b`
+    /// and `\B`, but not `a` or `a+`.
     pub fn is_match_empty(&self) -> bool {
         self.info.is_match_empty()
     }
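
As a rough illustration of how a "can match an empty string" predicate composes over a syntax tree, the sketch below models it on a toy expression type. This is not regex-syntax's `Hir`: the `Toy` enum, its variants, and `can_match_empty` are hypothetical names, and the propagation rules (all sub-expressions for concatenation, any branch for alternation) are an assumption that is consistent with the examples in the doc comment above.

```rust
/// A toy regex syntax tree. NOT regex-syntax's `Hir`; every name here is
/// hypothetical and exists only for illustration.
enum Toy {
    /// The empty expression, e.g. `()` or an empty alternation branch.
    Empty,
    /// A single literal character, e.g. `a`.
    Literal(char),
    /// An anchor such as `^` or `$`.
    Anchor,
    /// `\b` (negated == false) or `\B` (negated == true).
    WordBoundary { negated: bool },
    /// A repetition with a minimum count, e.g. `a*` (min 0) or `a+` (min 1).
    Repeat { min: u32, sub: Box<Toy> },
    /// A concatenation such as `a?b*`.
    Concat(Vec<Toy>),
    /// An alternation such as `a|b?`.
    Alternate(Vec<Toy>),
}

impl Toy {
    /// True if this expression can match *an* empty string in some context.
    fn can_match_empty(&self) -> bool {
        match self {
            Toy::Empty | Toy::Anchor => true,
            Toy::Literal(_) => false,
            // Both \b and \B match only certain empty strings, but each can
            // match one, so negation does not matter here.
            Toy::WordBoundary { .. } => true,
            Toy::Repeat { min, sub } => *min == 0 || sub.can_match_empty(),
            Toy::Concat(subs) => subs.iter().all(|s| s.can_match_empty()),
            Toy::Alternate(subs) => subs.iter().any(|s| s.can_match_empty()),
        }
    }
}
```

Under this model, `\b`, `\B`, and `a|` (an alternation with an empty branch) all report true, while `a+` and `b|a` report false.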

Diff for: regex-syntax/src/hir/translate.rs

+5 -2
@@ -3139,6 +3139,9 @@ mod tests {
         assert!(t(r"\pL*").is_match_empty());
         assert!(t(r"a*|b").is_match_empty());
         assert!(t(r"b|a*").is_match_empty());
+        assert!(t(r"a|").is_match_empty());
+        assert!(t(r"|a").is_match_empty());
+        assert!(t(r"a||b").is_match_empty());
         assert!(t(r"a*a?(abcd)*").is_match_empty());
         assert!(t(r"^").is_match_empty());
         assert!(t(r"$").is_match_empty());
@@ -3148,6 +3151,8 @@ mod tests {
         assert!(t(r"\z").is_match_empty());
         assert!(t(r"\B").is_match_empty());
         assert!(t_bytes(r"(?-u)\B").is_match_empty());
+        assert!(t(r"\b").is_match_empty());
+        assert!(t(r"(?-u)\b").is_match_empty());
 
         // Negative examples.
         assert!(!t(r"a+").is_match_empty());
@@ -3157,8 +3162,6 @@ mod tests {
         assert!(!t(r"a{1,10}").is_match_empty());
         assert!(!t(r"b|a").is_match_empty());
         assert!(!t(r"a*a+(abcd)*").is_match_empty());
-        assert!(!t(r"\b").is_match_empty());
-        assert!(!t(r"(?-u)\b").is_match_empty());
     }
 
     #[test]
