Commit 9a8f6e3

nirvdrum authored and byroot committed
Cheaply derive code range for String#b return value
The result of String#b is a string with an ASCII_8BIT/BINARY encoding. That encoding is ASCII-compatible and has no byte sequences that are invalid for the encoding. If we know the receiver's code range, we can derive the resulting string's code range without needing to perform a full code range scan.
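To make the saving concrete, here is a minimal sketch of the kind of full scan the commit message refers to, modeled outside of Ruby. The `enum coderange` values and the `binary_scan_coderange` function are hypothetical stand-ins for illustration, not Ruby's actual definitions. Because every byte is valid in a BINARY string, the scan only has to ask whether any byte has its high bit set, but it still has to walk the string to find out.

```c
#include <stddef.h>

/* Hypothetical stand-ins for Ruby's code range flags (illustration only). */
enum coderange { CR_UNKNOWN, CR_7BIT, CR_VALID, CR_BROKEN };

/*
 * Full code range scan for a BINARY string.
 * No byte sequence is invalid in this encoding, so the result is never
 * CR_BROKEN: it is CR_VALID if any byte is >= 0x80, otherwise CR_7BIT.
 * The cost is linear in the length of the string.
 */
static enum coderange
binary_scan_coderange(const unsigned char *p, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (p[i] & 0x80) {
            return CR_VALID;
        }
    }
    return CR_7BIT;
}
```

Deriving the result from the receiver's cached code range replaces this loop with a constant-time lookup whenever the cache is populated.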
1 parent 9e6d07f commit 9a8f6e3

1 file changed, +17 -1 lines changed

string.c

Lines changed: 17 additions & 1 deletion
@@ -10779,7 +10779,23 @@ rb_str_b(VALUE str)
         str2 = str_alloc_embed(rb_cString, RSTRING_EMBED_LEN(str) + TERM_LEN(str));
     }
     str_replace_shared_without_enc(str2, str);
-    ENC_CODERANGE_CLEAR(str2);
+
+    // BINARY strings can never be broken; they're either 7-bit ASCII or VALID.
+    // If we know the receiver's code range then we know the result's code range.
+    int cr = ENC_CODERANGE(str);
+    switch (cr) {
+      case ENC_CODERANGE_7BIT:
+        ENC_CODERANGE_SET(str2, ENC_CODERANGE_7BIT);
+        break;
+      case ENC_CODERANGE_BROKEN:
+      case ENC_CODERANGE_VALID:
+        ENC_CODERANGE_SET(str2, ENC_CODERANGE_VALID);
+        break;
+      default:
+        ENC_CODERANGE_CLEAR(str2);
+        break;
+    }
+
     return str2;
 }
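
The mapping performed by the switch can be read in isolation as a pure function. The sketch below is a standalone model, not code from string.c: `enum coderange` and `derive_binary_coderange` are hypothetical names, and it assumes the receiver's encoding is ASCII-compatible. Under that assumption a 7-bit receiver contains only bytes below 0x80, while a VALID or BROKEN receiver must contain at least one byte at or above 0x80, which is exactly what separates 7BIT from VALID in a BINARY string. For an encoding that is not ASCII-compatible (UTF-16, for example) that reasoning may not hold, so the model makes no claim there.

```c
/* Hypothetical stand-ins for Ruby's code range flags (illustration only). */
enum coderange { CR_UNKNOWN, CR_7BIT, CR_VALID, CR_BROKEN };

/*
 * Derive the code range of the BINARY copy from the receiver's code range.
 * Assumes the receiver's encoding is ASCII-compatible.
 *   7BIT            -> 7BIT    (all bytes are below 0x80)
 *   VALID or BROKEN -> VALID   (some byte is >= 0x80; BINARY is never broken)
 *   anything else   -> UNKNOWN (nothing cached, so a later scan is needed)
 */
static enum coderange
derive_binary_coderange(enum coderange receiver_cr)
{
    switch (receiver_cr) {
      case CR_7BIT:
        return CR_7BIT;
      case CR_BROKEN:
      case CR_VALID:
        return CR_VALID;
      default:
        return CR_UNKNOWN;
    }
}
```

Returning `CR_UNKNOWN` in the default branch corresponds to `ENC_CODERANGE_CLEAR(str2)` in the diff: when the receiver has no cached code range, the result simply falls back to the previous behavior.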
