-
-
Notifications
You must be signed in to change notification settings - Fork 32k
gh-120196: Reuse find_max_char for bytes objects #120497
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are many "comparison of distinct pointer types lacks a cast" compiler warnings. Can you have a look? (Look at the PR to see the warnings.)
Ah, shoot. This has to do with the current implementation, and I was working in a different implementation in #120196 . Is it OK to wait for that one to be merged/rejected? Otherwise there will be some annoying merge conflicts that will take extra time. |
It's up to you. This change looks simpler to be merged than the other one which looks controversial. |
Done. It was an easy change and it makes that the compile guards above are indeed applicable because there is no mixing of signed and unsigned pointer types. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
@pitrou @serhiy-storchaka: Would you mind to review this change?
Currently |
Same. Or technically 2 clock cycles later as it will evaluate whether the return value is larger than 127, but if it is properly inlined it should be 0 clock cycles. EDIT: Sorry, I should have left out the mumbo jumbo. Find_max_char also exits early when it finds a non-ascii character. |
It's about reusing existing code, IMO it's a good thing. |
Could you please provide microbenchmark results for short and long, ASCII and non-ASCII (with non-ASCII bytes at the begin and at the end) byte strings? |
There is no need for benchmarking. It is literally the same code. This is removing a copy paste with a function call. See below. This is the old duplicated code PyObject*
_Py_bytes_isascii(const char *cptr, Py_ssize_t len)
{
const char *p = cptr;
const char *end = p + len;
while (p < end) {
/* Fast path, see in STRINGLIB(utf8_decode) in stringlib/codecs.h
for an explanation. */
if (_Py_IS_ALIGNED(p, ALIGNOF_SIZE_T)) {
/* Help allocation */
const char *_p = p;
while (_p + SIZEOF_SIZE_T <= end) {
size_t value = *(const size_t *) _p;
if (value & ASCII_CHAR_MASK) {
Py_RETURN_FALSE;
}
_p += SIZEOF_SIZE_T;
}
p = _p;
if (_p == end)
break;
}
if ((unsigned char)*p & 0x80) {
Py_RETURN_FALSE;
}
p++;
}
Py_RETURN_TRUE;
} This is the code that PR is now calling Py_LOCAL_INLINE(Py_UCS4)
STRINGLIB(find_max_char)(const STRINGLIB_CHAR *begin, const STRINGLIB_CHAR *end)
{
const unsigned char *p = (const unsigned char *) begin;
const unsigned char *_end = (const unsigned char *)end;
while (p < _end) {
if (_Py_IS_ALIGNED(p, ALIGNOF_SIZE_T)) {
/* Help register allocation */
const unsigned char *_p = p;
while (_p + SIZEOF_SIZE_T <= _end) {
size_t value = *(const size_t *) _p;
if (value & UCS1_ASCII_CHAR_MASK)
return 255;
_p += SIZEOF_SIZE_T;
}
p = _p;
if (p == _end)
break;
}
if (*p++ & 0x80)
return 255;
}
return 127;
} EDIT: And because it is inlined, the compiler can switch the return 255 statements with Py_RETURN_TRUE statements. It should result in the same assembly. |
Return True if B is empty or all characters in B are ASCII,\n\ | ||
False otherwise."); | ||
|
||
// Optimization is copied from ascii_decode in unicodeobject.c |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than being a copy, an inline function call to virtually the same code is now used.
@serhiy-storchaka Sorry, I didn't mean to be so abrasive. Here are the microbenchmarks: Before:
After
As expected, the results are the same (within run-to-run expected error). |
Thanks for actually checking. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. It is always better to check even if the answer looks obvious. We can miss some details, and the compiler can produce unexpected results.
LGTM.
Merged, thank you. Oops, I merged the PR while @serhiy-storchaka was reviewing it :-) |
I will keep that in mind for next time. Thanks for reviewing, both of you. |
Simple duplicate code removal and reuse of other function.