gh-120397: improve the speed of str.count, bytes.count et al. for single characters by about 2x. #120398

Merged
merged 16 commits into from
Jun 13, 2024
Changes from 8 commits
@@ -0,0 +1,3 @@
Significantly improve the speed of the str.count, bytes.count and
bytearray.count methods when the argument is a single character and the
target architecture and compiler support vectorization.
19 changes: 19 additions & 0 deletions Objects/stringlib/fastsearch.h
@@ -753,6 +753,22 @@ STRINGLIB(count_char)(const STRINGLIB_CHAR *s, Py_ssize_t n,
}


static inline Py_ssize_t
STRINGLIB(count_char_no_maximum)(const STRINGLIB_CHAR *s, Py_ssize_t n,
Member

Maybe make that function private (I don't think it should be exposed except in this module).

Contributor Author

It is private, as it is static. The STRINGLIB macro is there to prevent name clashes. This function will be generated for STRINGLIB_CHAR == Py_UCS1, Py_UCS2 and Py_UCS4. I think keeping it this way is correct, but I may be wrong of course. How do you suggest making it private?

Member

Oh, just by adding an underscore before its name. I should have been clearer when I said "private": I meant it in terms of naming, but I think we don't care about underscores in C files.

Member

No, we don't use an underscore prefix in Python for static functions. Moreover, the macro adds a prefix such as ucs1lib_.
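As an illustration of the name prefixing described in this thread, here is a minimal sketch of how one macro-based template can emit a separate static function per character width. It is not CPython's actual stringlib machinery: `DEFINE_COUNT_CHAR`, `ucs1_t`, and `ucs2_t` are hypothetical names, and the only detail taken from the discussion is the `ucs1lib_` style prefix (`ucs2lib_` is assumed by analogy).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the per-width character types. */
typedef uint8_t  ucs1_t;
typedef uint16_t ucs2_t;

/* Simplified template (an assumption, not CPython's real macro):
   token pasting turns PREFIX and count_char into one identifier,
   and `static` keeps the function private to this translation unit. */
#define DEFINE_COUNT_CHAR(PREFIX, CHAR_T)                        \
    static inline ptrdiff_t                                      \
    PREFIX##count_char(const CHAR_T *s, ptrdiff_t n, CHAR_T ch)  \
    {                                                            \
        ptrdiff_t count = 0;                                     \
        for (ptrdiff_t i = 0; i < n; i++) {                      \
            if (s[i] == ch) {                                    \
                count++;                                         \
            }                                                    \
        }                                                        \
        return count;                                            \
    }

/* Expands to ucs1lib_count_char and ucs2lib_count_char. */
DEFINE_COUNT_CHAR(ucs1lib_, ucs1_t)
DEFINE_COUNT_CHAR(ucs2lib_, ucs2_t)
```

Each expansion yields a distinct static function, so the generated names cannot collide with each other and nothing is visible outside the file that includes the template.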

                                 const STRINGLIB_CHAR p0)
/* By removing the maximum out of the loop, the compiler can optimize using
   vectors */
{
    Py_ssize_t i, count = 0;
    for (i = 0; i < n; i++) {
        if (s[i] == p0) {
            count++;
        }
    }
    return count;
}


Py_LOCAL_INLINE(Py_ssize_t)
FASTSEARCH(const STRINGLIB_CHAR* s, Py_ssize_t n,
const STRINGLIB_CHAR* p, Py_ssize_t m,
@@ -773,6 +789,9 @@ FASTSEARCH(const STRINGLIB_CHAR* s, Py_ssize_t n,
        else if (mode == FAST_RSEARCH)
            return STRINGLIB(rfind_char)(s, n, p[0]);
        else {
            if (maxcount == PY_SSIZE_T_MAX) {
Contributor

Suggested change
- if (maxcount == PY_SSIZE_T_MAX) {
+ if (maxcount >= n) {

Contributor Author

maxcount is only used by the replace function, so it is very unlikely that this condition will ever be triggered.

Contributor

OK, but there's no reason to check for PY_SSIZE_T_MAX specifically when this works as well.

Contributor Author

Yes, you are correct. However, this function needs some refactoring, as the maxcount provision is only there for replace. Replace for single characters is special-cased elsewhere, so maxcount is actually always PY_SSIZE_T_MAX, I think. I want to revisit this at a later point.
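The equivalence argued in this thread can be sketched in standalone C. The bounded loop below is an assumed shape for the existing `count_char` (its body is not shown in this diff), and `count_bounded`, `count_unbounded`, and `count_dispatch` are hypothetical names. Because a count can never exceed `n`, a cap of `maxcount >= n` cannot change the result, so the uncapped loop is safe to use in that case.

```c
#include <stddef.h>

/* Assumed shape of the capped loop: the early return makes the trip
   count depend on the data, which tends to block auto-vectorization. */
static ptrdiff_t
count_bounded(const unsigned char *s, ptrdiff_t n,
              unsigned char ch, ptrdiff_t maxcount)
{
    ptrdiff_t count = 0;
    for (ptrdiff_t i = 0; i < n; i++) {
        if (s[i] == ch) {
            count++;
            if (count == maxcount) {
                return maxcount;
            }
        }
    }
    return count;
}

/* Uncapped loop, mirroring count_char_no_maximum from the diff. */
static ptrdiff_t
count_unbounded(const unsigned char *s, ptrdiff_t n, unsigned char ch)
{
    ptrdiff_t count = 0;
    for (ptrdiff_t i = 0; i < n; i++) {
        if (s[i] == ch) {
            count++;
        }
    }
    return count;
}

/* Dispatch using the reviewer's suggested condition: when the cap is
   at least n it can never cut the count short, so skip the cap. */
static ptrdiff_t
count_dispatch(const unsigned char *s, ptrdiff_t n,
               unsigned char ch, ptrdiff_t maxcount)
{
    if (maxcount >= n) {
        return count_unbounded(s, n, ch);
    }
    return count_bounded(s, n, ch, maxcount);
}
```

With this condition, a caller passing the largest possible cap still takes the fast path, and so does any caller whose cap happens to be at least the buffer length.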

                return STRINGLIB(count_char_no_maximum)(s, n, p[0]);
            }
            return STRINGLIB(count_char)(s, n, p[0], maxcount);
        }
    }