gh-100792: Make `email.message.Message.__contains__` twice as fast #100793

sobolevn · 2023-01-06T11:06:10Z

See my micro-benchmarks in the original issue.

Issue: Make email.message.Message.__contains__ faster #100792

hauntsaninja · 2023-01-07T06:52:04Z

If we're doing micro-optimizations, I think `any` is sometimes faster than the Python loop (and arguably more readable). Would you mind benchmarking that as well?

sobolevn · 2023-01-07T07:37:16Z

No, in this case it is slower:

    def __contains__(self, name):
        name_lower = name.lower()
        return any(name_lower == k.lower() for k, v in self._headers)

Results:

» pyperf timeit --setup 'import email; m = email.message_from_file(open("Lib/test/test_email/data/msg_01.txt"))' '"from" in m'
.....................
Mean +- std dev: 1.81 us +- 0.04 us

» pyperf timeit --setup 'import email; m = email.message_from_file(open("Lib/test/test_email/data/msg_01.txt"))' '"missing" in m'
.....................
Mean +- std dev: 1.76 us +- 0.13 us
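
For readers who want to reproduce this kind of comparison without a CPython checkout, here is a minimal sketch. It assumes a small hand-written message instead of `msg_01.txt`, uses standalone functions rather than methods, and relies on the private `_headers` attribute. `contains_listcomp` is a list-building baseline included purely for comparison (an assumption, not quoted from this thread), and `contains_loop` is an explicit loop in the `for k, v` style discussed here. Absolute timings will vary by machine and interpreter version, so none are shown.

```python
import email
import timeit

# Hypothetical sample message; the thread's benchmarks used msg_01.txt instead.
RAW = "From: a@example.com\nTo: b@example.com\nSubject: hi\n\nbody\n"
msg = email.message_from_string(RAW)
headers = msg._headers  # private attribute: a list of (name, value) tuples


def contains_listcomp(headers, name):
    # Assumed baseline: builds the full list of lowercased names first.
    return name.lower() in [k.lower() for k, v in headers]


def contains_any(headers, name):
    # The any() variant quoted above, as a free function.
    name_lower = name.lower()
    return any(name_lower == k.lower() for k, v in headers)


def contains_loop(headers, name):
    # Explicit loop that returns as soon as a match is found.
    name_lower = name.lower()
    for k, v in headers:
        if name_lower == k.lower():
            return True
    return False


VARIANTS = (contains_listcomp, contains_any, contains_loop)

if __name__ == "__main__":
    for fn in VARIANTS:
        for name in ("from", "missing"):
            elapsed = timeit.timeit(lambda: fn(headers, name), number=100_000)
            print(f"{fn.__name__:18} {name!r:10} {elapsed:.3f}s")
```

All three variants should agree on every lookup; only their speed differs. `pyperf`, as used above, gives more stable numbers than plain `timeit`.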

eendebakpt · 2023-01-07T07:43:55Z

The code with `any` looks cleaner to me. The reduced performance seems related to #100762.

sobolevn · 2023-01-07T07:57:38Z

@eendebakpt we don't want to make existing code slower just for the sake of style, do we? :)

I agree that `any` is great, but in this module the `for k, v` loop is used in many places:

cpython/Lib/email/message.py, lines 430 to 431 in 26ff436:

    for k, v in self._headers:
        if k.lower() == lname:

cpython/Lib/email/message.py, lines 445 to 446 in 26ff436:

    for k, v in self._headers:
        if k.lower() != name:

cpython/Lib/email/message.py, lines 495 to 497 in 26ff436:

    name = name.lower()
    for k, v in self._headers:
        if k.lower() == name:

etc.

So, I think we can call this pattern native to this module :)

hauntsaninja

Thanks for checking! (And I think I was just wrong about `any` sometimes being faster than the loop; I don't see why it would be.)

Anyway, this looks fine to me! cc @JelleZijlstra

One quick note (you're probably aware of this, but in case other potential contributors are reading): CPython is often quite hesitant to accept micro-optimisations and I think this PR comes close to that line. For example, if counterfactually the existing code used `any` and you changed it to a loop for the same speedup, I think that change would be rejected (without stronger evidence that this is something worth optimising).

(Why is CPython conservative here? First, reviewing changes itself costs maintainer time. Code churn risks bugs, obscures history, and invites more churn. Often micro-optimisations are not robust in the face of differing Python implementations or changes in the interpreter; we should avoid local minima. Such changes often affect readability, but readability is subjective, and this can lead to debate that further eats at maintainer time or leaves contributors feeling unwelcome)

AlexWaygood

FWIW I'd also prefer the cleaner, more idiomatic code using any() -- the precise performance characteristics of any vs a for loop feel like they're subject to change in the future, and I disagree that style decisions made a decade and a half ago should determine the style of new additions to the code base.

But, this is precisely the kind of bikeshedding that @hauntsaninja was talking about. So, I don't want to block the PR based on my style preferences -- it is indeed a nice optimisation :)

Misc/NEWS.d/next/Library/2023-01-06-14-05-15.gh-issue-100792.CEOJth.rst

JelleZijlstra · 2023-01-07T17:18:23Z

No strong opinion here but @hauntsaninja feel free to merge based on your best judgment.

Also, in the future no need to ping me on all PRs any more, though of course you can if you want another opinion.

AlexWaygood · 2023-01-07T18:40:05Z

See also @pochmann's comments on the issue: #100792 (comment)

hauntsaninja · 2023-01-07T21:25:59Z

Thanks for caring and for making Python faster!

pythongh-100792: Make email.message.Message.__contains__ twice as fast

fe61248

sobolevn requested a review from a team as a code owner January 6, 2023 11:06

bedevere-bot added the awaiting review label Jan 6, 2023

bedevere-bot mentioned this pull request Jan 6, 2023

Make email.message.Message.__contains__ faster #100792

Closed

sobolevn added the performance Performance or resource usage label Jan 6, 2023

sobolevn requested a review from hauntsaninja January 7, 2023 06:36

hauntsaninja approved these changes Jan 7, 2023

bedevere-bot added awaiting merge and removed awaiting review labels Jan 7, 2023

AlexWaygood reviewed Jan 7, 2023

Update Misc/NEWS.d/next/Library/2023-01-06-14-05-15.gh-issue-100792.C…

d2d0c76

…EOJth.rst Co-authored-by: Alex Waygood <[email protected]>

hauntsaninja merged commit 6746135 into python:main Jan 7, 2023

bedevere-bot removed the awaiting merge label Jan 7, 2023
