Add caching to bottlenecks in the message store #5605

Merged

Pierre-Sassoulas
Member

Type of Changes

Type: 🔨 Refactoring

Description

This adds caching to bottlenecks in the MessageStore. The next step would be to generate the code of an immutable data structure from the checkers, instead of creating and checking it at runtime, and then to use that structure directly. I think caching is fine for now, especially as we can't do everything this way: plugins can still register message ids and symbols that are not known in advance.
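A minimal sketch of that "next step", assuming every message is known before linting starts (which, as noted above, does not hold for plugins): build one lookup table from every accepted spelling to its msgid once, then freeze it with `types.MappingProxyType`. `build_lookup` and the sample message pairs are hypothetical illustrations, not pylint's API.

```python
from types import MappingProxyType
from typing import Dict, Iterable, Mapping, Tuple


def build_lookup(pairs: Iterable[Tuple[str, str]]) -> Mapping[str, Tuple[str, ...]]:
    """Hypothetical helper: map every accepted spelling to its resolved msgids."""
    table: Dict[str, Tuple[str, ...]] = {}
    for msgid, symbol in pairs:
        resolved = (msgid,)  # tuple, so callers cannot mutate a shared result
        table[msgid] = resolved
        table[msgid.lower()] = resolved  # e.g. "w0611" written in a pragma
        table[symbol] = resolved
    # Read-only view: a lookup is a single dict access, writes raise TypeError.
    return MappingProxyType(table)


LOOKUP = build_lookup([("W0611", "unused-import"), ("C0114", "missing-module-docstring")])

if __name__ == "__main__":
    print(LOOKUP["unused-import"], LOOKUP["w0611"])
```

The trade-off: all normalisation work moves to startup and per-call work drops to one dictionary access, but anything registered afterwards is missed, which is why caching remains the pragmatic option here.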

get_active_msgids was calling .upper() / .isdigit() on every message id it was given, every time it was called; caching it saves a lot of calls (833883 calls before vs. 772 after in the profiles below).
Some functions can't be cached because of the way PyLinter is designed.
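The idea of caching `get_active_msgids` can be sketched as a per-store dictionary memo in front of the normalisation logic. This is a hypothetical, simplified store (`MessageIdStoreSketch`, the stand-in `UnknownMessageError`, and the `misses` counter are illustrations; legacy "old names" handling is omitted), not the PR's actual diff.

```python
from typing import Dict, List, Optional


class UnknownMessageError(Exception):
    """Hypothetical stand-in for pylint's exception of the same name."""


class MessageIdStoreSketch:
    """Hypothetical, simplified msgid/symbol store with a memoized lookup."""

    def __init__(self) -> None:
        self._msgid_to_symbol: Dict[str, str] = {}
        self._symbol_to_msgid: Dict[str, str] = {}
        # Primitive cache: raw user input -> resolved msgids.
        self._active_msgids: Dict[str, List[str]] = {}
        self.misses = 0  # counts how often the slow path runs

    def add_msgid_and_symbol(self, msgid: str, symbol: str) -> None:
        self._msgid_to_symbol[msgid] = symbol
        self._symbol_to_msgid[symbol] = msgid
        # Registering a message can change lookups, so drop cached answers.
        self._active_msgids.clear()

    def get_active_msgids(self, msgid_or_symbol: str) -> List[str]:
        cached = self._active_msgids.get(msgid_or_symbol)
        if cached is not None:
            return cached  # shared object: callers must not mutate it
        self.misses += 1
        msgid: Optional[str]
        symbol: Optional[str]
        # Only a msgid has digits after its first letter.
        if msgid_or_symbol[1:].isdigit():
            msgid = msgid_or_symbol.upper()
            symbol = self._msgid_to_symbol.get(msgid)
        else:
            msgid = self._symbol_to_msgid.get(msgid_or_symbol)
            symbol = msgid_or_symbol
        if not msgid or not symbol:
            # Failures are not cached, so a message registered later still resolves.
            raise UnknownMessageError(f"No such message id or symbol '{msgid_or_symbol}'.")
        result = [msgid]
        self._active_msgids[msgid_or_symbol] = result
        return result
```

After the first lookup of a given spelling, repeated calls skip `.upper()` / `.isdigit()` entirely.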

Tested by linting django (commit 03cadb912c78b769d6bf4a943a2a35fc1d952960) with: python3 -m cProfile -m pylint .pylint_primer_tests/django | grep message
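As an alternative to grep, the standard library's `pstats` can sort and filter profile rows by a regular expression. For a real run one could save stats with cProfile's `-o` flag (for example `python3 -m cProfile -o pylint.prof -m pylint <path>`) and load them with `pstats.Stats("pylint.prof")`; the self-contained sketch below instead profiles a hypothetical stand-in function, `format_message`.

```python
import cProfile
import io
import pstats


def format_message(i: int) -> str:
    """Hypothetical stand-in for the code being profiled (pylint in the PR)."""
    return f"message {i}".upper()


def profile_and_filter(pattern: str) -> str:
    profiler = cProfile.Profile()
    profiler.enable()
    for i in range(1000):
        format_message(i)
    profiler.disable()
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    # Sort by cumulative time, then print only rows whose
    # "file:lineno(function)" entry matches the regular expression.
    stats.sort_stats("cumulative").print_stats(pattern)
    return buf.getvalue()


if __name__ == "__main__":
    print(profile_and_filter("message"))
```

Unlike grep, this keeps the column header and sorts the matching rows by cost.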

Before:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)

  1494493    0.257    0.000    0.257    0.000 <frozen importlib._bootstrap>:222(_verbose_message)
        3    0.000    0.000    0.000    0.000 _enchant.py:120(find_message)
    67884    0.064    0.000    2.472    0.000 base_checker.py:112(add_message)
      678    0.001    0.000    0.003    0.000 base_checker.py:147(create_message_definition_from_tuple)
       86    0.000    0.000    0.004    0.000 base_checker.py:171(messages)
        1    0.000    0.000    0.000    0.000 base_reporter.py:69(display_messages)
    41158    0.014    0.000    0.014    0.000 file_state.py:132(handle_ignored_message)
     2722    0.007    0.000    0.008    0.000 file_state.py:148(iter_spurious_suppression_messages)
    26728    0.021    0.000    0.032    0.000 linterstats.py:288(increase_single_message_count)
    26728    0.020    0.000    0.020    0.000 linterstats.py:292(increase_single_module_message_count)
        1    0.000    0.000    0.000    0.000 linterstats.py:298(reset_message_count)
        1    0.000    0.000    0.000    0.000 message.py:34(Message)
        1    0.000    0.000    0.001    0.001 message.py:5(<module>)
    26728    0.056    0.000    0.080    0.000 message.py:60(__new__)
        1    0.000    0.000    0.000    0.000 message_definition.py:17(MessageDefinition)
      678    0.000    0.000    0.001    0.000 message_definition.py:18(__init__)
        1    0.000    0.000    0.000    0.000 message_definition.py:4(<module>)
      718    0.000    0.000    0.000    0.000 message_definition.py:47(check_msgid)
      339    0.000    0.000    0.000    0.000 message_definition.py:60(may_be_emitted)
    67886    0.083    0.000    0.083    0.000 message_definition.py:94(check_message_definition)
        1    0.000    0.000    0.000    0.000 message_definition_store.py:15(MessageDefinitionStore)
        1    0.000    0.000    0.000    0.000 message_definition_store.py:21(__init__)
        1    0.000    0.000    0.000    0.000 message_definition_store.py:30(messages)
       43    0.000    0.000    0.005    0.000 message_definition_store.py:35(register_messages_from_checker)
        1    0.000    0.000    0.000    0.000 message_definition_store.py:4(<module>)
      339    0.000    0.000    0.001    0.000 message_definition_store.py:41(register_message)
   833883    0.836    0.000    2.848    0.000 message_definition_store.py:49(get_message_definitions)
   833882    0.353    0.000    0.353    0.000 message_definition_store.py:51(<listcomp>)
   833883    1.231    0.000    1.658    0.000 message_id_store.py:104(get_active_msgids)
        1    0.000    0.000    0.000    0.000 message_id_store.py:12(__init__)
        1    0.000    0.000    0.000    0.000 message_id_store.py:3(<module>)
      339    0.000    0.000    0.000    0.000 message_id_store.py:41(register_message_definition)
      339    0.000    0.000    0.000    0.000 message_id_store.py:50(add_msgid_and_symbol)
       20    0.000    0.000    0.000    0.000 message_id_store.py:58(add_legacy_msgid_and_symbol)
      359    0.000    0.000    0.000    0.000 message_id_store.py:71(check_msgid_and_symbol)
        1    0.000    0.000    0.000    0.000 message_id_store.py:8(MessageIdStore)
    41158    0.047    0.000    0.047    0.000 pylinter.py:1373(_get_message_state_scope)
   765813    0.800    0.000    0.942    0.000 pylinter.py:1391(_is_one_message_enabled)
   765805    1.766    0.000    6.404    0.000 pylinter.py:1423(is_message_enabled)
    67886    0.461    0.000    2.050    0.000 pylinter.py:1447(_add_one_message)
    67886    0.111    0.000    2.408    0.000 pylinter.py:1535(add_message)
      162    0.000    0.000    0.001    0.000 pylinter.py:1596(_message_symbol)
    32/12    0.000    0.000    0.000    0.000 pylinter.py:1621(_get_messages_to_set)
        1    0.000    0.000    0.000    0.000 pylinter.py:768(enable_fail_on_messages)
    38928    0.047    0.000    0.065    0.000 redefined_variable_type.py:65(_check_and_add_messages)
    27697    0.039    0.000    0.043    0.000 refactoring_checker.py:1091(_emit_nested_blocks_message_if_needed)
    26728    0.076    0.000    0.301    0.000 text.py:213(write_message)
    26728    0.050    0.000    0.359    0.000 text.py:221(handle_message)
      153    0.000    0.000    0.000    0.000 utils.py:488(check_messages)
      153    0.000    0.000    0.000    0.000 utils.py:491(store_messages)
        1    0.000    0.000    0.000    0.000 utils.py:63(get_fatal_error_message)

After:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)

  1494493    0.310    0.000    0.310    0.000 <frozen importlib._bootstrap>:222(_verbose_message)
        3    0.000    0.000    0.000    0.000 _enchant.py:120(find_message)
    67884    0.062    0.000    1.959    0.000 base_checker.py:112(add_message)
      678    0.001    0.000    0.003    0.000 base_checker.py:147(create_message_definition_from_tuple)
       86    0.000    0.000    0.004    0.000 base_checker.py:171(messages)
        1    0.000    0.000    0.000    0.000 base_reporter.py:69(display_messages)
    41158    0.017    0.000    0.017    0.000 file_state.py:132(handle_ignored_message)
     2722    0.006    0.000    0.007    0.000 file_state.py:148(iter_spurious_suppression_messages)
    26728    0.021    0.000    0.032    0.000 linterstats.py:288(increase_single_message_count)
    26728    0.021    0.000    0.021    0.000 linterstats.py:292(increase_single_module_message_count)
        1    0.000    0.000    0.000    0.000 linterstats.py:298(reset_message_count)
        1    0.000    0.000    0.000    0.000 message.py:34(Message)
        1    0.000    0.000    0.001    0.001 message.py:5(<module>)
    26728    0.054    0.000    0.078    0.000 message.py:60(__new__)
        1    0.000    0.000    0.000    0.000 message_definition.py:17(MessageDefinition)
      678    0.000    0.000    0.001    0.000 message_definition.py:18(__init__)
        1    0.000    0.000    0.000    0.000 message_definition.py:4(<module>)
      718    0.001    0.000    0.001    0.000 message_definition.py:47(check_msgid)
      339    0.000    0.000    0.000    0.000 message_definition.py:60(may_be_emitted)
    67886    0.096    0.000    0.096    0.000 message_definition.py:94(check_message_definition)
        1    0.000    0.000    0.000    0.000 message_definition_store.py:16(MessageDefinitionStore)
        1    0.000    0.000    0.000    0.000 message_definition_store.py:22(__init__)
        1    0.000    0.000    0.000    0.000 message_definition_store.py:31(messages)
       43    0.000    0.000    0.005    0.000 message_definition_store.py:36(register_messages_from_checker)
        1    0.000    0.000    0.000    0.000 message_definition_store.py:4(<module>)
      339    0.000    0.000    0.001    0.000 message_definition_store.py:42(register_message)
      175    0.000    0.000    0.001    0.000 message_definition_store.py:50(get_message_definitions)
      175    0.000    0.000    0.000    0.000 message_definition_store.py:53(<listcomp>)
      772    0.001    0.000    0.002    0.000 message_id_store.py:105(get_active_msgids)
        1    0.000    0.000    0.000    0.000 message_id_store.py:13(__init__)
        1    0.000    0.000    0.000    0.000 message_id_store.py:3(<module>)
      339    0.000    0.000    0.000    0.000 message_id_store.py:42(register_message_definition)
      339    0.000    0.000    0.000    0.000 message_id_store.py:51(add_msgid_and_symbol)
       20    0.000    0.000    0.000    0.000 message_id_store.py:59(add_legacy_msgid_and_symbol)
      359    0.000    0.000    0.000    0.000 message_id_store.py:72(check_msgid_and_symbol)
        1    0.000    0.000    0.000    0.000 message_id_store.py:9(MessageIdStore)
    41158    0.046    0.000    0.046    0.000 pylinter.py:1373(_get_message_state_scope)
   765813    0.817    0.000    0.996    0.000 pylinter.py:1391(_is_one_message_enabled)
   765805    1.480    0.000    3.386    0.000 pylinter.py:1426(is_message_enabled)
    67886    0.460    0.000    1.768    0.000 pylinter.py:1454(_add_one_message)
    67886    0.128    0.000    1.897    0.000 pylinter.py:1542(add_message)
      162    0.000    0.000    0.000    0.000 pylinter.py:1603(_message_symbol)
    32/12    0.000    0.000    0.000    0.000 pylinter.py:1628(_get_messages_to_set)
        1    0.000    0.000    0.000    0.000 pylinter.py:768(enable_fail_on_messages)
    38928    0.045    0.000    0.062    0.000 redefined_variable_type.py:65(_check_and_add_messages)
    27697    0.028    0.000    0.032    0.000 refactoring_checker.py:1091(_emit_nested_blocks_message_if_needed)
    26728    0.069    0.000    0.281    0.000 text.py:213(write_message)
    26728    0.046    0.000    0.334    0.000 text.py:221(handle_message)
      153    0.000    0.000    0.000    0.000 utils.py:488(check_messages)
      153    0.000    0.000    0.000    0.000 utils.py:491(store_messages)
        1    0.000    0.000    0.000    0.000 utils.py:63(get_fatal_error_message)
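The two tables can be compared by parsing a few rows copied verbatim from them. This sketch keys rows by function name because line numbers shift between runs (e.g. pylinter.py:1423 vs 1426); the row subset, `parse`, and `cumtime_reduction` are illustrative choices.

```python
import re
from typing import Dict, Tuple

# Rows copied from the "Before" and "After" tables above (subset).
BEFORE = """\
   833883    0.836    0.000    2.848    0.000 message_definition_store.py:49(get_message_definitions)
   833883    1.231    0.000    1.658    0.000 message_id_store.py:104(get_active_msgids)
   765805    1.766    0.000    6.404    0.000 pylinter.py:1423(is_message_enabled)
    67886    0.111    0.000    2.408    0.000 pylinter.py:1535(add_message)
"""
AFTER = """\
      175    0.000    0.000    0.001    0.000 message_definition_store.py:50(get_message_definitions)
      772    0.001    0.000    0.002    0.000 message_id_store.py:105(get_active_msgids)
   765805    1.480    0.000    3.386    0.000 pylinter.py:1426(is_message_enabled)
    67886    0.128    0.000    1.897    0.000 pylinter.py:1542(add_message)
"""

# ncalls may look like "32/12" for recursive calls; keep the first number.
ROW_RE = re.compile(
    r"^\s*(\d+)(?:/\d+)?\s+[\d.]+\s+[\d.]+\s+([\d.]+)\s+[\d.]+\s+\S+\((\w+)\)\s*$"
)


def parse(table: str) -> Dict[str, Tuple[int, float]]:
    """Map function name -> (ncalls, cumtime) for each recognised row."""
    rows: Dict[str, Tuple[int, float]] = {}
    for line in table.splitlines():
        match = ROW_RE.match(line)
        if match:
            ncalls, cumtime, func = match.groups()
            rows[func] = (int(ncalls), float(cumtime))
    return rows


def cumtime_reduction(before: str, after: str) -> Dict[str, float]:
    """Fraction of cumulative time removed per function (0.5 means halved)."""
    old, new = parse(before), parse(after)
    return {func: 1 - new[func][1] / old[func][1] for func in old if func in new}


if __name__ == "__main__":
    for func, saved in cumtime_reduction(BEFORE, AFTER).items():
        print(f"{func:25} {saved:.0%} less cumulative time")
```

On these rows, is_message_enabled loses about 47% of its cumulative time and PyLinter's add_message about 21%. These are single cProfile runs, so treat them as rough: `_verbose_message` in importlib, which this change does not touch, moved from 0.257 s to 0.310 s.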

Follow up to #4814

Some functions can't be cached without impacting correctness
with the current design.
@coveralls

coveralls commented Dec 28, 2021

Pull Request Test Coverage Report for Build 1633545280

  • 6 of 6 (100.0%) changed or added relevant lines in 2 files are covered.
  • 3 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.0004%) to 93.718%

Files with Coverage Reduction | New Missed Lines | %
pylint/lint/pylinter.py | 3 | 94.29%
Totals Coverage Status
Change from base Build 1630989169: 0.0004%
Covered Lines: 14336
Relevant Lines: 15297

💛 - Coveralls

Collaborator

@DanielNoord DanielNoord left a comment

Nice catch!

Co-authored-by: Daniël van Noord <[email protected]>
@DanielNoord
Collaborator

@Pierre-Sassoulas Should we add a limit to the cache size?

@Pierre-Sassoulas
Member Author

Should we add a limit to the cache size?

Sorry, I thought I had added my reasoning on that to the description, but evidently I did not.

Regarding the msgid/symbol correspondence (message_id_store.py:105(get_active_msgids)) or message_definition_store.py:49(get_message_definitions) (the text describing each message): these caches will never grow very large, because there is a fixed number of messages, coming from our own code and from users' checkers. How many messages can a user define? I guess it will never be more than 1000, and if each message has around 1000 characters, that's still only about 1 MB to cache. We could even generate the data structure itself and check it during generation, but I don't think this is really the bottleneck in pylint's performance.
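The size argument can be illustrated with `functools.lru_cache` and its `cache_info()` counters. This sketch uses a hypothetical three-message catalogue (`MESSAGES`, `symbol_for`, `symbol_for_bounded` are illustrations, not pylint code): an unbounded cache can hold at most one entry per known msgid, while a bound smaller than the working set can thrash when messages are looked up in a repeating cycle.

```python
import functools
from typing import Dict

# Hypothetical, fixed catalogue standing in for the registered messages.
MESSAGES: Dict[str, str] = {
    "W0611": "unused-import",
    "C0114": "missing-module-docstring",
    "E1101": "no-member",
}


@functools.lru_cache(maxsize=None)  # unbounded, but keys are limited to known msgids
def symbol_for(msgid: str) -> str:
    return MESSAGES[msgid]  # unknown ids raise KeyError, which is not cached


@functools.lru_cache(maxsize=2)  # bounded: evicts the least recently used entry
def symbol_for_bounded(msgid: str) -> str:
    return MESSAGES[msgid]


# Cycle through all three messages many times, as a linter would.
for _ in range(1000):
    for key in MESSAGES:
        symbol_for(key)
        symbol_for_bounded(key)

if __name__ == "__main__":
    print(symbol_for.cache_info())
    print(symbol_for_bounded.cache_info())
```

The unbounded cache ends with 3 entries and 2997 hits; the bounded one, cycling through three keys with room for two, evicts each entry just before it is needed again and gets 0 hits out of 3000 calls. A bound only pays off when it is larger than the set of keys actually in use.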

Previously, I was also caching pylinter.py:1423(is_message_enabled); that one would need a limit, since it stores one entry per msgid per line per file (which could be a lot, depending on the code base). But it would require a prior refactor to be applied.
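Caching `is_message_enabled` raises two issues: the number of (msgid, line) keys grows with the code base, so a bound is needed, and every cached answer becomes stale when enable/disable state changes. A hypothetical sketch of both (`MessageStateSketch` tracks far simpler state than PyLinter's per-file state and is not its API):

```python
import functools
from typing import Optional, Set, Tuple


class MessageStateSketch:
    """Hypothetical stand-in for the enable/disable state a linter tracks."""

    def __init__(self, maxsize: int = 4096) -> None:
        self._disabled: Set[str] = set()
        self._disabled_lines: Set[Tuple[str, int]] = set()
        # Per-instance LRU cache, bounded by maxsize. A class-level lru_cache on
        # the method would key on self and keep every instance alive.
        self.is_message_enabled = functools.lru_cache(maxsize=maxsize)(
            self._is_message_enabled_uncached
        )

    def _is_message_enabled_uncached(self, msgid: str, line: Optional[int] = None) -> bool:
        if msgid in self._disabled:
            return False
        return line is None or (msgid, line) not in self._disabled_lines

    def disable(self, msgid: str, line: Optional[int] = None) -> None:
        if line is None:
            self._disabled.add(msgid)
        else:
            self._disabled_lines.add((msgid, line))
        # Any cached answer may now be wrong: this is the correctness risk.
        self.is_message_enabled.cache_clear()
```

Without the `cache_clear()` call, a result computed before a `# pylint: disable` pragma was processed would keep being returned afterwards.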

@DanielNoord
Collaborator

I think we should add either such a comment or a limit to it, to avoid future confusion.

Co-authored-by: Daniël van Noord <[email protected]>
DanielNoord previously approved these changes Dec 29, 2021
@DanielNoord DanielNoord dismissed their stale review December 29, 2021 09:31

Thought of something

        msgid: Optional[str]
        symbol: Optional[str]
        if msgid_or_symbol[1:].isdigit():
            # Only msgid can have a digit as second letter
            msgid = msgid_or_symbol.upper()
Collaborator

Suggested change
-            msgid = msgid_or_symbol.upper()
+            msgid: Optional[str] = msgid_or_symbol.upper()

I don't know why I didn't think of this sooner, but this should fix it (or the same change applied on L118). We don't need an extra line; we just need to annotate the assignment that would otherwise make mypy infer str instead of Optional[str].

Member Author

Well, I did that at first, but then self.__msgid_to_symbol.get(msgid) expects a str, not an Optional[str].

Collaborator

Why did we remove the msgid = None line?

    def get_active_msgids(self, msgid_or_symbol: str) -> List[str]:
        """Return msgids but the input can be a symbol."""
        # Only msgid can have a digit as second letter
        msgid = None
        if msgid_or_symbol[1:].isdigit():
            msgid = msgid_or_symbol.upper()
            symbol = self.__msgid_to_symbol.get(msgid)
        else:
            msgid = self.__symbol_to_msgid.get(msgid_or_symbol)
            symbol = msgid_or_symbol
        if msgid is None or symbol is None or not msgid or not symbol:
            error_msg = f"No such message id or symbol '{msgid_or_symbol}'."
            raise UnknownMessageError(error_msg)
        return self.__old_names.get(msgid, [msgid])

This gives no errors for me with mypy --strict. I think mypy infers that msgid is Optional[str].

Member Author

I suppose the type hint is only for the type checker, whereas the assignment is actually executed at runtime and impacts performance.
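The point about bare annotations can be checked directly. Per PEP 526, an annotation on a local variable without a value (`msgid: Optional[str]`) is not evaluated and binds nothing, whereas `msgid = None` is an assignment executed on every call. The sketch below assumes hypothetical module-level tables (`MSGID_TO_SYMBOL`, `SYMBOL_TO_MSGID`) in place of the store's private dicts and raises `KeyError` instead of pylint's `UnknownMessageError`.

```python
from typing import Dict, List, Optional

# Hypothetical stand-ins for the store's private lookup tables.
MSGID_TO_SYMBOL: Dict[str, str] = {"W0611": "unused-import"}
SYMBOL_TO_MSGID: Dict[str, str] = {"unused-import": "W0611"}


def get_active_msgids(msgid_or_symbol: str) -> List[str]:
    # Bare annotations: read by type checkers only, nothing runs here.
    msgid: Optional[str]
    symbol: Optional[str]
    if msgid_or_symbol[1:].isdigit():
        # Only a msgid has digits after its first letter.
        msgid = msgid_or_symbol.upper()
        symbol = MSGID_TO_SYMBOL.get(msgid)
    else:
        msgid = SYMBOL_TO_MSGID.get(msgid_or_symbol)
        symbol = msgid_or_symbol
    if not msgid or not symbol:
        raise KeyError(f"No such message id or symbol '{msgid_or_symbol}'.")
    return [msgid]


def annotation_binds_name() -> bool:
    x: int  # declaration only: no value, so the name stays unbound
    return "x" in locals()
```

`annotation_binds_name()` returns `False`, confirming that the bare annotation leaves nothing to execute at runtime.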

Collaborator

That's true!

Labels: Enhancement ✨ (Improvement to a component), performance