Optimize class creation #132042

JelleZijlstra · 2025-04-03T05:00:38Z

Currently, creating an empty class is about 70x slower than creating an empty function in my profiling. Classes are much more complex and it makes sense that they're slower to create, but 70x feels excessive. (Related: #118761.)

I ran some profiling on my Mac with a sample script that just made empty classes in a loop:
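The script itself is not reproduced in this thread. A minimal sketch of a comparable microbenchmark, assuming the stdlib `timeit` module and the hypothetical helper names `make_class`, `make_function`, and `measure`, might look like the following. Absolute timings and the resulting ratio will vary by machine and Python version.

```python
import timeit


def make_class():
    # Executes a class statement: builds a namespace, then calls the metaclass.
    class C:
        pass
    return C


def make_function():
    # Executes a def statement: wraps an existing code object in a function.
    def f():
        pass
    return f


def measure(func, number=20_000):
    # Best of five repeats, reported as seconds per call.
    return min(timeit.repeat(func, number=number, repeat=5)) / number


if __name__ == "__main__":
    class_time = measure(make_class)
    func_time = measure(make_function)
    print(f"class:    {class_time * 1e9:8.0f} ns")
    print(f"function: {func_time * 1e9:8.0f} ns")
    print(f"ratio:    {class_time / func_time:8.1f}x")
```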

A few things stood out:

A lot of time is spent updating slot definitions, i.e. filling in all of the tp_*, nb_*, etc. functions in the C struct for the type. We do this by iterating over all the slots, then looking up the function name (e.g., __add__) in the MRO and placing it in the slot for this class.
Significant time is spent in resolve_slotdups, which has a comment "XXX Maybe this could be optimized more -- but is it worth it?". Sounds promising. It handles cases where one name maps to multiple slots (e.g. __add__ is both nb_add and sq_concat) by iterating over all the slotdefs and finding other slots with the same name. It uses some scratch space in the interpreter state for this, which does not seem thread-safe. I think we could precompute this data instead of working it out at runtime. For example, the slotdef struct could grow a new member indicating whether the name is unique.
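The precomputation idea can be illustrated with a small Python model. The `SlotDef` class and `SLOTDEFS` table below are hypothetical, heavily abbreviated stand-ins for the C slotdefs array, and the `unique` field plays the role of the proposed new struct member; none of these names come from CPython itself.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SlotDef:
    name: str     # dunder name looked up in the MRO, e.g. "__add__"
    slot: str     # C-level slot the name maps to, e.g. "nb_add"
    unique: bool  # hypothetical precomputed flag: name maps to one slot only


# Abbreviated, illustrative (name, slot) pairs; the real table is much longer.
_RAW = [
    ("__add__", "nb_add"),
    ("__add__", "sq_concat"),
    ("__mul__", "nb_multiply"),
    ("__mul__", "sq_repeat"),
    ("__iter__", "tp_iter"),
    ("__hash__", "tp_hash"),
]

# Count each name once, up front, instead of rescanning the table per lookup.
_COUNTS = Counter(name for name, _ in _RAW)
SLOTDEFS = [SlotDef(name, slot, _COUNTS[name] == 1) for name, slot in _RAW]

for sd in SLOTDEFS:
    print(f"{sd.name:10} {sd.slot:12} unique={sd.unique}")
```

With a flag like this, only the entries where `unique` is false would need any duplicate handling at class creation time.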

Most types will define very few of these slots, so it makes sense to try to look for an approach that does less work for slots without changes. I think something like this should work:

First fill in the slots table with all the slots from the first base class.
Then collect all slots that may need changes: either slots that have a non-NULL value in the second or a later base, or slots whose name appears in the new class's __dict__. Perform an update for those slots only.
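The two steps above can be sketched as a toy Python model. Everything here is an assumption made for illustration: slot tables are plain dicts, implementations are strings, `None` stands for NULL, the name-to-slot mapping is one-to-one, and the bases are treated as already being in lookup order. It is not how typeobject.c is structured.

```python
# Hypothetical one-to-one mapping from dunder names to slots (toy subset).
SLOT_FOR_NAME = {
    "__add__": "nb_add",
    "__iter__": "tp_iter",
    "__hash__": "tp_hash",
    "__repr__": "tp_repr",
}
NAME_FOR_SLOT = {slot: name for name, slot in SLOT_FOR_NAME.items()}


def resolve(slot, bases, namespace):
    # Stand-in for the expensive per-slot update: the new class's own
    # namespace wins, then the bases are searched in order.
    name = NAME_FOR_SLOT[slot]
    if name in namespace:
        return namespace[name]
    for base in bases:
        if base.get(slot) is not None:
            return base[slot]
    return None


def build_slots(bases, namespace):
    # Step 1: start from a copy of the first base's slot table.
    slots = dict(bases[0])
    # Step 2: collect only the slots that may differ from the first base.
    dirty = set()
    for base in bases[1:]:
        dirty.update(slot for slot, impl in base.items() if impl is not None)
    dirty.update(SLOT_FOR_NAME[n] for n in namespace if n in SLOT_FOR_NAME)
    # Run the per-slot update for those slots only.
    for slot in dirty:
        slots[slot] = resolve(slot, bases, namespace)
    return slots, dirty


first = {"nb_add": None, "tp_iter": None,
         "tp_hash": "object_hash", "tp_repr": "object_repr"}
second = {"nb_add": None, "tp_iter": "mixin_iter",
          "tp_hash": None, "tp_repr": None}
slots, dirty = build_slots([first, second], {"__repr__": "C_repr", "x": 1})
print(sorted(dirty))  # ['tp_iter', 'tp_repr']
print(slots)
```

In this example only two of the four slots are touched after the initial copy; the other two are inherited from the first base without any lookup.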

This should make it possible to make class creation something like 2x faster. I haven't started working on implementing this and I may not have time to do it; if you see this and are interested, feel free to pick it up!

Linked PRs

gh-132042: Try to optimize class creation #132156


sergey-miryanov · 2025-04-03T06:51:08Z

I would like to try this if @AA-Turner doesn't pick it up :)

markshannon · 2025-04-03T09:20:00Z

I think the whole concept of slots (the tp_... slots, not __slots__ or PyType_Slot) as an optimization is the root of the problem. They slow down class creation and don't help performance, because they complicate the real optimizations that we perform.

We should view the tp_slots as doing two distinct things:

Specifying the behavior of operations when the struct _typeobject is passed to PyType_Ready
A backwards-compatible way for C extensions to access operations, e.g. iter(i) as Py_TYPE(i)->tp_iter

For pure Python objects, all slots can be filled in with a function that does the dynamic lookup, which should be very quick.
Once resolved, we can overwrite the slot with a more direct version.

For classes defined by struct _typeobject we can just replace the NULLs with the dynamic lookup function.
For classes defined from PyType_Spec we fill in the defined slots and then replace the NULLs with the dynamic lookup function.
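The "fill with a dynamic lookup function, then overwrite once resolved" idea can be modelled in a few lines of Python. This is only a sketch: `SlotTable` and its methods are hypothetical names, the table is a dict rather than a C struct, and invalidation when a class is later modified is deliberately left out.

```python
class SlotTable:
    """Toy stand-in for the tp_* slots of one type (not CPython's layout)."""

    def __init__(self, cls, slot_names):
        self.cls = cls
        self.resolved = 0  # counts how many MRO lookups were performed
        # Creation is cheap: every slot gets the same kind of lazy stub.
        self.slots = {name: self._make_dynamic(name) for name in slot_names}

    def _make_dynamic(self, name):
        def dynamic_lookup(obj, *args):
            # First use: search the MRO for the method.
            self.resolved += 1
            for klass in self.cls.__mro__:
                if name in vars(klass):
                    direct = vars(klass)[name]
                    break
            else:
                raise TypeError(f"{self.cls.__name__} does not support {name}")
            # Then overwrite the slot with the more direct version.
            self.slots[name] = direct
            return direct(obj, *args)
        return dynamic_lookup

    def call(self, name, obj, *args):
        return self.slots[name](obj, *args)


class Vec:
    def __init__(self, x):
        self.x = x

    def __add__(self, other):
        return Vec(self.x + other.x)


table = SlotTable(Vec, ["__add__", "__iter__", "__len__"])
a, b = Vec(1), Vec(2)
print(table.call("__add__", a, b).x)  # 3
print(table.call("__add__", a, b).x)  # 3
print(table.resolved)                 # 1
```

The second call goes straight to `Vec.__add__`, and the slots that are never used are never resolved at all.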

Also, see faster-cpython/ideas#146 (comment)

markshannon · 2025-04-03T09:40:05Z

The bytecode for creating classes is also a bit of a mess. We seem to be creating code objects, just to create functions just to call them, to do things that could easily be done inline.
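To see the intermediate code object being described, one can compile a class statement and inspect the result with the stdlib `dis` and `types` modules. This is only an inspection sketch; opcode names differ between CPython versions, so no disassembly output is shown.

```python
import dis
import types

source = "class C:\n    pass\n"
module_code = compile(source, "<example>", "exec")

# The class body is compiled to its own code object, stored as a constant
# of the enclosing module code.
body_codes = [c for c in module_code.co_consts if isinstance(c, types.CodeType)]
print([c.co_name for c in body_codes])  # ['C']

# Disassemble the module-level code that wraps the body in a function and
# passes it to the class-building machinery.
dis.dis(module_code)
```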

There is also a fair bit of machinery for finding the metaclass and the base class tuple. We should compute those in the interpreter and pass them into the class creation machinery.

E.g., given a BUILD_CLASS instruction that expects name, meta, bases, dict, for class C: ... we get:

    LOAD_CONST "C"
    LOAD_CONST object
    LOAD_CONST ()
    // create method dictionary
    BUILD_CLASS

For class C(D): ... we get:

    LOAD_CONST "C"
    LOAD_NAME "D"
    COPY 1
    LOAD_ATTR __class__ # Get the metaclass
    SWAP 2
    BUILD_TUPLE 1
    // create method dictionary
    COPY 2
    LOAD_ATTR "__prepare__"
    SWAP 2 
    CALL 1   # meta.__prepare__(method_dict)

For multiple inheritance, LOAD_ATTR __class__ becomes CALL_INTRINSIC 2 calculate_metaclass, and if the metaclass is explicit (metaclass=expr), then COPY 1; LOAD_ATTR __class__ becomes just expr.

I'm missing the code for setting __orig_bases__ and __class__, but I think those come after BUILD_CLASS.
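For reference, the steps this comment proposes moving into bytecode (resolving the bases, choosing the metaclass, calling `__prepare__`, and finally calling the metaclass) can be written out in Python with the stdlib `types` module. The function name `build_class` is hypothetical; the body mirrors what `types.new_class` does and is not the proposed BUILD_CLASS instruction itself.

```python
import types


def build_class(name, bases, body, **kwds):
    # Replace non-type bases via __mro_entries__ (PEP 560).
    resolved = types.resolve_bases(bases)
    # Pick the most derived metaclass and call its __prepare__.
    meta, namespace, kwds = types.prepare_class(name, resolved, kwds)
    # Run the class body against the prepared namespace.
    body(namespace)
    if resolved is not bases:
        namespace["__orig_bases__"] = bases
    # Finally call the metaclass to create the class object.
    return meta(name, resolved, namespace, **kwds)


class Meta(type):
    pass


class D(metaclass=Meta):
    pass


C = build_class("C", (D,), lambda ns: ns.update(answer=42))
print(type(C).__name__, C.__mro__[1].__name__, C.answer)  # Meta D 42
```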

vstinner · 2025-04-03T09:41:56Z

A lot of time is spent updating slot definitions, i.e. filling in all of the tp_*, nb_*, etc. functions in the C struct for the type. We do this by iterating over all the slots, then looking up the function name (e.g., __add__) in the MRO and placing it in the slot for this class.

Previous attempt in 2017: #76527

sergey-miryanov · 2025-04-09T18:54:27Z

I have added some test results in #132156 (comment).
I want to dig deeper, but if you are interested in some numbers, please take a look. And maybe stop me from further research.

sergey-miryanov · 2025-04-09T20:58:37Z

Ok, I got rid of resolve_slotdups and believe it is ready for review. Please take a look.

JelleZijlstra added interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage labels Apr 3, 2025

JelleZijlstra mentioned this issue Apr 3, 2025

Improve import time of various stdlib modules #118761

Open

picnixz added the type-feature A feature request or enhancement label Apr 4, 2025

bedevere-app bot mentioned this issue Apr 6, 2025

gh-132042: Try to optimize class creation #132156

Open
