
gh-118761: Improve import time of dataclasses #129925


Closed
donBarbos wants to merge 1 commit

Conversation

@donBarbos (Contributor) commented Feb 9, 2025

Another attempt to improve the import time of stdlib modules.
Importing dataclasses takes a long time and it is imported by many other modules, so speeding it up helps broadly.

I use lazy importing for the 4 most expensive imported modules (re, copy, inspect, annotationlib); they are also rarely used (only once or twice each). A simplified sketch of the pattern is shown below.
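
For illustration, here is a minimal sketch of the deferral pattern (the helper name, signature and regex are simplified; this is not the actual diff):

def _is_type(annotation: str) -> bool:
    # Deferred import: `re` is loaded the first time a string annotation
    # is inspected instead of when `import dataclasses` runs; subsequent
    # calls are served from sys.modules.
    import re
    return re.match(r"\s*[\w.]+", annotation) is not None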

CPython configure flags:

./configure --enable-optimizations --with-lto --enable-loadable-sqlite-extensions

Benchmarks:

Running: pipx install tuna && ./python -X importtime -c 'import dataclasses' 2> import.log && tuna import.log

Total import time: 0.022s -> 0.008s (about 2.75x as fast)

[Screenshots: tuna import-time profile, main branch vs PR branch]

dataclasses import time: 0.015s -> 0.001s (about 15x as fast)

[Screenshots: tuna import-time profile of dataclasses itself, main branch vs PR branch]

hyperfine: 24.5 ms -> 10.2 ms (about 2.4x as fast)

Main branch:

$ hyperfine --warmup 11 --runs 3000 "./python -c 'import dataclasses'"
Benchmark 1: ./python -c 'import dataclasses'
  Time (mean ± σ):      24.5 ms ±   1.2 ms    [User: 21.2 ms, System: 3.3 ms]
  Range (min … max):    22.9 ms …  38.1 ms    3000 runs

PR branch:

$ hyperfine --warmup 11 --runs 3000 "./python -c 'import dataclasses'"
Benchmark 1: ./python -c 'import dataclasses'
  Time (mean ± σ):      10.2 ms ±   0.4 ms    [User: 8.3 ms, System: 1.8 ms]
  Range (min … max):     9.8 ms …  19.9 ms    3000 runs

@donBarbos requested a review from ericvsmith as a code owner February 9, 2025 23:51
@donBarbos changed the title from "gh-11876: Improve import time of dataclasses" to "gh-118761: Improve import time of dataclasses" Feb 10, 2025
@donBarbos (Contributor, Author) commented:

On Trade-offs

Here is the new call trace, grouped by deferred module (a simplified sketch of the asdict() path follows the list):

re:
  1. import re; re.compile in _is_type()
  2. _is_type() in _process_class() and _get_field()
  3. _get_field() in _process_class()
  4. _process_class() in dataclass.wrap()

annotationlib:
  1. import annotationlib in _process_class()
  2. _process_class() in dataclass.wrap()

inspect:
  1. import inspect in _process_class() and _add_slots()
  2. _add_slots() in _process_class()
  3. _process_class() in dataclass.wrap()

copy:
  1. import copy in _asdict_inner() and _astuple_inner()
  2. _asdict_inner() in asdict()
  3. _astuple_inner() in astuple()
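
For example, the asdict()/astuple() path becomes roughly the following (a simplified sketch; the real implementation also recurses into nested dataclasses, lists and dicts):

def _asdict_inner(obj, dict_factory):
    # Deferred import: `copy` is only loaded when asdict()/astuple()
    # is actually called, not when `import dataclasses` runs.
    import copy
    return copy.deepcopy(obj)

def asdict(instance, *, dict_factory=dict):
    return _asdict_inner(instance, dict_factory)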

And I wrote benchmarks that exercise the affected public API (the dataclass decorator and the asdict()/astuple() functions):

bench_dataclass.py starts like this:

import time
import dataclasses
import os

code = """
@dataclasses.dataclass
class Address:
    city: str
    zip_code: str

@dataclasses.dataclass
class Person:
    name: str
    age: int
    addresses: list[Address] = dataclasses.field(default_factory=list)
    metadata: dict[str, str] = dataclasses.field(default_factory=dict)
"""

result_times = []

for _ in range(100):
    os.system("sync && echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null")
    start_time = time.time()
    exec(code, {"dataclasses": dataclasses})
    end_time = time.time()
    result_times.append(end_time-start_time)

bench_asdict.py starts like this:

import time
import dataclasses
import os

@dataclasses.dataclass
class Address:
    city: str
    zip_code: str

@dataclasses.dataclass
class Person:
    name: str
    age: int
    addresses: list[Address] = dataclasses.field(default_factory=list)
    metadata: dict[str, str] = dataclasses.field(default_factory=dict)

person = Person(
    name="John Doe",
    age=30,
    addresses=[Address("New York", "10001"), Address("Los Angeles", "90001")],
    metadata={"key1": "value1", "key2": "value2"}
)

code = """
dataclasses.asdict(person)
"""

result_times = []

for _ in range(100):
    os.system("sync && echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null")
    start_time = time.time()
    exec(code, {"dataclasses": dataclasses, "person": person})
    end_time = time.time()
    result_times.append(end_time-start_time)

And at the end of both scripts I collect stats like this:

import statistics

first = result_times[0]
mean = statistics.mean(result_times)
median = statistics.median(result_times)
stdev = statistics.stdev(result_times)
variance = statistics.variance(result_times)

print(f"First time: {first * 1_000_000:.2f}μs")
print(f"Mean: {mean * 1_000_000:.2f}μs")
print(f"Median: {median * 1_000_000:.2f}μs")
print(f"Standard deviation: {stdev * 1_000_000:.2f}μs")
print(f"Variance: {variance * 1_000_000:.2f}μs")

Results on main branch:

$ ./python -B bench_dataclass.py
First time: 1586.20μs
Mean: 1267.27μs
Median: 1243.83μs
Standard deviation: 106.15μs
Variance: 0.01μs
$ ./python -B bench_asdict.py
First time: 99.66μs
Mean: 112.33μs
Median: 109.91μs
Standard deviation: 12.26μs
Variance: 0.00μs

Results on PR branch:

$ ./python -B bench_dataclass.py
First time: 20320.18μs  # ≈ 20 ms (0.02 s); this overhead always appears on the first call
Mean: 1454.47μs
Median: 1243.11μs
Standard deviation: 1911.43μs
Variance: 3.65μs
$ ./python -B bench_asdict.py
First time: 100.37μs
Mean: 112.21μs
Median: 108.96μs
Standard deviation: 13.01μs
Variance: 0.00μs

In the end we can say that about 20 ms of overhead is added at the first use of @dataclass in a program.

@eli-schwartz (Contributor) left a comment:

> I use lazy importing for the 4 most expensive imported modules (re, copy, inspect, annotationlib); they are also rarely used (only once or twice each).

I don't fully understand what you mean by "rarely", I think. It looks like a lot of this PR is essentially delaying imports that will be unconditionally used in all code that actually utilizes a dataclass. The timings are based on import time for the module itself, but given that the primary use of this module is as a decorator, it's always going to actually use the module at import time, unlike other modules where you might import it at the top of the file and then only use it inside of an if block.

So I don't think you can consider the import time of dataclasses without considering how it's actually used. Any imports that are unconditionally used when decorating a class aren't beneficial to delay the import of (but will incur the use-time cost of re-running import itself, which isn't major but does exist).
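
Concretely, with a hypothetical consumer module:

# models.py (hypothetical consumer)
import dataclasses

@dataclasses.dataclass   # executes while models.py is being imported ...
class Point:
    x: int
    y: int

# ... so the imports deferred inside the decorator machinery (re, inspect,
# annotationlib) are triggered right here anyway; the cost moves from
# `import dataclasses` to the first decorated class, it does not go away.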

@donBarbos (Contributor, Author) commented:

@eli-schwartz yes, but most of the PRs for this issue are also adding lazy imports.

@eli-schwartz (Contributor) commented:

I'm not sure what point you're trying to make.

Most of the PRs for this issue are adding lazy imports. Lazy imports are a useful tool for making python programs faster, in the principle of "only pay for what you use" -- and stdlib modules often don't know what an application will in fact use.

The issue here is that as far as I can tell you're adding lazy imports for things that the consumer will always use, which means that there won't be a benefit to making them lazy...

@gpshead (Member) commented Feb 17, 2025

I'm closing this one based on Eli's analysis.

@gpshead closed this Feb 17, 2025