
GH-112855: Speed up pathlib.PurePath pickling #112856


Merged: 8 commits, Apr 20, 2024

Conversation

barneygale
Contributor

@barneygale commented Dec 7, 2023

The second item in the tuple returned from __reduce__() is a tuple of arguments to supply to the path constructor. Previously we returned the parts tuple here, which entailed joining, parsing and normalising the path object, and produced a compact pickle representation.

With this patch, we instead return a tuple of paths that were originally given to the path constructor. This makes pickling much faster (at the expense of compactness).

It's worth noting that, in the olden times, pathlib performed this parsing/normalization up-front in every case, and so using parts for pickling was almost free. Nowadays pathlib only parses/normalises paths when it's necessary or advantageous to do so (e.g. computing a path parent, or iterating over a directory, respectively).

By also omitting to `sys.intern()` the path parts, we slightly speed up path parsing/normalization more generally.
@barneygale
Contributor Author

Makes pickling ~3x faster, depending on what you include in the measurement:

$ ./python -m timeit -s "from pathlib import PurePath" "PurePath('foo').__reduce__()"
100000 loops, best of 5: 2.17 usec per loop  # before
500000 loops, best of 5: 617 nsec per loop   # after

$ ./python -m timeit -s "from pathlib import PurePath; p = PurePath('foo')" "p.__reduce__()"
2000000 loops, best of 5: 169 nsec per loop   # before
5000000 loops, best of 5: 78.1 nsec per loop  # after

It's hard to measure using only public APIs, but path parsing in general is a little faster because we drop the sys.intern(str(x)) call. I measure about a 3% improvement when generating a string representation of PurePath('foo'):

$ ./python -m timeit -s "from pathlib import PurePath" "str(PurePath('foo'))"
100000 loops, best of 5: 2.98 usec per loop  # before
100000 loops, best of 5: 2.88 usec per loop  # after

@barneygale
Contributor Author

barneygale commented Dec 7, 2023

@AlexWaygood asked whether this might make pathlib + multiprocessing faster, and indeed it does. Simple benchmark script:

from multiprocessing import Pool
from pathlib import PurePath

def f(pathobj):
    return str(pathobj)

if __name__ == '__main__':
    paths = [PurePath(str(i), str(j))
             for i in range(1000)
             for j in range(1000)]
    with Pool(5) as p:
        p.map(f, paths)

Before:

$ time ./python mp.py
real    0m7.730s
user    0m13.161s
sys     0m0.869s

After:

real    0m4.165s
user    0m10.614s
sys     0m0.976s

--> ~1.85x faster in this example.

@barneygale
Contributor Author

Simple compactness test:

import pathlib
import pickle

paths = pathlib.Path().glob('**/*')
print(len(pickle.dumps(tuple(paths))))

In my CPython checkout, the size increased by ~20% with this PR.

If I instead glob from Path.cwd(), which adds a repetitive /home/barney/projects/cpython prefix to every path, then the size increased by ~50% from baseline.

Seems like a good bargain to me.
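A related back-of-envelope check (a sketch, not part of the PR): a pickled path always costs more bytes than the equivalent pickled string, because the pickle must also name the path class:

```python
import pickle
from pathlib import PurePath

s = '/home/user/projects/cpython/Lib/pathlib.py'  # hypothetical example path
str_size = len(pickle.dumps(s))
path_size = len(pickle.dumps(PurePath(s)))
print(str_size, path_size)  # the path pickle is larger than the plain string
```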

@AlexWaygood self-requested a review December 9, 2023
@pitrou
Member

pitrou commented Dec 11, 2023

This doesn't look like a good idea to me, especially as it may increase memory consumption when unpickling. Unless you know of a workload where this gives a benefit, I would tend to reject this PR.

@barneygale
Contributor Author

Thanks @pitrou. Personally I see no reason to prioritise memory consumption over speed of processing here. As a user of pathlib, I'd expect the performance/space characteristics of serialising path objects to roughly match that of string paths. All workloads involving pickling are affected, such as the multiprocessing example in my previous comment, where the wall time is nearly halved by this PR.

@pitrou
Member

pitrou commented Dec 11, 2023

> All workloads involving pickling are affected, such as the multiprocessing example in my previous comment, where the wall time is nearly halved by this PR.

This example is not a workload, it's a completely unrealistic micro-benchmark which doesn't reflect actual usage.

To rephrase my objection:
"Unless you know of a (actual, real world) workload where this gives a benefit, I would tend to reject this PR".

@barneygale
Contributor Author

Could you provide a workload, real-world or otherwise, that shows your concerns? Everything I try shows a speed-up.

@pitrou
Member

pitrou commented Dec 11, 2023

I would say: glob a large filesystem tree (something like list(Path('/usr').glob('**'))) and measure memory consumption.

@barneygale
Contributor Author

barneygale commented Dec 11, 2023

Mega, thank you. Running a glob across my media collection (133003 files):

barney@acorn /media/sexy $ /usr/bin/time -v ~/projects/cpython/python -c 'from pathlib import Path; list(Path.cwd().glob("**/*"))'
	Maximum resident set size (kbytes): 88284  # before
	Maximum resident set size (kbytes): 84828  # after

... which is a little surprising! I'll dig in.

@pitrou
Member

pitrou commented Dec 11, 2023

(make sure you compile CPython in non-debug mode, by the way :-))

@barneygale
Contributor Author

Turns out pathlib doesn't intern strings while globbing or iterating over directories, and possibly never did? The private _make_child_relpath() constructor creates a fully-parsed path, bypassing the path parsing routine (_Flavour.parse_parts() in the past, _abc.PurePathBase.parse_path() currently), and therefore skipping the calls to sys.intern().

@pitrou
Member

pitrou commented Dec 17, 2023

> Turns out pathlib doesn't intern strings while globbing or iterating over directories, and possibly never did?

IIRC it was meant to (globbing or walking directories is really the primary situation where you'd get multiple instances of the same path component), so I'm a bit surprised.
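For readers unfamiliar with interning: `sys.intern()` maps equal strings onto one shared object, which is why interning repeated path components (e.g. during a recursive glob) can reduce memory use. A minimal illustration:

```python
import sys

# Build two equal strings at runtime; CPython does not automatically
# intern these, so they may be distinct objects holding the same text.
a = ''.join(['us', 'r'])
b = ''.join(['us', 'r'])
assert a == b

# Interning returns one canonical object per distinct value, so many
# paths sharing a component like 'usr' can share a single string.
a = sys.intern(a)
b = sys.intern(b)
assert a is b
```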

@barneygale
Contributor Author

It surprises me too :o

FWIW, I've restored the interning of path parts in this PR, so the only thing it changes is __reduce__().

@barneygale
Contributor Author

I plan to merge this patch within a few days, as I'm reasonably sure it will provide a yummy speedup without having much impact on memory usage, given that pathlib doesn't intern parts in most cases where related paths are generated (directory children, with_name(), etc). But I'll keep a close eye on bug reports / forums / etc to see if anyone reports increased memory/disk usage when pickling paths in 3.13.

@Hnasar

Hnasar commented Nov 8, 2024

@barneygale do you think it's worth backporting this pickling optimization to 3.12? 3x performance for a one-line change is pretty nice
(thanks for all your work on pathlib btw!!)

@AlexWaygood
Member

@Hnasar we don't backport performance optimisations, I'm afraid, only bugfixes.

Labels: performance (Performance or resource usage), topic-pathlib