
Slow collection time when tests are not in a relative folder to the current working folder #13420


Closed
sashko1988 opened this issue May 12, 2025 · 2 comments · Fixed by #13422
Labels
topic: collection related to the collection phase type: performance performance or memory problem/improvement

Comments

@sashko1988
Contributor

Created after this discussion - #13413

OSes, python and pytest versions

OS: macOS 15.4.1, Ubuntu 22.04
Python 3.12.8
Pytest 8.3.4

Problem description

I need to run a large number of non-Python tests that are stored in deeply nested folders, and I found that pytest struggles during collection.

Some code context:

from collections.abc import Iterable

import pytest


@pytest.hookimpl(wrapper=True)
def pytest_collection(session):
    # resolve_suites is a project-specific helper (not shown here).
    resolved_paths = resolve_suites(session)
    session.config.args.extend(resolved_paths)
    return (yield)


def pytest_collect_file(parent, file_path):
    if file_path.suffix == ".yaml":
        return YamlFile.from_parent(parent, path=file_path)


class YamlFile(pytest.File):
    def collect(self) -> Iterable[pytest.Item | pytest.Collector]:
        # YamlTestResolver is a leftover from the previous runner, but it resolves what is needed.
        test_cases = YamlTestResolver().from_file(f"{self.path}")
        for tc in test_cases:
            yield YamlTest.from_parent(self, name=tc.name, tc_spec=tc)


class YamlTest(pytest.Item):
    def __init__(self, tc_spec, **kwargs) -> None:
        super().__init__(**kwargs)
        self.tc_spec = tc_spec

Consider this folder structure:

root_working_folder
├── framework_repo
│   └── framework_internal_folder
└── repo_with_tests
    └── tests
        ├── test_folder_1
        │   └── inner_folder
        └── test_folder_2
            └── inner_folder
                └── even_more_depth

The real repo_with_tests contains many more subfolders than shown here.

The pytest call is: pytest --collect-only ${list with 1k non-python tests} (1 test per file).

When I run the above from framework_internal_folder, it takes 56 minutes with cProfile and 23 minutes without. When I make the same call from root_working_folder or repo_with_tests, it takes ~2 minutes with cProfile and 38 seconds without.

The biggest difference between the two runs is the cumulative time of one function, nodes.py:546(_check_initialpaths_for_relpath):

# from framework_internal_folder
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   237033   85.508    0.000 3176.004    0.013 ../_pytest/nodes.py:546(_check_initialpaths_for_relpath)

# from root_working_folder
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      135    0.051    0.000    1.772    0.013 ../_pytest/nodes.py:546(_check_initialpaths_for_relpath)
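
For anyone who wants to produce tables like the two above, here is a minimal sketch that uses only the standard-library cProfile and pstats modules to narrow a profile down to one function and list its callers. The functions inner and outer are hypothetical stand-ins for _check_initialpaths_for_relpath and whatever calls it; a real run would profile the pytest invocation instead.

```python
import cProfile
import io
import pstats


def inner():
    # Stand-in for the expensive helper being investigated.
    return sum(range(100))


def outer():
    # Stand-in for the code that calls the helper many times.
    for _ in range(50):
        inner()


profiler = cProfile.Profile()
profiler.enable()
outer()
profiler.disable()

# Write the report into a string so it can be inspected or saved.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats("inner")  # only rows matching "inner"
stats.print_callers("inner")  # who called it, and how often
report = buffer.getvalue()
print(report)
```

The string passed to print_stats and print_callers is a regular expression matched against the filename:lineno(function) column, so "commonpath" or "_check_initialpaths_for_relpath" would select the rows quoted in this issue.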

According to the stats, when running from framework_internal_folder, most of that time is spent in this function:

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
206063304  262.471    0.000 1580.672    0.000 ../_pytest/pathlib.py:990(commonpath)

# and stats for callers of that function:
Function                            was called by...
                                        ncalls  tottime  cumtime
pathlib.py:990(commonpath)          <- 205937164/3722648  262.308   28.844  nodes.py:546(_check_initialpaths_for_relpath)
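
Dividing the two counts gives 206063304 / 237033 ≈ 870 commonpath calls per _check_initialpaths_for_relpath call, which is close to the ~1k paths passed on the command line. That is consistent with a linear scan over the initial paths for every collected node. The sketch below only models that cost; it is not pytest's code. The helper name count_prefix_checks, the use of is_relative_to in place of commonpath, and the assumption that the scan is only needed for paths outside the root directory are all mine.

```python
from pathlib import PurePosixPath


def count_prefix_checks(paths, initial_paths, rootpath):
    """Count prefix comparisons in a simplified model of the lookup."""
    checks = 0
    for path in paths:
        # Assumption: paths under the root directory never need the scan.
        if path.is_relative_to(rootpath):
            continue
        # Otherwise compare against each initial path until one matches.
        for initial in initial_paths:
            checks += 1
            if path.is_relative_to(initial):
                break
    return checks


tests_dir = PurePosixPath("/root_working_folder/repo_with_tests/tests")
files = [tests_dir / f"test_folder_{i}" / "case.yaml" for i in range(1000)]

# Every file is passed on the command line, so each one is an initial path.
outside = count_prefix_checks(
    files, files, PurePosixPath("/root_working_folder/framework_repo")
)
inside = count_prefix_checks(files, files, PurePosixPath("/root_working_folder"))
print(outside, inside)
```

In this model, 1000 files cost 500500 comparisons when the root is framework_repo and none when the root is root_working_folder, so the work grows quadratically with the number of paths on the command line.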

Possible solutions

Cache for _check_initialpaths_for_relpath

I experimented with adding lru_cache to _check_initialpaths_for_relpath:

@lru_cache(maxsize=1000)
def _check_initialpaths_for_relpath(initialpaths: frozenset[Path], path: Path) -> str | None:
    for initial_path in initialpaths:
        if commonpath(path, initial_path) == initial_path:
            rel = str(path.relative_to(initial_path))
            return "" if rel == "." else rel
    return None

That change decreased the overall collection time to 4 minutes.

Stats are also impressive:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     5798    2.109    0.000   79.265    0.014 nodes.py:545(_check_initialpaths_for_relpath)

I'm not sure if commonpath needs caching as well.
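
To make the caching experiment easy to try outside pytest, here is a self-contained sketch of the same function. The commonpath stand-in reflects my understanding of pytest's helper (wrap os.path.commonpath and return None on ValueError) and should be treated as an assumption, as should the use of PurePosixPath and posixpath, which I chose only so the example behaves the same on every OS. The leading underscore is dropped from the function name to make clear this is not the pytest internal.

```python
from __future__ import annotations

import posixpath
from functools import lru_cache
from pathlib import PurePosixPath


def commonpath(path1: PurePosixPath, path2: PurePosixPath) -> PurePosixPath | None:
    # Stand-in for pytest's helper (assumed behavior, see the note above).
    try:
        return PurePosixPath(posixpath.commonpath((str(path1), str(path2))))
    except ValueError:
        return None


@lru_cache(maxsize=1000)
def check_initialpaths_for_relpath(
    initialpaths: frozenset[PurePosixPath], path: PurePosixPath
) -> str | None:
    # Both arguments are hashable, which lru_cache requires.
    for initial_path in initialpaths:
        if commonpath(path, initial_path) == initial_path:
            rel = str(path.relative_to(initial_path))
            return "" if rel == "." else rel
    return None


initial = frozenset({PurePosixPath("/root_working_folder/repo_with_tests/tests")})
target = PurePosixPath(
    "/root_working_folder/repo_with_tests/tests/test_folder_1/inner_folder"
)

rel = check_initialpaths_for_relpath(initial, target)  # computed
rel_again = check_initialpaths_for_relpath(initial, target)  # served from cache
info = check_initialpaths_for_relpath.cache_info()
print(rel, info)
```

cache_info() reports hits and misses, which gives a quick way to check whether maxsize=1000 is large enough for a given run: if misses keep growing on paths that were already seen, entries are being evicted.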

Anything else on the collection mechanism?

Other optimizations in directory/file collections

@Zac-HD Zac-HD added topic: collection related to the collection phase type: performance performance or memory problem/improvement labels May 15, 2025
patchback bot pushed a commit that referenced this issue May 16, 2025
…3422)

* add lru_cache to nodes._check_initialpaths_for_relpath
update tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Oleksandr Zavertniev <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
(cherry picked from commit cfbe319)
nicoddemus pushed a commit that referenced this issue May 16, 2025
…3422) (#13425)

* add lru_cache to nodes._check_initialpaths_for_relpath
update tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------



(cherry picked from commit cfbe319)

Co-authored-by: Sashko <[email protected]>
Co-authored-by: Oleksandr Zavertniev <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
@sashko1988
Contributor Author

@RonnyPfannschmidt, any info when this will be released?

@RonnyPfannschmidt
Member

The next patch release will include it

Tusenka pushed a commit to Tusenka/pytest that referenced this issue May 24, 2025
…elpath (pytest-dev#13422)

* add lru_cache to nodes._check_initialpaths_for_relpath
update tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Oleksandr Zavertniev <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>