Skip to content

pip_parse() does not correctly read *.pth files? #2071

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
EricCousineau-TRI opened this issue Jul 17, 2024 · 15 comments
Open

pip_parse() does not correctly read *.pth files? #2071

EricCousineau-TRI opened this issue Jul 17, 2024 · 15 comments

Comments

@EricCousineau-TRI
Copy link

EricCousineau-TRI commented Jul 17, 2024

🚀 feature request

Relevant Rules

package_annotation() from pip.bzl

Description

From docs, as of time of writing, you can add BUILD content, or add / remove files, but doesn't seem like you can change py_library(deps, imports).

For example, rerun.io is an awesome-looking package.
But it doesn't work when using using bazel + rules_python @ 0.34.0:
https://github.com/EricCousineau-TRI/repro/tree/e084a7434286cf426a71350f79755c7cbab6d6eb/bazel/rules_python_rerun
Seems like either the wheel doesn't declare things s.t. rules_python adds ./site-packages/rerun_sdk to the Python path (it only adds ./site-packages).

Describe the solution you'd like

For now, would be nice to have something like

pip_parse(
    ...
    annotations = {
        "rerun-sdk": package_annotation(
            extra_imports = ["site-packages/rerun_sdk"],
        ),
    },
)

Describe alternatives you've considered

For now, will try package_annotation(..., additive_build_content), and include that whenever I use pip("rerun-sdk").

EDIT: This workaround worked: EricCousineau-TRI/repro@96e1336

@EricCousineau-TRI EricCousineau-TRI changed the title Add option to inject additonal imports and deps via package_annotation Add option to inject additonal imports and deps via package_annotation ? Jul 17, 2024
@aignas
Copy link
Collaborator

aignas commented Jul 18, 2024

imports is not supported, but deps is supported via the whl patching mechanism. You should just patch the whl METADATA file that defines the dependencies and then our code generation will do the right thing. Why do you need to modify imports?

@EricCousineau-TRI
Copy link
Author

EricCousineau-TRI commented Jul 18, 2024

in this case, it's missing site-packages/rerun_sdk in the PYTHONPATH:
https://github.com/EricCousineau-TRI/repro/tree/e084a7434286cf426a71350f79755c7cbab6d6eb/bazel/rules_python_rerun
it's not an inter-package dependency, but rather a missing dir on path within the whl itself.

perhaps it's because there's a .pth file that rules_python is not parsing? from above linked repro

$ cd $(bazel info output_base)
$ cat external/pip_deps_rerun_sdk/site-packages/rerun_sdk.pth
rerun_sdk

from docs https://docs.python.org/3/library/site.html
it seems like the .pth file should effectively change sys.path (or in this case, imports)?

@EricCousineau-TRI
Copy link
Author

EricCousineau-TRI commented Jul 18, 2024

(if this seems like a valid thing, I can file a new issue edit this issue and reword the title)

@aignas
Copy link
Collaborator

aignas commented Jul 19, 2024

Thanks for the repro and the logs, this makes it easier to understand. The rules_python not handling the .pth file could be something that is the problem here.

I think ideally the solution would be to investigate how to make rules_python respect the .pth files, but I am not sure how to do it yet. Supporting imports or deps via the annotations does not seem like the right solution here, because it increases the API surface and still forces all of the users to apply workarounds.

@EricCousineau-TRI EricCousineau-TRI changed the title Add option to inject additonal imports and deps via package_annotation ? pip_parse() does not correctly read *.pth files? Jul 19, 2024
@EricCousineau-TRI
Copy link
Author

EricCousineau-TRI commented Jul 19, 2024

sweet, welcome

I think simplest way is to assume it's non-executable *.pth (maybe check for import sys as simple check) and just have code populate imports:
https://github.com/bazelbuild/rules_python/blob/ecad092ffa97ac236fb9a6b33ff7f5af4af80eb6/python/private/pypi/generate_whl_library_build_bazel.bzl#L85-L87

would probably need something like parsed_pth_files as argument to that function, and then the repository rule has to cat those contents accordingly.

@EricCousineau-TRI
Copy link
Author

very loose code: main...EricCousineau-TRI:rules_python:issue-2071

@aignas
Copy link
Collaborator

aignas commented Jul 20, 2024 via email

@groodt
Copy link
Collaborator

groodt commented Aug 24, 2024

Closing this issue as a duplicate of #2156

Any PR to workaround .pth is welcome, but the PR I've linked above is a more holistic issue to capture the overall friction that rules_python causes for some packages.

@groodt groodt closed this as completed Aug 24, 2024
@vonschultz
Copy link
Contributor

I think the main idea of just reading the pth file in starlark and then appending the imports would be a nice way to do it.

With Python 3.12, distutils starts making trouble all over the place. The solution is to pull in setuptools as a dependency and make sure the distutils-precedence.pth from setuptools runs. This .pth file is

import os; var = 'SETUPTOOLS_USE_DISTUTILS'; enabled = os.environ.get(var, 'local') == 'local'; enabled and __import__('_distutils_hack').add_shim(); 

With this in mind, I don't think that reading the pth file in starlark and appending the imports would work. It might work for some packages, but not for setuptools, and that's going to be one of the more important ones to support in any non-trivial Python context.

At our company, we're using a workaround where we create a sitecustomize.py file where we loop through sys.path and run site.addsitedir(directory) if the directory looks like it belongs to setuptools. One might imagine having rules_python helping the user to create a useful sitecustomize.py file, possibly even creating it automatically and putting it on the Python path. If we're solving this with #2156 we might not need to, though.

@aignas
Copy link
Collaborator

aignas commented Dec 26, 2024

I think the issue is that there can be only one sitecustomize.py file as noted in #2169 (comment)

Is setuptools used in building wheels in pip.parse in this case or somewhere else?

@vonschultz
Copy link
Contributor

vonschultz commented Dec 27, 2024

In this case, setuptools isn't really used as such. The problem is distutils. You might have code that runs import tensorflow, and that triggers

import distutils as _distutils

https://github.com/tensorflow/tensorflow/blob/5bc9d26649cca274750ad3625bd93422617eed4b/tensorflow/api_template.__init__.py#L30

In Python 3.11 and earlier, this is an import from the standard library, so we're not really involving setuptools — except that distutils has for a long time recommended using the setuptools package instead. Setuptools has integrated a complete copy of distutils and is no longer dependent on the standard library, and once we're in Python 3.12, that's the only distutils there is. See PEP632.

setuptools makes its distutils module available when the distutils-precedence.pth file in the setuptools wheel runs. The way rules_python currently works, though, the pth file ends up in the wrong directory and won't be loaded unless you manually invoke site.addsitedir(directory).

@aignas
Copy link
Collaborator

aignas commented Jan 1, 2025

I am somewhat thinking that setuptools should be treted in rules_python in a special way.

I have no other ideas on how to fix this - maybe we coud have a special BUILD.bazel file for that package and reuse the same setuptools repository everywhere. If the user tries to include it via pip in bzlmod, we could easily handle that. Doing that in workspace may be too much work.

Potentially related - #2370

@aignas aignas reopened this Jan 1, 2025
@vonschultz
Copy link
Contributor

If we want to treat setuptools in a special way, one possible way forward is to bundle it with the Python installation itself, which rules_python does for Python 3.11:

$ bazelisk cquery 'filter("setuptools/__init__.py|distutils-precedence.pth", deps(@python_3_11//:files))'
@python_3_11_x86_64-unknown-linux-gnu//:lib/python3.11/site-packages/distutils-precedence.pth (null)
@python_3_11_x86_64-unknown-linux-gnu//:lib/python3.11/site-packages/setuptools/__init__.py (null)

For reasons unknown to me, it's no longer bundled with Python 3.12:

$ bazelisk cquery 'filter("setuptools/__init__.py|distutils-precedence.pth", deps(@python_3_12//:files))'
INFO: Empty query results

If we were to do a special BUILD.bazel file for setuptools, what would we do in that? I see a few ways setuptools can be handled in a way that respects the distutils-precedence.pth, but I'm not sure how we'd do it in practice: (A) bundle setuptools with Python, as was done with Python 3.11 and earlier, (B) solve #2156 and put everything in the main site-packages directory or (C) use sitecustomize.py and site.addsitedir(directory).

@groodt
Copy link
Collaborator

groodt commented Jan 8, 2025

@aignas I still believe it's more closely related to #2156 (which is why I marked it as duplicate). pth files are a runtime hook based on site-packages and site.py, it's not really a setuptools/distutils thing (but distutils is using a pth file as a deprecation mechanism).

A workaround in the meantime for the runtime distutils -> setuptools change in 3.12 (distutils is now finally fully deprecated) is to patch any packages still using distutils to pull in a runtime version of setuptools.

@aignas
Copy link
Collaborator

aignas commented Jan 14, 2025

I see what you mean @groodt by saying that this ticket is a duplicate of #2156 and me re-opening this is only because I think that setuptools is important enough to potentially be treated separately until we solve #2156.

That said, I like what you suggest - patch the wheels, or submit upstream patches to patch the source to drop distutils dependence.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants