Skip to content

Regression (1.4.0-rc0) | Transient python pypi deps not working #2796

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
JeroenSchmidt opened this issue Apr 21, 2025 · 17 comments · Fixed by #2806
Closed

Regression (1.4.0-rc0) | Transient python pypi deps not working #2796

JeroenSchmidt opened this issue Apr 21, 2025 · 17 comments · Fixed by #2806
Assignees
Labels
type: pip pip/pypi integration

Comments

@JeroenSchmidt
Copy link
Contributor

JeroenSchmidt commented Apr 21, 2025

🐞 bug report

Affected Rule

py_test() & py_library

Is this a regression?

This problem appears to happen after #2629

Description

Transient dependencies of python packages are not being provided to the test environment.

🔬 Minimal Reproduction

# lib/libA/tests/test_fixtures/test_mlflow.py
import mlflow
# lib/libA/BUILD

py_library(
    name = "fixtures",
    srcs = glob(["src/**/*.py"]),
    imports = ["src"],
    visibility = ["//visibility:public"],
    deps = [
        requirement("mlflow-skinny"),
    ],
)

py_test(
    name = "test",
    timeout = "long",
    srcs = glob(["tests/**"]),
    deps = [
        requirement("pytest"),
        requirement("pytest-cov"),
        ":test_fixtures",
    ],
)
# mydeps_without_hashes.txt
# py3.10

mlflow-skinny==2.14.3
pytest-cov==5.0.0
pytest==8.3.1
# .bazelrc
common --java_runtime_version=remotejdk_11
common --java_language_version=11
common --tool_java_runtime_version=remotejdk_11
common --tool_java_language_version=11

build --nolegacy_external_runfiles
build --incompatible_default_to_explicit_init_py

🔥 Exception or Error


==================================== ERRORS ====================================
_ ERROR collecting lib/libA/tests/test_mlflow.py _
ImportError while importing test module '/private/var/tmp/_bazel_USER1/bdf93a2b33b4c731566cb9fa9e8f36ec/sandbox/darwin-sandbox/27/execroot/_main/bazel-out/darwin_arm64-fastbuild/bin/lib/test_fixtures/test.runfiles/_main/lib/libA/tests/test_mlflow.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../rules_python++pip+pypi_deps_mydeps_310_pytest/site-packages/_pytest/python.py:493: in importtestmodule
    mod = import_path(
../rules_python++pip+pypi_deps_mydeps_310_pytest/site-packages/_pytest/pathlib.py:582: in import_path
    importlib.import_module(module_name)
../../../../../../../../../../../../execroot/_main/external/rules_python++python+python_3_10_aarch64-apple-darwin/lib/python3.10/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
:1050: in _gcd_import
    ???
:1027: in _find_and_load
    ???
:1006: in _find_and_load_unlocked
    ???
:688: in _load_unlocked
    ???
../rules_python++pip+pypi_deps_mydeps_310_pytest/site-packages/_pytest/assertion/rewrite.py:174: in exec_module
    exec(co, module.__dict__)
lib/libA/tests/test_mlflow.py:1: in 
    import mlflow
../rules_python++pip+pypi_deps_mydeps_310_mlflow_skinny/site-packages/mlflow/__init__.py:34: in 
    from mlflow import (
../rules_python++pip+pypi_deps_mydeps_310_mlflow_skinny/site-packages/mlflow/artifacts/__init__.py:9: in 
    from mlflow.exceptions import MlflowException
../rules_python++pip+pypi_deps_mydeps_310_mlflow_skinny/site-packages/mlflow/exceptions.py:5: in 
    from mlflow.protos.databricks_pb2 import (
../rules_python++pip+pypi_deps_mydeps_310_mlflow_skinny/site-packages/mlflow/protos/databricks_pb2.py:5: in 
    from google.protobuf.internal import enum_type_wrapper
E   ModuleNotFoundError: No module named 'google'

🌍 Your Environment

Operating System:

  
MACOS 15.3.2 (24D81)
  

Output of bazel version:

  
Bazelisk version: 1.24.0
Starting local Bazel server (8.1.1) and connecting to it...
Build label: 8.1.1
Build target: @@//src/main/java/com/google/devtools/build/lib/bazel:BazelServer
Build time: Tue Feb 25 18:53:44 2025 (1740509624)
Build timestamp: 1740509624
Build timestamp as int: 1740509624
  

Rules_python version:

  
bazel_dep(name = "rules_python", version = "1.3.0")
git_override(module_name = "rules_python",
    remote = "https://github.com/bazel-contrib/rules_python",
    commit = "2cb920c1e52a85239d6bcc38919fbf143b514dac" # breaks
    # commit = "aa0d16c1463e4e26f6ed633ae83d9785a2ea9dfa" # working
    # commit="84351d4ec14e474bc196c0b8cd70e04fcc9a25ca" # working
)
  

Anything else relevant?
Adding protobuf to the BUILD deps results in another transient missing import error when import airflow is called.

    from opentelemetry.sdk.trace import Event as OTelEvent
E   ModuleNotFoundError: No module named 'opentelemetry'

Lock File Generation

I didn't regenerate the lockfile between testing the changing rules_python versions; but incase it helps I used rules_uv to create the initial lock file with rules_python==1.3

load("@rules_uv//uv:pip.bzl", uv_compile = "pip_compile")

uv_compile(
    name = "mydeps",
    py3_runtime = "@python_3_10//:py3_runtime",
    requirements_in = "mydeps_without_hashes.txt",
    requirements_txt = "mydeps_with_hashes.txt",
)
@rickeylev
Copy link
Collaborator

cc @aignas #2629 author

@rickeylev
Copy link
Collaborator

I didn't regenerate the lockfile between testing the changing rules_python versions
I used rules_uv to generate it

That should be fine. How requirements.txt is generated shouldn't matter, just the content of requirements.txt itself.

Can you post a e.g. github gist of your requirements.txt? If it has env markers in it, that might be relevant.

@JeroenSchmidt
Copy link
Contributor Author

@rickeylev no env markers are in the requirements file. Only index-url is specified within the uv_compile, which is then present in the hash requirements file.

Would you still like the full requirements file?

@rickeylev
Copy link
Collaborator

Yeah, might as well, if that's ok, just in case it's relevant (ignas won't be back online for many hours, so it might save a long cycle of q/a)

keith added a commit to keith/rules_python that referenced this issue Apr 21, 2025
@keith
Copy link
Member

keith commented Apr 21, 2025

ah i missed this issue and made a repro case #2805

@keith
Copy link
Member

keith commented Apr 21, 2025

the problem is this logic: https://github.com/bazelbuild/rules_python/blob/c981569cc89c76eb57a78f0bbc47f1566211c924/python/private/pypi/whl_metadata.bzl#L55-L58

in my example the license text contains empty lines:

Author-email: holger krekel and contributors <[email protected]>, [email protected]
License: MIT License
        
        Copyright (c) 2010 Holger Krekel and contributors.
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Project-URL: Homepage, https://github.com/pytest-dev/pytest-xdist
Project-URL: Documentation, https://pytest-xdist.readthedocs.io/en/latest

but the metadata continues afterwards

@keith
Copy link
Member

keith commented Apr 21, 2025

The lines in between aren't actually blank, so I wonder if the solution is just to remove the .strip() there and that's something that can be relied on?

keith added a commit to keith/rules_python that referenced this issue Apr 21, 2025
@keith
Copy link
Member

keith commented Apr 21, 2025

here's a potential fix #2806 for reference the previous implementation didn't have this bug because it just overparsed and ignored the end of the header https://github.com/bazelbuild/rules_python/blob/aa0d16c1463e4e26f6ed633ae83d9785a2ea9dfa/tools/wheelmaker.py#L572

@rickeylev
Copy link
Collaborator

Skimming the metadata spec, I'm not finding much specified about multi-line headers.

https://packaging.python.org/en/latest/specifications/core-metadata

For Description, it mandates "7 spaces, then pipe". That isn't specified for License, though. I don't see any general advice for multi-line stuff, either.

The code says it treates the first blank line as the end of a header, but I don't see that in the spec, either. Is that a de-facto thing?

@keith
Copy link
Member

keith commented Apr 21, 2025

I found that UV appears to use mailparse to parse this, for reference

@keith
Copy link
Member

keith commented Apr 21, 2025

alternatively we could just not break. if something has requires-dist: after the blank lines it seems like that would break elsewhere too, and it didn't before with this ruleset

@rickeylev
Copy link
Collaborator

I found this in one of the earlier peps: https://peps.python.org/pep-0566/#description

In addition to the Description header field, the distribution’s description may instead be provided in the message body (i.e., after a completely blank line following the headers, with no indentation or other special formatting necessary).

I'm thinking that an empty line vs whitespace-only line are semantically distinct in the METADATA format. Empty line means "end of all headers, what follows is the content of the description header". whitespace-only means...I guess an empty line in the current header?

@groodt would you happen to know the packaging trivia here?

@aignas aignas self-assigned this Apr 21, 2025
@JeroenSchmidt
Copy link
Contributor Author

JeroenSchmidt commented Apr 22, 2025

Yeah, might as well, if that's ok, just in case it's relevant (ignas won't be back online for many hours, so it might save a long cycle of q/a)

@aignas @rickeylev here's the full requirements hash file & without hashes.

https://gist.github.com/JeroenSchmidt/cd768cb9cfadb8b784274f41d9dd3843

@groodt
Copy link
Collaborator

groodt commented Apr 22, 2025

@groodt would you happen to know the packaging trivia here?

Yes, its described in Core Metadata here: https://packaging.python.org/en/latest/specifications/core-metadata/

We need support for multiple lines correct because it's actually a format from email headers 😊

Multiple lines and paragraphs are common in Description and License for certain.

@aignas
Copy link
Collaborator

aignas commented Apr 22, 2025

I looked at #2806 once more and I think it would be better to just parse the whole file for Requires-Dist: instead of breaking at the first empty line. That is probably a better first approximation.

@aignas aignas added the type: pip pip/pypi integration label Apr 22, 2025
@aignas
Copy link
Collaborator

aignas commented Apr 22, 2025

Hmmm, looking at the whitespace in https://pypi-browser.org/package/pytest-xdist/pytest_xdist-3.6.1-py3-none-any.whl/pytest_xdist-3.6.1.dist-info/METADATA makes me think that what @keith has submitted maybe will be just as good. Let's go with that for now and then let's reopen this if we have further problems.

EDIT: I see that mlflow would be handled as well: https://pypi-browser.org/package/mlflow/mlflow-2.22.0rc0-py3-none-any.whl/mlflow-2.22.0rc0.dist-info/METADATA

github-merge-queue bot pushed a commit that referenced this issue Apr 22, 2025
The wheel `METADATA` parsing implemented in 1.4 missed the fact
that whitespace is significant and sometimes License is included
inline in the `METADATA` file itself.

This change ensures that we stop parsing the `METADATA` file only
on first completely empty line.

Fixes #2796

---------

Co-authored-by: Ignas Anikevicius <[email protected]>
aignas added a commit that referenced this issue Apr 22, 2025
The wheel `METADATA` parsing implemented in 1.4 missed the fact
that whitespace is significant and sometimes License is included
inline in the `METADATA` file itself.

This change ensures that we stop parsing the `METADATA` file only
on first completely empty line.

Fixes #2796

---------

Co-authored-by: Ignas Anikevicius <[email protected]>
(cherry picked from commit 1d69ad6)
@JeroenSchmidt
Copy link
Contributor Author

Thanks for the quick fix everyone. The issue looks resolved on my side!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: pip pip/pypi integration
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants