
Allow root module to customize toolchains #2081


Open
4 of 11 tasks
rickeylev opened this issue Jul 20, 2024 · 5 comments
Labels
type: bzlmod, type: feature request, type: toolchain

Comments

@rickeylev
Collaborator

rickeylev commented Jul 20, 2024

We've had several requests where users want to modify something about the toolchain definitions that are generated for the hermetic runtimes. Typically these are small tweaks. Allowing other modules to tweak toolchains is a no-go, but allowing the root module to change things is reasonable.

The "simple" option for customizing toolchains is to add args to python_register_toolchains and python.toolchain.

The two main issues I see with adding args are:

  1. bzlmod APIs tend to be "written in stone". They happen very early in the build process, so we have limited options for evolving the API. In this case, this is somewhat tempered by only the root module being allowed to use it, but it still makes me uneasy.
  2. Toolchains tend to have unstable APIs. Most parts of them are just implementation details for the rules. For example, adding a single arg to customize the bootstrap template made sense last year, but now there are multiple template files. Similarly, the stub_shebang attribute is slated for removal with the introduction of --bootstrap_impl=script.

Adding args to python.toolchain is, I think, a bad idea. Having something else, e.g. python.root_config, might make a bit more sense. That said, args on python.toolchain() make it easy to associate something with a particular toolchain, while python.root_config would need some arg to specify which toolchain the setting should be restricted to.

This issue is to collect the use cases and try and figure out some options.

Things people want to change:

  • Overriding bootstrap template

    • Example: feat: Allow custom bootstrap template in python_repository #2032
      • Goal: They want to implement a separate bootstrap
      • The PR proposes adding args to e.g. python_register_toolchains().
      • Issues: (1) Would need to be exposed via bzlmod APIs. (2) There are 4 bootstrap files: python_bootstrap_template (legacy system python bootstrap), stage1 script bootstrap, stage2 bootstrap, and zip main bootstrap. Plus the launcher exe for Windows.
      • Proposal: Indirect the references to the templates using flags (see the sketch after this list).
  • Customizing stub_shebang: https://bazelbuild.slack.com/archives/CA306CEV6/p1721464127008299

    • Goal: They want to set -S PYTHONNOUSERSITE=1 for all their Python invocations
    • Proposal 1: Have a flag override for stub_shebang
    • Proposal 2: Add interpreter_args and make env populate values into the bootstrap.
    • Proposal 3: Have flags for "default interpreter args" and "default interpreter env"
  • The register_coverage_tool arg

    • Goal: Having coverage included as part of the toolchain is unnecessary if a user's test provides the coverage library itself
  • Control whether local or hermetic runtimes are used

    • Goal: Provide a way for the root module to decide whether a local or hermetic toolchain is used.
    • Basically a local=True|False arg when defining a toolchain. It decides whether the hermetic runtime repo rule or local runtime repo rule is used under the hood.
    • Proposal: Have a flag that acts as a constraint for matching toolchains.
  • Control whether runtimes are in-build or platform runtimes

    • Goal: A platform runtime is cheaper to set up; while not compatible with RBE or sandboxing, that's fine if you aren't using RBE or sandboxing.
    • This mostly applies to local runtimes, but could also apply to hermetic.
    • Basically, have an inbuild=True|False arg when defining a toolchain. It controls whether the underlying py_runtime() targets generated use interpreter= or interpreter_path=
  • Control constraints of toolchains

    • Goal: Allow registering both local and hermetic runtimes. Using constraints, local or platform toolchains can be used for local invocations, and hermetic for RBE.
    • local toolchains can have arbitrary constraints set
    • regular (python-build-standalone) toolchains can have their constraints customized
  • Control what toolchain versions are available

    • Goal: An infrastructure team wants to ensure only e.g. Python 3.11 (or a small set of approved versions) is usable at their company.
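
As a rough sketch of the flag-indirection proposal for the bootstrap template mentioned above (the labels and flag names here are hypothetical, not the actual rules_python layout), the template reference could go through Bazel's native label_flag so only the root module needs to repoint it:

# BUILD file inside rules_python (hypothetical location/names)
label_flag(
    name = "bootstrap_template",
    # Default points at the built-in template; the bootstrap machinery would
    # read this flag instead of a hardcoded template label.
    build_setting_default = "//python/private:bootstrap_template_file",
)

The root module would then override it without any bzlmod API changes, e.g. in .bazelrc:

build --@rules_python//python/config_settings:bootstrap_template=//tools/python:my_bootstrap.txt
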
@aignas
Collaborator

aignas commented Aug 8, 2024

Wanted to create a separate issue for this but realized that this one already exists. We could create a python.override tag class for the majority of the things listed above, but to start with we could allow customising the URLs used for the python toolchains.

Currently the TOOL_VERSIONS used by the extension cannot be overridden via MODULE.bazel. The only workaround users have right now is the bazel_downloader config, but that may give hard-to-debug errors and still does not allow users to add Python builds for extra platforms that are not present on the indygreg website.
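
For reference, the bazel_downloader workaround is a config file passed via common --downloader_config=downloader.cfg in .bazelrc (--experimental_downloader_config on older Bazel versions), containing rewrite rules such as the line below (hostnames are placeholders; URLs are matched without the scheme). It can only redirect or block downloads, which is why it cannot add builds for new platforms:

rewrite github.com/indygreg/python-build-standalone/releases/download/(.*) mirror.example.com/python-build-standalone/$1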

An initial solution could be a python.override tag class that can override the URLs/structs for the python toolchains that are used. The design of the API should allow for future extension to different overrides:

  • restricting allowed python versions
  • setting the x.y->x.y.z mapping
  • disabling registration of all versions
  • etc
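
A hypothetical MODULE.bazel sketch of what such a tag class could look like (the attribute names below are illustrative only, not a settled API):

python = use_extension("@rules_python//python/extensions:python.bzl", "python")
python.toolchain(python_version = "3.11")

# Root-module-only override; the attributes are placeholders for the ideas listed above.
python.override(
    available_python_versions = ["3.11.9", "3.12.4"],  # restrict allowed versions
    minor_mapping = {"3.11": "3.11.9"},  # set the x.y -> x.y.z mapping
    base_url = "https://mirror.example.com/python-build-standalone/releases/download",
)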

@aignas aignas modified the milestones: v1.0.0, Bzlmod GA Aug 8, 2024
@aignas aignas added the type: feature request, type: bzlmod, and type: toolchain labels Aug 8, 2024
github-merge-queue bot pushed a commit that referenced this issue Sep 9, 2024
This value has been hardcoded for a long time, so let's add the `patch_strip` attribute to the `WORKSPACE` setups now, so that we add less tech debt in the `bzlmod` world later when we need to migrate `bzlmod` users from the hardcoded value to something that can be configured.

This should be backwards compatible because of the default `int` value of `1` for the `patch_strip` attribute.

Summary:
- feat: add `patch_strip` to `python_repository`.
- feat: add the default value of `patch_strip` to `get_release_info`.
- refactor: handle the `patch_strip` key in `TOOL_VERSIONS`.

Work towards #2081.

---------

Co-authored-by: Richard Levasseur <[email protected]>
@aignas aignas mentioned this issue Sep 10, 2024
10 tasks
github-merge-queue bot pushed a commit that referenced this issue Sep 11, 2024
With this PR we get rudimentary unit tests for the `python` bzlmod extension, which allow us to unit test the override behaviour that will become more complex soon.

Summary:
- refactor: inline the python_register_toolchains
- refactor: use toolchain_info to call python_register_toolchains
- refactor: move the registration out of the main loop
- refactor: split the parsing of the modules to a separate function
- test(bzlmod): add python.toolchain module parsing tests

Work towards #2081

---------

Co-authored-by: Richard Levasseur <[email protected]>
aignas added a commit to aignas/rules_python that referenced this issue Sep 13, 2024
aignas added a commit to aignas/rules_python that referenced this issue Sep 13, 2024
github-merge-queue bot pushed a commit that referenced this issue Sep 15, 2024
… in full_version (#2219)

This PR just makes the `MINOR_MAPPING` overridable and explicit in many macros/rules that we own. Even though new API is technically exposed, I am not sure whether it is possible to use it or whether we should advertise it.

An explicit minor_mapping makes it easier to wire up the `python.override` `bzlmod` extension tag class planned for #2081.
github-merge-queue bot pushed a commit that referenced this issue Sep 15, 2024
This `semver` function may turn out to be useful in validating the input for the `python.*override` tag classes to be added in a followup PR. Because this is a refactor of existing code plus added tests, I decided to split it out.

For a POC see #2151, work towards #2081.
aignas added a commit to aignas/rules_python that referenced this issue Sep 15, 2024
Before this PR users could not override how the hermetic toolchain is downloaded under `bzlmod`, even though the same APIs were available to `WORKSPACE` users. With this we also allow root modules to restrict which toolchain versions non-root modules can register, which may be helpful when optimizing CI runtimes so that we don't waste time downloading multiple `micro` versions of the same `3.X` python version, which most of the time have identical behavior.

Work towards bazel-contrib#2081 and this should be one of the last items that are
blocking bazel-contrib#1361 from the API point of view.
github-merge-queue bot pushed a commit that referenced this issue Sep 17, 2024
This is a different take on #2205.

Summary:
  - Remove version constraints from `//python/config_settings:python_version`.
  - Remove the `is_python_config_setting` macro and use a different implementation to retain the same functionality.

This in general will improve how the toolchains are registered and hopefully is a step towards resolving issues for #2081.
aignas added a commit to aignas/rules_python that referenced this issue Sep 17, 2024
Before this PR users could not override how the hermetic toolchain is downloaded under `bzlmod`, even though the same APIs were available to `WORKSPACE` users. With this we also allow root modules to restrict which toolchain versions non-root modules can register, which may be helpful when optimizing CI runtimes so that we don't waste time downloading multiple `micro` versions of the same `3.X` python version, which most of the time have identical behavior.

While at it, tweak the `semver` implementation to allow testing for the absence of values in the original input.

Work towards bazel-contrib#2081 and this should be one of the last items that are
blocking bazel-contrib#1361 from the API point of view.
github-merge-queue bot pushed a commit that referenced this issue Sep 19, 2024
…separate files (#2232)

This makes the dependency management in WORKSPACE much easier to do.

Summary:
- refactor: split out the py_repositories call to a separate file
- refactor: split out the python_repository rule to a separate file
- refactor: split out the standalone interpreter utility function
- refactor: split out the python_register_toolchains function
- refactor: rename the remaining file

Work towards #2081.
github-merge-queue bot pushed a commit that referenced this issue Sep 22, 2024
Before this PR users could not override how the hermetic toolchain is downloaded under `bzlmod`, even though the same APIs were available to `WORKSPACE` users. With this we also allow root modules to restrict which toolchain versions non-root modules can register, which may be helpful when optimizing CI runtimes so that we don't waste time downloading multiple `micro` versions of the same `3.X` python version, which most of the time have identical behavior.

While at it, tweak the `semver` implementation to allow testing for the absence of values in the original input.

Work towards #2081 and this should be one of the last items that are
blocking #1361 from the API point of view.

Replaces #2151.

---------

Co-authored-by: Richard Levasseur <[email protected]>
@aignas aignas removed this from the Bzlmod GA milestone Sep 24, 2024
github-merge-queue bot pushed a commit that referenced this issue Oct 7, 2024
With this change we set the default value of `--python_version` when `python.toolchain` is used in `bzlmod`, and we generate the appropriate config settings based on the registered toolchains and the overrides given by the root module.

This means that we expect `--python_version` to always be set to a non-empty value under `bzlmod`, so we can clean up code that was handling the `//conditions:default` case. This also means that we can in theory drop the requirement for `python_version` in `pip.parse` and just set up dependencies for all packages that we find in the `requirements.txt` file and move on. This is left as future work by myself or anyone willing to contribute. We can also start reusing the same `whl_library` instance for multi-platform packages because the python version flag is always set - this will simplify the layout and makes the feature no longer experimental under bzlmod.

Summary:
* Add `@pythons_hub` to the `WORKSPACE` with dummy data to make pythons_hub work.
* Add `MINOR_MAPPING` and `PYTHON_VERSIONS` to pythons_hub to generate the config settings.
* Remove handling of the default version in `pypi` code under bzlmod.

Work towards #2081, #260, #1708

---------

Co-authored-by: Richard Levasseur <[email protected]>
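
For illustration, the generated config settings are the kind of thing user code can then select() on (a sketch, assuming the usual is_python_X.Y settings generated under //python/config_settings):

load("@rules_python//python:defs.bzl", "py_library")

py_library(
    name = "compat",
    srcs = select({
        # matches when the effective --python_version resolves to 3.11
        "@rules_python//python/config_settings:is_python_3.11": ["compat_py311.py"],
        "//conditions:default": ["compat_default.py"],
    }),
)
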
@rickeylev
Collaborator Author

I'm taking a stab at customizing the target_compatible_with and target_settings parts of a toolchain. The API I'm looking at is to add args to python.single_version_platform_override, so one can do e.g.

python.single_version_platform_override(
   platform = "linux_x86_64-debug",
   url = ".../cpython-3.12.0-debug-full.tar.gz",
   python_version = "3.12.0",
   target_compatible_with = ["@platforms//os:linux", ...],
   target_settings = ["//:is_debug_enabled"]
)

The net effect being: you can pick any of the python-build-standalone URLs and make it available with whatever custom flags you desire. Similarly, you can use any URL for an archive that extracts to something structurally compatible with what python-build-standalone provides.
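
For completeness, the custom //:is_debug_enabled setting referenced above could be defined by the root module as an ordinary user flag plus a config_setting, e.g. (a sketch matching the names in the example):

load("@bazel_skylib//rules:common_settings.bzl", "bool_flag")

# User-defined flag, toggled with --//:debug_enabled=true
bool_flag(
    name = "debug_enabled",
    build_setting_default = False,
)

# Matches when the flag is set; this is what target_settings above points at.
config_setting(
    name = "is_debug_enabled",
    flag_values = {":debug_enabled": "True"},
)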

github-merge-queue bot pushed a commit that referenced this issue May 13, 2025
This makes the python bzlmod extension handle generating the platform-specific toolchain entries ("python_3_10_{platform}"). This is to eventually allow adding additional toolchains that aren't part of the PLATFORMS mapping in versions.bzl and have their own custom constraints.

The main things this refactor does are:
1. The bzlmod phase passes the full list of implementation toolchains to create (previously, it relied on `hub_repo` to generate the implementation names).
2. The name of a toolchain (the toolchain.name arg) and the repo that implements it (the py_toolchain_suite.user_repository_repo arg) are separate. This allows future work to mix in toolchains that point to arbitrary repos.
3. The platform metadata uses a list of target settings instead of a dict of flag values. This allows more arbitrary target settings. For now, flag values on the platform metadata are still looked for because it's known that users patch python/versions.bzl.

Along the way:
* Factor out a platform_info helper in versions.bzl
* Factor out a NOT_ACTUALLY_PUBLIC constant to better denote things that are publicly visible, but actually internal.
* Add some docs to some internals so we don't have to chase down their definitions.

Work towards #2081
@aignas
Collaborator

aignas commented May 14, 2025

Ideally it would be great to start using the MODULE.bazel configuration as the source of truth rather than the versions.bzl file. Maybe we could have the versions.bzl PLATFORMS constant updated automatically based on what we define in the MODULE.bazel?

Though I am not sure where this belongs. The more I think about it, the more I realize that the target python platforms that we want to define have to be somewhat similar/related between the pip and python extensions. What do you think?

@rickeylev
Collaborator Author

MODULE.bazel as source of truth

Yes. I'll take it a step further: I like the idea of fetching a remote config of all available toolchains (i.e. all ~500 python-build-standalone runtimes). But, by default, we only register one toolchain -- what works on the host. If people want more than that, they have to opt in somehow (e.g. MODULE.bazel changes). Similarly, ideally, waving the magic wand 🧙 some more, the set of toolchains built into the rules_python source code is only the most recent supported python versions for popular platforms. Anything beyond that requires the remote manifest or user opt-in.

But, yeah, quite a bit of refactoring and work is needed to get to that point. In the meantime, the upcoming PR to let single_version_platform_override add mostly-arbitrary PBS runtimes will go a long way.

PLATFORMS is updated based on what is defined

Yeah, this is pretty close to my follow-up to the recent refactor PR to python.bzl. The design of "there's a dict of platform->settings; a toolchain has a platform key, which is looked up in that dict" is baked pretty deeply into various places. My upcoming PR adds a second platforms dict that single_version_platform_override adds entries into; the two are merged, and the merged result gets passed down to the functions that need the platform dict (instead of using the PLATFORMS constant).
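
In Starlark terms that merge is essentially the following (simplified, with hypothetical variable names):

# PLATFORMS comes from versions.bzl; override_platforms is accumulated from
# single_version_platform_override tags. Root-module overrides win on key collisions.
platforms = dict(PLATFORMS)
platforms.update(override_platforms)
# `platforms` (not the PLATFORMS constant) is then what gets passed to the
# functions that need the platform -> settings mapping.
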

Ideally, I don't think we should be defining these platform-description-strings. The values we have today mostly just derive from the python-build-standalone file names and the need to have a unique, valid, repo name for the downloaded runtime.

the toolchain platforms have to be similar to the pip platforms

How so? Do you mean there is some code in the pip-integration that looks at the PLATFORMS global to do something-or-other? Ah, maybe the part where pip.parse tries to automatically look up the correct interpreter? The need for that should go away with pipstar, right? It'll only be needed for sdist building -- and if we move that to the build phase, then that need will go away (and we're only left with the problem of setup_requires et al for sdist building)

@aignas
Collaborator

aignas commented May 15, 2025

Yes. I'll take it a step further: I like the idea of fetching a remote config of all available toolchains (i.e. all ~500 python-build-standalone runtimes). But, by default, we only register one toolchain -- what works on the host. If people want more than that, they have to opt in somehow (e.g. MODULE.bazel changes). Similarly, ideally, waving the magic wand 🧙 some more, the set of toolchains built into the rules_python source code is only the most recent supported python versions for popular platforms. Anything beyond that requires the remote manifest or user opt-in.

But, yeah, quite a bit of refactoring and work is needed to get to that point. In the meantime, the upcoming PR to let single_version_platform_override add mostly-arbitrary PBS runtimes will go a long way.

Yeah this is a big one. I have the following concerns about the feasibility:

  • If transitive modules can add configuration it will be a mess.
  • If the root module has to provide configuration for transitive modules, it will be a mess as well.
  • Providing toolchains from a single python-build-standalone release is probably an easy to understand compromise.
  • Doing it only for a particular host: I think I like the idea in theory. If you need extra platforms, you need to do more, but in practice adding only the host platform has to happen in the repository_ctx and that is not great. It is much easier to just add everything for a particular version and use target_settings to discriminate.

The values we have today mostly just derive from the python-build-standalone file names

Yeah, this riffs on what you are saying above. Maybe we could split the hermetic python toolchain into a separate module, where people can easily register stuff? If the pip extension did not depend on Python, that would actually be a feasible architecture.

How so? Do you mean there is some code in the pip-integration that looks at the PLATFORMS global to do something-or-other?

My thinking was that the platform description (the thing that you said you would like to leave out) is right now done in three places: the uv extension, the python extension, and pip. Within pip, with pipstar, it has to be done in 3 places: the whl_library marker_setting, which selects the deps based on the target configuration; where we evaluate expressions in the requirements files (currently we use the Python interpreter, but we could ask the user to provide a target platform configuration if they provide a requirements/lock file with markers); and where the wheel filenames are mapped to config settings - we depend on the fact that manylinux_x86_64 maps to os:linux, cpu:x86_64 and a few extra flags. We don't depend on the platform definitions, but in order to filter out wheels that the user does not want/need, we have to assume the target platform triples (i.e., cp39_linux_x86_64 and similar).

This is why I was thinking that the target platform concept would be nice, because that would allow us to include/create only the actually required whl_library instances. That is probably where I would love to know the incantation that I need to say with the magic wand here.
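
As a concrete illustration of the filename-to-config-settings mapping described above (a sketch only; the real machinery in rules_python derives these differently):

# A manylinux_x86_64 wheel is only usable when the target platform matches
# the constraints implied by its filename tag.
config_setting(
    name = "is_manylinux_x86_64",
    constraint_values = [
        "@platforms//os:linux",
        "@platforms//cpu:x86_64",
    ],
    # in practice, python-version and libc/ABI related flag_values also apply
)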
