Imprecise information on the Challenges using setuptools section. #67

Open
abravalheri opened this issue Mar 27, 2023 · 9 comments

@abravalheri

Hello, this is a follow up on https://discuss.python.org/t/python-packaging-documentation-feedback-and-discussion/24833/78.

I believe that there are some imprecisions in the section: https://www.pyopensci.org/python-package-guide/package-structure-code/python-package-build-tools.html#challenges-using-setuptools

For example:

setuptools will build a project without a name or version if you are not using a pyproject.toml file to store metadata.

I don't know if I am understanding this correctly, but setuptools can derive the name/version information from any of the configuration files setup.py, setup.cfg or pyproject.toml. I am not sure why that would be problematic...

Setuptools also will include all of the files in your package repository if you do not explicitly tell it to exclude files using a MANIFEST.in file

By default, setuptools adds only a subset of the files in the package repository to the distribution, not all of them. However, we do recommend that users adopt a plugin like setuptools-scm so that the VCS can be used as the single source of information. With setuptools-scm the approach should be very similar to what hatch does (I believe it tries to parse .gitignore, but I might be wrong), or flit (when invoked as the flit CLI at least), and probably other backends.

My personal opinion is that MANIFEST.in is only needed if you want a high degree of customization and/or are not happy with using VCS (e.g. some people disagree on a conceptual level with using VCS info for builds).

There is some information about it on https://setuptools.pypa.io/en/latest/userguide/miscellaneous.html.
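
To make the "customization" case concrete, here is a minimal sketch of a helper that writes a MANIFEST.in which trims files out of the sdist. It relies only on the `prune` and `exclude` commands that MANIFEST.in supports; the helper name `manifest_in` and the directory and file names are made up for illustration.

```python
def manifest_in(prune=(), exclude=()):
    """Return MANIFEST.in text that drops directories and single files.

    `prune <dir>` removes a whole directory tree from the sdist,
    `exclude <pattern>` removes individual files matching the pattern.
    """
    lines = [f"prune {directory}" for directory in prune]
    lines += [f"exclude {pattern}" for pattern in exclude]
    return "\n".join(lines) + "\n"


# Hypothetical choices: skip built docs and CI files, keep everything else.
text = manifest_in(prune=["docs/_build", ".github"], exclude=[".gitignore"])
print(text)
```

Writing the returned text to a file named MANIFEST.in at the project root is enough for setuptools to pick it up on the next build.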

@lwasser
Member

lwasser commented Mar 28, 2023

Hi @abravalheri!! Thank you for this issue. What surprised me when I built a package is that, by default, setuptools included the docs directory and some CSS files, so it seemed to include more than I was expecting in the sdist. If I recall correctly, this was particularly true when I built locally: it was also including the documentation build files (HTML and raw files). In our clean GitHub build, by contrast, it seems to work as expected: the sdist on PyPI is not bad, it only includes the docs/ dir.

This made me think we should always use a manifest with setuptools, given the default behavior of adding more than you might want in the sdist. For instance, should we include markdown and rst documentation files in an sdist? Below is a screenshot taken from a local build of stravalib, which I've been testing this on. Notice that ALL files in the repo are included in the sdist. I worry a user would not include a manifest and would end up with bloated sdist files like this, which then burdens PyPI / warehouse as storage increases over time.

[Screenshot: file listing of the locally built stravalib sdist]

I think the name issue relates to setuptools not checking for a project name if you are using a setup.py or .cfg file to store metadata. It does check the name in the pyproject.toml file. This was a bug that someone told me about, but because I don't use setup.py I don't have a project to test this. Are you saying that if the setup.py or setup.cfg metadata is missing a project name, setuptools will check to ensure that a project name is added before a build?

Many thanks for helping me sort this all out!

@abravalheri
Author

Hi @lwasser, I would like to cover a few aspects regarding your comment. I am not sure if I will manage to provide a cohesive answer, but please find below my attempt:

  1. By default setuptools will include in the sdist (if they exist):

    • pyproject.toml, setup.cfg, setup.py
    • README, README{.rst,.md,.txt}
    • tests/test*.py and test/test*.py
    • the Python files that are part of the distribution
    • files pointed by package_data, data_files
    • all C sources listed as part of extensions or C libraries
      in the setup script (it does not include C headers)
    • metadata files generated by setuptools (e.g. PKG-INFO, entry-points.txt)

    You can verify that in practice by running a small example:

    > docker run --rm -it python:3.10 /bin/bash
    mkdir -p /tmp/myproj
    cd /tmp/myproj
    mkdir -p src/mymod/
    mkdir docs
    mkdir tests
    mkdir -p .github/workflows
    touch src/mymod/__init__.py
    touch docs/index.rst
    touch docs/index.html
    touch tests/test_mymod.py
    touch .github/workflows/main.yaml
    cat <<EOF > pyproject.toml
    [build-system]
    requires = ["setuptools"]
    build-backend = "setuptools.build_meta"
    [project]
    name = "myproj"
    version = "0.42"
    EOF
    python -m venv .venv
    .venv/bin/python -m pip install -U build
    .venv/bin/python -m build
    tar tf dist/*.tar.gz
    # myproj-0.42/
    # myproj-0.42/PKG-INFO
    # myproj-0.42/pyproject.toml
    # myproj-0.42/setup.cfg
    # myproj-0.42/src/
    # myproj-0.42/src/mymod/
    # myproj-0.42/src/mymod/__init__.py
    # myproj-0.42/src/myproj.egg-info/
    # myproj-0.42/src/myproj.egg-info/PKG-INFO
    # myproj-0.42/src/myproj.egg-info/SOURCES.txt
    # myproj-0.42/src/myproj.egg-info/dependency_links.txt
    # myproj-0.42/src/myproj.egg-info/top_level.txt
    # myproj-0.42/tests/
    # myproj-0.42/tests/test_mymod.py
  2. As a remark, I would say that in general adding both docs and tests to the sdist is considered good practice (there is some disagreement, but my overall impression is that it is a 60%/40% split or further apart). You can see a discussion about this topic in https://discuss.python.org/t/should-sdists-include-docs-and-tests/14578.

  3. I imagine that the reason why you see docs in your project is because you are using setuptools-scm.
    setuptools-scm will tell setuptools to add all files tracked by the VCS into the sdist.
    I believe that you should not be seeing any "generated" .html file (unless you are adding those to your git repo for tracking, or forgot to configure .gitignore).

    Indeed, I personally recommend that people go for that solution, because I believe it:

    a. is easier
    b. will include by default docs and tests (which is kind of considered best practice)
    c. will automatically include any script and configuration file for tools used during development
    d. will automatically include examples
    e. will include everything that is needed for a developer to work with your project
    (effectively, your sdist will work as a snapshot of your project with added "Python package metadata").

    Some people don't like to see CI files in the sdist (which is a fair opinion).
    However, I personally don't mind those and I actually think they are useful
    (a developer might inspect your .github/workflows/*.yml to understand how to run the test suite).
    If you don't like certain files, you can trim out excesses with MANIFEST.in.
    In the same way, I believe most of the backends also have ad hoc solutions for this kind of customization.

The last point is a bit of personal opinion:

  • Isn't the concern about including a few text files (e.g. docs/*.rst) in the sdist a bit of premature optimization?
    I believe that the main problem that PyPI has is due to large binary artefacts (e.g. compiled native libraries),
    especially if you need to produce multiple wheels (e.g. per-OS, per-architecture, ...).
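
The `tar tf` check from the example above can also be done programmatically. The following is a minimal sketch using only the standard library: it counts sdist members per top-level entry, which makes it easy to spot whether `docs/` or `.github/` ended up in the archive. The function names and the in-memory demo archive are assumptions made for illustration; for a real build you would open the file under `dist/` instead.

```python
import io
import tarfile
from collections import Counter


def sdist_top_level_counts(fileobj):
    """Count regular files in an sdist per top-level entry (src, tests, ...)."""
    counts = Counter()
    with tarfile.open(fileobj=fileobj, mode="r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            # Drop the leading "<name>-<version>/" directory of the sdist.
            parts = member.name.split("/")[1:]
            if parts:
                counts[parts[0]] += 1
    return counts


def make_demo_sdist(names):
    """Build a tiny in-memory .tar.gz with empty files (demo only)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name in names:
            tar.addfile(tarfile.TarInfo(name), io.BytesIO(b""))
    buffer.seek(0)
    return buffer


demo = make_demo_sdist([
    "myproj-0.42/PKG-INFO",
    "myproj-0.42/pyproject.toml",
    "myproj-0.42/src/mymod/__init__.py",
    "myproj-0.42/tests/test_mymod.py",
])
for entry, count in sorted(sdist_top_level_counts(demo).items()):
    print(entry, count)
```

For an actual sdist, something like `with open("dist/myproj-0.42.tar.gz", "rb") as f: sdist_top_level_counts(f)` should work the same way.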

@abravalheri
Author

abravalheri commented Mar 28, 2023

I think the name issue relates to setuptools not checking for a project name if you are using a setup.py or .cfg file to store metadata. It does check the name in the pyproject.toml file. This was a bug that someone told me about, but because I don't use setup.py I don't have a project to test this. Are you saying that if the setup.py or setup.cfg metadata is missing a project name, setuptools will check to ensure that a project name is added before a build?

Setuptools can automatically derive a project name if you don't specify one (if you are using pyproject.toml without [project], setup.cfg or setup.py).
For example:

> docker run --rm -it python:3.10 /bin/bash
mkdir -p /tmp/myproj
cd /tmp/myproj
mkdir -p src/mymod/
touch src/mymod/__init__.py
cat <<EOF > pyproject.toml
[build-system]
requires = ["setuptools", "setuptools-scm"]
build-backend = "setuptools.build_meta"
EOF
python -m venv .venv
.venv/bin/python -m pip install -U build
.venv/bin/python -m build
ls dist/*.whl
# dist/mymod-0.0.0-py3-none-any.whl

or

> docker run --rm -it python:3.10 /bin/bash
mkdir -p /tmp/myproj
cd /tmp/myproj
mkdir -p src/mymod/
touch src/mymod/__init__.py
cat <<EOF > setup.py
from setuptools import setup
setup()
EOF
python -m venv .venv
.venv/bin/python -m pip install -U build
.venv/bin/python -m build
ls dist/*.whl
# dist/mymod-0.0.0-py3-none-any.whl

You can see in these examples that setuptools automatically derives the name mymod from the files in your project (it will also derive a "degenerate version": 0.0.0).
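
If you prefer not to rely on the file name, the derived name and version can also be read from the metadata inside the wheel. This is a minimal sketch using only the standard library; `wheel_name_and_version` is a hypothetical helper, and the in-memory wheel below only mimics the `*.dist-info/METADATA` file of the example so that the snippet runs without a build.

```python
import io
import zipfile
from email.parser import Parser


def wheel_name_and_version(wheel):
    """Return (Name, Version) from a wheel's *.dist-info/METADATA file.

    `wheel` can be a path or a binary file object.
    """
    with zipfile.ZipFile(wheel) as archive:
        metadata_path = next(
            name for name in archive.namelist()
            if name.endswith(".dist-info/METADATA")
        )
        text = archive.read(metadata_path).decode("utf-8")
    # Core metadata uses an email-header-like "Key: value" format.
    headers = Parser().parsestr(text)
    return headers["Name"], headers["Version"]


# Demo stand-in for dist/mymod-0.0.0-py3-none-any.whl (assumed content).
demo_wheel = io.BytesIO()
with zipfile.ZipFile(demo_wheel, "w") as archive:
    archive.writestr(
        "mymod-0.0.0.dist-info/METADATA",
        "Metadata-Version: 2.1\nName: mymod\nVersion: 0.0.0\n",
    )
demo_wheel.seek(0)

print(wheel_name_and_version(demo_wheel))
```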

There are some discussions that seem to associate the ability of setuptools to build projects with incomplete metadata with user confusion and problems. In my experience/opinion, there is no direct association. Instead, there are a few cases in which, somehow, the wrong version of setuptools ends up being used (e.g. leaks in the virtual environment created by the frontend, a wrong version of setuptools specified in pyproject.toml, or a missing pyproject.toml that causes any system-wide installation of setuptools to be used). Old versions of setuptools will not be able to read the information present in setup.cfg or pyproject.toml, so the metadata looks incomplete even when it was provided.
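
As a rough way to diagnose this situation, the sketch below checks which setuptools version is visible in the current environment and whether it is recent enough to read the `[project]` table of pyproject.toml. It assumes that support for `[project]` arrived in setuptools 61; the function names are hypothetical. Note that a build frontend usually creates its own isolated environment, so this only tells you about the environment you run it in.

```python
from importlib import metadata

# Assumption: [project] metadata in pyproject.toml needs setuptools >= 61.
PYPROJECT_MIN_MAJOR = 61


def reads_project_table(setuptools_version):
    """Return True if this setuptools version can read [project] metadata."""
    major = int(setuptools_version.split(".")[0])
    return major >= PYPROJECT_MIN_MAJOR


def installed_setuptools_version():
    """Return the installed setuptools version, or None if it is absent."""
    try:
        return metadata.version("setuptools")
    except metadata.PackageNotFoundError:
        return None


found = installed_setuptools_version()
if found is None:
    print("setuptools is not installed in this environment")
else:
    print(found, "reads [project]:", reads_project_table(found))
```

If the check fails for the build environment, stating a lower bound in the `requires` list of `[build-system]` (for example `"setuptools>=61"`) is one way to make the frontend install a suitable version.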

My personal opinion is that, if the user has a setup.py/pyproject.toml in a directory and they decide to actively run python -m build, they do want to build a Python package... So there is no reason for setuptools to get in the way; instead it should try to assist the user in the best way possible (e.g. by deriving the project name automatically).

You can see a few discussions on the topic in the following links:

@abravalheri
Author

Hi @lwasser, did you have a chance to look at the topics I discussed above?

@lwasser
Member

lwasser commented Jun 12, 2023

Hey @abravalheri, thank you for following up. Let me test again. I was still having issues with setuptools adding too many files by default, but I may have done something wrong. So rather than sending you on a loop... let me please test this out again this week. Thank you so much for following up!

@lwasser
Member

lwasser commented Jun 13, 2023

OK, I've finally tested this (thank you for your patience). @abravalheri, let's update our guide to ensure we have the behaviors around setuptools correct. Are you open to submitting a PR with the corrections, by chance?

Many thanks!!

@lwasser
Member

lwasser commented Jun 13, 2023

@all-contributors please add @abravalheri for code, design

@allcontributors
Contributor

@lwasser

I've put up a pull request to add @abravalheri! 🎉

@lwasser
Member

lwasser commented Nov 3, 2023

@abravalheri, I wondered if you could answer another question related to the MANIFEST.in file and setuptools, asked here in our discourse. There has been some discussion around what to do with data files in a distribution that I suspect you could shed some light on. Many thanks for considering this.
