Skip to content

Investigate switching to urllib3 #2751

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jku opened this issue Dec 11, 2024 · 4 comments · Fixed by #2762
Closed

Investigate switching to urllib3 #2751

jku opened this issue Dec 11, 2024 · 4 comments · Fixed by #2762

Comments

@jku
Copy link
Member

jku commented Dec 11, 2024

We currently use requests in the default ngclient fetcher (RequestsFetcher).

Internally requests uses urllib3 and we don't seem to use any functionality outside of what urllib3 provides: connection pool management and streaming are supported. Switching to urllib3 would mean:

  • we would have (at least) one less dependency
  • the API we would use is arguably better maintained

This could be achieved by first writing an alternative FetcherInterface implementation that uses urllib3: if it looks good we can then make it the default and remove the requests-based implementation.

@jku
Copy link
Member Author

jku commented Jan 29, 2025

Documenting findings from the PR:

  • urllib3 API is almost a 1:1 match
  • the one exception is that requests uses certifi for TLS certs and urllib3 by default does not. We could keep using certifi but this seems like a good time to switch to system certificates instead. Reasons are well documented in https://truststore.readthedocs.io/en/latest/
  • We likely want to support RequestsFetcher for at least one release even if the new fetcher is the default to provide easy workaround in case users have issues with the new default
  • Once we can remove the requests dependency we also drop idna, charset-normalizer and certifi: this feels like a win to me

@jku
Copy link
Member Author

jku commented Feb 11, 2025

For historical reference:

  • Almost the opposite change was done in Update TUF to handle HTTPS proxies #781 (replace urllib use with requests) 7 years ago
  • That change made some sense: urllib (a part of Python standard library) is not really usable and I'm sure would break in various cases. requests was indeed the common choice at the time

Why we are (kind of) reversing that decision now:

  • urllib3 (not part of Python standard library) is what actually implements the useful things in requests and is very well maintained and has a reasonable API: Using urllib3 is not the same as implementing our own http code. urllib3 likely was the best choice even in 2018
  • Getting rid of four dependencies is significant
  • we get (as far as I can tell) no value from requests over urllib3: proxies for example should work just as well
  • urllib3 not bundling SSL certificate handling is a positive (PyPI really should not be the CA cert distributor)
  • If something does end up regressing, users can always setup their own fetcher implementation

@jku
Copy link
Member Author

jku commented Feb 12, 2025

we get (as far as I can tell) no value from requests over urllib3: proxies for example should work just as well

Actually, looking further into this: urllib3 does not "automatically" handle the proxy related env variables at least. hmmm...

Basic support for http proxy variables is not that tricky -- legacy urllib contains code that parses all the variables so we'd just have to choose a proxy based on them -- but it's not something I'd like to do in python-tuf

@jku
Copy link
Member Author

jku commented Feb 13, 2025

I've talked to Seth Larson (urllib3 dev): they've made a conscious decision to not handle proxy variables -- this is understandable for them but puts this project in a bit of a pickle... urllib3 is so close to doing everything we need but not quite there.

I think loads of users are behind dumb corporate proxies and their only way to get software working is proxy variables: I don't think we can just hand wave the issue away.

I think we have two choices:

  • abandon the urllib3 switch
  • implement the proxy env var management ourselves -- I don't think this makes sense... but it might be worth it to drop four dependencies

As an experiment I've handled what I think are the sensible parts of proxy variables in ~70 lines of code: this feels acceptable. but would need tests etc still.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant