GH-126367: Fix urllib.request.url2pathname() colon handling in URLs on Windows #127752
samtstephens
commented
Dec 9, 2024
- Issue: urllib.request.url2pathname() mishandles colons in URLs on Windows #126367
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the
url = url.replace(':', '|')
if not '|' in url:
url = url.replace(':', '|', 1)
if not '|' in url or ('|' in url and not (
Can this not be simplified? I had a hard time understanding the code:
url = url.replace(':', '|', 1)
if not '|' in url or ('|' in url and not (
    url.startswith('|')  # error case
    or '/|' in url  # error case
    or (len(url) > 2 and url[0] == '/' and url[1].isalpha() and url[2] == '|')
)):
Thanks for taking a look at this
It's perhaps worth taking a step back to think about what we're aiming to do here, and why Windows drive paths need special handling.
The simplest way to create a file: URI is to prepend file: to a path, so:
- foo/bar.txt becomes file:foo/bar.txt (relative file URI, not widely supported)
- c:/foo becomes file:c:/foo (Windows)
- /etc/hosts becomes file:/etc/hosts (POSIX)
For conformance with RFC 1738, folks began adding an empty authority section for absolute POSIX paths, so:
- /etc/hosts becomes file:///etc/hosts (or file://localhost/etc/hosts more explicitly)
This works nicely on POSIX because absolute paths must start with a slash, just like the path component of URLs.
But what about Windows? A path like C:/foo can't be directly mapped to a URL path component because it doesn't start with a slash. The solution folks came up with: just add a slash prefix!
- C:/foo becomes file:///C:/foo (or file://localhost/C:/foo more explicitly)
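To make the forward operation concrete, here is a minimal sketch of the mapping described above. The helper name path_to_file_uri is hypothetical (it is not part of urllib), it assumes a drive is any single character followed by a colon, and it skips percent-encoding entirely:

```python
def path_to_file_uri(path):
    """Hypothetical helper: build a file: URI the way the comment describes."""
    # Use forward slashes, as in the path component of a URL.
    path = path.replace('\\', '/')
    # A drive path like 'C:/foo' has no leading slash, so add one.
    if path[1:2] == ':':
        path = '/' + path
    if path.startswith('/'):
        # Absolute path: insert the empty authority section ('//').
        return 'file://' + path
    # Relative path: just prepend the scheme.
    return 'file:' + path
```

With this sketch, /etc/hosts maps to file:///etc/hosts and C:/foo maps to file:///C:/foo, matching the examples above.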
Our task in this PR is to undo that operation. We need to:
- Detect when a URL's path component begins with a slash and then a Windows drive, like /A:/foo/bar or /B|/doo
- Remove the leading slash if so
- If the second character is |, convert it to :
Once those steps are done, all that remains is to convert slashes and unquote URL entities.
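The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code that was eventually merged: the function name url_path_to_windows_path is hypothetical, and the input is assumed to be just the path component of the URL (scheme and authority already removed).

```python
from urllib.parse import unquote

def url_path_to_windows_path(url_path):
    """Hypothetical sketch: undo the '/C:/foo' encoding of a drive path."""
    # Step 1: detect a slash followed by a drive, e.g. '/A:/foo' or '/B|/doo'.
    if url_path[:1] == '/' and url_path[2:3] in (':', '|'):
        # Step 2: remove the leading slash.
        url_path = url_path[1:]
        # Step 3: a '|' in second position becomes ':'.
        if url_path[1] == '|':
            url_path = url_path[0] + ':' + url_path[2:]
    # Finally, convert slashes and unquote URL entities.
    return unquote(url_path.replace('/', '\\'))
```

Only the character right after the drive letter is ever rewritten, so colons or pipes later in the path are left alone.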
As far as detecting Windows drives goes, I suggest having a look at the implementation of os.path.splitroot() or os.path.isabs() in ntpath.py:
Lines 80 to 95 in 5c89adf
def isabs(s):
    """Test whether a path is absolute"""
    s = os.fspath(s)
    if isinstance(s, bytes):
        sep = b'\\'
        altsep = b'/'
        colon_sep = b':\\'
        double_sep = b'\\\\'
    else:
        sep = '\\'
        altsep = '/'
        colon_sep = ':\\'
        double_sep = '\\\\'
    s = s[:3].replace(altsep, sep)
    # Absolute: UNC, device, and paths with a drive and root.
    return s.startswith(colon_sep, 1) or s.startswith(double_sep)
We don't need to check whether the drive letter is alphabetic IMHO. It's probably sufficient to check that 3rd character in the URL path is a colon or a pipe (after checking that the first character is a slash).
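As a rough illustration of that check, the predicate below tests only the first and third characters of the URL path. The name starts_with_drive is hypothetical, and slicing is used so that short inputs return False instead of raising IndexError:

```python
def starts_with_drive(url_path):
    """Hypothetical predicate: slash, any one character, then ':' or '|'."""
    # url_path[2:3] is '' for short strings, which is not in the tuple.
    return url_path[:1] == '/' and url_path[2:3] in (':', '|')
```

Because the drive letter itself is not inspected, a path such as /1:/foo is also treated as having a drive, which is the looser behaviour suggested here.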
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase And if you don't make the requested changes, you will be put in the comfy chair!
Don't worry about breaking the following test case - there's no reason for cpython/Lib/test/test_urllib.py Lines 1498 to 1499 in 5c89adf
All this code smells funny and should be binned IMO: Lines 28 to 40 in 5c89adf
Hi, thank you for the detailed response, I really appreciate it!! We will look into these changes and we'll see if we can come up with a better solution. I can definitely say we felt a bit out of our depth, not having a lot of in-depth knowledge with how the files should look, so this background is appreciated.
You didn't pick the easiest issue to work on to be honest 😅 file URIs are a bit of a dark art...
Hey @samtstephens, is there anything I can do to help with this? Ta
I'm keen to get this sorted in time for python 3.14, so I've opened a PR here: #131428
Closing this PR as the bug has been fixed elsewhere. Thanks all the same for taking a look!