
Commit 66cdb2b

barneygale, AA-Turner and picnixz authored
GH-123599: url2pathname(): handle authority section in file URL (#126844)
In `urllib.request.url2pathname()`, if the authority resolves to the current host, discard it. If an authority is present but resolves somewhere else, then on Windows we return a UNC path (as before), and on other platforms we raise `URLError`. This affects `pathlib.Path.from_uri()` in the same way, except that it raises `ValueError` rather than `URLError`.

Co-authored-by: Adam Turner <[email protected]>
Co-authored-by: Bénédikt Tran <[email protected]>
1 parent a214db0 commit 66cdb2b
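The *authority* in the commit message is the part of a `file:` URL between `file://` and the next slash. The public `urllib.parse.urlsplit()` exposes it as `netloc`, which makes it easy to illustrate. This only shows the concept and is not the code path `url2pathname()` takes, because the commit splits the authority off with the private helper `urllib.parse._splithost()`:

```python
from urllib.parse import urlsplit

# Illustration only: urlsplit() is the public URL parser. The commit itself
# separates the authority with the private helper urllib.parse._splithost().
examples = [
    'file:///etc/hosts',            # empty authority
    'file://localhost/etc/hosts',   # 'localhost' authority
    'file://127.0.0.1/etc/hosts',   # IP-address authority
    'file://server/share/a.txt',    # remote host (a UNC share on Windows)
    'file:////server/share/a.txt',  # empty authority; path starts with '//'
]

for url in examples:
    parts = urlsplit(url)
    print(f'{url:30} authority={parts.netloc!r:13} path={parts.path!r}')
```

The last two rows show why the number of slashes matters. `file://server/...` names a host, while `file:////server/...` has an empty authority and a path that begins with `//`.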


8 files changed, +106 -48 lines

Doc/library/pathlib.rst

+6
@@ -871,6 +871,12 @@ conforming to :rfc:`8089`.
 
    .. versionadded:: 3.13
 
+   .. versionchanged:: next
+      If a URL authority (e.g. a hostname) is present and resolves to a local
+      address, it is discarded. If an authority is present and *doesn't*
+      resolve to a local address, then on Windows a UNC path is returned (as
+      before), and on other platforms a :exc:`ValueError` is raised.
+
 
 .. method:: Path.as_uri()
 
Doc/library/urllib.request.rst

+12-5
@@ -158,16 +158,16 @@ The :mod:`urllib.request` module defines the following functions:
       >>> 'file:' + pathname2url(path)
       'file:///C:/Program%20Files'
 
-   .. versionchanged:: 3.14
-      Paths beginning with a slash are converted to URLs with authority
-      sections. For example, the path ``/etc/hosts`` is converted to
-      the URL ``///etc/hosts``.
-
    .. versionchanged:: 3.14
       Windows drive letters are no longer converted to uppercase, and ``:``
       characters not following a drive letter no longer cause an
       :exc:`OSError` exception to be raised on Windows.
 
+   .. versionchanged:: 3.14
+      Paths beginning with a slash are converted to URLs with authority
+      sections. For example, the path ``/etc/hosts`` is converted to
+      the URL ``///etc/hosts``.
+
 
 .. function:: url2pathname(url)
 
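The following is a minimal sketch of the POSIX `pathname2url()` rule in the relocated note, which adds an empty authority to paths that begin with a slash. `posix_pathname2url` is a hypothetical name and not the library function. The sketch assumes UTF-8 instead of the filesystem encoding, and its example inputs mirror the new `test_pathname2url` cases:

```python
from urllib.parse import quote


def posix_pathname2url(path):
    """Hypothetical sketch of POSIX pathname2url() after this change.

    Assumes UTF-8 with strict error handling; the real function uses
    sys.getfilesystemencoding() and sys.getfilesystemencodeerrors().
    """
    if path.startswith('/'):
        # Prepend an explicitly empty authority ('//') so that a path such as
        # '//a/b.c' cannot be misread as host 'a' when the URL is parsed.
        path = '//' + path
    # quote() leaves '/' alone but percent-encodes '%', '#', spaces, etc.
    return quote(path, encoding='utf-8', errors='strict')


for p in ['/etc/hosts', '//a/b.c', 'a/b.c', '/a/b%#c']:
    print(f'{p!r:14} -> {posix_pathname2url(p)!r}')
```

For an absolute path the resulting URL always carries an explicitly empty authority, so the first path component is never mistaken for a host when the URL is converted back.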
@@ -186,6 +186,13 @@ The :mod:`urllib.request` module defines the following functions:
       characters not following a drive letter no longer cause an
       :exc:`OSError` exception to be raised on Windows.
 
+   .. versionchanged:: next
+      This function calls :func:`socket.gethostbyname` if the URL authority
+      isn't empty or ``localhost``. If the authority resolves to a local IP
+      address then it is discarded; otherwise, on Windows a UNC path is
+      returned (as before), and on other platforms a
+      :exc:`~urllib.error.URLError` is raised.
+
 
 .. function:: getproxies()
 
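The note above describes an authority check. The sketch below illustrates it under several assumptions. `is_local_authority`, `fake_resolve` and the `local_addresses` default are hypothetical stand-ins. The resolver is injectable so the example runs without DNS. The real code compares against the addresses collected by `FileHandler.get_names()` rather than a fixed tuple. It also catches the exact pair `(socket.gaierror, AttributeError)`, whereas this sketch catches `OSError`, the base class of `socket.gaierror`:

```python
import socket


def is_local_authority(authority, resolve=socket.gethostbyname,
                       local_addresses=('127.0.0.1',)):
    """Hypothetical sketch: does a file URL authority name this machine?"""
    # A missing or empty authority, or 'localhost', needs no lookup at all.
    if not authority or authority == 'localhost':
        return True
    try:
        address = resolve(authority)
    except OSError:  # socket.gaierror is a subclass of OSError
        return False
    # Only the exact string 'localhost' skips the lookup; 'localhost:' (with
    # a port separator) is passed to the resolver like any other name.
    return address in local_addresses


def fake_resolve(name):
    # Stand-in for DNS: 'myhost' maps to loopback, 'server' to a remote
    # documentation address (RFC 5737), anything else fails to resolve.
    table = {'myhost': '127.0.0.1', 'server': '192.0.2.10'}
    if name not in table:
        raise socket.gaierror(f'cannot resolve {name!r}')
    return table[name]


for authority in ['', 'localhost', 'myhost', 'server', 'localhost:']:
    print(f'{authority!r:13} local={is_local_authority(authority, fake_resolve)}')
```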
Doc/whatsnew/3.14.rst

+19
@@ -1197,6 +1197,25 @@ urllib
   supporting SHA-256 digest authentication as specified in :rfc:`7616`.
   (Contributed by Calvin Bui in :gh:`128193`.)
 
+* Improve standards compliance when parsing and emitting ``file:`` URLs.
+
+  In :func:`urllib.request.url2pathname`:
+
+  - Discard URL authorities that resolve to a local IP address.
+  - Raise :exc:`~urllib.error.URLError` if a URL authority doesn't resolve
+    to ``localhost``, except on Windows where we return a UNC path.
+
+  In :func:`urllib.request.pathname2url`:
+
+  - Include an empty URL authority when a path begins with a slash. For
+    example, the path ``/etc/hosts`` is converted to the URL ``///etc/hosts``.
+
+  On Windows, drive letters are no longer converted to uppercase, and ``:``
+  characters not following a drive letter no longer cause an :exc:`OSError`
+  exception to be raised.
+
+  (Contributed by Barney Gale in :gh:`125866`.)
+
 
 uuid
 ----

Lib/pathlib/__init__.py

+5-1
@@ -1278,8 +1278,12 @@ def from_uri(cls, uri):
         """Return a new path from the given 'file' URI."""
         if not uri.startswith('file:'):
             raise ValueError(f"URI does not start with 'file:': {uri!r}")
+        from urllib.error import URLError
         from urllib.request import url2pathname
-        path = cls(url2pathname(uri.removeprefix('file:')))
+        try:
+            path = cls(url2pathname(uri.removeprefix('file:')))
+        except URLError as exc:
+            raise ValueError(exc.reason) from None
         if not path.is_absolute():
             raise ValueError(f"URI is not absolute: {uri!r}")
         return path
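`Path.from_uri()` continues to raise `ValueError` for unusable URIs by translating the new `URLError`. Below is a minimal sketch of that translation. `to_path_or_valueerror` and `reject_remote` are hypothetical names, and `reject_remote` merely stands in for `url2pathname()` when it meets a non-local authority on POSIX:

```python
from urllib.error import URLError


def to_path_or_valueerror(convert, url):
    """Hypothetical helper mirroring the try/except added to from_uri()."""
    try:
        return convert(url)
    except URLError as exc:
        # Re-raise with the same reason text; 'from None' suppresses the
        # URLError context so callers see only the ValueError.
        raise ValueError(exc.reason) from None


def reject_remote(url):
    # Stand-in for url2pathname() given a non-local authority on POSIX.
    raise URLError('file:// scheme is supported only on localhost')


try:
    to_path_or_valueerror(reject_remote, '//foo/bar')
except ValueError as err:
    print(f'ValueError: {err}')
    print('context suppressed:', err.__suppress_context__)
```

Using `from None` keeps tracebacks short. Callers that already handle `ValueError` from `from_uri()` need no change.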

Lib/test/test_pathlib/test_pathlib.py

+5-1
@@ -3285,10 +3285,14 @@ def test_handling_bad_descriptor(self):
     def test_from_uri_posix(self):
         P = self.cls
         self.assertEqual(P.from_uri('file:/foo/bar'), P('/foo/bar'))
-        self.assertEqual(P.from_uri('file://foo/bar'), P('//foo/bar'))
+        self.assertRaises(ValueError, P.from_uri, 'file://foo/bar')
         self.assertEqual(P.from_uri('file:///foo/bar'), P('/foo/bar'))
         self.assertEqual(P.from_uri('file:////foo/bar'), P('//foo/bar'))
         self.assertEqual(P.from_uri('file://localhost/foo/bar'), P('/foo/bar'))
+        if not is_wasi:
+            self.assertEqual(P.from_uri('file://127.0.0.1/foo/bar'), P('/foo/bar'))
+            self.assertEqual(P.from_uri(f'file://{socket.gethostname()}/foo/bar'),
+                             P('/foo/bar'))
         self.assertRaises(ValueError, P.from_uri, 'foo/bar')
         self.assertRaises(ValueError, P.from_uri, '/foo/bar')
         self.assertRaises(ValueError, P.from_uri, '//foo/bar')

Lib/test/test_urllib.py

+34-9
@@ -11,6 +11,7 @@
 from test.support import os_helper
 from test.support import socket_helper
 import os
+import socket
 try:
     import ssl
 except ImportError:
@@ -1424,6 +1425,17 @@ def test_quoting(self):
                          "url2pathname() failed; %s != %s" %
                          (expect, result))
 
+    def test_pathname2url(self):
+        # Test cases common to Windows and POSIX.
+        fn = urllib.request.pathname2url
+        sep = os.path.sep
+        self.assertEqual(fn(''), '')
+        self.assertEqual(fn(sep), '///')
+        self.assertEqual(fn('a'), 'a')
+        self.assertEqual(fn(f'a{sep}b.c'), 'a/b.c')
+        self.assertEqual(fn(f'{sep}a{sep}b.c'), '///a/b.c')
+        self.assertEqual(fn(f'{sep}a{sep}b%#c'), '///a/b%25%23c')
+
     @unittest.skipUnless(sys.platform == 'win32',
                          'test specific to Windows pathnames.')
     def test_pathname2url_win(self):
@@ -1466,12 +1478,9 @@ def test_pathname2url_win(self):
                          'test specific to POSIX pathnames')
     def test_pathname2url_posix(self):
         fn = urllib.request.pathname2url
-        self.assertEqual(fn('/'), '///')
-        self.assertEqual(fn('/a/b.c'), '///a/b.c')
         self.assertEqual(fn('//a/b.c'), '////a/b.c')
         self.assertEqual(fn('///a/b.c'), '/////a/b.c')
         self.assertEqual(fn('////a/b.c'), '//////a/b.c')
-        self.assertEqual(fn('/a/b%#c'), '///a/b%25%23c')
 
     @unittest.skipUnless(os_helper.FS_NONASCII, 'need os_helper.FS_NONASCII')
     def test_pathname2url_nonascii(self):
@@ -1480,11 +1489,25 @@ def test_pathname2url_nonascii(self):
         url = urllib.parse.quote(os_helper.FS_NONASCII, encoding=encoding, errors=errors)
         self.assertEqual(urllib.request.pathname2url(os_helper.FS_NONASCII), url)
 
+    def test_url2pathname(self):
+        # Test cases common to Windows and POSIX.
+        fn = urllib.request.url2pathname
+        sep = os.path.sep
+        self.assertEqual(fn(''), '')
+        self.assertEqual(fn('/'), f'{sep}')
+        self.assertEqual(fn('///'), f'{sep}')
+        self.assertEqual(fn('////'), f'{sep}{sep}')
+        self.assertEqual(fn('foo'), 'foo')
+        self.assertEqual(fn('foo/bar'), f'foo{sep}bar')
+        self.assertEqual(fn('/foo/bar'), f'{sep}foo{sep}bar')
+        self.assertEqual(fn('//localhost/foo/bar'), f'{sep}foo{sep}bar')
+        self.assertEqual(fn('///foo/bar'), f'{sep}foo{sep}bar')
+        self.assertEqual(fn('////foo/bar'), f'{sep}{sep}foo{sep}bar')
+
     @unittest.skipUnless(sys.platform == 'win32',
                          'test specific to Windows pathnames.')
     def test_url2pathname_win(self):
         fn = urllib.request.url2pathname
-        self.assertEqual(fn('/'), '\\')
         self.assertEqual(fn('/C:/'), 'C:\\')
         self.assertEqual(fn("///C|"), 'C:')
         self.assertEqual(fn("///C:"), 'C:')
@@ -1530,11 +1553,13 @@ def test_url2pathname_win(self):
                          'test specific to POSIX pathnames')
     def test_url2pathname_posix(self):
         fn = urllib.request.url2pathname
-        self.assertEqual(fn('/foo/bar'), '/foo/bar')
-        self.assertEqual(fn('//foo/bar'), '//foo/bar')
-        self.assertEqual(fn('///foo/bar'), '/foo/bar')
-        self.assertEqual(fn('////foo/bar'), '//foo/bar')
-        self.assertEqual(fn('//localhost/foo/bar'), '/foo/bar')
+        self.assertRaises(urllib.error.URLError, fn, '//foo/bar')
+        self.assertRaises(urllib.error.URLError, fn, '//localhost:/foo/bar')
+        self.assertRaises(urllib.error.URLError, fn, '//:80/foo/bar')
+        self.assertRaises(urllib.error.URLError, fn, '//:/foo/bar')
+        self.assertRaises(urllib.error.URLError, fn, '//c:80/foo/bar')
+        self.assertEqual(fn('//127.0.0.1/foo/bar'), '/foo/bar')
+        self.assertEqual(fn(f'//{socket.gethostname()}/foo/bar'), '/foo/bar')
 
     @unittest.skipUnless(os_helper.FS_NONASCII, 'need os_helper.FS_NONASCII')
     def test_url2pathname_nonascii(self):

Lib/urllib/request.py

+20-32
@@ -1450,16 +1450,6 @@ def parse_http_list(s):
     return [part.strip() for part in res]
 
 class FileHandler(BaseHandler):
-    # Use local file or FTP depending on form of URL
-    def file_open(self, req):
-        url = req.selector
-        if url[:2] == '//' and url[2:3] != '/' and (req.host and
-                req.host != 'localhost'):
-            if not req.host in self.get_names():
-                raise URLError("file:// scheme is supported only on localhost")
-        else:
-            return self.open_local_file(req)
-
     # names for the localhost
     names = None
     def get_names(self):
@@ -1476,8 +1466,7 @@ def get_names(self):
     def open_local_file(self, req):
         import email.utils
         import mimetypes
-        host = req.host
-        filename = req.selector
+        filename = _splittype(req.full_url)[1]
         localfile = url2pathname(filename)
         try:
             stats = os.stat(localfile)
@@ -1487,21 +1476,21 @@ def open_local_file(self, req):
             headers = email.message_from_string(
                 'Content-type: %s\nContent-length: %d\nLast-modified: %s\n' %
                 (mtype or 'text/plain', size, modified))
-            if host:
-                host, port = _splitport(host)
-            if not host or \
-                (not port and _safe_gethostbyname(host) in self.get_names()):
-                origurl = 'file:' + pathname2url(localfile)
-                return addinfourl(open(localfile, 'rb'), headers, origurl)
+            origurl = f'file:{pathname2url(localfile)}'
+            return addinfourl(open(localfile, 'rb'), headers, origurl)
         except OSError as exp:
             raise URLError(exp, exp.filename)
-        raise URLError('file not on local host')
 
-def _safe_gethostbyname(host):
+    file_open = open_local_file
+
+def _is_local_authority(authority):
+    if not authority or authority == 'localhost':
+        return True
     try:
-        return socket.gethostbyname(host)
-    except socket.gaierror:
-        return None
+        address = socket.gethostbyname(authority)
+    except (socket.gaierror, AttributeError):
+        return False
+    return address in FileHandler().get_names()
 
 class FTPHandler(BaseHandler):
     def ftp_open(self, req):
@@ -1649,16 +1638,13 @@ def data_open(self, req):
 def url2pathname(url):
     """OS-specific conversion from a relative URL of the 'file' scheme
     to a file system path; not recommended for general use."""
-    if url[:3] == '///':
-        # Empty authority section, so the path begins on the third character.
-        url = url[2:]
-    elif url[:12] == '//localhost/':
-        # Skip past 'localhost' authority.
-        url = url[11:]
-
+    authority, url = _splithost(url)
     if os.name == 'nt':
-        if url[:3] == '///':
-            # Skip past extra slash before UNC drive in URL path.
+        if not _is_local_authority(authority):
+            # e.g. file://server/share/file.txt
+            url = '//' + authority + url
+        elif url[:3] == '///':
+            # e.g. file://///server/share/file.txt
             url = url[1:]
         else:
             if url[:1] == '/' and url[2:3] in (':', '|'):
@@ -1668,6 +1654,8 @@ def url2pathname(url):
                 # Older URLs use a pipe after a drive letter
                 url = url[:1] + ':' + url[2:]
         url = url.replace('/', '\\')
+    elif not _is_local_authority(authority):
+        raise URLError("file:// scheme is supported only on localhost")
     encoding = sys.getfilesystemencoding()
     errors = sys.getfilesystemencodeerrors()
     return unquote(url, encoding=encoding, errors=errors)
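The following is a hypothetical re-creation of the control flow in the new `url2pathname()`, which lets both branches be compared side by side. `split_authority`, `sketch_url2pathname` and `only_localhost` are illustrative names. The sketch takes `os_name` and an `is_local(authority)` predicate as parameters, so the Windows and POSIX branches can be exercised on any platform without DNS. It assumes UTF-8 rather than the filesystem encoding. `split_authority` is modelled on the private `urllib.parse._splithost()`, and the real code resolves hostnames via `socket.gethostbyname()` instead of using a toy predicate:

```python
import re
from urllib.error import URLError
from urllib.parse import unquote

# Modelled on urllib.parse._splithost(): '//host/path' -> ('host', '/path').
_AUTHORITY = re.compile(r'//([^/#?]*)(.*)', re.DOTALL)


def split_authority(url):
    match = _AUTHORITY.match(url)
    if not match:
        return None, url  # no authority section at all
    authority, path = match.groups()
    if path and path[0] != '/':
        path = '/' + path
    return authority, path


def sketch_url2pathname(url, os_name, is_local):
    """Hypothetical re-creation of the new url2pathname() control flow."""
    authority, url = split_authority(url)
    if os_name == 'nt':
        if not is_local(authority):
            url = '//' + authority + url  # file://server/share/f -> UNC path
        elif url[:3] == '///':
            url = url[1:]  # file://///server/share/f -> UNC path
        else:
            if url[:1] == '/' and url[2:3] in (':', '|'):
                url = url[1:]  # drop the slash before a drive letter
            if url[1:2] == '|':
                url = url[:1] + ':' + url[2:]  # legacy 'C|' spelling
        url = url.replace('/', '\\')
    elif not is_local(authority):
        raise URLError('file:// scheme is supported only on localhost')
    return unquote(url, encoding='utf-8', errors='strict')


def only_localhost(authority):
    # Toy predicate (no DNS): only missing/empty authorities and 'localhost'.
    return not authority or authority == 'localhost'


for url, os_name in [('///etc/hosts', 'posix'),
                     ('//localhost/etc/hosts', 'posix'),
                     ('//server/share/a.txt', 'nt'),
                     ('/////server/share/a.txt', 'nt'),
                     ('///C:/Program%20Files', 'nt')]:
    print(os_name, url, '->', sketch_url2pathname(url, os_name, only_localhost))
```

On Windows, a non-local authority is folded back into a UNC path. On POSIX there is no equivalent, so the same input is rejected.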
@@ -0,0 +1,5 @@
+Fix issue where :func:`urllib.request.url2pathname` mishandled file URLs with
+authorities. If an authority is present and resolves to ``localhost``, it is
+now discarded. If an authority is present but *doesn't* resolve to
+``localhost``, then on Windows a UNC path is returned (as before), and on
+other platforms a :exc:`urllib.error.URLError` is now raised.
