Skip to content

gh-102950: Implement PEP 706 – Filter for tarfile.extractall #102953

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
Apr 24, 2023

Conversation

encukou
Copy link
Member

@encukou encukou commented Mar 23, 2023

See PEP-706 for details.

I might have overengineered the tests a bit, moving boilerplate to a set of helpers. Hopefully the tests themselves are clear enough to make it worth it.

@encukou encukou requested a review from gpshead March 23, 2023 14:21
@encukou encukou requested a review from ethanfurman as a code owner March 23, 2023 14:21
Copy link
Member

@ethanfurman ethanfurman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's what I have so far; still working on tarfile.py.

@encukou
Copy link
Member Author

encukou commented Mar 31, 2023

@gpshead, did you want to take a look?

@encukou
Copy link
Member Author

encukou commented Apr 3, 2023

I have made the requested changes; please review again

Copy link
Contributor

@hroncok hroncok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed the changes in Doc/.


.. versionadded:: 3.12

The *tar* format is designed to capture all details of a UNIX-like ecosystem,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The *tar* format is designed to capture all details of a UNIX-like ecosystem,
The *tar* format is designed to capture all details of a UNIX-like filesystem,

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also tend to say POSIX rather than UNIX at this point.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

POSIX is more precise, but this intentionally hand-wavy terminology: I think UNIX-like is more understandable to more people than POSIX-like.

@encukou
Copy link
Member Author

encukou commented Apr 17, 2023

If there are no objections, I plan to merge this around Wednesday. Please let me know if you're working on a review.

Lib/tarfile.py Outdated
Comment on lines 2309 to 2311
except TarError as e:
if self.errorlevel > 0:
raise
Copy link
Contributor

@hroncok hroncok Apr 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reading the docs, I thought only OSErrors will be raised with errorlevel == 1. This seems to raise even TarErrors with errorlevel 1 or 2. What do I miss?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh dear. What a rabbit hole. Thanks for raising it, and for a very helpful private discussion.
This warrants an addition to the PEP: https://discuss.python.org/t/23149/26
Please voice your concerns there!

I've updated this PR with my preferred solution, so you can see the areas that need to change and so I get a CI check. It doesn't mean the discussion is over, of course.

@hroncok
Copy link
Contributor

hroncok commented Apr 17, 2023

If there are no objections, I plan to merge this around Wednesday. Please let me know if you're working on a review.

I still plan to go trough all of this. I think the last remaining part is ReplaceTests but I've already looked over that part before this PR was opened. Feel free to merge if you don't hear from me by Wednesday.

@encukou encukou added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Apr 19, 2023
@encukou
Copy link
Member Author

encukou commented Apr 20, 2023

  • aarch64 Debian clang fails test_tarfile with ValueError: overflow in number field. Will investigate.

  • wasm32-wasi fails symlink-related tests. Not surprising. Will investigate.

  • ArchLinux TraceRefs fails in freezing modules, presumably unrelated

  • Fedora LTO+PGO fails in test_tools, presumably unrelated

@encukou
Copy link
Member Author

encukou commented Apr 20, 2023

On WASI, all symlink tests are skipped for now. The mechanism to skip them is a bit of a shortcut that doesn't work here, so on WASI I'll skip the tarfile symlink tests explicitly.

@encukou
Copy link
Member Author

encukou commented Apr 20, 2023

And aarch64 Debian clang showed an issue in tests: I assumed addfile will work on a regular file, but with the old USTAR_FORMAT, it fails if the owner user ID is too high. Will fix the test.

@encukou encukou added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Apr 20, 2023
@encukou
Copy link
Member Author

encukou commented Apr 24, 2023

The remaining buildbot failures are unrelated and happened on main builds as well. One was fixed in main (#103561) and the other is still there.
Crossing fingers, merging.

@encukou encukou merged commit af53046 into python:main Apr 24, 2023
@encukou encukou deleted the tarfile-dir-traversal-sqsq branch April 24, 2023 08:58
@kulikjak
Copy link
Contributor

Hi, with this integrated, I see one new test failure on Solaris:

FAIL: test_modes (test.test_tarfile.TestExtractionFilters.test_modes)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "...../cpython-main/Lib/test/test_tarfile.py", line 3641, in test_modes
    self.expect_file('all_bits', mode='?rwsrwsrwt')
  File "...../cpython-main/Lib/test/test_tarfile.py", line 3406, in expect_file
    self.assertEqual(got, mode)
AssertionError: '?rwsrwsrwx' != '?rwsrwsrwt'
- ?rwsrwsrwx
?          ^
+ ?rwsrwsrwt
?          ^

The issue seems to be the fact that setting sticky bit on files as regular user doesn't do anything on Solaris, and from my limited testing, the same seems to be true on some other systems as well (I tried FreeBSD and chmod +t on file as a regular user also did nothing).

@encukou
Copy link
Member Author

encukou commented Apr 25, 2023

Thanks for the report! The tests are intentionally pretty strict. I'll relax this one. (Edit: see #103831)

encukou added a commit to encukou/cpython that referenced this pull request Apr 25, 2023
encukou added a commit that referenced this pull request Apr 28, 2023
…H-102953) (GH-103832)

See [Backporting & Forward Compatibility in PEP 706](https://peps.python.org/pep-0706/#backporting-forward-compatibility).

- Backport b52ad18
- Backport c8c3956
- Remove the DeprecationWarning
- Adjust docs
- Remove new `__all__` entries
encukou added a commit that referenced this pull request May 10, 2023
…H-102953) (GH-104128)

- Backport b52ad18
- Backport c8c3956
- Remove the DeprecationWarning
- Adjust docs
- Remove new `__all__` entries

Co-authored-by: Petr Viktorin <[email protected]>
@encukou
Copy link
Member Author

encukou commented May 16, 2023

A test_gdb timeout looks unrelated. And the equivalent PR buildbot was green before merging.

carlosroman added a commit to DataDog/cpython that referenced this pull request Jun 22, 2023
* Post 3.8.16

* [3.8] Update copyright years to 2023. (pythongh-100852)

* [3.8] Update copyright years to 2023. (pythongh-100848).
(cherry picked from commit 11f9932)

Co-authored-by: Benjamin Peterson <[email protected]>

* Update additional copyright years to 2023.

Co-authored-by: Ned Deily <[email protected]>

* [3.8] Update copyright year in README (pythonGH-100863) (pythonGH-100867)

(cherry picked from commit 30a6cc4)

Co-authored-by: Ned Deily <[email protected]>
Co-authored-by: HARSHA VARDHAN <[email protected]>

* [3.8] Correct CVE-2020-10735 documentation (pythonGH-100306) (python#100698)

(cherry picked from commit 1cf3d78)
(cherry picked from commit 88fe8d7)

Co-authored-by: Jeremy Paige <[email protected]>
Co-authored-by: Gregory P. Smith <[email protected]>

* [3.8] Bump Azure Pipelines to ubuntu-22.04 (pythonGH-101089) (python#101215)

(cherry picked from commit c22a55c)

Co-authored-by: Hugo van Kemenade <[email protected]>

* [3.8] pythongh-100180: Update Windows installer to OpenSSL 1.1.1s (pythonGH-100903) (python#101258)

* pythongh-101422: (docs) TarFile default errorlevel argument is 1, not 0 (pythonGH-101424)

(cherry picked from commit ea23271)

Co-authored-by: Owain Davies <[email protected]>

* [3.8] pythongh-95778: add doc missing in some places (pythonGH-100627) (python#101630)

(cherry picked from commit 4652182)

* [3.8] pythongh-101283: Improved fallback logic for subprocess with shell=True on Windows (pythonGH-101286) (python#101710)

Co-authored-by: Oleg Iarygin <[email protected]>
Co-authored-by: Steve Dower <[email protected]>

* [3.8] pythongh-101981: Fix Ubuntu SSL tests with OpenSSL (3.1.0-beta1) CI i… (python#102095)

[3.8] pythongh-101981: Fix Ubuntu SSL tests with OpenSSL (3.1.0-beta1) CI issue (pythongh-102079)

* [3.8] pythonGH-102306 Avoid GHA CI macOS test_posix failure by using the appropriate macOS SDK (pythonGH-102307)

[3.8] Avoid GHA CI macOS test_posix failure by using the appropriate macOS SDK.

* [3.8] pythongh-101726: Update the OpenSSL version to 1.1.1t (pythonGH-101727) (pythonGH-101752)

Fixes CVE-2023-0286 (High) and a couple of Medium security issues.
https://www.openssl.org/news/secadv/20230207.txt

Co-authored-by: Gregory P. Smith <[email protected]>
Co-authored-by: Ned Deily <[email protected]>

* [3.8] pythongh-102627: Replace address pointing toward malicious web page (pythonGH-102630) (pythonGH-102667)

(cherry picked from commit 61479d4)

Co-authored-by: Blind4Basics <[email protected]>
Co-authored-by: C.A.M. Gerlach <[email protected]>
Co-authored-by: Hugo van Kemenade <[email protected]>

* [3.8] pythongh-101997: Update bundled pip version to 23.0.1 (pythonGH-101998). (python#102244)

(cherry picked from commit 89d9ff0)

* [3.8] pythongh-102950: Implement PEP 706 – Filter for tarfile.extractall (pythonGH-102953) (python#104548)

Backport of c8c3956

* [3.8] pythongh-99889: Fix directory traversal security flaw in uu.decode() (pythonGH-104096) (python#104332)

(cherry picked from commit 0aeda29)

Co-authored-by: Sam Carroll <[email protected]>

* [3.8] pythongh-104049: do not expose on-disk location from SimpleHTTPRequestHandler (pythonGH-104067) (python#104121)

Do not expose the local server's on-disk location from `SimpleHTTPRequestHandler` when generating a directory index. (unnecessary information disclosure)

(cherry picked from commit c7c3a60)

Co-authored-by: Ethan Furman <[email protected]>
Co-authored-by: Gregory P. Smith <[email protected]>
Co-authored-by: Jelle Zijlstra <[email protected]>

* [3.8] pythongh-103935: Use `io.open_code()` when executing code in trace and profile modules (pythonGH-103947) (python#103954)

Co-authored-by: Tian Gao <[email protected]>

* [3.8] pythongh-68966: fix versionchanged in docs (pythonGH-105299)

* [3.8] Update GitHub CI workflow for macOS. (pythonGH-105302)

* [3.8] pythongh-105184: document that marshal functions can fail and need to be checked with PyErr_Occurred (pythonGH-105185) (python#105222)

(cherry picked from commit ee26ca1)

Co-authored-by: Irit Katriel <[email protected]>

* [3.8] pythongh-102153: Start stripping C0 control and space chars in `urlsplit` (pythonGH-102508) (pythonGH-104575) (pythonGH-104592) (python#104593) (python#104895)

`urllib.parse.urlsplit` has already been respecting the WHATWG spec a bit pythonGH-25595.

This adds more sanitizing to respect the "Remove any leading C0 control or space from input" [rule](https://url.spec.whatwg.org/GH-url-parsing:~:text=Remove%20any%20leading%20and%20trailing%20C0%20control%20or%20space%20from%20input.) in response to [CVE-2023-24329](https://nvd.nist.gov/vuln/detail/CVE-2023-24329).

I simplified the docs by eliding the state of the world explanatory
paragraph in this security release only backport.  (people will see
that in the mainline /3/ docs)

(cherry picked from commit d7f8a5f)
(cherry picked from commit 2f630e1)
(cherry picked from commit 610cc0a)
(cherry picked from commit f48a96a)

Co-authored-by: Miss Islington (bot) <[email protected]>
Co-authored-by: Illia Volochii <[email protected]>
Co-authored-by: Gregory P. Smith [Google] <[email protected]>

* [3.8] pythongh-103142: Upgrade binary builds and CI to OpenSSL 1.1.1u (pythonGH-105174) (pythonGH-105200) (pythonGH-105205) (python#105370)

Upgrade builds to OpenSSL 1.1.1u.

Also updates _ssl_data_111.h from OpenSSL 1.1.1u, _ssl_data_300.h from 3.0.9.

Manual edits to the _ssl_data_300.h file prevent it from removing any
existing definitions in case those exist in some peoples builds and were
important (avoiding regressions during backporting).

(cherry picked from commit ede89af)
(cherry picked from commit e15de14)

Co-authored-by: Gregory P. Smith <[email protected]>
Co-authored-by: Ned Deily <[email protected]>

* Python 3.8.17

* Post 3.8.17

* Updated CI to build 3.8.17

---------

Co-authored-by: Łukasz Langa <[email protected]>
Co-authored-by: Benjamin Peterson <[email protected]>
Co-authored-by: Ned Deily <[email protected]>
Co-authored-by: Miss Islington (bot) <[email protected]>
Co-authored-by: HARSHA VARDHAN <[email protected]>
Co-authored-by: Gregory P. Smith <[email protected]>
Co-authored-by: Jeremy Paige <[email protected]>
Co-authored-by: Hugo van Kemenade <[email protected]>
Co-authored-by: Steve Dower <[email protected]>
Co-authored-by: Owain Davies <[email protected]>
Co-authored-by: Éric <[email protected]>
Co-authored-by: Oleg Iarygin <[email protected]>
Co-authored-by: Steve Dower <[email protected]>
Co-authored-by: Dong-hee Na <[email protected]>
Co-authored-by: Blind4Basics <[email protected]>
Co-authored-by: C.A.M. Gerlach <[email protected]>
Co-authored-by: Pradyun Gedam <[email protected]>
Co-authored-by: Petr Viktorin <[email protected]>
Co-authored-by: Sam Carroll <[email protected]>
Co-authored-by: Ethan Furman <[email protected]>
Co-authored-by: Jelle Zijlstra <[email protected]>
Co-authored-by: Tian Gao <[email protected]>
Co-authored-by: Irit Katriel <[email protected]>
Co-authored-by: stratakis <[email protected]>
Co-authored-by: Illia Volochii <[email protected]>
IlyasTalbi

This comment was marked as spam.

maxwell-k added a commit to maxwell-k/dotlocalslashbin that referenced this pull request Jul 16, 2024
So that the script can be used on Debian 12.

> Changed in version 3.11.4: Added the filter parameter.

-- https://docs.python.org/3.11/library/tarfile.html

The implementation was in python/cpython#102953

Before this change, Debian 12 has Python 3.11.2, so this script errors
on a call with `filter=`

After this change the script does not error.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants