Commit 3b43335

emmatyping, StanFromIreland, AA-Turner, and picnixz authored
gh-132983: Introduce _zstd bindings module (GH-133027)
* Add _zstd module for https://peps.python.org/pep-0784/

  This commit introduces the `_zstd` module, with bindings to libzstd from the
  pyzstd project. It also includes the Unix build system configuration.
  Windows build system support will be integrated independently as it depends
  on integration with cpython-source-deps.

* Add _zstd to modules
* Fix path for compression.zstd module
* Ignore _zstd module like _io
* Expand module state macros to improve code quality

  Also removes module state references from the classes in the _zstd module
  and instead uses PyType_GetModuleState()

* Remove backticks suggested in review

  Co-authored-by: Stan Ulbrych <[email protected]>

* Use critical sections to lock object state

  This should avoid races and deadlocks.

* Remove compress/decompress and mark module as not reliant on the GIL

  The `compress`/`decompress` functions will be moved to Python code for
  simplicity. C implementations can always be re-added in the future. Also,
  mark _zstd as not requiring the GIL.

* Lift critical section to avoid clang warning
* Respond to comments by picnixz
* Call out pyzstd explicitly in license description

  Co-authored-by: Adam Turner <[email protected]>

* Use a much more robust implementation for `get_zstd_state_from_type`

  Co-authored-by: Bénédikt Tran <[email protected]>

* Use PyList_GetItemRef for thread safety purposes
* Use a macro for the minimum supported version
* remove const from primitive types
* Use PyMem_New in another spot
* Simplify error handling in _get_frame_size
* Another simplification of error handling in get_frame_info
* Rename _module_state to mod_state
* Rewrite comment explaining the context of the code
* Add link to pyzstd
* Add TODO about refactoring dict training code
* Use PyModule_AddObjectRef over PyModule_AddObject

  PyModule_AddObject is soft-deprecated, so we should use
  PyModule_AddObjectRef

* Check result of OutputBufferGrow
* Simplify return logic in `add_constant_to_type`

  Co-authored-by: Bénédikt Tran <[email protected]>

* Ignore return value of _zstd_clear()

  Co-authored-by: Bénédikt Tran <[email protected]>

* Remove redundant comments
* Remove __reduce__ from ZstdDict

  We should instead document that to pickle a dictionary a user should use
  the `.dict_content` attribute.

* Use PyUnicode_FromFormat instead of a buffer
* Don't use C constants/types in error messages
* Make error messages easier to understand for Python users
* Lower minimum required version to 1.4.0
* Use casts and make slot function signatures correct
* Be consistent with CPython on const usage
* Make else clauses in line with PEP 7
* Fix over-indented blocks in argument clinic
* Add critical section around ZSTD_DCtx_setParameter
* Add a TODO about refactoring critical sections
* Use Py_UNREACHABLE
* Move bytes operations out of Py_BEGIN_ALLOW_THREADS
* Add TODO about ensuring a lock is held
* Remove asserts that may not be correct
* Add TODO to make ZstdDict and others GC objects
* Make objects GC tracked
* Remove unused include
* Fix some memory issues
* Fix refleaks on module and in ZstdDict
* Update configure to check for ZDICT_finalizeDictionary
* Properly check version in configure
* exit(1) if check fails
* Use AC_RUN_IFELSE
* Use a define() to re-use version check
* Actually properly set _zstd module status based on version

---------

Co-authored-by: Stan Ulbrych <[email protected]>
Co-authored-by: Adam Turner <[email protected]>
Co-authored-by: Bénédikt Tran <[email protected]>
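
The commit message says that, with `__reduce__` removed, pickling a dictionary should go through the `.dict_content` attribute. A minimal sketch of that approach follows. It assumes the `compression.zstd.ZstdDict` class described in PEP 784 accepts the content positionally and an `is_raw` keyword (both names appear among this commit's identifiers, but the exact signature is an assumption here). The helpers `dump_zstd_dict` and `load_zstd_dict` are hypothetical names, and the fallback class exists only so the sketch runs on interpreters without the module.

```python
import pickle

try:
    # Present only on Python builds that ship the Zstandard bindings.
    from compression.zstd import ZstdDict
except ImportError:
    class ZstdDict:
        """Hypothetical stand-in mimicking only the pieces used below."""

        def __init__(self, dict_content, /, *, is_raw=False):
            self.dict_content = bytes(dict_content)


def dump_zstd_dict(zd, *, is_raw=False):
    """Pickle the dictionary bytes rather than the ZstdDict object itself."""
    # is_raw is passed in explicitly; this sketch does not assume that the
    # object exposes it as an attribute.
    return pickle.dumps({"dict_content": zd.dict_content, "is_raw": is_raw})


def load_zstd_dict(payload):
    """Rebuild a ZstdDict from bytes produced by dump_zstd_dict()."""
    state = pickle.loads(payload)
    return ZstdDict(state["dict_content"], is_raw=state["is_raw"])


original = ZstdDict(b"example raw dictionary content", is_raw=True)
restored = load_zstd_dict(dump_zstd_dict(original, is_raw=True))
```

Carrying `is_raw` next to the bytes is a design choice of this sketch: the dictionary content alone does not say how it should be interpreted when the object is rebuilt.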
1 parent 2bc8365 commit 3b43335

23 files changed: 4804 additions, 1 deletion

Doc/license.rst

+37
@@ -1132,3 +1132,40 @@ The file is distributed under the 2-Clause BSD License::
     THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
     (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
     THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+
+Zstandard bindings
+------------------
+
+Zstandard bindings in :file:`Modules/_zstd` and :file:`Lib/compression/zstd`
+are based on code from the
+`pyzstd library <https://github.com/Rogdham/pyzstd/>`_, copyright Ma Lin and
+contributors. The pyzstd code is distributed under the 3-Clause BSD License::
+
+    Copyright (c) 2020-present, Ma Lin and contributors.
+    All rights reserved.
+
+    Redistribution and use in source and binary forms, with or without
+    modification, are permitted provided that the following conditions are met:
+
+    1. Redistributions of source code must retain the above copyright notice, this
+       list of conditions and the following disclaimer.
+
+    2. Redistributions in binary form must reproduce the above copyright notice,
+       this list of conditions and the following disclaimer in the documentation
+       and/or other materials provided with the distribution.
+
+    3. Neither the name of the copyright holder nor the names of its
+       contributors may be used to endorse or promote products derived from
+       this software without specific prior written permission.
+
+    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+    AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+    IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+    DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
+    FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+    DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+    SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+    CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+    OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Include/internal/pycore_global_objects_fini_generated.h

+8
Some generated files are not rendered by default.

Include/internal/pycore_global_strings.h

+8
@@ -325,6 +325,7 @@ struct _Py_global_strings {
         STRUCT_FOR_ID(bytes_per_sep)
         STRUCT_FOR_ID(c_call)
         STRUCT_FOR_ID(c_exception)
+        STRUCT_FOR_ID(c_parameter_type)
         STRUCT_FOR_ID(c_return)
         STRUCT_FOR_ID(cached_datetime_module)
         STRUCT_FOR_ID(cached_statements)
@@ -379,6 +380,7 @@ struct _Py_global_strings {
         STRUCT_FOR_ID(count)
         STRUCT_FOR_ID(covariant)
         STRUCT_FOR_ID(cwd)
+        STRUCT_FOR_ID(d_parameter_type)
         STRUCT_FOR_ID(data)
         STRUCT_FOR_ID(database)
         STRUCT_FOR_ID(day)
@@ -393,6 +395,7 @@ struct _Py_global_strings {
         STRUCT_FOR_ID(deterministic)
         STRUCT_FOR_ID(device)
         STRUCT_FOR_ID(dict)
+        STRUCT_FOR_ID(dict_content)
         STRUCT_FOR_ID(dictcomp)
         STRUCT_FOR_ID(difference_update)
         STRUCT_FOR_ID(digest)
@@ -459,6 +462,7 @@ struct _Py_global_strings {
         STRUCT_FOR_ID(follow_symlinks)
         STRUCT_FOR_ID(format)
         STRUCT_FOR_ID(format_spec)
+        STRUCT_FOR_ID(frame_buffer)
         STRUCT_FOR_ID(from_param)
         STRUCT_FOR_ID(fromlist)
         STRUCT_FOR_ID(fromtimestamp)
@@ -517,6 +521,8 @@ struct _Py_global_strings {
         STRUCT_FOR_ID(intersection)
         STRUCT_FOR_ID(interval)
         STRUCT_FOR_ID(io)
+        STRUCT_FOR_ID(is_compress)
+        STRUCT_FOR_ID(is_raw)
         STRUCT_FOR_ID(is_running)
         STRUCT_FOR_ID(is_struct)
         STRUCT_FOR_ID(isatty)
@@ -640,6 +646,7 @@ struct _Py_global_strings {
         STRUCT_FOR_ID(overlapped)
         STRUCT_FOR_ID(owner)
         STRUCT_FOR_ID(pages)
+        STRUCT_FOR_ID(parameter)
         STRUCT_FOR_ID(parent)
         STRUCT_FOR_ID(password)
         STRUCT_FOR_ID(path)
@@ -801,6 +808,7 @@ struct _Py_global_strings {
         STRUCT_FOR_ID(write_through)
         STRUCT_FOR_ID(year)
         STRUCT_FOR_ID(zdict)
+        STRUCT_FOR_ID(zstd_dict)
     } identifiers;
     struct {
         PyASCIIObject _ascii;
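
Each hunk in this file adds the new identifier at its alphabetical position among its neighbours. The sketch below illustrates that ordering with a sorted insert. It is not CPython's generator script: the helper `insert_identifier` is a hypothetical name, and plain ASCII string ordering is an assumption that happens to reproduce the `is_compress` / `is_raw` hunk.

```python
import bisect


def insert_identifier(identifiers, name):
    """Insert name into a sorted list, keeping it sorted and free of duplicates."""
    position = bisect.bisect_left(identifiers, name)
    if position == len(identifiers) or identifiers[position] != name:
        identifiers.insert(position, name)
    return position


# Context lines from the hunk at old line 517, already in sorted order.
existing = ["intersection", "interval", "io", "is_running", "is_struct", "isatty"]

# The two identifiers this commit adds in that hunk, in arbitrary order.
for new_name in ("is_raw", "is_compress"):
    insert_identifier(existing, new_name)

lines = [f"STRUCT_FOR_ID({name})" for name in existing]
```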

Include/internal/pycore_runtime_init_generated.h

+8
Some generated files are not rendered by default.

Include/internal/pycore_unicodeobject_generated.h

+32
Some generated files are not rendered by default.

Makefile.pre.in

+1
@@ -3341,6 +3341,7 @@ MODULE__TESTCAPI_DEPS=$(srcdir)/Modules/_testcapi/parts.h $(srcdir)/Modules/_tes
 MODULE__TESTLIMITEDCAPI_DEPS=$(srcdir)/Modules/_testlimitedcapi/testcapi_long.h $(srcdir)/Modules/_testlimitedcapi/parts.h $(srcdir)/Modules/_testlimitedcapi/util.h
 MODULE__TESTINTERNALCAPI_DEPS=$(srcdir)/Modules/_testinternalcapi/parts.h
 MODULE__SQLITE3_DEPS=$(srcdir)/Modules/_sqlite/connection.h $(srcdir)/Modules/_sqlite/cursor.h $(srcdir)/Modules/_sqlite/microprotocols.h $(srcdir)/Modules/_sqlite/module.h $(srcdir)/Modules/_sqlite/prepare_protocol.h $(srcdir)/Modules/_sqlite/row.h $(srcdir)/Modules/_sqlite/util.h
+MODULE__ZSTD_DEPS=$(srcdir)/Modules/_zstd/_zstdmodule.h $(srcdir)/Modules/_zstd/buffer.h
 
 CODECS_COMMON_HEADERS=$(srcdir)/Modules/cjkcodecs/multibytecodec.h $(srcdir)/Modules/cjkcodecs/cjkcodecs.h
 MODULE__CODECS_CN_DEPS=$(srcdir)/Modules/cjkcodecs/mappings_cn.h $(CODECS_COMMON_HEADERS)

Modules/Setup

+1
@@ -200,6 +200,7 @@ PYTHONPATH=$(COREPYTHONPATH)
 #_dbm _dbmmodule.c -lgdbm_compat -DUSE_GDBM_COMPAT
 #_gdbm _gdbmmodule.c -lgdbm
 #_lzma _lzmamodule.c -llzma
+#_zstd _zstd/_zstdmodule.c -lzstd -I$(srcdir)/Modules/_zstd
 #_uuid _uuidmodule.c -luuid
 #zlib zlibmodule.c -lz
 

Modules/Setup.stdlib.in

+2 -1
@@ -65,10 +65,11 @@
 @MODULE__DECIMAL_TRUE@_decimal _decimal/_decimal.c
 
 # compression libs and binascii (optional CRC32 from zlib)
-# bindings need -lbz2, -lz, or -llzma, respectively
+# bindings need -lbz2, -llzma, -lzstd, or -lz, respectively
 @MODULE_BINASCII_TRUE@binascii binascii.c
 @MODULE__BZ2_TRUE@_bz2 _bz2module.c
 @MODULE__LZMA_TRUE@_lzma _lzmamodule.c
+@MODULE__ZSTD_TRUE@_zstd _zstd/_zstdmodule.c _zstd/zdict.c _zstd/compressor.c _zstd/decompressor.c
 @MODULE_ZLIB_TRUE@zlib zlibmodule.c
 
 # dbm/gdbm