[SPARK-29341][PYTHON] Upgrade cloudpickle to 1.0.0
### What changes were proposed in this pull request?
This patch upgrades cloudpickle to version 1.0.0.
Main changes:
1. Clean up unused functions: cloudpipe/cloudpickle@936f16f
2. Fix relative imports inside a function body: cloudpipe/cloudpickle@31ecdd6
3. Write keyword-only arguments to the pickle: cloudpipe/cloudpickle@6cb4718
### Why are the changes needed?
We should include new bug fixes like cloudpipe/cloudpickle@6cb4718, because users might use such Python functions in PySpark.
Before:
```python
>>> def f(a, *, b=1):
... return a + b
...
>>> rdd = sc.parallelize([1, 2, 3])
>>> rdd.map(f).collect()
[Stage 0:> (0 + 12) / 12]19/10/03 00:42:24 ERROR Executor: Exception in task 3.0 in stage 0.0 (TID 3)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/spark/python/lib/pyspark.zip/pyspark/worker.py", line 598, in main
process()
File "/spark/python/lib/pyspark.zip/pyspark/worker.py", line 590, in process
serializer.dump_stream(out_iter, outfile)
File "/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 513, in dump_stream
vs = list(itertools.islice(iterator, batch))
File "/spark/python/lib/pyspark.zip/pyspark/util.py", line 99, in wrapper
return f(*args, **kwargs)
TypeError: f() missing 1 required keyword-only argument: 'b'
```
After:
```python
>>> def f(a, *, b=1):
... return a + b
...
>>> rdd = sc.parallelize([1, 2, 3])
>>> rdd.map(f).collect()
[2, 3, 4]
```
### Does this PR introduce any user-facing change?
Yes. This fixes two bugs when pickling Python functions: relative imports inside a function body, and keyword-only arguments with defaults.
### How was this patch tested?
Existing tests.
Closes #26009 from viirya/upgrade-cloudpickle.
Authored-by: Liang-Chi Hsieh <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>