fix(type): add bool and List[bool] for join's on input #38168

hongbo-miao · 2022-10-09T03:16:28Z

What changes were proposed in this pull request?

I think join's on parameter can also be of type bool or List[bool]. For example, the example in the docstring is valid:

Lines 2107 to 2108 in 309638e

    
    >>> df.join(df2, df.name == df2.name, 'outer').select(
    ...     df.name, df2.height).sort(desc("name")).show()

Why are the changes needed?

I originally got this typing error in my IDE:

The command joins the two tables successfully on different columns; however, I think the type definition is wrong.

Does this PR introduce any user-facing change?

Yes, I am using pyspark==3.3.0.

How was this patch tested?

After adding bool and List[bool], the typing error is gone.

AmplabJenkins · 2022-10-10T01:45:39Z

Can one of the admins verify this patch?

HyukjinKwon · 2022-10-10T11:12:27Z

python/pyspark/sql/dataframe.py

@@ -2044,7 +2044,7 @@ def crossJoin(self, other: "DataFrame") -> "DataFrame":
    def join(
        self,
        other: "DataFrame",
-        on: Optional[Union[str, List[str], Column, List[Column]]] = None,
+        on: Optional[Union[str, List[str], bool, List[bool], Column, List[Column]]] = None,


That's actually an error from the IDE (which assumes that built-in comparisons such as == always return bool). The expected types here are in fact correct.
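To illustrate the point, here is a minimal sketch. The Column class below is a hypothetical stand-in written for this example, not PySpark's implementation. It shows that a class can overload == to build an expression object instead of returning bool, which is why an expression like df.name == df2.name can already match a Column hint.

```python
from typing import Any


class Column:
    """Hypothetical stand-in for a column expression (not pyspark.sql.Column)."""

    def __init__(self, expr: str) -> None:
        self.expr = expr

    def __eq__(self, other: Any) -> "Column":  # type: ignore[override]
        # Build a new expression object instead of comparing the operands.
        rhs = other.expr if isinstance(other, Column) else repr(other)
        return Column(f"({self.expr} = {rhs})")


cond = Column("df.name") == Column("df2.name")
print(type(cond).__name__, cond.expr)  # prints: Column (df.name = df2.name)
```

A checker that assumes == always yields bool would flag cond when it is passed where a Column is expected, even though the runtime value is a Column.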

Hi @HyukjinKwon, thanks for following up. Just to clarify, do you mean that df.name == df2.name returns a Column? Thanks!

I see. I found something interesting.

IntelliJ IDEA has no issue with the on argument of pandas' join.

It only seems to have an issue with PySpark.

Here is how pandas implements it:

https://github.com/pandas-dev/pandas/blob/main/pandas/core/frame.py#L9837-L9840

https://github.com/pandas-dev/pandas/blob/main/pandas/_typing.py#L121

However, if it is still an IDE issue, the IDE should definitely fix it. 😃

Okay, that's interesting. cc @zero323 fyi.

@HyukjinKwon As far as I can tell, there is no real difference here. The difference between the pandas and Spark checks is most likely due to missing stubs for the former. If I use an environment without pandas-stubs, things type-check in PyCharm.

If I choose one with pandas-stubs installed, I get a warning that is in the same category of failure as the one for the PySpark code.

Given mypy as a reference, this is an expected false positive (see python/mypy#2783).
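One way to see why widening the hint to bool would be misleading is the sketch below. It assumes that on is only ever meant to carry column names or Column expressions; normalize_on and the Column stand-in are hypothetical names for illustration, not PySpark code.

```python
from typing import List, Optional, Union


class Column:
    """Hypothetical stand-in for a column expression."""

    def __init__(self, expr: str) -> None:
        self.expr = expr


# Mirrors the shape of the existing hint: names or column expressions only.
OnArg = Optional[Union[str, List[str], Column, List[Column]]]


def normalize_on(on: OnArg) -> List[Union[str, Column]]:
    """Hypothetical helper: wrap a single value in a list and validate items."""
    if on is None:
        return []
    candidates = on if isinstance(on, list) else [on]
    checked: List[Union[str, Column]] = []
    for item in candidates:
        # bool is not a str or a Column, so a real bool is rejected here.
        if not isinstance(item, (str, Column)):
            raise TypeError(
                f"on should be str or Column, got {type(item).__name__}"
            )
        checked.append(item)
    return checked


print(normalize_on("name"))
try:
    normalize_on(True)  # type: ignore[arg-type]
except TypeError as exc:
    print("rejected:", exc)
```

Under that assumption, a genuine bool has no meaning as a join condition, so adding it to the Union would silence the false positive while admitting values that cannot work at runtime.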

On a side note ‒ the pandas and PySpark joins shown in the screenshots are not even remotely equivalent.

github-actions bot added CORE PYTHON SQL labels Oct 9, 2022

fix(type): add bool for join's on input

e4d2d6d

hongbo-miao force-pushed the patch-1 branch from b805155 to e4d2d6d Compare October 9, 2022 03:24

hongbo-miao changed the title ~~fix(type): add bool for join's on input~~ fix(type): add bool and List[bool] for join's on input Oct 9, 2022

HyukjinKwon reviewed Oct 10, 2022

srowen closed this Nov 29, 2022

hongbo-miao deleted the patch-1 branch November 29, 2022 18:51

HyukjinKwon Oct 10, 2022

hongbo-miao Oct 10, 2022

HyukjinKwon Oct 11, 2022

hongbo-miao Oct 11, 2022 (edited)

HyukjinKwon Oct 11, 2022

zero323 Nov 27, 2022
