fix(type): add bool and List[bool] for join's on input #38168

Closed
wants to merge 1 commit into from
4 changes: 2 additions & 2 deletions python/pyspark/sql/dataframe.py
@@ -2044,7 +2044,7 @@ def crossJoin(self, other: "DataFrame") -> "DataFrame":
     def join(
         self,
         other: "DataFrame",
-        on: Optional[Union[str, List[str], Column, List[Column]]] = None,
+        on: Optional[Union[str, List[str], bool, List[bool], Column, List[Column]]] = None,
Member

That's actually an error from the IDE, which assumes that the built-in functions always return bool. The expected types here are in fact correct.

Author

Hi @HyukjinKwon, thanks for following up. Just to clarify: do you mean that df.name == df2.name returns a Column? Thanks!

Member

Yes.
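
To illustrate the point above: in Python, `==` can return any object when a class overrides `__eq__`. The sketch below uses a hypothetical `FakeColumn` class (a stand-in for illustration, not PySpark's actual `Column`) to show a comparison producing an expression object rather than a bool.

```python
class FakeColumn:
    """Hypothetical stand-in for a column expression (not PySpark's Column)."""

    def __init__(self, expr: str) -> None:
        self.expr = expr

    def __eq__(self, other: object) -> "FakeColumn":  # type: ignore[override]
        # Build a new expression instead of answering True/False.
        other_expr = other.expr if isinstance(other, FakeColumn) else repr(other)
        return FakeColumn(f"({self.expr} = {other_expr})")

    def __repr__(self) -> str:
        return f"FakeColumn<{self.expr}>"


left = FakeColumn("df.name")
right = FakeColumn("df2.name")

# The comparison is not evaluated eagerly; it yields another FakeColumn.
condition = left == right
print(condition)
print(isinstance(condition, bool))
```

Under this assumption, a value such as `condition` already matches a `Column`-style annotation, so no `bool` entry is needed in the union.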

Author: @hongbo-miao, Oct 11, 2022

I see. I found something interesting.

IntelliJ IDEA has no issue with the on argument of the pandas join.
[image]
It seems to have an issue only with PySpark.

Here is how pandas implements it:

However, if it is still an IDE issue, the IDE should definitely fix it. 😃

Member

Okay, that's interesting. cc @zero323 fyi.

Member

@HyukjinKwon As far as I can tell, there is no real difference here. The difference between the Pandas and Spark checks is most likely due to missing stubs for the former. If I use an environment without pandas-stubs, things type check in PyCharm:

[image: without pandas stubs]

If I choose one with pandas-stubs installed, I get:

[image: with pandas stubs]

which is the same category of failure as for the PySpark code.

Given mypy as a reference, this is an expected false positive (see python/mypy#2783).

On a side note: the Pandas and PySpark joins shown in the screenshots are not even remotely equivalent.
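
As a rough sketch of why a checker might flag this: `object.__eq__` is declared as returning bool, so a tool that assumes that signature infers bool for `a == b` even when the class returns something else. The `Expr` class, `OnArg` alias, and `describe_on` helper below are hypothetical names for illustration, not PySpark code.

```python
from typing import List, Optional, Union


class Expr:
    """Hypothetical stand-in for a column expression (not PySpark's Column)."""

    def __init__(self, text: str) -> None:
        self.text = text

    # object.__eq__ is annotated as returning bool, so mypy would typically
    # report this override unless it is acknowledged with an ignore comment.
    def __eq__(self, other: object) -> "Expr":  # type: ignore[override]
        other_text = other.text if isinstance(other, Expr) else repr(other)
        return Expr(f"({self.text} = {other_text})")


# Mirrors the shape of the existing annotation, with Expr in place of Column.
OnArg = Optional[Union[str, List[str], Expr, List[Expr]]]


def describe_on(on: OnArg = None) -> str:
    """Name the runtime type that a given `on` value actually has."""
    if on is None:
        return "none"
    if isinstance(on, (str, Expr)):
        return type(on).__name__
    if isinstance(on, list):
        return "list"
    raise TypeError(f"unsupported on value: {on!r}")


condition = Expr("df.name") == Expr("df2.name")
print(describe_on(condition))
print(describe_on("name"))
```

At runtime the comparison result falls into the expression branch of the union, which is consistent with the existing annotation being sufficient.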

         how: Optional[str] = None,
     ) -> "DataFrame":
         """Joins with another :class:`DataFrame`, using the given join expression.
@@ -2165,7 +2165,7 @@ def _joinAsOf(
         other: "DataFrame",
         leftAsOfColumn: Union[str, Column],
         rightAsOfColumn: Union[str, Column],
-        on: Optional[Union[str, List[str], Column, List[Column]]] = None,
+        on: Optional[Union[str, List[str], bool, List[bool], Column, List[Column]]] = None,
         how: Optional[str] = None,
         *,
         tolerance: Optional[Column] = None,