DataFrame.join Copy-on-Write optimization tests #52751

Merged
merged 7 commits into from
Apr 21, 2023

Conversation

SecretLake
Contributor

@SecretLake SecretLake commented Apr 18, 2023

@phofl @noatamir @jorisvandenbossche @MarcoGorelli Could you guys please review this PR? Thanks again for the session today.

@SecretLake SecretLake changed the title DF join cow tests DataFrame.join Copy-on-Write optimization tests Apr 18, 2023
@noatamir noatamir added the Sprints (Sprint Pull Requests) label Apr 18, 2023
@noatamir
Member

noatamir commented Apr 18, 2023

Thanks for the PR @SecretLake! The CI caught some errors which I think you can solve locally if you run pre-commit. Let us know if you need help.

@SecretLake
Contributor Author

Thanks for the feedback @noatamir. I don't see any errors in the CI. Could you pls point me to the failed runs?

@noatamir
Member

Oops. I thought I saw one earlier. My bad 🙈

Member

@phofl phofl left a comment

Small comments, looks good generally

    shares the same memory with original dataframes until it is edited.
    """
    df1 = DataFrame({"key": ["a", "b", "c"], "a": [1, 2, 3]})
    df2 = DataFrame({"key": ["a", "b", "c"], "b": [4, 5, 6]})
Member

Can you define index=Index(["a", "b", "c"], name="key") instead of using it as a column? We always try to create a test with the least number of operations possible

Contributor Author

Sure, thank you for the review. Implemented changes based on the comments.
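
The setup suggested in this thread can be sketched roughly as follows. This is a minimal illustration, not the code merged in the PR: the `set_index` calls in the column-based variant are an assumption about how a key column would be joined, and the names `df1_col`, `df2_col`, `joined_col`, and `idx` are hypothetical.

```python
import pandas as pd

# Column-based setup, as in the reviewed snippet. Joining on "key" then needs
# an extra operation; set_index is one possible way (an assumption here).
df1_col = pd.DataFrame({"key": ["a", "b", "c"], "a": [1, 2, 3]})
df2_col = pd.DataFrame({"key": ["a", "b", "c"], "b": [4, 5, 6]})
joined_col = df1_col.set_index("key").join(df2_col.set_index("key"))

# Index-based setup, following the reviewer's suggestion: the key is the index
# from the start, so the test body reduces to a single join call.
idx = pd.Index(["a", "b", "c"], name="key")
df1 = pd.DataFrame({"a": [1, 2, 3]}, index=idx)
df2 = pd.DataFrame({"b": [4, 5, 6]}, index=idx)
joined = df1.join(df2)

print(joined)
```

Both variants should produce the same joined frame; the index-based one simply reaches it with fewer operations, which is the point of the review comment.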



def test_join_on_key(using_copy_on_write):
    """Test if DataFrame.join applies Copy-On-Write optimization.
Member

Can you remove the comment? We generally don't add comments in tests
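
For context, the behavior described in the docstring under review (a join result sharing memory with the original DataFrames until it is edited) can be observed with public APIs. The sketch below is an illustration under stated assumptions, not the merged test: it uses `to_numpy()` and `np.shares_memory` instead of pandas' internal test helpers, and whether memory is shared before the write depends on the pandas version and on whether Copy-on-Write is enabled, so that value is printed rather than asserted.

```python
import numpy as np
import pandas as pd

# Assumption: in pandas 2.x, Copy-on-Write is opt-in, e.g. via
#   pd.options.mode.copy_on_write = True
# It is left untouched here so the sketch runs with whatever mode is active.

idx = pd.Index(["a", "b", "c"], name="key")
df1 = pd.DataFrame({"a": [1, 2, 3]}, index=idx)
df2 = pd.DataFrame({"b": [4, 5, 6]}, index=idx)

result = df1.join(df2)

# With Copy-on-Write active and a pandas version that includes this
# optimization, the result is expected to share memory with df1 here;
# otherwise join copies eagerly and this is False.
shared_before = np.shares_memory(result["a"].to_numpy(), df1["a"].to_numpy())
print("shares memory before write:", shared_before)

# Editing the result must not leak into the inputs in either mode.
result.iloc[0, 0] = 0

shared_after = np.shares_memory(result["a"].to_numpy(), df1["a"].to_numpy())
print("shares memory after write:", shared_after)
print(df1["a"].tolist())  # [1, 2, 3]
```

The checks after the write hold with or without Copy-on-Write; only the value printed before the write distinguishes a lazy copy from an eager one.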

@phofl phofl added the Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) and Copy / view semantics labels Apr 21, 2023
@phofl phofl added this to the 2.1 milestone Apr 21, 2023
@phofl phofl merged commit 17345e3 into pandas-dev:main Apr 21, 2023
@phofl
Member

phofl commented Apr 21, 2023

thx @SecretLake

Labels
Copy / view semantics; Reshaping (Concat, Merge/Join, Stack/Unstack, Explode); Sprints (Sprint Pull Requests)
3 participants