BUG: groupby.groups with NA categories fails #61364
result = g.groups
expected = {"a": Index(["x", "z"])}
if not dropna:
    expected |= {np.nan: Index(["y"])}
When both observed and dropna are False, should NaN come after the non-observed groups? That seems more intuitive to me, especially for an ordered categorical.
No - if you do an operation like sum the order here matches the order in that result.
This is what I'm getting on both main and 2.2.3.
>>> df = DataFrame(
...     {"cat": Categorical(["a", np.nan, "a"], categories=list("adb"))},
...     index=list("xyz"),
... )
>>> df["val"] = [1, 2, 3]
>>> g = df.groupby("cat", observed=False, dropna=False)
>>> g.sum()
     val
cat
a      4
d      0
b      0
NaN    2
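A self-contained version of the session above may help when checking the group order. This is only a sketch that assumes pandas and NumPy are installed and behave as reported in this thread (main and 2.2.3); the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Same frame as in the comment above: categories are declared in the
# order a, d, b, and only "a" (plus one missing value) is observed.
df = pd.DataFrame(
    {"cat": pd.Categorical(["a", np.nan, "a"], categories=list("adb"))},
    index=list("xyz"),
)
df["val"] = [1, 2, 3]

g = df.groupby("cat", observed=False, dropna=False)
result = g.sum()

# Group labels in result order; the missing-value group is expected last.
labels = list(result.index)
values = result["val"].tolist()
print(labels, values)
```

The non-observed categories d and b keep their declared positions, and the NaN group follows them.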
Ah, tm.assert_dict_equal appears to be order-invariant, so it doesn't matter for the test.
Ah, I see now. I was correct that the order was the same, but I failed to notice that the test added the groups in the incorrect order. I do wonder if assert_dict_equal should default to checking the order (perhaps with an argument to ignore order).
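To illustrate the point about ordering: plain Python dict equality ignores insertion order, so an order-sensitive check has to compare the key sequences explicitly. The helper below is a hypothetical sketch, not pandas' tm.assert_dict_equal; its name and its check_order parameter are assumptions made for this example, and plain lists stand in for Index values.

```python
def assert_dict_equal_ordered(left, right, check_order=True):
    """Hypothetical helper: compare two dicts, optionally including key order."""
    # Built-in dict equality compares keys and values but not insertion order.
    assert left == right, f"contents differ: {left!r} != {right!r}"
    if check_order:
        # Iterating a dict yields its keys in insertion order.
        assert list(left) == list(right), (
            f"key order differs: {list(left)!r} != {list(right)!r}"
        )


expected = {"a": ["x", "z"], "b": ["y"]}
reordered = {"b": ["y"], "a": ["x", "z"]}

# Passes: same contents, order ignored.
assert_dict_equal_ordered(expected, reordered, check_order=False)

# Raises AssertionError: same contents, different key order.
try:
    assert_dict_equal_ordered(expected, reordered)
    order_mismatch_detected = False
except AssertionError:
    order_mismatch_detected = True
```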
Thanks @rhshadrach
Fixes #61356: DataFrameGroupBy.groups fails when Categorical indexer contains NaNs and dropna=False.

There is a slight code duplication here, but we don't need to rely on Categorical's codes because we can just directly use groupby's. We also can't use groupby to implement Index.groupby because the former only works in the case where the values are exhaustive.
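For readers unfamiliar with Index.groupby, here is a minimal sketch of what the method returns: a mapping from each distinct value to the index labels at the matching positions. The labels and values are made up for illustration, and this does not reproduce the implementation in this PR.

```python
import numpy as np
import pandas as pd

labels = pd.Index(["x", "y", "z"])
values = np.array(["a", "b", "a"])

# Each distinct value maps to an Index of the labels at its positions.
groups = labels.groupby(values)

group_a = groups["a"].tolist()
group_b = groups["b"].tolist()
print(group_a, group_b)
```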