BUG: groupby-transform produces NaN for series keys with as_index=False #37093

Closed
2 of 3 tasks
thomas-reineking-by opened this issue Oct 13, 2020 · 6 comments · Fixed by #58012
Labels: Apply, good first issue, Groupby, Needs Tests

Comments

@thomas-reineking-by

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import pandas as pd
df = pd.DataFrame({"X": [1.0]})
df.groupby(pd.Series(["group"]), as_index=False).transform("sum")

Problem description

With version 1.1.3 this produces a dataframe containing NaN.

Expected Output

The expected result is 1.0, which is also what older versions (e.g. 1.0.5) produce.

It seems the following conditions have to be satisfied to trigger this bug:

  • grouping by a series which does not contain indices corresponding to the original dataframe (grouping by pd.Series([0]) works though it seems accidental)
  • using as_index=False
  • using transform

Output of pd.show_versions()

INSTALLED VERSIONS

commit : db08276
python : 3.6.6.final.0
python-bits : 64
OS : Linux
OS-release : 4.19.76-linuxkit
Version : #1 SMP Tue May 26 11:42:35 UTC 2020
machine : x86_64
processor :
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.1.3
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 19.2.3
setuptools : 41.2.0
Cython : 0.29.15
pytest : 5.4.3
hypothesis : 5.6.0
sphinx : 1.8.5
blosc : None
feather : None
xlsxwriter : 1.2.8
lxml.etree : 4.5.0
html5lib : None
pymysql : None
psycopg2 : 2.8.5 (dt dec pq3 ext lo64)
jinja2 : 2.10.3
IPython : 7.9.0
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.3.2
fsspec : 0.6.2
fastparquet : None
gcsfs : None
matplotlib : 3.2.1
numexpr : 2.7.1
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.17.1.post2
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.2.1
sqlalchemy : 1.3.15
tables : None
tabulate : 0.8.6
xarray : None
xlrd : 1.2.0
xlwt : None
numba : 0.48.0

@thomas-reineking-by added the Bug and Needs Triage labels Oct 13, 2020
@ssmssam

ssmssam commented Oct 18, 2020

The issue seems to be caused by this line in generic.py, which was added in June:
result = result.reindex(self.grouper.result_index, copy=False)

@rhshadrach
Member

Thanks for the report! Agreed that this is an issue, but I don't think the line identified is the cause. This path relies on computing the aggregation and then broadcasting the result to the frame. as_index=False changes the result of the reduction so that the broadcasting fails. This can be fixed by temporarily setting as_index=True within the transform call via pandas.core.common.temp_setattr.
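
A minimal sketch of the failure mode described above, not the actual pandas internals. It assumes the transform path amounts to "reduce per group, then look each row's group key up in the reduced result". The helper `broadcast_by_key` is hypothetical, and `reset_index(drop=True)` merely stands in for a reduction whose index holds positions instead of group labels.

```python
import pandas as pd


def broadcast_by_key(reduced: pd.DataFrame, keys: pd.Series, index: pd.Index) -> pd.DataFrame:
    """Hypothetical helper: look up each row's group key in the reduced
    frame, then restore the original row index."""
    out = reduced.reindex(keys.to_numpy())
    out.index = index
    return out


df = pd.DataFrame({"X": [1.0]})
keys = pd.Series(["group"])

# Reduction indexed by group label: index is ["group"].
by_label = df.groupby(keys).sum()
# Stand-in for a reduction indexed by position: index is [0].
by_position = by_label.reset_index(drop=True)

# "group" is found among the labels, so the value is broadcast correctly.
ok = broadcast_by_key(by_label, keys, df.index)
# "group" is not among the positions, so the lookup yields NaN.
broken = broadcast_by_key(by_position, keys, df.index)

# With integer keys, the key 0 happens to coincide with position 0,
# so the lookup succeeds by accident.
int_keys = pd.Series([0])
accidental = broadcast_by_key(
    df.groupby(int_keys).sum().reset_index(drop=True), int_keys, df.index
)

print(ok, broken, accidental, sep="\n\n")
```

Under these assumptions, the sketch is also consistent with the observation in the report that grouping by pd.Series([0]) works only by coincidence.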

Tangential to this, the code

df = pd.DataFrame({0: [1.0]})
gb = df.groupby(pd.Series(["X"]), as_index=False)
print(gb.sum())

gives

     0
0  1.0

whereas I think "X" should be added as a column to the result. Resolving this won't impact this issue however (and in fact, may make the output worse!). Will open a new issue after a little more investigating.

@rhshadrach added the Apply and Groupby labels and removed the Needs Triage label Jan 21, 2021
@rhshadrach added this to the Contributions Welcome milestone Jan 21, 2021
@mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
@rhshadrach
Member

The OP now produces

     X
0  1.0

and this looks like the expected result to me.
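
A regression test for this behavior might look like the sketch below. The test name is hypothetical and this is not necessarily the test that was eventually merged; it only pins down the output shown above, using the public `pd.testing.assert_frame_equal` helper.

```python
import pandas as pd


def test_transform_series_key_as_index_false():
    # Hypothetical test name; mirrors the example from the original report.
    df = pd.DataFrame({"X": [1.0]})
    result = df.groupby(pd.Series(["group"]), as_index=False).transform("sum")

    # The transform should return the group sum aligned to the original rows.
    expected = pd.DataFrame({"X": [1.0]})
    pd.testing.assert_frame_equal(result, expected)
```

On a pandas version that still has the bug (such as 1.1.3, per the report), this test would be expected to fail because the result contains NaN.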

@rhshadrach added the Needs Tests label and removed the Bug label Apr 23, 2023
@DevpriyaDave

First-time contributor here; I'm going to try to write a test case for this.

@DevpriyaDave

take

@undermyumbrella1
Contributor

Hi, I have created a PR with the tests. Could I request some help reviewing it? Thank you!
