BUG: groupby-transform produces NaN for series keys with as_index=False #37093

thomas-reineking-by · 2020-10-13T07:09:16Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

import pandas as pd
df = pd.DataFrame({"X": [1.0]})
df.groupby(pd.Series(["group"]), as_index=False).transform("sum")

Problem description

With version 1.1.3, this produces a DataFrame containing NaN.

Expected Output

The expected value is 1.0, which is also what older versions (1.0.5) produce.

It seems the following conditions have to be satisfied to trigger this bug:

grouping by a Series that does not contain index values corresponding to the original DataFrame (grouping by pd.Series([0]) works, though this seems accidental)
using as_index=False
using transform

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : db08276
python : 3.6.6.final.0
python-bits : 64
OS : Linux
OS-release : 4.19.76-linuxkit
Version : #1 SMP Tue May 26 11:42:35 UTC 2020
machine : x86_64
processor :
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.1.3
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 19.2.3
setuptools : 41.2.0
Cython : 0.29.15
pytest : 5.4.3
hypothesis : 5.6.0
sphinx : 1.8.5
blosc : None
feather : None
xlsxwriter : 1.2.8
lxml.etree : 4.5.0
html5lib : None
pymysql : None
psycopg2 : 2.8.5 (dt dec pq3 ext lo64)
jinja2 : 2.10.3
IPython : 7.9.0
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.3.2
fsspec : 0.6.2
fastparquet : None
gcsfs : None
matplotlib : 3.2.1
numexpr : 2.7.1
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.17.1.post2
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.2.1
sqlalchemy : 1.3.15
tables : None
tabulate : 0.8.6
xarray : None
xlrd : 1.2.0
xlwt : None
numba : 0.48.0


ssmssam · 2020-10-18T19:09:57Z

The issue seems to originate from this line in generic.py, added in June:
result = result.reindex(self.grouper.result_index, copy=False)
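To illustrate how label-based alignment can turn values into NaN, the sketch below reindexes a small result that is indexed by position (0, 1, ...) rather than by group label, which is roughly what a reduction with as_index=False looks like. This is only an illustration using the public `DataFrame.reindex`; it is not pandas' internal code and does not settle where the bug actually lives. It does suggest why grouping by pd.Series([0]) "works": the label 0 happens to coincide with position 0.

```python
import pandas as pd

# Stand-in (assumption) for a reduction computed with as_index=False:
# one row per group, indexed by position instead of by group label.
reduced = pd.DataFrame({"X": [1.0]})  # index is RangeIndex(start=0, stop=1)

# Aligning on the string label "group" finds no matching row, so X becomes NaN.
by_string_label = reduced.reindex(pd.Index(["group"]))

# Aligning on the integer label 0 happens to match position 0, so X survives.
by_int_label = reduced.reindex(pd.Index([0]))

print(by_string_label)
print(by_int_label)
```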

rhshadrach · 2021-01-21T05:04:15Z

Thanks for the report! Agreed that this is an issue, but I don't think the line identified is the cause. This path relies on computing the aggregation and then broadcasting the result to the frame. as_index=False changes the result of the reduction so that the broadcasting fails. This can be fixed by temporarily setting as_index=True within the transform call via pandas.core.common.temp_setattr.
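The "aggregate, then broadcast" path can be sketched with public API only: reduce with as_index=True so the result is indexed by group label, then use `ngroup()` to pick each original row's group result. This is a simplified illustration under that assumption, not pandas' internal implementation; the data and the key name `"key"` are made up for the example. The last line shows how as_index=False changes the reduction: its index becomes 0..n_groups-1, so the group labels are no longer on the index.

```python
import pandas as pd

df = pd.DataFrame({"X": [1.0, 2.0, 3.0]})
key = pd.Series(["a", "b", "a"], name="key")

# With as_index=True (the default), the reduction is indexed by group label.
gb = df.groupby(key)
agg = gb.sum()  # index ["a", "b"], X = [4.0, 2.0]

# ngroup() numbers each original row's group; with the default sort=True
# these numbers match the row order of `agg`.
codes = gb.ngroup().to_numpy()  # [0, 1, 0]

# Broadcast: take each row's group result, then restore the original index.
broadcast = agg.take(codes)
broadcast.index = df.index

# With as_index=False, the same reduction is indexed by position instead.
agg_no_index = df.groupby(key, as_index=False).sum()

print(broadcast)
print(gb.transform("sum"))
```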

Tangential to this, the code

df = pd.DataFrame({0: [1.0]})
gb = df.groupby(pd.Series(["X"]), as_index=False)
print(gb.sum())

gives

     0
0  1.0

whereas I think "X" should be added as a column to the result. Resolving this won't impact this issue however (and in fact, may make the output worse!). Will open a new issue after a little more investigating.

rhshadrach · 2023-04-23T16:18:32Z

The OP now produces

     X
0  1.0

and this looks like the expected result to me.
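Since the OP now produces the expected result, what remains is a regression test. A minimal pytest-style sketch, assuming a recent pandas; the test name is hypothetical and the test actually merged for this issue may differ:

```python
import pandas as pd


def test_transform_sum_series_key_as_index_false():
    # Repro from the original report: grouping by a Series whose values
    # ("group") are not labels of the frame's index, with as_index=False.
    df = pd.DataFrame({"X": [1.0]})
    result = df.groupby(pd.Series(["group"]), as_index=False).transform("sum")

    # transform should broadcast the per-group sum back to each row, not NaN.
    expected = pd.DataFrame({"X": [1.0]})
    pd.testing.assert_frame_equal(result, expected)
```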

DevpriyaDave · 2023-12-20T21:23:59Z

First-time contributor here; I'm going to try to write a test case for this.

DevpriyaDave · 2023-12-20T21:24:23Z

take

undermyumbrella1 · 2024-03-26T10:48:18Z

Hi, I have created a PR with the tests. Could someone help review it? Thank you!

thomas-reineking-by added the Bug and Needs Triage labels Oct 13, 2020

rhshadrach added the Apply and Groupby labels and removed the Needs Triage label Jan 21, 2021

rhshadrach added this to the Contributions Welcome milestone Jan 21, 2021

mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022

asishm mentioned this issue Nov 22, 2022

BUG: transform("max") replaces all entries with NaN when applied after groupby with as_index=False #49834

Closed


rhshadrach added the Needs Tests label and removed the Bug label Apr 23, 2023

github-actions bot assigned DevpriyaDave Dec 20, 2023

DevpriyaDave mentioned this issue Jan 17, 2024

TST: Add test groupby transform no column #56935

Closed


rhshadrach added the good first issue label Mar 2, 2024

undermyumbrella1 mentioned this issue Mar 26, 2024

Add tests for transform sum with series #58012

Merged


mroeschke closed this as completed in #58012 Mar 28, 2024
