BUG: Index name lost when using "resample" with pyarrow dtypes #61222

Open
2 of 3 tasks
mthiboust opened this issue Apr 3, 2025 · 2 comments · May be fixed by #61229
Labels
Arrow (pyarrow functionality), Bug, Resample (resample method)

Comments

@mthiboust

mthiboust commented Apr 3, 2025

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

# Create a df with DatetimeIndex called "timestamp" and native pandas dtype
native_df = pd.DataFrame(
    {'value': [23.5, 24.1, 22.8, 25.3, 23.9]},
    index=pd.date_range(start='2025-01-01 00:00:00', end='2025-01-01 04:00:00', freq='h'),
)
native_df.index.name = "timestamp"

# Create a similar df with pyarrow dtypes
pyarrow_df = native_df.copy()
pyarrow_df.index = pyarrow_df.index.astype('timestamp[ns][pyarrow]')
pyarrow_df["value"] = pyarrow_df["value"].astype('float64[pyarrow]')

native_df.resample("2h").mean().reset_index()["timestamp"] # OK
pyarrow_df.resample("2h").mean().reset_index()["timestamp"] # KeyError: 'timestamp'

Issue Description

resample loses the index name when the index uses a pyarrow dtype.

As an aside, I noticed that a DatetimeIndex becomes a plain Index once it is converted to a pyarrow dtype. Maybe this is related?

See concrete example in screenshot

Expected Behavior

The resample method is expected to behave the same way for pyarrow and native dtypes.

Installed Versions

INSTALLED VERSIONS

python : 3.12.8
python-bits : 64
OS : Linux
OS-release : 5.10.234-225.910.amzn2.x86_64
Version : #1 SMP Fri Feb 14 16:52:40 UTC 2025
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : C.UTF-8
LANG : C.UTF-8
LOCALE : C.UTF-8

pandas : 2.2.3
numpy : 1.26.4
pytz : 2025.1
dateutil : 2.9.0.post0
pip : 25.0.1
Cython : None
sphinx : None
IPython : 8.32.0
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.13.3
blosc : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : 2025.3.2
html5lib : None
hypothesis : None
gcsfs : None
jinja2 : 3.1.5
lxml.etree : None
matplotlib : 3.10.1
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
psycopg2 : None
pymysql : None
pyarrow : 19.0.1
pyreadstat : None
pytest : None
python-calamine : None
pyxlsb : None
s3fs : 2025.3.2
scipy : 1.15.2
sqlalchemy : 2.0.38
tables : None
tabulate : None
xarray : None
xlrd : None
xlsxwriter : None
zstandard : 0.23.0
tzdata : 2025.2
qtpy : None
pyqt5 : None

mthiboust added the Bug and Needs Triage (issue that has not been reviewed by a pandas team member) labels Apr 3, 2025
@mthiboust
Author

If it helps, I currently use this safe_resample() function as a temporary workaround:

from typing import Any, Generic, TypeVar

import pandas as pd

T = TypeVar("T", pd.DataFrame, pd.Series)


class IndexPreservingResampler(Generic[T]):
    """A resampler that preserves the index name of the input DataFrame or Series."""

    def __init__(self, resampler: pd.core.resample.Resampler, idx_name: str | None) -> None:
        self._resampler = resampler
        self._index_name = idx_name

    def __getattr__(self, name: str) -> Any:
        method = getattr(self._resampler, name)

        if not callable(method):
            return method

        def wrapped(*args: Any, **kwargs: Any) -> T:
            result = method(*args, **kwargs)
            if hasattr(result, "index"):
                result.index.name = self._index_name
            return result

        return wrapped


def safe_resample(
    df: T,
    freq: str,
    **kwargs: Any,
) -> IndexPreservingResampler[T]:
    """Resample a DataFrame or Series while preserving the index name.

    When using pyarrow dtypes, the index name is lost after resampling.
    This is a temporary fix to preserve the index name.
    See https://github.com/pandas-dev/pandas/issues/61222

    Args:
        df: The DataFrame or Series to resample
        freq: The frequency to resample to
        **kwargs: Additional arguments to pass to pandas resample method

    Returns:
        A Resampler object that will preserve the index name after aggregation
    """
    index_name = df.index.name
    return IndexPreservingResampler(df.resample(freq, **kwargs), index_name)

Using it in practice:

pyarrow_df.resample("2h").mean().reset_index()["timestamp"] # KeyError: 'timestamp'
safe_resample(pyarrow_df, "2h").mean().reset_index()["timestamp"]  # OK

mthiboust added a commit to mthiboust/pandas that referenced this issue Apr 3, 2025
@snitish
Member

snitish commented Apr 4, 2025

Thanks for reporting. Confirmed on main. PRs to fix are welcome.

snitish added the Resample (resample method) and Arrow (pyarrow functionality) labels and removed the Needs Triage label Apr 4, 2025
mthiboust added a commit to mthiboust/pandas that referenced this issue Apr 4, 2025