BUG: Preserve extension dtypes in MultiIndex during concat (#58421) #61211

Closed

@afonso-antunes afonso-antunes commented Apr 1, 2025

Fix Summary:

Previously, _make_concat_multiindex could silently downgrade extension dtypes (e.g., to object) when building the levels of the resulting MultiIndex. This PR makes the _concat_indexes helper use dtype-aware construction (array(..., dtype=...)) so that the dtype of the first index is preserved.

Test added:

Added a test in pandas/tests/frame/methods/test_concat_arrow_index.py covering the preservation of extension dtypes when pd.concat is called with keys=, which triggers MultiIndex creation.

The test creates two DataFrames with timestamp[pyarrow] indices, then concatenates them with pd.concat(..., keys=...) and asserts that:

  • The resulting index is a MultiIndex
  • The second level (levels[1]) retains the ArrowDtype('timestamp[us][pyarrow]') instead of being downgraded to object.

This validates the dtype-preservation fix and guards against regressions.

Successfully merging this pull request may close these issues.

BUG: Index[timestamp[pyarrow]].union with itself return object type