PERF: Add short circuiting to RangeIndex._shallow_copy #57534

Merged
merged 9 commits into from
Feb 20, 2024
48 changes: 38 additions & 10 deletions pandas/core/indexes/range.py
@@ -22,7 +22,6 @@
     index as libindex,
     lib,
 )
-from pandas._libs.algos import unique_deltas
 from pandas._libs.lib import no_default
 from pandas.compat.numpy import function as nv
 from pandas.util._decorators import (
@@ -62,6 +61,37 @@
 _dtype_int64 = np.dtype(np.int64)


+def has_range_delta(values) -> bool | int:
+    """
+    Check if values have unique difference for RangeIndex._shallow_copy.
+
+    values must have more than 2 values.
+    If there is a unique diff, it cannot be zero.
+
+    Parameters
+    ----------
+    values : 1D iterable
+
+    Returns
+    -------
+    bool or int
+        False if there isn't a unique delta
+        int if there's a non-zero, unique delta
+    """
+    if len(values) < 2:
+        return False
+    diff = values[1] - values[0]
+    if diff == 0:
+        return False
+    curr = values[1]
Member

I think this is still pretty slow. What you are looking for is is_range_indexer, which we built for CoW, with a stepsize argument added.

Member Author

Ah gotcha. I refactored to use is_range_indexer

Member

Just to be clear, that will only work with stepsize 1 for now. I am open to supporting other step sizes as well if you like (a follow-up should be fine).

Member Author

Right, I "normalize" the values by the stepsize, (values - values[0]) / (values[1] - values[0]), so that is_range_indexer can always be used here.

Member

Ah sorry, yeah that makes sense
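
For readers following the thread, here is a minimal sketch of the normalization idea discussed above. It is not the code that was merged. `is_unit_range` is a hypothetical NumPy stand-in for the stepsize-1 check (the real `is_range_indexer` lives in pandas internals and is not called here), and `range_step` is an illustrative name. Using `np.divmod` and rejecting non-zero remainders is an assumption made so that values off the step grid are not rounded onto it.

```python
import numpy as np


def is_unit_range(indexer: np.ndarray, n: int) -> bool:
    """Hypothetical stand-in: True if indexer equals 0, 1, ..., n - 1."""
    return len(indexer) == n and bool((indexer == np.arange(n)).all())


def range_step(values: np.ndarray) -> int:
    """Return the common non-zero step of values, or 0 if there is none."""
    if len(values) < 2:
        return 0
    step = int(values[1] - values[0])
    if step == 0:
        return 0
    # Normalize: shift to start at 0, then divide by the candidate step.
    quotient, remainder = np.divmod(values - values[0], step)
    if (remainder != 0).any():
        # Some value does not sit on the step grid.
        return 0
    # After normalization, equally spaced values look like 0, 1, 2, ...
    return step if is_unit_range(quotient, len(values)) else 0
```

With this normalization, a check that only understands stepsize 1 can still recognize any equally spaced input, ascending or descending.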

+    for val in values[2:]:
+        new_diff = val - curr
+        if new_diff != diff:
+            return False
+        curr = val
+    return diff
+
+
 class RangeIndex(Index):
     """
     Immutable Index implementing a monotonic integer range.
@@ -469,15 +499,13 @@ def _shallow_copy(self, values, name: Hashable = no_default):

         if values.dtype.kind == "f":
             return Index(values, name=name, dtype=np.float64)
-        # GH 46675 & 43885: If values is equally spaced, return a
-        # more memory-compact RangeIndex instead of Index with 64-bit dtype
-        unique_diffs = unique_deltas(values)
-        if len(unique_diffs) == 1 and unique_diffs[0] != 0:
-            diff = unique_diffs[0]
-            new_range = range(values[0], values[-1] + diff, diff)
-            return type(self)._simple_new(new_range, name=name)
-        else:
-            return self._constructor._simple_new(values, name=name)
+        if values.dtype.kind == "i" and values.ndim == 1:
+            # GH 46675 & 43885: If values is equally spaced, return a
+            # more memory-compact RangeIndex instead of Index with 64-bit dtype
+            if diff := has_range_delta(values):
+                new_range = range(values[0], values[-1] + diff, diff)
+                return type(self)._simple_new(new_range, name=name)
+        return self._constructor._simple_new(values, name=name)

     def _view(self) -> Self:
         result = type(self)._simple_new(self._range, name=self._name)
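
To make the change in this hunk easier to compare, the sketch below puts the two checks side by side outside of pandas. It is an illustration under assumptions, not the library code: `np.unique(np.diff(values))` stands in for `unique_deltas`, and the names `step_via_unique_diffs`, `step_via_short_circuit`, and `to_range` are hypothetical. The old check computes every difference before deciding, while the new one can stop at the first difference that breaks the pattern.

```python
import numpy as np


def step_via_unique_diffs(values: np.ndarray) -> int:
    """Old-style check: compute all differences, then inspect the unique ones."""
    unique_diffs = np.unique(np.diff(values))  # stand-in for unique_deltas
    if len(unique_diffs) == 1 and unique_diffs[0] != 0:
        return int(unique_diffs[0])
    return 0


def step_via_short_circuit(values: np.ndarray) -> int:
    """New-style check: return early at the first mismatching difference."""
    if len(values) < 2:
        return 0
    diff = int(values[1] - values[0])
    if diff == 0:
        return 0
    curr = values[1]
    for val in values[2:]:
        if val - curr != diff:
            return 0  # short circuit: no need to look at the rest
        curr = val
    return diff


def to_range(values: np.ndarray, diff: int) -> range:
    """Rebuild the compact range the same way the diff does."""
    return range(int(values[0]), int(values[-1]) + diff, diff)
```

Both functions should agree on whether the values are equally spaced. When they are, `to_range(values, diff)` reproduces the same integers as a `range`, which is what allows a `RangeIndex` to be returned instead of a 64-bit `Index`.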