-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
CoW: Switch to copy=False everywhere for Series constructor #52068
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks good, thanks! Added in a few cases the question to what extent we are sure we always have an array that isn't shared with an existing pandas object
pandas/core/arrays/categorical.py
Outdated
@@ -2504,15 +2508,15 @@ def codes(self) -> Series: | |||
""" | |||
from pandas import Series | |||
|
|||
return Series(self._parent.codes, index=self._index) | |||
return Series(self._parent.codes, index=self._index, copy=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should maybe not be a copy=False? (modifying the series should never mutate those codes)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point
pandas/core/arrays/categorical.py
Outdated
|
||
def _delegate_method(self, name, *args, **kwargs): | ||
from pandas import Series | ||
|
||
method = getattr(self._parent, name) | ||
res = method(*args, **kwargs) | ||
if res is not None: | ||
return Series(res, index=self._index, name=self._name) | ||
return Series(res, index=self._index, name=self._name, copy=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are sure that none of the methods where this is used give a view? (didn't check in detail where this is used, just wondering)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so, but not 100%. Lets leave this out for now
@@ -195,7 +195,7 @@ def coo_to_sparse_series( | |||
from pandas import SparseDtype | |||
|
|||
try: | |||
ser = Series(A.data, MultiIndex.from_arrays((A.row, A.col))) | |||
ser = Series(A.data, MultiIndex.from_arrays((A.row, A.col)), copy=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this keyword get exposed to the user (and with a default of copy=True) ?
I would say this is a similar case as Series(ndarray)
where we decided to change the default to True but give the ability for power uses to specify copy=False
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The astype below creates a copy as far as I understand, so copy=False here should be good.
result = Series(result, index=index, name=self.name).__finalize__(self._parent) | ||
result = Series(result, index=index, name=self.name, copy=False).__finalize__( | ||
self._parent | ||
) | ||
|
||
# setting this object will show a SettingWithCopyWarning/Error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sidenote: seeing this here, this might also be a case to look into if we can raise a "chained assignment" error
pandas/core/indexes/accessors.py
Outdated
@@ -102,7 +102,9 @@ def _delegate_property_get(self, name): | |||
else: | |||
index = self._parent.index | |||
# return the result as a Series, which is by definition a copy | |||
result = Series(result, index=index, name=self.name).__finalize__(self._parent) | |||
result = Series(result, index=index, name=self.name, copy=False).__finalize__( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment just above should be updated then. But also wondering if we have properties that can be a view, in which case we shouldn't necessarily do copy=False
? (but rather attach a ref to self._parent
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking through the datetimelike properties/methods, I think theoretically things like round
/ceil
/floor
could return a view (if nothing needs to be rounded), but that's not an optimization we currently have I think.
Also tz_convert
/tz_localize
could theoretically return a view (but in practice also do not do that, from a quick test)
In [41]: s = pd.Series(pd.date_range("2012-01-01", periods=3, tz="UTC"))
In [43]: s._values._ndarray
Out[43]:
array(['2012-01-01T00:00:00.000000000', '2012-01-02T00:00:00.000000000',
'2012-01-03T00:00:00.000000000'], dtype='datetime64[ns]')
In [44]: s2 = s.dt.tz_convert("Europe/Brussels")
In [45]: np.shares_memory(s._values._ndarray, s2._values._ndarray)
Out[45]: False
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually they return views if it's a no-op. So will remove the copy=False for now
merging this, removed all the cases that weren't totally clear. Will open an issue tomorrow |
…r Series constructor
…ere for Series constructor) (#52367) Backport PR #52068: CoW: Switch to copy=False everywhere for Series constructor Co-authored-by: Patrick Hoefler <[email protected]>
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.