Skip to content

CoW: Switch to copy=False everywhere for Series constructor #52068

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Apr 2, 2023

Conversation

phofl
Copy link
Member

@phofl phofl commented Mar 18, 2023

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

@phofl phofl added this to the 2.0 milestone Mar 18, 2023
Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good, thanks! Added in a few cases the question to what extent we are sure we always have an array that isn't shared with an existing pandas object

@@ -2504,15 +2508,15 @@ def codes(self) -> Series:
"""
from pandas import Series

return Series(self._parent.codes, index=self._index)
return Series(self._parent.codes, index=self._index, copy=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should maybe not be a copy=False? (modifying the series should never mutate those codes)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point


def _delegate_method(self, name, *args, **kwargs):
from pandas import Series

method = getattr(self._parent, name)
res = method(*args, **kwargs)
if res is not None:
return Series(res, index=self._index, name=self._name)
return Series(res, index=self._index, name=self._name, copy=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are sure that none of the methods where this is used give a view? (didn't check in detail where this is used, just wondering)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so, but not 100%. Lets leave this out for now

@@ -195,7 +195,7 @@ def coo_to_sparse_series(
from pandas import SparseDtype

try:
ser = Series(A.data, MultiIndex.from_arrays((A.row, A.col)))
ser = Series(A.data, MultiIndex.from_arrays((A.row, A.col)), copy=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this keyword get exposed to the user (and with a default of copy=True) ?

I would say this is a similar case as Series(ndarray) where we decided to change the default to True but give the ability for power uses to specify copy=False

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The astype below creates a copy as far as I understand, so copy=False here should be good.

result = Series(result, index=index, name=self.name).__finalize__(self._parent)
result = Series(result, index=index, name=self.name, copy=False).__finalize__(
self._parent
)

# setting this object will show a SettingWithCopyWarning/Error
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sidenote: seeing this here, this might also be a case to look into if we can raise a "chained assignment" error

@@ -102,7 +102,9 @@ def _delegate_property_get(self, name):
else:
index = self._parent.index
# return the result as a Series, which is by definition a copy
result = Series(result, index=index, name=self.name).__finalize__(self._parent)
result = Series(result, index=index, name=self.name, copy=False).__finalize__(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment just above should be updated then. But also wondering if we have properties that can be a view, in which case we shouldn't necessarily do copy=False? (but rather attach a ref to self._parent)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking through the datetimelike properties/methods, I think theoretically things like round/ceil/floor could return a view (if nothing needs to be rounded), but that's not an optimization we currently have I think.
Also tz_convert/tz_localize could theoretically return a view (but in practice also do not do that, from a quick test)

In [41]: s = pd.Series(pd.date_range("2012-01-01", periods=3, tz="UTC"))

In [43]: s._values._ndarray
Out[43]: 
array(['2012-01-01T00:00:00.000000000', '2012-01-02T00:00:00.000000000',
       '2012-01-03T00:00:00.000000000'], dtype='datetime64[ns]')

In [44]: s2 = s.dt.tz_convert("Europe/Brussels")

In [45]: np.shares_memory(s._values._ndarray, s2._values._ndarray)
Out[45]: False

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually they return views if it's a no-op. So will remove the copy=False for now

@phofl
Copy link
Member Author

phofl commented Apr 2, 2023

merging this, removed all the cases that weren't totally clear. Will open an issue tomorrow

@phofl phofl merged commit bcc5160 into pandas-dev:main Apr 2, 2023
meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Apr 2, 2023
mroeschke pushed a commit that referenced this pull request Apr 3, 2023
…ere for Series constructor) (#52367)

Backport PR #52068: CoW: Switch to copy=False everywhere for Series constructor

Co-authored-by: Patrick Hoefler <[email protected]>
@phofl phofl deleted the cow_series_copy_false_everywhere branch April 3, 2023 08:06
topper-123 pushed a commit to topper-123/pandas that referenced this pull request Apr 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants