Skip to content

BUG: divmod return type #22932

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Oct 8, 2018
15 changes: 12 additions & 3 deletions doc/source/extending.rst
Original file line number Diff line number Diff line change
Expand Up @@ -160,9 +160,18 @@ your ``MyExtensionArray`` class, as follows:
MyExtensionArray._add_arithmetic_ops()
MyExtensionArray._add_comparison_ops()

Note that since ``pandas`` automatically calls the underlying operator on each
element one-by-one, this might not be as performant as implementing your own
version of the associated operators directly on the ``ExtensionArray``.

.. note::

Since ``pandas`` automatically calls the underlying operator on each
element one-by-one, this might not be as performant as implementing your own
version of the associated operators directly on the ``ExtensionArray``.

For arithmetic operations, this implementation will try to reconstruct a new
``ExtensionArray`` with the result of the element-wise operation. Whether
or not that succeeds depends on whether the operation returns a result
that's valid for the ``ExtensionArray``. If an ``ExtensionArray`` cannot
be reconstructed, an ndarray containing the scalars returned instead.

.. _extending.extension.testing:

Expand Down
21 changes: 14 additions & 7 deletions pandas/core/arrays/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -781,17 +781,24 @@ def convert_values(param):
# a TypeError should be raised
res = [op(a, b) for (a, b) in zip(lvalues, rvalues)]

if coerce_to_dtype:
try:
res = self._from_sequence(res)
except Exception:
def _maybe_convert(arr):
if coerce_to_dtype:
# https://github.com/pandas-dev/pandas/issues/22850
# We catch all regular exceptions here, and fall back
# to an ndarray.
res = np.asarray(res)
try:
res = self._from_sequence(arr)
except Exception:
res = np.asarray(arr)
else:
res = np.asarray(arr)
return res

if op.__name__ in {'divmod', 'rdivmod'}:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel can't we use dispatch_to_extension_op here to avoid duplication of code?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I asked something similar a few days ago. If Tom says it isn't feasible, I believe him.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are inside the extension array here, so it would also be strange to use (which doesn't prevent that both could share a helper function, if that would be appropriate).
But here we need to construct the divmod correctly, while dispatch_to_extension_op should assume this is already done correctly by the EA

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which doesn't prevent that both could share a helper function,

Right. This is possible, if people want it. I'll push up a commit with some kind of do_extension_op that both of these call to so people can take a look.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, now that I take a look it's not so straightforward. The two are similar but just slightly different in enough places that they wouldn't benefit from sharing code really.

  1. The unboxing of values. dispatch_to_extension_op knows that at least one of the two is a Series[EA]. _binop knows that self is an EA.
  2. The op: dispatch_to_extension_op dispatches, _binop is defining it in a list comprehension
  3. The re-boxing: _binop has the whole maybe re-constructing _from_seqence that the dispatch_to_extension_op doesn't have to worry about at all.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They do not do "conceptually exactly the same thing". Paraphrasing myself from above:

The dispatch function calls the EA to perform an operation, the above code is the EA doing the operation.

Why would those two different things necessarily need to live in the same place / code path?

Of course, we could still move the whole EA._create_method to ops.py (which would indeed be similar as functions like add_flex_arithmetic_methods in ops.py that is used in series.py to add methods to Series). But this is then not related to the change in this PR, and should be left for another issue/PR to discuss (personally I don't think that would be an improvement).

I would see if its is possible to integrate these rather than adding a bunch of new code.

Well, and both Tom and me who have looked into the code, say: we don't think it is possible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TomAugspurger I saw your comment. I also @jorisvandenbossche comments. I have not looked at this in detail, nor do I have time to. My point is that this instantly creates technical debt no matter how you slice it.

It may require some reorganization to integrate this, and I appreciate that. So happy to defer this, maybe @jbrockmendel has more insight.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll be doing a sparse-de-duplication PR following #22880, can take a fresh look at this then. In the interim, I wouldn't let this issue hold up this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point is that this instantly creates technical debt no matter how you slice it.

It really doesn't. They're doing two different things.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is fixing a bug, not doing a refactor of how the ops on EA's are implemented** . If somebody want to look into that, it should be done in a separate PR anyway. So merging.

** and I fully acknowledge that sometimes, to properly fix a bug, you also need to refactor otherwise you just keep adding hacks. However, I don't think that is the case here, see all the comments above.

a, b = zip(*res)
res = _maybe_convert(a), _maybe_convert(b)
else:
res = np.asarray(res)

res = _maybe_convert(res)
return res

op_name = ops._get_op_name(op, True)
Expand Down
7 changes: 6 additions & 1 deletion pandas/tests/extension/base/ops.py
Original file line number Diff line number Diff line change
Expand Up @@ -58,7 +58,8 @@ def test_arith_series_with_scalar(self, data, all_arithmetic_operators):
s = pd.Series(data)
self.check_opname(s, op_name, s.iloc[0], exc=TypeError)

@pytest.mark.xfail(run=False, reason="_reduce needs implementation")
@pytest.mark.xfail(run=False, reason="_reduce needs implementation",
strict=True)
def test_arith_frame_with_scalar(self, data, all_arithmetic_operators):
# frame & scalar
op_name = all_arithmetic_operators
Expand All @@ -77,6 +78,10 @@ def test_divmod(self, data):
self._check_divmod_op(s, divmod, 1, exc=TypeError)
self._check_divmod_op(1, ops.rdivmod, s, exc=TypeError)

def test_divmod_series_array(self, data):
s = pd.Series(data)
self._check_divmod_op(s, divmod, data)

def test_add_series_with_extension_array(self, data):
s = pd.Series(data)
result = s + data
Expand Down
29 changes: 24 additions & 5 deletions pandas/tests/extension/decimal/test_decimal.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@

from pandas.tests.extension import base

from .array import DecimalDtype, DecimalArray, make_data
from .array import DecimalDtype, DecimalArray, make_data, to_decimal


@pytest.fixture
Expand Down Expand Up @@ -102,7 +102,7 @@ class TestInterface(BaseDecimal, base.BaseInterfaceTests):

class TestConstructors(BaseDecimal, base.BaseConstructorsTests):

@pytest.mark.xfail(reason="not implemented constructor from dtype")
@pytest.mark.skip(reason="not implemented constructor from dtype")
def test_from_dtype(self, data):
# construct from our dtype & string dtype
pass
Expand Down Expand Up @@ -240,9 +240,28 @@ def test_arith_series_with_array(self, data, all_arithmetic_operators):
context.traps[decimal.DivisionByZero] = divbyzerotrap
context.traps[decimal.InvalidOperation] = invalidoptrap

@pytest.mark.skip(reason="divmod not appropriate for decimal")
def test_divmod(self, data):
pass
@pytest.mark.parametrize("reverse, expected_div, expected_mod", [
(False, [0, 1, 1, 2], [1, 0, 1, 0]),
(True, [2, 1, 0, 0], [0, 0, 2, 2]),
])
def test_divmod_array(self, reverse, expected_div, expected_mod):
# https://github.com/pandas-dev/pandas/issues/22930
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we try to better indicate that this is an "extra" test (not overriding a base one)?

(not necessarily needs to be solved here, but question is relevant in general, as we sometimes add tests to eg decimal to test specific aspects not covered in the base tests, to clearly see which those tests are. Maybe we would put them outside the class?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having it off the class makes the most sense. Moved.

arr = to_decimal([1, 2, 3, 4])
if reverse:
div, mod = divmod(2, arr)
else:
div, mod = divmod(arr, 2)
expected_div = to_decimal(expected_div)
expected_mod = to_decimal(expected_mod)

tm.assert_extension_array_equal(div, expected_div)
tm.assert_extension_array_equal(mod, expected_mod)

def _check_divmod_op(self, s, op, other, exc=NotImplementedError):
# We implement divmod
super(TestArithmeticOps, self)._check_divmod_op(
s, op, other, exc=None
)

def test_error(self):
pass
Expand Down
17 changes: 9 additions & 8 deletions pandas/tests/extension/json/test_json.py
Original file line number Diff line number Diff line change
Expand Up @@ -131,8 +131,7 @@ def test_custom_asserts(self):

class TestConstructors(BaseJSON, base.BaseConstructorsTests):

# TODO: Should this be pytest.mark.skip?
@pytest.mark.xfail(reason="not implemented constructor from dtype")
@pytest.mark.skip(reason="not implemented constructor from dtype")
def test_from_dtype(self, data):
# construct from our dtype & string dtype
pass
Expand All @@ -147,13 +146,11 @@ class TestGetitem(BaseJSON, base.BaseGetitemTests):


class TestMissing(BaseJSON, base.BaseMissingTests):
# TODO: Should this be pytest.mark.skip?
@pytest.mark.xfail(reason="Setting a dict as a scalar")
@pytest.mark.skip(reason="Setting a dict as a scalar")
def test_fillna_series(self):
"""We treat dictionaries as a mapping in fillna, not a scalar."""

# TODO: Should this be pytest.mark.skip?
@pytest.mark.xfail(reason="Setting a dict as a scalar")
@pytest.mark.skip(reason="Setting a dict as a scalar")
def test_fillna_frame(self):
"""We treat dictionaries as a mapping in fillna, not a scalar."""

Expand Down Expand Up @@ -204,8 +201,7 @@ def test_combine_add(self, data_repeated):


class TestCasting(BaseJSON, base.BaseCastingTests):
# TODO: Should this be pytest.mark.skip?
@pytest.mark.xfail(reason="failing on np.array(self, dtype=str)")
@pytest.mark.skip(reason="failing on np.array(self, dtype=str)")
def test_astype_str(self):
"""This currently fails in NumPy on np.array(self, dtype=str) with

Expand Down Expand Up @@ -257,6 +253,11 @@ def test_add_series_with_extension_array(self, data):
with tm.assert_raises_regex(TypeError, "unsupported"):
ser + data

def _check_divmod_op(self, s, op, other, exc=NotImplementedError):
return super(TestArithmeticOps, self)._check_divmod_op(
s, op, other, exc=TypeError
)


class TestComparisonOps(BaseJSON, base.BaseComparisonOpsTests):
pass
9 changes: 7 additions & 2 deletions pandas/tests/extension/test_categorical.py
Original file line number Diff line number Diff line change
Expand Up @@ -140,11 +140,11 @@ def test_take_series(self):
def test_reindex_non_na_fill_value(self):
pass

@pytest.mark.xfail(reason="Categorical.take buggy")
@pytest.mark.skip(reason="Categorical.take buggy")
def test_take_empty(self):
pass

@pytest.mark.xfail(reason="test not written correctly for categorical")
@pytest.mark.skip(reason="test not written correctly for categorical")
def test_reindex(self):
pass

Expand Down Expand Up @@ -208,6 +208,11 @@ def test_add_series_with_extension_array(self, data):
with tm.assert_raises_regex(TypeError, "cannot perform"):
ser + data

def _check_divmod_op(self, s, op, other, exc=NotImplementedError):
return super(TestArithmeticOps, self)._check_divmod_op(
s, op, other, exc=TypeError
)


class TestComparisonOps(base.BaseComparisonOpsTests):

Expand Down