Raising ValueError when xlim and ylim contain non-int and non-float dtype elements #40889

regmibijay · 2021-04-11T23:55:09Z

closes BUG: xlim and ylim not restricting plot area #40781
tests added / passed
Ensure all linting tests pass, see here for how to run them
whatsnew entry

mzeitlin11

Thanks for the pr @regmibijay! Some comments, plus needs tests which hit the new code added

mzeitlin11 · 2021-04-12T00:56:16Z

doc/source/whatsnew/v1.2.4.rst

@@ -29,7 +29,8 @@ Fixed regressions
 Bug fixes
 ~~~~~~~~~

-
+- Bug in :func:`DataFrame.plot.line` was taking ``xlim`` and ``ylim`` kwargs even if they had string elements, now ``ValueError`` is raised if elements are not ``int`` or ``float`` dtype.
+


Only need the entry in v1.3.0, can revert this

will be done

mzeitlin11 · 2021-04-12T00:56:41Z

pandas/plotting/_matplotlib/core.py

@@ -505,16 +505,35 @@ def _adorn_subplots(self):
            )

        for ax in self.axes:
+


Best to keep out unrelated changes if possible

I am not sure what was done here, do I add a newline?

This diff just shows that you added a newline where there wasn't one before. So if you remove it, this change will go away

mzeitlin11 · 2021-04-12T00:59:35Z

doc/source/whatsnew/v1.3.0.rst

@@ -753,7 +753,7 @@ Plotting
 - Prevent warnings when matplotlib's ``constrained_layout`` is enabled (:issue:`25261`)
 - Bug in :func:`DataFrame.plot` was showing the wrong colors in the legend if the function was called repeatedly and some calls used ``yerr`` while others didn't (partial fix of :issue:`39522`)
 - Bug in :func:`DataFrame.plot` was showing the wrong colors in the legend if the function was called repeatedly and some calls used ``secondary_y`` and others use ``legend=False`` (:issue:`40044`)
-
+- Bug in :func:`DataFrame.plot.line` was taking ``xlim`` and ``ylim`` kwargs even if they had string elements, now ``ValueError`` is raised if elements are not ``int`` or ``float`` dtype.


Can you link to the issue here? (see above entries for examples of how to do that)

Also, this is not specific to line right? Should it just reference :func:DataFrame.plot?

will be done

Also, this is not specific to line right? Should it just reference :func:DataFrame.plot?

I have only tested it for DataFrame.plot(kind="line") and DataFrame.plot.line, so will reference DataFrame.plot then

mzeitlin11 · 2021-04-12T01:03:34Z

pandas/plotting/_matplotlib/core.py

+                            " or integer datatype.\n"
+                            "`xlim` has unsupported dtype"
+                            f" {type(elem)} with value {elem}.\n"
+                        )


This logic looks the same for both xlim and ylim, can you find a way to share it?

Addressing this with a lambda function

Sharing the message is nice, but would be great to also share the validation logic itself (eg the iteration and type checking). Also, I think generally it's discouraged to assign an anonymous function to a variable, if doing that better to be explicit and write a function (which could also be inline).
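One way to read the suggestion above is a single named helper that owns both the iteration and the error message, so the `xlim` and `ylim` branches only differ in the name they pass. This is a minimal sketch, not the code from the PR: the function name `validate_lim` and the exact message wording are assumptions made for illustration.

```python
def validate_lim(name, lim):
    """Hypothetical helper: raise ValueError if any element of `lim`
    is not an int or a float. `name` is "xlim" or "ylim"."""
    for elem in lim:
        if not isinstance(elem, (int, float)):
            raise ValueError(
                f"`{name}` must contain only float or integer values.\n"
                f"`{name}` has unsupported dtype {type(elem)} with value {elem!r}.\n"
            )


# The same named function serves both limits, so no lambda is
# assigned to a variable and no loop is written twice.
validate_lim("xlim", [0, 10])
validate_lim("ylim", [0.5, 2.5])
```

A named function also shows up with its own name in tracebacks, which an anonymous function bound to a variable does not.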

regmibijay

pushed new changes addressing previous comments

mzeitlin11 · 2021-04-12T15:09:27Z

pushed new changes addressing previous comments

Can you please add tests which hit the added code?

regmibijay · 2021-04-12T15:11:13Z

pushed new changes addressing previous comments

Can you please add tests which hit the added code?

I am not sure how to do that, I tested using an arbitrary dataset I created. Could you link me a documentation to pandas standard?

mzeitlin11 · 2021-04-12T15:14:35Z

pushed new changes addressing previous comments

Can you please add tests which hit the added code?

I am not sure how to do that, I tested using an arbitrary dataset I created. Could you link me a documentation to pandas standard?

This guide is extremely useful, please feel free to ask any questions that come up when reading it! https://pandas.pydata.org/docs/development/contributing.html

Also the precommit section will be useful, since that is currently failing on this pr.

mzeitlin11 · 2021-04-12T15:16:02Z

Outside of the guide, looking at some tests in pandas/tests/plotting can give an idea of testing style

charlesdong1991 · 2021-04-12T18:17:24Z

doc/source/whatsnew/v1.3.0.rst

@@ -753,7 +753,7 @@ Plotting
 - Prevent warnings when matplotlib's ``constrained_layout`` is enabled (:issue:`25261`)
 - Bug in :func:`DataFrame.plot` was showing the wrong colors in the legend if the function was called repeatedly and some calls used ``yerr`` while others didn't (partial fix of :issue:`39522`)
 - Bug in :func:`DataFrame.plot` was showing the wrong colors in the legend if the function was called repeatedly and some calls used ``secondary_y`` and others use ``legend=False`` (:issue:`40044`)
-
+- Bug in :func:`DataFrame.plot` was taking ``xlim`` and ``ylim`` kwargs even if they had string elements, now ``ValueError`` is raised if elements are not ``int`` or ``float`` dtype (:issue:`40781`)


I think matplotlib could also take datetime objects, e.g. something like ax.set_xlim([datetime1, datetime2]), could you please check if that is the case? If so, would be nice to make changes accordingly in the PR to reflect that.

I added datetime datatype to the filter too, I would love a feedback on the function.

make this simpler like was not raising for invalid dtyped elements to xlim/ylim

regmibijay · 2021-04-13T11:21:35Z

Outside of the guide, looking at some tests in pandas/tests/plotting can give an idea of testing style

I did my best effort on writing the tests for individual cases. But it seems to have failed on CI, I would love a recommendation on this part too.

jreback · 2021-04-13T12:14:54Z

doc/source/whatsnew/v1.3.0.rst

@@ -753,7 +753,7 @@ Plotting
 - Prevent warnings when matplotlib's ``constrained_layout`` is enabled (:issue:`25261`)
 - Bug in :func:`DataFrame.plot` was showing the wrong colors in the legend if the function was called repeatedly and some calls used ``yerr`` while others didn't (partial fix of :issue:`39522`)
 - Bug in :func:`DataFrame.plot` was showing the wrong colors in the legend if the function was called repeatedly and some calls used ``secondary_y`` and others use ``legend=False`` (:issue:`40044`)
-
+- Bug in :func:`DataFrame.plot` was taking ``xlim`` and ``ylim`` kwargs even if they had string elements, now ``ValueError`` is raised if elements are not ``int`` or ``float`` dtype (:issue:`40781`)


make this simpler like was not raising for invalid dtyped elements to xlim/ylim

jreback · 2021-04-13T12:15:33Z

pandas/plotting/_matplotlib/core.py

@@ -489,6 +489,18 @@ def _post_plot_logic(self, ax, data):
        """Post process for each axes. Overridden in child classes"""
        pass

+    def _has_iterable_only_float_int_or_datetime(self, list_iterable: list) -> bool:


_has_valid_dtype

remove the double-doc-string

fully type list

btw, small idea on the name, could you maybe use _has_valid_lim_dtype? this is only targeting on dtypes of xlim or ylim, right? @regmibijay

jreback · 2021-04-13T12:15:41Z

pandas/plotting/_matplotlib/core.py

+    def _has_iterable_only_float_int_or_datetime(self, list_iterable: list) -> bool:
+        """checks if an iterable has float,int or datetime datatype """
+        """targeting #GH40781"""
+        from datetime import datetime as dt


don't import here

jreback · 2021-04-13T12:15:56Z

pandas/plotting/_matplotlib/core.py

+        """targeting #GH40781"""
+        from datetime import datetime as dt
+        if (
+            all(([isinstance(x, dt) for x in list_iterable]))


use is_* accessors

I tried implementing is_datetime* , is_datetime_or_timedelta* etc and could not get it to work with raw datetime.datetime formats. Any suggestions?

I think there are existing implementations, so should be no need to implement new ones

I mean I tried to use those existing implementations, but could not get it to work with datetime objects being passed as xlim or ylim

emm, you meant none of those implementations of is_* accessors in https://github.com/pandas-dev/pandas/blob/master/pandas/core/dtypes/common.py could work with datetime objects? That sounds a bit weird, could you pls elaborate or maybe post a new issue to address this?

I will try more samples and recreate an issue if that verifies with exact example soon!

jreback · 2021-04-13T12:16:31Z

pandas/plotting/_matplotlib/core.py

            if self.ylim is not None:
+                if not self._has_iterable_only_float_int_or_datetime(self.ylim):
+                    raise ValueError(


change the routine to _validate_valid_dtype and just raise inside

working on it currently, should fix soon.

I have changed the namespace accordingly

jreback · 2021-04-13T12:17:08Z

pandas/tests/plotting/test_xlim_ylim_dtype.py

+    try:
+        df.plot(kind="line", x="A", xlim=[1, 2], ylim=[3, 4])
+        return True
+    except ValueError:


never use try/except in tests like this; see similar tests on how to check for raising

I am looking into more pytesting, I have never written tests before so I would appreciate the guidance.

Hi, you have done a great job in writing those tests, great work!

I think what @jreback meant is to have something like:

    msg = "error messages"
    with pytest.raises(ValueError, match=msg):
        your_code_which_generates_error

There should be many references in the codebase, for instance, https://github.com/pandas-dev/pandas/blob/master/pandas/tests/plotting/frame/test_frame.py

I hope this helps and please let me know if you have other questions
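The `pytest.raises` pattern described in this thread can be sketched as a complete test. To keep the example self-contained it does not call `DataFrame.plot`; `check_lim` is a hypothetical stand-in for whatever validation the PR adds, and its message text is an assumption.

```python
import pytest


def check_lim(name, lim):
    # Hypothetical stand-in for the validation under test.
    for elem in lim:
        if not isinstance(elem, (int, float)):
            raise ValueError(f"`{name}` has unsupported type {type(elem).__name__}")


def test_invalid_xlim_raises():
    # `match` is applied as a regular-expression search on the message.
    msg = "`xlim` has unsupported type str"
    with pytest.raises(ValueError, match=msg):
        check_lim("xlim", ["1", "3"])
```

Unlike a `try`/`except` that returns `True` or `False`, this test fails on its own when no exception is raised or when the message does not match.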

Thank you, I have started working on this, I will push new changes soon.

jreback · 2021-04-13T12:17:34Z

pandas/tests/plotting/test_xlim_ylim_dtype.py

+        pytest.raises(ValueError, match="FAILED")
+
+
+def test_if_plotable_xlim_first_int() -> bool:


parameterize the tests, you can put with the existing ones, e.g. you only need 1 additional parameterized test

I have added a new test addressing your recommendations.

charlesdong1991

thanks for the update, good job! I have left few comments.

charlesdong1991 · 2021-04-26T20:16:10Z

pandas/plotting/_matplotlib/core.py

@@ -489,6 +490,17 @@ def _post_plot_logic(self, ax, data):
        """Post process for each axes. Overridden in child classes"""
        pass

+    def _has_valid_lim_dtype(self, list_iterable: list) -> bool:
+        """checks if an iterable has float,int or datetime datatype """


Suggested change

"""checks if an iterable has float,int or datetime datatype """

"""Check if an iterable has float, int or datetime datatype"""

requested change merged

charlesdong1991 · 2021-04-26T20:16:20Z

pandas/plotting/_matplotlib/core.py

@@ -489,6 +490,17 @@ def _post_plot_logic(self, ax, data):
        """Post process for each axes. Overridden in child classes"""
        pass

+    def _has_valid_lim_dtype(self, list_iterable: list) -> bool:
+        """checks if an iterable has float,int or datetime datatype """
+        """targeting #GH40781"""


Suggested change

"""targeting #GH40781"""

# GH40781

requested change merged

charlesdong1991 · 2021-04-26T20:18:56Z

pandas/tests/plotting/test_xlim_ylim_dtype.py

+from datetime import datetime as dt
+import numpy as np
+
+"""targeting #GH40781"""


please remove this line

the line has been removed

charlesdong1991 · 2021-04-26T20:20:21Z

pandas/tests/plotting/test_xlim_ylim_dtype.py

+import pytest
+import pandas as pd
+from datetime import datetime as dt
+import numpy as np


could you please use isort to resort the imports, otherwise, I think black check will fail.

I have formatted the import list with isort

you probably need to import matplotlib in order to fix the failed tests

I have added an import statement

@regmibijay I see that matplotlib import is still missing.
Can you double check that?

I am having trouble understanding
ModuleNotFoundError: No module named 'matplotlib' in the CI tests, I import matplotlib normally. Is there something very obvious I am missing?

@regmibijay
From the current test module, you actually do not need matplotlib to be imported.

The reason the CI fails is that matplotlib is simply not installed for all pipelines.
See pandas/ci/deps for dependencies.

In order to ensure that your test module is skipped in the pipelines which do not have matplotlib installed,
you need to add pytest.importorskip to the top and probably mark the tests as slow.

    pytest.importorskip("matplotlib")
    pytestmark = pytest.mark.slow

See pandas/tests/plotting/test_style.py, for instance.

I just pushed new commit with changes suggested by you.

Pre-commit is complaining about unsorted imports and indentation in some tests. Please look through the CI failure output and correct accordingly (black and isort utils will be helpful here).

ran isort and black and fixed issues CI was complaining about. I have no idea why black did not pick those issues up in my IDLE in first place.

charlesdong1991 · 2021-04-26T20:23:42Z

pandas/tests/plotting/test_xlim_ylim_dtype.py

+
+
+def test_axis_lim_invalid():
+    global df


ehh, try to avoid using global.

charlesdong1991 · 2021-04-26T20:24:14Z

pandas/tests/plotting/test_xlim_ylim_dtype.py

+    """
+    supplying ranges with str, datetime and
+    mixed int,str and float,str dtype to raise
+    ValueError. Valid dtypes are np.datetime64
+    int, and float.
+    """


generally there is no need to write docstrings for tests, could you please remove?

noted for future PRs and has been removed from current accordingly

charlesdong1991 · 2021-04-26T20:26:55Z

pandas/tests/plotting/test_xlim_ylim_dtype.py

+    lim = [
+        ["1", "3"],
+        [dt.now(), dt.now()],
+        [1, "2"],
+        [0.1, "0.2"],
+        [np.datetime64(dt.now()), dt.now()]
+    ]


could you try to parametrize?

The tests are parametrized now

I guess that @charlesdong1991 was referring to use of pytest.mark.parametrize.
Or did you forget to push new changes?

Thank you @ivanovmg, I have used the following parametrization to create a custom dataframe, and I can see I have committed it in the latest commit. Can you please confirm it?

    @pytest.mark.parametrize(
        "df, lim",
        [
            (
                DataFrame(
                    {
                        "A": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                        "B": [dt.now() for i in range(0, 10)],
                    }
                ),
                [
                    ["1", "3"],
                    [dt.now(), dt.now()],
                    [1, "2"],
                    [0.1, "0.2"],
                    [np.datetime64(dt.now()), dt.now()],
                ],
            )
        ],
    )

Yes, I confirm. It was a mistake on my end - I looked at an outdated version.
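For comparison, `pytest.mark.parametrize` can also take one limit pair per parameter set, so each pair is collected and reported as its own test case. The sketch below is an alternative layout, not the code from the PR: `check_lim` is a hypothetical stand-in for the validation, and only the string-containing pairs from the thread are used so the example needs no pandas or numpy.

```python
import pytest


def check_lim(lim):
    # Hypothetical stand-in for the validation under test.
    for elem in lim:
        if not isinstance(elem, (int, float)):
            raise ValueError("unsupported type in axis limit")


@pytest.mark.parametrize(
    "lim",
    [
        ["1", "3"],
        [1, "2"],
        [0.1, "0.2"],
    ],
)
def test_axis_lim_invalid(lim):
    # Runs once per entry in the list above.
    with pytest.raises(ValueError, match="unsupported type"):
        check_lim(lim)
```

With this layout a failure names the specific pair that broke, rather than failing one test that loops over every pair.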

charlesdong1991 · 2021-04-26T20:27:18Z

pandas/tests/plotting/test_xlim_ylim_dtype.py

+    lim = [
+        [1, 3],
+        [np.datetime64(dt.now()), np.datetime64(dt.now())],
+        [0.1, 0.2],
+    ]


could you try to parametrize?

charlesdong1991 · 2021-04-26T20:28:08Z

pandas/tests/plotting/test_xlim_ylim_dtype.py

+    global df
+    """
+    supplying ranges with
+    valid dtypes: np.datetime64
+    int, and float, this test
+    should not raise errors.
+    """


same comment as above

charlesdong1991

thanks for updating, the failed tests seem to be related to your change.

I put a comment on imports which might help fix your tests failures.

ivanovmg

Please see my comments.

ivanovmg · 2021-04-30T04:55:40Z

pandas/tests/plotting/test_xlim_ylim_dtype.py

+import pytest
+import pandas as pd
+from datetime import datetime as dt
+import numpy as np


@regmibijay I see that matplotlib import is still missing.
Can you double check that?

ivanovmg · 2021-04-30T04:57:24Z

pandas/tests/plotting/test_xlim_ylim_dtype.py

+    lim = [
+        ["1", "3"],
+        [dt.now(), dt.now()],
+        [1, "2"],
+        [0.1, "0.2"],
+        [np.datetime64(dt.now()), dt.now()]
+    ]


I guess that @charlesdong1991 was referring to use of pytest.mark.parametrize.
Or did you forget to push new changes?

ivanovmg · 2021-04-30T04:59:25Z

pandas/tests/plotting/test_xlim_ylim_dtype.py

+
+
+def test_axis_lim_invalid():
+    global df


I suggest that you do not use global df.
Instead, you may want to make a fixture returning your dataframe, and pass this fixture into your test functions.
Alternatively, you may create a class with your test methods and define an attribute with the dataframe (self.df), which you can reuse in the test methods.
Or probably you can just leave it as it is, but without global.

I used parametrization in pytest to create custom dataframe and ended up not having to use globals at all.

Please see: e3b69df and c00977a

My bad. I was looking at an outdated version.
Then it is not clear for me why the tests are failing.

simonjayhawkins · 2021-05-25T11:16:04Z

@regmibijay can you merge master to resolve conflicts

jreback · 2021-09-01T00:10:14Z

@regmibijay if you want to merge master can revive this

regmibijay · 2021-09-01T14:08:13Z

I will revive and work on this thread again, which release should I target it for 1.3.x or 1.4.x?

lithomas1 · 2021-09-08T02:34:07Z

1.4.0. 1.3 has already been released.

jreback · 2021-10-04T00:26:47Z

if you can merge master and update to comments

regmibijay · 2021-10-08T18:24:55Z

moved to #43932 because of merge complications

mzeitlin11 reviewed Apr 12, 2021

mzeitlin11 added the Error Reporting and Visualization labels Apr 12, 2021

regmibijay commented Apr 12, 2021

charlesdong1991 reviewed Apr 12, 2021

regmibijay added 5 commits April 12, 2021 21:30

added new test to check if new valueerror is raised with invalid type

57b4415

new entry in bugfix

065e453

Added new test for issue #GH40781

27d13d3

Added new dtype datetime object for xlim and ylim

86a030b

Added new test for issue #GH40781

f3b78b4

jreback requested changes Apr 13, 2021

regmibijay added 3 commits April 25, 2021 16:10

simplified bug fix description

5c0ac1a

added support for np.datetime64 objs

be33fa3

test unit to check new valueerror

915fa52

regmibijay requested a review from jreback April 25, 2021 20:16

charlesdong1991 suggested changes Apr 26, 2021

regmibijay added 2 commits April 26, 2021 23:47

modified function description on _has_valid_lim_dtype

561d83e

parametrized test, removed unnecessary description

c00977a

regmibijay requested a review from charlesdong1991 April 26, 2021 21:51

charlesdong1991 reviewed Apr 27, 2021

added additional matplotlib import

e3b69df

ivanovmg reviewed Apr 30, 2021

regmibijay added 3 commits May 7, 2021 22:57

added importorskip for matplotlib for CI tests

6c1819e

fixed import with isort

463cc55

fixed code formatting with black

ae1a282

simonjayhawkins added the Needs Review label May 25, 2021

lithomas1 removed the Needs Review label Sep 8, 2021

regmibijay closed this Oct 8, 2021
