-
-
Notifications
You must be signed in to change notification settings - Fork 18.6k
Raising ValueError when xlim and ylim contain non-int and non-float dtype elements #40889
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
regmibijay
commented
Apr 11, 2021
- closes BUG: xlim and ylim not restricting plot area #40781
- tests added / passed
- Ensure all linting tests pass, see here for how to run them
- whatsnew entry
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the pr @regmibijay! Some comments, plus needs tests which hit the new code added
doc/source/whatsnew/v1.2.4.rst
Outdated
@@ -29,7 +29,8 @@ Fixed regressions | |||
Bug fixes | |||
~~~~~~~~~ | |||
|
|||
- | |||
- Bug in :func:`DataFrame.plot.line` was taking ``xlim`` and ``ylim`` kwargs even if they had string elements, now ``ValueError`` is raised if elements are not ``int`` or ``float`` dtype. | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only need the entry in v1.3.0, can revert this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will be done
pandas/plotting/_matplotlib/core.py
Outdated
@@ -505,16 +505,35 @@ def _adorn_subplots(self): | |||
) | |||
|
|||
for ax in self.axes: | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Best to keep out unrelated changes if possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure what was done here, do I add a newline?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This diff just shows that you added a newline where there wasn't one before. So if you remove it, this change will go away
doc/source/whatsnew/v1.3.0.rst
Outdated
@@ -753,7 +753,7 @@ Plotting | |||
- Prevent warnings when matplotlib's ``constrained_layout`` is enabled (:issue:`25261`) | |||
- Bug in :func:`DataFrame.plot` was showing the wrong colors in the legend if the function was called repeatedly and some calls used ``yerr`` while others didn't (partial fix of :issue:`39522`) | |||
- Bug in :func:`DataFrame.plot` was showing the wrong colors in the legend if the function was called repeatedly and some calls used ``secondary_y`` and others use ``legend=False`` (:issue:`40044`) | |||
|
|||
- Bug in :func:`DataFrame.plot.line` was taking ``xlim`` and ``ylim`` kwargs even if they had string elements, now ``ValueError`` is raised if elements are not ``int`` or ``float`` dtype. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you link to the issue here? (see above entries for examples of how to do that)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, this is not specific to line
right? Should it just reference :func:DataFrame.plot
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will be done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, this is not specific to
line
right? Should it just reference :func:DataFrame.plot
?
I have only tested it for DataFrame.plot(kind="line")
and DataFrame.plot.line
, so will reference Dataframe.plot
then
pandas/plotting/_matplotlib/core.py
Outdated
" or integer datatype.\n" | ||
"`xlim` has unsupported dtype" | ||
f" {type(elem)} with value {elem}.\n" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic looks the same for both xlim
and ylim
, can you find a way to share it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressing this with a lambda function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sharing the message is nice, but would be great to also share the validation logic itself (eg the iteration and type checking). Also, I think generally it's discouraged to assign an anonymous function to a variable, if doing that better to be explicit and write a function (which could also be inline).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pushed new changes addressing previous comments
Can you please add tests which hit the added code? |
I am not sure how to do that, I tested using an arbitrary dataset I created. Could you link me a documentation to pandas standard? |
This guide is extremely useful, please feel free to ask ask any questions that come up when reading it! https://pandas.pydata.org/docs/development/contributing.html Also the precommit section will be useful, since that is currently failing on this pr. |
Outside of the guide, looking at some tests in |
doc/source/whatsnew/v1.3.0.rst
Outdated
@@ -753,7 +753,7 @@ Plotting | |||
- Prevent warnings when matplotlib's ``constrained_layout`` is enabled (:issue:`25261`) | |||
- Bug in :func:`DataFrame.plot` was showing the wrong colors in the legend if the function was called repeatedly and some calls used ``yerr`` while others didn't (partial fix of :issue:`39522`) | |||
- Bug in :func:`DataFrame.plot` was showing the wrong colors in the legend if the function was called repeatedly and some calls used ``secondary_y`` and others use ``legend=False`` (:issue:`40044`) | |||
|
|||
- Bug in :func:`DataFrame.plot` was taking ``xlim`` and ``ylim`` kwargs even if they had string elements, now ``ValueError`` is raised if elements are not ``int`` or ``float`` dtype (:issue:`40781`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think matplotlib could also take datetime object, e.g. something like ax.set_xlim([datetime1, datetime2])
, could you please check if it is case? If so, would be nice to make changes accordingly in the PR to reflect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added datetime datatype to the filter too, I would love a feedback on the function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make this simpler like was not raising for invalid dtyped elements to xlim/ylim
I did my best effort on writing the tests for individual cases. But it seems to have failed on CI, I would love a recommendation on this part too. |
doc/source/whatsnew/v1.3.0.rst
Outdated
@@ -753,7 +753,7 @@ Plotting | |||
- Prevent warnings when matplotlib's ``constrained_layout`` is enabled (:issue:`25261`) | |||
- Bug in :func:`DataFrame.plot` was showing the wrong colors in the legend if the function was called repeatedly and some calls used ``yerr`` while others didn't (partial fix of :issue:`39522`) | |||
- Bug in :func:`DataFrame.plot` was showing the wrong colors in the legend if the function was called repeatedly and some calls used ``secondary_y`` and others use ``legend=False`` (:issue:`40044`) | |||
|
|||
- Bug in :func:`DataFrame.plot` was taking ``xlim`` and ``ylim`` kwargs even if they had string elements, now ``ValueError`` is raised if elements are not ``int`` or ``float`` dtype (:issue:`40781`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make this simpler like was not raising for invalid dtyped elements to xlim/ylim
pandas/plotting/_matplotlib/core.py
Outdated
@@ -489,6 +489,18 @@ def _post_plot_logic(self, ax, data): | |||
"""Post process for each axes. Overridden in child classes""" | |||
pass | |||
|
|||
def _has_iterable_only_float_int_or_datetime(self, list_iterable: list) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_has_valid_dtype
remove the double-doc-string
fully type list
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw, small idea on the name, could you maybe use _has_valid_lim_dtype
? this is only targeting on dtypes of xlim
or ylim
, right? @regmibijay
pandas/plotting/_matplotlib/core.py
Outdated
def _has_iterable_only_float_int_or_datetime(self, list_iterable: list) -> bool: | ||
"""checks if an iterable has float,int or datetime datatype """ | ||
"""targeting #GH40781""" | ||
from datetime import datetime as dt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't import here
pandas/plotting/_matplotlib/core.py
Outdated
"""targeting #GH40781""" | ||
from datetime import datetime as dt | ||
if ( | ||
all(([isinstance(x, dt) for x in list_iterable])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use is_* accessors
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried implementing is_datetime*
, is_datetime_or_timedelta*
etc and could not get it to work with raw datetime.datetime
formats. Any suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there are existing implementations, so should be no need to implement new ones
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean I tried to use those existing implementations, but could not get it to work with datetime
objects being passed as xlim
or ylim
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
emm, you meant non of those implementations of is_* accessors in https://github.com/pandas-dev/pandas/blob/master/pandas/core/dtypes/common.py
could work with datetime
objects? That sounds a bit weird, could you pls elaborate or maybe post a new issue to address this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will try more samples and recreate an issue if that verifies with exact example soon!
if self.ylim is not None: | ||
if not self._has_iterable_only_float_int_or_datetime(self.ylim): | ||
raise ValueError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change the routine to _validate_valid_dtype and just raise inside
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
working on it currently, should fix soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have changed the namespace accordingly
try: | ||
df.plot(kind="line", x="A", xlim=[1, 2], ylim=[3, 4]) | ||
return True | ||
except ValueError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
never use try/except in tests like this see simliar tests on how to check for raising
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am looking into more pytesting, I have never written tests before so I would appreciate the guidance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, you have done a great job in writing those tests, great work!
I think what @jreback meant is to have something like:
msg = "error messages"
with pytest.raises(ValueError, match=msg):
your_code_which_generates_error
There should be many references in the codebase, for instance, https://github.com/pandas-dev/pandas/blob/master/pandas/tests/plotting/frame/test_frame.py
I hope this helps and please let me know if you have other questions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, I have started working on this, I will push new changes soon.
pytest.raises(ValueError, match="FAILED") | ||
|
||
|
||
def test_if_plotable_xlim_first_int() -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parameterize the tests, you can put with the existing ones, e.g. you only need 1 additional parameterized test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added new test adressing your recommendations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the update, good job! I have left few comments.
pandas/plotting/_matplotlib/core.py
Outdated
@@ -489,6 +490,17 @@ def _post_plot_logic(self, ax, data): | |||
"""Post process for each axes. Overridden in child classes""" | |||
pass | |||
|
|||
def _has_valid_lim_dtype(self, list_iterable: list) -> bool: | |||
"""checks if an iterable has float,int or datetime datatype """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"""checks if an iterable has float,int or datetime datatype """ | |
"""Check if an iterable has float, int or datetime datatype""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
requested change merged
pandas/plotting/_matplotlib/core.py
Outdated
@@ -489,6 +490,17 @@ def _post_plot_logic(self, ax, data): | |||
"""Post process for each axes. Overridden in child classes""" | |||
pass | |||
|
|||
def _has_valid_lim_dtype(self, list_iterable: list) -> bool: | |||
"""checks if an iterable has float,int or datetime datatype """ | |||
"""targeting #GH40781""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"""targeting #GH40781""" | |
# GH40781 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
requested change merged
from datetime import datetime as dt | ||
import numpy as np | ||
|
||
"""targeting #GH40781""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please remove this line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the line has been removed
import pytest | ||
import pandas as pd | ||
from datetime import datetime as dt | ||
import numpy as np |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you please use isort
to resort the imports, otherwise, I think black check will fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have formatted the import list with isort
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you probably need to import matplotlib in order to fix the failed tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added an import statement
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@regmibijay I see that matplotlib import is still missing.
Can you double check that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am having trouble understanding
ModuleNotFoundError: No module named 'matplotlib'
in the CI tests, I import matplotlib
normally. Is there something very obvious I am missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@regmibijay
From the current test module, you actually do not need matplotlib to be imported.
The reason the CI fails is that because matplotlib is simply not installed for all pipelines.
See pandas/ci/deps
for dependencies.
In order to ensure that your test module is skipped in the pipelines, which do not have matplotlib installed,
then you need to add pytest.imporotorskip
to the top and probably mark the tests as slow
.
pytest.importorskip("matplotlib")
pytestmark = pytest.mark.slow
See pandas/tests/plotting/test_style.py
, for instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just pushed new commit with changes suggested by you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pre-commit is complaining about unsorted imports and indentation is some tests. Please look through the CI failure output and correct accordingly (black and isort utils will be helpful here).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ran isort
and black
and fixed issues CI was complaining about. I have no idea why black did not pick those issues up in my IDLE in first place.
|
||
|
||
def test_axis_lim_invalid(): | ||
global df |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ehh, try to avoid using global.
""" | ||
supplying ranges with str, datetime and | ||
mixed int,str and float,str dtype to raise | ||
ValueError. Valid dtypes are np.datetime64 | ||
int, and float. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
generally there is no need to write docstrings for tests, could you please remove?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
noted for future PRs and has been removed from current accordingly
lim = [ | ||
["1", "3"], | ||
[dt.now(), dt.now()], | ||
[1, "2"], | ||
[0.1, "0.2"], | ||
[np.datetime64(dt.now()), dt.now()] | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you try to parametrize?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tests are parametrized now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess that @charlesdong1991 was referring to use of pytest.mark.parametrize
.
Or did you forget to push new changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @ivanovmg, I have used following parametrization to create a custom dataframe, and I can see I have comitted it in latest commit. Can you please confirm it?
@pytest.mark.parametrize("df, lim", [
(
DataFrame(
{
"A" : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
"B": [dt.now() for i in range(0, 10)]
}
),
[
["1", "3"],
[dt.now(), dt.now()],
[1, "2"],
[0.1, "0.2"],
[np.datetime64(dt.now()), dt.now()]
]
)
]
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I confirm. It was a mistake on my end - I looked at an outdated version.
lim = [ | ||
[1, 3], | ||
[np.datetime64(dt.now()), np.datetime64(dt.now())], | ||
[0.1, 0.2], | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you try to parametrize?
global df | ||
""" | ||
supplying ranges with | ||
valid dtypes: np.datetime64 | ||
int, and float, this test | ||
should not raise errors. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for updating, the failed tests seem to be related to your change.
I put a comment on imports which might help fix your tests failures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see my comments.
import pytest | ||
import pandas as pd | ||
from datetime import datetime as dt | ||
import numpy as np |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@regmibijay I see that matplotlib import is still missing.
Can you double check that?
lim = [ | ||
["1", "3"], | ||
[dt.now(), dt.now()], | ||
[1, "2"], | ||
[0.1, "0.2"], | ||
[np.datetime64(dt.now()), dt.now()] | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess that @charlesdong1991 was referring to use of pytest.mark.parametrize
.
Or did you forget to push new changes?
|
||
|
||
def test_axis_lim_invalid(): | ||
global df |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest that you do not use global df
.
Instead, you may want to make a fixture returning your dataframe, and pass this fixture into your test functions.
Alternatively, you may create a class with your test methods and define an attribute with the dataframe (self.df
), which you can reuse in the test methods.
Or probably you can just leave it as it is, but without global
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used parametrization in pytest to create custom dataframe and ended up not having to use globals at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad. I was looking at an outdated version.
Then it is not clear for me why the tests are failing.
@regmibijay can you merge master to resolve conflicts |
@regmibijay if you want to merge master can revive this |
I will revive and work on this thread again, which release should I target it for 1.3.x or 1.4.x? |
1.4.0. 1.3 has already been released. |
if you can merge master and update to comments |
moved to #43932 because of merge complications |