Skip to content

Raising ValueError when xlim and ylim contain non-int and non-float dtype elements #40889

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 14 commits into from

Conversation

regmibijay
Copy link

Copy link
Member

@mzeitlin11 mzeitlin11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the pr @regmibijay! Some comments, plus needs tests which hit the new code added

@@ -29,7 +29,8 @@ Fixed regressions
Bug fixes
~~~~~~~~~

-
- Bug in :func:`DataFrame.plot.line` was taking ``xlim`` and ``ylim`` kwargs even if they had string elements, now ``ValueError`` is raised if elements are not ``int`` or ``float`` dtype.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only need the entry in v1.3.0, can revert this

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will be done

@@ -505,16 +505,35 @@ def _adorn_subplots(self):
)

for ax in self.axes:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Best to keep out unrelated changes if possible

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure what was done here, do I add a newline?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This diff just shows that you added a newline where there wasn't one before. So if you remove it, this change will go away

@@ -753,7 +753,7 @@ Plotting
- Prevent warnings when matplotlib's ``constrained_layout`` is enabled (:issue:`25261`)
- Bug in :func:`DataFrame.plot` was showing the wrong colors in the legend if the function was called repeatedly and some calls used ``yerr`` while others didn't (partial fix of :issue:`39522`)
- Bug in :func:`DataFrame.plot` was showing the wrong colors in the legend if the function was called repeatedly and some calls used ``secondary_y`` and others use ``legend=False`` (:issue:`40044`)

- Bug in :func:`DataFrame.plot.line` was taking ``xlim`` and ``ylim`` kwargs even if they had string elements, now ``ValueError`` is raised if elements are not ``int`` or ``float`` dtype.
Copy link
Member

@mzeitlin11 mzeitlin11 Apr 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you link to the issue here? (see above entries for examples of how to do that)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, this is not specific to line right? Should it just reference :func:DataFrame.plot?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will be done

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, this is not specific to line right? Should it just reference :func:DataFrame.plot?

I have only tested it for DataFrame.plot(kind="line") and DataFrame.plot.line, so will reference Dataframe.plot then

" or integer datatype.\n"
"`xlim` has unsupported dtype"
f" {type(elem)} with value {elem}.\n"
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic looks the same for both xlim and ylim, can you find a way to share it?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressing this with a lambda function

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sharing the message is nice, but would be great to also share the validation logic itself (eg the iteration and type checking). Also, I think generally it's discouraged to assign an anonymous function to a variable, if doing that better to be explicit and write a function (which could also be inline).

@mzeitlin11 mzeitlin11 added Error Reporting Incorrect or improved errors from pandas Visualization plotting labels Apr 12, 2021
Copy link
Author

@regmibijay regmibijay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pushed new changes addressing previous comments

@mzeitlin11
Copy link
Member

pushed new changes addressing previous comments

Can you please add tests which hit the added code?

@regmibijay
Copy link
Author

pushed new changes addressing previous comments

Can you please add tests which hit the added code?

I am not sure how to do that, I tested using an arbitrary dataset I created. Could you link me a documentation to pandas standard?

@mzeitlin11
Copy link
Member

pushed new changes addressing previous comments

Can you please add tests which hit the added code?

I am not sure how to do that, I tested using an arbitrary dataset I created. Could you link me a documentation to pandas standard?

This guide is extremely useful, please feel free to ask ask any questions that come up when reading it! https://pandas.pydata.org/docs/development/contributing.html

Also the precommit section will be useful, since that is currently failing on this pr.

@mzeitlin11
Copy link
Member

Outside of the guide, looking at some tests in pandas/tests/plotting can give an idea of testing style

@@ -753,7 +753,7 @@ Plotting
- Prevent warnings when matplotlib's ``constrained_layout`` is enabled (:issue:`25261`)
- Bug in :func:`DataFrame.plot` was showing the wrong colors in the legend if the function was called repeatedly and some calls used ``yerr`` while others didn't (partial fix of :issue:`39522`)
- Bug in :func:`DataFrame.plot` was showing the wrong colors in the legend if the function was called repeatedly and some calls used ``secondary_y`` and others use ``legend=False`` (:issue:`40044`)

- Bug in :func:`DataFrame.plot` was taking ``xlim`` and ``ylim`` kwargs even if they had string elements, now ``ValueError`` is raised if elements are not ``int`` or ``float`` dtype (:issue:`40781`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think matplotlib could also take datetime object, e.g. something like ax.set_xlim([datetime1, datetime2]), could you please check if it is case? If so, would be nice to make changes accordingly in the PR to reflect.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added datetime datatype to the filter too, I would love a feedback on the function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make this simpler like was not raising for invalid dtyped elements to xlim/ylim

@regmibijay
Copy link
Author

Outside of the guide, looking at some tests in pandas/tests/plotting can give an idea of testing style

I did my best effort on writing the tests for individual cases. But it seems to have failed on CI, I would love a recommendation on this part too.

@@ -753,7 +753,7 @@ Plotting
- Prevent warnings when matplotlib's ``constrained_layout`` is enabled (:issue:`25261`)
- Bug in :func:`DataFrame.plot` was showing the wrong colors in the legend if the function was called repeatedly and some calls used ``yerr`` while others didn't (partial fix of :issue:`39522`)
- Bug in :func:`DataFrame.plot` was showing the wrong colors in the legend if the function was called repeatedly and some calls used ``secondary_y`` and others use ``legend=False`` (:issue:`40044`)

- Bug in :func:`DataFrame.plot` was taking ``xlim`` and ``ylim`` kwargs even if they had string elements, now ``ValueError`` is raised if elements are not ``int`` or ``float`` dtype (:issue:`40781`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make this simpler like was not raising for invalid dtyped elements to xlim/ylim

@@ -489,6 +489,18 @@ def _post_plot_logic(self, ax, data):
"""Post process for each axes. Overridden in child classes"""
pass

def _has_iterable_only_float_int_or_datetime(self, list_iterable: list) -> bool:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_has_valid_dtype

remove the double-doc-string

fully type list

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, small idea on the name, could you maybe use _has_valid_lim_dtype? this is only targeting on dtypes of xlim or ylim, right? @regmibijay

def _has_iterable_only_float_int_or_datetime(self, list_iterable: list) -> bool:
"""checks if an iterable has float,int or datetime datatype """
"""targeting #GH40781"""
from datetime import datetime as dt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't import here

"""targeting #GH40781"""
from datetime import datetime as dt
if (
all(([isinstance(x, dt) for x in list_iterable]))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use is_* accessors

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried implementing is_datetime* , is_datetime_or_timedelta* etc and could not get it to work with raw datetime.datetime formats. Any suggestions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are existing implementations, so should be no need to implement new ones

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean I tried to use those existing implementations, but could not get it to work with datetime objects being passed as xlim or ylim

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

emm, you meant non of those implementations of is_* accessors in https://github.com/pandas-dev/pandas/blob/master/pandas/core/dtypes/common.py could work with datetime objects? That sounds a bit weird, could you pls elaborate or maybe post a new issue to address this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will try more samples and recreate an issue if that verifies with exact example soon!

if self.ylim is not None:
if not self._has_iterable_only_float_int_or_datetime(self.ylim):
raise ValueError(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change the routine to _validate_valid_dtype and just raise inside

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

working on it currently, should fix soon.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have changed the namespace accordingly

try:
df.plot(kind="line", x="A", xlim=[1, 2], ylim=[3, 4])
return True
except ValueError:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

never use try/except in tests like this see simliar tests on how to check for raising

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am looking into more pytesting, I have never written tests before so I would appreciate the guidance.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, you have done a great job in writing those tests, great work!

I think what @jreback meant is to have something like:

msg = "error messages"
with pytest.raises(ValueError, match=msg):
    your_code_which_generates_error

There should be many references in the codebase, for instance, https://github.com/pandas-dev/pandas/blob/master/pandas/tests/plotting/frame/test_frame.py

I hope this helps and please let me know if you have other questions

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, I have started working on this, I will push new changes soon.

pytest.raises(ValueError, match="FAILED")


def test_if_plotable_xlim_first_int() -> bool:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parameterize the tests, you can put with the existing ones, e.g. you only need 1 additional parameterized test

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added new test adressing your recommendations.

@regmibijay regmibijay requested a review from jreback April 25, 2021 20:16
Copy link
Member

@charlesdong1991 charlesdong1991 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the update, good job! I have left few comments.

@@ -489,6 +490,17 @@ def _post_plot_logic(self, ax, data):
"""Post process for each axes. Overridden in child classes"""
pass

def _has_valid_lim_dtype(self, list_iterable: list) -> bool:
"""checks if an iterable has float,int or datetime datatype """
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""checks if an iterable has float,int or datetime datatype """
"""Check if an iterable has float, int or datetime datatype"""

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

requested change merged

@@ -489,6 +490,17 @@ def _post_plot_logic(self, ax, data):
"""Post process for each axes. Overridden in child classes"""
pass

def _has_valid_lim_dtype(self, list_iterable: list) -> bool:
"""checks if an iterable has float,int or datetime datatype """
"""targeting #GH40781"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""targeting #GH40781"""
# GH40781

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

requested change merged

from datetime import datetime as dt
import numpy as np

"""targeting #GH40781"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove this line

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the line has been removed

Comment on lines 1 to 4
import pytest
import pandas as pd
from datetime import datetime as dt
import numpy as np
Copy link
Member

@charlesdong1991 charlesdong1991 Apr 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you please use isort to resort the imports, otherwise, I think black check will fail.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have formatted the import list with isort

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you probably need to import matplotlib in order to fix the failed tests

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added an import statement

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@regmibijay I see that matplotlib import is still missing.
Can you double check that?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am having trouble understanding
ModuleNotFoundError: No module named 'matplotlib' in the CI tests, I import matplotlib normally. Is there something very obvious I am missing?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@regmibijay
From the current test module, you actually do not need matplotlib to be imported.

The reason the CI fails is that because matplotlib is simply not installed for all pipelines.
See pandas/ci/deps for dependencies.

In order to ensure that your test module is skipped in the pipelines, which do not have matplotlib installed,
then you need to add pytest.imporotorskip to the top and probably mark the tests as slow.

pytest.importorskip("matplotlib")
pytestmark = pytest.mark.slow

See pandas/tests/plotting/test_style.py, for instance.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just pushed new commit with changes suggested by you.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pre-commit is complaining about unsorted imports and indentation is some tests. Please look through the CI failure output and correct accordingly (black and isort utils will be helpful here).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ran isort and black and fixed issues CI was complaining about. I have no idea why black did not pick those issues up in my IDLE in first place.



def test_axis_lim_invalid():
global df
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ehh, try to avoid using global.

Comment on lines 19 to 24
"""
supplying ranges with str, datetime and
mixed int,str and float,str dtype to raise
ValueError. Valid dtypes are np.datetime64
int, and float.
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally there is no need to write docstrings for tests, could you please remove?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

noted for future PRs and has been removed from current accordingly

Comment on lines 25 to 31
lim = [
["1", "3"],
[dt.now(), dt.now()],
[1, "2"],
[0.1, "0.2"],
[np.datetime64(dt.now()), dt.now()]
]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you try to parametrize?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests are parametrized now

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that @charlesdong1991 was referring to use of pytest.mark.parametrize.
Or did you forget to push new changes?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @ivanovmg, I have used following parametrization to create a custom dataframe, and I can see I have comitted it in latest commit. Can you please confirm it?

@pytest.mark.parametrize("df, lim", [
    (
        DataFrame(
            {
                "A" : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                "B": [dt.now() for i in range(0, 10)]
            }
        ),
        [
            ["1", "3"],
            [dt.now(), dt.now()],
            [1, "2"],
            [0.1, "0.2"],
            [np.datetime64(dt.now()), dt.now()]
        ]
    )
]
)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I confirm. It was a mistake on my end - I looked at an outdated version.

Comment on lines 45 to 49
lim = [
[1, 3],
[np.datetime64(dt.now()), np.datetime64(dt.now())],
[0.1, 0.2],
]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you try to parametrize?

Comment on lines 38 to 44
global df
"""
supplying ranges with
valid dtypes: np.datetime64
int, and float, this test
should not raise errors.
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as above

Copy link
Member

@charlesdong1991 charlesdong1991 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for updating, the failed tests seem to be related to your change.

I put a comment on imports which might help fix your tests failures.

Copy link
Member

@ivanovmg ivanovmg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see my comments.

Comment on lines 1 to 4
import pytest
import pandas as pd
from datetime import datetime as dt
import numpy as np
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@regmibijay I see that matplotlib import is still missing.
Can you double check that?

Comment on lines 25 to 31
lim = [
["1", "3"],
[dt.now(), dt.now()],
[1, "2"],
[0.1, "0.2"],
[np.datetime64(dt.now()), dt.now()]
]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that @charlesdong1991 was referring to use of pytest.mark.parametrize.
Or did you forget to push new changes?



def test_axis_lim_invalid():
global df
Copy link
Member

@ivanovmg ivanovmg Apr 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest that you do not use global df.
Instead, you may want to make a fixture returning your dataframe, and pass this fixture into your test functions.
Alternatively, you may create a class with your test methods and define an attribute with the dataframe (self.df), which you can reuse in the test methods.
Or probably you can just leave it as it is, but without global.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used parametrization in pytest to create custom dataframe and ended up not having to use globals at all.

Copy link
Author

@regmibijay regmibijay Apr 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see: e3b69df and c00977a

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad. I was looking at an outdated version.
Then it is not clear for me why the tests are failing.

@simonjayhawkins
Copy link
Member

@regmibijay can you merge master to resolve conflicts

@jreback
Copy link
Contributor

jreback commented Sep 1, 2021

@regmibijay if you want to merge master can revive this

@regmibijay
Copy link
Author

I will revive and work on this thread again, which release should I target it for 1.3.x or 1.4.x?

@lithomas1
Copy link
Member

1.4.0. 1.3 has already been released.

@jreback
Copy link
Contributor

jreback commented Oct 4, 2021

if you can merge master and update to comments

@regmibijay regmibijay closed this Oct 8, 2021
@regmibijay
Copy link
Author

regmibijay commented Oct 8, 2021

moved to #43932 because of merge complications

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Error Reporting Incorrect or improved errors from pandas Visualization plotting
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: xlim and ylim not restricting plot area
7 participants