TST: add method/dtype coverage to str-accessor; precursor to #23167 #23582

Merged (12 commits) on Nov 28, 2018
82 changes: 81 additions & 1 deletion pandas/conftest.py
@@ -1,3 +1,5 @@
from datetime import date, time, timedelta
from decimal import Decimal
import importlib
import os

@@ -8,7 +10,7 @@
import pytest
from pytz import utc

from pandas.compat import PY3
from pandas.compat import PY3, u
import pandas.util._test_decorators as td

import pandas as pd
@@ -513,6 +515,84 @@ def any_numpy_dtype(request):
    return request.param


# categoricals are handled separately
_any_skipna_inferred_dtype = [
    ('string', ['a', np.nan, 'c']),
    ('unicode' if not PY3 else 'string', [u('a'), np.nan, u('c')]),
    ('bytes' if PY3 else 'string', [b'a', np.nan, b'c']),
    ('empty', [np.nan, np.nan, np.nan]),
    ('empty', []),
    ('mixed-integer', ['a', np.nan, 2]),
    ('mixed', ['a', np.nan, 2.0]),
    ('floating', [1.0, np.nan, 2.0]),
    ('integer', [1, np.nan, 2]),
    ('mixed-integer-float', [1, np.nan, 2.0]),
    ('decimal', [Decimal(1), np.nan, Decimal(2)]),
    ('boolean', [True, np.nan, False]),
    ('datetime64', [np.datetime64('2013-01-01'), np.nan,
                    np.datetime64('2018-01-01')]),
    ('datetime', [pd.Timestamp('20130101'), np.nan, pd.Timestamp('20180101')]),
    ('date', [date(2013, 1, 1), np.nan, date(2018, 1, 1)]),
    # The following two dtypes are commented out due to GH 23554
    # ('complex', [1 + 1j, np.nan, 2 + 2j]),
    # ('timedelta64', [np.timedelta64(1, 'D'),
    #                  np.nan, np.timedelta64(2, 'D')]),
    ('timedelta', [timedelta(1), np.nan, timedelta(2)]),
    ('time', [time(1), np.nan, time(2)]),
    ('period', [pd.Period(2013), pd.NaT, pd.Period(2018)]),
    ('interval', [pd.Interval(0, 1), np.nan, pd.Interval(0, 2)])]
ids, _ = zip(*_any_skipna_inferred_dtype)  # use inferred type as fixture-id


@pytest.fixture(params=_any_skipna_inferred_dtype, ids=ids)
def any_skipna_inferred_dtype(request):
    """
    Fixture for all inferred dtypes from _libs.lib.infer_dtype

    The covered (inferred) types are:
    * 'string'
    * 'unicode' (if PY2)
    * 'empty'
    * 'bytes' (if PY3)
    * 'mixed'
    * 'mixed-integer'
    * 'mixed-integer-float'
    * 'floating'
    * 'integer'
    * 'decimal'
    * 'boolean'
    * 'datetime64'
    * 'datetime'
    * 'date'
    * 'timedelta'
    * 'time'
    * 'period'
    * 'interval'

    Returns
    -------
    inferred_dtype : str
        The string for the inferred dtype from _libs.lib.infer_dtype
    values : np.ndarray
        An array of object dtype that will be inferred to have
        `inferred_dtype`

    Examples
    --------
    >>> import pandas._libs.lib as lib
    >>>
    >>> def test_something(any_skipna_inferred_dtype):
    ...     inferred_dtype, values = any_skipna_inferred_dtype
    ...     # will pass
    ...     assert lib.infer_dtype(values, skipna=True) == inferred_dtype
    """
    inferred_dtype, values = request.param
    values = np.array(values, dtype=object)  # object dtype to avoid casting

    # correctness of inference tested in tests/dtypes/test_inference.py
    return inferred_dtype, values


@pytest.fixture
def mock():
    """
7 changes: 7 additions & 0 deletions pandas/tests/dtypes/test_inference.py
@@ -495,6 +495,13 @@ class TestTypeInference(object):
    class Dummy():
        pass

    def test_inferred_dtype_fixture(self, any_skipna_inferred_dtype):
        # see pandas/conftest.py
Contributor

It's nice that you added this, but you didn't remove any code. If you are not going to do that, then there's not much point in putting the fixture in conftest.py in the first place.

Contributor Author

@jreback
This is coming out of your review above:

I'm asking if you want me to move this particular fixture to pandas/conftest.py and then test it within the dtype tests (because this is effectively a dtype thing).

Yes

And I don't get how adding this fixture is tied to code removal. I'm testing the .str-accessor on all the inferred dtypes to make sure it raises correctly; that's what I mainly need this fixture for.

That I'm testing the validity of the fixture in test_inference.py is for consistency, because it belongs there thematically (but I could otherwise test that directly in the fixture constructor).
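
A minimal sketch of the use case described here, not the test added in this PR: it checks whether `.str` can be accessed on a Series for a few of the fixture's value lists. The three-case subset and the helper name `str_accessor_available` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical subset of the fixture's (inferred_dtype, values) pairs.
CASES = [
    ("string", ["a", np.nan, "c"]),
    ("floating", [1.0, np.nan, 2.0]),
    ("boolean", [True, np.nan, False]),
]


def str_accessor_available(values):
    """Return True if `.str` can be accessed on an object-dtype Series."""
    ser = pd.Series(values, dtype=object)
    try:
        ser.str
    except AttributeError:
        # pandas refuses the accessor for non-string inferred dtypes
        return False
    return True


availability = {name: str_accessor_available(vals) for name, vals in CASES}
```

With the real fixture, the same check would run once per inferred dtype instead of over a hand-written list.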

Contributor Author

@jreback
This fixture would be a perfect candidate for splitting off a PR, but then, I'm afraid you're gonna say it doesn't do anything interesting (yet).

Do you want me to:

  • split it up, and have the fixture be unused until this PR is merged,
  • or do you want me to keep things logically together (i.e. in this PR)?

        inferred_dtype, values = any_skipna_inferred_dtype

        # make sure the inferred dtype of the fixture is as requested
        assert inferred_dtype == lib.infer_dtype(values, skipna=True)

    def test_length_zero(self):
        result = lib.infer_dtype(np.array([], dtype='i4'))
        assert result == 'integer'
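
The fixture check above can be reproduced outside pytest. The following is a sketch under stated assumptions: it uses the public `pandas.api.types.infer_dtype` rather than the private `lib.infer_dtype` from the diff, and the `pairs` list is a hypothetical subset of the fixture's parameters. It also shows why the fixture converts values with `dtype=object`: without it, NumPy casts the integers to floats before inference.

```python
import numpy as np
from pandas.api.types import infer_dtype

# Hypothetical subset of the (inferred_dtype, values) pairs from the fixture.
pairs = [
    ("string", ["a", np.nan, "c"]),
    ("integer", [1, np.nan, 2]),
    ("floating", [1.0, np.nan, 2.0]),
    ("mixed-integer-float", [1, np.nan, 2.0]),
    ("boolean", [True, np.nan, False]),
]

# Same trick as in conftest.py: the first element of each pair is the test id.
ids, _ = zip(*pairs)

for expected, values in pairs:
    arr = np.array(values, dtype=object)  # object dtype to avoid casting
    assert infer_dtype(arr, skipna=True) == expected

# Without dtype=object, NumPy stores [1, nan, 2] as float64,
# so the integers are no longer visible to the inference.
cast = np.array([1, np.nan, 2])
cast_inferred = infer_dtype(cast, skipna=True)
```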
76 changes: 0 additions & 76 deletions pandas/tests/series/test_api.py
@@ -602,82 +602,6 @@ def f():
            ordered=True))
        tm.assert_series_equal(result, expected)

    def test_str_accessor_api_for_categorical(self):
        # https://github.com/pandas-dev/pandas/issues/10661
        from pandas.core.strings import StringMethods
        s = Series(list('aabb'))
        s = s + " " + s
        c = s.astype('category')
        assert isinstance(c.str, StringMethods)

        # str functions, which need special arguments
        special_func_defs = [
Contributor Author

The full list of these functions & arg-combinations is reflected in the any_string_method-fixture of test_strings.py now.

            ('cat', (list("zyxw"),), {"sep": ","}),
            ('center', (10,), {}),
            ('contains', ("a",), {}),
            ('count', ("a",), {}),
            ('decode', ("UTF-8",), {}),
            ('encode', ("UTF-8",), {}),
            ('endswith', ("a",), {}),
            ('extract', ("([a-z]*) ",), {"expand": False}),
            ('extract', ("([a-z]*) ",), {"expand": True}),
            ('extractall', ("([a-z]*) ",), {}),
            ('find', ("a",), {}),
            ('findall', ("a",), {}),
            ('index', (" ",), {}),
            ('ljust', (10,), {}),
            ('match', ("a"), {}),  # deprecated...
            ('normalize', ("NFC",), {}),
            ('pad', (10,), {}),
            ('partition', (" ",), {"expand": False}),  # not default
            ('partition', (" ",), {"expand": True}),  # default
            ('repeat', (3,), {}),
            ('replace', ("a", "z"), {}),
            ('rfind', ("a",), {}),
            ('rindex', (" ",), {}),
            ('rjust', (10,), {}),
            ('rpartition', (" ",), {"expand": False}),  # not default
            ('rpartition', (" ",), {"expand": True}),  # default
            ('slice', (0, 1), {}),
            ('slice_replace', (0, 1, "z"), {}),
            ('split', (" ",), {"expand": False}),  # default
            ('split', (" ",), {"expand": True}),  # not default
            ('startswith', ("a",), {}),
            ('wrap', (2,), {}),
            ('zfill', (10,), {})
        ]
Contributor Author

Also added a handful more combinations to test

        _special_func_names = [f[0] for f in special_func_defs]

        # * get, join: they need a individual elements of type lists, but
        #   we can't make a categorical with lists as individual categories.
        #   -> `s.str.split(" ").astype("category")` will error!
        # * `translate` has different interfaces for py2 vs. py3
        _ignore_names = ["get", "join", "translate"]
Contributor Author

Actually, I also got rid of these previously "ignored" methods - meaning they're being tested now as well.


        str_func_names = [f for f in dir(s.str) if not (
            f.startswith("_") or
            f in _special_func_names or
            f in _ignore_names)]

        func_defs = [(f, (), {}) for f in str_func_names]
        func_defs.extend(special_func_defs)

        for func, args, kwargs in func_defs:
Contributor Author

This loop is now explicitly parametrized.

            res = getattr(c.str, func)(*args, **kwargs)
            exp = getattr(s.str, func)(*args, **kwargs)

            if isinstance(res, DataFrame):
                tm.assert_frame_equal(res, exp)
            else:
                tm.assert_series_equal(res, exp)

        invalid = Series([1, 2, 3]).astype('category')
        msg = "Can only use .str accessor with string"

        with pytest.raises(AttributeError, match=msg):
            invalid.str
        assert not hasattr(invalid, 'str')
Contributor Author

This part is now tested ~100x as thoroughly by test_strings.py::TestStringMethods::test_api_per_dtype
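
The comments above describe replacing the removed loop with explicit parametrization over (method, args, kwargs) combinations. The sketch below illustrates that idea and is not the code added in test_strings.py. `STRING_METHOD_CASES` and `check_categorical_matches_object` are hypothetical names, the case list is a small subset of the removed `special_func_defs`, and the source Series is forced to object dtype as an assumption so the comparison does not depend on the default string dtype of the installed pandas.

```python
import pandas as pd

# Hypothetical subset of the (method, args, kwargs) combinations.
STRING_METHOD_CASES = [
    ("upper", (), {}),
    ("center", (10,), {}),
    ("count", ("a",), {}),
    ("split", (" ",), {"expand": True}),
    ("zfill", (10,), {}),
]


def check_categorical_matches_object(func, args, kwargs):
    """Compare a .str method on a categorical Series with the object Series."""
    s = pd.Series(list("aabb"), dtype=object)
    s = s + " " + s
    c = s.astype("category")

    res = getattr(c.str, func)(*args, **kwargs)
    exp = getattr(s.str, func)(*args, **kwargs)

    if isinstance(res, pd.DataFrame):
        pd.testing.assert_frame_equal(res, exp)
    else:
        pd.testing.assert_series_equal(res, exp)


# One independent check per combination, instead of one test with a loop.
for case in STRING_METHOD_CASES:
    check_categorical_matches_object(*case)
```

In a pytest suite the same tuples could be handed to `pytest.mark.parametrize` or a parametrized fixture, so that each combination is reported as its own test and a failure names the offending method.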


    def test_dt_accessor_api_for_categorical(self):
        # https://github.com/pandas-dev/pandas/issues/10661
        from pandas.core.indexes.accessors import Properties