PERF: optimized median func when bottleneck not present #16509
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -5,7 +5,7 @@ | |
from distutils.version import LooseVersion | ||
|
||
import numpy as np | ||
from pandas import compat | ||
from pandas import compat, _np_version_under1p9 | ||
from pandas._libs import tslib, algos, lib | ||
from pandas.core.dtypes.common import ( | ||
_get_dtype, | ||
|
@@ -344,12 +344,22 @@ def nanmedian(values, axis=None, skipna=True): | |
|
||
values, mask, dtype, dtype_max = _get_values(values, skipna) | ||
|
||
if not skipna: | ||
Review comment: Why are you adding this? Reply: The tests compare nanmedian to numpy's median function. If this special case were not there, this function would return a non-null result for input with null values and skipna turned off. Here is a debug log demonstrating this: https://gist.github.com/rohanp/1082ad5a5e199f6b1cb8ade965c4519f Reply: So fix the test to do the correct comparison. Reply: Isn't the desired behavior for nanmedian to be like median when skipna is turned off, though? If not, then what is the skipna flag supposed to do? Reply: yes, so use the |
||
return _wrap_results(np.median(values, axis=axis), dtype) | ||
|
||
def get_median(x): | ||
Review comment: and this will break in numpy < 1.9; you need to do this conditionally. |
||
mask = notna(x) | ||
if not skipna and not mask.all(): | ||
return np.nan | ||
return algos.median(_values_from_object(x[mask])) | ||
|
||
if _np_version_under1p9: | ||
agg1d = get_median | ||
aggaxis = lambda values, axis: np.apply_along_axis(get_median, axis, values) | ||
else: | ||
agg1d = np.nanmedian | ||
aggaxis = np.nanmedian | ||
|
||
if not is_float_dtype(values): | ||
values = values.astype('f8') | ||
values[mask] = np.nan | ||
|
@@ -363,8 +373,7 @@ def get_median(x): | |
if values.ndim > 1: | ||
# there's a non-empty array to apply over otherwise numpy raises | ||
if notempty: | ||
return _wrap_results( | ||
np.apply_along_axis(get_median, axis, values), dtype) | ||
return _wrap_results(aggaxis(values, axis), dtype) | ||
|
||
# must return the correct shape, but median is not defined for the | ||
# empty set so return nans of shape "everything but the passed axis" | ||
|
@@ -377,7 +386,7 @@ def get_median(x): | |
return _wrap_results(ret, dtype) | ||
|
||
# otherwise return a scalar value | ||
return _wrap_results(get_median(values) if notempty else np.nan, dtype) | ||
return _wrap_results(agg1d(values) if notempty else np.nan, dtype) | ||
|
||
|
||
def _get_counts_nanvar(mask, axis, ddof, dtype=float): | ||
|
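The skipna discussion in the diff above turns on how np.median and np.nanmedian treat NaN. A minimal sketch of that difference, assuming a reasonably recent NumPy (the sample array is made up for illustration and is not taken from the PR or its tests):

```python
import numpy as np

# Hypothetical sample data with one missing value.
values = np.array([1.0, np.nan, 3.0, 5.0])

# np.median does not skip NaN: the result is NaN when any element is NaN.
plain = np.median(values)

# np.nanmedian ignores NaN and takes the median of the remaining [1, 3, 5].
skipping = np.nanmedian(values)

print(plain, skipping)
```

This is why a skipna=False path is expected to yield a null result for input that contains nulls, while a skipna=True path should match np.nanmedian.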
Review comment: use _np_version_under1p9, which is already defined.
Review comment: then define
then the code is almost the same as before, just replace with agg1d and aggaxis
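The agg1d / aggaxis dispatch discussed in this review can be sketched outside pandas as follows. This is an illustration under stated assumptions, not the merged implementation: pick_aggregators and its use_fallback flag are hypothetical stand-ins for the _np_version_under1p9 check, and get_median here uses plain NumPy in place of the pandas-internal algos.median and notna.

```python
import numpy as np


def get_median(x, skipna=True):
    # Fallback 1-D median that masks out NaN by hand.
    mask = ~np.isnan(x)
    if not skipna and not mask.all():
        return np.nan
    return np.median(x[mask])


def pick_aggregators(use_fallback):
    # Hypothetical helper: choose the 1-D and along-axis aggregators once,
    # so the rest of the function body stays the same for both cases.
    if use_fallback:
        def aggaxis(values, axis):
            return np.apply_along_axis(get_median, axis, values)
        return get_median, aggaxis
    # np.nanmedian handles both the 1-D and the along-axis case.
    return np.nanmedian, np.nanmedian


values = np.array([[1.0, np.nan, 3.0],
                   [4.0, 5.0, np.nan]])

fast_1d, fast_axis = pick_aggregators(use_fallback=False)
slow_1d, slow_axis = pick_aggregators(use_fallback=True)

fast_rows = fast_axis(values, 1)
slow_rows = slow_axis(values, 1)
```

Both pairs are called the same way, aggaxis(values, axis) and agg1d(values), which is what lets the surrounding code stay nearly unchanged apart from the two names.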