Skip to content

NotImplementedError error when using z_from_p with pandas DataFrame #74

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
eldobbins opened this issue Mar 31, 2021 · 12 comments
Closed

Comments

@eldobbins
Copy link

I had code that was perhaps broken by the latest release.

It seems like I used to be able to use Series from a pandas DataFrame as arguments to gsw.z_from_p(). I.e. this worked:

df = pd.DataFrame({'pressure': [0, 10, 20],
                   'latitude': [70, 70, 70]})
df['depth'] = -1 * gsw.z_from_p(df['pressure'], 
                                df['latitude'])
df
pressure latitude depth
0 70 0.000000
10 70 9.898516
20 70 19.796552

but with v3.4.0, the same code gives me this error.

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-3-54c6a5ed8806> in <module>
      1 df['depth'] = -1 * gsw.z_from_p(df['pressure'], 
----> 2                                 df['latitude'])

/opt/conda/envs/py37/lib/python3.7/site-packages/gsw/_fixed_wrapped_ufuncs.py in z_from_p(p, lat, geo_strf_dyn_height, sea_surface_geopotential)
     13 _z_from_p = z_from_p
     14 def z_from_p(p, lat, geo_strf_dyn_height=0, sea_surface_geopotential=0):
---> 15     return _z_from_p(p, lat, geo_strf_dyn_height, sea_surface_geopotential)
     16 z_from_p.__doc__ = _z_from_p.__doc__
     17 

/opt/conda/envs/py37/lib/python3.7/site-packages/gsw/_utilities.py in wrapper(*args, **kw)
     60             kw['p'] = newargs.pop()
     61 
---> 62         ret = f(*newargs, **kw)
     63 
     64         if isinstance(ret, tuple):

/opt/conda/envs/py37/lib/python3.7/site-packages/gsw/_wrapped_ufuncs.py in z_from_p(p, lat, geo_strf_dyn_height, sea_surface_geopotential)
   4424 
   4425     """
-> 4426     return _gsw_ufuncs.z_from_p(p, lat, geo_strf_dyn_height, sea_surface_geopotential)

/opt/conda/envs/py37/lib/python3.7/site-packages/pandas/core/generic.py in __array_ufunc__(self, ufunc, method, *inputs, **kwargs)
   1933         self, ufunc: Callable, method: str, *inputs: Any, **kwargs: Any
   1934     ):
-> 1935         return arraylike.array_ufunc(self, ufunc, method, *inputs, **kwargs)
   1936 
   1937     # ideally we would define this to avoid the getattr checks, but

/opt/conda/envs/py37/lib/python3.7/site-packages/pandas/core/arraylike.py in array_ufunc(self, ufunc, method, *inputs, **kwargs)
    284             raise NotImplementedError(
    285                 "Cannot apply ufunc {} to mixed DataFrame and Series "
--> 286                 "inputs.".format(ufunc)
    287             )
    288         axes = self.axes

NotImplementedError: Cannot apply ufunc <ufunc 'z_from_p'> to mixed DataFrame and Series inputs.

This works:

df['depth'] = -1 * gsw.z_from_p(df['pressure'].values, 
                                df['latitude'].values)

What is the appropriate behavior here?

Thanks,
Liz

@eldobbins
Copy link
Author

eldobbins commented Mar 31, 2021

This works, so I think the problem is with z_from_p

df = pd.DataFrame({'pressure': [0, 10, 20],
                   'latitude': [70, 70, 70],
                   'longitude': [-125, -125, -125],
                   'salinity': [32, 33, 34]})

df['Absolute_Salinity'] = gsw.conversions.SA_from_SP(df['salinity'],
                                                     df['pressure'] - 10.1325,
                                                     df['longitude'], 
                                                     df['latitude'])
df
pressure latitude longitude salinity Absolute_Salinity
0 70 -125 32 32.154535
10 70 -125 33 33.158960
20 70 -125 34 34.163360

@eldobbins eldobbins changed the title NotImplementedError error when using with pandas DataFrame NotImplementedError error when using z_from_p with pandas DataFrame Apr 1, 2021
@efiring
Copy link
Member

efiring commented Apr 1, 2021

It works with an analogous situation using xarray, and we test with xarray but not Pandas. Why it works for one function and not another with Pandas, I don't know. The error is coming from inside Pandas.
It looks like the error message from Pandas is misleading, and the real problem is that its __array_ufunc__ implementation can't handle a mixture of pd.Series and np.ndarray arguments. In the case of z_from_p, the arguments it is choking on are typed as follows:

ipdb> print([type(x) for x in inputs])
[<class 'pandas.core.series.Series'>, <class 'pandas.core.series.Series'>, <class 'numpy.ndarray'>, <class 'numpy.ndarray'>]

This seems to me to be a bug, or at least a major shortcoming, in pd.__array_ufunc__. That's consistent with the "NotImplementedError".

@eldobbins
Copy link
Author

I'm curious about the phrase "mixed DataFrame and Series inputs." In my example, I'd think that both df['pressure'] and df['latitude'] are Series.

@efiring
Copy link
Member

efiring commented Apr 1, 2021

They are Series. That's what I showed in the previous comment. The problem is that z_from_p has two optional arguments which default to zero, but which could be arrays. They have been converted to ndarrays by the point at which the error occurs. The error message is incorrect. The case that is not implemented is not "mixed DataFrame and Series" but mixed Series and numpy.ndarray.

@efiring
Copy link
Member

efiring commented Apr 1, 2021

Strangely, it works if only one of the two required arguments is an ndarray:

In [49]: df = pd.DataFrame({'pressure': [0, 10, 20],
    ...:                    'latitude': [70, 70, 70]})
    ...: df['depth'] = -1 * gsw.z_from_p(df['pressure'].values,
    ...:                                 df['latitude'])

In [50]: df
Out[50]:
   pressure  latitude      depth
0         0        70   0.000000
1        10        70   9.898516
2        20        70  19.796552

(Same with only the second argument being an ndarray.)

@efiring
Copy link
Member

efiring commented Apr 1, 2021

@DocOtak found this bug last month and created an issue for it on the Pandas site: pandas-dev/pandas#39853.

@DocOtak
Copy link
Contributor

DocOtak commented Apr 1, 2021

Last time I looked at the pandas code (when I filed the bug over there), their __array_ufunc__ method was written with the assumption that ufuncs could only be up to 2 argument (vs our usual 4), which is true for all the builtin numpy ufuncs I think. I was heading off to sea at the time so didn't look too hard at things, but I think our solution was to just use the underlying ndarray (the .values value) when putting things though gsw funcs.

@ocefpaf
Copy link
Member

ocefpaf commented Sep 13, 2022

Looks like there is a PR upstream to fix pandas-dev/pandas#39853.

xref.: pandas-dev/pandas#48280

@DocOtak
Copy link
Contributor

DocOtak commented Sep 13, 2022

Woah, I hope it works.

@DocOtak
Copy link
Contributor

DocOtak commented Oct 11, 2022

The upstream fix has landed and should be included in pandas 1.6.

@efiring
Copy link
Member

efiring commented Oct 11, 2022

Excellent, thank you!

@ocefpaf
Copy link
Member

ocefpaf commented Apr 10, 2023

This one can be closed now. Pandas 2.0 does support this and we added a regression test in #126.

@ocefpaf ocefpaf closed this as completed Apr 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants