
DataArray.sum does not respect dtype keyword #1838


Closed
gerritholl opened this issue Jan 18, 2018 · 2 comments

@gerritholl
Contributor

Code Sample

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(5, dtype="i2"))
print(da.sum(dtype="i4").dtype)  # prints int64, although int32 was requested
```

Problem description

The result has dtype int64. This is a problem because I explicitly asked for int32.

Expected Output

The result should have dtype int32.
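For comparison, plain NumPy honours an explicit accumulator dtype in its reductions, which is the behaviour expected from `DataArray.sum` here. A minimal check that uses only NumPy, so it runs without xarray installed:

```python
import numpy as np

values = np.arange(5, dtype="i2")

# NumPy accumulates in the requested dtype and returns a scalar of that dtype
summed = values.sum(dtype="i4")
print(summed.dtype)  # int32
```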

Output of xr.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.32-696.6.3.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8

xarray: 0.10.0+dev12.gf882a58
pandas: 0.22.0
numpy: 1.14.0
scipy: 1.0.0
netCDF4: 1.3.1
h5netcdf: None
Nio: None
bottleneck: 1.2.1
cyordereddict: None
dask: 0.16.1
matplotlib: 2.1.1
cartopy: 0.15.1
seaborn: 0.8.1
setuptools: 38.4.0
pip: 9.0.1
conda: 4.3.16
pytest: 3.1.2
IPython: 6.1.0
sphinx: 1.6.2
```

@fujiisoup
Member

fujiisoup commented Jan 19, 2018

I noticed this in xarray's source:

```python
def f(values, axis=None, skipna=None, **kwargs):
    # ignore keyword args inserted by np.mean and other numpy aggregators
    # automatically:
    kwargs.pop('dtype', None)
    kwargs.pop('out', None)
    # ... (rest of the function omitted)
```

We drop the dtype argument in the reduce methods, but I'm not sure why.
This might be a bug.
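A simplified sketch of why the requested dtype disappears: the hypothetical function `sum_dropping_dtype` below mimics the excerpt by popping `dtype` and `out` before calling `np.sum`, so NumPy falls back to its default accumulator dtype (for `int16` input this is the platform integer, e.g. int64 on 64-bit Linux). It is an illustration, not xarray's actual code.

```python
import numpy as np


def sum_dropping_dtype(values, axis=None, **kwargs):
    # Hypothetical stand-in for the excerpt: dtype and out are discarded
    kwargs.pop("dtype", None)
    kwargs.pop("out", None)
    return np.sum(values, axis=axis, **kwargs)


values = np.arange(5, dtype="i2")
result = sum_dropping_dtype(values, dtype="i4")
# The requested int32 is ignored; NumPy's default accumulator dtype is used
print(result.dtype)
```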

As a temporary workaround,

```python
da.reduce(np.sum, dtype="i4")
```

respects the requested dtype.

@fujiisoup fujiisoup added the bug label Jan 19, 2018
@shoyer
Member

shoyer commented Jan 19, 2018

We do this for two reasons:

  • bottleneck's aggregation functions, such as bottleneck.nansum(), have no dtype argument, so passing dtype through to them raises an error.
  • If you call a numpy function like numpy.sum() on an xarray object, it calls the appropriate method with all keyword arguments, e.g., numpy.sum(xarray_obj) -> xarray_obj.sum(axis=None, dtype=None, out=None).
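To see the second point in action, the hypothetical `RecordingArray` class below is a minimal duck array whose `sum` method records the keyword arguments it receives. Calling `np.sum` on it shows NumPy forwarding `axis` and `out` to the method (older NumPy releases, such as the 1.14 used here, also forward `dtype=None`), so an xarray-style reduction method must accept these arguments even when it has no use for them.

```python
import numpy as np


class RecordingArray:
    """Hypothetical duck array that records how np.sum calls its .sum method."""

    def __init__(self, data):
        self.data = np.asarray(data)
        self.calls = []

    def sum(self, axis=None, **kwargs):
        # Record every keyword NumPy forwards, then delegate the actual work
        self.calls.append(dict(kwargs, axis=axis))
        return self.data.sum(axis=axis, **kwargs)


arr = RecordingArray(np.arange(5, dtype="i2"))
np.sum(arr)              # forwards axis and out (and dtype=None in some NumPy versions)
np.sum(arr, dtype="i4")  # forwards dtype="i4" as well
for call in arr.calls:
    print(call)
```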

Instead of ignoring dtype and out entirely, we should probably check their values:

  • If dtype is not None, use numpy's aggregation function instead of bottleneck's.
  • If out is not None, raise an error.
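A minimal sketch of that proposal, assuming a hypothetical stand-alone `nansum` helper rather than xarray's actual internals: bottleneck (treated as optional here) is used only when no dtype is requested, NumPy's `np.nansum` handles an explicit dtype, and a non-None `out` raises instead of being silently dropped.

```python
import numpy as np

try:
    import bottleneck as bn  # optional accelerated backend
except ImportError:
    bn = None


def nansum(values, axis=None, dtype=None, out=None):
    """Hypothetical NaN-skipping sum following the proposal above."""
    if out is not None:
        # Fail loudly rather than silently ignoring the out argument
        raise NotImplementedError("the 'out' argument is not supported")
    if dtype is None and bn is not None:
        # bottleneck's nansum has no dtype argument, so use it only here
        return bn.nansum(values, axis=axis)
    # A dtype was requested (or bottleneck is unavailable): use NumPy
    return np.nansum(values, axis=axis, dtype=dtype)


values = np.arange(5, dtype="i2")
print(nansum(values, dtype="i4").dtype)  # int32
```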
