
How to avoid the auto convert variable dtype from float32 to float64 when read netCDF file use open_dataset? #1008

Closed
wqshen opened this issue Sep 19, 2016 · 6 comments

Comments

@wqshen

wqshen commented Sep 19, 2016

When I read some netCDF4 files using xr.open_dataset, the method seems to automatically convert the variable dtype from float32 to float64. How can I avoid this?

Using xarray.open_dataset

import xarray as xr
xr.open_dataset('cat.20151003200633.nc')

yields the following output:

<xarray.Dataset>
Dimensions:  (x: 461, y: 461, z: 9)
Coordinates:
  * x        (x) float32 -230.0 -229.0 -228.0 -227.0 -226.0 -225.0 -224.0 ...
  * y        (y) float32 -230.0 -229.0 -228.0 -227.0 -226.0 -225.0 -224.0 ...
  * z        (z) float32 0.4 1.4 2.3 3.2 4.3 5.8 9.7 14.5 19.3
Data variables:
    dbz      (z, y, x) float64 nan nan nan nan nan nan nan nan nan nan nan ...
    vr       (z, y, x) float64 nan nan nan nan nan nan nan nan nan nan nan ...
    sw       (z, y, x) float64 nan nan nan nan nan nan nan nan nan nan nan ...

The variables dbz, vr and sw have been converted to float64, although they are actually stored as float32 in this file.

Using netCDF4.Dataset

import netCDF4 as ncf
ncf.Dataset('cat.20151003200633.nc')

yields the following output:

   <type 'netCDF4._netCDF4.Dataset'>
    root group (NETCDF4 data model, file format HDF5): 
    ..........
    dimensions(sizes): x(461), y(461), z(9)
    variables(dimensions): float32 x(x), float32 y(y), float32 z(z), float32 dbz(z,y,x), float32 vr(z,y,x), float32 sw(z,y,x)
    groups:

netCDF4.Dataset reports the correct variable dtype, while xarray.open_dataset does not.

@wqshen changed the title from "How to avoid the auto conversion from float32 to float64 when read netCDF file use open_dataset?" to "How to avoid the auto convert variable dtype from float32 to float64 when read netCDF file use open_dataset?" on Sep 19, 2016
@shoyer
Member

shoyer commented Sep 19, 2016

Some of your variables probably have scale_factor or add_offset attributes. To avoid overflow, xarray converts such variables to float64 when scaling.

You can disable the scaling by writing xr.open_dataset('cat.20151003200633.nc', mask_and_scale=False)
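As a rough illustration of the scaling step described above, the NumPy-only sketch below decodes a packed integer array by hand. It is not xarray's actual code; the array values, scale_factor and add_offset are made up for the example, and the explicit cast to float64 stands in for the promotion that the default decoding performs.

```python
import numpy as np

# Hypothetical packed data, as it might be stored on disk (assumed values).
packed = np.array([-32768, 0, 32767], dtype=np.int16)
scale_factor = 0.01   # assumed attribute value
add_offset = 100.0    # assumed attribute value

# Decoding in float64 leaves plenty of headroom, so the arithmetic
# cannot overflow the way it could in the packed integer type.
decoded = packed.astype(np.float64) * scale_factor + add_offset

# With scaling disabled, the values stay exactly as stored.
raw = packed

print(decoded.dtype, raw.dtype)
```

With scaling disabled the caller gets the stored dtype back, but also has to apply scale_factor, add_offset and any fill-value masking manually.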

@wqshen
Author

wqshen commented Sep 20, 2016

Thanks for your reply.

The variables dbz, vr and sw each have a _FillValue attribute, set to -999.0:

<type 'netCDF4._netCDF4.Variable'>
float32 dbz(z, y, x)
    _FillValue: -999.0
    units: dBZ
    long_name: reflectivity in log units
unlimited dimensions:
current shape = (9, 461, 461)
filling on

It seems that when a variable in the netCDF4 file has a _FillValue attribute, xr.open_dataset automatically detects it and masks those values with np.nan, but it also converts the variable from float32 to float64. I don't think this default behaviour is reasonable, since the type conversion is not actually necessary.

Using mask_and_scale=False keeps the variable as float32 and retains its _FillValue attribute. However, what I want is to mask the fill values while keeping the dtype as float32. Could this be a bug inside open_dataset?
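One possible manual workaround, sketched here with plain NumPy, is to do the masking by hand on the unscaled values so that the dtype is preserved. The helper name mask_fill_keep_dtype and the sample array are hypothetical; only the fill value -999.0 comes from the variable shown above. The same idea could presumably be applied to arrays loaded with mask_and_scale=False.

```python
import numpy as np

FILL_VALUE = np.float32(-999.0)  # the _FillValue reported for dbz above


def mask_fill_keep_dtype(values, fill_value):
    """Replace fill_value with NaN without changing a floating dtype.

    Hypothetical helper for illustration; not part of xarray.
    """
    values = np.asarray(values)
    if values.dtype.kind != "f":
        raise TypeError("NaN masking needs a floating-point array")
    # Build a NaN of the same dtype so np.where does not promote the result.
    nan = values.dtype.type(np.nan)
    return np.where(values == fill_value, nan, values)


# Made-up float32 sample containing two fill values.
raw = np.array([-999.0, 12.5, -999.0, 30.0], dtype=np.float32)
masked = mask_fill_keep_dtype(raw, FILL_VALUE)
print(masked.dtype, masked)
```

Because NaN is representable in float32, masking alone does not require a wider type, which is the point made in the comment above.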

@shoyer
Member

shoyer commented Sep 20, 2016

OK, that makes sense. I agree, we could keep such arrays as float32. We don't need to guard against overflow when only decoding a _FillValue.

If you're interested in taking a look at a fix, this is where the current logic is:

if ((fill_value is not None and not np.any(pd.isnull(fill_value))) or
        scale_factor is not None or add_offset is not None):
    if fill_value.dtype.kind in ['U', 'S']:
        dtype = object
    else:
        dtype = float
    data = MaskedAndScaledArray(data, fill_value, scale_factor,
                                add_offset, dtype)
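To make the suggestion concrete, here is a minimal sketch of how the dtype choice in the logic above could distinguish "fill value only" from "scaling requested". The function choose_decoded_dtype is hypothetical and written only for this illustration; it is not the change that was eventually merged into xarray.

```python
import numpy as np


def choose_decoded_dtype(data_dtype, fill_value=None,
                         scale_factor=None, add_offset=None):
    """Pick a dtype for decoded data (hypothetical helper, not xarray's API)."""
    data_dtype = np.dtype(data_dtype)
    # String fill values are handled with object arrays, as in the quoted logic.
    if fill_value is not None and np.asarray(fill_value).dtype.kind in ("U", "S"):
        return np.dtype(object)
    scaling = scale_factor is not None or add_offset is not None
    # Only a fill value to mask: a floating dtype can already hold NaN,
    # so there is no overflow to guard against and no need to widen it.
    if not scaling and data_dtype.kind == "f":
        return data_dtype
    # Scaling requested, or non-float data that needs NaN: use float64.
    return np.dtype(np.float64)


print(choose_decoded_dtype(np.float32, fill_value=-999.0))
print(choose_decoded_dtype(np.int16, fill_value=-999, scale_factor=0.01))
```

Under these assumptions, a float32 variable with only a _FillValue stays float32, while anything involving scale_factor or add_offset keeps the existing float64 behaviour.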

@wqshen
Author

wqshen commented Sep 20, 2016

OK & Thanks

@lvankampenhout

lvankampenhout commented Mar 28, 2018

I ran into the same problem in xarray 0.9.1, and updating to 0.10.2 solved it. Perhaps this issue can be closed?

@shoyer
Member

shoyer commented Mar 28, 2018

Yes, this was fixed by @Zac-HD in #1840

@shoyer shoyer closed this as completed Mar 28, 2018