How to avoid the auto convert variable dtype from float32 to float64 when read netCDF file use open_dataset? #1008

wqshen · 2016-09-19T10:51:29Z

When I read a netCDF4 file using xr.open_dataset, the variable dtype seems to be converted automatically from float32 to float64. How can I avoid this?

Use xarray.open_dataset

import xarray as xr
xr.open_dataset('cat.20151003200633.nc')

yields the following output:

<xarray.Dataset>
Dimensions:  (x: 461, y: 461, z: 9)
Coordinates:
  * x        (x) float32 -230.0 -229.0 -228.0 -227.0 -226.0 -225.0 -224.0 ...
  * y        (y) float32 -230.0 -229.0 -228.0 -227.0 -226.0 -225.0 -224.0 ...
  * z        (z) float32 0.4 1.4 2.3 3.2 4.3 5.8 9.7 14.5 19.3
Data variables:
    dbz      (z, y, x) float64 nan nan nan nan nan nan nan nan nan nan nan ...
    vr       (z, y, x) float64 nan nan nan nan nan nan nan nan nan nan nan ...
    sw       (z, y, x) float64 nan nan nan nan nan nan nan nan nan nan nan ...

The dtype of the variables dbz, vr and sw has been converted to float64, although they are stored as float32 in the file.

Use netCDF4.Dataset

import netCDF4 as ncf
ncf.Dataset('cat.20151003200633.nc')

yields the following output:

   <type 'netCDF4._netCDF4.Dataset'>
    root group (NETCDF4 data model, file format HDF5): 
    ..........
    dimensions(sizes): x(461), y(461), z(9)
    variables(dimensions): float32 x(x), float32 y(y), float32 z(z), float32 dbz(z,y,x), float32 vr(z,y,x), float32 sw(z,y,x)
    groups:

netCDF4.Dataset reports the correct variable dtype, while xarray.open_dataset does not.


shoyer · 2016-09-19T16:22:02Z

Some of your variables probably have scale_factor or add_offset attributes. To avoid overflow, xarray converts such variables to float64 when scaling.

You can disable the scaling by writing xr.open_dataset('cat.20151003200633.nc', mask_and_scale=False)
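As a rough illustration of what mask_and_scale controls, the sketch below builds a small in-memory dataset instead of reading the file from this issue. The variable name `packed` and its values are made up for the example, and xr.decode_cf is used here as a stand-in for the decoding step that open_dataset performs.

```python
import numpy as np
import xarray as xr

# Hypothetical packed variable: int16 values plus CF scaling attributes.
encoded = xr.Dataset(
    {
        "packed": (
            "x",
            np.array([0, 10, 20], dtype=np.int16),
            {"scale_factor": 0.5, "add_offset": 100.0},
        )
    }
)

# With mask_and_scale=False the stored integers and their attributes are left alone.
raw = xr.decode_cf(encoded, mask_and_scale=False)

# Default decoding applies scale_factor and add_offset, giving floating-point data.
decoded = xr.decode_cf(encoded)

print(raw["packed"].dtype, raw["packed"].attrs)
print(decoded["packed"].dtype, decoded["packed"].values)
```

The exact floating-point dtype chosen for the decoded variable may differ between xarray versions, so the sketch only prints it rather than assuming one.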

wqshen · 2016-09-20T00:44:27Z

Thanks for your reply.

The variables dbz, vr and sw have a _FillValue attribute with the value -999.0:

<type 'netCDF4._netCDF4.Variable'>
float32 dbz(z, y, x)
    _FillValue: -999.0
    units: dBZ
    long_name: reflectivity in log units
unlimited dimensions:
current shape = (9, 461, 461)
filling on

It seems that xarray converts the dtype from float32 to float64 whenever a variable in the netCDF4 file has a _FillValue attribute: xr.open_dataset automatically detects the _FillValue and masks it with np.nan, but it also changes the variable to float64. I don't think this default behaviour is reasonable, because the type conversion is not actually necessary.

Using mask_and_scale=False keeps the variable as float32 and retains its _FillValue attribute. However, what I want is to mask the fill values while keeping the dtype as float32. Could this be a bug inside open_dataset?
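One possible workaround is to skip xarray's automatic masking and apply the mask by hand. The following is a minimal sketch under assumptions: it uses a small in-memory dataset in place of the file above, xr.decode_cf in place of open_dataset, and it relies on DataArray.where keeping the float32 dtype when it fills with NaN.

```python
import numpy as np
import xarray as xr

# Stand-in for the file: one float32 variable with a _FillValue of -999.0.
encoded = xr.Dataset(
    {
        "dbz": (
            "x",
            np.array([-999.0, 12.5, 30.0], dtype=np.float32),
            {"_FillValue": np.float32(-999.0)},
        )
    }
)

# Skip automatic masking so the data stays exactly as stored.
raw = xr.decode_cf(encoded, mask_and_scale=False)

# Mask the sentinel by hand; NaN is representable in float32.
fill = raw["dbz"].attrs["_FillValue"]
dbz = raw["dbz"].where(raw["dbz"] != fill)

print(dbz.dtype, dbz.values)
```

With the real file, the same where call should apply to the result of xr.open_dataset('cat.20151003200633.nc', mask_and_scale=False).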

shoyer · 2016-09-20T00:48:01Z

OK, that makes sense. I agree, we could keep such arrays as float32. We don't need to guard against overflow when only decoding a _FillValue.

If you're interested in taking a look at a fix, this is where the current logic is:

xarray/xarray/conventions.py

Lines 795 to 802 in 551a7bc

    
    if ((fill_value is not None and not np.any(pd.isnull(fill_value))) or
            scale_factor is not None or add_offset is not None):
        if fill_value.dtype.kind in ['U', 'S']:
            dtype = object
        else:
            dtype = float
        data = MaskedAndScaledArray(data, fill_value, scale_factor,
                                    add_offset, dtype)
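To make the proposed change concrete, here is a hedged sketch of how the dtype choice could distinguish between "only a _FillValue" and "scaling involved". The helper choose_decoded_dtype is hypothetical and written only for illustration; it is not xarray code and not necessarily how the fix was eventually implemented.

```python
import numpy as np


def choose_decoded_dtype(data_dtype, fill_value=None, scale_factor=None, add_offset=None):
    """Hypothetical helper: pick a dtype for the decoded data."""
    data_dtype = np.dtype(data_dtype)
    # String fill values cannot be replaced by NaN in a numeric array.
    if fill_value is not None and np.asarray(fill_value).dtype.kind in ("U", "S"):
        return np.dtype(object)
    # Scaling may overflow or lose precision, so stay with float64 here.
    if scale_factor is not None or add_offset is not None:
        return np.dtype(np.float64)
    # Only masking: a floating-point array can already hold NaN as it is.
    if data_dtype.kind == "f":
        return data_dtype
    # Integer data needs a float type to represent NaN.
    return np.dtype(np.float64)


print(choose_decoded_dtype(np.float32, fill_value=-999.0))
print(choose_decoded_dtype(np.float32, fill_value=-999.0, scale_factor=0.1))
```

The point of the sketch is only the third branch: when nothing but a _FillValue is decoded, float32 input can stay float32.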

wqshen · 2016-09-20T01:53:43Z

OK & Thanks

lvankampenhout · 2018-03-28T08:49:24Z

I stumbled across the same problem in xarray 0.9.1 and updating to 0.10.2 solved it. Perhaps this issue may be closed?

shoyer · 2018-03-28T22:37:00Z

Yes, this was fixed by @Zac-HD in #1840

wqshen changed the title ~~How to avoid the auto conversion from float32 to float64 when read netCDF file use open_dataset?~~ How to avoid the auto convert variable dtype from float32 to float64 when read netCDF file use open_dataset? Sep 19, 2016

shoyer closed this as completed Mar 28, 2018
