-
-
Notifications
You must be signed in to change notification settings - Fork 6
Conversation
Tests seem to suggest that the OSGB coordinates are huge and not as expected
Tests are now close, but not quite there
# Rescale here to the exact size, assumes that the above is good slice | ||
# Xarray doesn't have good rescaling built in, so convert to numpy then back | ||
# TODO This is very slow, it would probably be faster to do in Numpy and convert back | ||
# Coarsen is built in but doesn't end up with the correct shape a lot of the time, the output | ||
# will be cut off on the right and bottom sides | ||
# Could get data, resample down with Numpy, resample coordinates and then recreate the data? | ||
# Or use EMFPy as optional dependency, as needs conda install or other complicated | ||
nsamples = self._square.size_pixels + 1 | ||
xbins = np.linspace(selected_data.x.min(), selected_data.x.max(), nsamples).astype(int) | ||
ybins = np.linspace(selected_data.y.min(), selected_data.y.max(), nsamples).astype(int) | ||
selected_data = ( | ||
selected_data.groupby_bins("x", xbins).mean().groupby_bins("y", ybins).mean() | ||
) | ||
# TODO get coordinates again with center of bin? | ||
print(selected_data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure the best way of going about this. Current approach will probably be to use xemfs to do the regridding. Adds quite a large dependency though, which seems to have to be installed through conda to work well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
xESMF requires lat
and lon
to exist in the dataset. This could be accomplished by not projecting to OSGB coordinates till after this step? But that might also be not very effective rioxarray seems like it could also work as an option, with built in regridding and such
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to make sure I understand... Are these comments now out-of-date because the plan is now to pre-compute topographic data at 1km resolution, so there's less heavy lifting to do every time nowcasting_dataset
runs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, pretty much. I've changed it a bit so the data source can resample to the meters_per_pixel
size of the input data if wanted, but most of the reprojection and such is done in the generation script
Only resample if resolution is different
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've commented a few changes, that are fair enough you didn't know. Please reach out if something doesnt make sense
Also thanks for fixing my typo on 'satelLite'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Very exciting to see this topographic data in nowcasting_dataset
!
(dataset.width / data.shape[-1]), (dataset.height / data.shape[-2]) | ||
) | ||
name = f.split("/")[-1] | ||
# Set the nodata values to 0, as nearly all ocean. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps in a later PR, it might be nice to also give the ML model a binary map of "ocean versus land".
src_crs = src.crs | ||
xds.attrs["crs"] = src_crs | ||
|
||
xds_resampled = xds.rio.reproject(dst_crs=dst_crs, resolution=500, resampling=Resampling.bilinear) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry if I've missed something but where does the 500
in resolution=500
come from? Half a kilometre per pixel?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, just in case we want a higher resolution map to downsample or whatever later. Its still only about 240mb I think? So still quite small.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool beans. It might be worth commenting on what the 500 means in the code? No big worries if not though!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added it!
Co-authored-by: Jack Kelly <[email protected]>
Pull Request
Description
This PR adds support for topographic data from NASA SRTM. The original data is at 30m resolution worldwide. This fills a gap with functionality that was in SatFlow's data pipeline and is missing from nowcastin-dataset at the moment.
Fixes issue #105
How Has This Been Tested?
Unit tests
Checklist: