Skip to content
This repository was archived by the owner on Sep 11, 2023. It is now read-only.

Add Topographic data source #150

Merged
merged 49 commits into from
Sep 28, 2021
Merged

Add Topographic data source #150

merged 49 commits into from
Sep 28, 2021

Conversation

jacobbieker
Copy link
Member

@jacobbieker jacobbieker commented Sep 22, 2021

Pull Request

Description

This PR adds support for topographic data from NASA SRTM. The original data is at 30m resolution worldwide. This fills a gap with functionality that was in SatFlow's data pipeline and is missing from nowcastin-dataset at the moment.

Fixes issue #105

How Has This Been Tested?

Unit tests

  • No
  • Yes

Checklist:

  • My code follows OCF's coding style guidelines
  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked my code and corrected any misspellings

@jacobbieker jacobbieker added enhancement New feature or request data New data source or feature; or modification of existing data source labels Sep 22, 2021
@jacobbieker jacobbieker added this to the WP1 essential tasks milestone Sep 22, 2021
@jacobbieker jacobbieker self-assigned this Sep 22, 2021
@peterdudfield peterdudfield removed this from the WP1 essential tasks milestone Sep 24, 2021
@jacobbieker jacobbieker mentioned this pull request Sep 24, 2021
Comment on lines 70 to 84
# Rescale here to the exact size, assumes that the above is good slice
# Xarray doesn't have good rescaling built in, so convert to numpy then back
# TODO This is very slow, it would probably be faster to do in Numpy and convert back
# Coarsen is built in but doesn't end up with the correct shape a lot of the time, the output
# will be cut off on the right and bottom sides
# Could get data, resample down with Numpy, resample coordinates and then recreate the data?
# Or use EMFPy as optional dependency, as needs conda install or other complicated
nsamples = self._square.size_pixels + 1
xbins = np.linspace(selected_data.x.min(), selected_data.x.max(), nsamples).astype(int)
ybins = np.linspace(selected_data.y.min(), selected_data.y.max(), nsamples).astype(int)
selected_data = (
selected_data.groupby_bins("x", xbins).mean().groupby_bins("y", ybins).mean()
)
# TODO get coordinates again with center of bin?
print(selected_data)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure the best way of going about this. Current approach will probably be to use xemfs to do the regridding. Adds quite a large dependency though, which seems to have to be installed through conda to work well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xESMF requires lat and lon to exist in the dataset. This could be accomplished by not projecting to OSGB coordinates till after this step? But that might also be not very effective rioxarray seems like it could also work as an option, with built in regridding and such

Copy link
Member

@JackKelly JackKelly Sep 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make sure I understand... Are these comments now out-of-date because the plan is now to pre-compute topographic data at 1km resolution, so there's less heavy lifting to do every time nowcasting_dataset runs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, pretty much. I've changed it a bit so the data source can resample to the meters_per_pixel size of the input data if wanted, but most of the reprojection and such is done in the generation script

Copy link
Contributor

@peterdudfield peterdudfield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've commented a few changes, that are fair enough you didn't know. Please reach out if something doesnt make sense

Also thanks for fixing my typo on 'satelLite'

@jacobbieker jacobbieker mentioned this pull request Sep 27, 2021
Copy link
Member

@JackKelly JackKelly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Very exciting to see this topographic data in nowcasting_dataset!

(dataset.width / data.shape[-1]), (dataset.height / data.shape[-2])
)
name = f.split("/")[-1]
# Set the nodata values to 0, as nearly all ocean.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps in a later PR, it might be nice to also give the ML model a binary map of "ocean versus land".

src_crs = src.crs
xds.attrs["crs"] = src_crs

xds_resampled = xds.rio.reproject(dst_crs=dst_crs, resolution=500, resampling=Resampling.bilinear)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry if I've missed something but where does the 500 in resolution=500 come from? Half a kilometre per pixel?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, just in case we want a higher resolution map to downsample or whatever later. Its still only about 240mb I think? So still quite small.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool beans. It might be worth commenting on what the 500 means in the code? No big worries if not though!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added it!

@jacobbieker jacobbieker merged commit 7324045 into main Sep 28, 2021
@jacobbieker jacobbieker deleted the jacob/elevation branch September 28, 2021 17:22
@JackKelly JackKelly linked an issue Oct 1, 2021 that may be closed by this pull request
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
data New data source or feature; or modification of existing data source enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add Topographic Data
3 participants