This repository was archived by the owner on Sep 11, 2023. It is now read-only.

Add Topographic data source #150

Merged (49 commits) on Sep 28, 2021
Showing changes from 34 commits

Commits
ef21a70
Start on adding Topographic data source
jacobbieker Sep 22, 2021
460f895
Copy more from SatelliteDataSource
jacobbieker Sep 22, 2021
7b3b25a
Add working generation script
jacobbieker Sep 24, 2021
7bb892a
Add mean and stddev
jacobbieker Sep 24, 2021
002ca09
Update variable name
jacobbieker Sep 24, 2021
b269930
Convert lat/lon to meters in NetCDF
jacobbieker Sep 24, 2021
11c33db
Simplify
jacobbieker Sep 24, 2021
11b75a8
Change inheritance
jacobbieker Sep 24, 2021
a6d9370
Squeeze extra dims
jacobbieker Sep 24, 2021
3a90c98
Add OSGB reprojection in generation
jacobbieker Sep 24, 2021
734693a
Remove prints
jacobbieker Sep 24, 2021
55f2faf
Add metadata to generated data
jacobbieker Sep 24, 2021
58715fc
Add rounding
jacobbieker Sep 24, 2021
9503301
Fix math
jacobbieker Sep 24, 2021
d296272
Ensure that the downsample scale is an int
jacobbieker Sep 24, 2021
4fb3ddc
Topographic data idea
jacobbieker Sep 24, 2021
b0ca3b0
Switch to xESMF
jacobbieker Sep 24, 2021
adae2a9
Simplify generation of topo map
jacobbieker Sep 27, 2021
a26ee82
Simplify data source
jacobbieker Sep 27, 2021
c625278
Add test GeoTIFF
jacobbieker Sep 27, 2021
bf3542c
Add requirement
jacobbieker Sep 27, 2021
8bdfaec
Merge branch 'main' into jacob/elevation
jacobbieker Sep 27, 2021
77c8f1d
Update comment
jacobbieker Sep 27, 2021
c2aa243
Fix tests, add docstrings
jacobbieker Sep 27, 2021
c09c242
Add topographic filepath to InputData
jacobbieker Sep 27, 2021
8622f5d
Fix typo
jacobbieker Sep 27, 2021
fae1bef
Fix typos
jacobbieker Sep 27, 2021
118a805
Rename option
jacobbieker Sep 27, 2021
a72fc78
Fix typo
jacobbieker Sep 27, 2021
015a621
Add TopographicDataSource options in datamodule
jacobbieker Sep 27, 2021
89aab00
Add another assert
jacobbieker Sep 27, 2021
b2c14c2
Squeeze down to 2D array
jacobbieker Sep 27, 2021
3a81924
Fix test
jacobbieker Sep 27, 2021
2938874
Update version
jacobbieker Sep 27, 2021
5774d01
Remove print
jacobbieker Sep 27, 2021
2601554
Fix description
jacobbieker Sep 27, 2021
cfedd43
Address most PR comments
jacobbieker Sep 27, 2021
d7a1922
Fix variable name
jacobbieker Sep 27, 2021
966b83d
Fix batch creation
jacobbieker Sep 27, 2021
182c5d6
Merge remote-tracking branch 'origin/main' into jacob/elevation
jacobbieker Sep 27, 2021
9b6c2dd
Update validate
jacobbieker Sep 27, 2021
cbc41f7
Add Topo to FakeDataset
jacobbieker Sep 27, 2021
56e874c
Remove name from paths
jacobbieker Sep 28, 2021
2d7e1c1
Merge remote-tracking branch 'origin/main' into jacob/elevation
jacobbieker Sep 28, 2021
afe9990
Add full filename
jacobbieker Sep 28, 2021
59b953e
Fix path
jacobbieker Sep 28, 2021
49311d8
Merge branch 'main' into jacob/elevation
jacobbieker Sep 28, 2021
fca7984
Add Topo data to prepare script and configs
jacobbieker Sep 28, 2021
6e63ba1
Update nowcasting_dataset/data_sources/topographic_data_source.py
jacobbieker Sep 28, 2021
4 changes: 2 additions & 2 deletions nowcasting_dataset/config/example.yaml
@@ -3,8 +3,8 @@ general:
   name: example
 input_data:
   bucket: solar-pv-nowcasting-data
-  npw_base_path: NWP/UK_Met_Office/UKV__2018-01_to_2019-12__chunks__variable10__init_time1__step1__x548__y704__.zarr
-  satelite_filename: satellite/EUMETSAT/SEVIRI_RSS/OSGB36/all_zarr_int16_single_timestep.zarr
+  nwp_base_path: NWP/UK_Met_Office/UKV__2018-01_to_2019-12__chunks__variable10__init_time1__step1__x548__y704__.zarr
+  satellite_filename: satellite/EUMETSAT/SEVIRI_RSS/OSGB36/all_zarr_int16_single_timestep.zarr
   solar_pv_data_filename: UK_PV_timeseries_batch.nc
   solar_pv_metadata_filename: UK_PV_metadata.csv
   solar_pv_path: PV/PVOutput.org
4 changes: 2 additions & 2 deletions nowcasting_dataset/config/gcp.yaml
@@ -3,8 +3,8 @@ general:
   name: example
 input_data:
   bucket: solar-pv-nowcasting-data
-  npw_base_path: NWP/UK_Met_Office/UKV__2018-01_to_2019-12__chunks__variable10__init_time1__step1__x548__y704__.zarr
-  satelite_filename: satellite/EUMETSAT/SEVIRI_RSS/OSGB36/all_zarr_int16_single_timestep.zarr
+  nwp_base_path: NWP/UK_Met_Office/UKV__2018-01_to_2019-12__chunks__variable10__init_time1__step1__x548__y704__.zarr
+  satellite_filename: satellite/EUMETSAT/SEVIRI_RSS/OSGB36/all_zarr_int16_single_timestep.zarr
   solar_pv_data_filename: UK_PV_timeseries_batch.nc
   solar_pv_metadata_filename: UK_PV_metadata.csv
   solar_pv_path: PV/PVOutput.org
8 changes: 6 additions & 2 deletions nowcasting_dataset/config/model.py
@@ -32,16 +32,20 @@ class InputData(BaseModel):
     solar_pv_data_filename: str = Field("UK_PV_timeseries_batch.nc", description="TODO")
     solar_pv_metadata_filename: str = Field("UK_PV_metadata.csv", description="TODO")

-    satelite_filename: str = Field(
+    satellite_filename: str = Field(
         "satellite/EUMETSAT/SEVIRI_RSS/OSGB36/all_zarr_int16_single_timestep.zarr",
         description="TODO",
     )

-    npw_base_path: str = Field(
+    nwp_base_path: str = Field(
         "NWP/UK_Met_Office/UKV__2018-01_to_2019-12__chunks__variable10__init_time1__step1__x548__y704__.zarr",
         description="TODO",
     )

+    topographic_filename: str = Field(
+        "europe_dem_1km_osgb.tif", description="Path to the GeoTIFF Topographic data source"
+    )
+
     gsp_filename: str = Field("PV/GSP/v0/pv_gsp.zarr")


3 changes: 3 additions & 0 deletions nowcasting_dataset/consts.py
@@ -55,4 +55,7 @@
 NWP_Y_COORDS = "nwp_y_coords"
 X_METERS_CENTER = "x_meters_center"
 Y_METERS_CENTER = "y_meters_center"
+TOPOGRAPHIC_DATA = "topo_data"
+TOPOGRAPHIC_X_COORDS = "topo_x_coords"
+TOPOGRAPHIC_Y_COORDS = "topo_y_coords"
 T0_DT = "t0_dt"
1 change: 1 addition & 0 deletions nowcasting_dataset/data_sources/__init__.py
@@ -3,3 +3,4 @@
 from nowcasting_dataset.data_sources.pv_data_source import PVDataSource
 from nowcasting_dataset.data_sources.nwp_data_source import NWPDataSource
 from nowcasting_dataset.data_sources.datetime_data_source import DatetimeDataSource
+from nowcasting_dataset.data_sources.topographic_data_source import TopographicDataSource
145 changes: 145 additions & 0 deletions nowcasting_dataset/data_sources/topographic_data_source.py
@@ -0,0 +1,145 @@
from nowcasting_dataset.data_sources.data_source import ImageDataSource
from nowcasting_dataset.dataset.example import Example
from nowcasting_dataset.consts import TOPOGRAPHIC_DATA
from nowcasting_dataset.geospatial import OSGB
from rasterio.warp import Resampling
from dataclasses import dataclass, InitVar
from numbers import Number
import pandas as pd
import xarray as xr
import numpy as np
import rioxarray

# Means computed with
# out_fp = "europe_dem_1km.tif"
# out = rasterio.open(out_fp)
# data = out.read(masked=True)
# print(np.mean(data))
# print(np.std(data))
TOPO_MEAN = xr.DataArray(
    data=[
        365.486887,
    ],
    dims=["variable"],
    coords={"variable": [TOPOGRAPHIC_DATA]},
).astype(np.float32)

TOPO_STD = xr.DataArray(
    data=[
        478.841369,
    ],
    dims=["variable"],
    coords={"variable": [TOPOGRAPHIC_DATA]},
).astype(np.float32)


@dataclass
class TopographicDataSource(ImageDataSource):
    """Add topographic/elevation map features."""

    filename: str = None
    normalize: bool = True

    def __post_init__(self, image_size_pixels: int, meters_per_pixel: int):
        super().__post_init__(image_size_pixels, meters_per_pixel)
        self._shape_of_example = (
            image_size_pixels,
            image_size_pixels,
        )
        self._data = rioxarray.open_rasterio(
            filename=self.filename, parse_coordinates=True, masked=True
        )
        self._data = self._data.fillna(0)  # Set nodata values to 0 (mostly should be ocean)
        # Add CRS for later, topo maps are assumed to be in OSGB
        self._data.attrs["crs"] = OSGB
        print(self._data.shape)
        # Distance between pixels, giving their spatial extent, in meters
        self._stored_pixel_size_meters = abs(self._data.coords["x"][1] - self._data.coords["x"][0])
        self._meters_per_pixel = meters_per_pixel

    def get_example(
        self, t0_dt: pd.Timestamp, x_meters_center: Number, y_meters_center: Number
    ) -> Example:
        """
        Get a single example

        Args:
            t0_dt: Current datetime for the example, unused
            x_meters_center: Center of the example in meters in the x direction in OSGB coordinates
            y_meters_center: Center of the example in meters in the y direction in OSGB coordinates

        Returns:
            Example containing topographic data for the selected area
        """

        bounding_box = self._square.bounding_box_centered_on(
            x_meters_center=x_meters_center, y_meters_center=y_meters_center
        )
        selected_data = self._data.sel(
            x=slice(bounding_box.left, bounding_box.right),
            y=slice(bounding_box.top, bounding_box.bottom),
        )
        if self._stored_pixel_size_meters != self._meters_per_pixel:
            # Rescale here to the exact size, assumes that the slice above is good
            # Useful if using different spatially sized grids
            selected_data = selected_data.rio.reproject(
                dst_crs=selected_data.attrs["crs"],
                shape=(self._square.size_pixels, self._square.size_pixels),
                resampling=Resampling.bilinear,
            )

        # selected_data is likely to have 1 too many pixels in x and y
        # because sel(x=slice(a, b)) is [a, b], not [a, b). So trim:
        selected_data = selected_data.isel(
            x=slice(0, self._square.size_pixels), y=slice(0, self._square.size_pixels)
        )

        selected_data = self._post_process_example(selected_data, t0_dt)
        if selected_data.shape != self._shape_of_example:
            raise RuntimeError(
                "Example is wrong shape! "
                f"x_meters_center={x_meters_center}\n"
                f"y_meters_center={y_meters_center}\n"
                f"t0_dt={t0_dt}\n"
                f"expected shape={self._shape_of_example}\n"
                f"actual shape {selected_data.shape}"
            )

        return self._put_data_into_example(selected_data)

    def _put_data_into_example(self, selected_data: xr.DataArray) -> Example:
        """
        Insert the data and coordinates into an Example

        Args:
            selected_data: DataArray containing the data to insert

        Returns:
            Example containing the Topographic data
        """
        return Example(
            topo_data=selected_data,
            topo_x_coords=selected_data.x,
            topo_y_coords=selected_data.y,
        )

    def _post_process_example(
        self, selected_data: xr.DataArray, t0_dt: pd.Timestamp
    ) -> xr.DataArray:
        """
        Post process the topographical data, removing an extra dim and optionally
        normalizing

        Args:
            selected_data: DataArray containing the topographic data
            t0_dt: Unused

        Returns:
            DataArray with optionally normalized data, and removed first dimension
        """
        if self.normalize:
            selected_data = selected_data - TOPO_MEAN
            selected_data = selected_data / TOPO_STD
        # Shrink extra dims
        selected_data = selected_data.squeeze()
        return selected_data
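
The trimming step in `get_example` exists because label-based slicing in xarray includes both endpoints. A minimal NumPy sketch of the same effect, assuming a regular grid of pixel coordinates; the coordinate values and the helper name `select_inclusive` are made up for illustration, and this is not the data source's actual code path:

```python
import numpy as np


def select_inclusive(coords: np.ndarray, start: float, stop: float) -> np.ndarray:
    """Mimic label-based slicing: keep coordinates in the closed range [start, stop]."""
    return coords[(coords >= start) & (coords <= stop)]


meters_per_pixel = 2000
size_pixels = 4

# Pixel coordinates on a regular 2 km grid: 0, 2000, ..., 18000.
x = np.arange(0, 20_000, meters_per_pixel)

# A box that is size_pixels wide and whose edges both fall exactly on grid points.
left = 4000
right = left + size_pixels * meters_per_pixel

selected = select_inclusive(x, left, right)  # both edges are included
trimmed = selected[:size_pixels]  # drop the extra pixel, as the isel() call does
```

When the box edges do not land exactly on grid points the selection may already have the right length, in which case the trim is a no-op.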
15 changes: 15 additions & 0 deletions nowcasting_dataset/dataset/datamodule.py
@@ -57,6 +57,7 @@ class NowcastingDataModule(pl.LightningDataModule):
         "hcc",
     )
     satellite_image_size_pixels: int = 128  #: Passed to Data Sources.
+    topographic_filename: Optional[Union[str, Path]] = None
     nwp_image_size_pixels: int = 2  #: Passed to Data Sources.
     meters_per_pixel: int = 2000  #: Passed to Data Sources.
     convert_to_numpy: bool = True  #: Passed to Data Sources.
@@ -165,6 +166,20 @@ def prepare_data(self) -> None:

             self.data_sources.append(self.nwp_data_source)

+        # Topographic data
+        if self.topographic_filename is not None:
+            self.topo_data_source = data_sources.TopographicDataSource(
+                filename=self.topographic_filename,
+                image_size_pixels=self.satellite_image_size_pixels,
+                meters_per_pixel=self.meters_per_pixel,
+                history_minutes=self.history_minutes,
+                forecast_minutes=self.forecast_minutes,
+                convert_to_numpy=self.convert_to_numpy,
+                normalize=self.normalise_sat,
+            )
+
+            self.data_sources.append(self.topo_data_source)
+
         self.datetime_data_source = data_sources.DatetimeDataSource(
             history_minutes=self.history_minutes,
             forecast_minutes=self.forecast_minutes,
7 changes: 7 additions & 0 deletions nowcasting_dataset/dataset/example.py
@@ -30,6 +30,13 @@ class Example(TypedDict):
     sat_x_coords: Array  #: OSGB geo-spatial coordinates.
     sat_y_coords: Array

+    # Topographic data
+    # Elevation map of the area covered by the satellite data
+    # Shape: [batch_size,] width, height, 1
+    topo_data: Array
+    topo_x_coords: Array
+    topo_y_coords: Array
+
     #: PV yield from all PV systems in the region of interest (ROI).
     #: Includes central PV system, which will always be the first entry.
     #: shape = [batch_size, ] seq_length, n_pv_systems_per_example
1 change: 1 addition & 0 deletions requirements.txt
@@ -27,3 +27,4 @@ plotly
 tqdm
 black
 pre-commit
+rioxarray
87 changes: 87 additions & 0 deletions scripts/generate_topographic_data.py
@@ -0,0 +1,87 @@
"""
Script that takes the SRTM 30m worldwide elevation maps and creates a
single representation of the area in NetCDF and GeoTIFF formats

The SRTM files can be downloaded from NASA, and are in CRS EPSG:4326
"""
import os.path
import glob
import rasterio
from rasterio.merge import merge
from rasterio.warp import Resampling
from nowcasting_dataset.geospatial import OSGB
import rioxarray

dst_crs = OSGB

# Go through, open all the files, combined by coords, then save out to NetCDF, or GeoTIFF
files = glob.glob("/run/media/jacob/data/SRTM/*.tif")
out_dir = "/run/media/jacob/data/SRTM1KM/"


upscale_factor = 0.12 # 30m to 250m-ish, just making it small enough files to actually merge
for f in files:
    with rasterio.open(f) as dataset:

        # resample data to target shape
        data = dataset.read(
            out_shape=(
                dataset.count,
                int(dataset.height * upscale_factor),
                int(dataset.width * upscale_factor),
            ),
            resampling=Resampling.bilinear,
        )

        # scale image transform
        transform = dataset.transform * dataset.transform.scale(
            (dataset.width / data.shape[-1]), (dataset.height / data.shape[-2])
        )
        name = f.split("/")[-1]
        # Set the nodata values to 0, as nearly all ocean.
Member

Perhaps in a later PR, it might be nice to also give the ML model a binary map of "ocean versus land".
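
The "ocean versus land" map suggested here could be derived from the nodata mask before it is filled with 0. A minimal NumPy sketch, assuming nodata pixels are represented as NaN (as they are when a raster is opened with `masked=True`) and that nodata really does mean ocean; the function name `land_mask_from_dem` is hypothetical and not part of this PR:

```python
import numpy as np


def land_mask_from_dem(dem: np.ndarray) -> np.ndarray:
    """Return 1 where the DEM has a valid elevation and 0 where it is nodata (NaN).

    Assumption: nodata pixels are NaN and correspond to ocean.
    """
    return (~np.isnan(dem)).astype(np.uint8)


# Tiny made-up elevation grid in metres; NaN marks nodata pixels.
dem = np.array([[np.nan, 12.0], [0.0, np.nan]], dtype=np.float32)

mask = land_mask_from_dem(dem)
# Filling NaN with 0 afterwards mirrors what the data source does with fillna(0).
filled = np.nan_to_num(dem, nan=0.0)
```

A land pixel that sits at exactly 0 m keeps its land flag in `mask`, which is the distinction that is lost once nodata has been filled with 0.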

        out_meta = dataset.meta.copy()
        out_meta.update(
            {
                "driver": "GTiff",
                "height": data.shape[1],
                "width": data.shape[2],
                "transform": transform,
            }
        )
        with rasterio.open(os.path.join(out_dir, name), "w", **out_meta) as dest:
            dest.write(data)
files = glob.glob("/run/media/jacob/data/SRTM1KM/*.tif")
src = rasterio.open(files[0])

mosaic, out_trans = merge(files)
out_meta = src.meta.copy()
out_meta.update(
    {
        "driver": "GTiff",
        "height": mosaic.shape[1],
        "width": mosaic.shape[2],
        "transform": out_trans,
    }
)
out_fp = "europe_dem_250m.tif"
with rasterio.open(out_fp, "w", **out_meta) as dest:
    dest.write(mosaic)

xds = rioxarray.open_rasterio(out_fp, parse_coordinates=True)
# Reproject to exactly 1km pixels now
with rasterio.open(out_fp) as src:
    src_crs = src.crs
    xds.attrs["crs"] = src_crs

xds_resampled = xds.rio.reproject(dst_crs=dst_crs, resolution=500, resampling=Resampling.bilinear)
Member

Sorry if I've missed something but where does the 500 in resolution=500 come from? Half a kilometre per pixel?

Member Author

Yes, just in case we want a higher-resolution map to downsample or whatever later. It's still only about 240 MB, I think, so still quite small.

Member

Cool beans. It might be worth commenting on what the 500 means in the code? No big worries if not though!

Member Author

Added it!
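
For reference, the numbers in this thread work out as follows. The `resolution` argument of `rio.reproject` is the destination pixel size in the units of the destination CRS, which for OSGB is metres, so `resolution=500` means 500 m pixels. A small sketch of the arithmetic; the helper names are illustrative only, the 30 m source pixel size is the nominal figure from the script's docstring, and the 128-pixel, 2000 m example size is taken from the datamodule defaults:

```python
import math


def resampled_pixel_size(source_pixel_m: float, scale_factor: float) -> float:
    """Pixel size after scaling a raster's height and width by `scale_factor`."""
    return source_pixel_m / scale_factor


def pixels_across(extent_m: float, resolution_m: float) -> int:
    """Number of pixels needed to cover `extent_m` at `resolution_m` per pixel."""
    return math.ceil(extent_m / resolution_m)


# upscale_factor = 0.12 applied to roughly 30 m SRTM pixels gives roughly 250 m pixels.
intermediate = resampled_pixel_size(30.0, 0.12)

# A 256 km wide example (128 pixels at 2000 m per pixel) covers this many
# stored pixels in the 500 m, 1 km and 2 km files respectively.
widths = {res: pixels_across(256_000, res) for res in (500, 1000, 2000)}
```

Keeping the 500 m file therefore costs four times as many pixels per example as the 2 km file in each direction, which is the trade-off behind also writing the 1 km and 2 km versions.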

print(xds_resampled)
print(abs(xds_resampled.coords["x"][1] - xds_resampled.coords["x"][0]))
xds_resampled.rio.to_raster("europe_dem_500m_osgb.tif")
xds_resampled = xds.rio.reproject(dst_crs=dst_crs, resolution=1000, resampling=Resampling.bilinear)
print(xds_resampled)
print(abs(xds_resampled.coords["x"][1] - xds_resampled.coords["x"][0]))
xds_resampled.rio.to_raster("europe_dem_1km_osgb.tif")
xds_resampled = xds.rio.reproject(dst_crs=dst_crs, resolution=2000, resampling=Resampling.bilinear)
print(xds_resampled)
print(abs(xds_resampled.coords["x"][1] - xds_resampled.coords["x"][0]))
xds_resampled.rio.to_raster("europe_dem_2km_osgb.tif")
4 changes: 2 additions & 2 deletions scripts/prepare_ml_data.py
@@ -57,10 +57,10 @@
 PV_DATA_FILENAME = PV_PATH / config.input_data.solar_pv_data_filename
 PV_METADATA_FILENAME = PV_PATH / config.input_data.solar_pv_metadata_filename

-SAT_FILENAME = BUCKET / config.input_data.satelite_filename
+SAT_FILENAME = BUCKET / config.input_data.satellite_filename

 # Numerical weather predictions
-NWP_BASE_PATH = BUCKET / config.input_data.npw_base_path
+NWP_BASE_PATH = BUCKET / config.input_data.nwp_base_path

 # GSP data
 GSP_FILENAME = BUCKET / config.input_data.gsp_filename
2 changes: 1 addition & 1 deletion setup.py
@@ -9,7 +9,7 @@

 setup(
     name="nowcasting_dataset",
-    version="0.1.5",
+    version="0.1.6",
     license="MIT",
     description="Nowcasting Dataset",
     author="Jack Kelly, Peter Dudfield, Jacob Bieker",
5 changes: 3 additions & 2 deletions tests/config/nwp_size_test.yaml
@@ -3,12 +3,13 @@ general:
   name: example
 input_data:
   bucket: solar-pv-nowcasting-data
-  npw_base_path: tests/data/nwp_data/test.zarr
-  satelite_filename: tests/data/sat_data.zarr
+  nwp_base_path: tests/data/nwp_data/test.zarr
+  satellite_filename: tests/data/sat_data.zarr
   solar_pv_data_filename: tests/data/pv_data/test.nc
   solar_pv_metadata_filename: tests/data/pv_metadata/UK_PV_metadata.csv
   solar_pv_path: tests/data/pv_data
   gsp_filename: tests/data/gsp/test.zarr
+  topographic_filename: tests/data/europe_dem_2km_osgb.tif
 output_data:
   filepath: solar-pv-nowcasting-data/prepared_ML_training_data/v5/
 process:
5 changes: 3 additions & 2 deletions tests/config/test.yaml
@@ -3,12 +3,13 @@ general:
   name: example
 input_data:
   bucket: solar-pv-nowcasting-data
-  npw_base_path: tests/data/nwp_data/test.zarr
-  satelite_filename: tests/data/sat_data.zarr
+  nwp_base_path: tests/data/nwp_data/test.zarr
+  satellite_filename: tests/data/sat_data.zarr
   solar_pv_data_filename: tests/data/pv_data/test.nc
   solar_pv_metadata_filename: tests/data/pv_metadata/UK_PV_metadata.csv
   solar_pv_path: tests/data/pv_data
   gsp_filename: tests/data/gsp/test.zarr
+  topographic_filename: tests/data/europe_dem_2km_osgb.tif
 output_data:
   filepath: solar-pv-nowcasting-data/prepared_ML_training_data/v5/
 process:
Binary file added tests/data/europe_dem_2km_osgb.tif
Binary file not shown.