This repository was archived by the owner on Sep 11, 2023. It is now read-only.
Big new design Part 2 :) #307
Merged
Changes from 17 commits
Commits (39)
b102da8 Making a start on the big new design! Sketched out the basic design i… (JackKelly)
63f0a2a Implement arg_logger decorator (JackKelly)
663852d enable load_solar_pv_data to load from any compute environment. Fixe… (JackKelly)
61be554 Successfully gets t0 datetimes (JackKelly)
ff18699 fix incorrect logger message (JackKelly)
8d5043b successfully checks for CSV file (JackKelly)
8bef05c Check there is no overlap between split datetimes. Fixes #299 (JackKelly)
4d28923 Successfully creates directories and spatial_and_temporal_locations_o… (JackKelly)
33318b3 tidy up check_directories (JackKelly)
856fe64 Fix merge conflicts with main (JackKelly)
af0b8f7 implement Manager._get_first_batches_to_create() (JackKelly)
b2387f3 start fleshing out Manager.create_batches() (JackKelly)
04d4fbb Finish first rough draft of Manager.create_batches() (JackKelly)
72e39c8 Finally, a full complete draft of #213. Not yet tested (JackKelly)
07db836 open DataSource (JackKelly)
7004973 Delete datamodule.py and datasets.py (JackKelly)
f896a5e Remove n_timesteps_per_batch and _cache from DataSources. (JackKelly)
e3d1597 Implement get_filesystem() (JackKelly)
af6707a prepare_ml_data.py runs and successfully creates GSP batches! (JackKelly)
71fdd78 implement check_input_paths_exist() in all DataSources (JackKelly)
01b364d fixed about half the unittests (JackKelly)
bd57063 all tests pass except the test_data_source_list.py. Fixed some error… (JackKelly)
b1f54b1 All tests pass! (JackKelly)
38ecb01 fix linter errors (JackKelly)
f047423 more linter fixes (JackKelly)
bb1fecf fix variable naming (JackKelly)
d1ef6ab Update comments (JackKelly)
05b184c update README (JackKelly)
b556a73 Convert get_maximum_batch_id() to use lexographical sorting. Fixes #308 (JackKelly)
bb515dd address reviewer comments (JackKelly)
0dfbe45 implement more reviewer suggestions (JackKelly)
9fbcfd2 addressing more reviewer comments (JackKelly)
eaae156 update docs (JackKelly)
c51fad5 remove pytorch lightning! (JackKelly)
75b4ce8 Fix bug: Create target directory if it does not exist (JackKelly)
e303d94 Update docstring (JackKelly)
cbaab07 Fix bug: Set first_batch_to_create to zero if the target directory do… (JackKelly)
51f3640 fix spelling mistake (JackKelly)
ce6e5ba check columns names (JackKelly)
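One change in the list above is worth a quick illustration: commit b556a73 converts get_maximum_batch_id() to lexicographical sorting (fixing #308). Below is a minimal sketch of the idea, assuming batch files use zero-padded integer names; the filenames and helper are illustrative, not the repository's actual code.

```python
# Illustrative sketch only (not the repository's implementation) of the idea
# behind commit b556a73: with zero-padded batch filenames, lexicographic order
# equals numeric order, so the maximum batch id is simply the last sorted name.
from typing import List


def get_maximum_batch_id_sketch(filenames: List[str]) -> int:
    """Return the largest batch id encoded in zero-padded filenames."""
    if not filenames:
        return 0  # illustrative fallback; the real function may behave differently
    last = sorted(filenames)[-1]    # lexicographic sort
    return int(last.split(".")[0])  # "000010.nc" -> 10


print(get_maximum_batch_id_sketch(["000000.nc", "000001.nc", "000010.nc", "000002.nc"]))  # 10
```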
```diff
@@ -1 +1,3 @@
 """ Configuration of the dataset """
+from nowcasting_dataset.config.load import load_yaml_configuration
+from nowcasting_dataset.config.model import Configuration, InputData, set_git_commit
```
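For orientation, these re-exports let downstream code load and inspect a configuration straight from the package. A minimal, hedged usage sketch follows; the YAML filename is an assumption, and the attribute names follow the YAML keys and the configuration-model diff further down this page.

```python
# Hedged usage sketch of the re-exported helpers.  "example_config.yaml" is an
# illustrative filename, not a file added in this PR.
from nowcasting_dataset.config import Configuration, load_yaml_configuration

config: Configuration = load_yaml_configuration("example_config.yaml")
print(config.process.batch_size)    # e.g. 32
print(config.output_data.filepath)  # now a Pathy rather than a plain str
```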
```diff
@@ -15,16 +15,18 @@
 from typing import Optional

 import git
 import pandas as pd
 from pathy import Pathy
 from pydantic import BaseModel, Field, root_validator, validator

 # nowcasting_dataset imports
 from nowcasting_dataset.consts import (
     DEFAULT_N_GSP_PER_EXAMPLE,
     DEFAULT_N_PV_SYSTEMS_PER_EXAMPLE,
     NWP_VARIABLE_NAMES,
     SAT_VARIABLE_NAMES,
 )

 from nowcasting_dataset.dataset.split import split

 IMAGE_SIZE_PIXELS_FIELD = Field(64, description="The number of pixels of the region of interest.")
 METERS_PER_PIXEL_FIELD = Field(2000, description="The number of meters per pixel.")

@@ -102,7 +104,7 @@ class Satellite(DataSourceMixin):
     """Satellite configuration model"""

     satellite_zarr_path: str = Field(
-        "gs://solar-pv-nowcasting-data/satellite/EUMETSAT/SEVIRI_RSS/OSGB36/all_zarr_int16_single_timestep.zarr",
+        "gs://solar-pv-nowcasting-data/satellite/EUMETSAT/SEVIRI_RSS/OSGB36/all_zarr_int16_single_timestep.zarr",  # noqa: E501
         description="The path which holds the satellite zarr.",
     )
     satellite_channels: tuple = Field(

@@ -116,7 +118,7 @@ class NWP(DataSourceMixin):
     """NWP configuration model"""

     nwp_zarr_path: str = Field(
-        "gs://solar-pv-nowcasting-data/NWP/UK_Met_Office/UKV__2018-01_to_2019-12__chunks__variable10__init_time1__step1__x548__y704__.zarr",
+        "gs://solar-pv-nowcasting-data/NWP/UK_Met_Office/UKV__2018-01_to_2019-12__chunks__variable10__init_time1__step1__x548__y704__.zarr",  # noqa: E501
         description="The path which holds the NWP zarr.",
     )
     nwp_channels: tuple = Field(NWP_VARIABLE_NAMES, description="the channels used in the nwp data")

@@ -213,7 +215,8 @@ def set_forecast_and_history_minutes(cls, values):
         Run through the different data sources and if the forecast or history minutes are not set,
         then set them to the default values
         """

         # It would be much better to use nowcasting_dataset.data_sources.ALL_DATA_SOURCE_NAMES,
         # but that causes a circular import.
         ALL_DATA_SOURCE_NAMES = ("pv", "satellite", "nwp", "gsp", "topographic", "sun")
         enabled_data_sources = [
             data_source_name

@@ -249,8 +252,8 @@ def set_all_to_defaults(cls):
 class OutputData(BaseModel):
     """Output data model"""

-    filepath: str = Field(
-        "gs://solar-pv-nowcasting-data/prepared_ML_training_data/v7/",
+    filepath: Pathy = Field(
+        Pathy("gs://solar-pv-nowcasting-data/prepared_ML_training_data/v7/"),
         description=(
             "Where the data is saved to. If this is running on the cloud then should include"
             " 'gs://' or 's3://'"

@@ -262,7 +265,29 @@ class Process(BaseModel):
     """Pydantic model of how the data is processed"""

     seed: int = Field(1234, description="Random seed, so experiments can be repeatable")
-    batch_size: int = Field(32, description="the number of examples per batch")
+    batch_size: int = Field(32, description="The number of examples per batch")
+    t0_datetime_frequency: pd.Timedelta = Field(
+        pd.Timedelta("5 minutes"),
+        description=(
+            "The temporal frequency at which t0 datetimes will be sampled."
+            " Can be any string that `pandas.Timedelta()` understands."
+            " For example, if this is set to '5 minutes', then, for each example, the t0 datetime"
+            " could be at 0, 5, ..., 55 minutes past the hour. If there are DataSources with a"
+            " lower sample rate (e.g. half-hourly) then these lower-sample-rate DataSources will"
+            " still produce valid examples. For example, if a half-hourly DataSource is asked for"
+            " an example with t0=12:05, history_minutes=60, forecast_minutes=60, then it will"
+            " return data at 11:30, 12:00, 12:30, and 13:00."
+        ),
+    )
+    split_method: split.SplitMethod = Field(
+        split.SplitMethod.DAY,
+        description=(
+            "The method used to split the t0 datetimes into train, validation and test sets."
+        ),
+    )
+    n_train_batches: int = 250
+    n_validation_batches: int = 10
+    n_test_batches: int = 10
```
Review comment: wonder whether the defaults should be slightly larger. Might be good to add a Field description too; I know it's fairly obvious from the name, but just to help in the future.
```diff
     upload_every_n_batches: int = Field(
         16,
         description=(
```
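To make the t0_datetime_frequency description above concrete, here is a hedged sketch of the window arithmetic it describes; the helper name and the ceil/floor approach are illustrative assumptions, not the library's actual implementation.

```python
import pandas as pd


def example_window(t0: str, history_minutes: int, forecast_minutes: int,
                   freq: str = "30min") -> pd.DatetimeIndex:
    """Illustrative sketch of the behaviour described for t0_datetime_frequency:
    a DataSource with its own sample rate returns timestamps aligned to that
    rate which cover the requested history and forecast around t0."""
    t0 = pd.Timestamp(t0)
    start = (t0 - pd.Timedelta(minutes=history_minutes)).ceil(freq)
    end = (t0 + pd.Timedelta(minutes=forecast_minutes)).floor(freq)
    return pd.date_range(start, end, freq=freq)


# A half-hourly DataSource asked for t0=12:05 with 60 minutes of history and forecast:
print(example_window("2021-01-01 12:05", 60, 60, freq="30min"))
# -> 11:30, 12:00, 12:30 and 13:00, matching the field description above.
```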
```diff
@@ -56,7 +56,7 @@ input_data:
   topographic_filename: /mnt/storage_b/data/ocf/solar_pv_nowcasting/nowcasting_dataset_pipeline/Topographic/europe_dem_1km_osgb.tif

 output_data:
-  filepath: /mnt/storage_b/data/ocf/solar_pv_nowcasting/nowcasting_dataset_pipeline/prepared_ML_training_data/v8/
+  filepath: /mnt/storage_b/data/ocf/solar_pv_nowcasting/nowcasting_dataset_pipeline/prepared_ML_training_data/v_testing/
```
Review comment: ideally this would be v9 (or v11), but perhaps this is fine for now. A TODO could be made to change this in the future.
```diff
 process:
   batch_size: 32
   seed: 1234
```
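As a quick illustration of how this YAML maps onto the pydantic models reviewed above, here is a hedged sketch that parses the process section and validates it directly; it assumes the Process model lives in nowcasting_dataset.config.model alongside the other models shown, and that PyYAML is available.

```python
# Hedged sketch: the YAML keys mirror the pydantic field names, so a single
# section can be parsed and validated on its own.
import yaml

from nowcasting_dataset.config.model import Process

yaml_text = """
process:
  batch_size: 32
  seed: 1234
"""
process = Process(**yaml.safe_load(yaml_text)["process"])
print(process.batch_size, process.seed)  # 32 1234
print(process.n_train_batches)           # 250, the default from the diff above
```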