This repository was archived by the owner on Sep 11, 2023. It is now read-only.

Commit 7d8a114

first try at #174
1 parent faab4fe commit 7d8a114

1 file changed

+63
-0
lines changed

nowcasting_dataset/data_sources/README.md

@@ -47,3 +47,66 @@ This inherits from 'datasource_output.DataSourceOutput'.

`fake.py` has several functions to create fake `Batch` data. This is useful for testing,
and hopefully useful outside this module too.

## How to add a new data source

This is a checklist of the general steps for creating a new data source.
1. Assuming the data cannot be generated on the fly, create a script to make the processed data.

2. Create a folder in `nowcasting_dataset/data_sources` with the name of the new data source.

3. Create a file called `<name>_datasource.py`. This file should contain a class which
inherits from `nowcasting_dataset.data_source.DataSource`.

4. This class will need a `get_example` method. (There is also an option to use a `get_batch` method instead.)
```python
def get_example(
    self, t0_dt: pd.Timestamp, x_meters_center: Number, y_meters_center: Number
) -> NewDataSource:
    """
    Get a single example

    Args:
        t0_dt: Current datetime for the example, unused
        x_meters_center: Center of the example in meters in the x direction in OSGB coordinates
        y_meters_center: Center of the example in meters in the y direction in OSGB coordinates

    Returns:
        Example containing xxx data for the selected area
    """
```

5. Create a file called `<name>_model.py` which contains a class with the name of the data source. This class is an extension
of an `xr.Dataset` with some pydantic validation.
```python
import numpy as np


class NewDataSource(DataSourceOutput):
    """Class to store <name> data as an xr.Dataset with some validation"""

    # Use to store xr.Dataset data
    __slots__ = ()
    _expected_dimensions = ("x", "y")

    @classmethod
    def model_validation(cls, v):
        """Check that no values are NaN"""
        # NaN never compares equal to anything, so test with np.isnan rather than `!=`
        assert not np.isnan(v.data).any(), "Some data values are NaNs"
        return v
```
6. Also in `<name>_model.py`, create a pydantic model of the new data output, which will be used for machine learning. The
pydantic model is typically the data and coords of the `xr.Dataset` converted to `torch.Tensor`.
Note this might move to `nowcasting_dataloader` soon.

7. Add the new data source to the `Batch` object.

8. Add the new data source to `nowcasting.dataset.datamodule.NowcastingDataModule`.

9. Add configuration data to the configuration model, for example where the raw data is loaded from.
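
For the NaN check in step 5's `model_validation`, it may help to see how NaN comparisons behave in NumPy. The following is a minimal sketch, not code from this repository; the array `values` is a hypothetical example.

```python
import numpy as np

# Hypothetical sample values; the second entry is missing.
values = np.array([1.0, np.nan, 3.0])

# NaN never compares equal to anything, including itself, so an
# inequality test against np.nan is True for every element.
always_true = bool((values != np.nan).all())

# np.isnan flags the missing entry, so a check built on it can actually fail.
has_nan = bool(np.isnan(values).any())

print(always_true, has_nan)
```

A validation written as `assert not np.isnan(...).any()` therefore rejects data with missing values, whereas an inequality test against `np.nan` passes regardless of the data.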

### Testing
1. Create a test to check that the new data source is loaded correctly.
2. Create a script to make test data in `scripts/generate_data_for_tests`.
3. Create a function in `nowcasting.dataset.fake.py` that makes a randomly generated `xr.Dataset` of fake data,
and add it to the batch fake function.
4. Re-run the script to generate batch test data.
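
As a rough illustration of the fake-data and test steps above, the sketch below builds a random `xr.Dataset` on an `("x", "y")` grid and checks it. The names `make_fake_dataset` and `test_fake_dataset_is_valid`, the variable name `data`, and the grid sizes are all assumptions made for this example and are not taken from the repository.

```python
import numpy as np
import xarray as xr


def make_fake_dataset(x_size: int = 4, y_size: int = 3, seed: int = 0) -> xr.Dataset:
    """Hypothetical helper: random values on an (x, y) grid."""
    rng = np.random.default_rng(seed)
    return xr.Dataset(
        data_vars={"data": (("x", "y"), rng.random((x_size, y_size)))},
        coords={"x": np.arange(x_size), "y": np.arange(y_size)},
    )


def test_fake_dataset_is_valid() -> None:
    """Hypothetical test: check dimensions, shape and absence of NaNs."""
    ds = make_fake_dataset()
    assert ds["data"].dims == ("x", "y")
    assert ds["data"].shape == (4, 3)
    assert not np.isnan(ds["data"].values).any()
```

Fixing the seed keeps the generated test data reproducible between runs, which is convenient when the batch test data is regenerated.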
