@@ -47,3 +47,66 @@ This inherits from 'datasource_output.DataSourceOutput'.
 `fake.py` has several functions to create fake `Batch` data. This is useful for testing,
 and hopefully useful outside this module too.
+
+
+## How to add a new data source
+
+This is a checklist of the general steps for creating a new data source.
+1. Assuming that the data cannot be made on the fly, create a script that makes the processed data.
+
+2. Create a folder in nowcasting/data_sources with the name of the new data source.
+
+3. Create a file called `<name>_datasource.py`. This file should contain a class which
+inherits from `nowcasting_dataset.data_source.DataSource`.
+
+4. This class will need a `get_example` method. (There is also an option to use a `get_batch` method instead.)
+```python
+def get_example(
+    self, t0_dt: pd.Timestamp, x_meters_center: Number, y_meters_center: Number
+) -> NewDataSource:
+    """
+    Get a single example
+
+    Args:
+        t0_dt: Current datetime for the example, unused
+        x_meters_center: Center of the example in meters in the x direction in OSGB coordinates
+        y_meters_center: Center of the example in meters in the y direction in OSGB coordinates
+
+    Returns:
+        Example containing xxx data for the selected area
+    """
+```
+
+5. Create a file called `<name>_model.py` which contains a class with the name of the data source. This class is an extension
+of an xr.Dataset with some pydantic validation.
+```python
+class NewDataSource(DataSourceOutput):
+    """ Class to store <name> data as a xr.Dataset with some validation """
+
+    # Use to store xr.Dataset data
+    __slots__ = ()
+    _expected_dimensions = ("x", "y")
+
+    @classmethod
+    def model_validation(cls, v):
+        """ Check that all values are not NaNs """
+        # NaN never compares equal to anything, so test with np.isnan rather than `!=`
+        assert not np.isnan(v.data).any(), "Some data values are NaNs"
+        return v
+
+```
+6. Also in `<name>_model.py`, create a pydantic model of the new data output which will be used for machine learning. The
+pydantic model is typically the data and coords of the xr.Dataset converted into `torch.Tensor`.
+Note this might move to `nowcasting_dataloader` soon.
+
+7. Add the new data source to the `Batch` object.
+
+8. Add the new data source to `nowcasting.dataset.datamodule.NowcastingDataModule`.
+
+9. Add configuration data to the configuration model, for example where the raw data is loaded from.
+
+### Testing
+1. Create a test to check that the new data source is loaded correctly.
+2. Create a script to make test data in `scripts/generate_data_for_tests`.
+3. Create a function to make a randomly generated xr.Dataset for generating fake data in `nowcasting.dataset.fake.py`, \
+and add it to the batch fake function.
+4. Re-run the script to generate batch test data.
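
A note on the NaN check in `model_validation` (step 5): NaN never compares equal to any value, itself included, so an elementwise `!=` comparison against `np.nan` is true for every element and cannot detect NaNs, whereas `np.isnan` can. A minimal sketch of the difference, where `check_no_nans` is a hypothetical helper and not part of the repository:

```python
import numpy as np

data = np.array([1.0, np.nan, 3.0])

# Elementwise `!=` against NaN is True everywhere, even where the value is NaN,
# so this "check" passes although the array contains a NaN.
naive_check = bool((data != np.nan).all())

# np.isnan marks the NaN entries directly.
has_nans = bool(np.isnan(data).any())


def check_no_nans(values: np.ndarray) -> np.ndarray:
    """Hypothetical helper: raise if any value is NaN, otherwise return the input."""
    if np.isnan(values).any():
        raise ValueError("Some data values are NaNs")
    return values
```

Raising an explicit exception instead of using `assert` is a design choice in this sketch: assertions are skipped when Python runs with `-O`, so a validation that must always run is safer as a regular check.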
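
For step 3 of the testing checklist, the random-data part of a fake-data function might look like the sketch below. It uses only NumPy; `make_fake_grid`, its parameters, and the 1000 m grid spacing are assumptions made for illustration, not names or values from `fake.py`.

```python
import numpy as np


def make_fake_grid(size: int = 4, seed: int = 0) -> dict:
    """Hypothetical helper: random values on a size x size grid with x/y coordinates in meters."""
    rng = np.random.default_rng(seed)  # seeded so the generated test data is reproducible
    return {
        "dims": ("x", "y"),
        "x": np.arange(size, dtype=float) * 1000.0,  # assumed 1 km spacing
        "y": np.arange(size, dtype=float) * 1000.0,
        "data": rng.random((size, size)),  # uniform in [0, 1), so never NaN
    }


fake = make_fake_grid(size=3, seed=1)
```

With xarray installed, these arrays can be wrapped as `xr.Dataset({"data": (fake["dims"], fake["data"])}, coords={"x": fake["x"], "y": fake["y"]})`, which matches the `("x", "y")` dimensions expected by the model class in step 5.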