Skip to content
This repository was archived by the owner on Sep 11, 2023. It is now read-only.

Compute datetime features on-the-fly in nowcasting_dataloader. Remove datetime features from the on-disk batches. #208

Closed
Tracked by #213
JackKelly opened this issue Oct 7, 2021 · 5 comments · Fixed by #332
Labels
data New data source or feature; or modification of existing data source refactoring

Comments

@JackKelly
Copy link
Member

It's possible that we might want to remove datetime features from the on-disk batches, and instead compute datetime features on-the-fly, not least because we'll want to experiment with a variety of different ways of encoding position when we really start experimenting with the perceiver IO models (see openclimatefix/nowcasting_utils#30)

These datetime features should be really fast to compute. And, if we start saving different 'modalities' to disk as different files, then it will almost certainly be faster to compute these on-the-fly rather than saving them into tiny files on disk.

@JackKelly JackKelly added data New data source or feature; or modification of existing data source refactoring labels Oct 7, 2021
@JackKelly JackKelly mentioned this issue Oct 7, 2021
7 tasks
@peterdudfield
Copy link
Contributor

It would be nice, probably with every data_source, to have a 'compute-on-the-file' option.
This means we could either load it from batches, or create each one on the file.

I realise that doing this general method is a little be harder than just changin DateTime to be on the fly, but it might be worth putting that extra effort in

@JackKelly
Copy link
Member Author

This means we could either load it from batches, or create each one on the file.

I must admit that I think we should lean towards dictating one way of doing things (at least, for now), as it keeps the code that much simpler. So, for example, in this instance, I think we should - at some point, if we all agree - switch entirely from saving datetime features to disk to computing them on the fly. Supporting both options does make the code a bit more complex and harder to maintain and, especially while we're doing ML R&D, I think it's fine to limit code complexity by dictating "this is the one way this works"

@jacobbieker
Copy link
Member

I agree with @JackKelly, I think we should try to keep this simpler if we can. There already will be a lot of different config options for all the different modalites, etc. that reducing them when we can I think is a good idea, and would make our process a bit faster. Especially for something like these datetime features that should be quick and easy to compute on the fly, dictating that we won't save them to disk I think is fine.

@flowirtz flowirtz moved this to Todo in Nowcasting Oct 15, 2021
@JackKelly JackKelly changed the title Compute datetime features on-the-fly. Remove them from the on-disk batches. Compute datetime features on-the-fly in nowcasting_dataloader. Remove datetime features from the on-disk batches. Oct 19, 2021
@JackKelly
Copy link
Member Author

IIUC, datetime encoding has been implemented in nowcasting_dataloader by Jacob in PR openclimatefix/nowcasting_dataloader#7

All that's left to do is to remove sin and cos datetime features from nowcasting_dataset :)

@jacobbieker
Copy link
Member

Yeah, once that is merged, it'll compute the datetime features as part of the position encoding

Repository owner moved this from Todo to Done in Nowcasting Nov 3, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
data New data source or feature; or modification of existing data source refactoring
Projects
No open projects
Status: Done
Development

Successfully merging a pull request may close this issue.

3 participants