[ENH] Experimental PR- New Dataset Version-B #1791
Conversation
Hi @fkiraly, @agobbifbk, the basic vignette is here: https://colab.research.google.com/drive/1LS0JFIzHZ2_EbzY19l1Yuqyr9lTN1Jj8?usp=sharing. Please check whether it works as it is meant to. I still haven't fully understood how future exogenous data is handled in …
Codecov Report
Attention: Patch coverage is …

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1791   +/-   ##
=======================================
  Coverage        ?   86.49%
=======================================
  Files           ?       47
  Lines           ?     5548
  Branches        ?        0
=======================================
  Hits            ?     4799
  Misses          ?      749
  Partials        ?        0
Great work, thank you! The snippet below still has the issue with the dimensions:

train_dataloader = data_module.train_dataloader()
sample_batch = next(iter(train_dataloader))
x, y = sample_batch

print(f"Encoder continuous shape: {x['encoder_cont'].shape}")
print(f"Decoder continuous shape: {x['decoder_cont'].shape}")
print(f"Target shape: {y.shape}")

encoder_input_size = x['encoder_cont'].shape[-1]
decoder_input_size = x['decoder_cont'].shape[-1]
model = TimeSeriesLightningModel(encoder_input_size=encoder_input_size, decoder_input_size=decoder_input_size)

One backup idea is to put all the sizes in the D2-layer metadata, add that key dictionary to the init of the model, and create an instance of it from the metadata, as sketched below.
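A minimal sketch of that backup idea, assuming the D2 metadata dict carries the feature sizes (the metadata keyword and the key names are illustrative assumptions, not existing API):

# Backup idea (sketch): pass the whole metadata dict to the model init
# instead of individual sizes; key names here are assumptions.
metadata = data_module.metadata
model = TimeSeriesLightningModel(metadata=metadata)
# ... inside __init__ the model would then read, e.g.:
#     encoder_input_size = metadata["encoder_cont"]
#     decoder_input_size = metadata["decoder_cont"]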
That would serve until we define how to better link the D2 and model layers. Really looking forward to seeing the code after these changes!
Thanks for the comments, @agobbifbk!
There were some doubts about where to put the label encoders last time, I guess. @fkiraly, can you please comment on whether those doubts were addressed?
Thanks, I will add it.
This implementation was just a test to see whether everything works; I will add them in subsequent commits.
Please look at #1805, where I have tried to use metadata for version A; could you check whether that works? I will add your suggestions in the next commits. Thanks!
Hi @fkiraly, I have tried to create the … For the record, the model is currently initialised like this:

model = TFT(
    loss=nn.MSELoss(),
    logging_metrics=[MAE(), SMAPE()],
    optimizer="adam",
    optimizer_params={"lr": 1e-3},
    lr_scheduler="reduce_lr_on_plateau",
    lr_scheduler_params={"mode": "min", "factor": 0.1, "patience": 10},
    hidden_size=64,
    num_layers=2,
    attention_head_size=4,
    dropout=0.1,
    cont_feature_size=encoder_cont,
    cat_feature_size=encoder_cat,
    static_cat_feature_size=static_categorical_features,
    static_cont_feature_size=static_continuous_features,
    max_encoder_length=max_encoder_length,
    max_prediction_length=max_prediction_length,
)

Instead, the user could initialise the model in this way:

model = TFT(
    loss=nn.MSELoss(),
    logging_metrics=[MAE(), SMAPE()],
    optimizer="adam",
    optimizer_params={"lr": 1e-3},
    lr_scheduler="reduce_lr_on_plateau",
    lr_scheduler_params={"mode": "min", "factor": 0.1, "patience": 10},
    hidden_size=64,
    num_layers=2,
    attention_head_size=4,
    dropout=0.1,
    metadata=data_module.metadata,
)

All this information can be inferred from the data input by the user, which would make the interface simpler as well.
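For illustration, the metadata dict would then carry entries along these lines (key names and values are assumptions mirroring the explicit arguments above, not a confirmed schema):

# Hypothetical contents of data_module.metadata; keys mirror the
# explicit TFT arguments above and are illustrative assumptions.
metadata = {
    "encoder_cont": 5,                    # continuous encoder features
    "encoder_cat": 2,                     # categorical encoder features
    "static_categorical_features": 1,
    "static_continuous_features": 3,
    "max_encoder_length": 30,
    "max_prediction_length": 7,
}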
Review of metadata generation as requested:
- Great! It seems like we can generate the entire D2-specific metadata from D1 metadata and inputs, right?
- For cleanness of the logic, I would suggest moving the entire logic for that into a method _prepare_metadata.
Some suggestions regarding documentation:
- The method _prepare_metadata should optimally also have a docstring that lists the output metadata fields generated and how; since it is a private method, it is primarily explained to a developer.
- metadata already has this for the user, although I would go closer to the numpydoc style, with a Returns paragraph that is properly populated; see the sketch after this list.
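A minimal sketch of such a numpydoc-style docstring (the field names are assumptions taken from the snippets in this thread, not the actual implementation):

def _prepare_metadata(self):
    """Prepare D2-layer metadata from D1 metadata and data module inputs.

    Returns
    -------
    dict
        Metadata for model construction, with keys (illustrative):

        encoder_cont : int
            Number of continuous encoder features, from D1 metadata.
        encoder_cat : int
            Number of categorical encoder features, from D1 metadata.
        max_encoder_length : int
            Length of the encoder window, from the data module inputs.
        max_prediction_length : int
            Length of the prediction window, from the data module inputs.
    """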
Yes, "almost" all keys are taken from D1 metadata; some keys like …
Sure, then I will call this function in the metadata property.
That is one way. I was thinking about pre-populating at …
This would mean the function is called in …
This, I think, is the best way, so for now I will do this. Even if we need it anywhere else, it can easily be computed by calling the property; and from the user perspective, it always has to be called.
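A small sketch of that pattern (the class name is assumed for illustration; _prepare_metadata and the metadata property follow the thread, while the caching detail is an added assumption):

class EncoderDecoderTimeSeriesDataModule:  # illustrative name
    @property
    def metadata(self):
        # Build the D2 metadata lazily on first access and cache it,
        # so _prepare_metadata runs only once.
        if not hasattr(self, "_metadata"):
            self._metadata = self._prepare_metadata()
        return self._metadata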
We are finally moving on with this version of the implementation, and the final PRs of the prototype are mentioned below: …
Description
This PR tries to implement another version of the TimeSeries dataset and data module, where future_data is merged with the existing x instead of being stored separately, and one more tensor, cutoff, is added to remember the present time. The data module uses a time_mask to differentiate the data into past and future with the help of cutoff.
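As a rough illustration of the idea (a minimal sketch with made-up shapes, not the PR's actual code):

import torch

# Toy batch: 8 samples, 30 time steps, 3 features, with past and
# future merged in a single tensor x.
x = torch.randn(8, 30, 3)
cutoff = 20  # index of the present time step

# Boolean mask over the time axis: True for past, False for future.
time_mask = torch.arange(x.shape[1]) < cutoff

past_x = x[:, time_mask]     # known history, shape (8, 20, 3)
future_x = x[:, ~time_mask]  # future part, shape (8, 10, 3)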