
[ENH] EXPERIMENTAL: Example notebook based on the new data pipeline #1813


Open · wants to merge 66 commits into main

Conversation

@phoeenniixx (Contributor) commented Apr 6, 2025

Description

This PR adds an example notebook for the new v2 data pipeline vignette, containing a basic implementation of the TFT model using this pipeline. For more info, see #1812 and #1811.

Colab link: https://colab.research.google.com/drive/148MyhcNfYEh4CZ6vBXLqQNsUBF0n6_0v?usp=sharing

Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter notebooks. (Powered by ReviewNB)

@phoeenniixx (Contributor, Author) commented:

Hi @fkiraly, I am getting this error:
[screenshot of the error]

I just downloaded the notebook from Colab and pasted it into the repo. Is there anything else I should do to avoid this? I really have no idea 😅


codecov bot commented Apr 11, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Please upload report for BASE (main@cfc7fc6). Learn more about missing BASE report.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1813   +/-   ##
=======================================
  Coverage        ?   83.14%           
=======================================
  Files           ?       61           
  Lines           ?     6153           
  Branches        ?        0           
=======================================
  Hits            ?     5116           
  Misses          ?     1037           
  Partials        ?        0           
| Flag | Coverage Δ |
|---|---|
| cpu | 83.14% <ø> (?) |
| pytest | 83.14% <ø> (?) |

Flags with carried forward coverage won't be shown.


@fkiraly moved this from "PR in progress" to "PR under review" in May - Sep 2025 mentee projects (May 19, 2025)
@xandie985 commented:

Hi @phoeenniixx, the implementation looks good and insightful. Here are some questions I have about it. It's possible that there is a difference between the objectives we had with DSIPTS and sktime; still, I would like to discuss your point of view with respect to sktime.

  1. EncoderDecoderTimeSeriesDataModule assumes that the data fits in memory. How would your approach scale to datasets that do not fit into RAM? Are there plans to incorporate memory-efficient loading strategies like chunking or on-demand loading from disk?
  2. Does your module have any mechanisms to detect or correct irregular time series within its scope?
  3. Your _create_windows method checks for sufficient sequence length. How does your module handle missing values within a window, and what are the potential consequences for model training if windows contain significant NaNs?
  4. Consider adding a random seed for reproducibility, especially in setup(), where random shuffling takes place (see the sketch after this list).
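
For point 4, a minimal sketch of what seeding the shuffle could look like, assuming a Lightning-style DataModule; the constructor signature, the `seed` parameter, and the shuffling code here are illustrative assumptions, not the module's actual API:

```python
import torch
import lightning.pytorch as pl


class EncoderDecoderTimeSeriesDataModule(pl.LightningDataModule):
    # hypothetical excerpt; the real module's constructor differs
    def __init__(self, data, seed: int = 42):
        super().__init__()
        self.data = data
        self.seed = seed

    def setup(self, stage=None):
        # a dedicated generator makes the shuffle reproducible
        # without touching the global RNG state
        generator = torch.Generator().manual_seed(self.seed)
        shuffled_indices = torch.randperm(len(self.data), generator=generator)
        ...  # split shuffled_indices into train/val/test as before
```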

@phoeenniixx (Contributor, Author) commented:

Thank you for the review @xandie985!

EncoderDecoderTimeSeriesDataModule assumes that the data fits in memory. How would your approach scale to datasets that do not fit into RAM? Are there plans to incorporate memory-efficient loading strategies like chunking or on-demand loading from disk?

Right now we assume the data fits in memory, but yes, in the future we plan to add features like chunking, on-demand loading, etc.
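
Purely as a hedged illustration of what on-demand loading could look like (this is not the planned design; `LazyWindowDataset` and the `.npy` input are hypothetical), a map-style dataset over a memory-mapped array reads only the slices a batch actually touches:

```python
import numpy as np
from torch.utils.data import Dataset


class LazyWindowDataset(Dataset):
    """Hypothetical sketch: windows are cut from a memory-mapped array,
    so only the slices a batch needs are paged in from disk."""

    def __init__(self, path: str, window_size: int):
        # mmap_mode="r" keeps the array on disk; slicing reads on demand
        self.series = np.load(path, mmap_mode="r")
        self.window_size = window_size

    def __len__(self):
        return len(self.series) - self.window_size + 1

    def __getitem__(self, idx):
        window = self.series[idx : idx + self.window_size]
        # copy out of the memory map into a regular float32 array
        return np.array(window, dtype=np.float32)
```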

  1. Does your module have any mechanisms to detect or correct irregular time series within its scope?
  2. Your _create_windows method checks for sufficient sequence length. How does your module handle missing values within a window, and what are the potential consequences for model training if windows contain significant NaNs?
  3. Consider adding a random seed for reproducibility, especially in setup(), where random shuffling takes place.

These are open questions we still need to work on. We will tackle them in future iterations, once an end-to-end prototype is ready and we get feedback on it from users of the package.
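
As a sketch of one possible direction for the NaN question (nothing here is implemented yet; `is_usable_window` and its threshold are hypothetical), window filtering could be as simple as:

```python
import numpy as np


def is_usable_window(window: np.ndarray, max_nan_fraction: float = 0.1) -> bool:
    """Hypothetical helper: reject windows whose NaN share exceeds a threshold.
    Surviving NaNs could then be imputed (e.g. forward-filled) before batching."""
    return np.isnan(window).mean() <= max_nan_fraction
```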

@fkiraly (Collaborator) left a comment


More detailed review.

  • please remove the install from the start of the notebook
  • we should test that this is running while we are working on v2. One way is to move the content to docs/examples/tutorials, the contents of which are automatically run and tested.
  • the data generation cell is useful, but not too illustrative. Can you move the code to a function load_toydata or similar, in pytorch_forecasting.data, new module, e.g., toydata? Then we can also use this in testing later! (a possible shape is sketched after this list)
  • can you add basic markdown cells that explain what the notebook is showing and what each step does? E.g., a summary of the multiple steps at the top, and then small headers for the steps with minimal explanations.
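
A minimal sketch of what the suggested loader could look like, assuming the new pytorch_forecasting.data.toydata module; the function name follows the suggestion above, while the signature and generated columns are illustrative assumptions:

```python
# pytorch_forecasting/data/toydata.py (hypothetical module, per the review)
import numpy as np
import pandas as pd


def load_toydata(n_series: int = 3, n_timesteps: int = 100, seed: int = 0) -> pd.DataFrame:
    """Generate a small synthetic panel for the v2 pipeline notebook and tests."""
    rng = np.random.default_rng(seed)
    frames = []
    for series_id in range(n_series):
        trend = np.linspace(0.0, 1.0, n_timesteps)
        noise = rng.normal(scale=0.1, size=n_timesteps)
        frames.append(
            pd.DataFrame(
                {
                    "series_id": series_id,
                    "time_idx": np.arange(n_timesteps),
                    "value": trend + noise,
                }
            )
        )
    return pd.concat(frames, ignore_index=True)
```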

@fkiraly added the "documentation" label (Improvements or additions to documentation) on May 29, 2025
@phoeenniixx (Contributor, Author) commented:

Thanks! I will make the changes accordingly. Just one doubt:

the data generation cell is useful, but not too illustrative. Can you move the code to a function load_toydata or similar, in pytorch_forecasting.data, new module, e.g., toydata? Then we can also use this in testing later!

I think we can add it to pytorch_forecasting.data.examples? Right now people import get_stallion_data from there, so they can import toydata from there as well. I just think this would follow the already established mapping of "test data" to examples...
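
For context, the established pattern today versus the proposed addition (the second import is hypothetical, pending the decision above):

```python
# established pattern today:
from pytorch_forecasting.data.examples import get_stallion_data

# the proposed toy-data loader would sit alongside it (hypothetical):
# from pytorch_forecasting.data.examples import load_toydata
```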

@fkiraly (Collaborator) commented May 30, 2025

Right now people import get_stallion_data from there, so they can import toydata from there as well.

Makes sense to add it to the established location for data loaders.

Would it make sense to split the file up and have one loader per file? Need not be done in this PR.

@phoeenniixx (Contributor, Author) commented:

Would it make sense to split the file up and have one loader per file?

Then I think we should create a new folder called loaders or datasets and keep these files there; we can add more loaders to that folder in the future.
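
One possible layout under that proposal (folder and file names are hypothetical):

```
pytorch_forecasting/data/loaders/
    __init__.py      # re-exports loaders for backwards-compatible imports
    stallion.py      # get_stallion_data
    toydata.py       # load_toydata
```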

@fkiraly moved this from "PR under review" to "PR in progress" in May - Sep 2025 mentee projects (May 30, 2025)