Skip to content

[ENH] Refactor the data_module layer in ptf-v2 #1829

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
phoeenniixx opened this issue May 12, 2025 · 6 comments
Open

[ENH] Refactor the data_module layer in ptf-v2 #1829

phoeenniixx opened this issue May 12, 2025 · 6 comments

Comments

@phoeenniixx
Copy link
Contributor

Is your feature request related to a problem? Please describe.
For now the EncoderDecoderTimeSeriesDataModule code is a little messy, that contains a nested dataset class (_ProcessedEncoderDecoderDataset) and preprocessing and splitting functions. It makes the code a little hard to understand and maintain as we move to more complex implementations

Describe the solution you'd like
We should make it more modular:

  • Move the nested _ProcessedEncoderDecoderDataset class out of the data_module class.
  • Create separate modules for preprocessing and splitting etc
  • Call references of these implementations in the data_module class

This will make the code a little more clean and modular and easy to maintain.

Additional context
Please look at how the tsl handles these splitting and preprocessing using different modules

@phoeenniixx
Copy link
Contributor Author

FYI @agobbifbk, @fkiraly , @PranavBhatP

@PranavBhatP
Copy link
Contributor

Hi, @phoeenniixx, after going through the code for the EncoderDecoderDataModule, I did notice that many parts of this module are reusable and can be adapted for usage by other implementations of data modules for v2.0, so I think it's a necessary step to do this early on before we introduce a lot duplication in code.

@fkiraly
Copy link
Collaborator

fkiraly commented May 13, 2025

I did notice that many parts of this module are reusable and can be adapted for usage by other implementations of data modules for v2.0, so I think it's a necessary step to do this early on before we introduce a lot duplication in code.

Interesting. I think it could be instructive if you try to draft this in a hacky way, but then we do not merge this but take this as a study motivating a clean refactor.
Does #1830 already do this, or is it only the case for non-decoder/encoder models?

@fkiraly
Copy link
Collaborator

fkiraly commented May 13, 2025

Another point I noticed why finalizing #1811 is that you the loaders and data modules are not dataclass-like, i.e., some __init__ arguments get mutated through coercions or defaults.

It is general best practice to make specification-like classes also dataclass-like, so I tried to move the processed attributes to private versions, but this causes test failures. So instead, I did that and then I added overwrites in "todo" sections in the __init__ so the code does not break.

My question would be, how much problems does it cause if we try to make the modules and loaders dataclass-like? Is it just a matter of changing tests, or would we be changing a public contract that you had in mind, @phoeenniixx?

@phoeenniixx
Copy link
Contributor Author

My question would be, how much problems does it cause if we try to make the modules and loaders dataclass-like? Is it just a matter of changing tests, or would we be changing a public contract that you had in mind, @phoeenniixx?

yes, ig all we need to do is just make some changes to tests and code and it will become dataclass-like :)

@fkiraly
Copy link
Collaborator

fkiraly commented May 13, 2025

yes, ig all we need to do is just make some changes to tests and code and it will become dataclass-like :)

I prepared it in a way that you just need to delete the "todo" blocks in each class - but I was worried it would break some implicit contract that you had in your head. It certainly breaks the tests.

@fkiraly fkiraly moved this to PR in progress in May - Sep 2025 mentee projects May 15, 2025
@fkiraly fkiraly moved this from PR in progress to In Progress in May - Sep 2025 mentee projects May 15, 2025
fkiraly pushed a commit that referenced this issue May 18, 2025
This PR makes the `data_modulel` dataclass-like
See discussion in #1829
PranavBhatP pushed a commit to PranavBhatP/pytorch-forecasting that referenced this issue May 22, 2025
…1832)

This PR makes the `data_modulel` dataclass-like
See discussion in sktime#1829
PranavBhatP pushed a commit to PranavBhatP/pytorch-forecasting that referenced this issue May 22, 2025
…1832)

This PR makes the `data_modulel` dataclass-like
See discussion in sktime#1829
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

No branches or pull requests

3 participants