[ENH] EXPERIMENTAL PR: D1 and D2 layer for v2 refactor #1811
base: main
FYI @fkiraly @agobbifbk, I have removed the imputation, as I just found something related to handling missing data: here it says that NaNs should be handled beforehand.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #1811 +/- ##
=======================================
Coverage ? 87.21%
=======================================
Files ? 47
Lines ? 5553
Branches ? 0
=======================================
Hits ? 4843
Misses ? 710
Partials ? 0
Hello @phoeenniixx! I will try to put some comments here.
We are almost done (the label encoder is still missing in D1). May I suggest creating a dummy pandas data frame (with at least two groups) and checking whether the code behaves as expected? If you have any blockers or questions, please feel free to ask!
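As a rough illustration of the suggestion above, a dummy frame with two groups could be built along these lines. This is only a sketch: the column names (`group`, `time_idx`, `target`, `feature`) and the helper `make_dummy_frame` are hypothetical and not taken from the PR.

```python
import numpy as np
import pandas as pd


def make_dummy_frame(n_steps: int = 10) -> pd.DataFrame:
    """Build a toy long-format frame with two groups (all names are hypothetical)."""
    rng = np.random.default_rng(0)
    frames = []
    for group in ("a", "b"):
        frames.append(
            pd.DataFrame(
                {
                    "group": group,
                    "time_idx": np.arange(n_steps),
                    "target": rng.normal(size=n_steps),
                    "feature": rng.normal(size=n_steps),
                }
            )
        )
    return pd.concat(frames, ignore_index=True)


df = make_dummy_frame()

# Basic sanity checks before feeding the frame to the layer under test:
# each group has the expected length and a strictly ordered time index.
sizes = df.groupby("group").size().to_dict()
ordered = all(
    g["time_idx"].is_monotonic_increasing for _, g in df.groupby("group")
)
```

The frame can then be passed to whatever D1 entry point is being tested, and the per-group outputs compared against these known sizes.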
Thanks for the comments @agobbifbk, I will reply to all of them shortly. Here are some replies:
I have removed the imputation from the D2 layer as discussed; the only NaN fills are in the D1 layer.
yes the
Because we agreed on calling the
In my opinion, imputation is not something that belongs in the D1 layer. Alternatively, if you want, you can add a flag (default False, or with a warning when it is triggered) and simply skip the samples that contain NaNs.
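A minimal sketch of the flag idea, assuming windows are identified by their start index over a 1-D array. The function name `valid_window_starts` and the `skip_nan` parameter are hypothetical and do not come from the PR.

```python
import warnings

import numpy as np


def valid_window_starts(
    values: np.ndarray, window: int, skip_nan: bool = False
) -> list[int]:
    """Return start indices of all length-`window` slices of `values`.

    With skip_nan=True, slices containing any NaN are dropped and a
    warning reports how many were skipped.
    """
    starts = list(range(len(values) - window + 1))
    if not skip_nan:
        return starts
    kept = [s for s in starts if not np.isnan(values[s : s + window]).any()]
    if len(kept) < len(starts):
        warnings.warn(
            f"Skipped {len(starts) - len(kept)} window(s) containing NaN values."
        )
    return kept


series = np.array([1.0, 2.0, np.nan, 4.0, 5.0, 6.0])
all_starts = valid_window_starts(series, window=2)
clean_starts = valid_window_starts(series, window=2, skip_nan=True)
```

With the flag left at its default, nothing is filtered, so existing behaviour stays unchanged unless the user opts in.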
I see, but this way you are copying the same object three times, when only the windows change. In my opinion, with minimal effort you can use the same get_item for the three DataLoaders without duplicating possibly large dataframes. Happy to continue the discussion during the technical meeting; if needed, we can arrange one in the next few days!
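The point about avoiding three copies can be sketched as follows, assuming the data fit in one array and each split is defined only by its window start indices. The class `WindowView` is hypothetical, not the PR's implementation. It follows the map-style protocol (`__len__` and `__getitem__`), so an object like this could in principle be wrapped by a DataLoader.

```python
import numpy as np


class WindowView:
    """Map-style view over shared data; each split owns only its window starts."""

    def __init__(self, data: np.ndarray, starts, window: int):
        self.data = data  # a reference to the shared array, not a copy
        self.starts = list(starts)
        self.window = window

    def __len__(self) -> int:
        return len(self.starts)

    def __getitem__(self, i: int) -> np.ndarray:
        s = self.starts[i]
        # Basic slicing of a NumPy array returns a view, so no data is copied.
        return self.data[s : s + self.window]


data = np.arange(20, dtype=float)
window = 4
starts = range(len(data) - window + 1)  # start indices 0..16

# All three splits share `data`; only the lists of start indices differ.
train = WindowView(data, starts[:10], window)
val = WindowView(data, starts[10:14], window)
test = WindowView(data, starts[14:], window)
```

Note that in this toy split, windows near a boundary overlap with the neighbouring split; a real train/validation split would need to decide how to handle that.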
Yes, please! If possible, we could meet at least once this week before my exams start. I think I need a discussion to understand all your points, and I will be able to learn new aspects that can help me carry on the work.
My main comment: I think we need to start adding tests now, otherwise the development work is getting too complex.
If you have time before the exams, that is what I would focus on: add tests in all three PRs, and ensure they are stacked (keep them updated).
Hi @fkiraly, @agobbifbk, I have added the tests for the D1 and D2 layers (I still need to make the changes suggested by @agobbifbk), but while working on the tests, I realized that the
Also, is there any other (better) way than using windows here? (For now, I have commented out the tests checking these train-val splits.)
Description
This PR implements the basic skeleton of the D1 and D2 layers for v2.