Get plot data for prepostfit experiments #438
base: main
Conversation
Humble apologies for taking so long to get to this PR @lpoug. I've unfortunately not had as much time to spend on CausalPy as I'd have liked, but I'm hoping to catch up with the backlog. There are currently a couple of issues with the remote checks. I'm hoping to get these resolved in #437, at which point I'll test this out locally and give feedback if necessary before we can merge this :)
Codecov Report

Attention: Patch coverage is

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main     #438      +/-   ##
==========================================
- Coverage   94.40%   92.98%   -1.43%
==========================================
  Files          31       31
  Lines        1985     2025      +40
==========================================
+ Hits         1874     1883       +9
- Misses        111      142      +31
```

☔ View full report in Codecov by Sentry.
Absolutely no problem whatsoever @drbenvincent! Let me know when the time comes; I'll be around 😄
Hi @lpoug. I pushed some changes; can you make sure to pull the latest version? I'll try to review this in the next few days :)
Hey there @drbenvincent. Just to be sure, are you waiting on anything on my side?
Apologies for the delay! Just dropping in some review comments now.
Sorry about the slow review on this. My bad.
Overall this looks good. I've suggested some minor changes. Other than that, the main thing is to update the tests to ensure this functionality keeps working into the future.

Could you add new tests to `test_integration_pymc_examples.py` and `test_integration_skl_examples.py`? I imagine we can just test that we successfully get back a dataframe from calling `result.get_plot_data` on the experiments that you've implemented so far. You could optionally test that the contents of that dataframe are as expected, e.g. that it has the desired columns. A sketch of such a test follows below.
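A minimal sketch of what such a test could look like. The `result` fixture is a placeholder for however the existing integration tests build and fit a prepostfit experiment (e.g. `InterruptedTimeSeries`); the column names are taken from this PR's diff.

```python
import pandas as pd
import pytest


@pytest.fixture
def result():
    """Placeholder fixture: in the real tests this would construct and fit
    one of the prepostfit experiments, as the existing integration tests do."""
    pytest.skip("illustrative sketch only")


def test_get_plot_data_returns_dataframe(result):
    """Check that calling .get_plot_data() returns a DataFrame with the expected columns."""
    plot_data = result.get_plot_data()
    # The core check: a DataFrame comes back...
    assert isinstance(plot_data, pd.DataFrame)
    # ...and, optionally, it contains the columns named in this PR's diff.
    assert {"prediction", "pred_hdi_lower_94"}.issubset(plot_data.columns)
```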
In theory, an ultra-pedantic person might want to test that we get an exception when calling `get_plot_data` on experiments that do not yet implement it.

Because this PR involves additional methods, can you run `make uml`? This should update the UML diagram that we include in CONTRIBUTING.md.

Sorry again about the latency on this review.
causalpy/experiments/base.py (outdated)

```python
raise ValueError("Unsupported model type")

@abstractmethod
def get_plot_data_bayesian(self, *args, **kwargs):
```
Change to `_get_plot_data_bayesian` to emphasise that the preferred user-facing method to call is `get_plot_data`.
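For context, the convention being suggested looks roughly like this. This is a sketch, not the actual contents of `causalpy/experiments/base.py`: a single public entry point dispatches to underscore-prefixed internal methods depending on the model type.

```python
from abc import ABC, abstractmethod

from sklearn.base import RegressorMixin


class BaseExperiment(ABC):
    """Sketch only; the real class in causalpy/experiments/base.py differs."""

    def __init__(self, model):
        self.model = model

    def get_plot_data(self, *args, **kwargs):
        """The single public, user-facing entry point."""
        if isinstance(self.model, RegressorMixin):
            return self._get_plot_data_ols(*args, **kwargs)
        # The real code checks for a PyMC model here; simplified in this sketch.
        return self._get_plot_data_bayesian(*args, **kwargs)

    @abstractmethod
    def _get_plot_data_bayesian(self, *args, **kwargs):
        """Internal: implemented by experiments that support PyMC models."""

    @abstractmethod
    def _get_plot_data_ols(self, *args, **kwargs):
        """Internal: implemented by experiments that support scikit-learn models."""
```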
Modifications made in 6a6face.

I took the liberty of renaming `plot_bayesian` and `plot_ols` to `_plot_bayesian` and `_plot_ols`, respectively. I guess this should follow the same logic as for `_get_plot_data_bayesian`? Let me know if you think differently and I'll reverse these changes.
causalpy/experiments/base.py (outdated)

```python
raise NotImplementedError("get_plot_data_bayesian method not yet implemented")

@abstractmethod
def get_plot_data_ols(self, *args, **kwargs):
```
Same comment as before... Change to `_get_plot_data_ols` to emphasise that the preferred user-facing method to call is `get_plot_data`.
Modifications made in 6a6face as well.
```python
elif isinstance(self.model, RegressorMixin):
    return self.get_plot_data_ols(*args, **kwargs)
else:
    raise ValueError("Unsupported model type")
```
I don't have a good feel for whether this should be a `ValueError` or a `NotImplementedError`. Have a think about whichever works best, but I don't have a strong opinion.
I haven't made any changes here yet. I guess we could argue that `NotImplementedError` makes sense within `_get_plot_data_bayesian` and `_get_plot_data_ols`, because it tells us that these functionalities could be implemented in the future. In `get_plot_data`, the `ValueError` could make sense, as that function is technically implemented; it's the model type that is unsupported. Happy to discuss or have your final opinion on this in any case!
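A sketch of the distinction being drawn here (illustrative, not the merged code): `NotImplementedError` signals "this hook exists but nobody has written it yet", while `ValueError` signals "the method works, but it was handed a model it cannot handle".

```python
from sklearn.base import RegressorMixin


class ErrorSemanticsSketch:
    """Illustration of the two error semantics discussed above."""

    def __init__(self, model):
        self.model = model

    def get_plot_data(self, *args, **kwargs):
        if isinstance(self.model, RegressorMixin):
            return self._get_plot_data_ols(*args, **kwargs)
        # get_plot_data itself IS implemented; the caller supplied a model
        # type it cannot handle, which is what ValueError signals.
        raise ValueError("Unsupported model type")

    def _get_plot_data_ols(self, *args, **kwargs):
        # A hook that a future version could fill in, which is what
        # NotImplementedError signals.
        raise NotImplementedError("get_plot_data_ols method not yet implemented")
```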
…ata_ols to _get_plot_data_ols, bayesian_plot to _bayesian_plot, and ols_plot to _ols_plot
No problem at all! I was just starting to get worried about something that still needed to be done on my end 😅 Thank you for the reviews. I've added links to the commits directly in your comments:

- 6a6face: renaming of functions
- 97f0d79: added the tests. I have not added anything yet to test that we get an exception when calling `get_plot_data` on experiments that don't implement it.
- 0edca77: updated the diagrams

Let me know if these changes look good or not, or if you had anything else in mind!
Looks good. I think we are very nearly there :) Thanks for adding in the tests.
It could be prudent to rename the `*_hdi_lower` and `*_hdi_upper` columns to include the numerical `hdi_prob` as a percentage. For example, if `hdi_prob=0.8` then the columns could be labelled as `*_hdi_lower_80` and `*_hdi_upper_80`. That way there is much less scope to make a mistake like generating 80% HDIs but forgetting that and thinking you generated 95% HDIs, for example. I think that should be pretty simple to do in `_get_plot_data_bayesian` or `_get_plot_data_ols`.
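One way the suggested renaming could be done. The helper name `label_hdi_columns` is hypothetical; the column-name pattern and the `hdi_prob` argument follow the discussion above.

```python
import pandas as pd


def label_hdi_columns(df: pd.DataFrame, hdi_prob: float = 0.94) -> pd.DataFrame:
    """Append the HDI probability (as a percentage) to the bound columns,
    e.g. pred_hdi_lower -> pred_hdi_lower_94, so the interval width is
    recorded in the output itself."""
    pct = round(hdi_prob * 100)
    return df.rename(
        columns={
            col: f"{col}_{pct}"
            for col in df.columns
            if col.endswith(("_hdi_lower", "_hdi_upper"))
        }
    )
```

With `hdi_prob=0.8` this yields `*_hdi_lower_80` / `*_hdi_upper_80`, matching the reviewer's example.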
Should also be pretty simple to resolve the merge conflicts - it's just the updated UML images as far as I can see.
Could you also update the docstrings of the tests? At the moment they list out what is tested, so you can just flag up that they test the functionality of `plot_data`. I'm not sure we'll carry on doing that in the future if the number of tests gets large, but let's keep up with it for the moment.
…ted tests accordingly, and updated tests' docstring
```python
)
expected_columns = [
    "prediction",
    "pred_hdi_lower_94",
```
Is it OK here to have fixed names with the default value of `hdi_prob` (i.e. 0.94)?
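One way to avoid that hard-coding, sketched under the assumption that 0.94 stays the library default: derive the column suffix in the test from the same `hdi_prob` value. The `pred_hdi_upper_*` sibling column is inferred from the review comment above rather than shown in the diff.

```python
HDI_PROB = 0.94  # assumed library default; keep in sync with the code under test

pct = round(HDI_PROB * 100)
expected_columns = [
    "prediction",
    f"pred_hdi_lower_{pct}",  # -> "pred_hdi_lower_94"
    f"pred_hdi_upper_{pct}",  # hypothetical sibling column, per the review above
]
```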
`.get_plot_data()` function

Note: First time doing a proper PR, so I'm not sure if all the prerequisites are here.
📚 Documentation preview 📚: https://causalpy--438.org.readthedocs.build/en/438/