Get plot data for prepostfit experiments #438

Open · lpoug wants to merge 31 commits into main

Conversation


@lpoug lpoug commented Feb 7, 2025


Note: first time doing a proper PR, so I'm not sure if all the prerequisites are here.


📚 Documentation preview 📚: https://causalpy--438.org.readthedocs.build/en/438/

@lpoug lpoug marked this pull request as ready for review February 17, 2025 08:44
@drbenvincent (Collaborator) commented:

Humble apologies for taking so long to get to this PR @lpoug. I've unfortunately not had as much time to spend on CausalPy as I'd have liked, but I'm hoping to catch up with the backlog.

There are currently a couple of issues with the remote checks. I'm hoping to get these resolved in #437, at which point I'll test this out locally and give feedback if necessary before we can merge this :)


codecov bot commented Feb 27, 2025

Codecov Report

Attention: Patch coverage is 24.39024% with 31 lines in your changes missing coverage. Please review.

Project coverage is 92.98%. Comparing base (3a7ddea) to head (6dca884).

Files with missing lines               Patch %   Lines
causalpy/experiments/prepostfit.py     12.00%    22 Missing ⚠️
causalpy/experiments/base.py           46.15%    7 Missing ⚠️
causalpy/plot_utils.py                 33.33%    2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #438      +/-   ##
==========================================
- Coverage   94.40%   92.98%   -1.43%     
==========================================
  Files          31       31              
  Lines        1985     2025      +40     
==========================================
+ Hits         1874     1883       +9     
- Misses        111      142      +31     


@lpoug (Author) commented Mar 3, 2025:

(Quoting @drbenvincent's comment above.)

Absolutely no problem whatsoever @drbenvincent! Let me know when the time comes, I'll be around 😄

@drbenvincent added the "enhancement" (New feature or request) label Mar 3, 2025
@drbenvincent (Collaborator) commented:

Hi @lpoug. I pushed some changes, can you make sure to pull the latest version?

I'll try to review this in the next few days :)

@lpoug (Author) commented Apr 1, 2025:

Hey there @drbenvincent. Just to be sure, are you waiting on anything on my side?

@drbenvincent (Collaborator) commented:

Apologies for the delay! Just dropping in some review comments now

@drbenvincent (Collaborator) left a review comment:

Sorry about the slow review on this. My bad.

Overall this looks good. I've suggested some minor changes. Other than that the main thing is to update the tests to ensure this functionality remains working into the future.

Could you add new tests to test_integration_pymc_examples.py and test_integration_skl_examples.py? I imagine we can just test that we successfully get back a dataframe from calling result.get_plot_data on the experiments that you've implemented so far. You could optionally test that the contents of that dataframe are as expected, e.g. that it has the desired columns.
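
(A minimal sketch of what such an integration test could look like. The dataset, formula, treatment time, and sampler settings below are illustrative assumptions in the style of the CausalPy quickstart, not code taken from this PR; the HDI column names follow the ones discussed later in this thread.)

```python
import pandas as pd

import causalpy as cp


def test_its_get_plot_data():
    # Fit a small InterruptedTimeSeries experiment on the bundled example data.
    df = (
        cp.load_data("its")
        .assign(date=lambda x: pd.to_datetime(x["date"]))
        .set_index("date")
    )
    result = cp.InterruptedTimeSeries(
        df,
        treatment_time=pd.to_datetime("2017-01-01"),
        formula="y ~ 1 + t + C(month)",
        model=cp.pymc_models.LinearRegression(
            sample_kwargs={"draws": 100, "tune": 100, "progressbar": False}
        ),
    )

    plot_data = result.get_plot_data()

    # Core check: the new method hands back a DataFrame...
    assert isinstance(plot_data, pd.DataFrame)
    # ...optionally, with the expected prediction/HDI columns.
    assert {"prediction", "pred_hdi_lower_94", "pred_hdi_upper_94"}.issubset(
        plot_data.columns
    )
```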

In theory, an ultra pedantic person might want to test that we get an exception when calling get_plot_data on experiments where it isn't implemented.

Because this PR involves additional methods, can you run make uml? This should update the UML diagram that we include in CONTRIBUTING.md.

Sorry again about the latency on this review.

Diff excerpt under review:

```python
raise ValueError("Unsupported model type")

@abstractmethod
def get_plot_data_bayesian(self, *args, **kwargs):
```
@drbenvincent (Collaborator):

Change to _get_plot_data_bayesian to emphasise that the preferred user facing method to call is get_plot_data

@lpoug (Author):

Modifications made in 6a6face

I took the liberty of renaming plot_bayesian and plot_ols to _plot_bayesian and _plot_ols, respectively. I guess this should follow the same logic as for _get_plot_data_bayesian? Let me know if you think differently and I'll reverse these changes.

Diff excerpt under review:

```python
raise NotImplementedError("get_plot_data_bayesian method not yet implemented")

@abstractmethod
def get_plot_data_ols(self, *args, **kwargs):
```
@drbenvincent (Collaborator):

Same comment as before... Change to _get_plot_data_ols to emphasise that the preferred user facing method to call is get_plot_data

@lpoug (Author):

Modifications made in 6a6face as well

Diff excerpt under review:

```python
elif isinstance(self.model, RegressorMixin):
    return self.get_plot_data_ols(*args, **kwargs)
else:
    raise ValueError("Unsupported model type")
```
@drbenvincent (Collaborator):

I don't have a good feel whether this should be a ValueError or a NotImplementedError. Have a think about whatever works best, but I don't have a strong opinion.

@lpoug (Author):

I haven't made any changes here yet. I guess we could argue that the NotImplementedError makes sense within _get_plot_data_bayesian and _get_plot_data_ols, because it tells us that these functionalities could be implemented in the future.
In get_plot_data, the ValueError could make sense since the function itself is technically implemented; the error is about an unsupported model type rather than missing functionality.
Happy to discuss or have your final opinion on this in any case!
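
(To make the naming and error-handling discussion above concrete, here is a minimal self-contained sketch of the dispatch pattern. The class name, the placeholder PyMCModel, and the exact signatures are illustrative stand-ins, not code lifted from the PR.)

```python
from abc import ABC, abstractmethod

from sklearn.base import RegressorMixin


class PyMCModel:
    """Placeholder standing in for CausalPy's Bayesian model base class."""


class BaseExperiment(ABC):
    model = None  # set by the concrete experiment classes

    def get_plot_data(self, *args, **kwargs):
        """User-facing entry point: dispatch on the type of self.model."""
        if isinstance(self.model, PyMCModel):
            return self._get_plot_data_bayesian(*args, **kwargs)
        elif isinstance(self.model, RegressorMixin):
            return self._get_plot_data_ols(*args, **kwargs)
        else:
            # The method itself is implemented; the problem is an unsupported
            # model type, which is the argument for raising ValueError here.
            raise ValueError("Unsupported model type")

    @abstractmethod
    def _get_plot_data_bayesian(self, *args, **kwargs):
        # NotImplementedError flags functionality a subclass may add later.
        raise NotImplementedError("_get_plot_data_bayesian method not yet implemented")

    @abstractmethod
    def _get_plot_data_ols(self, *args, **kwargs):
        raise NotImplementedError("_get_plot_data_ols method not yet implemented")
```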

@lpoug (Author) commented Apr 7, 2025:

(Quoting @drbenvincent's review comment above.)

No problem at all! I was just starting to worry that something still needed to be done on my end 😅

Thank you for the reviews. I've added links to the commits directly in your comments: 6a6face (renaming of functions).

Regarding tests, I have added them in 97f0d79

I have not added anything yet to test that we get an exception when calling the get_plot_data on experiments for which it is not implemented. I'll try to take a moment to think about how to do so precisely.

Finally, I have updated the diagrams in 0edca77

Let me know if these changes look good, or if you had anything else in mind!

@drbenvincent (Collaborator) left a review comment:

Looks good. I think we are very nearly there :) Thanks for adding in the tests.

It could be prudent to rename the *_hdi_lower and *_hdi_upper columns to include the numerical hdi_prob as a percentage. For example, if hdi_prob=0.8 then the columns could be labelled *_hdi_lower_80 and *_hdi_upper_80. That way there is much less scope for a mistake like generating 80% HDIs but then forgetting that and thinking you generated 95% HDIs. I think that should be pretty simple to do in _get_plot_data_bayesian or _get_plot_data_ols.
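
(A small illustrative helper, not from the PR, showing the suggested naming scheme:)

```python
def hdi_column_names(prefix: str, hdi_prob: float) -> tuple[str, str]:
    """Label HDI bounds with the interval width, e.g. pred_hdi_lower_80."""
    pct = round(hdi_prob * 100)
    return f"{prefix}_hdi_lower_{pct}", f"{prefix}_hdi_upper_{pct}"


hdi_column_names("pred", 0.8)   # ("pred_hdi_lower_80", "pred_hdi_upper_80")
hdi_column_names("pred", 0.94)  # ("pred_hdi_lower_94", "pred_hdi_upper_94")
```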

Should also be pretty simple to resolve the merge conflicts - it's just the updated uml images as far as I can see.

Could you also update the docstring of the tests? At the moment they list out what is tested, so you can just flag up that it tests the functionality of plot_data. I'm not sure we'll carry on doing that in the future if the number of tests gets large, but let's keep up with it for the moment.

Diff excerpt under review:

```python
)
expected_columns = [
    "prediction",
    "pred_hdi_lower_94",
```
@lpoug (Author):

Is it OK here to have fixed column names with the default value of hdi_prob (i.e. 0.94)?
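
(One way to avoid hard-coding the 94 would be for the test to derive the suffix from whatever hdi_prob it passes to get_plot_data; the variable names in this sketch are assumptions.)

```python
hdi_prob = 0.94  # ArviZ's default credible-interval width
suffix = round(hdi_prob * 100)
expected_columns = [
    "prediction",
    f"pred_hdi_lower_{suffix}",
    f"pred_hdi_upper_{suffix}",
]
```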

Labels: enhancement (New feature or request)

Successfully merging this pull request may close these issues: Get model results data

2 participants