New example notebook for auto-imputation aka handle missing values with a simple dataset and full workflow #722


Closed

Conversation

Contributor

@jonsedar jonsedar commented Nov 9, 2024

PR to implement a response to #721


📚 Documentation preview 📚: https://pymc-examples--722.org.readthedocs.build/en/722/


+ remove recently deprecated sym kwarg in seaborn boxplot
+ improved a couple of explanations
+ reran precommit etc checks
review-notebook-app bot commented Nov 9, 2024

ricardoV94 commented on 2024-11-09T13:05:45Z
----------------------------------------------------------------

Change the title to mention missing covariates, and emphasize that unlike other examples (namely the coal mining disaster) in this case we're missing covariates in addition to y. That should make it more discoverable and clarify the focus?


jonsedar commented on 2024-11-09T13:30:53Z
----------------------------------------------------------------

Sure, that makes sense - I've updated the text to mention covariates specifically

review-notebook-app bot commented Nov 9, 2024

ricardoV94 commented on 2024-11-09T13:05:46Z
----------------------------------------------------------------

What is the reason behind emphasizing the "numeric"? This should work fine for missing categorical predictors as well?


jonsedar commented on 2024-11-09T13:37:06Z
----------------------------------------------------------------

Interesting point, and possibly my lack of knowledge. I've used hierarchical priors xk_mu on xk_unobserved and assumed that xk is a continuous (and z-scored) value. How would you suggest transforming this to allow a categorical index? I'm totally open to extending the example to categoricals!

ricardoV94 commented on 2024-11-11T14:31:54Z
----------------------------------------------------------------

No need to extend the example, but if you have a categorical predictor you would conceivably have a pm.Categorical prior on it, and have it partially observed just the same as you did with pm.Normal

jonsedar commented on 2024-11-12T05:38:13Z
----------------------------------------------------------------

Aha, yes nice idea! As you suggest, let's leave this one here, but that seems like a really useful thing to demonstrate, and would be potentially a nice companion to my other new ordinal-features example (#717).

When I get a minute I'll create a new NB (based on this one) that includes Categorical features, and I'll omit the out-of-sample forecast to keep it lean

review-notebook-app bot commented Nov 9, 2024

ricardoV94 commented on 2024-11-09T13:05:47Z
----------------------------------------------------------------

Line #14.    warnings.simplefilter(action="ignore", category=FutureWarning)  # isort:skip

Don't? FutureWarning filter is pretty broad


jonsedar commented on 2024-11-09T14:00:23Z
----------------------------------------------------------------

Haha yeah, good point - I tend to leave it in because seaborn v0.12 (which I still use because they massively changed the API in later versions) gets really annoying with all the warnings... But there doesn't seem to be any here after I remove it, so, removed :)
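If the seaborn noise does come back, a narrower filter than the blanket `FutureWarning` ignore is possible. A sketch (the `module` regex is an assumption about where the warnings originate): scoping the ignore to seaborn's modules lets FutureWarnings from pymc/arviz still surface.

```python
import warnings

# Ignore FutureWarnings raised from within seaborn only; warnings
# raised from any other module are unaffected
warnings.filterwarnings("ignore", category=FutureWarning, module="seaborn")
```

This is usually preferable to `warnings.simplefilter(action="ignore", category=FutureWarning)`, which silences every library at once.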

review-notebook-app bot commented Nov 9, 2024

ricardoV94 commented on 2024-11-09T13:05:48Z
----------------------------------------------------------------

Line #11.    # set target_accept quite high to minimise divergences in mdlb

target_accept is not being changed


jonsedar commented on 2024-11-09T13:38:22Z
----------------------------------------------------------------

Good catch, that's a copy-paste from elsewhere

review-notebook-app bot commented Nov 9, 2024

ricardoV94 commented on 2024-11-09T13:05:48Z
----------------------------------------------------------------

This plot_univariate is not that great? How would a histogram/kdeplot look?


jonsedar commented on 2024-11-09T13:39:35Z
----------------------------------------------------------------

I quite like violins :D I'll make it taller, then it's basically a kdeplot

review-notebook-app bot commented Nov 9, 2024

ricardoV94 commented on 2024-11-09T13:05:49Z
----------------------------------------------------------------

Why is the observed off scale? How is it reasonable?


jonsedar commented on 2024-11-09T13:45:18Z
----------------------------------------------------------------

FWIW I wouldn't expect the prior to be too close, but I've added a clarification

review-notebook-app bot commented Nov 9, 2024

ricardoV94 commented on 2024-11-09T13:05:50Z
----------------------------------------------------------------

Confusing helper name? plot_distribution?

Also should be done before calling pm.sample?


jonsedar commented on 2024-11-09T13:48:20Z
----------------------------------------------------------------

Fair call, I've renamed it to plot_krushke, since that's what it's doing :) FWIW arviz calls it plot_posterior

review-notebook-app bot commented Nov 9, 2024

ricardoV94 commented on 2024-11-09T13:05:50Z
----------------------------------------------------------------

Remove section?


jonsedar commented on 2024-11-09T13:48:54Z
----------------------------------------------------------------

I quite like leaving the placeholder, but sure, it could be confusing - removed!

review-notebook-app bot commented Nov 9, 2024

ricardoV94 commented on 2024-11-09T13:05:51Z
----------------------------------------------------------------

Line #3.    mdl0.set_data("y", dfrawx_holdout[ft_y].values, coords=COORDS_F)

No need to set dummy data for posterior_predictive


jonsedar commented on 2024-11-09T13:49:57Z
----------------------------------------------------------------

Good point, thanks!

review-notebook-app bot commented Nov 9, 2024

ricardoV94 commented on 2024-11-09T13:05:52Z
----------------------------------------------------------------

This picture needs a legend, only the green dot is explained


jonsedar commented on 2024-11-09T13:52:13Z
----------------------------------------------------------------

Legends get hairy with seaborn... I've added more explanation in the title!

review-notebook-app bot commented Nov 9, 2024

ricardoV94 commented on 2024-11-09T13:05:52Z
----------------------------------------------------------------

There is little cost to "rebuilding" the model compared to reusing an existing model for sample_posterior_predictive, because none of the random functions are cached anyway across calls


jonsedar commented on 2024-11-09T14:37:42Z
----------------------------------------------------------------

Thanks for the clarification - I've changed the language to "re-specify" (only a minor concern to my mind because it's not DRY)

review-notebook-app bot commented Nov 9, 2024

ricardoV94 commented on 2024-11-09T13:05:53Z
----------------------------------------------------------------

Line #10.        # NOTE: ... but there's no way to put a nan-containing array into a pm.Data,

This is slightly incorrect. You can put nan but they won't trigger the special behavior that passing a numpy with nan to observed does. And you'll have nan in the logp. Not sure it matters to be precise


jonsedar commented on 2024-11-09T14:07:39Z
----------------------------------------------------------------

FWIW I've found that this isn't possible... e.g. if I were to put this line into the model spec (which tries to create a pm.Data with an array that contains nans):

xkk = pm.Data("xkk", dfx_holdout[FTS_XK].values, dims=("oid", "xj_nm"))  

I get error:

NotImplementedError: Masked arrays or arrays with nan entries are not supported. Pass them directly to observed if you want to trigger auto-imputation

ricardoV94 commented on 2024-11-11T14:33:30Z
----------------------------------------------------------------

Ahh, pm.Data is trying to be helpful; pytensor.shared definitely doesn't mind numpy arrays with nan

jonsedar commented on 2024-11-12T04:48:05Z
----------------------------------------------------------------

Ah, interesting - so is it worthwhile for me to drop down to use a pt.shared instead? I'll give that a try

review-notebook-app bot commented Nov 9, 2024

ricardoV94 commented on 2024-11-09T13:05:54Z
----------------------------------------------------------------

Not sure these sorts of table outputs are useful? If so, perhaps discuss what can be gathered from them? If not, omit?


jonsedar commented on 2024-11-09T14:08:59Z
----------------------------------------------------------------

Yeah they can go - I was just trying to show the build-up of the dataframe, but people can introspect if they want

review-notebook-app bot commented Nov 9, 2024

ricardoV94 commented on 2024-11-09T13:05:54Z
----------------------------------------------------------------

Mention multivariate prior for missing x, perhaps link to Junpeng talk? https://discourse.pymc.io/t/partial-missing-multivariate-observation-and-what-to-do-with-them-by-junpeng-lao/6050


jonsedar commented on 2024-11-09T14:20:21Z
----------------------------------------------------------------

Aha yes, I was looking for @junpenglao's old blogposts on this technique, esp. w.r.t. the hierarchical prior xk_mu. I'll make a reference!

@ricardoV94
Member

@jonsedar seems like a great addition. I left some comments above. I hope we can make the prediction parts nicer in the future, with a helper that accepts a shared mask.

@jonsedar
Contributor Author

jonsedar commented Nov 9, 2024

Thanks for the review! I'll go through and make changes etc now

@jonsedar jonsedar marked this pull request as draft November 9, 2024 13:22

jonsedar and others added 2 commits December 14, 2024 13:54
* + added new notebook GLM-ordinal-features.ipynb
+ already complete and created in another env

* Created using Colab

* minor updates for latest seaborn

* Tweaks for collab and included authors

* added header

* + added new reference article and online
+ cited inside Notebook
+ adjusted errata

* + ran black again...

* + rep Burkner as B{\"u}rkner

* + maybe fixed readthedocs complaint pybtex.database.InvalidNameString: Too many commas in 'B{\\"u}rkner, P., & Charpentier, E.'

* + ran black-jupyter, let's see

* + another run of black-jupyter

* + installed local pymc_examples env
+ ran pre-commit in full
+ autocreated *.myst file

* + update tags

* + possibly persuaded the precommits to all pass

* + rerun on colab to confirm all good post new pre-commit process

* + okay, reran in colab again... lets see if this passes

* + added (again) the myst.md

* + minor updates post review
+ new work to positively include a coeff in mdlb for d450 = c4

* + reran precommit and readded myst.md

* + rerun localyl e2e

* + added myst.md again

* + reran again to ensure cell execution order even for markdown cells

* + reran again again to ensure order

* + minor update: forced addtional level c4 into d450 categorical feature
+ fixed a couple of typos

* + changed rating to intermediate
* Draft update of BNN notebook

* Pre-commit fixes

* Address reviewer comments

* Additional edits

* Removed bivariate example; updated formatting

* Updated GEV

* Changed pymc_experimental to pymc_extras

---------

Co-authored-by: Chris Fonnesbeck <[email protected]>
review-notebook-app bot commented Dec 15, 2024

fonnesbeck commented on 2024-12-15T15:34:08Z
----------------------------------------------------------------

There is a difference between missing at random (MAR) and missing completely at random (MCAR). With the latter you can do complete case analysis and only suffer loss of power via reduction in sample size; the former requires knowledge of a model of missingness.

But, given your disclaimer, you can feel free to ignore this distinction.


jonsedar commented on 2024-12-16T06:35:48Z
----------------------------------------------------------------

Would it be more correct, then, to replace where I've written "Missing at Random" with "Missing Completely at Random"? It's a weird nuance and unhelpful terminology for sure

jonsedar commented on 2024-12-16T06:57:05Z
----------------------------------------------------------------

Okay, got it. https://stats.stackexchange.com/questions/23090/distinguishing-missing-at-random-mar-from-missing-completely-at-random-mcar

I'll replace with "Missing Completely at Random"

review-notebook-app bot commented Dec 15, 2024

fonnesbeck commented on 2024-12-15T15:34:09Z
----------------------------------------------------------------

Correlation may be easier to see (for a and c at least) if they were standardized?


jonsedar commented on 2024-12-16T06:38:02Z
----------------------------------------------------------------

I'll set sharex=False, which will achieve the same thing in the plot :)

review-notebook-app bot commented Dec 15, 2024

fonnesbeck commented on 2024-12-15T15:34:10Z
----------------------------------------------------------------

I'm getting a little confused by the cryptic (to me) variable names. For example, what is "oid" for? As I write this I assume it means "observation ID". Recommend using human-readable variable names to lower the cognitive overhead.


jonsedar commented on 2024-12-16T06:41:32Z
----------------------------------------------------------------

I'll include a comment explanation. Typically I like to vectorise the feature names in Bx using e.g. FTS_XJ, and yep oid is observation ID

review-notebook-app bot commented Dec 15, 2024

fonnesbeck commented on 2024-12-15T15:34:10Z
----------------------------------------------------------------

Don't undersell -- I think the marginal energy is about as good as you can expect!


jonsedar commented on 2024-12-16T06:43:50Z
----------------------------------------------------------------

Haha well, I say apparently reasonable but it's clearly reasonable :)

I'll replace > with >>

review-notebook-app bot commented Dec 15, 2024

fonnesbeck commented on 2024-12-15T15:34:11Z
----------------------------------------------------------------

Plot is a little hard to read.


jonsedar commented on 2024-12-16T06:51:18Z
----------------------------------------------------------------

Agreed, I'll make it taller

review-notebook-app bot commented Dec 15, 2024

fonnesbeck commented on 2024-12-15T15:34:12Z
----------------------------------------------------------------

You should not need a masked array -- just passing a numpy array with missing values will trigger imputation. Might as well skip the extra step.


jonsedar commented on 2024-12-16T07:26:40Z
----------------------------------------------------------------

Ah that's interesting - when did that functionality change?

Just tested working the same, so I'll update the modelspec and related markdown discussion - thanks!

…, formatting, explanatory text

+ also changed notebook title and filename to be more clear
+ also added new reference to junpenglao's talk
+ reran precommit and created new myst file
+ fixed a couple of typos
@jonsedar
Contributor Author

Urgh FML, I made the stupid mistake of trying to rebase this branch on top of the most recent branches... Now it's a total mess.

I'll address your points above and resubmit a new PR!


jonsedar added a commit to jonsedar/pymc-examples that referenced this pull request Dec 16, 2024
…values in covariates

+ update NB with various minor improvements following PR review in the old branch pymc-devs#722
+ additional typos etc
@jonsedar
Contributor Author

Closing this in favor of the newer, better, shinier PR #753

@jonsedar jonsedar closed this Dec 16, 2024
fonnesbeck pushed a commit that referenced this pull request Dec 16, 2024
…753)

* + take a new up-to-date branch off master and dump in new NB missing values in covariates
+ update NB with various minor improvements following PR review in the old branch #722
+ additional typos etc

* + ran pre-commit autoupdate

* + ran precommit on new NB, created myst file