Truncated/Censored regression example #159

Closed

Conversation

drbenvincent
Contributor

Updating this truncated/censored regression example notebook to follow best practice, as far as I can tell. See #90. Let me know if there's anything else that needs to be done.

review-notebook-app bot commented May 12, 2021

twiecki commented on 2021-05-12T06:38:10Z
----------------------------------------------------------------

these should be sigma instead of sd. 


drbenvincent commented on 2021-05-13T09:34:05Z
----------------------------------------------------------------

done

review-notebook-app bot commented May 12, 2021

twiecki commented on 2021-05-12T06:38:10Z
----------------------------------------------------------------

sd->sigma


drbenvincent commented on 2021-05-13T09:34:14Z
----------------------------------------------------------------

done

review-notebook-app bot commented May 12, 2021

twiecki commented on 2021-05-12T06:38:11Z
----------------------------------------------------------------

This is really the key section where readers could learn a lot from some intuitive explanations of what's happening with the CDF and the potential. So a paragraph or two, and maybe a plot of what the CDF does, would be immensely helpful here.


drbenvincent commented on 2021-05-13T09:34:31Z
----------------------------------------------------------------

Agreed. Will work on this

drbenvincent commented on 2021-05-13T10:34:09Z
----------------------------------------------------------------

I've had a stab at this. I think it's an improvement. Let me know if there are any tweaks that would improve it.
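
As a rough illustration of the CDF idea raised in this thread, here is a minimal sketch in plain Python. It is not the notebook's code: it assumes a normal likelihood, the bounds `lower` and `upper` are hypothetical, and the helper names are made up for this example. Under truncation, observations outside the bounds are never recorded, so the density is renormalized by the probability mass between the bounds (a difference of two CDF values). Under censoring, out-of-bounds observations are recorded at the bound, so each one contributes the tail mass given by the CDF. A log term of this kind is the sort of quantity a potential can add to a model's log-probability.

```python
import math


def normal_logpdf(y, mu, sigma):
    """Log density of Normal(mu, sigma) at y."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def normal_cdf(y, mu, sigma):
    """CDF of Normal(mu, sigma) at y, via the error function."""
    return 0.5 * (1.0 + math.erf((y - mu) / (sigma * math.sqrt(2.0))))


def truncated_normal_logpdf(y, mu, sigma, lower, upper):
    """Truncation: renormalize by the mass that falls inside [lower, upper]."""
    if not (lower <= y <= upper):
        return -math.inf  # such a value could never have been observed
    log_mass = math.log(normal_cdf(upper, mu, sigma) - normal_cdf(lower, mu, sigma))
    return normal_logpdf(y, mu, sigma) - log_mass


def censored_normal_logpdf(y, mu, sigma, lower, upper):
    """Censoring: values at a bound contribute the tail mass beyond that bound."""
    if y <= lower:
        return math.log(normal_cdf(lower, mu, sigma))
    if y >= upper:
        return math.log(1.0 - normal_cdf(upper, mu, sigma))
    return normal_logpdf(y, mu, sigma)
```

Inside the bounds the truncated log density is always higher than the plain normal one, because the subtracted log mass is negative; that correction is what keeps the truncated density integrating to one.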

review-notebook-app bot commented May 22, 2021

OriolAbril commented on 2021-05-22T10:24:40Z
----------------------------------------------------------------

Try changing:

xi = np.array([np.min(x), np.max(x)])
n_samples = 1000
for n in range(n_samples):
    slope_sample = fit.posterior["slope"].values[1, n]
    intercept_sample = fit.posterior["intercept"].values[1, n]
    y_ppc = xi * slope_sample + intercept_sample
    ax.plot(xi, y_ppc, c="steelblue", alpha=0.01, rasterized=True)

to:

xi = xr.DataArray(np.array([np.min(x), np.max(x)]), dims=["obs_id"])
post = fit.posterior
y_ppc = xi * post["slope"] + post["intercept"]
ax.plot(xi, y_ppc.stack(sample=("chain", "draw")), c="steelblue", alpha=0.01, rasterized=True)

drbenvincent commented on 2021-05-22T12:23:12Z
----------------------------------------------------------------

I've implemented this. I do find it more opaque, but that's probably just because I'm way less experienced with xarray

OriolAbril commented on 2021-05-22T12:51:50Z
----------------------------------------------------------------

Would keeping the for loop to plot the multiple lines be a bit clearer? It is not widely known that matplotlib can plot multiple lines at once when given 2D arrays. I do want to compute y_ppc with xarray, though: it doesn't require knowing that [1, n] selects the (n+1)th draw from the 2nd chain, and it shows how to do postprocessing with xarray. In this case we only use y_ppc for plotting, but when computing complicated deterministics, doing poststratification, and so on, xarray has many clear advantages. It even allows working with arrays that don't fit in memory, via dask, with little or no change to the code.

Also, as a general rule for xarray/ArviZ related code, if something depends on the order of the dimensions, it needs to be done very carefully and for a reason. Thus, if we have xarray objects that have chain and draw dimensions but these are not the 1st and 2nd dims respectively, all ArviZ functions should still work.
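
To make the points above concrete, here is a minimal self-contained sketch. It assumes a synthetic stand-in for `fit.posterior` (the values, sizes, and the `x` grid are made up for illustration and are not the notebook's results). It computes the regression lines by name-based broadcasting, shows what the positional index `[1, n]` corresponds to in labelled terms, and fixes the dimension order explicitly before anything relies on it.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
n_chains, n_draws = 2, 500

# Synthetic stand-in for fit.posterior (hypothetical values, for illustration only).
post = xr.Dataset(
    {
        "slope": (("chain", "draw"), rng.normal(1.0, 0.1, size=(n_chains, n_draws))),
        "intercept": (("chain", "draw"), rng.normal(0.0, 0.1, size=(n_chains, n_draws))),
    },
    coords={"chain": np.arange(n_chains), "draw": np.arange(n_draws)},
)

# Positional indexing requires knowing that axis 0 is chain and axis 1 is draw.
n = 10
by_position = post["slope"].values[1, n]
# Label-based selection states the intent directly: 2nd chain, (n+1)th draw.
by_label = post["slope"].sel(chain=1, draw=n).item()

# Two x values are enough to draw each straight line.
x = np.linspace(-5, 5, 50)  # hypothetical predictor values
xi = xr.DataArray(np.array([x.min(), x.max()]), dims=["obs_id"])

# Broadcasting is by dimension name, so no manual reshaping is needed.
y_ppc = xi * post["slope"] + post["intercept"]

# Flatten chain and draw into one "sample" dimension, then set the order
# explicitly rather than relying on whatever order broadcasting produced.
lines = y_ppc.stack(sample=("chain", "draw")).transpose("obs_id", "sample")
```

Each column of `lines` is one posterior regression line, so a plotting loop can iterate over `lines.sizes["sample"]` and draw `lines.isel(sample=i)` against `xi`, or the whole 2D array can be handed to matplotlib's `ax.plot` in a single call.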

@OriolAbril OriolAbril left a comment

Changes look good, thanks! The issue is that it looks like you took the original branch where you added the example and this resulted in merge conflicts. These need to be fixed before we can merge.

@drbenvincent
Contributor Author

I'm a bit rusty with command-line git, but I have followed the instructions here: https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/addressing-merge-conflicts/resolving-a-merge-conflict-using-the-command-line. I can't actually see any merge conflicts in my local file. I think the simplest approach will be to close this pull request and make another one from a fresh branch?

@OriolAbril
Member

It's probably easier to fix it this way, yes; notebooks and git diffs are not a good combination.

@drbenvincent drbenvincent deleted the truncated-regression-example branch February 8, 2022 20:19