[WIP] Proposed revision to sample_posterior_predictive() #3468
Conversation
Overall, I like that this unifies the API consistently. IMHO, I would have kept `vars` and deprecated `varnames`. However, I would like to hear what others think about this. Maybe `varnames` is safer, because with names you ensure that the tensors and RVs are part of the model, whereas `vars` could hold anything from outside the model's context.

Finally, if this goes through, you should add a line at the end of the deprecations section of the release notes.
This change is nice, but I think we should discuss two things:
The deprecation warning was too aggressive. Fixing that now.
pymc3/sampling.py (outdated):

```python
        raise Exception("Should not specify both vars and varnames arguments.")
    vars = (model[x] for x in varnames)
else:
    raise DeprecationWarning("vars argument is deprecated in favor of varnames.")
```
Use the `warnings` package instead of raising the `DeprecationWarning`. Also, this warning should be raised if `varnames` is `None` and `vars` is not `None`.
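This is not code from the PR, just a sketch of the suggested pattern (`check_var_args` is a hypothetical helper name): emit the deprecation via the `warnings` module rather than raising, and only on the deprecated `vars`-without-`var_names` path.

```python
import warnings

def check_var_args(vars=None, var_names=None):
    # Hypothetical helper illustrating the suggestion: warn via the warnings
    # package instead of raising DeprecationWarning, and only when the
    # deprecated `vars` argument is given without `var_names`.
    if var_names is None and vars is not None:
        warnings.warn("vars argument is deprecated in favor of var_names.",
                      DeprecationWarning)

# Demonstration: the deprecated path emits a warning instead of an exception.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_var_args(vars=["some_tensor"])
print(len(caught))  # 1
```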
Got this now, I believe. Waiting for the tests to complete.
There's enough uninteresting history on this topic branch that it's probably worth squash-merging it, or I could clean up the history. Need to rebase it, anyway.
It's not clear (to me) that these need to be unified. Sampling takes place in a model context, so it makes sense to pass the PyMC variables directly, whereas plotting occurs outside the model context, so the variables are not generally available. To me there is a clear line between the two, because they are associated with distinct tasks in different phases of the modeling workflow.
That may be true, but in addition to being confusing, and having the potential for error that @lucianopaz points out, there's also the issue that, since the user is typically working with variable names, they just end up having to write code that looks like:

...at least, that has been my experience using this API (more accurately, I do that after I have encountered the wrong-argument error! 😜). This is inconvenient and potentially error-prone. Typically I write a function that builds the model and I don't "hang onto" the variable objects, so it's easier to work with the variable names.
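The snippet elided above is not preserved in the thread, but the boilerplate being described is the by-hand name-to-variable lookup. A minimal sketch of that pattern, with a plain dict standing in for a PyMC3 `Model`'s name-to-variable mapping (all names here are hypothetical):

```python
# A plain dict stands in for a PyMC3 Model's name -> variable mapping
# (model["mu"] works similarly on a real Model); names are hypothetical.
model = {"mu": "mu_tensor", "sigma": "sigma_tensor"}
var_names = ["mu", "sigma"]

# The boilerplate users end up writing today: translate the names they
# actually have back into variable objects by hand before sampling.
vars_to_sample = [model[name] for name in var_names]
print(vars_to_sample)
```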
pymc3/sampling.py (outdated):

```
@@ -24,6 +27,8 @@
import sys
sys.setrecursionlimit(10000)

assert Any  # ugly way to make pyflakes ignore "unused" Any import
```
Better to just not import `Any`. It's only used in the commented variable annotation.
I have found a way to import this symbol only during type checking, which seems to make pylint and mypy both happy.
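The standard way to do this (presumably what was used here) is the `typing.TYPE_CHECKING` guard: the import is seen by static checkers but never executed at runtime, so pyflakes/pylint have nothing to flag. A self-contained sketch:

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated only by static type checkers such as mypy; at runtime
    # TYPE_CHECKING is False, so linters never see an "unused" import.
    from typing import Any

def identity(x):
    # type: (Any) -> Any
    return x

print(identity(42))  # 42
```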
Rebased and force-pushed this change for review.
@fonnesbeck, sampling from the posterior predictive should also be done within the model's context. Once we get the sample, we can plot outside the context. I'm also hesitant about making this change, so I labeled it for discussion. I now think that using names is safer, because that way we ensure that the variables are defined in the model. If we accept tensors, users could mix up tensors that come from two separate models and make a strange mess.
I agree with @lucianopaz, and this sounds like a good idea to me in general. However, we might want to separate the type-hint part from the actual PR? Or is the plan now to do this for each PR? You should also make the same change for
(force-pushed a98054f to 7effb76)
I see your point about the type-hint part, but separating out the type hints would be quite burdensome. I use type hints both to provide valuable checking and as a tool for figuring out PyMC3, which is a very complex code base; they have become an integral part of my Python coding process.
But this function doesn't have a
Oh right, `sample_posterior_predictive_w` only works for observed variables. Never mind :-)
Integrated the history, with blind alleys, into a single commit.
You still need to add this change to the bottom of the deprecations section of the release notes.

Once #3470 gets sorted out and merged, you'll have to rebase this PR to get the minimal Python 3.5 syntax check working.
I subscribed to that PR. When I hear that it's merged, I will rebase, and someone can merge this and close it out.
Just did that and pushed.
The PR looks fine now and could be merged. Do most of us agree that it's better to use `varnames` vs `vars`? I am in favor of `varnames`, but I would like to know that the rest of us approve this change. @twiecki, @fonnesbeck, @junpenglao, @ColCarroll, and anyone else care to give their final opinion?
I approve, it's more consistent. But didn't we switch to `var_names`?
For now it's still mostly inconsistent, but we made the switch to `var_names` in the plots. We can do it here too. Also, I just checked `sample_prior_predictive`, and that uses `vars` too. It should be dealt with just like `sample_posterior_predictive`.
Yeah, let's move everything to `var_names`.
I have pushed the changes to `sample_prior_predictive()`. I am waiting to make sure they pass the tests. If so, I will do the
OK, I think that this is all ready to go now.
Looks generally good! Can you tighten up the exceptions before merge, though?
pymc3/model.py (outdated):

```
@@ -3,6 +3,7 @@
import itertools
import threading
import warnings
from typing import Optional, Dict, Any
```
Only `Optional` is used here?
See 087ea2d8ebfaa2e34474eba0ce79f3420b78a4b4
pymc3/sampling.py (outdated):

```
@@ -1070,6 +1083,13 @@ def sample_posterior_predictive(trace, samples=None, model=None, vars=None, size
    model = modelcontext(model)

    if var_names is not None:
        if vars is not None:
            raise Exception("Should not specify both vars and var_names arguments.")
```
`ValueError` for a bad argument here (`Exception` is too broad, I think).
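A minimal sketch of the suggested change (`check_exclusive_args` is a hypothetical helper name, not PR code): `ValueError` is the idiomatic exception for an invalid argument combination, and it lets callers catch this case specifically rather than trapping every `Exception`.

```python
def check_exclusive_args(vars=None, var_names=None):
    # Hypothetical helper: raise the narrower ValueError, not a bare
    # Exception, when mutually exclusive arguments are both supplied.
    if var_names is not None and vars is not None:
        raise ValueError("Should not specify both vars and var_names arguments.")

# Demonstration: the bad combination raises ValueError.
try:
    check_exclusive_args(vars=["x"], var_names=["x"])
except ValueError as err:
    print(err)
```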
See 5e546b252ec20b5b202037de99f377da91b3356c
@ColCarroll @lucianopaz @twiecki I think I've fixed everything requested. If the tests go fine and you like the current version, I will squash this down (or you can do a squash merge), and I hope it will be done.
Looks good to me once tests pass! Thanks for sticking with this @rpgoldman!
pymc3/sampling.py (outdated):

```
@@ -1283,8 +1322,9 @@ def sample_prior_predictive(samples=500, model=None, vars=None, random_seed=None
    values = draw_values([model[name] for name in names], size=samples)

    data = {k: v for k, v in zip(names, values)}
    assert data is not None
```
Raw `assert`s are considered bad form (they are stripped if you run Python with the `-O` flag, which no one actually uses) -- `if data is None: raise AssertionError`.
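A sketch of the contrast being suggested (note the guard here is purely illustrative: `dict(zip(...))` can never be `None`, just as in the diff above):

```python
def build_data(names, values):
    # Explicit check instead of `assert data is not None`: a bare assert
    # statement is stripped entirely under `python -O`, while this raise
    # always runs. (The check is illustrative; dict() never returns None.)
    data = dict(zip(names, values))
    if data is None:
        raise AssertionError("data should not be None")
    return data

print(build_data(["mu", "sigma"], [0.0, 1.0]))  # {'mu': 0.0, 'sigma': 1.0}
```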
Fixed in 65439179b1abe7d8e14b2dbf01d257034f669e7e
@ColCarroll -- OK, I have no idea why this happened: when I replaced that line per the above, I now get a bunch of test failures involving a failure to find `scipy.misc.factorial` that look like this:
```
pymc3/tests/test_distributions.py:500: in check_dlogp
    assert_almost_equal(dlogp(pt), ndlogp(pt), decimal=decimals, err_msg=str(pt))
../../../miniconda3/envs/testenv/lib/python3.6/site-packages/numdifftools/core.py:723: in __call__
    *args, **kwds).squeeze()
../../../miniconda3/envs/testenv/lib/python3.6/site-packages/numdifftools/core.py:666: in __call__
    return super(Jacobian, self).__call__(np.atleast_1d(x), *args, **kwds)
../../../miniconda3/envs/testenv/lib/python3.6/site-packages/numdifftools/core.py:297: in __call__
    results = self._derivative(xi, args, kwds)
../../../miniconda3/envs/testenv/lib/python3.6/site-packages/numdifftools/core.py:663: in _derivative_nonzero_order
    return self._apply_fd_rule(step_ratio, results, steps2)
../../../miniconda3/envs/testenv/lib/python3.6/site-packages/numdifftools/core.py:394: in _apply_fd_rule
    fd_rule = self._get_finite_difference_rule(step_ratio)
../../../miniconda3/envs/testenv/lib/python3.6/site-packages/numdifftools/core.py:378: in _get_finite_difference_rule
    fd_mat = self._fd_matrix(step_ratio, parity, num_terms)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

step_ratio = 2.0, parity = 1, nterms = 1

    @staticmethod
    def _fd_matrix(step_ratio, parity, nterms):
        """
        Return matrix for finite difference and complex step derivation.

        Parameters
        ----------
        step_ratio : real scalar
            ratio between steps in unequally spaced difference rule.
        parity : scalar, integer
            0 (one sided, all terms included but zeroth order)
            1 (only odd terms included)
            2 (only even terms included)
            3 (only every 4'th order terms included starting from order 2)
            4 (only every 4'th order terms included starting from order 4)
            5 (only every 4'th order terms included starting from order 1)
            6 (only every 4'th order terms included starting from order 3)
        nterms : scalar, integer
            number of terms
        """
        _assert(0 <= parity <= 6,
                'Parity must be 0, 1, 2, 3, 4, 5 or 6! ({0:d})'.format(parity))
        step = [1, 2, 2, 4, 4, 4, 4][parity]
        inv_sr = 1.0 / step_ratio
        offset = [1, 1, 2, 2, 4, 1, 3][parity]
        c0 = [1.0, 1.0, 1.0, 2.0, 24.0, 1.0, 6.0][parity]
        c = c0 / \
>           misc.factorial(np.arange(offset, step * nterms + offset, step))
E       AttributeError: module 'scipy.misc' has no attribute 'factorial'

../../../miniconda3/envs/testenv/lib/python3.6/site-packages/numdifftools/core.py:330: AttributeError
```
I do not see these on my laptop -- all the tests pass there. This does not seem to have anything to do with my change to the `assert`, but what do I know?
My guess is that a version of something (`numdifftools`?) changed under this diff. Comparing versions from the previous Travis build and this one should let us know (and figure out if there's a good change). I'll dig into it after work.
That looks right -- I don't think it's anything I did. Is there some way to force a re-run of Travis on `master`, so that we can see if this failure also happens there (and my little tweak is innocent)?
@ColCarroll `numdifftools` didn't change, but `scipy` 1.3.0 was released three days ago (17 May). This looks like an upstream problem with `numdifftools`.

And yes, `factorial` was removed from `scipy.misc`: scipy/scipy@1cde305#diff-0763dd201fcbf9b088f9eb502b4f3d42

AFAICT this is fixed in the version of `numdifftools` on GitHub (or at least it is different: `factorial` now comes from `scipy.special` instead of `scipy.misc`), but `numdifftools` has not seen a release in two years.
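For reference, the fix on the `numdifftools` side amounts to importing `factorial` from its surviving location, `scipy.special` (this sketch only shows the import change, not numdifftools itself):

```python
import numpy as np
# scipy.misc.factorial was removed in SciPy 1.3.0;
# scipy.special.factorial is the surviving location.
from scipy.special import factorial

# The same kind of call numdifftools makes, switched to the new import.
result = factorial(np.arange(1, 5))  # factorials of 1, 2, 3, 4
print(result)
```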
When you've got the factorial problem fixed, please squash-merge this PR, or LMK if you would prefer me to squash it down to one commit for you to merge. Thanks!
@rpgoldman What's your email?
Rebased onto the latest master. If the tests pass now, I will squash this down to one commit so there's a clean history.
We can also just squash merge here.
Add the option to take variable names (`var_names`) rather than variable objects as parameters, preserving backwards compatibility. Also revise the docstring to clarify the return type, and add type comments.
Pushed a cleaned-up version of the history. Waiting for the checks; then I hope we can merge and close.
Thanks @rpgoldman!
Most other sampling functions take variable names rather than variable objects. Extend the set of parameters to accept `var_names` as an alternative to `vars`, preserving backwards compatibility. Also revise the docstring to clarify the return type, and add type comments.