
[WIP] Proposed revision to sample_posterior_predictive() #3468


Merged: 3 commits into pymc-devs:master on May 21, 2019

Conversation

rpgoldman (Contributor)

Most other sampling functions take varnames, rather than var objects. Extend the set of parameters to accept varnames as an alternative to vars, preserving backwards compatibility.
Also revise the docstring to clarify the return type and add type comments.

@lucianopaz left a comment (Contributor)

Overall, I like that this unifies the API consistently. IMHO, I would have left vars and deprecated varnames. However, I would like to hear what others think about this. Maybe varnames is safer because you ensure that the tensors and RVs are part of the model, whereas vars could hold anything outside of the model's context.

Finally, if this goes through, you should add a line at the end of the deprecations section of the release notes.

@lucianopaz (Contributor)

I think this change is nice, but we should discuss two things:

  1. Whether we want to change the function parameters to make the API more consistent.
  2. Whether to add a very small test environment for Python 3.5, so we ensure that we build and at least sample correctly in that version of Python. After #3465 (Python 3.5 support?) and the variable annotations problems, I think we need this.

@twiecki, @junpenglao, @ericmjl, @ColCarroll

@rpgoldman (Contributor, Author)

The deprecation warning was too aggressive. Fixing that now.

            raise Exception("Should not specify both vars and varnames arguments.")
        vars = (model[x] for x in varnames)
    else:
        raise DeprecationWarning("vars argument is deprecated in favor of varnames.")
Contributor

Use the warnings package instead of raising the DeprecationWarning. Also, this warning should be raised if varnames is None and vars is not None.
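
For reference, the suggested shape of that block would be roughly the following (a sketch built from the diff above, not necessarily the exact code that was committed):

    import warnings

    if varnames is not None:
        if vars is not None:
            raise Exception("Should not specify both vars and varnames arguments.")
        vars = (model[x] for x in varnames)
    elif vars is not None:
        # Warn rather than raise, and only when vars was actually passed.
        warnings.warn("vars argument is deprecated in favor of varnames.",
                      DeprecationWarning)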

Contributor Author

Got this now, I believe. Waiting for the tests to complete.
There's enough uninteresting history on this topic branch that it's probably worth squash-merging it, or I could clean up the history. Need to rebase it, anyway.

@fonnesbeck (Member)

It's not clear (to me) that these need to be unified. Sampling takes place in a model context, so it makes sense to pass the PyMC variables directly, whereas plotting occurs outside the model context, so the variables are not generally available. To me, there is a clear line between the two, because they are associated with distinct tasks in different phases of the modeling workflow.

@rpgoldman (Contributor, Author)

It's not clear (to me) that these need to be unified. Sampling takes place in a model context, so it makes sense to pass the PyMC variables directly, whereas plotting occurs outside the model context, so the variables are not generally available. To me, there is a clear line between the two, because they are associated with distinct tasks in different phases of the modeling workflow.

That may be true, but in addition to being confusing and having the potential for error that @lucianopaz points out, there's also the issue that, since the user is typically working with variable names, they just end up having to write code that looks like:

vars=(model[x] for x in name_list)

...at least that has been my experience using this API (more accurately, I do that after I have encountered the wrong argument error! 😜). This is inconvenient and potentially error-prone.

Typically I write a function that builds the model, and I don't "hang onto" the variable objects, so it's easier to work with the variable names.
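
To make the difference concrete, here is a sketch of the two call styles (model, trace, and name_list are placeholders; the new keyword ended up being spelled var_names):

    import pymc3 as pm

    # Before: look up the RV objects by name inside the model.
    with model:
        ppc = pm.sample_posterior_predictive(trace, vars=[model[x] for x in name_list])

    # After this PR: pass the variable names directly.
    with model:
        ppc = pm.sample_posterior_predictive(trace, var_names=name_list)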

@@ -24,6 +27,8 @@
import sys
sys.setrecursionlimit(10000)

assert Any # ugly way to make pyflakes ignore "unused" Any import
Contributor

Better to just not import Any; it's only used in the commented variable annotation.

Contributor Author

I have found a way to import this symbol only during type checking, which seems to make pylint and mypy both happy.
Rebased and force-pushed this change for review.
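
The usual pattern for this (a sketch of the standard idiom; the actual commit may differ in detail) is to guard the import with typing.TYPE_CHECKING:

    from typing import TYPE_CHECKING

    if TYPE_CHECKING:
        # Executed only by type checkers (mypy treats TYPE_CHECKING as True),
        # so there is no unused import at runtime.
        from typing import Any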

@lucianopaz (Contributor)

@fonnesbeck, sampling from the posterior predictive should also be done within the model's context. Once we get the sample, we can plot outside the context. I'm also hesitant about making this change, so I labeled it for discussion.

I now think that using names is safer because that way we ensure that the variables are defined in the model. If we accept tensors, users could mix up tensors that come from two separate models and make a strange mess.

@junpenglao (Member)

I agree with @lucianopaz, and this sounds like a good idea to me in general. However, we might want to separate the type hint part from the actual PR? Or is it the plan now to do this for each PR?

You should also make the same change for sample_posterior_predictive_w:
https://github.com/pymc-devs/pymc3/blob/master/pymc3/sampling.py#L1114

@rpgoldman force-pushed the ppc-params branch 2 times, most recently from a98054f to 7effb76 on May 8, 2019 at 13:56
@rpgoldman (Contributor, Author) commented May 8, 2019

I agree with @lucianopaz, and this sounds like a good idea to me in general. However, we might want to separate the type hint part from the actual PR? Or is it the plan now to do this for each PR?

I see your point about the type hint part, but separating out the type hints will be quite burdensome (it's not something diff is good at). Perhaps someone who has a better diff tool could split out the type hints as part of squash-merging this PR?

I use type hints both to provide valuable checking, and as a tool in figuring out PyMC3, which is a very complex code base; they have become an integral part of my Python coding process.

You should also do the same change for sample_posterior_predictive_w

But this function doesn't have a vars argument at all, so I'm not sure what you would like to have done to it. I'm happy to make whatever changes, but I need some more direction.

@junpenglao (Member)

Oh right, sample_posterior_predictive_w only works for observed variables - never mind :-)

@rpgoldman (Contributor, Author)

Integrated the history, with its blind alleys, into a single commit.
Retained the full history with a tag.

@lucianopaz left a comment (Contributor)

You still need to add this change to the bottom of the deprecations section of the release notes

@lucianopaz (Contributor)

Once #3470 gets sorted out and merged, you'll have to rebase this PR to get the minimum syntax check for Python 3.5 working.

@rpgoldman (Contributor, Author)

Once #3470 gets sorted out and merged, you'll have to rebase this PR to get the minimum syntax check for Python 3.5 working.

I subscribed to that PR. When I hear that it's merged, I will rebase and someone can merge this and close it out.

@rpgoldman (Contributor, Author)

You still need to add this change to the bottom of the deprecations section of the release notes

Just did that and pushed.

@lucianopaz left a comment (Contributor)

The PR looks fine now and could be merged. Do most of us agree that it's better to use varnames vs vars? I am in favor of varnames, but I would like to know that the rest of us approve this change. @twiecki, @fonnesbeck, @junpenglao, @ColCarroll, and anyone else: care to give your final opinion?

@twiecki (Member) commented May 11, 2019 via email

@lucianopaz (Contributor)

For now it's still mostly inconsistent, but we made the switch to var_names in the plots. We can do it here too. Also, I just checked sample_prior_predictive, and that uses vars too. It should be dealt with just like in sample_posterior_predictive.

@twiecki (Member) commented May 11, 2019 via email

@rpgoldman (Contributor, Author)

For now it's still mostly inconsistent, but we made the switch to var_names in the plots. We can do it here too. Also, I just checked sample_prior_predictive, and that uses vars too. It should be dealt with just like in sample_posterior_predictive.

I have pushed the changes to sample_prior_predictive(). I am waiting to make sure they pass the tests. If so, I will do the varnames -> var_names change, and then I hope we can do the merge.

@rpgoldman (Contributor, Author)

OK, I think that this is all ready to go now. varnames have become var_names. If the tests all pass, I will make sure it's properly rebased and squash it down to a single commit for merging, unless there are any further issues?

@ColCarroll left a comment (Member)

Looks generally good! Can you tighten up the exceptions before merge, though?

pymc3/model.py Outdated
@@ -3,6 +3,7 @@
import itertools
import threading
import warnings
from typing import Optional, Dict, Any
Member

only Optional used here?

Contributor Author

See 087ea2d8ebfaa2e34474eba0ce79f3420b78a4b4

@@ -1070,6 +1083,13 @@ def sample_posterior_predictive(trace, samples=None, model=None, vars=None, size

model = modelcontext(model)

    if var_names is not None:
        if vars is not None:
            raise Exception("Should not specify both vars and var_names arguments.")
Member

ValueError for a bad argument here (Exception is too broad, I think)
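
A sketch of the tightened check being requested here (illustrative; the actual change is in the commit referenced in the reply below):

    if var_names is not None:
        if vars is not None:
            # ValueError is the conventional exception for a bad argument.
            raise ValueError("Should not specify both vars and var_names arguments.")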

Contributor Author

See 5e546b252ec20b5b202037de99f377da91b3356c

@rpgoldman (Contributor, Author)

@ColCarroll @lucianopaz @twiecki I think I've fixed everything requested. If the tests go fine, and you like the current version, I will squash this down (or you can do a squash merge), and I hope it will be done.

@ColCarroll left a comment (Member)

Looks good to me once the tests pass! Thanks for sticking with this, @rpgoldman!

@@ -1283,8 +1322,9 @@ def sample_prior_predictive(samples=500, model=None, vars=None, random_seed=None
    values = draw_values([model[name] for name in names], size=samples)

    data = {k: v for k, v in zip(names, values)}
    assert data is not None
Member

Raw asserts are considered bad form (they are silently skipped if Python is run with a flag that no one actually uses) -- use if data is None: raise AssertionError instead.
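
A sketch of the swap being asked for:

    # Before
    assert data is not None
    # After: not silently skipped when Python runs with the -O flag
    if data is None:
        raise AssertionError()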

Contributor Author

Fixed in 65439179b1abe7d8e14b2dbf01d257034f669e7e

@rpgoldman (Contributor, Author) commented May 20, 2019

@ColCarroll --
OK, I have no idea why this happened: when I replaced that line as suggested above, I now get a bunch of test failures involving a failure to find scipy.misc.factorial that look like this:

pymc3/tests/test_distributions.py:500: in check_dlogp
    assert_almost_equal(dlogp(pt), ndlogp(pt), decimal=decimals, err_msg=str(pt))
../../../miniconda3/envs/testenv/lib/python3.6/site-packages/numdifftools/core.py:723: in __call__
    *args, **kwds).squeeze()
../../../miniconda3/envs/testenv/lib/python3.6/site-packages/numdifftools/core.py:666: in __call__
    return super(Jacobian, self).__call__(np.atleast_1d(x), *args, **kwds)
../../../miniconda3/envs/testenv/lib/python3.6/site-packages/numdifftools/core.py:297: in __call__
    results = self._derivative(xi, args, kwds)
../../../miniconda3/envs/testenv/lib/python3.6/site-packages/numdifftools/core.py:663: in _derivative_nonzero_order
    return self._apply_fd_rule(step_ratio, results, steps2)
../../../miniconda3/envs/testenv/lib/python3.6/site-packages/numdifftools/core.py:394: in _apply_fd_rule
    fd_rule = self._get_finite_difference_rule(step_ratio)
../../../miniconda3/envs/testenv/lib/python3.6/site-packages/numdifftools/core.py:378: in _get_finite_difference_rule
    fd_mat = self._fd_matrix(step_ratio, parity, num_terms)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
step_ratio = 2.0, parity = 1, nterms = 1
    @staticmethod
    def _fd_matrix(step_ratio, parity, nterms):
        """
        Return matrix for finite difference and complex step derivation.
    
        Parameters
        ----------
        step_ratio : real scalar
            ratio between steps in unequally spaced difference rule.
        parity : scalar, integer
            0 (one sided, all terms included but zeroth order)
            1 (only odd terms included)
            2 (only even terms included)
            3 (only every 4'th order terms included starting from order 2)
            4 (only every 4'th order terms included starting from order 4)
            5 (only every 4'th order terms included starting from order 1)
            6 (only every 4'th order terms included starting from order 3)
        nterms : scalar, integer
            number of terms
        """
        _assert(0 <= parity <= 6,
                'Parity must be 0, 1, 2, 3, 4, 5 or 6! ({0:d})'.format(parity))
        step = [1, 2, 2, 4, 4, 4, 4][parity]
        inv_sr = 1.0 / step_ratio
        offset = [1, 1, 2, 2, 4, 1, 3][parity]
        c0 = [1.0, 1.0, 1.0, 2.0, 24.0, 1.0, 6.0][parity]
        c = c0 / \
>           misc.factorial(np.arange(offset, step * nterms + offset, step))
E       AttributeError: module 'scipy.misc' has no attribute 'factorial'
../../../miniconda3/envs/testenv/lib/python3.6/site-packages/numdifftools/core.py:330: AttributeError

I do not see these on my laptop -- all the tests pass there. This does not seem to have anything to do with my change to the assert, but what do I know?

Member

My guess is that a version of something (numdifftools?) changed under this diff. Comparing versions from the previous Travis build and this one should let us know (and figure out if there's a good change). I'll dig into it after work.

Contributor Author

That looks right -- I don't think it's anything I did. Is there some way to force a re-run of Travis on master, so that we can see if this failure also happens there (and my little tweak is innocent)?

@rpgoldman (Contributor, Author) commented May 20, 2019

@ColCarroll numdifftools didn't change, but scipy 1.3.0 was released three days ago (17 May). This looks like an upstream problem with numdifftools.

And yes, factorial was removed from scipy.misc: scipy/scipy@1cde305#diff-0763dd201fcbf9b088f9eb502b4f3d42

AFAICT this is fixed in the version of numdifftools on GitHub (or at least it is different -- factorial comes from scipy.special instead of scipy.misc now) but numdifftools has not seen a release in two years.
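
For reference, the change on the numdifftools side amounts to importing factorial from its new home (a small sketch, runnable with any recent SciPy):

    import numpy as np
    from scipy.special import factorial  # removed from scipy.misc in SciPy 1.3.0

    print(factorial(np.arange(1, 5)))  # [ 1.  2.  6. 24.]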

When you've got the factorial problem fixed, please squash merge this PR, or LMK if you would prefer me to smash it down to one commit for you to merge. Thanks!

@ColCarroll mentioned this pull request on May 21, 2019
@twiecki (Member) commented May 21, 2019

@rpgoldman What's your email?

@rpgoldman (Contributor, Author)

Rebased onto the latest master. If the tests pass now, I will squash this down to one commit so there's a clean history.

@twiecki (Member) commented May 21, 2019 via email

rpgoldman and others added 3 commits on May 21, 2019 at 13:18:
Add the option to take varnames (`var_names`), rather than var objects
as parameters.

Extend the set of parameters to accept varnames as an alternative to
vars, preserving backwards compatibility.

Also revise the docstring, to clarify the return type and add type comments.
@rpgoldman (Contributor, Author)

Pushed a cleaned-up version of the history. Waiting for the checks; then I hope we can merge and close.

@lucianopaz merged commit 78ce11c into pymc-devs:master on May 21, 2019
@lucianopaz (Contributor)

Thanks @rpgoldman!
