Skip to content

Recursion error pickling step functions #4273

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
rpgoldman opened this issue Nov 29, 2020 · 7 comments · Fixed by #4297
Closed

Recursion error pickling step functions #4273

rpgoldman opened this issue Nov 29, 2020 · 7 comments · Fixed by #4297
Assignees
Labels
bug winOS windows OS related
Milestone

Comments

@rpgoldman
Copy link
Contributor

Description of your problem

First reported on #3811

When testing on Mac (#3811) or Windows (#4269), RecursionError: maximum recursion depth exceeded in comparison errors in TestDensityDist in parallel_sampling.py when pickling step_method:

pymc3/parallel_sampling.py:421: in __init__
    step_method_pickled = pickle.dumps(step_method, protocol=-1)

Checking with pdb, we try to save the logp of a DensityDist. This leads to needing to pickle Normal.logp which leads to needing to pickle pymc3.distributions.continuous.Normal and I fear to trying to pickle all of pymc3.

It is odd that this shows up on Mac and Windows, but not on our Linux tests.

Please provide the full traceback.

From @MarcoGorelli :

2020-11-29T10:28:07.9704940Z =================================== FAILURES ===================================
2020-11-29T10:28:07.9705530Z _________ TestDensityDist.test_density_dist_with_random_sampleable[()] _________
2020-11-29T10:28:07.9705980Z 
2020-11-29T10:28:07.9706630Z self = <pymc3.tests.test_distributions_random.TestDensityDist object at 0x7fe5ab3d6358>
2020-11-29T10:28:07.9707300Z shape = ()
2020-11-29T10:28:07.9708190Z 
2020-11-29T10:28:07.9708650Z     @pytest.mark.parametrize("shape", [(), (3,), (3, 2)], ids=str)
2020-11-29T10:28:07.9709320Z     def test_density_dist_with_random_sampleable(self, shape):
2020-11-29T10:28:07.9709830Z         with pm.Model() as model:
2020-11-29T10:28:07.9710220Z             mu = pm.Normal("mu", 0, 1)
2020-11-29T10:28:07.9710690Z             normal_dist = pm.Normal.dist(mu, 1, shape=shape)
2020-11-29T10:28:07.9711190Z             obs = pm.DensityDist(
2020-11-29T10:28:07.9711580Z                 "density_dist",
2020-11-29T10:28:07.9711960Z                 normal_dist.logp,
2020-11-29T10:28:07.9712420Z                 observed=np.random.randn(100, *shape),
2020-11-29T10:28:07.9712860Z                 shape=shape,
2020-11-29T10:28:07.9713260Z                 random=normal_dist.random,
2020-11-29T10:28:07.9713610Z             )
2020-11-29T10:28:07.9714330Z >           trace = pm.sample(100)
2020-11-29T10:28:07.9714630Z 
2020-11-29T10:28:07.9722330Z pymc3/tests/test_distributions_random.py:1151: 
2020-11-29T10:28:07.9722810Z _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
2020-11-29T10:28:07.9723220Z pymc3/sampling.py:553: in sample
2020-11-29T10:28:07.9723730Z     trace = _mp_sample(**sample_args, **parallel_args)
2020-11-29T10:28:07.9724240Z pymc3/sampling.py:1469: in _mp_sample
2020-11-29T10:28:07.9724710Z     pickle_backend=pickle_backend,
2020-11-29T10:28:07.9725200Z pymc3/parallel_sampling.py:421: in __init__
2020-11-29T10:28:07.9726340Z     step_method_pickled = pickle.dumps(step_method, protocol=-1)
2020-11-29T10:28:07.9727030Z pymc3/distributions/distribution.py:516: in __getstate__
2020-11-29T10:28:07.9727600Z     logp = dill.dumps(self.logp)
2020-11-29T10:28:07.9728630Z /usr/local/miniconda/envs/testenv/lib/python3.6/site-packages/dill/_dill.py:273: in dumps
2020-11-29T10:28:07.9729410Z     dump(obj, file, protocol, byref, fmode, recurse, **kwds)#, strictio)
2020-11-29T10:28:07.9730540Z /usr/local/miniconda/envs/testenv/lib/python3.6/site-packages/dill/_dill.py:267: in dump
2020-11-29T10:28:07.9731240Z     Pickler(file, protocol, **_kwds).dump(obj)
2020-11-29T10:28:07.9732280Z /usr/local/miniconda/envs/testenv/lib/python3.6/site-packages/dill/_dill.py:454: in dump
2020-11-29T10:28:07.9732970Z     StockPickler.dump(self, obj)
2020-11-29T10:28:07.9733580Z /usr/local/miniconda/envs/testenv/lib/python3.6/pickle.py:409: in dump
2020-11-29T10:28:07.9734090Z     self.save(obj)
2020-11-29T10:28:07.9734610Z /usr/local/miniconda/envs/testenv/lib/python3.6/pickle.py:476: in save
2020-11-29T10:28:07.9735240Z     f(self, obj) # Call unbound method with explicit self
2020-11-29T10:28:07.9736430Z /usr/local/miniconda/envs/testenv/lib/python3.6/site-packages/dill/_dill.py:1127: in save_instancemethod0
2020-11-29T10:28:07.9737270Z     pickler.save_reduce(MethodType, (obj.__func__, obj.__self__), obj=obj)
2020-11-29T10:28:07.9738000Z /usr/local/miniconda/envs/testenv/lib/python3.6/pickle.py:610: in save_reduce
2020-11-29T10:28:07.9738520Z     save(args)
2020-11-29T10:28:07.9739030Z /usr/local/miniconda/envs/testenv/lib/python3.6/pickle.py:476: in save
2020-11-29T10:28:07.9739660Z     f(self, obj) # Call unbound method with explicit self
2020-11-29T10:28:07.9740310Z /usr/local/miniconda/envs/testenv/lib/python3.6/pickle.py:736: in save_tuple
2020-11-29T10:28:07.9740850Z     save(element)
2020-11-29T10:28:07.9741370Z /usr/local/miniconda/envs/testenv/lib/python3.6/pickle.py:521: in save
2020-11-29T10:28:07.9741930Z     self.save_reduce(obj=obj, *rv)
2020-11-29T10:28:07.9742520Z /usr/local/miniconda/envs/testenv/lib/python3.6/pickle.py:634: in save_reduce
2020-11-29T10:28:07.9743050Z     save(state)
2020-11-29T10:28:07.9743550Z /usr/local/miniconda/envs/testenv/lib/python3.6/pickle.py:476: in save
2020-11-29T10:28:07.9744170Z     f(self, obj) # Call unbound method with explicit self
2020-11-29T10:28:07.9745320Z /usr/local/miniconda/envs/testenv/lib/python3.6/site-packages/dill/_dill.py:941: in save_module_dict
2020-11-29T10:28:07.9746070Z     StockPickler.save_dict(pickler, obj)
2020-11-29T10:28:07.9747000Z /usr/local/miniconda/envs/testenv/lib/python3.6/pickle.py:821: in save_dict
2020-11-29T10:28:07.9747600Z     self._batch_setitems(obj.items())
2020-11-29T10:28:07.9748210Z /usr/local/miniconda/envs/testenv/lib/python3.6/pickle.py:847: in _batch_setitems
2020-11-29T10:28:07.9748730Z     save(v)
2020-11-29T10:28:07.9749230Z /usr/local/miniconda/envs/testenv/lib/python3.6/pickle.py:521: in save
2020-11-29T10:28:07.9749780Z     self.save_reduce(obj=obj, *rv)
2020-11-29T10:28:07.9750500Z /usr/local/miniconda/envs/testenv/lib/python3.6/pickle.py:634: in save_reduce
2020-11-29T10:28:07.9751030Z     save(state)
2020-11-29T10:28:07.9751540Z /usr/local/miniconda/envs/testenv/lib/python3.6/pickle.py:476: in save
2020-11-29T10:28:07.9752160Z     f(self, obj) # Call unbound method with explicit self
2020-11-29T10:28:07.9753540Z /usr/local/miniconda/envs/testenv/lib/python3.6/site-packages/dill/_dill.py:941: in save_module_dict
2020-11-29T10:28:07.9754320Z     StockPickler.save_dict(pickler, obj)
2020-11-29T10:28:07.9754980Z /usr/local/miniconda/envs/testenv/lib/python3.6/pickle.py:821: in save_dict
2020-11-29T10:28:07.9755560Z     self._batch_setitems(obj.items())
2020-11-29T10:28:07.9756180Z /usr/local/miniconda/envs/testenv/lib/python3.6/pickle.py:847: in _batch_setitems
2020-11-29T10:28:07.9756700Z     save(v)
2020-11-29T10:28:07.9757190Z /usr/local/miniconda/envs/testenv/lib/python3.6/pickle.py:521: in save
2020-11-29T10:28:07.9757750Z     self.save_reduce(obj=obj, *rv)
2020-11-29T10:28:07.9758340Z /usr/local/miniconda/envs/testenv/lib/python3.6/pickle.py:634: in save_reduce
2020-11-29T10:28:07.9758860Z     save(state)
2020-11-29T10:28:07.9759370Z /usr/local/miniconda/envs/testenv/lib/python3.6/pickle.py:476: in save
2020-11-29T10:28:07.9759990Z     f(self, obj) # Call unbound method with explicit self
2020-11-29T10:28:07.9761160Z /usr/local/miniconda/envs/testenv/lib/python3.6/site-packages/dill/_dill.py:941: in save_module_dict
2020-11-29T10:28:07.9761910Z     StockPickler.save_dict(pickler, obj)
2020-11-29T10:28:07.9762560Z /usr/local/miniconda/envs/testenv/lib/python3.6/pickle.py:821: in save_dict
2020-11-29T10:28:07.9763150Z     self._batch_setitems(obj.items())
2020-11-29T10:28:07.9763760Z /usr/local/miniconda/envs/testenv/lib/python3.6/pickle.py:847: in _batch_setitems
2020-11-29T10:28:07.9764290Z     save(v)
2020-11-29T10:28:07.9764770Z /usr/local/miniconda/envs/testenv/lib/python3.6/pickle.py:521: in save
2020-11-29T10:28:07.9765340Z     self.save_reduce(obj=obj, *rv)
2020-11-29T10:28:07.9765920Z /usr/local/miniconda/envs/testenv/lib/python3.6/pickle.py:631: in save_reduce
2020-11-29T10:28:07.9766520Z     self._batch_setitems(dictitems)
2020-11-29T10:28:07.9767140Z /usr/local/miniconda/envs/testenv/lib/python3.6/pickle.py:847: in _batch_setitems
2020-11-29T10:28:07.9767660Z     save(v)
2020-11-29T10:28:07.9768160Z /usr/local/miniconda/envs/testenv/lib/python3.6/pickle.py:521: in save
2020-11-29T10:28:07.9768720Z     self.save_reduce(obj=obj, *rv)
2020-11-29T10:28:07.9769310Z /usr/local/miniconda/envs/testenv/lib/python3.6/pickle.py:634: in save_reduce
2020-11-29T10:28:07.9769830Z     save(state)
2020-11-29T10:28:07.9770340Z /usr/local/miniconda/envs/testenv/lib/python3.6/pickle.py:476: in save
2020-11-29T10:28:07.9771060Z     f(self, obj) # Call unbound method with explicit self
2020-11-29T10:28:07.9772240Z /usr/local/miniconda/envs/testenv/lib/python3.6/site-packages/dill/_dill.py:941: in save_module_dict
2020-11-29T10:28:07.9772980Z     StockPickler.save_dict(pickler, obj)
2020-11-29T10:28:07.9773620Z /usr/local/miniconda/envs/testenv/lib/python3.6/pickle.py:821: in save_dict
2020-11-29T10:28:07.9774210Z     self._batch_setitems(obj.items())
2020-11-29T10:28:07.9774830Z /usr/local/miniconda/envs/testenv/lib/python3.6/pickle.py:847: in _batch_setitems
2020-11-29T10:28:07.9775350Z     save(v)
2020-11-29T10:28:07.9775850Z /usr/local/miniconda/envs/testenv/lib/python3.6/pickle.py:496: in save
2020-11-29T10:28:07.9776400Z     rv = reduce(self.proto)
2020-11-29T10:28:07.9776930Z pymc3/distributions/distribution.py:516: in __getstate__
2020-11-29T10:28:07.9777740Z     logp = dill.dumps(self.logp)
2020-11-29T10:28:07.9778290Z E   RecursionError: maximum recursion depth exceeded
2020-11-29T10:28:07.9778900Z !!! Recursion detected (same locals & position)
2020-11-29T10:28:07.9779950Z ----------------------------- Captured stderr call -----------------------------
2020-11-29T10:28:07.9780510Z Only 100 samples in chain.
2020-11-29T10:28:07.9781280Z Auto-assigning NUTS sampler...
2020-11-29T10:28:07.9781840Z Initializing NUTS using jitter+adapt_diag...
2020-11-29T10:28:07.9782470Z Multiprocess sampling (2 chains in 2 jobs)
2020-11-29T10:28:07.9782880Z NUTS: [mu]
2020-11-29T10:28:07.9783760Z ------------------------------ Captured log call -------------------------------
2020-11-29T10:28:07.9784430Z WARNING  pymc3:sampling.py:480 Only 100 samples in chain.
2020-11-29T10:28:07.9785800Z INFO     pymc3:sampling.py:490 Auto-assigning NUTS sampler...
2020-11-29T10:28:07.9786590Z INFO     pymc3:sampling.py:2034 Initializing NUTS using jitter+adapt_diag...
2020-11-29T10:28:07.9787330Z INFO     pymc3:sampling.py:550 Multiprocess sampling (2 chains in 2 jobs)
2020-11-29T10:28:07.9787920Z INFO     pymc3:sampling.py:236 NUTS: [mu]

Versions and main components

  • PyMC3 Version: edbafaa
  • Theano Version: theano-pymc==1.0.11
  • Python Version: 3.7.7
  • Operating system: MacOS 10.15.7
  • How did you install PyMC3: pip

/cc @michaelosthege @MarcoGorelli

@michaelosthege michaelosthege added bug winOS windows OS related labels Nov 29, 2020
@Spaak Spaak added this to the 3.10 milestone Nov 30, 2020
@Spaak
Copy link
Member

Spaak commented Dec 2, 2020

I think the reason the error does not show up on Linux is that on Linux multiprocessing is by default done by forking (requiring no pickling), whereas on Windows and MacOS the default is spawn. (On MacOS this didn't use to be the case, but has been since Python 3.8. Although your stack trace shows Python 3.6. In any case I'm pretty sure the multiprocessing context is the deciding factor here.)

@Spaak
Copy link
Member

Spaak commented Dec 2, 2020

Yep, can confirm, adding pm.sample(100, mp_ctx="spawn") to the failing test line reproduces the same error on Linux.

@MarcoGorelli
Copy link
Contributor

Great investigative work! 🧐

@Spaak
Copy link
Member

Spaak commented Dec 3, 2020

I checked whether this also arises on 3.9.3:

cd repos/pymc3/
git checkout 611eceb
cd ../Theano-pymc3/
git checkout e0167f2

made the same change (pm.sample(100, mp_ctx="spawn")), and the test runs through fine.

@Spaak
Copy link
Member

Spaak commented Dec 3, 2020

Ran a git bisect (thanks for the tip @MarcoGorelli):

Switched to branch 'master'
Your branch is up to date with 'origin/master'.
(base) eelke@eelkedesktop:~/repos/pymc3$ git bisect start
(base) eelke@eelkedesktop:~/repos/pymc3$ git bisect bad
(base) eelke@eelkedesktop:~/repos/pymc3$ git bisect good v3.9.3
Bisecting: 63 revisions left to test after this (roughly 6 steps)
[6b765e4834ffe61bce6907110b2155e102c2edc4] Formatted Next 15 Files (#4150)
(base) eelke@eelkedesktop:~/repos/pymc3$ cd ../Theano-PyMC/
(base) eelke@eelkedesktop:~/repos/Theano-PyMC$ git checkout e0167f2
HEAD is now at e0167f24a Modifications for the 1.0.4 release
(base) eelke@eelkedesktop:~/repos/Theano-PyMC$ cd ../pymc3
(base) eelke@eelkedesktop:~/repos/pymc3$ git bisect bad
Bisecting: 31 revisions left to test after this (roughly 5 steps)
[67155fa4d000e73f99d761970ece73c2236a97d5] Run pyupgrade (#4083)
(base) eelke@eelkedesktop:~/repos/pymc3$ git bisect bad
Bisecting: 15 revisions left to test after this (roughly 4 steps)
[8bb38f0e0ceb092e414dba33607c0f7141a8d221] DOC simplify line in getting_started notebook (#4069)
(base) eelke@eelkedesktop:~/repos/pymc3$ git bisect bad
Bisecting: 7 revisions left to test after this (roughly 3 steps)
[bf8f0bcd86bf1985efa2619370786f6d905ed4da] Cosmetic fixes to getting started notebook. (#4062)
(base) eelke@eelkedesktop:~/repos/pymc3$ git bisect bad
Bisecting: 3 revisions left to test after this (roughly 2 steps)
[aaafa8d010a9dd100938b9175a1c9df2b21f4aa1] Remove awkward `from theano import theano` statements (#4054)
(base) eelke@eelkedesktop:~/repos/pymc3$ git bisect good
Bisecting: 1 revision left to test after this (roughly 1 step)
[ba08fad52f84ee0a8565def1bd67c26a4eb4970a] Update README.rst (#4056)
(base) eelke@eelkedesktop:~/repos/pymc3$ git bisect bad
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[e08ad0c6412c4e986f1b6c1412c61531e356ccd5] Use dill to serialize logp functions in DensityDist (#4053)
(base) eelke@eelkedesktop:~/repos/pymc3$ git bisect bad
e08ad0c6412c4e986f1b6c1412c61531e356ccd5 is the first bad commit
commit e08ad0c6412c4e986f1b6c1412c61531e356ccd5
Author: Adrian Seyboldt <[email protected]>
Date:   Sun Aug 16 22:49:44 2020 +0200

    Use dill to serialize logp functions in DensityDist (#4053)

    * Use dill to serialize logp functions in DensityDist

    * Update testenv based on yml file on travis

    * Explicitly test pickling and unpickling of DensityDist

    * Improve release notes

    * Use conda activate in create testenv

 RELEASE-NOTES.md                    |  1 +
 environment-dev.yml                 |  2 +-
 pymc3/distributions/distribution.py | 14 ++++++++++++++
 pymc3/tests/test_distributions.py   | 14 ++++++++++++++
 requirements-dev.txt                |  2 +-
 requirements.txt                    |  1 +
 scripts/create_testenv.sh           |  5 ++++-
 7 files changed, 36 insertions(+), 3 deletions(-)

So #4053 introduces this.

@Spaak
Copy link
Member

Spaak commented Dec 3, 2020

Ping @aseyboldt.

@Spaak Spaak self-assigned this Dec 3, 2020
@Spaak
Copy link
Member

Spaak commented Dec 3, 2020

I'm wondering what the best route here is. To me, #4053 looks like it should be unproblematic. In the now-failing test script, the DensityDist is created as follows:

            normal_dist = pm.Normal.dist(mu, 1, shape=shape)
            pm.DensityDist(
                'density_dist',
                normal_dist.logp,
                observed=np.random.randn(100, *shape),
                shape=shape,
                random=normal_dist.random,
                wrap_random_with_dist_shape=False
            )

so with a bound method normal_dist.logp as logp argument into the constructor. This is the root of the issue (as already identified by @rpgoldman). We could impose a restriction on DensityDist that it should only be used with simple functions (which works fine, I checked), rather than bound methods (and adapt the test scripts accordingly). The vast majority of use cases should be covered by this, and if anything more complex is desired users can create custom Distribution subclasses anyway.

The annoying thing is that apparently pickle was able to handle these bound method cases fine, whereas dill chokes on them. On the other hand, pickle apparently failed in notebooks on Windows/MacOS (which is why #4053 was filed) where dill succeeded.

michaelosthege pushed a commit that referenced this issue Dec 5, 2020
* informative warnings on bound method logp in DensityDist

* black

* run test on single core only to avoid dill error on windows/macos

* adding tests for DensityDist serialize recursion handling

* forgot a test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug winOS windows OS related
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants