
SMC improve its efficiency by using samples from all (high temperature) stages #2519


Closed
aloctavodia opened this issue Aug 19, 2017 · 13 comments


@aloctavodia
Member

I have been using the new Sequential Monte Carlo sampler for some very hard to sample multidimensional distributions with great success, thanks @hvasbath for that!

Currently the SMC sampler only uses the samples from the lowest temperature (latest iteration). I know there are ways to recycle/use samples from the higher-temperature stages. One reference I have found is this one https://arxiv.org/abs/1504.05753. Before trying to implement anything, I wonder if someone has other references to share or thoughts on this point.

@aloctavodia aloctavodia changed the title SMC improve efficientcy by using samples from all (high temperature) stages SMC improve its efficiency by using samples from all (high temperature) stages Aug 19, 2017
@seanlaw

seanlaw commented Aug 19, 2017

@aloctavodia Can you provide a quick explanation of SMC?

In the case of temperature replica exchange (a la molecular dynamics simulations), there are multiple replicas of the same system being simulated at various temperatures, and neighboring temperatures occasionally swap as long as some Metropolis criterion is met. This method was popularized by Sugita and Okamoto.
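If it helps, here is a minimal sketch of that swap criterion (illustrative only; the names and numbers are made up, and it assumes Boltzmann-weighted replicas with inverse temperatures `beta` and potential energies `E`):

```python
import numpy as np

rng = np.random.default_rng(0)

def attempt_swap(beta_i, beta_j, energy_i, energy_j):
    """Metropolis criterion for exchanging two neighboring replicas.

    The swap is accepted with probability min(1, exp((beta_i - beta_j) * (E_i - E_j))).
    """
    log_alpha = (beta_i - beta_j) * (energy_i - energy_j)
    return np.log(rng.uniform()) < log_alpha

# Example: a cold replica (large beta) and a hot replica (small beta)
accepted = attempt_swap(beta_i=1.0, beta_j=0.2, energy_i=-5.3, energy_j=-4.1)
```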

Note that one is not confined to exchanges in temperature space; in fact, one could also exchange in Hamiltonian space.

While exchanges between neighboring replicas are one aspect of the sampling, the other key is to be able to use the data coming from ALL temperatures (or Hamiltonians) and not derive your probability distribution strictly from the lowest temperature. To accomplish this, one simply needs to use a method called the Weighted Histogram Analysis Method (WHAM). And, in the limit of zero bin widths, WHAM is also often referred to by its equivalent name, the Multistate Bennett Acceptance Ratio (MBAR).

A well-known Python implementation of MBAR by Shirts and Chodera can be found here.

And an open source C++ implementation of (zero-bin-width) WHAM can be found here. Full disclosure: I wrote this code but no longer support it, as it's been several years since I last touched it.
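For the curious, this is roughly what the zero-bin-width WHAM / MBAR self-consistency iteration looks like (a bare-bones sketch with made-up array names, not the pymbar API): `u_kn[k, n]` is the reduced potential of pooled sample `n` evaluated at thermodynamic state `k`, and `N_k` is the number of samples drawn from each state.

```python
import numpy as np
from scipy.special import logsumexp

def mbar_free_energies(u_kn, N_k, n_iter=200):
    """Self-consistent estimate of relative free energies f_k (zero-bin-width WHAM)."""
    K, N = u_kn.shape
    f_k = np.zeros(K)
    log_N_k = np.log(N_k)
    for _ in range(n_iter):
        # log of the mixture denominator for every pooled sample
        log_denom = logsumexp(log_N_k[:, None] + f_k[:, None] - u_kn, axis=0)
        # update: f_k = -log sum_n exp(-u_k(x_n)) / sum_k' N_k' exp(f_k' - u_k'(x_n))
        f_k = -logsumexp(-u_kn - log_denom[None, :], axis=1)
        f_k -= f_k[0]  # free energies are only defined up to an additive constant
    return f_k
```

Once the f_k have converged, expectations under any single state follow from the corresponding per-sample weights.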

@aloctavodia
Member Author

Hi @seanlaw, nice to see you here! And thanks for your input, I will check those links and see if they are applicable to SMC. Now that you mention it, I remember using WHAM years ago in the protein-folding context, but it was a black box for me :)

In SMC you start at a high temperature and then move sequentially to lower temperatures. SMC is somewhat similar to a genetic algorithm (see the code sketch after the formula below):

  1. We initialize a population (independent points in the sample space).
  2. We mutate them (run a short MCMC for each point)
  3. For the last sample of each run we perform selection by weighting them according to their posterior probability. For the next stage, points with higher weights will be used to initialize more runs than points with lower weights (the population size is fixed, so some points from the previous stage will not be used to initialize any new run).
  4. We repeat from 2 until we reach the lowest temperature.

When I said posterior probability, I was referring to a tempered posterior:

$p(\theta \mid y, \beta_m) \propto p(y \mid \theta)^{\beta_m} \, p(\theta)$

where $\beta_m$ is the temperature-like parameter and goes from 0 (just the prior) to 1 (full posterior).
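Just to make the steps concrete, here is a minimal, self-contained sketch of a tempered SMC run (illustrative only, this is not the PyMC3 implementation; all names are made up, and selection is done before mutation within each stage, which is equivalent to repeating steps 2-3):

```python
import numpy as np

rng = np.random.default_rng(42)

def smc_sketch(log_likelihood, log_prior, prior_rvs, betas, n_mcmc=5, step=0.5):
    """Toy tempered SMC for a one-dimensional parameter."""
    particles = prior_rvs()                        # step 1: population drawn from the prior (beta = 0)
    n = len(particles)
    for beta_prev, beta in zip(betas[:-1], betas[1:]):
        # step 3: selection via importance weights for moving from beta_prev to beta
        log_w = (beta - beta_prev) * log_likelihood(particles)
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        particles = particles[rng.choice(n, size=n, p=w)]
        # step 2: mutation, a short random-walk Metropolis run targeting the tempered posterior
        for _ in range(n_mcmc):
            proposal = particles + step * rng.standard_normal(n)
            log_accept = (beta * (log_likelihood(proposal) - log_likelihood(particles))
                          + log_prior(proposal) - log_prior(particles))
            accept = np.log(rng.uniform(size=n)) < log_accept
            particles = np.where(accept, proposal, particles)
    return particles                               # step 4 done: particles at beta = 1

# Example: standard-normal likelihood with a wide normal prior (log densities up to constants)
samples = smc_sketch(
    log_likelihood=lambda x: -0.5 * x**2,
    log_prior=lambda x: -0.5 * (x / 5.0) ** 2,
    prior_rvs=lambda: rng.normal(0.0, 5.0, size=1000),
    betas=np.linspace(0.0, 1.0, 11),
)
```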

[Figure: illustration of the SMC sampler stages, from high temperature (prior) to low temperature (posterior)]

I hope this helps

@seanlaw

seanlaw commented Aug 20, 2017

The visual helped. I think I have a better conceptual idea of how SMC works. In relation to our offline conversation, can you tell me what you mean by "how to recycle high temperature samples"?

When I say "use all samples from all temperatures" using replica exchange, I am referring to how to reconstruct the (posterior?) distribution after all sampling is complete and that would be via WHAM. I'm not sure I am using the same terminology so you'll have to forgive my lack of Bayesian thinking.

@aloctavodia
Member Author

I don't know if "recycling" is the best term, but I am also talking about using all samples from all temperatures.

@hvasbath
Contributor

@aloctavodia thanks a lot for starting this! I am glad it works so well for you!
Where did you find this nice figure? Did you prepare it yourself?
I think we shouldn't use all samples from all temperatures, but rather selected samples from all temperatures?

@junpenglao Actually the current implementation is a mixture of the one you mentioned and this:
http://ascelibrary.org/doi/abs/10.1061/%28ASCE%290733-9399%282007%29133%3A7%28816%29

But the original paper where it has been developed is this:
http://onlinelibrary.wiley.com/doi/10.1111/j.1467-9868.2006.00553.x/abstract

The other ones more or less introduce it to the engineering and geoscience communities, respectively; the math is basically the same. Just for citation purposes we should give credit to the original developers...

@aloctavodia
Member Author

@hvasbath I got the figure from Google Images. It is my intention to prepare one myself, mainly for an internal group meeting talk (I could send it to you when ready). I am now thinking that we should probably have a notebook describing the SMC sampler :)

We can use all samples, if they are weighted properly, to estimate any desired quantity. Still, the details are elusive to me; I hope things become clear once I read about WHAM and take a second look at https://arxiv.org/abs/1504.05753
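To make the idea concrete, here is one simple (surely not optimal) way this could work, just a sketch with made-up names: draws from stage m target the tempered posterior with exponent $\beta_m$, so importance weights proportional to $p(y \mid \theta)^{1 - \beta_m}$ correct them towards the full posterior. Each stage then gives a self-normalized estimate and the estimates are averaged; the weights of the high-temperature stages will be very uneven, which is presumably where WHAM/MBAR-style estimators do better.

```python
import numpy as np

def posterior_expectation_from_all_stages(stage_samples, stage_loglikes, betas, f=lambda x: x):
    """Estimate E[f(theta)] under the full posterior using draws from every SMC stage.

    stage_samples[m] : draws from the stage-m tempered posterior
    stage_loglikes[m]: log p(y | theta) evaluated at those draws
    betas[m]         : tempering exponent used at stage m
    """
    estimates = []
    for samples, loglike, beta in zip(stage_samples, stage_loglikes, betas):
        log_w = (1.0 - beta) * loglike          # weights towards the full posterior (beta = 1)
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        estimates.append(np.sum(w * f(samples)))
    return np.mean(estimates)                   # simple average over stages
```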

@hvasbath
Contributor

I started describing the SMC sampler in a condensed version for a paper I am writing.
We could use this as a base and change it; otherwise we will get into trouble with copyright later on if my article eventually gets published...

Good point @aloctavodia! Unfortunately, I won't have time to look into it until the mentioned article is in good shape, but maybe in October I could have a look as well...

@hvasbath
Contributor

Another paper which might be worth noting and implementing:
https://arxiv.org/abs/1103.3965

It is about efficient sampling with as few chains as possible for high-dimensional problems...
Maybe the other points work in the same direction...
For now the number of chains has to be very high compared to the number of variables in order to get a valid posterior...

@junpenglao
Member

Fixed in #2563

@hvasbath
Contributor

I think this is not off the table, and #2563 is rather unrelated, or am I missing something?

@hvasbath hvasbath reopened this Oct 30, 2017
@aloctavodia
Member Author

I agree with @hvasbath that this is not off the table (at least not yet). But I understand @junpenglao's confusion. Let me try to clarify: the computation of the marginal likelihood in #2563 uses importance weights from all stages. Those same weights could be used to get (weighted) samples from all stages/temperatures, but at the moment we are not doing that; instead we are just returning all the samples from the last stage. It is not entirely clear to me which strategy is best; intuitively I see pros/cons for both of them. Anyway, I could start working on this issue in the next few days, if that is OK with you guys.
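For reference, the stage-wise importance weights lead to the usual SMC evidence estimate, where the log marginal likelihood is the sum over stages of the log mean incremental weight (a generic sketch with made-up names, not the actual code in #2563):

```python
import numpy as np
from scipy.special import logsumexp

def log_marginal_likelihood(stage_loglikes, betas):
    """SMC estimate of log p(y).

    stage_loglikes[m] holds log p(y | theta) for the particles of stage m
    (one entry per stage transition), and betas is the increasing ladder
    of tempering exponents ending at 1.
    """
    log_Z = 0.0
    for loglike, beta_prev, beta in zip(stage_loglikes, betas[:-1], betas[1:]):
        log_w = (beta - beta_prev) * loglike              # incremental importance weights
        log_Z += logsumexp(log_w) - np.log(len(loglike))  # log of the mean weight
    return log_Z
```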

@junpenglao
Member

I see, thanks for the explanation!

@twiecki twiecki closed this as completed Dec 22, 2018