SMC improve its efficiency by using samples from all (high temperature) stages #2519
Comments
@aloctavodia Can you provide a quick explanation of SMC? In temperature replica exchange (a la molecular dynamics simulations), multiple replicas of the same system are simulated at various temperatures, and neighboring temperatures occasionally swap as long as a Metropolis criterion is met. This method was popularized by Sugita and Okamoto. Note that one is not confined to exchanges in temperature space; one could also exchange in Hamiltonian space. While exchanges between neighboring replicas are one aspect of the sampling, the other key is to be able to use the data coming from ALL temperatures (or Hamiltonians) rather than deriving your probability distribution strictly from the lowest temperature. To accomplish this, one can use the Weighted Histogram Analysis Method (WHAM). In the limit of zero bin width, WHAM is often referred to by its equivalent name, the Multistate Bennett Acceptance Ratio (MBAR). A well-known Python implementation of MBAR by Shirts and Chodera can be found here. And an open source C++ implementation of (zero-bin width) WHAM can be found here. Full disclosure: I wrote this code but no longer support it, as it has been several years since I last touched it. |
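For readers unfamiliar with the swap step mentioned above, here is a minimal sketch of the Metropolis test for exchanging two replicas. It assumes Boltzmann-distributed replicas with inverse temperatures `beta_i`, `beta_j` and current energies `energy_i`, `energy_j`; the function name and arguments are hypothetical and not taken from any of the linked codes.

```python
import math
import random

def swap_accepted(beta_i, beta_j, energy_i, energy_j, rng=random):
    """Metropolis test for exchanging the configurations of two replicas.

    The joint weight before the swap is exp(-beta_i*E_i - beta_j*E_j) and
    after the swap exp(-beta_i*E_j - beta_j*E_i), so the log acceptance
    ratio is (beta_i - beta_j) * (E_i - E_j).
    """
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    if delta >= 0:
        return True  # the swap does not lower the joint weight: always accept
    return rng.random() < math.exp(delta)
```

With this rule, a cold replica that currently sits at a higher energy than its hot neighbor always swaps, while the opposite move is accepted only with probability `exp(delta)`.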
Hi @seanlaw, nice to see you here! And thanks for your input; I will check those links and see if they are applicable to SMC. Now that you mention it, I remember using WHAM years ago in the protein-folding context, but it was a black box for me :) In SMC you start at a high temperature and then move sequentially to lower temperatures. SMC is somewhat similar to a genetic algorithm.
When I said posterior probability, I was referring to a tempered posterior. I hope this helps |
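To make the "start hot, move to colder stages" idea concrete, here is a rough sketch of a tempered SMC loop in plain Python. It assumes the usual tempered posterior, p_beta(theta | y) proportional to p(y | theta)^beta * p(theta), with beta going from 0 (prior) to 1 (posterior). The toy model (one observation, standard normal prior), the fixed `betas` schedule and all names are illustrative assumptions; this is not the PyMC3 implementation.

```python
import math
import random

Y_OBS, SIGMA = 2.0, 0.5  # hypothetical toy data: one observation, known noise

def log_prior(theta):
    # Standard normal prior, up to an additive constant
    return -0.5 * theta ** 2

def log_like(theta):
    # Gaussian likelihood of the single observation, up to a constant
    return -0.5 * ((Y_OBS - theta) / SIGMA) ** 2

def smc(n_particles=1000, betas=(0.0, 0.1, 0.3, 0.6, 1.0),
        n_mh=5, step=0.5, seed=0):
    rng = random.Random(seed)
    # Stage 0 (beta = 0): particles are draws from the prior
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    for beta_prev, beta in zip(betas, betas[1:]):
        # 1. Importance weights that move the particles from beta_prev to beta
        log_w = [(beta - beta_prev) * log_like(t) for t in particles]
        top = max(log_w)
        weights = [math.exp(lw - top) for lw in log_w]
        # 2. Resample in proportion to the weights (the "selection" step)
        particles = rng.choices(particles, weights=weights, k=n_particles)

        # 3. Mutate with a few random-walk Metropolis steps that leave the
        #    tempered posterior at the current beta invariant
        def log_target(t):
            return log_prior(t) + beta * log_like(t)

        for _ in range(n_mh):
            for i, current in enumerate(particles):
                proposal = current + rng.gauss(0.0, step)
                log_ratio = log_target(proposal) - log_target(current)
                if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
                    particles[i] = proposal
    return particles
```

For this conjugate toy model the exact posterior is normal with mean 1.6 and variance 0.2, so the mean of `smc()` should land close to 1.6. The reweight/resample/mutate cycle is also where the resemblance to a genetic algorithm comes from.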
The visual helped. I think I have a better conceptual idea of how SMC works. In relation to our offline conversation, can you tell me what you mean by "how to recycle high temperature samples"? When I say "use all samples from all temperatures" using replica exchange, I am referring to how to reconstruct the (posterior?) distribution after all sampling is complete and that would be via WHAM. I'm not sure I am using the same terminology so you'll have to forgive my lack of Bayesian thinking. |
I don't know if "recycling" is the best term, but I am also talking about using all samples from all temperatures. |
@aloctavodia thanks a lot for starting this! I am glad it works so well for you! @junpenglao Actually the current implementation is a mixture of the one you mentioned and this: But the original paper where it was developed is this: The other ones more or less introduce it to the engineering and geoscience communities, respectively; the math is basically the same. Just for citation purposes, we should give credit to the original developers... |
@hvasbath I got the figure from Google Images. It is my intention to prepare one myself, mainly for an internal group meeting talk (I could send it to you when ready). I am now thinking that we should probably have a notebook describing the SMC sampler :) We can use all samples, if they are weighted properly, to estimate any desired quantity. Still, the details are elusive to me; I hope things become clear when I read about WHAM and take a second look at https://arxiv.org/abs/1504.05753 |
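As a small illustration of "weighted properly": if draws come from a tempered posterior at inverse temperature `beta`, they can be reweighted to the full posterior (beta = 1) with weights proportional to likelihood^(1 - beta). The sketch below is plain self-normalized importance sampling under that assumption; the function name is hypothetical, and this is not necessarily the scheme proposed in the arXiv paper above.

```python
import math

def reweighted_mean(samples, log_likes, beta, f=lambda x: x):
    """Estimate E[f(theta)] under the beta = 1 posterior from draws taken
    at inverse temperature `beta`, using self-normalized importance weights
    proportional to likelihood ** (1 - beta)."""
    log_w = [(1.0 - beta) * ll for ll in log_likes]
    top = max(log_w)  # subtract the maximum for numerical stability
    weights = [math.exp(lw - top) for lw in log_w]
    total = sum(weights)
    return sum(w * f(s) for w, s in zip(weights, samples)) / total
```

When `beta` is already 1 the weights are uniform and this reduces to the ordinary sample mean. The further `beta` is from 1, the more uneven the weights become, which is presumably the main cost of reusing the hot stages.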
I started describing the SMC sampler in a condensed version for a paper I am writing. Good point @aloctavodia! Unfortunately, I won't have time to look into it until the mentioned article is in good shape, but maybe in October I could have a look as well... |
Another paper which might be worth noting and implementing: It is about efficient sampling with as few chains as possible for high-dimensional problems... |
Fixed in #2563 |
I think this is not off the table and #2563 is rather unrelated or am I missing something? |
I agree with @hvasbath, this is not off the table (at least not yet). But I understand @junpenglao's confusion. Let me try to clarify: the computation of the marginal likelihood in #2563 uses importance weights from all stages. Those same weights could be used to get (weighted) samples from all stages/temperatures, but at the moment we are not doing that; instead we just return all the samples from the last stage. It is not entirely clear to me which strategy is best; intuitively I see pros and cons for both. Anyway, I could start working on this issue in the next few days, if that is OK with you guys. |
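For context, this is roughly how importance weights from every stage combine into a marginal likelihood estimate. The sketch assumes that `stage_log_likes[k]` holds the log-likelihoods of the particles available at inverse temperature `betas[k]`, before they are reweighted to `betas[k + 1]`; the names are hypothetical and this is not the code from #2563.

```python
import math

def log_marginal_likelihood(stage_log_likes, betas):
    """Sum over stages of the log of the mean incremental importance weight.

    Each stage contributes an estimate of Z(beta_next) / Z(beta_prev), namely
    the average of likelihood ** (beta_next - beta_prev) over its particles.
    """
    total = 0.0
    for log_likes, beta_prev, beta in zip(stage_log_likes, betas, betas[1:]):
        log_w = [(beta - beta_prev) * ll for ll in log_likes]
        top = max(log_w)  # log-sum-exp trick for numerical stability
        mean_w = sum(math.exp(lw - top) for lw in log_w) / len(log_w)
        total += top + math.log(mean_w)
    return total
```

If `betas` runs from 0 to 1, the stage ratios telescope to Z(1) / Z(0), which is the marginal likelihood when the prior is normalized.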
I see, thanks for the explanation! |
I have been using the new Sequential Monte Carlo sampler for some very hard-to-sample multidimensional distributions with great success. Thanks @hvasbath for that!
Currently the SMC sampler only uses the samples from the lowest temperature (latest iteration). I know there are ways to recycle/use samples from the higher temperatures. One reference I have found is https://arxiv.org/abs/1504.05753. Before trying to implement anything, I wonder if someone has other references or thoughts to share on this point.