Apply AD gradient if optimizer is a first-order one #1365


Closed
wupeifan opened this issue Jul 29, 2020 · 16 comments

Comments

@wupeifan
Contributor

When estimating the MLE and MAP, the routines from Optim.jl are called. However, even for gradient-based methods, Optim.jl only supports ForwardDiff as an AD backend and falls back to finite differences otherwise. Therefore, it makes sense to use the AD backend selected in Turing (Turing.setadbackend) to define the gradient function and feed it to the optimizer.

Basically, we replace https://github.com/TuringLang/Turing.jl/blob/master/src/modes/ModeEstimation.jl#L383 with a structure like

if optimizer isa Optim.FirstOrderOptimizer
   ...
else
   ...
end

In the first branch, you will need to pass the original function and another function that computes the function value and gradient based on Turing.gradient_logp.
An example is https://github.com/JuliaNLSolvers/Optim.jl/blob/7b660484724755ee2b306dd3ceed13d6633067ae/test/multivariate/optimize/interface.jl#L54
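
For concreteness, here is a minimal sketch of passing an explicit gradient to Optim.jl; the objective f and gradient g! below are placeholder stand-ins, not Turing API (in practice both would come from Turing.gradient_logp):

using Optim

# Stand-in objective (e.g. the negative log density) and its in-place gradient.
f(x) = sum(abs2, x)
function g!(G, x)
    G .= 2 .* x
    return G
end

init_vals = zeros(2)
# With an explicit gradient, Optim uses it instead of finite differences.
Optim.optimize(f, g!, init_vals, Optim.LBFGS())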

@pkofod

pkofod commented Jul 29, 2020

Let me know if you need help here. Is it possible to calculate the gradient and get the value simultaneously with your setup? That would help performance.

@wupeifan
Contributor Author

I think in this setting the gradient is Turing.gradient_logp. Therefore, you can wrap the value and the gradient together. @mohamed82008 also pointed out https://github.com/JuliaNLSolvers/NLSolversBase.jl/blob/9e89526817d5932489d8fdfead5f6e537c1291a1/src/objective_types/oncedifferentiable.jl#L172 in the Slack discussion thread.
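
A minimal sketch of that "value and gradient together" pattern with Optim.only_fg!; the quadratic objective is a placeholder, not Turing's actual log-density object:

using Optim

function fg!(F, G, x)
    if G !== nothing
        G .= 2 .* x            # stand-in for the gradient from Turing.gradient_logp
    end
    if F !== nothing
        return sum(abs2, x)    # stand-in for the objective value
    end
    return nothing
end

Optim.optimize(Optim.only_fg!(fg!), zeros(2), Optim.LBFGS())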

@devmotion
Member

I guess the branch is not needed? It seems one could just define

function (f::OptimLogDensity)(F, G, H, x)
    if G !== nothing
        ...
    end
    if H !== nothing
        ...
    end
    if F !== nothing
        return ...
    end
    nothing
end

and then call Optim.optimize(Optim.only_fgh!(f), ...), similar to https://github.com/JuliaNLSolvers/Optim.jl/blob/7b660484724755ee2b306dd3ceed13d6633067ae/test/multivariate/optimize/interface.jl#L73-L78.
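
For reference, a self-contained sketch of that fgh! pattern with a placeholder quadratic objective (not Turing's OptimLogDensity):

using Optim

function fgh!(F, G, H, x)
    if G !== nothing
        G .= 2 .* x                # gradient of the stand-in objective
    end
    if H !== nothing
        H .= 0.0
        for i in eachindex(x)
            H[i, i] = 2.0          # Hessian of the stand-in objective
        end
    end
    if F !== nothing
        return sum(abs2, x)        # objective value
    end
    return nothing
end

Optim.optimize(Optim.only_fgh!(fgh!), fill(1.0, 2), Optim.Newton())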

@wupeifan
Contributor Author

Yeah, probably. Thanks!
I'm swamped with more urgent stuff so I might not be able to make a PR myself for this one... Maybe wait for Cameron after he finishes his qualifier, or maybe I'll revisit this in a couple of days...

@cpfiffer
Member

cpfiffer commented Aug 3, 2020

@mohamed82008 what would be a quick way to get the Hessian as well? I'm switching things up a little to use gradient_logp to get the gradient too, but I'm wondering if I should add a method like hessian_logp. Anyone have thoughts on that?

@wupeifan
Contributor Author

wupeifan commented Aug 3, 2020

Hessians are usually computationally expensive, which is why the plain Newton method is often not preferred. I don't think adding Hessians is strictly necessary for optimization per se, but maybe other people have different ideas.

That said, I think it would be great if the information matrix could be provided...

@cpfiffer
Member

cpfiffer commented Aug 3, 2020

Well, if we're going to use the Optim.only_fgh! method (or the non-Hessian variant Optim.only_fg!), we should probably consider extending support for Hessians in an efficient way in case someone wants to use Newton or the other Hessian-based methods.

We've already got support for the information matrix, but I'm not sure whether it's finite-difference based (I think it is). You can do it with

using StatsBase

m = optimize(model, MLE())
StatsBase.informationmatrix(m)

@wupeifan
Contributor Author

wupeifan commented Aug 3, 2020

in case someone wants to use Newton or the other Hessian-based methods.

For small-scale problems, yes. I don't know, maybe other people have better ideas.

@mohamed82008
Member

I'm wondering if I should add a method like hessian_logp.

I think that's reasonable, but perhaps as a separate PR. For now, we can say that second-order methods are not supported if H !== nothing.

@wupeifan
Contributor Author

wupeifan commented Aug 5, 2020

We've got support already for the information matrix, but I'm not sure if it's finite-difference based or not (I think it is) -- you can do it with

using StatsBase

m = optimize(model, MLE())
StatsBase.informationmatrix(m)

The information matrix actually throws an error. I think it calls ForwardDiff automatically. @cpfiffer

@cpfiffer
Member

cpfiffer commented Aug 5, 2020

Welp, all the more reason to get the actual Hessian stuff built in too.

@ChrisRackauckas
Collaborator

@mohamed82008 what would be a quick way to get the Hessian as well? I'm switching things up a little to use gradient_logp to get the gradient too, but I'm wondering if I should add a method like hessian_logp. Anyone have thoughts on that?

Just do forward mode over whatever reverse mode is chosen.
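
A minimal sketch of forward-over-reverse, assuming the chosen reverse-mode backend (Zygote here) accepts ForwardDiff dual numbers for the function in question:

using ForwardDiff, Zygote

f(x) = sum(abs2, x) / 2               # stand-in for the log density

# Reverse-mode gradient, differentiated once more with forward mode
# to obtain the Hessian.
grad(x) = Zygote.gradient(f, x)[1]
hess(x) = ForwardDiff.jacobian(grad, x)

hess(randn(3))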

@wupeifan
Contributor Author

Just do forward mode over whatever reverse mode is chosen.

With custom adjoints, users may not be able to provide Hessian information for some of the intermediate steps, so the forward-mode pass might not go through.
I think it's not trivial when custom adjoints are provided; however, what you said should be feasible if everything is handled by an AD backend...

@wupeifan
Contributor Author

Thanks for implementing this feature request! I just experimented with the new code and it works perfectly.

@cpfiffer
Member

cpfiffer commented Aug 20, 2020 via email

@wupeifan
Contributor Author

@cpfiffer It's more of a feasibility issue for me, as previously I couldn't run gradient-based methods at all. It will certainly be faster than the previous simplex method.
