Apply AD gradient if optimizer is a first-order one #1365
Comments
Let me know if you need help here. Is it possible to calculate the gradient and get the value simultaneously with your setup? That would help performance. |
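For reference, reverse-mode backends do return the value and gradient from a single pass via a pullback. A minimal sketch, assuming Zygote as the AD backend and a toy log-density (not Turing-specific code):

```julia
using Zygote

# Toy log-density; any differentiable scalar function of x would do here.
logp(x) = -sum(abs2, x) / 2

x = randn(3)
# One pullback call returns the value and a closure for the gradient,
# so the objective and its gradient share the same forward pass.
val, back = Zygote.pullback(logp, x)
grad = back(1.0)[1]
```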
I think in this setting the gradient is … |
I guess the branch is not needed? It seems one could just define

function (f::OptimLogDensity)(F, G, H, x)
    if G !== nothing
        ...
    end
    if H !== nothing
        ...
    end
    if F !== nothing
        return ...
    end
    nothing
end

and then call … |
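For reference, Optim.jl's only_fgh! wrapper accepts exactly this kind of (F, G, H, x) callback. A minimal, self-contained sketch on a toy Rosenbrock objective (illustrative only, not Turing code), showing the sort of computation the elided branches would contain:

```julia
using Optim

# Toy Rosenbrock objective in the fused (F, G, H, x) style: return the value
# only when F is requested, and fill G / H in place only when requested.
function fgh!(F, G, H, x)
    b = 100.0
    if G !== nothing
        G[1] = -2 * (1 - x[1]) - 4b * x[1] * (x[2] - x[1]^2)
        G[2] = 2b * (x[2] - x[1]^2)
    end
    if H !== nothing
        H[1, 1] = 2 - 4b * x[2] + 12b * x[1]^2
        H[1, 2] = -4b * x[1]
        H[2, 1] = -4b * x[1]
        H[2, 2] = 2b
    end
    if F !== nothing
        return (1 - x[1])^2 + b * (x[2] - x[1]^2)^2
    end
    return nothing
end

res = Optim.optimize(Optim.only_fgh!(fgh!), zeros(2), Newton())
```

Sharing one callback for value, gradient, and Hessian lets common intermediate computations be reused across the three quantities.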
Yeah, probably. Thanks! |
@mohamed82008 what would be a quick way to get the Hessian as well? I'm switching things up a little to use … |
Hessians are usually computationally expensive and that’s why the original Newton method is not preferred. I don’t think it is that necessary to add Hessians for optimization per se but maybe other people have different ideas. That said, I think it would be great if the information matrix can be provided... |
Well, if we're going to use the … We've got support already for the information matrix, but I'm not sure if it's finite-difference based or not (I think it is) -- you can do it with

using StatsBase
m = optimize(model, MLE())
StatsBase.informationmatrix(m) |
For small scale problems yeah. I don't know, maybe other people have better ideas. |
I think that's reasonable, but perhaps as a separate PR. For now, we can say that second-order methods are not supported if H !== nothing. |
The information matrix actually throws an error. I think it calls ForwardDiff automatically. @cpfiffer |
Welp, all the more reason to get the actual Hessian stuff built in too. |
Just do forward mode over whatever reverse mode is chosen. |
It is possible that with custom adjoints the users cannot provide a Hessian for some of the intermediate steps, so the forward mode might not go through.
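A sketch of that forward-over-reverse idea, assuming Zygote for the reverse-mode gradient; it relies on the reverse-mode pullbacks accepting ForwardDiff dual numbers, which custom adjoints may break, as noted above:

```julia
using ForwardDiff, Zygote

# Toy log-density standing in for the model's log joint.
logp(x) = -sum(abs2, x) / 2 - 0.1 * sum(x .^ 4)

# Reverse mode gives the gradient; forward mode over that gradient
# gives the Hessian without hand-writing any second-order code.
grad(x) = Zygote.gradient(logp, x)[1]
hess(x) = ForwardDiff.jacobian(grad, x)

hess(randn(3))
```

Zygote's own `Zygote.hessian` performs essentially this forward-over-reverse composition.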
Thanks for implementing this feature request! I just experimented with the new code and it works perfectly.
Did you get a noticeable speed-up of any kind? I'd expect the AD gradient to be a lot faster and more precise in general.
…On Thu, Aug 20, 2020 at 10:24 AM Peifan Wu wrote: Closed #1365.
|
@cpfiffer It's rather a feasibility issue for me, as previously I couldn't run gradient-based methods. For sure it will be faster than the previous simplex method.
In estimating MLE and MAP, the routines from Optim.jl are called. However, even for gradient-based methods, Optim.jl only supports ForwardDiff as an AD backend, and applies finite differences otherwise. Therefore, it makes sense to use the AD backend in Turing (Turing.setadbackend) to define the gradient function and feed it to the optimizer.

Basically, we replace https://github.com/TuringLang/Turing.jl/blob/master/src/modes/ModeEstimation.jl#L383 with a structure like the following:
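A hedged sketch of such a branching structure, illustrative only and not the actual Turing implementation; the helper `gradient_and_value` is hypothetical and stands in for the Turing AD call described below:

```julia
using Optim

# Illustrative sketch: dispatch on the optimizer order and, for first-order
# methods, hand Optim a fused value-and-gradient callback.
function optimize_sketch(f, init_vals, optimizer, options)
    if optimizer isa Optim.FirstOrderOptimizer
        fg! = function (F, G, x)
            # `gradient_and_value` is a hypothetical helper; in Turing this
            # role would be played by Turing.gradient_logp.
            val, grad = gradient_and_value(f, x)
            if G !== nothing
                G .= grad   # write the AD gradient in place
            end
            if F !== nothing
                return val  # return the objective value when requested
            end
            return nothing
        end
        return Optim.optimize(Optim.only_fg!(fg!), init_vals, optimizer, options)
    else
        # Other optimizers fall back to Optim's defaults
        # (finite differences or ForwardDiff).
        return Optim.optimize(f, init_vals, optimizer, options)
    end
end
```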
In the first branch, you will need to pass the original function and another function that computes the function value and the gradient based on Turing.gradient_logp.

An example is https://github.com/JuliaNLSolvers/Optim.jl/blob/7b660484724755ee2b306dd3ceed13d6633067ae/test/multivariate/optimize/interface.jl#L54