
Bayesian optimization #88


Closed
rcnlee opened this issue Aug 14, 2018 · 13 comments

rcnlee commented Aug 14, 2018

Hello, can you provide a usage example for Bayesian optimization, where the points come in one at a time? From the code, it looks like the package may already support this, although I didn't see anything in the documentation. For example, the update_cK! function seems to update the covariance matrix incrementally, but I'm not sure.

Is this right?

gp = GP(X, y, mean, kernel)
# X1, y1: the data with one new point added
gp.X, gp.y = X1, y1
update_cK!(gp)
predict_f(gp, ...)

However, I also see push! defined, but it doesn't seem to use the update_cK! function.

Thanks!

@chris-nemeth

I'm away at the moment, but I'll look into this when I get back. A while back I did write some code for Bayesian optimization using expected improvement. This hasn't been added to the package and it's more of a long-term goal.

The update_cK! function is used to make the optimization of the parameters faster, and I'm not sure it would be the right way to implement what you want, as you're recursively increasing the size of the covariance matrix; there might be a more efficient way of doing this.


jbrea commented Oct 1, 2018

I am also trying to figure out how to use this package for Bayesian optimization.
I saw that a Bayesian Optimization package just refits the whole GP, which seems pretty wasteful, given that the kernel only has to be applied to pairs that involve new data. But it is unclear to me how to efficiently grow cK. If you can give me some hints, I would be happy to contribute, before writing my own GP regression with incremental updates of the covariance matrix in mind.


maximerischard commented Oct 1, 2018

Hi @rcnlee and @jbrea, I'm excited to hear you're interested in using this package for Bayesian optimization. I do not think there is currently support for adding a single datum without recomputing the entire covariance matrix and its Cholesky decomposition, but you're right that it's inefficient to do so. Thinking out loud, there are a few steps involved:

  • Updating the input matrix x, output vector y, and nobs. Nothing too challenging here.
  • Update the full covariance matrix cK.mat to have an additional row and column. Ideally, one would avoid allocating memory for a new (n+1)x(n+1) matrix and copying over the old n x n matrix, and would only compute the covariance for the new row/column. There might be Julia packages out there that already support growing matrices (e.g. using views). If not, I have something like that written up for a different project that I could share.
  • Update the Cholesky decomposition of the covariance matrix cK.chol. See this Stack Overflow question for example (a sketch of this update appears after the list). There's a good chance that someone has already written Julia code for this, but I haven't found it.
  • It would then be wise to implement a new type of abstract positive definite matrix for matrices that can grow, see the “Common interface” section of PDMats.jl. It would be worth first searching around to see if someone has already implemented this.
    • Modify the GPE type so it can take abstract PD matrices.
  • The next problem would be to avoid recomputing the various types of KernelData objects, which pre-compute quantities (mostly distance matrices) that can be reused between subsequent computations of the covariance matrix with different hyperparameters. This is the most tedious step, I would think.
  • After that, a few quantities need updating. Basically the steps in the update_mll! function without updating the covariance matrix.
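
For the Cholesky step, a minimal sketch of appending one row/column to an existing lower-triangular factor might look as follows (chol_append is a hypothetical standalone helper, for illustration only, not an existing function in the package):

using LinearAlgebra

# Sketch of growing a Cholesky factorization by one observation.
# Given the lower factor L of the n x n matrix K, the vector k of covariances
# between the old points and the new one, and the new point's own variance knn,
# the factor of [K k; k' knn] is [L 0; l12' l22] with L*l12 = k and
# l22 = sqrt(knn - l12'l12).
function chol_append(L::LowerTriangular, k::AbstractVector, knn::Real)
    l12 = L \ k                       # forward solve against the old factor
    l22 = sqrt(knn - dot(l12, l12))   # Schur complement of the new point
    n = size(L, 1)
    Lnew = zeros(eltype(L), n + 1, n + 1)
    Lnew[1:n, 1:n] .= L
    Lnew[n + 1, 1:n] .= l12
    Lnew[n + 1, n + 1] = l22
    return LowerTriangular(Lnew)
end

# Quick check against a full refactorization on a toy covariance matrix.
X = randn(5, 3)
K = Symmetric(X * X' + 5I)
L = cholesky(K[1:4, 1:4]).L
@assert chol_append(L, K[1:4, 5], K[5, 5]) ≈ cholesky(K).L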

PS: I should note that I haven't looked into how packages in other languages handle this. There may well be a lot to learn from other people's code.


jbrea commented Oct 2, 2018

Thanks a lot. This looks like a bit of work 😄. I'll have a look at it as soon as I find some time.


jbrea commented Oct 2, 2018

Here are some small initial steps.

To update the covariance matrix, one could use something like

import Base: size, getindex, setindex!, append!
import LinearAlgebra: mul!
mutable struct GrowingSymmetricMatrix{T} <: AbstractArray{T, 2}
    N::Int64
    capacity::Int64
    stepsize::Int64
    data::Array{T, 2}
end
GrowingSymmetricMatrix(N, data; stepsize = 500) = GrowingSymmetricMatrix(N, size(data, 1), stepsize, data)
size(m::GrowingSymmetricMatrix) = (m.N, m.N)
getindex(m::GrowingSymmetricMatrix, i::Integer, j::Integer) = m.data[i, j]
setindex!(m::GrowingSymmetricMatrix, x, i::Integer, j::Integer) = m.data[i, j] = x 
mul!(Y::AbstractArray{T, 1}, M::GrowingSymmetricMatrix{T}, V::AbstractArray{T, 1}) where T = mul!(Y, view(M.data, 1:M.N, 1:M.N), V)
mul!(Y::AbstractArray{T, 2}, M::GrowingSymmetricMatrix{T}, V::AbstractArray{T, 2}) where T = mul!(Y, view(M.data, 1:M.N, 1:M.N), V)
mul!(Y::AbstractArray{T, 2}, M1::GrowingSymmetricMatrix{T}, M2::GrowingSymmetricMatrix{T}) where T = mul!(Y, view(M1.data, 1:M1.N, 1:M1.N), view(M2.data, 1:M2.N, 1:M2.N))
function append!(m::GrowingSymmetricMatrix{T}, v) where T
    m.N += 1
    if m.N > m.capacity
        # allocate the enlarged buffer with the matrix element type
        tmp = zeros(T, m.capacity + m.stepsize, m.capacity + m.stepsize)
        ind = CartesianIndices((1:m.capacity, 1:m.capacity))
        copyto!(tmp, ind, m.data, ind)
        m.data = tmp
        m.capacity += m.stepsize
    end
    @inbounds for (i, val) in enumerate(v)
        setindex!(m.data, val, m.N, i)
        setindex!(m.data, val, i, m.N)
    end
    m
end

@maximerischard I guess you had something like this in mind when you mentioned "using views".

An alternative to the above approach would be to store only the values in the upper half of the symmetric matrix. I just don't know how to get mul! up to speed for large matrices.

mutable struct FlatSymmetric{T} <: AbstractArray{T, 2}
    N::Int
    data::Vector{T}
end
@inline function _index(i, j)
    i > j && return _index(j, i)
    div(j*(j - 1), 2) + i
end
getindex(m::FlatSymmetric, i::Integer, j::Integer) = m.data[_index(i, j)]
setindex!(m::FlatSymmetric, x, i::Integer, j::Integer) = m.data[_index(i, j)] = x 
size(m::FlatSymmetric) = (m.N, m.N)
function append!(m::FlatSymmetric, v)
    m.N += 1
    append!(m.data, v)
    m
end
function mul!(Y::AbstractVector, M::FlatSymmetric{T}, V::Vector{T}) where T
    @inbounds @simd for i in 1:M.N
        Y[i] = T(0)
        for j in 1:M.N
            Y[i] += M.data[_index(i, j)] * V[j]
        end
    end
    Y
end
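
As a quick sanity check of the sketch above (illustrative only, not part of the package), one can build a FlatSymmetric column by column and compare mul! against a dense symmetric matrix assembled from the same values:

n = 4
m = FlatSymmetric(0, Float64[])
A = zeros(n, n)
for j in 1:n
    col = rand(j)       # entries (1, j), ..., (j, j) of column j
    append!(m, col)
    A[1:j, j] .= col
    A[j, 1:j] .= col
end
v = rand(n)
@assert mul!(zeros(n), m, v) ≈ A * v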

This thesis and code could be interesting for the Cholesky update.

So far I haven't found useful Julia code for the growing matrices or the Cholesky update. ElasticArrays only allows growing the last dimension (which could be useful for updating the input matrix x).

It looks like GPyOpt refits on the full data again when new observations arrive (see here and here).

@maximerischard

Hi @jbrea, yes that looks exactly like what I had in mind for the growing matrix. One thing to experiment with would be to double the size of the buffer (data) when the capacity is reached. That way we need to allocate new memory O(log(N)) times instead of O(N).
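
For concreteness, a doubling variant of the append! above might look like this (append_doubling! is just an illustrative name, reusing the GrowingSymmetricMatrix type from the previous comment):

# Hypothetical variant of append!: double the backing buffer whenever the
# capacity is exhausted, so N appends trigger only O(log N) reallocations.
function append_doubling!(m::GrowingSymmetricMatrix{T}, v) where T
    m.N += 1
    if m.N > m.capacity
        newcap = max(2 * m.capacity, m.N)
        tmp = zeros(T, newcap, newcap)
        ind = CartesianIndices((1:m.capacity, 1:m.capacity))
        copyto!(tmp, ind, m.data, ind)
        m.data = tmp
        m.capacity = newcap
    end
    @inbounds for (i, val) in enumerate(v)
        m.data[m.N, i] = val
        m.data[i, m.N] = val
    end
    m
end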

It looks like the MATLAB code you linked implements Appendix B.1 of the corresponding thesis, which indeed looks like exactly what we need, and generalizes the Stack Overflow thread I linked above to adding more than one point at a time (which seems like it could be useful). Again, it would be worth implementing it so as to minimize new memory allocations.


jbrea commented Oct 7, 2018

Here is the current status up to the fourth point of @maximerischard's list. I get a factor-of-60 speedup for growing a GP from 2 to 2000 data points, relative to refitting it every time.

In case you want to include something like this here, I would start a PR soon. An alternative would be to include it in a Bayesian Optimization package.

@maximerischard I'm not sure I would really scale the buffer size exponentially: as soon as the buffer takes up a bit more than half the available memory, I don't want it to try to allocate the same amount again. The stepsize is thought of more as a safety margin. Ideally, the user knows roughly how many samples will be collected and sets the capacity accordingly, but users can of course adjust the stepsize manually.

@maximerischard

This is wonderful, and it's very exciting to confirm that the speed-ups are real. Your reasoning about the stepsize makes sense to me; you're right that as long as the user sets judicious values for the capacity and stepsize, the number of memory allocation events can be kept low.

I would say yes, this is definitely functionality that should be in this package, and a PR would be very welcome. But I wonder if the growing PDMat functionality might find uses elsewhere, and should be its own mini-package. Let me know if there's anything I can do to help with the PR.

One side-thought: being able to remove one data point would also be useful, for example to implement cross-validation. I wonder how much code such a ShrinkingPDMat would have in common with your GrowingPDMat.
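
As a rough idea: removing the most recently added observation is cheap, since the leading block of a Cholesky factor only depends on the remaining points, while removing an arbitrary point additionally requires repairing the trailing block of the factor with a rank-one update (e.g. LinearAlgebra.lowrankupdate for Cholesky objects). A tiny sketch for the "last point" case (hypothetical helper, for illustration only):

using LinearAlgebra

# Dropping the most recent observation: simply truncate the lower factor.
chol_drop_last(L::LowerTriangular) = LowerTriangular(Matrix(L)[1:end-1, 1:end-1])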

@chris-nemeth

Thanks for working on this @jbrea. A PR would be great as this functionality would be a useful addition to the package.


jbrea commented Oct 8, 2018

Thanks for your feedback @maximerischard & @chris-nemeth.
I liked the idea of a small separate package. I have now also implemented removal of data points, and therefore called it ElasticPDMats. I'll start a PR soon to integrate it with GaussianProcesses.


jbrea commented Oct 8, 2018

I am a bit lost with updating the KernelData objects and adapting the update_mll! function.
I thought I could just append cov(kernel, x_old[i], x_new) for i in 1:nobs to cK when adding new data, and whenever I want to readjust the kernel parameters I could create a new KernelData object (to reuse precomputed values for different kernel parameters). But, if I understand correctly, I would either have to write new functions cov(kernel, x_old[i], x_new) for all kernels, or create elastic/updatable KernelData objects and use the existing cov functions. Am I missing something? What would you suggest?

Edit: I think I figured out how to use the existing cov functions.
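
(For illustration, assuming a cross-covariance method cov(kernel, X1, X2) like the one the package uses internally for prediction, appending the new covariances could look roughly like this; all names are placeholders, not the final PR code.)

using GaussianProcesses, Statistics

# Placeholder sketch: grow the covariance matrix cK (e.g. a growing/elastic
# matrix as above) by one observation xnew, reusing the kernel's cov method.
# `noise` is the observation-noise variance added on the diagonal.
function append_obs!(cK, kernel, X_old::AbstractMatrix, xnew::AbstractVector, noise)
    Xnew = reshape(xnew, :, 1)                     # one input as a dim x 1 column
    kx   = vec(cov(kernel, X_old, Xnew))           # covariances with the old inputs
    knn  = cov(kernel, Xnew, Xnew)[1, 1] + noise   # the new input's own variance
    append!(cK, vcat(kx, knn))                     # add one row/column to cK
end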


jbrea commented Oct 12, 2018

Here is a very preliminary version of my attempt at BayesianOptimization.


jbrea commented Nov 1, 2018

A first usable version of BayesianOptimization is now online. Currently it requires GaussianProcesses#master and still requires #97 to be merged to run with option gradientfree = false.

Thanks a lot @maximerischard for all the help!!!

jbrea added a commit that referenced this issue Nov 16, 2018