Success calculating embeddings with Continue from continue.dev? #12879
Unanswered
bitbottrap asked this question in Q&A
Replies: 1 comment 1 reply
-
See #6722 (comment). Short answer is:
-
I'm trying to use llama.cpp to calculate embeddings locally with an embedding model.
I'm not having much success: llama.cpp keeps crashing. With a small codebase I can get through indexing but then run into problems with @codebase queries, and larger codebases usually fail while indexing the codebase. Two prominent errors:
llama-graph.cpp:171: GGML_ASSERT(seq_id < n_tokens && "seq_id cannot be larger than n_tokens with pooling_type == MEAN") failed
and:
My Continue embedding provider configuration is:
And my llama-server command line is:
LLAMA_LOG_VERBOSITY=1 CUDA_VISIBLE_DEVICES=0 llama.cpp/build/bin/llama-server -a gte-qwen1.5B --host 0.0.0.0 --port 8081 --no-warmup --pooling mean --threads 4 -np 128 -b 8192 -ub 1024 -ngl 99 -c 262144 --flash-attn --embedding -m gte-Qwen2-1.5B-instruct.gguf
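For anyone trying to reproduce this outside of Continue, a single embedding request against the server is a useful sanity check. The sketch below assumes llama-server's OpenAI-compatible /v1/embeddings endpoint (which should be available since the server is started with --embedding) and reuses the port and model alias from the command line above:

```sh
# Minimal single-request test against the llama-server instance above.
# Assumes the OpenAI-compatible /v1/embeddings route enabled by --embedding;
# adjust the host, port, and model alias to match your own setup.
curl -s http://localhost:8081/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "gte-qwen1.5B", "input": "int main(void) { return 0; }"}'
```

If single requests like this succeed but the GGML_ASSERT still fires once Continue batches many chunks at once, that would suggest the crash is tied to the -np / -ub / mean-pooling combination rather than to the model itself.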
I'm sure Continue hasn't been thoroughly tested in this configuration, but the crashes are definitely bugging me.
Anyone have success?