Success calculating embeddings with Continue from continue.dev? #12879
Unanswered
bitbottrap asked this question in Q&A
Replies: 1 comment 1 reply
-
See #6722 (comment). Short answer is:
-
I'm trying to use llama.cpp to calculate embeddings locally with an embedding model.
I'm not having much success: llama.cpp keeps crashing. With a small codebase I can get through indexing but then run into problems with @codebase queries, and larger codebases usually fail while indexing the codebase. Two prominent errors:
llama-graph.cpp:171: GGML_ASSERT(seq_id < n_tokens && "seq_id cannot be larger than n_tokens with pooling_type == MEAN") failed
and:
My Continue embedding provider configuration is:
And my llama-server command line is:
LLAMA_LOG_VERBOSITY=1 CUDA_VISIBLE_DEVICES=0 llama.cpp/build/bin/llama-server -a gte-qwen1.5B --host 0.0.0.0 --port 8081 --no-warmup --pooling mean --threads 4 -np 128 -b 8192 -ub 1024 -ngl 99 -c 262144 --flash-attn --embedding -m gte-Qwen2-1.5B-instruct.gguf
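For anyone trying to reproduce this outside of Continue, a single embedding request against the server is a useful sanity check. The sketch below assumes llama-server's OpenAI-compatible /v1/embeddings endpoint (which should be available since the server is started with --embedding) and reuses the port and model alias from the command line above:

```sh
# Minimal single-request test against the llama-server instance above.
# Assumes the OpenAI-compatible /v1/embeddings route enabled by --embedding;
# adjust the host, port, and model alias to match your own setup.
curl -s http://localhost:8081/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "gte-qwen1.5B", "input": "int main(void) { return 0; }"}'
```

If single requests like this succeed but the GGML_ASSERT still fires once Continue batches many chunks at once, that would suggest the crash is tied to the -np / -ub / mean-pooling combination rather than to the model itself.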
I'm sure Continue hasn't been thoroughly tested in this configuration, but the crashes are definitely bugging me.
Anyone have success?