Is it possible to use the llama.cpp engine to generate with an LLM from embeddings rather than from tokens? #13047
Ouna-the-Dataweaver asked this question in Q&A · Unanswered
So, basically, for multimodal models you quite often need to generate directly from embeddings: you take the prompt tokens, run them through the embedding layer to get the text embedding tensor, and then splice in the embedding tensors produced by the other modality encoders. In PyTorch you can pass embeddings straight into the LLM, as in the sketch below.
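A minimal sketch of what that PyTorch path looks like, assuming a Hugging Face causal LM; the model name and the stand-in "image" embeddings are illustrative assumptions, not part of the original post:

```python
# Sketch: feed precomputed embeddings to a causal LM via `inputs_embeds`
# instead of token ids (Hugging Face transformers API, assumed setup).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.2-1B"  # placeholder; any decoder-only LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# 1) Tokenize the text prompt and look up its embeddings manually.
tokens = tokenizer("Describe the image:", return_tensors="pt").input_ids
text_embeds = model.get_input_embeddings()(tokens)      # [1, seq_len, hidden]

# 2) Pretend these came from another modality encoder (random here, just to
#    illustrate the shape: [batch, n_image_tokens, hidden]).
hidden = text_embeds.shape[-1]
image_embeds = torch.randn(1, 16, hidden, dtype=text_embeds.dtype)

# 3) Concatenate the modalities and generate straight from embeddings.
inputs_embeds = torch.cat([image_embeds, text_embeds], dim=1)
out = model.generate(inputs_embeds=inputs_embeds, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```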
Is there any way to do something similar in llama.cpp?