Is it possible to use the llama.cpp engine to generate with an LLM from embeddings rather than from tokens? #13047
Ouna-the-Dataweaver asked this question in Q&A · Unanswered
So, basically, for multimodal models you quite often need to generate directly from embeddings: you take the prompt tokens, run them through the embedding layer to get the text embedding tensor, and then splice in the embedding tensors produced by the other modality encoders. In PyTorch you can pass embeddings straight into the LLM, as in the sketch below.
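A minimal sketch of what that PyTorch path looks like, assuming a Hugging Face causal LM; the model name and the stand-in "image" embeddings are illustrative assumptions, not part of the original post:

```python
# Sketch: feed precomputed embeddings to a causal LM via `inputs_embeds`
# instead of token ids (Hugging Face transformers API, assumed setup).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.2-1B"  # placeholder; any decoder-only LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# 1) Tokenize the text prompt and look up its embeddings manually.
tokens = tokenizer("Describe the image:", return_tensors="pt").input_ids
text_embeds = model.get_input_embeddings()(tokens)      # [1, seq_len, hidden]

# 2) Pretend these came from another modality encoder (random here, just to
#    illustrate the shape: [batch, n_image_tokens, hidden]).
hidden = text_embeds.shape[-1]
image_embeds = torch.randn(1, 16, hidden, dtype=text_embeds.dtype)

# 3) Concatenate the modalities and generate straight from embeddings.
inputs_embeds = torch.cat([image_embeds, text_embeds], dim=1)
out = model.generate(inputs_embeds=inputs_embeds, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```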
Is there any way to do something similar in llama.cpp?