The combination `stream=True, tool_choice="auto"` raises an exception right now, which means that developers are stuck with one of two unfortunate choices:

- developing an application that streams the response but cannot use tools, or
- developing an LLM application that can use tools but cannot stream the response.

Relevant discussion: #1615
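For reference, here is a minimal sketch of the failing combination. The model path and the tool definition are placeholders; this assumes any local GGUF chat model with tool-call support:

```python
from llama_cpp import Llama

# Placeholder model path; substitute any local GGUF chat model.
llm = Llama(model_path="./models/model.gguf", n_ctx=4096)

# Hypothetical example tool, just to exercise tool calling.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# Streaming without tools works, and tools without streaming work,
# but this combination currently raises an exception.
stream = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
    tool_choice="auto",
    stream=True,
)
for chunk in stream:
    print(chunk)
```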
Admittedly this is the wrong place to ask this question, but as a beginner I feel like you're the right person to answer:

Does something need to be done to llama.cpp directly in order to handle streaming tool calling? I see from your feature branch that you added a RAG layer to this Python implementation. I ask because I built llama.cpp from source, figuring it would be better optimized for my system, but I am stuck with this server error: `{"code":500,"message":"Cannot use tools with stream","type":"server_error"}` (a request that reproduces it is sketched after this comment).

Would this error go away if I installed the pre-built Python version?
Edit: I see here that there's a PR in draft. We're too close to the bleeding edge!
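For completeness, a minimal sketch of a request that triggers the server error quoted above, sent through the OpenAI-compatible endpoint. This assumes a llama.cpp or llama-cpp-python server already running locally; the base URL, port, model name, and tool definition are placeholders to adjust for your setup:

```python
import openai
from openai import OpenAI

# Assumes a local server with an OpenAI-compatible API; adjust base_url
# and model to match how the server was started.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="sk-no-key-required")

# Hypothetical example tool, just to exercise tool calling.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

try:
    # With stream=False this request succeeds; with stream=True the
    # server rejects it with a 500 error.
    response = client.chat.completions.create(
        model="local-model",
        messages=[{"role": "user", "content": "What's the weather in Paris?"}],
        tools=tools,
        tool_choice="auto",
        stream=True,
    )
    for chunk in response:
        print(chunk)
except openai.InternalServerError as exc:
    # Surfaces as: {"code":500,"message":"Cannot use tools with stream",
    # "type":"server_error"}
    print(exc)
```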