Anthropic's prompt caching in langchain does not work with ChatPromptTemplate. #26701
Comments
Hi, @raajChit. I'm Dosu, and I'm helping the LangChain team manage their backlog. I'm marking this issue as stale.
Issue Summary:
Next Steps:
Thank you for your understanding and contribution!
This is a major issue. Without caching it's just too slow and too expensive. I tried:
The only thing that does get cached is the system prompt.
@eyurtsev, the user has indicated that the issue regarding prompt caching is still a major concern, as it significantly impacts performance and cost. Could you please assist them with this matter?
@raajChit @drorm @eyurtsev: I'm struggling with this one too. I have an 80k-token prompt with some placeholders in it. The moment I use ChatPromptTemplate, the cache markers are lost. Is it impossible by design to cache it because it is not a static prompt?
I've posted a discussion on #29747 to enable the feature mentioned in this issue.
After initial investigation, it appears the only method for passing arbitrary key-value pairs (e.g. cache_control) is to construct the messages directly rather than through a template. An example as a reference:

from langchain_core.messages import HumanMessage

messages = [
HumanMessage(
content=[{
"type": "text",
"text": TRANSCRIPT,
"cache_control": {
"type": "ephemeral",
}
}],
),
HumanMessage(
content="Summarize the transcript in 2-3 sentences.",
),
]
response = model.invoke(messages)
response.usage_metadata

Now when working with ChatPromptTemplate, the relevant formatting logic is in langchain/libs/core/langchain_core/prompts/chat.py, lines 523 to 538 in 33354f9,
and in langchain/libs/core/langchain_core/prompts/chat.py, lines 647 to 658 in 33354f9.
So you can see that when it receives a list of dicts, only the recognized keys are kept and additional properties such as cache_control are dropped. One solution I can think of with minimal change is to store these additional properties while creating an instance of the template. @baskaryan @ccurme If this sounds good then I can potentially open a PR for this.
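A minimal sketch of the behavior being described, assuming a langchain-core version that predates the fix (the variable names here are illustrative):

from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "user",
            [
                {
                    "type": "text",
                    "text": "{transcript}",
                    # Extra key that the template formatting silently drops
                    # on affected versions:
                    "cache_control": {"type": "ephemeral"},
                }
            ],
        ),
    ]
)

formatted = prompt.format_messages(transcript="...")
# On affected versions the formatted content block no longer contains "cache_control".
print(formatted[0].content)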
I have a partial solution. It reduced my cost by 2/3, but there's still some work to do here.
This results in:
I'm planning on fixing this in drorm/vmpilot#27. Subscribe if you want to keep track. I expect an improvement of another 20%-40%, but won't know for sure till I've implemented it. I didn't submit this as a pull request because this needs much more work and testing for a generalized solution. For my purpose, this works fine.
@0ca this was working before as well, but great to see it being added to the official docs as an example. This issue actually focused on it working with ChatPromptTemplate. Right now, to work around this, we manually call format_messages first and then add back the cache_control blocks (see the sketch below).
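A minimal sketch of that workaround, assuming langchain_anthropic and a static system prompt (the names and placeholder strings are illustrative, not from the thread):

from langchain_anthropic import ChatAnthropic
from langchain_core.prompts import ChatPromptTemplate

llm = ChatAnthropic(model="claude-3-5-sonnet-20240620")

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant. Reference material:\n\n{context}"),
        ("human", "{input}"),
    ]
)

# Format first, then re-attach the cache_control block that the template dropped.
messages = prompt.format_messages(context="...long static context...", input="Hello")
system_message = messages[0]
system_message.content = [
    {
        "type": "text",
        "text": system_message.content,
        "cache_control": {"type": "ephemeral"},
    }
]

response = llm.invoke(messages)
print(response.usage_metadata["input_token_details"])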
…Template

The ChatPromptTemplate previously did not preserve additional fields like `cache_control` when formatting messages, which prevented using Anthropic's prompt caching feature. This change:
- Introduces `_PromptBlockWrapper` class to preserve additional fields in content blocks
- Modifies message template formatting to maintain field structure
- Preserves arbitrary fields (e.g. cache_control) in both text and image content
- Works with async operations and partial variables
- Adds comprehensive test coverage for field preservation

Fixes langchain-ai#26701
This is all working correctly for me now, but I had to hack the code. Here's a log demonstrating it:
Notice how the second exchange continues with 'cache_read_input_tokens': 11538, which is the sum from the last message in the previous exchange. What I did: I've gone up to 50K cached tokens, but starting around 20K-30K the quality starts degrading, the speed becomes painful, and I start seeing timeouts.
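For anyone who wants to verify the same thing, a short sketch of how those counters can be inspected on a ChatAnthropic response (this assumes the usage_metadata fields shown later in this thread; response is the AIMessage returned by invoke):

details = response.usage_metadata["input_token_details"]
print("cache_creation:", details.get("cache_creation", 0))
print("cache_read:", details.get("cache_read", 0))

# The raw Anthropic usage block, where the same numbers appear as
# cache_creation_input_tokens / cache_read_input_tokens, is typically
# available here as well:
print(response.response_metadata.get("usage"))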
Still doesn't work for me.
This should be closed by #30967 upon release. Here's an example.

Define prompt:

from langchain_core.prompts import ChatPromptTemplate
prompt = ChatPromptTemplate(
[
{
"role": "system",
"content": [
{
"type": "text",
"text": "You are a technology expert.",
},
{
"type": "text",
"text": "{context}",
"cache_control": {"type": "ephemeral"},
},
],
},
{
"role": "user",
"content": "{query}",
},
]
)

Usage:

import requests
from langchain_anthropic import ChatAnthropic
llm = ChatAnthropic(model="claude-3-7-sonnet-20250219")
# Pull LangChain readme
get_response = requests.get(
"https://raw.githubusercontent.com/langchain-ai/langchain/master/README.md"
)
readme = get_response.text
chain = prompt | llm
response_1 = chain.invoke(
{
"context": readme,
"query": "What's LangChain, according to its README?",
}
)
response_2 = chain.invoke(
{
"context": readme,
"query": "Extract a link to the LangChain tutorials.",
}
)
usage_1 = response_1.usage_metadata["input_token_details"]
usage_2 = response_2.usage_metadata["input_token_details"]
print(f"First invocation:\n{usage_1}")
print(f"\nSecond:\n{usage_2}")
# First invocation:
# {'cache_read': 0, 'cache_creation': 1519}
# Second:
# {'cache_read': 1519, 'cache_creation': 0}
URL
https://python.langchain.com/docs/how_to/llm_caching/
Checklist
Issue with current documentation:
I have not found any documentation for prompt caching in the LangChain documentation; there seems to be only one post on Twitter about it. I am trying to implement prompt caching in my RAG system, which uses a history-aware retriever.
I have instantiated the model like this:
llm_claude = ChatAnthropic(
model="claude-3-5-sonnet-20240620",
temperature=0.1,
extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"}
)
And using the ChatPromptTemplate like this:
contextualize_q_prompt = ChatPromptTemplate.from_messages(
[
("system", contextualize_q_system_prompt),
("human", "{input}"),
]
)
I am not able to find a way to include prompt caching with this.
I tried making the prompt like this, but it still doesn't work.
prompt = ChatPromptTemplate.from_messages([
SystemMessage(content=contextualize_q_system_prompt, additional_kwargs={"cache_control": {"type": "ephemeral"}}),
HumanMessage(content= "{input}")
])
Please help me with how I should enable prompt caching in langchain.
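A minimal sketch of one possible workaround, assuming that a pre-built SystemMessage is passed through ChatPromptTemplate unchanged so its cache_control content block survives formatting (this is an assumption about the template behavior, not something confirmed in this thread):

from langchain_core.messages import SystemMessage
from langchain_core.prompts import ChatPromptTemplate

contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        # A concrete SystemMessage is not re-templated, so the extra
        # cache_control key in its content block should be preserved.
        SystemMessage(
            content=[
                {
                    "type": "text",
                    "text": contextualize_q_system_prompt,  # must be a plain string, not a template
                    "cache_control": {"type": "ephemeral"},
                }
            ]
        ),
        # Keep the templated part as a (role, template) tuple; a concrete
        # HumanMessage(content="{input}") would not have "{input}" filled in.
        ("human", "{input}"),
    ]
)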
Idea or request for content:
Langchain documentation should be updated with how to use prompt caching with different prompt templates. And especially with a RAG system.