
Remove token-counting library for conversation history truncation #2449


Merged
merged 6 commits into Azure-Samples:main on Mar 26, 2025

Conversation

pamelafox
Collaborator

@pamelafox pamelafox commented Mar 26, 2025

Purpose

This PR removes the functionality that reduced conversation size in context windows by counting tokens.

Reasons for removal:

  • We now default to a model with a 128K token limit, so chatting users are much less likely to hit the limit. (128K tokens is roughly 96K words in English, and closer to 30K words in languages with the highest token:word ratio.)
  • The token-counting approach has been difficult to maintain, since we've had to reverse-engineer the token-counting scheme that OpenAI uses behind the scenes, and it currently only works with tiktoken. Removing token-counting lets the code work more smoothly with non-OpenAI models.
  • I've realized that the current approach of "first-in-first-out" isn't necessarily the best way to truncate conversation history, according to some research, as sometimes the start of a conversation has important context. Another approach is to summarize the conversation history. We could introduce that in the future, but I think we'd only want it for sufficiently long conversations, since it does add an extra LLM call and therefore latency.
  • We're adding support for reasoning models, and we can't estimate the reasoning tokens that will be used by those models.
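For context, the "first-in-first-out" truncation being removed looks roughly like this sketch. The token counter here is a deliberately simplified stand-in (whitespace word counts), not the actual tiktoken-based scheme the `openai-messages-token-helper` library reverse-engineered:

```python
# Minimal sketch of the FIFO history truncation this PR removes.
# count_tokens is a hypothetical stand-in for a real tokenizer.

def count_tokens(message: dict) -> int:
    # Illustrative only: real token counts come from a tokenizer like tiktoken.
    return len(message["content"].split())

def truncate_history(messages: list[dict], max_tokens: int) -> list[dict]:
    """Drop the oldest non-system messages until the total fits the budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    total = sum(count_tokens(m) for m in system + rest)
    while rest and total > max_tokens:
        total -= count_tokens(rest.pop(0))  # first-in-first-out: oldest goes first
    return system + rest
```

As the bullet above notes, this strategy can discard important context from the start of a conversation, which is part of why it is being dropped rather than ported to other tokenizers.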

Does this introduce a breaking change?

When developers merge from main and run the server, azd up, or azd deploy, will this produce an error?
If you're not sure, try it out on an old environment.

[ ] Yes
[X] No

Does this require changes to learn.microsoft.com docs?

This repository is referenced by this tutorial,
which includes deployment, settings, and usage instructions. If text or screenshots need to change in the tutorial,
check the box below and notify the tutorial author. A Microsoft employee can do this for you if you're an external contributor.

[ ] Yes
[X] No

Type of change

[ ] Bugfix
[X] Feature
[ ] Code style update (formatting, local variables)
[ ] Refactoring (no functional changes, no api changes)
[ ] Documentation content changes
[ ] Other... Please describe:

Code quality checklist

See CONTRIBUTING.md for more details.

  • The current tests all pass (python -m pytest).
  • I added tests that prove my fix is effective or that my feature works.
  • I ran python -m pytest --cov to verify 100% coverage of added lines.
  • I ran python -m mypy to check for type errors.
  • I either used the pre-commit hooks or ran ruff and black manually on my code.

@pamelafox pamelafox marked this pull request as ready for review March 26, 2025 21:27
@@ -59,7 +59,7 @@ jobs:
run: black . --check --verbose
- name: Run Python tests
if: runner.os != 'Windows'
- run: pytest -s -vv --cov --cov-fail-under=86
+ run: pytest -s -vv --cov --cov-fail-under=89
Collaborator


🥳

Collaborator Author


You know what's even better? Our Windows tests almost pass! Tiktoken was the main thing holding them back. There are about 6 that fail now due to how relative file URLs are handled with test files, but we could get those working. Anyway, that's for another PR.

@@ -27,7 +27,6 @@ PyMuPDF
beautifulsoup4
types-beautifulsoup4
msgraph-sdk
- openai-messages-token-helper
Collaborator


🙏

@@ -238,7 +239,7 @@ async def validate_qr_and_mock_search(*args, **kwargs):
use_vector_search=True,
use_semantic_ranker=True,
use_semantic_captions=True,
use_query_rewrites=True,
Collaborator


Thanks for fixing this test

@pamelafox pamelafox merged commit cb5149d into Azure-Samples:main Mar 26, 2025
11 checks passed
@pamelafox pamelafox deleted the removetokencounting branch March 26, 2025 22:40
@TaylorN15

@pamelafox won't this now cause an exception if the conversation exceeds the token limit? I know it's unlikely, but I have encountered this within my organisation, especially when people are conversing about code.

@pamelafox
Collaborator Author

@TaylorN15 It should cause an exception that gets handled here:
https://github.com/Azure-Samples/azure-search-openai-demo/blob/main/app/backend/error.py#L12
The users should see a helpful message, and they can start a new chat in that case.
If you're not seeing it handled well, let me know.

Have they managed to run over it with 128K tokens, or only with the older models?
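The handling she points to in error.py maps the SDK exception to a user-facing message. A hedged sketch of that shape (the exception class and messages here are illustrative stand-ins, not the repo's actual code):

```python
# Illustrative sketch of surfacing a context-length error to the user.
# ContextWindowExceededError is a hypothetical stand-in for the SDK error
# raised when the request exceeds the model's context window.

class ContextWindowExceededError(Exception):
    """Raised when the conversation no longer fits the model's token limit."""

def error_dict(error: Exception) -> dict:
    if isinstance(error, ContextWindowExceededError):
        # Actionable message: the user can recover by starting a new chat.
        return {"error": "Your chat exceeds the model's token limit. Please start a new chat."}
    return {"error": "An unexpected error occurred."}
```

The key design point is that the backend returns a recoverable, human-readable message instead of a raw stack trace, so the frontend can display it inline.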

@TaylorN15

Ah, I missed that, thanks. Yes, I've had people go over the 128K, which is pretty crazy. I'm not using this code base directly as-is; mine is highly customised (e.g. I have handling in CosmosDB for conversations that exceed 2MB, caused by this same issue).

@pamelafox
Collaborator Author

@TaylorN15 Good to know! Btw, we did recently change the CosmosDB to only store one message per document, to avoid hitting the 2MB limit.
Given that we also support chat history now, my hope is that people are okay with starting a new chat in that case. If not, we'll have to add a summarization step, and maybe base it on a heuristic to decide when to do it (since it wouldn't be needed in most cases).
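The heuristic-gated summarization idea above could look something like this sketch: only pay the extra LLM call (and its latency) once the history crosses a size threshold. All names and the threshold are hypothetical; `summarize_fn` stands in for an actual LLM call:

```python
# Sketch of heuristic-gated history summarization (illustrative, not repo code).
# summarize_fn is a hypothetical callable wrapping an LLM summarization call.

SUMMARIZE_AFTER_CHARS = 50_000  # illustrative threshold, not a tuned value
KEEP_RECENT_TURNS = 4           # always keep the most recent messages verbatim

def maybe_summarize(messages: list[dict], summarize_fn) -> list[dict]:
    size = sum(len(m["content"]) for m in messages)
    if size < SUMMARIZE_AFTER_CHARS:
        return messages  # common case: no extra LLM call, no added latency
    summary = summarize_fn(messages[:-KEEP_RECENT_TURNS])
    return [
        {"role": "system", "content": f"Conversation summary: {summary}"}
    ] + messages[-KEEP_RECENT_TURNS:]
```

This preserves early context (via the summary) rather than silently dropping it, which addresses the weakness of FIFO truncation noted in the PR description.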

@gaborvar

The token-counting helper library served this repo very well for a good period of time. I hope it finds utilisation elsewhere. RIP.
