
Remove token-counting library for conversation history truncation #2449


Merged
merged 6 commits into Azure-Samples:main on Mar 26, 2025

Conversation

pamelafox
Collaborator

@pamelafox pamelafox commented Mar 26, 2025

Purpose

This PR removes the functionality that reduced conversation size in context windows by counting tokens.

Reasons for removal:

  • We now default to a model with a 128K token limit, so chatting users are much less likely to hit the limit. (128K tokens is roughly 96K words in English, and closer to 30K words in languages with the highest token:word ratio.)
  • The token-counting approach has been difficult to maintain, since we've had to reverse-engineer the token-counting scheme that OpenAI uses behind the scenes, and it currently only works with tiktoken. Removing token-counting lets the code work more smoothly with non-OpenAI models.
  • I've realized that the current approach of "first-in-first-out" isn't necessarily the best way to truncate conversation history, according to some research, as sometimes the start of a conversation has important context. Another approach is to summarize the conversation history. We could introduce that in the future, but I think we'd only want it for sufficiently long conversations, since it does add an extra LLM call and therefore latency.
  • We're adding support for reasoning models, and we can't estimate the reasoning tokens that will be used by those models.
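For context, the "first-in-first-out" truncation being removed looks roughly like this sketch. The token counter here is a deliberately simplified stand-in (whitespace word counts), not the actual tiktoken-based scheme the `openai-messages-token-helper` library reverse-engineered:

```python
# Minimal sketch of the FIFO history truncation this PR removes.
# count_tokens is a hypothetical stand-in for a real tokenizer.

def count_tokens(message: dict) -> int:
    # Illustrative only: real token counts come from a tokenizer like tiktoken.
    return len(message["content"].split())

def truncate_history(messages: list[dict], max_tokens: int) -> list[dict]:
    """Drop the oldest non-system messages until the total fits the budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    total = sum(count_tokens(m) for m in system + rest)
    while rest and total > max_tokens:
        total -= count_tokens(rest.pop(0))  # first-in-first-out: oldest goes first
    return system + rest
```

As the bullet above notes, this strategy can discard important context from the start of a conversation, which is part of why it is being dropped rather than ported to other tokenizers.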

Does this introduce a breaking change?

When developers merge from main and run the server, azd up, or azd deploy, will this produce an error?
If you're not sure, try it out on an old environment.

[ ] Yes
[X] No

Does this require changes to learn.microsoft.com docs?

This repository is referenced by this tutorial,
which includes deployment, settings, and usage instructions. If text or screenshots need to change in the tutorial,
check the box below and notify the tutorial author. A Microsoft employee can do this for you if you're an external contributor.

[ ] Yes
[X] No

Type of change

[ ] Bugfix
[X] Feature
[ ] Code style update (formatting, local variables)
[ ] Refactoring (no functional changes, no api changes)
[ ] Documentation content changes
[ ] Other... Please describe:

Code quality checklist

See CONTRIBUTING.md for more details.

  • The current tests all pass (python -m pytest).
  • I added tests that prove my fix is effective or that my feature works.
  • I ran python -m pytest --cov to verify 100% coverage of added lines.
  • I ran python -m mypy to check for type errors.
  • I either used the pre-commit hooks or ran ruff and black manually on my code.

@pamelafox pamelafox marked this pull request as ready for review March 26, 2025 21:27
@@ -59,7 +59,7 @@ jobs:
run: black . --check --verbose
- name: Run Python tests
if: runner.os != 'Windows'
- run: pytest -s -vv --cov --cov-fail-under=86
+ run: pytest -s -vv --cov --cov-fail-under=89
Collaborator


🥳

Collaborator Author


You know what's even better? Our Windows tests almost pass! Tiktoken was the main thing holding them back. There are about 6 that fail now due to how relative file URLs are handled with test files, but we could get those working. Anyway, that's for another PR.

@@ -27,7 +27,6 @@ PyMuPDF
beautifulsoup4
types-beautifulsoup4
msgraph-sdk
- openai-messages-token-helper
Collaborator


🙏

@@ -238,7 +239,7 @@ async def validate_qr_and_mock_search(*args, **kwargs):
use_vector_search=True,
use_semantic_ranker=True,
use_semantic_captions=True,
use_query_rewrites=True,
Collaborator


Thanks for fixing this test

@pamelafox pamelafox merged commit cb5149d into Azure-Samples:main Mar 26, 2025
11 checks passed
@pamelafox pamelafox deleted the removetokencounting branch March 26, 2025 22:40
@TaylorN15

@pamelafox won't this now cause an exception if the conversation exceeds the token limit? I know it's unlikely, but I have encountered this within my organisation, especially when people are conversing about code.

@pamelafox
Collaborator Author

@TaylorN15 It should cause an exception that gets handled here:
https://github.com/Azure-Samples/azure-search-openai-demo/blob/main/app/backend/error.py#L12
The users should see a helpful message, and they can start a new chat in that case.
If you're not seeing it handled well, let me know.

Have they managed to run over it with 128K tokens, or only with the older models?
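The handling she points to in error.py maps the SDK exception to a user-facing message. A hedged sketch of that shape (the exception class and messages here are illustrative stand-ins, not the repo's actual code):

```python
# Illustrative sketch of surfacing a context-length error to the user.
# ContextWindowExceededError is a hypothetical stand-in for the SDK error
# raised when the request exceeds the model's context window.

class ContextWindowExceededError(Exception):
    """Raised when the conversation no longer fits the model's token limit."""

def error_dict(error: Exception) -> dict:
    if isinstance(error, ContextWindowExceededError):
        # Actionable message: the user can recover by starting a new chat.
        return {"error": "Your chat exceeds the model's token limit. Please start a new chat."}
    return {"error": "An unexpected error occurred."}
```

The key design point is that the backend returns a recoverable, human-readable message instead of a raw stack trace, so the frontend can display it inline.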

@TaylorN15

Ah, I missed that, thanks. Yes, I've had people go over the 128K, which is pretty crazy. I'm not using this code base directly as-is; mine is highly customised (e.g. I have handling in CosmosDB for conversations that exceed 2MB, caused by this same issue).

@pamelafox
Collaborator Author

@TaylorN15 Good to know! Btw, we did recently change the CosmosDB to only store one message per document, to avoid hitting the 2MB limit.
Given that we also support chat history now, my hope is that people are okay with starting a new chat in that case. If not, we'll have to add a summarization step, and maybe base it on a heuristic to decide when to do it (since it wouldn't be needed in most cases).
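The heuristic-gated summarization idea above could look something like this sketch: only pay the extra LLM call (and its latency) once the history crosses a size threshold. All names and the threshold are hypothetical; `summarize_fn` stands in for an actual LLM call:

```python
# Sketch of heuristic-gated history summarization (illustrative, not repo code).
# summarize_fn is a hypothetical callable wrapping an LLM summarization call.

SUMMARIZE_AFTER_CHARS = 50_000  # illustrative threshold, not a tuned value
KEEP_RECENT_TURNS = 4           # always keep the most recent messages verbatim

def maybe_summarize(messages: list[dict], summarize_fn) -> list[dict]:
    size = sum(len(m["content"]) for m in messages)
    if size < SUMMARIZE_AFTER_CHARS:
        return messages  # common case: no extra LLM call, no added latency
    summary = summarize_fn(messages[:-KEEP_RECENT_TURNS])
    return [
        {"role": "system", "content": f"Conversation summary: {summary}"}
    ] + messages[-KEEP_RECENT_TURNS:]
```

This preserves early context (via the summary) rather than silently dropping it, which addresses the weakness of FIFO truncation noted in the PR description.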

@gaborvar

The token-counting helper library served this repo very well for a good period of time. I hope it finds utilisation elsewhere. RIP.
