Releases: Azure-Samples/azure-search-openai-demo
2025-04-02: Support for reasoning models and token usage display
You can now optionally use a reasoning model (o1 or o3-mini) for all chat completion requests, following the reasoning guide.
When using a reasoning model, you can select the reasoning effort (low/medium/high).
For all models, you can now see token usage in the "Thought process" tab.
Reasoning models incur more latency because of the thinking process, so they are an option for developers to try, but not necessarily what you want to use for most RAG domains.
This PR also includes several fixes for performance, Windows support, and deployment.
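As a rough illustration of how request parameters might differ between reasoning and non-reasoning models, here is a minimal sketch. The helper `build_chat_params` and its defaults are hypothetical and not taken from this repository; the sketch assumes the OpenAI chat completions parameters `reasoning_effort` and `max_completion_tokens` for reasoning models, and `max_tokens` and `temperature` otherwise.

```python
from typing import Any, Optional

# Assumption: only the two models named in this release are treated as reasoning models.
REASONING_MODELS = {"o1", "o3-mini"}
VALID_EFFORTS = ("low", "medium", "high")


def build_chat_params(
    model: str,
    messages: list[dict[str, str]],
    reasoning_effort: Optional[str] = None,
    max_tokens: int = 1024,
    temperature: float = 0.3,
) -> dict[str, Any]:
    """Build keyword arguments for a chat completion request (hypothetical helper)."""
    params: dict[str, Any] = {"model": model, "messages": messages}
    if model in REASONING_MODELS:
        effort = reasoning_effort or "medium"
        if effort not in VALID_EFFORTS:
            raise ValueError(f"reasoning_effort must be one of {VALID_EFFORTS}")
        params["reasoning_effort"] = effort
        # Reasoning models take a completion-token cap instead of max_tokens,
        # and this sketch omits temperature for them.
        params["max_completion_tokens"] = max_tokens
    else:
        params["max_tokens"] = max_tokens
        params["temperature"] = temperature
    return params
```

The resulting dictionary could be passed as keyword arguments to a chat completions call. The returned response's usage data is what a token usage display would draw on.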
What's Changed
- Add quotes to azd env set by @mattgotteiner in #2413
- Upgrade ms graph SDK packages to remove pendulum dependency by @pamelafox in #2454
- Reduce list to only the available ones for gpt-4o-mini/Standard by @pamelafox in #2459
- Add support for reasoning models and token usage display by @mattgotteiner in #2448
- Upgrade prompty by @pamelafox in #2475
Full Changelog: 2025-03-26...2025-04-02
2025-03-26: Removal of conversation truncation logic
Previously, we had logic that truncated conversation history by counting tokens (with tiktoken) and keeping only the messages that fit inside the context window. Now that we are using a model with a larger context window (128K), and most models have a similarly high limit, we have removed that truncation logic, so all conversations are sent in full to the model.
See the pull request for more reasoning behind the decision.
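For readers unfamiliar with this kind of truncation, the general technique looks roughly like the sketch below. It is not the code that was removed: `approx_token_count` is a crude stand-in for a real tokenizer such as tiktoken, and both function names are hypothetical.

```python
def approx_token_count(text: str) -> int:
    # Crude stand-in for a real tokenizer: assume roughly 4 characters per token.
    return max(1, len(text) // 4)


def truncate_history(messages: list[dict[str, str]], max_tokens: int) -> list[dict[str, str]]:
    """Keep system messages plus the most recent messages that fit in max_tokens."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    # System messages are always kept, so they are charged against the budget first.
    budget = max_tokens - sum(approx_token_count(m["content"]) for m in system)
    kept: list[dict[str, str]] = []
    # Walk backwards from the newest message and stop at the first one that does not fit.
    for message in reversed(rest):
        cost = approx_token_count(message["content"])
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    return system + list(reversed(kept))
```

With a 128K context window, the budget is rarely exhausted in a typical chat session, which is the situation that made this step unnecessary.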
What's Changed
- Remove token-counting library for conversation history truncation by @pamelafox in #2449
Full Changelog: 2025-03-25...2025-03-26
2025-03-25: Chat completion model is gpt-4o-mini by default
The infrastructure for this project previously deployed a gpt-35-turbo model. We have since upgraded to the more recent gpt-4o-mini model, which has a much larger context window (128K) and lower per-token costs.
In terms of answer quality, it gives similarly accurate responses, but it tends to be more verbose. You can see the comparisons on the sample data in the evals folder, and you can read my blog post summarizing the differences. If you find the new answers too verbose, you may want to adjust the prompt to generate shorter results.
Existing deployments will continue to use gpt-35-turbo. You can follow the steps in the docs to switch to gpt-4o-mini or other models.
What's Changed
- Port to gpt-4o-mini as default by @pamelafox in #2443
Full Changelog: 2025-03-21...2025-03-25
2025-03-21: Container apps deployment now allows scaling to zero
To lower costs for developers who are experimenting, we've adjusted the scaling rules for the container apps deployment. See the productionizing guide for tips on what to change if you're preparing code based on this repository for production:
https://github.com/Azure-Samples/azure-search-openai-demo/blob/main/docs/productionizing.md#azure-container-apps
What's Changed
- Adjust container apps to scale to zero by @pamelafox in #2440
- Bump dompurify from 3.2.0 to 3.2.4 in /app/frontend by @dependabot in #2363
- Bump react-i18next from 15.1.1 to 15.4.1 in /app/frontend by @dependabot in #2376
Full Changelog: 2025-03-19...2025-03-21
2025-03-19: Query rewriting from Azure AI Search
This release adds a new optional feature, the query rewriting option from Azure AI Search. This is distinct from the already existing query rewriting step in our RAG flows, which incorporates conversation history. The query rewriting from Azure AI Search focuses on expanding the query to semantically similar queries that can improve retrieval.
Learn more from the search team in this blog post:
https://techcommunity.microsoft.com/blog/azure-ai-services-blog/raising-the-bar-for-rag-excellence-query-rewriting-and-new-semantic-ranker/4302729
Enable the feature following the documentation:
https://github.com/Azure-Samples/azure-search-openai-demo/blob/main/docs/deploy_features.md#enabling-query-rewriting
What's Changed
- Add auth-related azd env variable checks and improve docs by @pamelafox in #2386
- Upgrade to latest GA API Version by @pamelafox in #2334
- Upgrade Ubuntu runner for tests in Github Workflow to latest by @egor-yudkin in #2428
- Bump jinja2 from 3.1.5 to 3.1.6 in /app/backend by @dependabot in #2435
- Add query rewriting option by @mattgotteiner in #2437
New Contributors
- @egor-yudkin made their first contribution in #2428
Full Changelog: 2025-02-20...2025-03-19
2025-02-20: Safety evaluations
This project now includes optional AI Safety evaluations, using an Azure AI Project and the Azure AI evaluation SDK.
See the documentation for instructions on running the evaluations.
What's Changed
- Upgrading openai and removing numpy dependency by @pamelafox in #2362
- Bump Azure/setup-azd from 2.0.0 to 2.1.0 in the github-actions group by @dependabot in #2366
- AI Safety evaluations (with AI Project provisioning) by @pamelafox in #2370
Full Changelog: 2025-02-13...2025-02-20
2025-02-13: Italian localization
The UI is now available in Italian. The text displays in Italian if the user's browser is configured accordingly, or if the app has the language picker enabled and the user picks Italian.

What's Changed
- Bump cryptography from 44.0.0 to 44.0.1 in /app/backend by @dependabot in #2354
- Improve locust test script by @tonybaloney in #2357
- Fix screenshot for Monitoring doc by @pamelafox in #2355
- Added support for italian language by @ivanvaccarics in #2356
New Contributors
- @ivanvaccarics made their first contribution in #2356
Full Changelog: 2025-02-11...2025-02-13
2025-02-11: Evaluation scripts and workflow
For a long time, we've directed developers to follow the steps in ai-rag-chat-evaluator to run evaluations on this app. To make it easier, we've now integrated evaluation directly into the repository, as both CLI scripts and a GitHub Actions workflow.
Learn more from the evaluation guide or watch this video about evaluation.
What's Changed
- Make it easy to run evaluation directly from this repo by @pamelafox in #2233
- Use uv managed python in GHA workflows by @eifinger in #2342
- Evaluation workflow for GitHub Actions by @pamelafox in #2350
New Contributors
Full Changelog: 2025-02-07...2025-02-11
2025-02-07: Upgrade gpt-35-turbo to 0125
Due to the impending deprecation of older gpt-35-turbo models, we upgraded our default version to 0125, the only remaining supported version. We will likely soon change the default model from gpt-35-turbo to gpt-4o-mini, pending some evaluations.
What's Changed
- Default gpt-35-turbo to 0125 by @pamelafox in #2329
Full Changelog: 2025-01-29b...2025-02-07
2025-01-29b: New database schema for Cosmos DB
This release improves the database schema for the Cosmos DB chat history feature, based on discussions with the Cosmos DB team about avoiding excessively large document sizes. This is a breaking change, so if you already have the feature deployed, you'll need to do a full azd up to create the new container and deploy the new code, and users will not see past chat history.
If you want to migrate the past chat history to the new schema before deploying the change, please file an issue in the tracker, and we will write a migration script from the older container to the new container.
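One common way to keep individual documents small is to store each conversation as several items instead of one growing document. The sketch below illustrates that idea only; it is not the repository's actual schema, and the function `conversation_to_items` and every field name in it are hypothetical.

```python
from typing import Any


def conversation_to_items(
    user_id: str,
    session_id: str,
    message_pairs: list[tuple[str, str]],
) -> list[dict[str, Any]]:
    """Split one conversation into a session item plus one item per question/answer pair."""
    # Hypothetical session item: small metadata only, no message bodies.
    session_item: dict[str, Any] = {
        "id": session_id,
        "session_id": session_id,
        "user_id": user_id,
        "type": "session",
        "title": message_pairs[0][0][:50] if message_pairs else "",
    }
    items = [session_item]
    for index, (question, answer) in enumerate(message_pairs):
        # Each pair becomes its own item, so no single document grows with the conversation.
        items.append(
            {
                "id": f"{session_id}-{index}",
                "session_id": session_id,
                "user_id": user_id,
                "type": "message_pair",
                "question": question,
                "answer": answer,
            }
        )
    return items
```

Because every item carries the same `user_id` and `session_id`, a store partitioned on those fields could still fetch a whole conversation with a single-partition query while keeping each document bounded in size.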
What's Changed
- Improve schema of CosmosDB chat history to handle long conversations by @pamelafox in #2312
Full Changelog: 2025-01-29a...2025-01-29b