This document covers optional features that can be enabled in the deployed Azure resources.
You should typically enable these features before running `azd up`. Once you've set them, return to the deployment steps.
- Using different chat completion models
- Using reasoning models
- Using text-embedding-3 models
- Enabling GPT-4 Turbo with Vision
- Enabling media description with Azure Content Understanding
- Enabling client-side chat history
- Enabling persistent chat history with Azure Cosmos DB
- Enabling language picker
- Enabling speech input/output
- Enabling Integrated Vectorization
- Enabling authentication
- Enabling login and document level access control
- Enabling user document upload
- Enabling CORS for an alternate frontend
- Enabling query rewriting
- Adding an OpenAI load balancer
- Deploying with private endpoints
- Using local parsers
## Using different chat completion models

As of late March 2025, the default chat completion model is `gpt-4o-mini`. If you deployed this sample before that date, the default model is `gpt-3.5-turbo`. You can change the chat completion model to any Azure OpenAI chat model that's available in your Azure OpenAI resource region by following these steps:
- To set the name of the deployment, run this command with a deployment name that's unique within your Azure OpenAI account. For convenience, many developers use the same deployment name as the model name, but this is not required.

  ```shell
  azd env set AZURE_OPENAI_CHATGPT_DEPLOYMENT <your-deployment-name>
  ```

  For example:

  ```shell
  azd env set AZURE_OPENAI_CHATGPT_DEPLOYMENT gpt-4o
  ```
- To set the GPT model to a different available model, run this command with the appropriate model name.

  For GPT-4:

  ```shell
  azd env set AZURE_OPENAI_CHATGPT_MODEL gpt-4
  ```

  For GPT-4o:

  ```shell
  azd env set AZURE_OPENAI_CHATGPT_MODEL gpt-4o
  ```

  For GPT-4o mini:

  ```shell
  azd env set AZURE_OPENAI_CHATGPT_MODEL gpt-4o-mini
  ```

  For gpt-3.5-turbo:

  ```shell
  azd env set AZURE_OPENAI_CHATGPT_MODEL gpt-35-turbo
  ```
- To set the Azure OpenAI model version from the available versions, run this command with the appropriate version string.

  For GPT-4:

  ```shell
  azd env set AZURE_OPENAI_CHATGPT_DEPLOYMENT_VERSION turbo-2024-04-09
  ```

  For GPT-4o:

  ```shell
  azd env set AZURE_OPENAI_CHATGPT_DEPLOYMENT_VERSION 2024-05-13
  ```

  For GPT-4o mini:

  ```shell
  azd env set AZURE_OPENAI_CHATGPT_DEPLOYMENT_VERSION 2024-07-18
  ```

  For gpt-3.5-turbo:

  ```shell
  azd env set AZURE_OPENAI_CHATGPT_DEPLOYMENT_VERSION 0125
  ```
- To set the Azure OpenAI deployment SKU name, run this command with the desired SKU name.

  For GlobalStandard:

  ```shell
  azd env set AZURE_OPENAI_CHATGPT_DEPLOYMENT_SKU GlobalStandard
  ```

  For Standard:

  ```shell
  azd env set AZURE_OPENAI_CHATGPT_DEPLOYMENT_SKU Standard
  ```
- To set the Azure OpenAI deployment capacity (in thousands of tokens per minute, TPM), run this command with the desired capacity. This is not necessary if you are using the default capacity of 30.

  ```shell
  azd env set AZURE_OPENAI_CHATGPT_DEPLOYMENT_CAPACITY 20
  ```
- To update the deployment with the new parameters, run this command.

  ```shell
  azd up
  ```

This process does not delete your previous model deployment. If you want to delete previous deployments, go to your Azure OpenAI resource in the Azure AI Foundry portal and delete them there.
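For example, switching the deployment to GPT-4o end to end might look like the following sequence. The model, version, and SKU values come from the lists above; the capacity value is illustrative, so size it to your quota:

```shell
azd env set AZURE_OPENAI_CHATGPT_DEPLOYMENT gpt-4o
azd env set AZURE_OPENAI_CHATGPT_MODEL gpt-4o
azd env set AZURE_OPENAI_CHATGPT_DEPLOYMENT_VERSION 2024-05-13
azd env set AZURE_OPENAI_CHATGPT_DEPLOYMENT_SKU GlobalStandard
azd env set AZURE_OPENAI_CHATGPT_DEPLOYMENT_CAPACITY 50   # illustrative; size to your quota
azd up
```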
> [!NOTE]
> To revert to a previous model, run the same commands with the previous model name and version.
## Using reasoning models

This feature allows you to use reasoning models to generate responses based on retrieved content. These models spend more time processing and understanding the user's request. To enable reasoning models, follow the steps in the reasoning models guide.
## Using text-embedding-3 models

By default, the deployed Azure web app uses the `text-embedding-ada-002` embedding model. If you want to use one of the text-embedding-3 models, you can do so by following these steps:
- Run one of the following commands to set the desired model:

  ```shell
  azd env set AZURE_OPENAI_EMB_MODEL_NAME text-embedding-3-small
  ```

  ```shell
  azd env set AZURE_OPENAI_EMB_MODEL_NAME text-embedding-3-large
  ```
- Specify the desired dimensions of the model (from 256 to 3072, depending on the model):

  ```shell
  azd env set AZURE_OPENAI_EMB_DIMENSIONS 256
  ```
- Set the model version to "1" (the only version as of March 2024):

  ```shell
  azd env set AZURE_OPENAI_EMB_DEPLOYMENT_VERSION 1
  ```
- When prompted during `azd up`, make sure to select a region for the OpenAI resource group location that supports the text-embedding-3 models. There are limited regions available.
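Taken together, a typical configuration for text-embedding-3-large might look like this; the dimensions value is illustrative, so choose one your model supports:

```shell
azd env set AZURE_OPENAI_EMB_MODEL_NAME text-embedding-3-large
azd env set AZURE_OPENAI_EMB_DIMENSIONS 1024   # illustrative; must be supported by the model
azd env set AZURE_OPENAI_EMB_DEPLOYMENT_VERSION 1
azd up
```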
If you have already deployed:

- You'll need to change the deployment name by running `azd env set AZURE_OPENAI_EMB_DEPLOYMENT <new-deployment-name>`.
- You'll need to create a new index, and re-index all of the data using the new model. You can either delete the current index in the Azure Portal, or create an index with a different name by running `azd env set AZURE_SEARCH_INDEX new-index-name`. When you next run `azd up`, the new index will be created and the data will be re-indexed.
- If your OpenAI resource is not in one of the supported regions, you should delete `openAiResourceGroupLocation` from `.azure/YOUR-ENV-NAME/config.json`. When running `azd up`, you will be prompted to select a new region.
> [!NOTE]
> The text-embedding-3 models are not currently supported by the integrated vectorization feature.
## Enabling GPT-4 Turbo with Vision

This section covers the integration of GPT-4 Vision with Azure AI Search. Learn how to enhance your search capabilities with the power of image and text indexing, enabling advanced search functionalities over diverse document types. For a detailed guide on setup and usage, visit our Enabling GPT-4 Turbo with Vision page.
## Enabling media description with Azure Content Understanding

By default, if your documents contain image-like figures, the data ingestion process will ignore those figures, so users will not be able to ask questions about them.
You can optionally enable the description of media content using Azure Content Understanding. When enabled, the data ingestion process will send figures to Azure Content Understanding and replace each figure with its description in the indexed document.
To enable media description with Azure Content Understanding, run:

```shell
azd env set USE_MEDIA_DESCRIBER_AZURE_CU true
```

If you have already run `azd up`, you will need to run `azd provision` to create the new Content Understanding service.
If you have already indexed your documents and want to re-index them with the media descriptions,
first remove the existing documents and then re-ingest the data.
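As a rough sketch, the end-to-end enablement might look like the following; the `prepdocs.sh` path assumes you re-ingest with this repo's ingestion script:

```shell
azd env set USE_MEDIA_DESCRIBER_AZURE_CU true
azd provision          # creates the Content Understanding service
./scripts/prepdocs.sh  # re-ingest so figures are replaced with descriptions
```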
## Enabling client-side chat history

📺 Watch: (RAG Deep Dive series) Storing chat history

This feature allows users to view the chat history of their conversation, stored in the browser using IndexedDB. That means the chat history will be available only on the device where the chat was initiated. To enable browser-stored chat history, run:

```shell
azd env set USE_CHAT_HISTORY_BROWSER true
```
## Enabling persistent chat history with Azure Cosmos DB

📺 Watch: (RAG Deep Dive series) Storing chat history

This feature allows authenticated users to view the chat history of their conversations, stored in server-side storage using Azure Cosmos DB. This option requires that authentication be enabled. The chat history will be persistent and accessible from any device where the user logs in with the same account. To enable server-stored chat history, run:

```shell
azd env set USE_CHAT_HISTORY_COSMOS true
```

When both the browser-stored and Cosmos DB options are enabled, Cosmos DB will take precedence over browser-stored chat history.
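Assuming authentication is already enabled, a minimal enablement sequence might be:

```shell
azd env set USE_CHAT_HISTORY_COSMOS true
azd up   # provisions the Cosmos DB account and redeploys the app
```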
## Enabling language picker

You can optionally enable the language picker to allow users to switch between different languages. Currently, it supports English, Spanish, French, and Japanese.
To add support for additional languages, create new locale files and update `app/frontend/src/i18n/config.ts` accordingly. To enable the language picker, run:

```shell
azd env set ENABLE_LANGUAGE_PICKER true
```
## Enabling speech input/output

📺 Watch a short video of speech input/output

You can optionally enable speech input/output by setting the azd environment variables.

The speech input feature uses the browser's built-in Speech Recognition API. It may not work in all browser/OS combinations. To enable speech input, run:

```shell
azd env set USE_SPEECH_INPUT_BROWSER true
```
The speech output feature uses Azure Speech Service for text-to-speech. Additional costs will be incurred for using the Azure Speech Service. See pricing. To enable speech output, run:

```shell
azd env set USE_SPEECH_OUTPUT_AZURE true
```

To set the voice for the speech output, run:

```shell
azd env set AZURE_SPEECH_SERVICE_VOICE en-US-AndrewMultilingualNeural
```

Alternatively, you can use the browser's built-in Speech Synthesis API. It may not work in all browser/OS combinations. To enable browser-based speech output, run:

```shell
azd env set USE_SPEECH_OUTPUT_BROWSER true
```
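For example, to combine browser-based speech input with Azure-based speech output and a specific voice, you might run:

```shell
azd env set USE_SPEECH_INPUT_BROWSER true
azd env set USE_SPEECH_OUTPUT_AZURE true
azd env set AZURE_SPEECH_SERVICE_VOICE en-US-AndrewMultilingualNeural
azd up   # apply the changes
```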
## Enabling Integrated Vectorization

Azure AI Search recently introduced an integrated vectorization feature in preview mode. This feature is a cloud-based approach to data ingestion, which takes care of document format cracking, data extraction, chunking, vectorization, and indexing, all with Azure technologies.

To enable integrated vectorization with this sample:
- If you've previously deployed, delete the existing search index. 🗑️
- To enable the use of integrated vectorization, run:

  ```shell
  azd env set USE_FEATURE_INT_VECTORIZATION true
  ```

- If you've already deployed your app, then you can run just the `provision` step:

  ```shell
  azd provision
  ```

  That will set up the necessary RBAC roles and configure the integrated vectorization feature on your search service.

  If you haven't deployed your app yet, then you should run the full `azd up` after configuring all optional features.

- You can view the resources such as the indexer and skillset in the Azure Portal and monitor the status of the vectorization process.
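Putting the steps together for an already-deployed app, the sequence might look like:

```shell
azd env set USE_FEATURE_INT_VECTORIZATION true
azd provision   # sets up RBAC roles and configures the search service
```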
## Enabling authentication

By default, the deployed Azure web app will have no authentication or access restrictions enabled, meaning anyone with routable network access to the web app can chat with your indexed data. If you'd like to automatically set up authentication and user login as part of the `azd up` process, see this guide.

Alternatively, you can manually require authentication to your Azure Active Directory by following the Add app authentication tutorial and setting it up against the deployed web app.

To then limit access to a specific set of users or groups, you can follow the steps from Restrict your Microsoft Entra app to a set of users by changing the "Assignment Required?" option under the Enterprise Application, and then assigning users/groups access. Users not granted explicit access will receive the error message: "AADSTS50105: Your administrator has configured the application <app_name> to block users unless they are specifically granted ('assigned') access to the application."
## Enabling login and document level access control

By default, the deployed Azure web app allows users to chat with all your indexed data. You can enable an optional login system using Azure Active Directory to restrict access to indexed data based on the logged in user. Enable the optional login and document level access control system by following this guide.
## Enabling user document upload

You can enable an optional user document upload system to allow users to upload their own documents and chat with them. This feature requires you to first enable login and document level access control. Then you can enable the optional user document upload system by setting an azd environment variable:

```shell
azd env set USE_USER_UPLOAD true
```

Then you'll need to run `azd up` to provision an Azure Data Lake Storage Gen2 account for storing the user-uploaded documents.
When the user uploads a document, it will be stored in a directory in that account with the same name as the user's Entra object id, and will have ACLs associated with that directory. When the ingester runs, it will also set the `oids` of the indexed chunks to the user's Entra object id.
If you are enabling this feature on an existing index, you should also update your index to have the new `storageUrl` field:

```shell
python ./scripts/manageacl.py -v --acl-action enable_acls
```

And then update existing search documents with the storage URL of the main Blob container:

```shell
python ./scripts/manageacl.py -v --acl-action update_storage_urls --url https://YOUR-MAIN-STORAGE-ACCOUNT.blob.core.windows.net/content/
```

Going forward, all uploaded documents will have their `storageUrl` set in the search index.
This is necessary to disambiguate user-uploaded documents from admin-uploaded documents.
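In summary, for an existing deployment and index, the full enablement might look like this (the storage account name is a placeholder):

```shell
azd env set USE_USER_UPLOAD true
azd up
# The next two commands are only needed for a pre-existing index:
python ./scripts/manageacl.py -v --acl-action enable_acls
python ./scripts/manageacl.py -v --acl-action update_storage_urls --url https://YOUR-MAIN-STORAGE-ACCOUNT.blob.core.windows.net/content/
```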
## Enabling CORS for an alternate frontend

By default, the deployed Azure web app will only allow requests from the same origin. To enable CORS for a frontend hosted on a different origin, follow these steps:

- Run `azd env set ALLOWED_ORIGIN https://<your-domain.com>`
- Run `azd up`
For the frontend code, change `BACKEND_URI` in `api.ts` to point at the deployed backend URL, so that all fetch requests will be sent to the deployed backend.
For an alternate frontend that's written in Web Components and deployed to Static Web Apps, check out azure-search-openai-javascript and its guide on using a different backend. Both these repositories adhere to the same HTTP protocol for AI chat apps.
## Enabling query rewriting

By default, the query rewriting feature from the Azure AI Search service is not enabled. Note that the search service query rewriting feature is different from the query rewriting step that is used by the Chat tab in the codebase. The in-repo query rewriting step also incorporates conversation history, while the search service query rewriting feature only considers the query itself. To enable search service query rewriting, follow these steps:

- Check that your Azure AI Search service is using one of the supported regions for query rewriting.
- Ensure semantic ranker is enabled. Query rewriting may only be used with semantic ranker. Run `azd env set AZURE_SEARCH_SEMANTIC_RANKER free` or `azd env set AZURE_SEARCH_SEMANTIC_RANKER standard` depending on your desired semantic ranker tier.
- Enable query rewriting. Run `azd env set AZURE_SEARCH_QUERY_REWRITING true`. An option in developer settings will appear allowing you to toggle query rewriting on and off. It will be on by default.
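For instance, a minimal enablement with the standard semantic ranker tier might be:

```shell
azd env set AZURE_SEARCH_SEMANTIC_RANKER standard
azd env set AZURE_SEARCH_QUERY_REWRITING true
azd up   # apply the changes
```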
## Adding an OpenAI load balancer

As discussed in more detail in our productionizing guide, you may want to consider implementing a load balancer between OpenAI instances if you are consistently going over the TPM limit. Fortunately, this repository is designed for easy integration with other repositories that create load balancers for OpenAI instances. For seamless integration instructions with this sample, please check:

- Scale Azure OpenAI for Python with Azure API Management
- Scale Azure OpenAI for Python chat using RAG with Azure Container Apps
## Deploying with private endpoints

It is possible to deploy this app with public access disabled, using Azure private endpoints and private DNS zones. For more details, read the private deployment guide. That requires multi-stage provisioning, so you will need to do more than just `azd up` after setting the environment variables.
## Using local parsers

If you want to decrease charges by using local parsers instead of Azure Document Intelligence, you can set environment variables before running the data ingestion script. Note that local parsers will generally not be as sophisticated.

- Run `azd env set USE_LOCAL_PDF_PARSER true` to use the local PDF parser.
- Run `azd env set USE_LOCAL_HTML_PARSER true` to use the local HTML parser.
The local parsers will be used the next time you run the data ingestion script. To use these parsers for the user document upload system, you'll need to run `azd provision` to update the web app to use the local parsers.
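As a sketch, enabling both local parsers and re-running ingestion might look like this; the `prepdocs.sh` path assumes this repo's ingestion script:

```shell
azd env set USE_LOCAL_PDF_PARSER true
azd env set USE_LOCAL_HTML_PARSER true
./scripts/prepdocs.sh   # ingestion now uses the local parsers
azd provision           # only needed for the user document upload system
```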