README.md: 21 additions & 12 deletions
@@ -69,15 +69,15 @@ Our goal is to ground the LLM in our custom data (located in src > indexing > da
 ### Step 3a: Create a new index
 
-This step uses vector search with Azure OpenAI embeddings (e.g., ada-002) to encode your documents. First, you need to allow your Azure AI search resource to access your AI OpenAI resource in these roles:
+This step uses vector search with Azure OpenAI embeddings (e.g., ada-002) to encode your documents. First, you need to allow your Azure AI Search resource to access your Azure OpenAI resource in these roles:
 
-- Cognitive Services OpenAI Contributor
+- Cognitive Services OpenAI Contributor
 - Cognitive Services Contributor
 - (optionally, if you need to view quota) Cognitive Services Usages Reader
 
-Follow instruction on https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/role-based-access-control to add role assignment in your AI OpenAI resource.
+Follow the instructions at https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/role-based-access-control to add the role assignments in your Azure OpenAI resource.
 
-The following is a script to streamline index creation. It build the search index locally, and publishes it to your AI Studio project in the cloud.
+Next, run the following script, which streamlines index creation. It builds the search index locally and publishes it to your AI Studio project in the cloud.
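If you prefer the Azure CLI to the portal for the role assignments above, a minimal sketch might look like the following. It assumes the Azure AI Search service has a system-assigned managed identity enabled, and every resource name here is a placeholder rather than anything defined in this repo:

```bash
# Placeholder names; substitute your own resource group, search service, and Azure OpenAI resource.
RG=my-resource-group
SEARCH=my-search-service
AOAI=my-openai-resource

# Object ID of the search service's system-assigned managed identity (assumed to be enabled).
SEARCH_PRINCIPAL_ID=$(az search service show --name "$SEARCH" --resource-group "$RG" \
  --query identity.principalId -o tsv)

# Resource ID of the Azure OpenAI resource, used as the scope of the assignments.
AOAI_ID=$(az cognitiveservices account show --name "$AOAI" --resource-group "$RG" \
  --query id -o tsv)

# Grant the search identity the roles listed above; repeat for each role you need.
az role assignment create --assignee "$SEARCH_PRINCIPAL_ID" \
  --role "Cognitive Services OpenAI Contributor" --scope "$AOAI_ID"
az role assignment create --assignee "$SEARCH_PRINCIPAL_ID" \
  --role "Cognitive Services Contributor" --scope "$AOAI_ID"
```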
@@ -124,7 +124,7 @@ Evaluation is a key part of developing a copilot application. Once you have vali
 Evaluation relies on an evaluation dataset. In this case, we have an evaluation dataset with chat_input, and then a target function that adds the LLM response and context to the evaluation dataset before running the evaluations.
 
-Optionally, if you want to log your code traces and evaluation results on AI studio, run the following command. Make sure you have logged in Azure CLI (az login, refer to Azure CLI doc for more informations) before execute below CLI command:
+We recommend logging your traces and evaluation results to AI Studio. To do so, run the following command. Make sure you have logged in to the Azure CLI (az login; refer to the Azure CLI docs for more information) before executing it:
 ```bash
 pf config set trace.destination=azureml://subscriptions/<subscription-id>/resourcegroups/<resource-group-name>/providers/Microsoft.MachineLearningServices/workspaces/<project-name>
 ```
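For reference, a filled-in version of that command might look like this; the subscription ID, resource group, and project name are placeholders:

```bash
# Sign in first so promptflow can reach your AI Studio project.
az login

# Placeholder identifiers; use your own subscription, resource group, and project name.
pf config set trace.destination=azureml://subscriptions/00000000-0000-0000-0000-000000000000/resourcegroups/my-resource-group/providers/Microsoft.MachineLearningServices/workspaces/my-ai-project
```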
@@ -146,23 +146,32 @@ This command generates one single custom evaluator called "Completeness" on a mu
-To run safety evaluations, you need to first simulate adversarial datasets in evaluation/simulate_and_evaluate_online_endpoints.ipynb with step-by-step explanations (which requires a deployed endpoint of a copilot application where you can deploy the local copilot_flow by jumping to **Step 6**or supply your own application endpoint). The simulator calls will generate a baseline and a jailbreak dataset at the end, which will be saved to local `adv_qa_outputs.jsonl` and `adv_qa_jailbreak_outputs.jsonl`. Learn more about our built-in safety metrics [here](https://learn.microsoft.com/en-us/azure/ai-studio/concepts/evaluation-metrics-built-in?tabs=warning#risk-and-safety-metrics). Alternatively, you can provide your own safety dataset.
+To run safety evaluations, you need to first simulate adversarial datasets (or provide your own) and then evaluate your copilot on those datasets.
 
-This command generates a safety evaluation on the baseline dataset on four safety metrics (self-harm, violence, sexual, hate and unfairness).
+To simulate, run evaluation/simulate_and_evaluate_online_endpoints.ipynb, which includes step-by-step explanations. The notebook requires a deployed endpoint of a copilot application: you can deploy the local copilot_flow by jumping to **Step 6**, or supply your own application endpoint. The simulator calls generate a baseline and a jailbreak dataset for the built-in content harm metrics, which are saved locally as `adv_qa_pairs.jsonl` and `adv_qa_jailbreak_pairs.jsonl`. Learn more about our built-in safety metrics [here](https://learn.microsoft.com/en-us/azure/ai-studio/concepts/evaluation-metrics-built-in?tabs=warning#risk-and-safety-metrics).
+
+> [!NOTE]
+> To measure defect rates for jailbreak attacks, compare the content harm defect rates on the baseline dataset and the jailbreak dataset; the differences between the two constitute the defect rates for jailbreak attacks, that is, how likely your copilot is to be jailbroken into surfacing harmful content when malicious prompts are injected into your already adversarial user queries. For example, if the violence defect rate is 2% on the baseline dataset and 10% on the jailbreak dataset, the jailbreak defect rate for violence is 8 percentage points.
+
+To evaluate your copilot, run this command to generate a safety evaluation on the baseline dataset for the four built-in content harm metrics (self-harm, violence, sexual, hate and unfairness).
-We recommend viewing your evaluation results in the Azure AI Studio, to compare evaluation runs with different prompts, or even different models. Compare the harmful content defect rates on the baseline and the jailbreak dataset, and the differences in the defect rates constitute the defect rates for jailbreaks. That is, how likely your copilot will be jailbroken to surface harmful content if malicious prompts are into your user queries. The _evaluate.py_ script is set up to log your evaluation results to your AI Studio project.
+We recommend viewing your evaluation results in the Azure AI Studio to compare evaluation runs with different prompts, or even different models. The _evaluate.py_ script is set up to log your evaluation results to your AI Studio project.
 
-If you do not want to log evaluation results to your AI Studio project, you can modify the _evaluation.py_ script to not pass the azure_ai_project parameter.
+If you do not want to log evaluation results to your AI Studio project, set the destination value to "local" for local-only logging, or to "none" to disable the logging feature entirely.
+
+```bash
+pf config set trace.destination=<"local" or "none">
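As a usage sketch of that last command, either of the following should work once you are done logging to the cloud (values as I understand the promptflow trace configuration):

```bash
# Keep traces on the local machine only
pf config set trace.destination=local

# Or disable trace collection entirely
pf config set trace.destination=none
```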