AI Safety evaluations (with AI Project provisioning) #2370
Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
evals/safety_evaluation.py:123
- This division operation could raise a ZeroDivisionError if summary_scores[evaluator]['low_count'] is zero. Consider adding a check to handle a zero denominator or use an alternative calculation that avoids division by zero.
summary_scores[evaluator]["mean_score"] = summary_scores[evaluator]["score_total"] / summary_scores[evaluator]["low_count"]
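One way to address the flagged line is to guard the denominator before dividing. The snippet below is a minimal sketch, not the code from this PR: the `safe_mean` helper, the evaluator names, and the sample numbers are hypothetical, and only the `score_total`, `low_count`, and `mean_score` keys come from the line above.

```python
from typing import Optional


def safe_mean(score_total: float, count: int) -> Optional[float]:
    """Return score_total / count, or None when count is zero."""
    if count == 0:
        # No samples to average over, so report "no mean" instead of raising.
        return None
    return score_total / count


# Hypothetical summary data shaped like the dictionary in the flagged line.
summary_scores = {
    "hate_unfairness": {"score_total": 3, "low_count": 2},
    "violence": {"score_total": 0, "low_count": 0},
}

for evaluator, stats in summary_scores.items():
    stats["mean_score"] = safe_mean(stats["score_total"], stats["low_count"])
```

Returning None rather than 0 keeps "no data" distinguishable from a genuine mean of zero; whether that is the right choice depends on how the summary is consumed downstream.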
@pamelafox
@TaylorN15 good point! In another repo, I think I just hardcoded the location to one of those, to avoid adding yet-another-location-parameter to the deployment step. I could do that and add an azd env variable to override the location?
I just meant that it won't work for anyone using resources outside of those locations (like me). I deployed an AI Hub to East US 2 for testing :)
Purpose
Fixes #2262
This PR uses the Azure AI evaluation SDK to simulate adversarial users and evaluate the results. I intentionally do not store the simulation results in the repo due to their often disturbing question content, and I only store the overall safety results.
Our baseline RAG app achieves 100% safety (all scores are "Low" or "Very low") in the 200 simulations that I ran. Yay!
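As a rough illustration of how that percentage can be read, the sketch below counts the share of simulations whose severity label is "Low" or "Very low". It is not the evaluation script from this PR: the `safety_rate` helper and the sample labels are hypothetical, and the "Medium" label is assumed here as an example of a non-safe severity.

```python
from typing import List, Optional

# Labels treated as safe, per the description above.
SAFE_LABELS = {"Very low", "Low"}


def safety_rate(labels: List[str]) -> Optional[float]:
    """Return the fraction of labels that are safe, or None for an empty list."""
    if not labels:
        return None
    safe = sum(1 for label in labels if label in SAFE_LABELS)
    return safe / len(labels)


# Hypothetical labels for four simulated conversations.
example_labels = ["Low", "Very low", "Medium", "Low"]
example_rate = safety_rate(example_labels)
```

With 200 simulations that are all labeled "Low" or "Very low", the same function returns 1.0, which corresponds to the 100% figure.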
Does this introduce a breaking change?
When developers merge from main and run the server, azd up, or azd deploy, will this produce an error?
If you're not sure, try it out on an old environment.
Does this require changes to learn.microsoft.com docs?
This repository is referenced by this tutorial, which includes deployment, settings, and usage instructions. If text or screenshots need to change in the tutorial, check the box below and notify the tutorial author. A Microsoft employee can do this for you if you're an external contributor.
Type of change
Code quality checklist
See CONTRIBUTING.md for more details.
- Run python -m pytest.
- Run python -m pytest --cov to verify 100% coverage of added lines.
- Run python -m mypy to check for type errors.
- Run ruff and black manually on my code.