Evaluation workflow for GitHub Actions #2350
Changes from 38 commits
New workflow file (243 lines added):

```yaml
name: Evaluate RAG answer flow

on:
  issue_comment:
    types: [created]

# Set up permissions for deploying with secretless Azure federated credentials
# https://learn.microsoft.com/azure/developer/github/connect-from-azure?tabs=azure-portal%2Clinux#set-up-azure-login-with-openid-connect-authentication
permissions:
  id-token: write
  contents: read
  issues: write
  pull-requests: write

jobs:
  evaluate:
    if: |
      contains('["OWNER", "CONTRIBUTOR", "COLLABORATOR", "MEMBER"]', github.event.comment.author_association) &&
      github.event.issue.pull_request &&
      github.event.comment.body == '/evaluate'
    runs-on: ubuntu-latest
    env:
      # azd required
      AZURE_CLIENT_ID: ${{ vars.AZURE_CLIENT_ID }}
      AZURE_TENANT_ID: ${{ vars.AZURE_TENANT_ID }}
      AZURE_SUBSCRIPTION_ID: ${{ vars.AZURE_SUBSCRIPTION_ID }}
      AZURE_ENV_NAME: ${{ vars.AZURE_ENV_NAME }}
      AZURE_LOCATION: ${{ vars.AZURE_LOCATION }}
      # project specific
      AZURE_OPENAI_SERVICE: ${{ vars.AZURE_OPENAI_SERVICE }}
      AZURE_OPENAI_LOCATION: ${{ vars.AZURE_OPENAI_LOCATION }}
      AZURE_OPENAI_API_VERSION: ${{ vars.AZURE_OPENAI_API_VERSION }}
      AZURE_OPENAI_RESOURCE_GROUP: ${{ vars.AZURE_OPENAI_RESOURCE_GROUP }}
      AZURE_DOCUMENTINTELLIGENCE_SERVICE: ${{ vars.AZURE_DOCUMENTINTELLIGENCE_SERVICE }}
      AZURE_DOCUMENTINTELLIGENCE_RESOURCE_GROUP: ${{ vars.AZURE_DOCUMENTINTELLIGENCE_RESOURCE_GROUP }}
      AZURE_DOCUMENTINTELLIGENCE_SKU: ${{ vars.AZURE_DOCUMENTINTELLIGENCE_SKU }}
      AZURE_DOCUMENTINTELLIGENCE_LOCATION: ${{ vars.AZURE_DOCUMENTINTELLIGENCE_LOCATION }}
      AZURE_COMPUTER_VISION_SERVICE: ${{ vars.AZURE_COMPUTER_VISION_SERVICE }}
      AZURE_COMPUTER_VISION_RESOURCE_GROUP: ${{ vars.AZURE_COMPUTER_VISION_RESOURCE_GROUP }}
      AZURE_COMPUTER_VISION_LOCATION: ${{ vars.AZURE_COMPUTER_VISION_LOCATION }}
      AZURE_COMPUTER_VISION_SKU: ${{ vars.AZURE_COMPUTER_VISION_SKU }}
      AZURE_SEARCH_INDEX: ${{ vars.AZURE_SEARCH_INDEX }}
      AZURE_SEARCH_SERVICE: ${{ vars.AZURE_SEARCH_SERVICE }}
      AZURE_SEARCH_SERVICE_RESOURCE_GROUP: ${{ vars.AZURE_SEARCH_SERVICE_RESOURCE_GROUP }}
      AZURE_SEARCH_SERVICE_LOCATION: ${{ vars.AZURE_SEARCH_SERVICE_LOCATION }}
      AZURE_SEARCH_SERVICE_SKU: ${{ vars.AZURE_SEARCH_SERVICE_SKU }}
      AZURE_SEARCH_QUERY_LANGUAGE: ${{ vars.AZURE_SEARCH_QUERY_LANGUAGE }}
      AZURE_SEARCH_QUERY_SPELLER: ${{ vars.AZURE_SEARCH_QUERY_SPELLER }}
      AZURE_SEARCH_SEMANTIC_RANKER: ${{ vars.AZURE_SEARCH_SEMANTIC_RANKER }}
      AZURE_STORAGE_ACCOUNT: ${{ vars.AZURE_STORAGE_ACCOUNT }}
      AZURE_STORAGE_RESOURCE_GROUP: ${{ vars.AZURE_STORAGE_RESOURCE_GROUP }}
      AZURE_STORAGE_SKU: ${{ vars.AZURE_STORAGE_SKU }}
      AZURE_APP_SERVICE_PLAN: ${{ vars.AZURE_APP_SERVICE_PLAN }}
      AZURE_APP_SERVICE_SKU: ${{ vars.AZURE_APP_SERVICE_SKU }}
      AZURE_APP_SERVICE: ${{ vars.AZURE_APP_SERVICE }}
      AZURE_OPENAI_CHATGPT_MODEL: ${{ vars.AZURE_OPENAI_CHATGPT_MODEL }}
      AZURE_OPENAI_CHATGPT_DEPLOYMENT: ${{ vars.AZURE_OPENAI_CHATGPT_DEPLOYMENT }}
      AZURE_OPENAI_CHATGPT_DEPLOYMENT_CAPACITY: ${{ vars.AZURE_OPENAI_CHATGPT_DEPLOYMENT_CAPACITY }}
      AZURE_OPENAI_CHATGPT_DEPLOYMENT_VERSION: ${{ vars.AZURE_OPENAI_CHATGPT_DEPLOYMENT_VERSION }}
      AZURE_OPENAI_EMB_MODEL_NAME: ${{ vars.AZURE_OPENAI_EMB_MODEL_NAME }}
      AZURE_OPENAI_EMB_DEPLOYMENT: ${{ vars.AZURE_OPENAI_EMB_DEPLOYMENT }}
      AZURE_OPENAI_EMB_DEPLOYMENT_CAPACITY: ${{ vars.AZURE_OPENAI_EMB_DEPLOYMENT_CAPACITY }}
      AZURE_OPENAI_EMB_DEPLOYMENT_VERSION: ${{ vars.AZURE_OPENAI_EMB_DEPLOYMENT_VERSION }}
      AZURE_OPENAI_EMB_DIMENSIONS: ${{ vars.AZURE_OPENAI_EMB_DIMENSIONS }}
      AZURE_OPENAI_GPT4V_MODEL: ${{ vars.AZURE_OPENAI_GPT4V_MODEL }}
      AZURE_OPENAI_GPT4V_DEPLOYMENT: ${{ vars.AZURE_OPENAI_GPT4V_DEPLOYMENT }}
      AZURE_OPENAI_GPT4V_DEPLOYMENT_CAPACITY: ${{ vars.AZURE_OPENAI_GPT4V_DEPLOYMENT_CAPACITY }}
      AZURE_OPENAI_GPT4V_DEPLOYMENT_VERSION: ${{ vars.AZURE_OPENAI_GPT4V_DEPLOYMENT_VERSION }}
      AZURE_OPENAI_GPT4V_DEPLOYMENT_SKU: ${{ vars.AZURE_OPENAI_GPT4V_DEPLOYMENT_SKU }}
      USE_EVAL: ${{ vars.USE_EVAL }}
      AZURE_OPENAI_EVAL_MODEL: ${{ vars.AZURE_OPENAI_EVAL_MODEL }}
      AZURE_OPENAI_EVAL_MODEL_VERSION: ${{ vars.AZURE_OPENAI_EVAL_MODEL_VERSION }}
      AZURE_OPENAI_EVAL_DEPLOYMENT: ${{ vars.AZURE_OPENAI_EVAL_DEPLOYMENT }}
      AZURE_OPENAI_EVAL_DEPLOYMENT_SKU: ${{ vars.AZURE_OPENAI_EVAL_DEPLOYMENT_SKU }}
      AZURE_OPENAI_EVAL_DEPLOYMENT_CAPACITY: ${{ vars.AZURE_OPENAI_EVAL_DEPLOYMENT_CAPACITY }}
      AZURE_OPENAI_DISABLE_KEYS: ${{ vars.AZURE_OPENAI_DISABLE_KEYS }}
      OPENAI_HOST: ${{ vars.OPENAI_HOST }}
      OPENAI_API_KEY: ${{ vars.OPENAI_API_KEY }}
      OPENAI_ORGANIZATION: ${{ vars.OPENAI_ORGANIZATION }}
      AZURE_USE_APPLICATION_INSIGHTS: ${{ vars.AZURE_USE_APPLICATION_INSIGHTS }}
      AZURE_APPLICATION_INSIGHTS: ${{ vars.AZURE_APPLICATION_INSIGHTS }}
      AZURE_APPLICATION_INSIGHTS_DASHBOARD: ${{ vars.AZURE_APPLICATION_INSIGHTS_DASHBOARD }}
      AZURE_LOG_ANALYTICS: ${{ vars.AZURE_LOG_ANALYTICS }}
      USE_VECTORS: ${{ vars.USE_VECTORS }}
      USE_GPT4V: ${{ vars.USE_GPT4V }}
      AZURE_VISION_ENDPOINT: ${{ vars.AZURE_VISION_ENDPOINT }}
      VISION_SECRET_NAME: ${{ vars.VISION_SECRET_NAME }}
      ENABLE_LANGUAGE_PICKER: ${{ vars.ENABLE_LANGUAGE_PICKER }}
      USE_SPEECH_INPUT_BROWSER: ${{ vars.USE_SPEECH_INPUT_BROWSER }}
      USE_SPEECH_OUTPUT_BROWSER: ${{ vars.USE_SPEECH_OUTPUT_BROWSER }}
      USE_SPEECH_OUTPUT_AZURE: ${{ vars.USE_SPEECH_OUTPUT_AZURE }}
      AZURE_SPEECH_SERVICE: ${{ vars.AZURE_SPEECH_SERVICE }}
      AZURE_SPEECH_SERVICE_RESOURCE_GROUP: ${{ vars.AZURE_SPEECH_RESOURCE_GROUP }}
      AZURE_SPEECH_SERVICE_LOCATION: ${{ vars.AZURE_SPEECH_SERVICE_LOCATION }}
      AZURE_SPEECH_SERVICE_SKU: ${{ vars.AZURE_SPEECH_SERVICE_SKU }}
      AZURE_SPEECH_SERVICE_VOICE: ${{ vars.AZURE_SPEECH_SERVICE_VOICE }}
      AZURE_KEY_VAULT_NAME: ${{ vars.AZURE_KEY_VAULT_NAME }}
      AZURE_USE_AUTHENTICATION: ${{ vars.AZURE_USE_AUTHENTICATION }}
      AZURE_ENFORCE_ACCESS_CONTROL: ${{ vars.AZURE_ENFORCE_ACCESS_CONTROL }}
      AZURE_ENABLE_GLOBAL_DOCUMENT_ACCESS: ${{ vars.AZURE_ENABLE_GLOBAL_DOCUMENT_ACCESS }}
      AZURE_ENABLE_UNAUTHENTICATED_ACCESS: ${{ vars.AZURE_ENABLE_UNAUTHENTICATED_ACCESS }}
      AZURE_AUTH_TENANT_ID: ${{ vars.AZURE_AUTH_TENANT_ID }}
      AZURE_SERVER_APP_ID: ${{ vars.AZURE_SERVER_APP_ID }}
      AZURE_CLIENT_APP_ID: ${{ vars.AZURE_CLIENT_APP_ID }}
      ALLOWED_ORIGIN: ${{ vars.ALLOWED_ORIGIN }}
      AZURE_ADLS_GEN2_STORAGE_ACCOUNT: ${{ vars.AZURE_ADLS_GEN2_STORAGE_ACCOUNT }}
      AZURE_ADLS_GEN2_FILESYSTEM_PATH: ${{ vars.AZURE_ADLS_GEN2_FILESYSTEM_PATH }}
      AZURE_ADLS_GEN2_FILESYSTEM: ${{ vars.AZURE_ADLS_GEN2_FILESYSTEM }}
      DEPLOYMENT_TARGET: ${{ vars.DEPLOYMENT_TARGET }}
      AZURE_CONTAINER_APPS_WORKLOAD_PROFILE: ${{ vars.AZURE_CONTAINER_APPS_WORKLOAD_PROFILE }}
      USE_CHAT_HISTORY_BROWSER: ${{ vars.USE_CHAT_HISTORY_BROWSER }}
      USE_MEDIA_DESCRIBER_AZURE_CU: ${{ vars.USE_MEDIA_DESCRIBER_AZURE_CU }}
    steps:
      - name: Comment on pull request
        uses: actions/github-script@v7
        with:
          script: |
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: "Starting evaluation! Check the Actions tab for progress, or wait for a comment with the results."
            })

      - name: Checkout pull request
        uses: actions/checkout@v4
        with:
          ref: refs/pull/${{ github.event.issue.number }}/head

      - name: Install uv
        uses: astral-sh/setup-uv@v5
        with:
          enable-cache: true
          version: "0.4.20"
          cache-dependency-glob: "requirements**.txt"
          python-version: "3.11"

      - name: Setup node
        uses: actions/setup-node@v4
        with:
          node-version: 18

      - name: Install azd
        uses: Azure/setup-azd@v2.0.0

      - name: Login to Azure with az CLI
        uses: azure/login@v2
        with:
          client-id: ${{ env.AZURE_CLIENT_ID }}
          tenant-id: ${{ env.AZURE_TENANT_ID }}
          subscription-id: ${{ env.AZURE_SUBSCRIPTION_ID }}

      - name: Set az account
        uses: azure/CLI@v2
        with:
          inlineScript: |
            az account set --subscription ${{ env.AZURE_SUBSCRIPTION_ID }}

      - name: Log in to Azure with azd (Federated Credentials)
        if: ${{ env.AZURE_CLIENT_ID != '' }}
        run: |
          azd auth login `
            --client-id "$Env:AZURE_CLIENT_ID" `
            --federated-credential-provider "github" `
            --tenant-id "$Env:AZURE_TENANT_ID"
        shell: pwsh

      - name: Refresh azd environment variables
        run: |
          azd env refresh -e $AZURE_ENV_NAME --no-prompt
        env:
          AZD_INITIAL_ENVIRONMENT_CONFIG: ${{ secrets.AZD_INITIAL_ENVIRONMENT_CONFIG }}

      - name: Build frontend
        run: |
          cd ./app/frontend
          npm install
          npm run build

      - name: Install dependencies
        run: |
          uv pip install -r requirements-dev.txt

      - name: Run local server in background
        run: |
          cd app/backend
          RUNNER_TRACKING_ID="" && (nohup python3 -m quart --app main:app run --port 50505 > serverlogs.out 2> serverlogs.err &)
          cd ../..

      - name: Install evaluate dependencies
        run: |
          uv pip install -r evals/requirements.txt

      - name: Evaluate local RAG flow
        run: |
          python evals/evaluate.py --targeturl=http://127.0.0.1:50505/chat --resultsdir=evals/results/pr${{ github.event.issue.number }}

      - name: Upload eval results as build artifact
        if: ${{ success() }}
        uses: actions/upload-artifact@v4
        with:
          name: eval_result
          path: ./evals/results/pr${{ github.event.issue.number }}

      - name: Upload server logs as build artifact
        uses: actions/upload-artifact@v4
        with:
          name: server_logs
          path: ./app/backend/serverlogs.out

      - name: Upload server error logs as build artifact
        uses: actions/upload-artifact@v4
        with:
          name: server_error_logs
          path: ./app/backend/serverlogs.err

      - name: Summarize results
        if: ${{ success() }}
        run: |
          echo "## Evaluation results" >> eval-summary.md
          python -m evaltools summary evals/results --output=markdown >> eval-summary.md
          echo "## Answer differences across runs" >> run-diff.md
          python -m evaltools diff evals/results/baseline evals/results/pr${{ github.event.issue.number }} --output=markdown >> run-diff.md
          cat eval-summary.md >> $GITHUB_STEP_SUMMARY
          cat run-diff.md >> $GITHUB_STEP_SUMMARY

      - name: Comment on pull request
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const summaryPath = "eval-summary.md";
            const summary = fs.readFileSync(summaryPath, 'utf8');
            const runId = process.env.GITHUB_RUN_ID;
            const repo = process.env.GITHUB_REPOSITORY;
            const actionsUrl = `https://github.com/${repo}/actions/runs/${runId}`;
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: `${summary}\n\n[Check the workflow run for more details](${actionsUrl}).`
            })
```
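The job's `if:` condition gates the workflow with GitHub's `contains()` over a literal JSON-like string, which for these values behaves like a membership check. A hypothetical Python sketch of the same gating logic (function and parameter names are made up for illustration):

```python
# Sketch of the workflow's `if:` gate: run only when a trusted commenter
# posts exactly "/evaluate" on a pull request.
ALLOWED_ASSOCIATIONS = ["OWNER", "CONTRIBUTOR", "COLLABORATOR", "MEMBER"]


def should_run_evaluation(author_association: str, is_pull_request: bool, comment_body: str) -> bool:
    return (
        author_association in ALLOWED_ASSOCIATIONS
        and is_pull_request
        and comment_body == "/evaluate"
    )


print(should_run_evaluation("MEMBER", True, "/evaluate"))   # True
print(should_run_evaluation("NONE", True, "/evaluate"))     # False: untrusted commenter
print(should_run_evaluation("OWNER", False, "/evaluate"))   # False: not a pull request
```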
Changes to `setup_clients()` (two hunks):

```diff
@@ -420,7 +420,7 @@ async def setup_clients():
     OPENAI_HOST = os.getenv("OPENAI_HOST", "azure")
     OPENAI_CHATGPT_MODEL = os.environ["AZURE_OPENAI_CHATGPT_MODEL"]
     OPENAI_EMB_MODEL = os.getenv("AZURE_OPENAI_EMB_MODEL_NAME", "text-embedding-ada-002")
-    OPENAI_EMB_DIMENSIONS = int(os.getenv("AZURE_OPENAI_EMB_DIMENSIONS", 1536))
+    OPENAI_EMB_DIMENSIONS = int(os.getenv("AZURE_OPENAI_EMB_DIMENSIONS") or 1536)
     # Used with Azure OpenAI deployments
     AZURE_OPENAI_SERVICE = os.getenv("AZURE_OPENAI_SERVICE")
     AZURE_OPENAI_GPT4V_DEPLOYMENT = os.environ.get("AZURE_OPENAI_GPT4V_DEPLOYMENT")

@@ -450,8 +450,8 @@ async def setup_clients():
     KB_FIELDS_CONTENT = os.getenv("KB_FIELDS_CONTENT", "content")
     KB_FIELDS_SOURCEPAGE = os.getenv("KB_FIELDS_SOURCEPAGE", "sourcepage")

-    AZURE_SEARCH_QUERY_LANGUAGE = os.getenv("AZURE_SEARCH_QUERY_LANGUAGE", "en-us")
-    AZURE_SEARCH_QUERY_SPELLER = os.getenv("AZURE_SEARCH_QUERY_SPELLER", "lexicon")
+    AZURE_SEARCH_QUERY_LANGUAGE = os.getenv("AZURE_SEARCH_QUERY_LANGUAGE") or "en-us"
+    AZURE_SEARCH_QUERY_SPELLER = os.getenv("AZURE_SEARCH_QUERY_SPELLER") or "lexicon"

     AZURE_SEARCH_SEMANTIC_RANKER = os.getenv("AZURE_SEARCH_SEMANTIC_RANKER", "free").lower()

     AZURE_SPEECH_SERVICE_ID = os.getenv("AZURE_SPEECH_SERVICE_ID")
```

**Review comment:** This was the fix for the weird search error
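The switch from a two-argument `os.getenv` default to `or` matters when the environment variable is set but empty: the two-argument form returns the empty string (because the variable exists), while `or` treats the empty string as falsy and applies the fallback. A minimal sketch, using the embedding-dimensions variable from the diff above:

```python
import os

# Simulate a variable that is set but empty (as azd env refresh can produce).
os.environ["AZURE_OPENAI_EMB_DIMENSIONS"] = ""

# Two-argument default: the empty string IS returned, so int("") raises ValueError.
try:
    dims = int(os.getenv("AZURE_OPENAI_EMB_DIMENSIONS", 1536))
except ValueError:
    dims = None
print(dims)  # None -- the default was never used

# `or` form: "" is falsy, so the fallback 1536 applies.
dims = int(os.getenv("AZURE_OPENAI_EMB_DIMENSIONS") or 1536)
print(dims)  # 1536
```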
Documentation changes:

````diff
@@ -81,7 +81,13 @@ Run the evaluation script by running the following command:
 python evals/evaluate.py
 ```
 
-🕰️ This may take a long time, possibly several hours, depending on the number of ground truth questions. You can specify `--numquestions` argument for a test run on a subset of the questions.
+The options are:
+
+* `numquestions`: The number of questions to evaluate. By default, this is all questions in the ground truth data.
+* `resultsdir`: The directory to write the evaluation results. By default, this is a timestamped folder in `evals/results`. This option can also be specified in `eval_config.json`.
+* `targeturl`: The URL of the running application to evaluate. By default, this is `http://localhost:50505`. This option can also be specified in `eval_config.json`.
+
+🕰️ This may take a long time, possibly several hours, depending on the number of ground truth questions.
 
 ## Review the evaluation results
````

**Review comment:** tokens allocated to deployment also affect this, right?

**Reply:** Yes! I'll add a note.

````diff
@@ -93,12 +99,18 @@ You can see a summary of results across all evaluation runs by running the following command:
 python -m evaltools summary evals/results
 ```
 
-Compare answers across runs by running the following command:
+Compare answers to the ground truth by running the following command:
 
 ```bash
 python -m evaltools diff evals/results/baseline/
 ```
 
+Compare answers across two runs by running the following command:
+
+```bash
+python -m evaltools diff evals/results/baseline/ evals/results/SECONDRUNHERE
+```
+
 ## Run bulk evaluation on a PR
 
 To run the evaluation on the changes in a PR, you can add a `/evaluate` comment to the PR. This will trigger the evaluation workflow to run the evaluation on the PR changes and will post the results to the PR.
````
New citation metric added to the evaluation script:

```diff
@@ -14,6 +14,28 @@
 logger = logging.getLogger("ragapp")
 
 
+class AnyCitationMetric(BaseMetric):
+    METRIC_NAME = "any_citation"
+
+    @classmethod
+    def evaluator_fn(cls, **kwargs):
+        def any_citation(*, response, **kwargs):
+            if response is None:
+                logger.warning("Received response of None, can't compute any_citation metric. Setting to -1.")
+                return {cls.METRIC_NAME: -1}
+            return {cls.METRIC_NAME: bool(re.search(r"\[([^\]]+)\.\w{3,4}(#page=\d+)*\]", response))}
+
+        return any_citation
+
+    @classmethod
+    def get_aggregate_stats(cls, df):
+        df = df[df[cls.METRIC_NAME] != -1]
+        return {
+            "total": int(df[cls.METRIC_NAME].sum()),
+            "rate": round(df[cls.METRIC_NAME].mean(), 2),
+        }
+
+
 class CitationsMatchedMetric(BaseMetric):
     METRIC_NAME = "citations_matched"
```

**Review comment:** nice regex 👍
```diff
@@ -80,6 +102,8 @@ def get_azure_credential():
 openai_config = get_openai_config()
 
+register_metric(CitationsMatchedMetric)
+register_metric(AnyCitationMetric)
 
 run_evaluate_from_config(
     working_dir=Path(__file__).parent,
     config_path="evaluate_config.json",
```
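`get_aggregate_stats` drops the `-1` sentinel (questions with no response) before aggregating, so unanswerable questions don't drag down the citation rate. The same arithmetic in plain Python, mirroring the pandas filtering with made-up scores:

```python
# Hypothetical per-question scores: 1 = citation found, 0 = none, -1 = no response.
scores = [1, 1, 0, -1, 1]

# Exclude rows where the metric could not be computed, as get_aggregate_stats does.
kept = [s for s in scores if s != -1]

stats = {
    "total": sum(kept),
    "rate": round(sum(kept) / len(kept), 2),
}
print(stats)  # {'total': 3, 'rate': 0.75}
```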
**Review comment:** This also results in a bad error if the env variable is an empty string