Commit 9e3893c

test: refine smoke judge comparison rules and output
- Focus comparison on matching event types to reduce false negatives
- Drop "ignore callProgress" rule (we're eliding them from the event stream before sending them to the judge now)

Signed-off-by: Nick Hale <[email protected]>
1 parent d70f919 commit 9e3893c
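
The commit message says callProgress events are now elided from the event stream before it reaches the judge. The code that does this is not part of this diff, so the following is only a minimal sketch of what such a filter might look like. The `Event` struct, the `elideByType` helper, and the `callStart` event type are hypothetical names used for illustration; only `callProgress` and `callFinish` appear in the commit.

```go
package main

import "fmt"

// Event is a hypothetical, simplified stand-in for an entry in the event
// stream. The repository's real event type is not shown in this commit.
type Event struct {
	Type    string
	Content string
}

// elideByType returns a new slice containing every event whose Type differs
// from skip, preserving the original order. The input slice is not modified.
func elideByType(events []Event, skip string) []Event {
	kept := make([]Event, 0, len(events))
	for _, e := range events {
		if e.Type != skip {
			kept = append(kept, e)
		}
	}
	return kept
}

func main() {
	// "callStart" is an assumed event type, used here only as a placeholder.
	events := []Event{
		{Type: "callStart"},
		{Type: "callProgress", Content: "partial output"},
		{Type: "callFinish", Content: "done"},
	}

	// Print the types that would remain after eliding callProgress events.
	for _, e := range elideByType(events, "callProgress") {
		fmt.Println(e.Type)
	}
}
```

Filtering before the judge sees the stream means the prompt no longer needs a rule telling the judge to skip those events, which matches the rule removed in this commit.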

2 files changed, +4 -2 lines changed

pkg/tests/judge/judge.go

+2

@@ -40,6 +40,8 @@ After making a determination, respond with a JSON object that conforms to the fo
 ]
 }
 
+If you determine actual and expected are not equivalent, include a diff of the parts of actual and expected that are not equivalent in the reasoning field of your response.
+
 Your responses are concise and include only the json object described above.
 `
 

pkg/tests/smoke/smoke_test.go

+2 -2

@@ -82,8 +82,8 @@ func TestSmoke(t *testing.T) {
 expectedEvents,
 actualEvents,
 `
-- disregard differences in timestamps, generated IDs, natural language verbiage, and event order
-- omit callProgress events from the comparison
+- disregard differences in event order, timestamps, generated IDs, and natural language verbiage, grammar, and punctuation
+- compare events with matching event types
 - the overall stream of events and set of tools called should roughly match
 - arguments passed in tool calls should be roughly the same
 - the final callFinish event should be semantically similar
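
The new rule asks the judge, a language model, to compare events with matching event types while ignoring event order. As a rough illustration of that idea only, the sketch below groups two streams by type and reports the types whose event counts differ. This is not what the commit implements: the comparison in the test is done by the judge prompt. The `Event` struct and both helper functions are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Event is a hypothetical, simplified stand-in for an entry in the event
// stream. The repository's real event type is not shown in this commit.
type Event struct {
	Type    string
	Content string
}

// groupByType buckets events by their Type, so that events of the same type
// can be compared with each other regardless of their position in the stream.
func groupByType(events []Event) map[string][]Event {
	groups := make(map[string][]Event)
	for _, e := range events {
		groups[e.Type] = append(groups[e.Type], e)
	}
	return groups
}

// mismatchedTypes returns, in sorted order, every event type that occurs a
// different number of times in expected than in actual.
func mismatchedTypes(expected, actual []Event) []string {
	exp, act := groupByType(expected), groupByType(actual)
	var out []string
	for t, evs := range exp {
		if len(act[t]) != len(evs) {
			out = append(out, t)
		}
	}
	// Types present only in actual have an expected count of zero.
	for t := range act {
		if _, ok := exp[t]; !ok {
			out = append(out, t)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	expected := []Event{{Type: "callFinish", Content: "done"}}
	actual := []Event{{Type: "callFinish", Content: "Done."}}

	// Content differences are ignored here; only per-type counts are checked.
	fmt.Println(len(mismatchedTypes(expected, actual)))
}
```

A count check like this cannot judge semantic similarity, which is why the test delegates the actual comparison to the judge.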
