tool calling eval #256

Open
Wants to merge 9 commits into base: main

Ibrahim-Haroon (Contributor)

Description

Used Gemini to create 40 simple functions that are passed to the model as tools. Created queries.json, which contains 15 queries per function and serves as an "eval set" for measuring tool-selection accuracy. The code is similar to tests/eval_tests/tests.py, but running eval.py reports successful calls per tool rather than a single overall figure.
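A minimal sketch of per-tool scoring, assuming (hypothetically) that each query's expected tool and the tool the model actually called have already been collected as pairs. `per_tool_accuracy` and the input format are illustrative only, not the code in eval.py.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def per_tool_accuracy(results: List[Tuple[str, Optional[str]]]) -> Dict[str, float]:
    """Return, for each expected tool, the fraction of its queries where the model
    called exactly that tool.

    `results` holds (expected_tool, called_tool) pairs; called_tool is None when
    the model made no tool call. This input format is a hypothetical example.
    """
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for expected, called in results:
        totals[expected] += 1
        if called == expected:
            hits[expected] += 1
    # Every tool seen in `totals` gets an entry, even if it was never hit.
    return {tool: hits[tool] / totals[tool] for tool in totals}


if __name__ == "__main__":
    # Toy data: two queries each for "add" and "subtract".
    results = [
        ("add", "add"),
        ("add", "multiply"),       # wrong tool selected
        ("subtract", "subtract"),
        ("subtract", None),        # no tool call at all
    ]
    for tool, acc in sorted(per_tool_accuracy(results).items()):
        print(f"{tool}: {acc:.0%}")
```

Because every function has the same number of queries (15), the mean of these per-tool values equals the overall accuracy; with unequal query counts the two would differ.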

How Has This Been Tested?

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work.

…ests LlamaStack models against various tools and generates detailed performance plots, including per-tool analysis.
…uation. These tools enable structured metric tracking to CSV and dynamic visualization of performance through generated plots.
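The CSV side of the metric tracking could look roughly like the sketch below. The column names (tool, correct, total, accuracy), the function name `write_metrics_csv`, and the file name metrics.csv are assumptions for illustration, not the PR's actual schema; plotting is left out so the example needs only the standard library.

```python
import csv
import io
from typing import Mapping, TextIO, Tuple


def write_metrics_csv(counts: Mapping[str, Tuple[int, int]], out: TextIO) -> None:
    """Write one CSV row per tool with correct calls, total queries, and accuracy.

    `counts` maps tool name -> (correct, total); the column layout is an
    illustrative assumption, not the PR's actual schema.
    """
    writer = csv.writer(out)
    writer.writerow(["tool", "correct", "total", "accuracy"])
    for tool in sorted(counts):
        correct, total = counts[tool]
        accuracy = correct / total if total else 0.0
        writer.writerow([tool, correct, total, f"{accuracy:.3f}"])


if __name__ == "__main__":
    # Writing to an in-memory buffer here; for a real file use
    # open("metrics.csv", "w", newline="") so the csv module controls line endings.
    buf = io.StringIO()
    write_metrics_csv({"add": (14, 15), "subtract": (15, 15)}, buf)
    print(buf.getvalue())
```

Sorting by tool name keeps rows in a stable order across runs, which makes diffs between CSV files from different models easier to read.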

Added newline to end of queries.json
Ibrahim-Haroon force-pushed the ibrahim/tool-calling-eval branch from 3785414 to 564c494 on May 29, 2025, 19:36
Comment on lines 10 to 11
:description: Adds two numbers.
:use_case: Use when the user wants to find the sum, total, or combined value of two numbers.
Contributor

I'm not sure you need the :description:, :use_case:, or :returns: fields (at least according to the client tool decorator docstring). Have you found this to be superior to a plain docstring function description?

https://github.com/meta-llama/llama-stack-client-python/blob/645d2195c5af1c6f903cb93c293319d8f94c36cc/src/llama_stack_client/lib/agents/client_tool.py#L150-L170
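For comparison, here is the same tool written both ways: a plain summary line plus RST-style :param: fields, and a version mirroring the :description:/:use_case: lines quoted above. The parameter names and `first_paragraph` are hypothetical; the helper just shows one simple way a wrapper might pull a description out of a docstring and is not the logic in client_tool.py, which should be checked at the link.

```python
import inspect


def add_plain(a: float, b: float) -> float:
    """Add two numbers and return their sum.

    :param a: The first number.
    :param b: The second number.
    """
    return a + b


def add_tagged(a: float, b: float) -> float:
    """
    :description: Adds two numbers.
    :use_case: Use when the user wants to find the sum, total, or combined value of two numbers.
    :param a: The first number.
    :param b: The second number.
    """
    return a + b


def first_paragraph(func) -> str:
    """Hypothetical helper: join the docstring lines that come before the first
    blank line or the first ':param' field."""
    doc = inspect.getdoc(func) or ""  # getdoc strips common indentation
    lines = []
    for line in doc.splitlines():
        if not line.strip() or line.lstrip().startswith(":param"):
            break
        lines.append(line.strip())
    return " ".join(lines)


if __name__ == "__main__":
    print(first_paragraph(add_plain))
    print(first_paragraph(add_tagged))
```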

Contributor Author

Ibrahim-Haroon, Jun 2, 2025

At first I added the :description: and :use_case: tags because I assumed they were required; I hadn't looked at the docs. But yes, looking at client_tool.py, it deliberately discards the return information, so :returns: won't be needed. From limited experimentation, :description: and :use_case: seem to slightly increase tool-selection accuracy.
experiment.pdf
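One way to make a "slightly increases accuracy" comparison concrete is to run the same query set against both docstring variants and report per-tool deltas next to the overall change. The sketch assumes per-tool (correct, total) counts exist for each run; `overall_accuracy`, `compare_variants`, and the numbers are hypothetical and not taken from experiment.pdf.

```python
from typing import Dict, Tuple

# tool name -> (correct calls, total queries); totals are assumed to be > 0
Counts = Dict[str, Tuple[int, int]]


def overall_accuracy(counts: Counts) -> float:
    """Correct calls divided by total queries across all tools."""
    correct = sum(c for c, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return correct / total if total else 0.0


def compare_variants(baseline: Counts, variant: Counts) -> Dict[str, float]:
    """Per-tool accuracy change (variant minus baseline) for tools in both runs."""
    deltas: Dict[str, float] = {}
    for tool in sorted(baseline.keys() & variant.keys()):
        b_correct, b_total = baseline[tool]
        v_correct, v_total = variant[tool]
        deltas[tool] = v_correct / v_total - b_correct / b_total
    return deltas


if __name__ == "__main__":
    # Made-up numbers purely to show the shape of the output.
    plain = {"add": (12, 15), "subtract": (15, 15)}
    tagged = {"add": (14, 15), "subtract": (15, 15)}
    print(f"overall: {overall_accuracy(plain):.3f} -> {overall_accuracy(tagged):.3f}")
    for tool, delta in compare_variants(plain, tagged).items():
        print(f"{tool}: {delta:+.3f}")
```

With 15 queries per tool, a single query moves that tool's accuracy by about 6.7 percentage points, so small per-tool deltas are worth re-running before drawing conclusions.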

Ibrahim-Haroon force-pushed the ibrahim/tool-calling-eval branch 3 times, most recently from ad5f2bd to 3f3142a on May 30, 2025, 20:12
Ibrahim-Haroon force-pushed the ibrahim/tool-calling-eval branch from 3f3142a to e9dd497 on June 2, 2025, 17:05
Labels: None yet
Projects: None yet
2 participants