add HellaSwag-like tests #1
Since both tests compute the perplexity, perhaps --hellaswagish and --basic? The timing values for the tests are not very useful if there is no data on the system running the test, such as CPU, GPU, and RAM. The timing value of interest would be tokens per second, taken from llama.cpp's output. A more detailed description of the tests in the readme would be nice; maybe I could help with that.
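For the tokens-per-second idea, here is a minimal sketch of pulling that figure out of llama.cpp's output, assuming the `llama_print_timings` summary line (the exact wording may vary across builds):

```python
import re

def tokens_per_second(llama_output: str) -> float | None:
    """Extract the eval-speed figure from llama.cpp's timing summary,
    e.g. "... 9.65 ms per token, 103.68 tokens per second)".
    Returns None if no timing line is present."""
    match = re.search(r"([\d.]+)\s+tokens per second", llama_output)
    return float(match.group(1)) if match else None
```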
Agree, at this stage timing is not very useful, as the underlying hardware will vary. Though this mini project was started in response to a request from GG for perplexity + latency benchmarking, with the expectation that an Azure bare-metal or dedicated host would be imminent. If/when that happens, IMO the latency scores start to become useful: with the same dedicated hardware and model(s) over time (GG suggests open-llama 7B & 13B), the quality and performance of the code can be assessed across builds. (Please correct me if I've missed something.)

"More detailed description": yes, please write something up and drop it here or in a PR.

FYI: once this code is working well, debugged, and sanity-checked by those more knowledgeable than me, I plan to submit the Python test runner to llama.cpp via a PR (
My simple hellaswagish approach was not good enough, so I am working on a real HellaSwag implementation instead.
The parameter names above will work, since the more accurate HellaSwag scoring does not compute perplexity.
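For reference, a minimal sketch of that scoring loop, assuming the published HellaSwag dataset's `ctx`/`endings`/`label` fields and a hypothetical `ending_logprob(ctx, ending)` callback that returns the model's summed token log-probabilities for the ending given the context:

```python
from typing import Callable, Sequence

def hellaswag_accuracy(
    examples: Sequence[dict],
    ending_logprob: Callable[[str, str], float],
) -> float:
    """Each example looks like {"ctx": str, "endings": [str, ...], "label": int}."""
    correct = 0
    for ex in examples:
        # Length-normalized log-likelihood of each candidate ending;
        # the ending with the highest score is the prediction.
        scores = [
            ending_logprob(ex["ctx"], end) / max(len(end), 1)
            for end in ex["endings"]
        ]
        prediction = scores.index(max(scores))
        correct += int(prediction == ex["label"])
    return correct / len(examples)
```

Whether to normalize by ending length is a design choice; unnormalized sums also work, and published harnesses report both variants.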
See ggml-org/llama.cpp#2321
Perhaps:

- `--hellaswag`
- `--perplexity`

- [ ] CLI arguments
- [ ] Automatically download test corpus
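A minimal sketch of both checklist items together, assuming argparse and a hypothetical corpus URL (substitute the real dataset location):

```python
import argparse
import os
import urllib.request

# Hypothetical corpus location; substitute the real dataset URL.
CORPUS_URL = "https://example.com/hellaswag_val.jsonl"
CORPUS_PATH = "hellaswag_val.jsonl"

def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="llama.cpp quality/latency tests")
    parser.add_argument("--hellaswag", action="store_true",
                        help="run the HellaSwag multiple-choice test")
    parser.add_argument("--perplexity", action="store_true",
                        help="run the perplexity test")
    return parser.parse_args()

def ensure_corpus() -> str:
    """Download the test corpus on first run, then reuse the cached copy."""
    if not os.path.exists(CORPUS_PATH):
        urllib.request.urlretrieve(CORPUS_URL, CORPUS_PATH)
    return CORPUS_PATH

if __name__ == "__main__":
    args = parse_args()
    corpus = ensure_corpus()
    print(f"hellaswag={args.hellaswag} perplexity={args.perplexity} corpus={corpus}")
```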