add HellaSwag-like tests #1

Open
ianscrivener opened this issue Jul 22, 2023 · 4 comments
Labels: enhancement (New feature or request)

Comments

ianscrivener (Owner) commented Jul 22, 2023

See ggml-org/llama.cpp#2321
Perhaps: --hellaswag & --perplexity CLI arguments
Automatically download test corpus
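A minimal sketch of how those flags and the corpus download could look in a Python runner like the one mentioned later in this thread; the flag names follow the suggestion above, while the corpus URL and file name are placeholders, not real locations:

```python
import argparse
import urllib.request
from pathlib import Path

# Placeholder URL; the real test corpus location would go here.
CORPUS_URL = "https://example.com/hellaswag_val.txt"

def fetch_corpus(dest: Path) -> Path:
    """Download the test corpus once and cache it locally."""
    if not dest.exists():
        urllib.request.urlretrieve(CORPUS_URL, dest)
    return dest

def main() -> None:
    parser = argparse.ArgumentParser(description="llama.cpp perplexity test runner")
    parser.add_argument("--hellaswag", action="store_true",
                        help="run the HellaSwag-like test")
    parser.add_argument("--perplexity", action="store_true",
                        help="run the basic perplexity test")
    args = parser.parse_args()

    if args.hellaswag or args.perplexity:
        corpus = fetch_corpus(Path("hellaswag_val.txt"))
        print(f"corpus ready at {corpus}")

if __name__ == "__main__":
    main()
```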

ianscrivener added the enhancement label on Jul 22, 2023

klosax commented Jul 23, 2023

Since both tests compute perplexity, perhaps --hellaswagish and --basic?

The timing values for the tests are not very useful if there is no data on the system running the test, like CPU, GPU and RAM. The timing value of interest would be tokens per second, taken from the llama.cpp output.
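A minimal sketch of pulling that figure out of the timing summary llama.cpp prints; the exact wording of the line is an assumption based on the usual llama_print_timings output and may differ by build:

```python
import re

def eval_tokens_per_second(output: str) -> float | None:
    """Derive eval tokens/sec from llama.cpp's timing summary.

    Assumes the output contains a line of roughly this form:
      llama_print_timings: eval time = 1234.56 ms / 100 runs ( 12.35 ms per token)
    """
    match = re.search(r"eval time.*?([\d.]+) ms per token", output)
    if match is None:
        return None
    ms_per_token = float(match.group(1))
    return 1000.0 / ms_per_token if ms_per_token > 0 else None
```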

A more detailed description of the tests in the readme would be nice; maybe I could help with that.

ianscrivener (Owner, Author) commented Jul 24, 2023

--basic it is...

Agree, at this stage timing is not very useful, as the underlying hardware will vary. Though this mini project was started in response to a request from GG for perplexity + latency benchmarking... with the expectation that an Azure bare-metal or dedicated host would be imminent. IF/when that happens, IMO the latency scores start to become useful: with the same dedicated hardware and model(s) over time (GG suggests open-llama 7B & 13B), the quality and performance of the code can be assessed across builds. (Pls correct me if I've missed something.)

"more detailed description": Yes, pls write something up and drop it here or in a PR

FYI: once this code is working well and debugged... sanity checked by those more knowledgeable than me... I plan to submit the Python test runner to llama.cpp via a PR (/examples/perplexity_runner).

klosax commented Jul 25, 2023

My simple hellaswagish approach was not good enough, so I am working on a real HellaSwag implementation instead.

klosax commented Jul 26, 2023

> Perhaps: --hellaswag & --perplexity CLI arguments

The parameter names above will work, since the more accurate HellaSwag scoring does not compute perplexity.
I made a PR ggml-org/llama.cpp#2389 for the new HellaSwag scoring, so you can try it when it gets merged.
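For context, a minimal sketch of the HellaSwag-style selection rule: each candidate ending is scored by the sum of its token log-probabilities given the context, and the ending with the highest total wins; accuracy is the fraction of tasks where the winner matches the gold label. The `log_probs` callback here is a hypothetical stand-in for whatever the model evaluation returns, and the task layout is an assumption:

```python
from typing import Callable, Sequence

def hellaswag_accuracy(
    tasks: Sequence[dict],
    log_probs: Callable[[str, str], float],
) -> float:
    """Score HellaSwag-style tasks by picking the most likely ending.

    Each task is assumed to look like:
      {"ctx": "...", "endings": ["...", "...", "...", "..."], "label": 2}
    log_probs(ctx, ending) is a hypothetical callback returning the sum of
    token log-probabilities of `ending` conditioned on `ctx`.
    """
    correct = 0
    for task in tasks:
        scores = [log_probs(task["ctx"], ending) for ending in task["endings"]]
        # The predicted ending is the one with the highest total log-probability;
        # note that no perplexity is computed anywhere in this scoring rule.
        if scores.index(max(scores)) == task["label"]:
            correct += 1
    return correct / len(tasks)
```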
