add HellaSwag-like tests #1
Since both tests compute the perplexity, perhaps --hellaswagish and --basic? The timing values for the tests are not very useful if there is no data on the system running the test, such as CPU, GPU, and RAM. The timing value of interest would be tokens per second, taken from llama.cpp's output. A more detailed description of the tests in the readme would be nice; maybe I could help with that.
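For the tokens-per-second idea, here is a minimal sketch of pulling that figure out of llama.cpp's output, assuming the `llama_print_timings` summary line (the exact wording may vary across builds):

```python
import re

def tokens_per_second(llama_output: str) -> float | None:
    """Extract the eval-speed figure from llama.cpp's timing summary,
    e.g. "... 9.65 ms per token, 103.68 tokens per second)".
    Returns None if no timing line is present."""
    match = re.search(r"([\d.]+)\s+tokens per second", llama_output)
    return float(match.group(1)) if match else None
```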
Agree, at this stage timing is not very useful, as the underlying hardware will vary. Though this mini project was started in response to a request from GG for perplexity + latency benchmarking, with the expectation that an Azure bare-metal or dedicated host would be imminent. If/when that happens, IMO the latency scores start to become useful: with the same dedicated hardware and model(s) over time (GG suggests open-llama 7B & 13B), the quality and performance of the code can be assessed across builds. (Please correct me if I've missed something.)

"More detailed description": yes, please write something up and drop it here or in a PR.

FYI: once this code is working well, debugged, and sanity-checked by those more knowledgeable than me, I plan to submit the Python test runner to llama.cpp via a PR (
My simple hellaswagish approach was not good enough, so I am working on a real HellaSwag implementation instead.
The parameter names above will work, since the more accurate HellaSwag scoring does not compute perplexity.
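For reference, a minimal sketch of that scoring loop, assuming the published HellaSwag dataset's `ctx`/`endings`/`label` fields and a hypothetical `ending_logprob(ctx, ending)` callback that returns the model's summed token log-probabilities for the ending given the context:

```python
from typing import Callable, Sequence

def hellaswag_accuracy(
    examples: Sequence[dict],
    ending_logprob: Callable[[str, str], float],
) -> float:
    """Each example looks like {"ctx": str, "endings": [str, ...], "label": int}."""
    correct = 0
    for ex in examples:
        # Length-normalized log-likelihood of each candidate ending;
        # the ending with the highest score is the prediction.
        scores = [
            ending_logprob(ex["ctx"], end) / max(len(end), 1)
            for end in ex["endings"]
        ]
        prediction = scores.index(max(scores))
        correct += int(prediction == ex["label"])
    return correct / len(examples)
```

Whether to normalize by ending length is a design choice; unnormalized sums also work, and published harnesses report both variants.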
See ggml-org/llama.cpp#2321
Perhaps:

- `--hellaswag`
- `--perplexity`

- [ ] CLI arguments
- [ ] Automatically download test corpus
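A minimal sketch of both checklist items together, assuming argparse and a hypothetical corpus URL (substitute the real dataset location):

```python
import argparse
import os
import urllib.request

# Hypothetical corpus location; substitute the real dataset URL.
CORPUS_URL = "https://example.com/hellaswag_val.jsonl"
CORPUS_PATH = "hellaswag_val.jsonl"

def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="llama.cpp quality/latency tests")
    parser.add_argument("--hellaswag", action="store_true",
                        help="run the HellaSwag multiple-choice test")
    parser.add_argument("--perplexity", action="store_true",
                        help="run the perplexity test")
    return parser.parse_args()

def ensure_corpus() -> str:
    """Download the test corpus on first run, then reuse the cached copy."""
    if not os.path.exists(CORPUS_PATH):
        urllib.request.urlretrieve(CORPUS_URL, CORPUS_PATH)
    return CORPUS_PATH

if __name__ == "__main__":
    args = parse_args()
    corpus = ensure_corpus()
    print(f"hellaswag={args.hellaswag} perplexity={args.perplexity} corpus={corpus}")
```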