
[DeepSparse Evaluation API] Perplexity eval support for openai_humaneval, c4, wikitext2 #1586


Merged

Conversation

dbogunowicz
Contributor

Example use

deepsparse.evaluate hf:mgoin/TinyStories-1M-deepsparse --integration perplexity --dataset wikitext2 --limit 2 --batch_size 2 --max_sequence_length 128

2024-02-06 18:14:27 deepsparse.evaluation.cli INFO     Creating deepsparse pipeline to evaluate from model path: hf:mgoin/TinyStories-1M-deepsparse
2024-02-06 18:14:27 deepsparse.evaluation.cli INFO     Datasets to evaluate on: ['wikitext2']
Batch size: 2
Splits to evaluate on: None
Metrics to evaluate on: None
Additional integration arguments supplied: {'limit': 2, 'max_sequence_length': 128}
Fetching 11 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 11/11 [00:00<00:00, 149796.57it/s]
DeepSparse, Copyright 2021-present / Neuralmagic, Inc. version: 1.7.0.20240104 COMMUNITY | (86c38139) (release) (optimized) (system=avx2, binary=avx2)
2024-02-06 18:14:30 deepsparse.evaluation.integrations.perplexity INFO     Argument `splits` is None. Defaulting to `test` split.
Token indices sequence length is longer than the specified maximum sequence length for this model (287645 > 2048). Running this sequence through the model will result in indexing errors
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:02<00:00,  1.51it/s]
2024-02-06 18:14:38 deepsparse.evaluation.cli INFO     Evaluation done. Results:
[Evaluation(task='perplexity', dataset=Dataset(type=None, name='wikitext2', config=None, split='test'), metrics=[Metric(name='perplexity', value=24642.261152241255)], samples=None)]
2024-02-06 18:14:38 deepsparse.evaluation.cli INFO     Saving the evaluation results to /nm/drive0/damian/deepsparse/result.json
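For context, the perplexity value reported above is the exponentiated mean negative log-likelihood of the evaluated tokens. The following is only an illustrative sketch of that computation, not the DeepSparse implementation; the array names and the helper function are assumptions for the example:

import numpy as np

def perplexity_from_logits(logits: np.ndarray, input_ids: np.ndarray) -> float:
    # logits: (seq_len, vocab_size) raw model outputs for one sequence
    # input_ids: (seq_len,) token ids of that sequence
    # Shift so that the logits at position i predict the token at position i + 1.
    shifted_logits = logits[:-1]
    targets = input_ids[1:]
    # Numerically stable log-softmax over the vocabulary.
    shifted = shifted_logits - shifted_logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Negative log-likelihood of the observed next tokens.
    nll = -log_probs[np.arange(len(targets)), targets]
    # Perplexity is the exponent of the mean negative log-likelihood.
    return float(np.exp(nll.mean()))

A very small model such as TinyStories-1M evaluated on wikitext2 with a 128-token window will naturally produce a large perplexity, which is consistent with the value in the log above.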

@dbogunowicz dbogunowicz changed the base branch from main to feature/damian/perplexity_eval February 6, 2024 18:16
@dbogunowicz dbogunowicz requested review from bfineran, dsikka, anmarques and rahul-tuli and removed request for dsikka February 6, 2024 18:18
@dbogunowicz dbogunowicz changed the title [DeepSparse Evaluation API] Support for openai_humaneval, c4, wikitext2 [DeepSparse Evaluation API] Perplexity eval support for openai_humaneval, c4, wikitext2 Feb 6, 2024
@dbogunowicz dbogunowicz merged commit 898f677 into feature/damian/perplexity_eval Feb 9, 2024
@dbogunowicz dbogunowicz deleted the feature/damian/perplexity_datasets branch February 9, 2024 15:50
dbogunowicz added a commit that referenced this pull request Feb 9, 2024
* initial commit

* Update src/deepsparse/evaluation/integrations/__init__.py

* design ready, time to define additional features

* split prep_for_generation operator

* fix logits

* update non-kv cache pipeline and tests

* add tests to address edge cases

* add condition to check of kv_cache full during prompt inference, add test to cover this case, revert debugging changes

* fix typing

* remove commented code

* remove irrelevant condition

* perplexity for non-kv cache pipelines works!

* logic is working

* ready for review

* [DeepSparse Evaluation API] Perplexity eval support for `openai_humaneval`, `c4`, `wikitext2` (#1586)

* fix tests 2

* initial commit

* add return to a function

* make script more robust

---------

Co-authored-by: Dipika Sikka <[email protected]>