
[DeepSparse Evaluation API] Perplexity eval support for openai_humaneval, c4, wikitext2 #1586


Merged

Conversation

dbogunowicz
Contributor

Example use

deepsparse.evaluate hf:mgoin/TinyStories-1M-deepsparse --integration perplexity --dataset wikitext2 --limit 2 --batch_size 2 --max_sequence_length 128

2024-02-06 18:14:27 deepsparse.evaluation.cli INFO     Creating deepsparse pipeline to evaluate from model path: hf:mgoin/TinyStories-1M-deepsparse
2024-02-06 18:14:27 deepsparse.evaluation.cli INFO     Datasets to evaluate on: ['wikitext2']
Batch size: 2
Splits to evaluate on: None
Metrics to evaluate on: None
Additional integration arguments supplied: {'limit': 2, 'max_sequence_length': 128}
Fetching 11 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 11/11 [00:00<00:00, 149796.57it/s]
DeepSparse, Copyright 2021-present / Neuralmagic, Inc. version: 1.7.0.20240104 COMMUNITY | (86c38139) (release) (optimized) (system=avx2, binary=avx2)
2024-02-06 18:14:30 deepsparse.evaluation.integrations.perplexity INFO     Argument `splits` is None. Defaulting to `test` split.
Token indices sequence length is longer than the specified maximum sequence length for this model (287645 > 2048). Running this sequence through the model will result in indexing errors
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:02<00:00,  1.51it/s]
2024-02-06 18:14:38 deepsparse.evaluation.cli INFO     Evaluation done. Results:
[Evaluation(task='perplexity', dataset=Dataset(type=None, name='wikitext2', config=None, split='test'), metrics=[Metric(name='perplexity', value=24642.261152241255)], samples=None)]
2024-02-06 18:14:38 deepsparse.evaluation.cli INFO     Saving the evaluation results to /nm/drive0/damian/deepsparse/result.json
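For context, the perplexity value reported above is the exponentiated mean negative log-likelihood of the evaluated tokens. The following is only an illustrative sketch of that computation, not the DeepSparse implementation; the array names and the helper function are assumptions for the example:

import numpy as np

def perplexity_from_logits(logits: np.ndarray, input_ids: np.ndarray) -> float:
    # logits: (seq_len, vocab_size) raw model outputs for one sequence
    # input_ids: (seq_len,) token ids of that sequence
    # Shift so that the logits at position i predict the token at position i + 1.
    shifted_logits = logits[:-1]
    targets = input_ids[1:]
    # Numerically stable log-softmax over the vocabulary.
    shifted = shifted_logits - shifted_logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Negative log-likelihood of the observed next tokens.
    nll = -log_probs[np.arange(len(targets)), targets]
    # Perplexity is the exponent of the mean negative log-likelihood.
    return float(np.exp(nll.mean()))

A very small model such as TinyStories-1M evaluated on wikitext2 with a 128-token window will naturally produce a large perplexity, which is consistent with the value in the log above.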

@dbogunowicz dbogunowicz changed the base branch from main to feature/damian/perplexity_eval February 6, 2024 18:16
@dbogunowicz dbogunowicz requested review from bfineran, dsikka, anmarques and rahul-tuli and removed request for dsikka February 6, 2024 18:18
@dbogunowicz dbogunowicz changed the title [DeepSparse Evaluation API] Support for openai_humaneval, c4, wikitext2 [DeepSparse Evaluation API] Perplexity eval support for openai_humaneval, c4, wikitext2 Feb 6, 2024
@dbogunowicz dbogunowicz merged commit 898f677 into feature/damian/perplexity_eval Feb 9, 2024
@dbogunowicz dbogunowicz deleted the feature/damian/perplexity_datasets branch February 9, 2024 15:50
dbogunowicz added a commit that referenced this pull request Feb 9, 2024
* initial commit

* Update src/deepsparse/evaluation/integrations/__init__.py

* design ready, time to define additional features

* split prep_for_generation operator

* fix logits

* update non-kv cache pipeline and tests

* add tests to address edge cases

* add condition to check of kv_cache full during prompt inference, add test to cover this case, revert debugging changes

* fix typing

* remove commented code

* remove irrelevant condition

* perplexity for non-kv cache pipelines works!

* logic is working

* ready for review

* [DeepSparse Evaluation API] Perplexity eval support for `openai_humaneval`, `c4`, `wikitext2` (#1586)

* fix tests 2

* initial commit

* add return to a function

* make script more robust

---------

Co-authored-by: Dipika Sikka <[email protected]>