LLM NPUW tests for PR checks #30051
class SimpleLLMPipeline {
public:
    void initialize(const std::string& model_path, ov::Core& core, const ov::AnyMap& config);
Fix order
register_mock_plugins_in_ov();

// Do the actual test:
to remove
// ------------------------ Prefill model ---------------------------
// 1 infer request for head:
EXPECT_CREATE_SYNC_INFER_REQ(mock_npu_for_prefill, MODEL(0), TIMES(1));
// 2 infer requests for function, `create_sync_infer_request()`
to polish
ov::AnyMap config;
std::tie(model_path, config, input_ids, reference_ids) = param;
config["NPUW_DEVICES"] = "CPU";
config["NPUW_LLM_MIN_RESPONSE_LEN"] = 4;
What's the prompt length? I'm not sure why you limit this one here, but if your test prompt is short, you may want to limit the PROMPT_LEN as well to improve prefill performance on CPU.
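One possible way to act on this, as a sketch under stated assumptions rather than the PR's actual change: cap the prompt length from the size of the test prompt. The key name `NPUW_LLM_MAX_PROMPT_LEN` is assumed to be what PROMPT_LEN refers to, the alignment of 64 is a guess, and `ConfigMap`, `round_up` and `make_test_config` are hypothetical stand-ins so the snippet compiles without OpenVINO (the real test uses `ov::AnyMap`).

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for ov::AnyMap so the sketch builds without OpenVINO.
using ConfigMap = std::map<std::string, std::string>;

// Round n up to the next multiple of align (align must be > 0).
inline std::size_t round_up(std::size_t n, std::size_t align) {
    return (n + align - 1) / align * align;
}

// Build the test config and size the prompt window from the actual prompt.
inline ConfigMap make_test_config(const std::vector<int64_t>& input_ids) {
    ConfigMap config;
    config["NPUW_DEVICES"] = "CPU";
    config["NPUW_LLM_MIN_RESPONSE_LEN"] = "4";
    // Assumption: this is the key behind "PROMPT_LEN", and 64 is an
    // illustrative alignment, not a documented requirement.
    config["NPUW_LLM_MAX_PROMPT_LEN"] = std::to_string(round_up(input_ids.size(), 64));
    return config;
}
```

With the 23-token prompt used in this test, the sketch would request a prompt window of 64 tokens instead of whatever the default is.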
std::vector<int64_t>, std::vector<int64_t>>;
}  // anonymous namespace

class LLMAccuracyTestsNPUW : public ::testing::TestWithParam<LLMTestParams> {
This shouldn't be called an ACCURACY test to avoid confusion with the real accuracy tests.
namespace {
const std::vector<int64_t> What_is_OpenVINO =
    {529, 29989, 1792, 29989, 29958, 13, 5618, 338, 4673, 29963, 1177,
     29949, 29973, 2, 29871, 13, 29966, 29989, 465, 22137, 29989, 29958, 13};
const std::vector<int64_t> OpenVINO =
    {6585, 29963, 1177, 29949};
This namespace could be given a name, e.g. token_ids.
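A minimal sketch of the suggested rename, reusing the token vectors from the diff above. Only the namespace name changes; the two includes are added so the snippet compiles on its own.

```cpp
#include <cstdint>
#include <vector>

// Named namespace instead of an anonymous one, as suggested in the review.
namespace token_ids {
const std::vector<int64_t> What_is_OpenVINO =
    {529, 29989, 1792, 29989, 29958, 13, 5618, 338, 4673, 29963, 1177,
     29949, 29973, 2, 29871, 13, 29966, 29989, 465, 22137, 29989, 29958, 13};
const std::vector<int64_t> OpenVINO =
    {6585, 29963, 1177, 29949};
}  // namespace token_ids
```

Call sites would then read `token_ids::What_is_OpenVINO` and `token_ids::OpenVINO`, which makes it clear that the values are token ids rather than strings. Because the vectors are `const` at namespace scope, they keep internal linkage even without the anonymous namespace.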