support qnn runner multi iter run #9071
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/9071
Note: Links to docs will display an error until the docs builds have been completed.
✅ No failures as of commit 5c674af with merge base 366ad75.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Force-pushed from 474e2bd to 09f1baf (Compare)
This pull request was exported from Phabricator. Differential Revision: D70842764
Force-pushed from 09f1baf to 9050a2b (Compare)
This pull request was exported from Phabricator. Differential Revision: D70842764
Summary: Pull Request resolved: pytorch#9071
support qnn runner multi iter run
Differential Revision: D70842764
Force-pushed from 9050a2b to 86908f5 (Compare)
int32_t v_cache_size = (num_heads_ + 1) * context_len_ * head_dim_;
int32_t k_cache_out_size = num_heads_ * max_ar_len * head_dim_;
...
ptr->k_cache_out.clear();
Hi @billmguo,
Thank you for the PR. Could you help clean up the clear and reserve calls for the IOs? From our perspective, resetting the attention mask and the pointer positions should be sufficient to reset the IOs.
SmartMask should be relatively simple: adjusting the attention mask should be enough.
For the ShiftPointer one, please refer to prepare_kv_io and prepare_prefill_io and reassign the very beginning of each pointer to the corresponding TensorImpl.
Thanks!
1. Would you comment in the code which lines can be removed?
2. Regarding "for the ShiftPointer one, please refer to prepare_kv_io and prepare_prefill_io and reassign the very beginning of each pointer to the corresponding TensorImpl":
Can you explain this more? Do we need to call prepare_kv_io and prepare_prefill_io for each generate?
Updated both the ShiftPointer and SmartMask logic. If you think it can still be optimized, could you try it on your side and update the PR with the specific code? Thanks!
Force-pushed from 86908f5 to fe63aba (Compare)
Summary: support qnn runner multi iter run
Differential Revision: D70842764
This pull request was exported from Phabricator. Differential Revision: D70842764
void ShiftPointerIoMgr::reset_io(
    const std::vector<Result<MethodMeta>>& prefill_methods_meta,
    const std::vector<Result<MethodMeta>>& kv_methods_meta) {
  IO* ptr = static_cast<IO*>(data_ptr_.get());
I think we don't actually need to modify the interface of prepare_xx_io. Maybe the following snippet is enough:
std::fill(ptr->prefill_attention_mask.begin(), ptr->prefill_attention_mask.end(), 0);
std::fill(ptr->kv_attention_mask.begin(), ptr->kv_attention_mask.end(), 0);
The subsequent calls to prepare_xx_io might then be omitted; the attention mask will be set correctly when the runner invokes fill_xx_toks.
Ditto for SmartMask, I think. If you find it works for both versions, please have them both map to one implementation. Thank you.
Never mind, I tried it and this works. I will update the diff with the logic.
Updated the diff. Since SmartMask and ShiftPointer use different data structures for the prefill and kv attention masks, I did not unify reset_io into one implementation.
Summary: support qnn runner multi iter run
Differential Revision: D70842764
Force-pushed from fe63aba to 6804e3a (Compare)
This pull request was exported from Phabricator. Differential Revision: D70842764
@pytorchbot label "topic: not user facing"
Looks good to me, thank you!
Summary: support qnn runner multi iter run
Reviewed By: limintang
Differential Revision: D70842764
Force-pushed from 6804e3a to 5b48434 (Compare)
This pull request was exported from Phabricator. Differential Revision: D70842764
Summary: support qnn runner multi iter run
Reviewed By: limintang
Differential Revision: D70842764
Force-pushed from 5b48434 to 5c674af (Compare)
This pull request was exported from Phabricator. Differential Revision: D70842764
Summary: support qnn runner multi iter run
Differential Revision: D70842764