Skip to content

support qnn runner multi iter run #9071

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 11, 2025

Conversation

billmguo
Copy link
Contributor

@billmguo billmguo commented Mar 8, 2025

Summary: support qnn runner multi iter run

Differential Revision: D70842764

@billmguo billmguo requested a review from cccclai as a code owner March 8, 2025 18:11
Copy link

pytorch-bot bot commented Mar 8, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/9071

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 5c674af with merge base 366ad75 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 8, 2025
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D70842764

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D70842764

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D70842764

billmguo added a commit to billmguo/executorch that referenced this pull request Mar 8, 2025
Summary:
Pull Request resolved: pytorch#9071

support qnn runner multi iter run

Differential Revision: D70842764
int32_t v_cache_size = (num_heads_ + 1) * context_len_ * head_dim_;
int32_t k_cache_out_size = num_heads_ * max_ar_len * head_dim_;

ptr->k_cache_out.clear();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @billmguo,
Thank you for the PR. Could you help clean up the clear and reserve functions for the iOs? From our perspective, resetting the attention mask and pointer positions should be sufficient to reset iOs.

Smart mask should be relatively simple. Adjust attention mask should be enough.
About ShiftPointer one please refer to prepare_kv_io and prepare_prefill_io and reassign the very beginning of each pointer to corresponding TensorImpl

Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1.would you comment in the code which lines can be removed?
2. About ShiftPointer one please refer to prepare_kv_io and prepare_prefill_io and reassign the very beginning of each pointer to corresponding TensorImpl
can you explain more this? do we need call prepare_kv_io and prepare_prefill_io for each generate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update the both shiftpointer and Smartmask logic, if you think it is still can be optimized, can you try on your side and update on PR specific code thanks!

billmguo added a commit to billmguo/executorch that referenced this pull request Mar 10, 2025
Summary:

support qnn runner multi iter run

Differential Revision: D70842764
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D70842764

void ShiftPointerIoMgr::reset_io(
const std::vector<Result<MethodMeta>>& prefill_methods_meta,
const std::vector<Result<MethodMeta>>& kv_methods_meta) {
IO* ptr = static_cast<IO*>(data_ptr_.get());
Copy link
Collaborator

@haowhsu-quic haowhsu-quic Mar 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't actually need to modify the interface of prepare_xx_io. Maybe following snippet is enough:

std::fill(ptr->prefill_attention_mask.begin(), ptr->prefill_attention_mask.end(), 0);
std::fill(ptr->kv_attention_mask.begin(), ptr->kv_attention_mask.end(), 0);

And the following function calls of prepare_xx_io might be omitted, the attention mask will be set correctly when runner invoke fill_xx_toks.
Ditto for smart-mask I think. If you found it work for both versions, please have them both map to one implementation, thank you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

never mind, I tried, this works I will update diff for the logics

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update the diff since the smartmask and shifpointer use different data structure for prefill and kv attn so I did not unified the reset_io into one

billmguo added a commit to billmguo/executorch that referenced this pull request Mar 11, 2025
Summary:

support qnn runner multi iter run

Differential Revision: D70842764
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D70842764

@billmguo
Copy link
Contributor Author

@pytorchbot label "topic: not user facing"

Copy link
Collaborator

@haowhsu-quic haowhsu-quic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thank you!

@limintang limintang self-requested a review March 11, 2025 06:00
billmguo added a commit to billmguo/executorch that referenced this pull request Mar 11, 2025
Summary:

support qnn runner multi iter run

Reviewed By: limintang

Differential Revision: D70842764
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D70842764

Summary:

support qnn runner multi iter run

Reviewed By: limintang

Differential Revision: D70842764
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D70842764

@facebook-github-bot facebook-github-bot merged commit ddf0d9e into pytorch:main Mar 11, 2025
52 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported topic: not user facing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants