support qnn runner multi iter run #9071

billmguo · 2025-03-08T18:11:36Z

Summary: support qnn runner multi iter run

Differential Revision: D70842764

pytorch-bot · 2025-03-08T18:11:39Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/9071

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 5c674af with merge base 366ad75:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot · 2025-03-08T18:28:48Z

This pull request was exported from Phabricator. Differential Revision: D70842764

facebook-github-bot · 2025-03-08T18:29:04Z

This pull request was exported from Phabricator. Differential Revision: D70842764

facebook-github-bot · 2025-03-08T18:38:30Z

This pull request was exported from Phabricator. Differential Revision: D70842764

Summary: Pull Request resolved: pytorch#9071 support qnn runner multi iter run Differential Revision: D70842764

chunit-quic · 2025-03-10T06:11:27Z

examples/qualcomm/oss_scripts/llama/runner/io_manager.cpp

+  int32_t v_cache_size = (num_heads_ + 1) * context_len_ * head_dim_;
+  int32_t k_cache_out_size = num_heads_ * max_ar_len * head_dim_;
+
+  ptr->k_cache_out.clear();


Hi @billmguo,
Thank you for the PR. Could you help clean up the clear and reserve calls for the IOs? From our perspective, resetting the attention mask and the pointer positions should be sufficient to reset the IOs.

Smart mask should be relatively simple: adjusting the attention mask should be enough.
For ShiftPointer, please refer to prepare_kv_io and prepare_prefill_io, and reassign the very beginning of each pointer to the corresponding TensorImpl.

Thanks!
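To make the ShiftPointer suggestion concrete, here is a minimal sketch of "reassign the beginning of each pointer". It assumes a cache whose tensor view slides forward through a fixed buffer on every step. `FakeTensor`, `ShiftPointerCache`, `shift` and `reset` are hypothetical names for illustration only, not the runner's actual types.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a tensor whose data pointer can be re-targeted.
struct FakeTensor {
  uint8_t* data = nullptr;
  void set_data(uint8_t* p) { data = p; }
};

// Hypothetical shift-pointer cache: the tensor views a window that slides
// forward through a fixed buffer, one step per generated token.
struct ShiftPointerCache {
  std::vector<uint8_t> buffer;
  FakeTensor view;
  std::size_t step_bytes;

  ShiftPointerCache(std::size_t capacity, std::size_t step)
      : buffer(capacity, 0), step_bytes(step) {
    view.set_data(buffer.data());
  }

  // Advance the window by one step.
  void shift() { view.set_data(view.data + step_bytes); }

  // Reset for a new generation: point the tensor back at the buffer start.
  // The buffer itself is neither cleared nor reallocated.
  void reset() { view.set_data(buffer.data()); }

  // Current offset of the view from the start of the buffer, in bytes.
  std::size_t offset() const {
    return static_cast<std::size_t>(view.data - buffer.data());
  }
};
```

With this layout a reset only needs the pointer reassignment; the stale cache contents stay in place, which relies on the attention mask being reset as well.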

billmguo · 2025-03-10

1. Would you comment in the code on which lines can be removed?
2. "About ShiftPointer one please refer to prepare_kv_io and prepare_prefill_io and reassign the very beginning of each pointer to corresponding TensorImpl"
Can you explain this in more detail? Do we need to call prepare_kv_io and prepare_prefill_io for each generate?

billmguo · 2025-03-10

I updated both the ShiftPointer and SmartMask logic. If you think it can still be optimized, could you try it on your side and post the specific code on the PR? Thanks!

Summary: support qnn runner multi iter run Differential Revision: D70842764

facebook-github-bot · 2025-03-10T19:03:55Z

This pull request was exported from Phabricator. Differential Revision: D70842764

haowhsu-quic · 2025-03-11T02:35:28Z

examples/qualcomm/oss_scripts/llama/runner/io_manager.cpp

+void ShiftPointerIoMgr::reset_io(
+    const std::vector<Result<MethodMeta>>& prefill_methods_meta,
+    const std::vector<Result<MethodMeta>>& kv_methods_meta) {
+  IO* ptr = static_cast<IO*>(data_ptr_.get());


I think we don't actually need to modify the interface of prepare_xx_io. Maybe the following snippet is enough:

std::fill(ptr->prefill_attention_mask.begin(), ptr->prefill_attention_mask.end(), 0);
std::fill(ptr->kv_attention_mask.begin(), ptr->kv_attention_mask.end(), 0);

The subsequent calls to prepare_xx_io might then be omitted, since the attention mask will be set correctly when the runner invokes fill_xx_toks.
Ditto for smart mask, I think. If you find it works for both versions, please have them both map to one implementation, thank you.
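As a self-contained sketch of the snippet above: the `IO` struct here is a hypothetical reduction that keeps only the two mask fields named in the comment, and the `uint16_t` element type and the `reset_attention_masks` helper name are assumptions.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical reduction of the runner's IO struct: only the two mask
// fields from the review comment are kept, and the element type is assumed.
struct IO {
  std::vector<uint16_t> prefill_attention_mask;
  std::vector<uint16_t> kv_attention_mask;
};

// Zero both masks in place. Unlike clear() followed by reserve(), std::fill
// keeps each vector's size, so any pointer previously taken from data()
// stays valid.
void reset_attention_masks(IO* ptr) {
  std::fill(ptr->prefill_attention_mask.begin(),
            ptr->prefill_attention_mask.end(), 0);
  std::fill(ptr->kv_attention_mask.begin(),
            ptr->kv_attention_mask.end(), 0);
}
```

Keeping the size unchanged is the main design point: a tensor that was already bound to the vector's storage does not need to be re-bound after the reset.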

billmguo · 2025-03-11

Never mind, I tried it and it works. I will update the diff with this logic.

billmguo · 2025-03-11

I updated the diff. Since SmartMask and ShiftPointer use different data structures for the prefill and kv attention masks, I did not unify reset_io into one implementation.
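One way to picture why two reset_io implementations remain is the sketch below. The layouts are assumptions made for illustration (one manager owning its masks as vectors, the other holding raw pointers into a buffer owned elsewhere), and every class name here is hypothetical rather than taken from the PR.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical base interface; only reset_io is sketched.
class IoMgrSketch {
 public:
  virtual ~IoMgrSketch() = default;
  virtual void reset_io() = 0;
};

// Assumed layout: masks are owned as std::vector.
class ShiftPointerSketch : public IoMgrSketch {
 public:
  std::vector<uint16_t> prefill_attention_mask;
  std::vector<uint16_t> kv_attention_mask;

  void reset_io() override {
    std::fill(prefill_attention_mask.begin(), prefill_attention_mask.end(), 0);
    std::fill(kv_attention_mask.begin(), kv_attention_mask.end(), 0);
  }
};

// Assumed layout: masks are raw pointers plus lengths into memory that is
// owned elsewhere, so the vector-based code above cannot be reused as is.
class SmartMaskSketch : public IoMgrSketch {
 public:
  uint16_t* prefill_attention_mask = nullptr;
  std::size_t prefill_mask_len = 0;
  uint16_t* kv_attention_mask = nullptr;
  std::size_t kv_mask_len = 0;

  void reset_io() override {
    std::fill_n(prefill_attention_mask, prefill_mask_len, 0);
    std::fill_n(kv_attention_mask, kv_mask_len, 0);
  }
};
```

Both overrides do the same thing conceptually, zeroing the masks, but each has to go through its own storage type, which is consistent with keeping them separate.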

Summary: support qnn runner multi iter run Differential Revision: D70842764

facebook-github-bot · 2025-03-11T04:05:26Z

This pull request was exported from Phabricator. Differential Revision: D70842764

billmguo · 2025-03-11T04:27:39Z

@pytorchbot label "topic: not user facing"

haowhsu-quic

Looks good to me, thank you!

Summary: support qnn runner multi iter run Reviewed By: limintang Differential Revision: D70842764

facebook-github-bot · 2025-03-11T06:07:13Z

This pull request was exported from Phabricator. Differential Revision: D70842764

Summary: support qnn runner multi iter run Reviewed By: limintang Differential Revision: D70842764

facebook-github-bot · 2025-03-11T06:41:46Z

This pull request was exported from Phabricator. Differential Revision: D70842764

billmguo requested a review from cccclai as a code owner March 8, 2025 18:11

facebook-github-bot added the CLA Signed label Mar 8, 2025

billmguo force-pushed the export-D70842764 branch from 474e2bd to 09f1baf Compare March 8, 2025 18:16

facebook-github-bot added the fb-exported label Mar 8, 2025

billmguo force-pushed the export-D70842764 branch from 09f1baf to 9050a2b Compare March 8, 2025 18:29

billmguo added a commit to billmguo/executorch that referenced this pull request Mar 8, 2025

support qnn runner multi iter run (pytorch#9071)

86908f5

Summary: Pull Request resolved: pytorch#9071 support qnn runner multi iter run Differential Revision: D70842764

billmguo force-pushed the export-D70842764 branch from 9050a2b to 86908f5 Compare March 8, 2025 18:38

cccclai requested review from chunit-quic, haowhsu-quic, shewu-quic and winskuo-quic March 10, 2025 01:56

chunit-quic reviewed Mar 10, 2025

billmguo force-pushed the export-D70842764 branch from 86908f5 to fe63aba Compare March 10, 2025 19:03

billmguo added a commit to billmguo/executorch that referenced this pull request Mar 10, 2025

support qnn runner multi iter run (pytorch#9071)

fe63aba

Summary: support qnn runner multi iter run Differential Revision: D70842764

haowhsu-quic reviewed Mar 11, 2025

billmguo added a commit to billmguo/executorch that referenced this pull request Mar 11, 2025

support qnn runner multi iter run (pytorch#9071)

6804e3a

Summary: support qnn runner multi iter run Differential Revision: D70842764

billmguo force-pushed the export-D70842764 branch from fe63aba to 6804e3a Compare March 11, 2025 04:04

pytorch-bot bot added the topic: not user facing label Mar 11, 2025

haowhsu-quic approved these changes Mar 11, 2025

limintang self-requested a review March 11, 2025 06:00

limintang approved these changes Mar 11, 2025

billmguo added a commit to billmguo/executorch that referenced this pull request Mar 11, 2025

support qnn runner multi iter run (pytorch#9071)

5b48434

Summary: support qnn runner multi iter run Reviewed By: limintang Differential Revision: D70842764

billmguo force-pushed the export-D70842764 branch from 6804e3a to 5b48434 Compare March 11, 2025 06:06

support qnn runner multi iter run (pytorch#9071)

5c674af

Summary: support qnn runner multi iter run Reviewed By: limintang Differential Revision: D70842764

billmguo force-pushed the export-D70842764 branch from 5b48434 to 5c674af Compare March 11, 2025 06:41

facebook-github-bot merged commit ddf0d9e into pytorch:main Mar 11, 2025
52 checks passed
