
NXP backend: Add NeutronQuantizer #9876


Merged

Conversation

skywall
Contributor

@skywall skywall commented Apr 3, 2025

Summary

Implementation of NeutronQuantizer, which quantizes a model in a way optimized for the Neutron NPU.
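A minimal usage sketch for context, assuming NeutronQuantizer follows the standard torch.ao PT2E quantizer interface (the import path, model, and calibration step below are illustrative, not taken from this PR):

    import torch
    from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e

    # Hypothetical import path; the actual module location in this PR may differ.
    from executorch.backends.nxp.quantizer.neutron_quantizer import NeutronQuantizer

    model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval()
    example_inputs = (torch.randn(1, 3, 32, 32),)

    # Export to an ATen-dialect graph module, then let the quantizer annotate it.
    exported = torch.export.export_for_training(model, example_inputs).module()
    prepared = prepare_pt2e(exported, NeutronQuantizer())
    prepared(*example_inputs)  # calibration pass
    quantized = convert_pt2e(prepared)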

cc @digantdesai @JakeStevens @robert-kalmar


pytorch-bot bot commented Apr 3, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/9876

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 7516dfb with merge base 4717459:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed label Apr 3, 2025
@skywall
Contributor Author

skywall commented Apr 3, 2025

@pytorchbot label module: nxp


pytorch-bot bot commented Apr 3, 2025

Didn't find following labels among repository labels: module:nxp

@skywall
Contributor Author

skywall commented Apr 3, 2025

@pytorchbot label "module: nxp"

@pytorch-bot pytorch-bot bot added the module: nxp label Apr 3, 2025
@skywall
Contributor Author

skywall commented Apr 3, 2025

@pytorchbot label "release notes: nxp"

@pytorch-bot pytorch-bot bot added the release notes: nxp label Apr 3, 2025
@skywall skywall force-pushed the feature/nxf93343/neutron-quantizer branch from 4dc6bc0 to 191c158 Compare April 4, 2025 06:16
Contributor

@digantdesai digantdesai left a comment

Thanks for the contribution.

Looks good to me at a high level. Left a couple of comments; let's resolve those. We are quite close to accepting/merging this.

Conv2dPattern,
LinearPattern,
)
from executorch.backends.cadence.aot.quantizer.quantizer import CadenceAtenQuantizer
Contributor

I have a few comments regarding this approach. Our goal is to strike a balance between two key considerations:

  • (A) reducing code duplication and sharing code when it makes sense, in order to maintain a healthy codebase, and
  • (B) avoiding false-positive issues or unnecessary burdens by assuming that something is backend-independent when it's not.

Balancing these two factors can be challenging.

There are two ways to address this issue. For cases that lean heavily towards (A), we should consider moving them to a shared location like executorch/backends, after consulting with the original authors, and updating both call-sites. On the other hand, for cases that are more akin to (B), we shouldn't worry too much about duplicating a few hundred lines of Python code. For example, we wouldn't want to burden the Cadence backend with worrying about Neutron tests failing if they need to make changes to these patterns.

In this particular instance, I believe the patterns fall under category (B), given that they involve replacing op fields with Cadence ops. The Aten Quantizer, however, seems more like an example of (A), as it's a more abstract concept that could be useful to others. That being said, feel free to treat it as a simple case (B) if it only involves a few lines of code.

What are your thoughts on this?

CC @mcremon-meta - who is the author and maintainer of Cadence backend in ET. We have been discussing this offline.

Contributor

> In this particular instance, I believe the patterns fall under category (B), given that they involve replacing op fields with Cadence ops.
>
> What are your thoughts on this?
>
> CC @mcremon-meta - who is the author and maintainer of Cadence backend in ET. We have been discussing this offline.

These are relatively simple, like:

    def replacement_op(self) -> OpOverload:
        return torch.ops.cadence.quantized_conv.default

One approach would be to share this code as (A) but keep the replacement op abstract, so that individual backends must implement it; see the sketch at the end of this comment.

Could help bootstrap the ET MCU work as well.

Although, I'm not sure we'd want to just promote the Cadence approach as the standard one; it might be better to audit different quantizers (including any internal ones) and see if there is some commonality or different approaches to trade off.
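One possible shape for that shared abstraction, sketched under the assumption that each pattern exposes a replacement_op() as quoted above; class names here are hypothetical, not taken from either backend:

    from abc import ABC, abstractmethod

    import torch
    from torch._ops import OpOverload


    class QuantizationPattern(ABC):
        """Hypothetical shared base: backend-agnostic pattern logic lives here."""

        @abstractmethod
        def replacement_op(self) -> OpOverload:
            """Each backend returns its own quantized replacement op."""


    class CadenceConv2dPattern(QuantizationPattern):
        def replacement_op(self) -> OpOverload:
            # The only backend-specific piece: the op from the snippet above.
            return torch.ops.cadence.quantized_conv.default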

Contributor Author

@skywall skywall Apr 9, 2025

@digantdesai Thank you for the thorough description and the possible approaches to improve this. I have decided to go with (B): remove our dependency on the Cadence quantizer at the cost of copying some lines of code. I expect we'll update the quantizer a bit in the future, which should give us an idea of how to design a universal one (approach A).

Contributor

@skywall I think we can fairly easily refactor this in a nice way so that all base classes are shared and both quantizers derive from them in a way that minimizes any major dependencies/risks. We very rarely update the base classes anymore, and if that were to happen, we can do a good reviewing job here on GH; I don't anticipate issues. This PR is completely fine, but if there is consensus we will try to find eng time to merge those properly (cc @digantdesai @JakeStevens)

Contributor

Good discussion, thanks @JakeStevens, @mcremon-meta, @skywall.

I don't have a strong preference here. We can go as-is and kick this down the road, or copy it now, or, lastly, refactor Cadence (and potentially others) and create a better abstraction.

We already have a few quantizers in ET, so if I were you I would just copy it for now and create an issue to refactor this later.

As others said, it shouldn't be too difficult and could potentially be useful to new delegates in the future. That said, I am a bit hesitant to make that a blocker on this PR, because it could impact not just Cadence but other delegates too if you want to scrub a bit deeper.

Contributor

@mcremon-meta mcremon-meta Apr 9, 2025

Agreed; to be clear, @skywall @digantdesai, this is not at all a blocker on this PR from our side. Just a nice-to-have :)

assert len(nodes) == 11
assert nodes[7].name == "conv2d"
# nodes[7].args: [0] input, [1] weights, [2] bias
assert _get_target_name(nodes[7].args[0]) == "torch.ops.quantized_decomposed.dequantize_per_tensor.default"
Contributor

Another option would be to use the FileCheck utils; not a strong preference.
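For reference, a FileCheck version of the assertions above might look like this sketch (graph_module stands in for the traced test module; FileCheck verifies the relative order of matches, not absolute node indices):

    from torch.testing import FileCheck

    # Checks that a dequantize appears before a conv2d in the printed graph,
    # in that order, without pinning down node positions.
    FileCheck().check("dequantize_per_tensor").check("conv2d").run(graph_module.code)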

Contributor Author

Thanks for the tip. We plan to utilize FileCheck through some ArmTester-like interface, but we're not there yet. FileCheck in general doesn't provide an API to check the order and specific index of operators, which is important for us at this stage, because we're (not ideally) using models with multiple operators in tests.

Contributor

ArmTester derives from XNNPACKTester; alas, same story as with the Quantizer. But since you don't have it implemented yet, can you please refactor XNNPACKTester, abstract parts of it into a BackendTester, move it to the backends/test dir, and make XNNPACKTester, ArmTester (which already inherits from XNNPACKTester), and NeutronTester (new!) derive from that common tester? 🙏
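In other words, something like this hypothetical hierarchy (the class names come from the comment above; the method names are placeholders, not real code):

    class BackendTester:
        """Proposed shared, backend-agnostic test harness under backends/test."""

        def quantize(self): ...
        def to_edge(self): ...
        def run_and_compare(self): ...


    class XNNPACKTester(BackendTester):
        """Keeps only the XNNPACK-specific stages."""


    class ArmTester(XNNPACKTester):
        """Already inherits from XNNPACKTester today."""


    class NeutronTester(BackendTester):
        """New tester for the NXP Neutron backend."""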

Contributor Author

Sounds good to me. Should I open a GitHub issue, or can we track this internally?

Contributor

Please do open a GitHub issue to track it.

Contributor Author

@skywall skywall Apr 11, 2025

I have opened #10100.

@JakeStevens
Contributor

@skywall waiting on any updates

@skywall skywall force-pushed the feature/nxf93343/neutron-quantizer branch from 191c158 to db29fda Compare April 9, 2025 09:09
@skywall
Contributor Author

skywall commented Apr 9, 2025

@digantdesai @JakeStevens Thank you for the quick review, and my apologies for the late response. I hope I have addressed all your comments, and I'm looking forward to your feedback.

@skywall skywall force-pushed the feature/nxf93343/neutron-quantizer branch from db29fda to 41cfb0a Compare April 10, 2025 06:26
@JakeStevens
Contributor

Looks like you need to run the linter and resubmit @skywall

FWIW, I think this is a good state and we can work on reviewing the next chunk in parallel if there is more to submit

@skywall skywall force-pushed the feature/nxf93343/neutron-quantizer branch from 41cfb0a to 7b9bd77 Compare April 11, 2025 05:22
@robert-kalmar
Contributor

> Looks like you need to run the linter and resubmit @skywall
>
> FWIW, I think this is a good state and we can work on reviewing the next chunk in parallel if there is more to submit

@JakeStevens I am already working on the next chunk, putting the Neutron backend (i.e. the Delegate and the Partitioner) into a PR. It's slightly more work to get it in sync with main, as a lot of new files were added.

Contributor

@digantdesai digantdesai left a comment

LGTM, thanks for the quick turnaround. Let's fix the CI and we can merge it.

@skywall skywall force-pushed the feature/nxf93343/neutron-quantizer branch from 7b9bd77 to 7516dfb Compare April 14, 2025 06:29
@digantdesai digantdesai merged commit 906d332 into pytorch:main Apr 14, 2025
81 of 82 checks passed
@skywall skywall deleted the feature/nxf93343/neutron-quantizer branch April 15, 2025 12:38
Labels
CLA Signed - This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
module: nxp - Issues related to NXP Neutron NPU delegation and code under backends/nxp/
release notes: nxp - Changes to the NXP Neutron backend delegate