
NXP backend: Add NeutronQuantizer #9876


Merged

Conversation

skywall
Contributor

@skywall skywall commented Apr 3, 2025

Summary

Implementation of NeutronQuantizer, which quantizes a model in a way optimized for the Neutron NPU.
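A minimal usage sketch for context, assuming NeutronQuantizer follows the standard torch.ao PT2E quantizer interface (the import path, model, and calibration step below are illustrative, not taken from this PR):

    import torch
    from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e

    # Hypothetical import path; the actual module location in this PR may differ.
    from executorch.backends.nxp.quantizer.neutron_quantizer import NeutronQuantizer

    model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval()
    example_inputs = (torch.randn(1, 3, 32, 32),)

    # Export to an ATen-dialect graph module, then let the quantizer annotate it.
    exported = torch.export.export_for_training(model, example_inputs).module()
    prepared = prepare_pt2e(exported, NeutronQuantizer())
    prepared(*example_inputs)  # calibration pass
    quantized = convert_pt2e(prepared)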

cc @digantdesai @JakeStevens @robert-kalmar


pytorch-bot bot commented Apr 3, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/9876

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit 7516dfb with merge base 4717459:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed label Apr 3, 2025
@skywall
Contributor Author

skywall commented Apr 3, 2025

@pytorchbot label module: nxp


pytorch-bot bot commented Apr 3, 2025

Didn't find following labels among repository labels: module:nxp

@skywall
Contributor Author

skywall commented Apr 3, 2025

@pytorchbot label "module: nxp"

@pytorch-bot pytorch-bot bot added the module: nxp label Apr 3, 2025
@skywall
Contributor Author

skywall commented Apr 3, 2025

@pytorchbot label "release notes: nxp"

@pytorch-bot pytorch-bot bot added the release notes: nxp label Apr 3, 2025
@skywall skywall force-pushed the feature/nxf93343/neutron-quantizer branch from 4dc6bc0 to 191c158 Compare April 4, 2025 06:16
Contributor

@digantdesai digantdesai left a comment

Thanks for the contribution.

Looks good to me at a high level. Left a couple of comments; let's resolve those. We are quite close to accepting/merging this.

Conv2dPattern,
LinearPattern,
)
from executorch.backends.cadence.aot.quantizer.quantizer import CadenceAtenQuantizer
Contributor

I have a few comments regarding this approach. Our goal is to strike a balance between two key considerations:

  • (A) reducing code duplication and sharing code when it makes sense, in order to maintain a healthy codebase, and
  • (B) avoiding false-positive issues or unnecessary burdens by assuming that something is backend-independent when it's not.

Balancing these two factors can be challenging.

There are two ways to address this issue. For cases that lean heavily towards (A), we should consider moving them to a shared location like executorch/backends, after consulting with the original authors, and updating both call-sites. On the other hand, for cases that are more akin to (B), we shouldn't worry too much about duplicating a few hundred lines of Python code. For example, we wouldn't want to burden the Cadence backend with worrying about Neutron tests failing if they need to make changes to these patterns.

In this particular instance, I believe the patterns fall under category (B), given that they involve replacing op fields with Cadence ops. The Aten Quantizer, however, seems more like an example of (A), as it's a more abstract concept that could be useful to others. That being said, feel free to treat it as a simple case (B) if it only involves a few lines of code.

What are your thoughts on this?

CC @mcremon-meta - who is the author and maintainer of Cadence backend in ET. We have been discussing this offline.

Contributor

> In this particular instance, I believe the patterns fall under category (B), given that they involve replacing op fields with Cadence ops.
>
> What are your thoughts on this?
>
> CC @mcremon-meta - who is the author and maintainer of Cadence backend in ET. We have been discussing this offline.

These are relatively simple, like:

    def replacement_op(self) -> OpOverload:
        return torch.ops.cadence.quantized_conv.default

One approach would be to share this code as (A) but keep the replacement op abstract, so that individual backends must implement it; see the sketch at the end of this comment.

Could help bootstrap the ET MCU work as well.

Although, I'm not sure we'd want to just promote the Cadence approach as the standard one; it might be better to audit different quantizers (including any internal ones) and see if there is some commonality or different approaches to trade off.
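One possible shape for that shared abstraction, sketched under the assumption that each pattern exposes a replacement_op() as quoted above; class names here are hypothetical, not taken from either backend:

    from abc import ABC, abstractmethod

    import torch
    from torch._ops import OpOverload


    class QuantizationPattern(ABC):
        """Hypothetical shared base: backend-agnostic pattern logic lives here."""

        @abstractmethod
        def replacement_op(self) -> OpOverload:
            """Each backend returns its own quantized replacement op."""


    class CadenceConv2dPattern(QuantizationPattern):
        def replacement_op(self) -> OpOverload:
            # The only backend-specific piece: the op from the snippet above.
            return torch.ops.cadence.quantized_conv.default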

Contributor Author

@skywall skywall Apr 9, 2025

@digantdesai Thank you for the thorough description and the possible approaches to improve this. I have decided to go with (B): remove our dependency on the Cadence quantizer at the cost of copying some lines of code. I expect we'll update the quantizer a bit in the future, which should give us an idea of how to design a universal one (approach A).

Contributor

@skywall I think we can fairly easily refactor this in a nice way so that all base classes are shared and both quantizers derive from them in a way that minimizes any major dependencies/risks. We very rarely update the base classes anymore, and if that were to happen, we can do a good reviewing job here on GH; I don't anticipate issues. This PR is completely fine, but if there is consensus we will try to find eng time to merge those properly (cc @digantdesai @JakeStevens)

Contributor

Good discussion, thanks @JakeStevens, @mcremon-meta, @skywall.

I don't have a strong preference here. We can go as-is and kick this down the road, or copy it now, or, lastly, refactor Cadence (and potentially others) and create a better abstraction.

We already have a few quantizers in ET, so if I were you I would just copy it for now and create an issue to refactor this later.

As others said, it shouldn't be too difficult and could potentially be useful to new delegates in the future. That said, I am a bit hesitant to make that a blocker on this PR, because it could impact not just Cadence but other delegates too if you want to scrub a bit deeper.

Contributor

@mcremon-meta mcremon-meta Apr 9, 2025

Agreed; to be clear, @skywall @digantdesai, this is not at all a blocker on this PR from our side. Just a nice-to-have :)

assert len(nodes) == 11
assert nodes[7].name == "conv2d"
# nodes[7].args: [0] input, [1] weights, [2] bias
assert _get_target_name(nodes[7].args[0]) == "torch.ops.quantized_decomposed.dequantize_per_tensor.default"
Contributor

Another option would be to use the FileCheck utils; not a strong preference.
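For reference, a FileCheck version of the assertions above might look like this sketch (graph_module stands in for the traced test module; FileCheck verifies the relative order of matches, not absolute node indices):

    from torch.testing import FileCheck

    # Checks that a dequantize appears before a conv2d in the printed graph,
    # in that order, without pinning down node positions.
    FileCheck().check("dequantize_per_tensor").check("conv2d").run(graph_module.code)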

Contributor Author

Thanks for the tip. We plan to utilize FileCheck through some ArmTester-like interface, but we're not there yet. FileCheck in general doesn't provide an API to check the order and specific index of operators, which is important for us at this stage, because we're (not ideally) using models with multiple operators in tests.

Contributor

ArmTester derives from XNNPACKTester; alas, same story as with the Quantizer. But since you don't have it implemented yet, can you please refactor XNNPACKTester, abstract parts of it into a BackendTester, move it to the backends/test dir, and make XNNPACKTester, ArmTester (which already inherits from XNNPACKTester), and NeutronTester (new!) derive from that common tester? 🙏
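In other words, something like this hypothetical hierarchy (the class names come from the comment above; the method names are placeholders, not real code):

    class BackendTester:
        """Proposed shared, backend-agnostic test harness under backends/test."""

        def quantize(self): ...
        def to_edge(self): ...
        def run_and_compare(self): ...


    class XNNPACKTester(BackendTester):
        """Keeps only the XNNPACK-specific stages."""


    class ArmTester(XNNPACKTester):
        """Already inherits from XNNPACKTester today."""


    class NeutronTester(BackendTester):
        """New tester for the NXP Neutron backend."""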

Contributor Author

Sounds good to me. Should I open a GitHub issue, or can we track this internally?

Contributor

Please do open a GitHub issue to track it.

Contributor Author

@skywall skywall Apr 11, 2025

I have opened #10100.

@JakeStevens
Contributor

@skywall waiting on any updates

@skywall skywall force-pushed the feature/nxf93343/neutron-quantizer branch from 191c158 to db29fda Compare April 9, 2025 09:09
@skywall
Contributor Author

skywall commented Apr 9, 2025

@digantdesai @JakeStevens Thank you for the quick review, and my apologies for the late response. I hope I have addressed all your comments, and I'm looking forward to your feedback.

@skywall skywall force-pushed the feature/nxf93343/neutron-quantizer branch from db29fda to 41cfb0a Compare April 10, 2025 06:26
@JakeStevens
Contributor

Looks like you need to run the linter and resubmit @skywall

FWIW, I think this is a good state and we can work on reviewing the next chunk in parallel if there is more to submit

@skywall skywall force-pushed the feature/nxf93343/neutron-quantizer branch from 41cfb0a to 7b9bd77 Compare April 11, 2025 05:22
@robert-kalmar
Contributor

> Looks like you need to run the linter and resubmit @skywall
>
> FWIW, I think this is a good state and we can work on reviewing the next chunk in parallel if there is more to submit

@JakeStevens I am already working on the next chunk, putting the Neutron backend (i.e. the Delegate and the Partitioner) into a PR. It's slightly more work to get it in sync with main, as a lot of new files were added.

Contributor

@digantdesai digantdesai left a comment

LGTM, thanks for the quick turnaround. Let's fix the CI and we can merge it.

@skywall skywall force-pushed the feature/nxf93343/neutron-quantizer branch from 7b9bd77 to 7516dfb Compare April 14, 2025 06:29
@digantdesai digantdesai merged commit 906d332 into pytorch:main Apr 14, 2025
81 of 82 checks passed
@skywall skywall deleted the feature/nxf93343/neutron-quantizer branch April 15, 2025 12:38
Labels
CLA Signed - This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
module: nxp - Issues related to NXP Neutron NPU delegation and code under backends/nxp/
release notes: nxp - Changes to the NXP Neutron backend delegate