NXP backend: Add NeutronQuantizer #9876
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/9876
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure as of commit 7516dfb with merge base 4717459 (NEW FAILURE - The following job has failed)
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot label module: nxp
Didn't find following labels among repository labels: module:nxp
@pytorchbot label "module: nxp" |
@pytorchbot label "release notes: nxp" |
4dc6bc0 to 191c158 (Compare)
Thanks for the contribution.
Looks good to me at a high level. Left a couple of comments; let's resolve those. We are quite close to accepting/merging this.
```python
    Conv2dPattern,
    LinearPattern,
)
from executorch.backends.cadence.aot.quantizer.quantizer import CadenceAtenQuantizer
```
I have a few comments regarding this approach. Our goal is to strike a balance between two key considerations:
- (A) reducing code duplication and sharing code when it makes sense, in order to maintain a healthy codebase, and
- (B) avoiding false-positive issues or unnecessary burdens by assuming that something is backend-independent when it's not.
Balancing these two factors can be challenging.
There are two ways to address this issue. For cases that lean heavily towards (A), we should consider moving them to a shared location like executorch/backends, after consulting with the original authors, and updating both call-sites. On the other hand, for cases that are more akin to (B), we shouldn't worry too much about duplicating a few hundred lines of Python code. For example, we wouldn't want to burden the Cadence backend with worrying about Neutron tests failing if they need to make changes to these patterns.
In this particular instance, I believe the patterns fall under category (B), given that they involve replacing op fields with Cadence ops. The Aten Quantizer, however, seems more like an example of (A), as it's a more abstract concept that could be useful to others. That being said, feel free to treat it as a simple case (B) if it only involves a few lines of code.
What are your thoughts on this?
CC @mcremon-meta - who is the author and maintainer of Cadence backend in ET. We have been discussing this offline.
> In this particular instance, I believe the patterns fall under category (B), given that they involve replacing op fields with Cadence ops.
>
> What are your thoughts on this?
>
> CC @mcremon-meta - who is the author and maintainer of Cadence backend in ET. We have been discussing this offline.
These are relatively simple, like:

```python
def replacement_op(self) -> OpOverload:
    return torch.ops.cadence.quantized_conv.default
```
One approach would be to share this code as in (A) but keep the replacement abstract, so individual backends must implement it.
Could help bootstrap the ET MCU work as well.
Although I'm not sure we'd want to just promote the Cadence approach as the standard approach; it might be better to audit different quantizers (including any internal ones) and see if there is some commonality or different approaches to trade off.
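The shared-base-class idea from option (A) could be sketched roughly as below. This is a hypothetical illustration, not the real Cadence or Neutron code: the class names, the string-valued ops, and the `NeutronConv2dPattern` subclass are all assumptions; in the actual codebase `replacement_op` returns an `OpOverload` such as `torch.ops.cadence.quantized_conv.default`.

```python
from abc import ABC, abstractmethod

# Hypothetical sketch of option (A): a shared pattern base class where only
# the replacement op is backend-specific. Names and op strings are illustrative;
# the real code returns torch OpOverload objects rather than strings.
class QuantizationPattern(ABC):
    @abstractmethod
    def replacement_op(self) -> str:
        """Backend-specific quantized op that replaces the matched pattern."""
        ...

class CadenceConv2dPattern(QuantizationPattern):
    def replacement_op(self) -> str:
        # Stand-in for torch.ops.cadence.quantized_conv.default
        return "cadence::quantized_conv"

class NeutronConv2dPattern(QuantizationPattern):
    def replacement_op(self) -> str:
        # Hypothetical Neutron op name, for illustration only
        return "neutron::quantized_conv"
```

With this shape, the pattern-matching machinery lives in one place and each backend only supplies its own `replacement_op`.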
@digantdesai Thank you for the thorough description and possible approaches to improve this. I have decided to go with (B): remove our dependency on the Cadence quantizer at the cost of copying some lines of code. I expect we'll update the quantizer a bit in the future, which could give us an idea of how to design a universal one (approach A).
@skywall I think we can fairly easily refactor this in a nice way so that all base classes are shared and both quantizers derive from them in a way that minimizes any major dependencies/risks. We very rarely update base classes anymore, and if that were to happen, we can do a good reviewing job here on GH; I don't anticipate issues. This PR is completely fine, but if there is consensus, we will try to find eng time to merge those properly (cc @digantdesai @JakeStevens)
Good discussion, thanks @JakeStevens, @mcremon-meta, @skywall.
I don't have a strong preference here. We can go as is and throw this down the line, or copy it now, or lastly, refactor Cadence (and potentially others) and create a better abstraction.
We already have a few quantizers in ET, so if I were you I would just copy it for now and create an issue to refactor this later.
As others said it shouldn't be too difficult and potentially be useful to new delegates in the future. That said, I am a bit hesitant to put that as a blocker on this PR because it can potentially impact not just Cadence but other delegates too if you want to scrub a bit deeper.
Agreed, to be clear @skywall @digantdesai not at all a blocker on this PR from our side. Just a nice to have :)
backends/nxp/tests/test_quantizer.py
```python
assert len(nodes) == 11
assert nodes[7].name == "conv2d"
# [0]: input, [1]: weights, [2]: bias
assert _get_target_name(nodes[7].args[0]) == 'torch.ops.quantized_decomposed.dequantize_per_tensor.default'
```
Another option would be to use FileCheck utils; not a strong preference.
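For reference, a minimal FileCheck usage on a textual graph dump might look like the sketch below. The IR string here is made up for illustration; `FileCheck` simply asserts that the patterns appear in order in the text.

```python
from torch.testing import FileCheck

# Made-up textual graph dump, standing in for str(gm.graph) or similar.
graph_str = """
%dq = dequantize_per_tensor(%x)
%conv = conv2d(%dq, %w, %b)
%q = quantize_per_tensor(%conv)
"""

# Passes only if the three ops appear in this relative order in the dump;
# raises a RuntimeError with a diagnostic otherwise.
FileCheck().check("dequantize_per_tensor") \
           .check("conv2d") \
           .check("quantize_per_tensor") \
           .run(graph_str)
```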
Thanks for the tip. We plan to utilize FileCheck through some ArmTester-like interface, but we're not there yet. In general, FileCheck doesn't provide an API to check the order and specific index of operators, which is important for us at this stage because we're (not ideally) using models with multiple operators in tests.
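The index-based checking described above can be sketched with plain torch.fx: the traced graph gives a deterministic node list that tests can index into, which substring-based FileCheck matching does not. The toy model here is illustrative, not the models used in the actual tests.

```python
import torch
from torch import nn

# Toy model; the point is only that fx tracing yields a deterministic
# node order that tests can assert on by index.
class SmallModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3)

    def forward(self, x):
        return torch.relu(self.conv(x))

gm = torch.fx.symbolic_trace(SmallModel())
nodes = list(gm.graph.nodes)

# Expected order: placeholder -> conv (call_module) -> relu -> output
assert nodes[1].name == "conv"
assert nodes[2].target is torch.relu
```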
ArmTester derived from XNNPACKTester, alas, same story as the Quantizer. But since you don't have it implemented yet, can you please refactor XNNPACKTester, abstract parts of it as a BackendTester, move it to the backends/test dir, and make XNNPACKTester, ArmTester (which already inherits from XNNPACKTester), and NeutronTester (new!) derive from that common tester? 🙏
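The proposed refactor could look roughly like the sketch below. The class names `BackendTester` and `NeutronTester` come from the comment above, but the stage methods, their bodies, and the chaining style are assumptions for illustration, not the real XNNPACKTester API.

```python
# Hypothetical sketch: a shared BackendTester in backends/test that each
# backend's tester derives from. Method names and internals are illustrative.
class BackendTester:
    def __init__(self, module, example_inputs):
        self.module = module
        self.example_inputs = example_inputs
        self.stages = []  # records the pipeline stages that were run

    def quantize(self):
        self.stages.append("quantize")
        return self  # chainable, in the style of the existing tester APIs

    def to_edge(self):
        self.stages.append("to_edge")
        return self

    def partition(self):
        self.stages.append("partition")
        return self

class NeutronTester(BackendTester):
    def quantize(self):
        # A Neutron-specific override would plug in the NeutronQuantizer here.
        self.stages.append("neutron_quantize")
        return self

tester = NeutronTester(module=None, example_inputs=())
tester.quantize().to_edge().partition()
```

Only the backend-specific stages need overriding; the shared pipeline and assertions would live in the base class.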
Sounds good to me. Should I open a GitHub issue, or can we track this internally?
Please do open a GitHub issue to track this.
I have opened: #10100
@skywall waiting on any updates
191c158 to db29fda (Compare)
@digantdesai @JakeStevens Thank you for the quick review, and my apologies for the late response. I hope I have addressed all your comments, and I'm looking forward to your feedback.
db29fda to 41cfb0a (Compare)
Looks like you need to run the linter and resubmit, @skywall. FWIW, I think this is in a good state, and we can work on reviewing the next chunk in parallel if there is more to submit.
41cfb0a to 7b9bd77 (Compare)
@JakeStevens I am already working on the next chunk, putting the Neutron Backend (i.e. the Delegate and the Partitioner) on the PR. Slightly more work is needed to get it in sync with main, as a lot of new files were added.
LGTM, thanks for the quick turnaround. Let's fix the CI and we can merge it.
7b9bd77 to 7516dfb (Compare)
Summary
Implementation of NeutronQuantizer, which quantizes a model in a way optimized for the Neutron NPU.
cc @digantdesai @JakeStevens @robert-kalmar