Commit c5d468f (2 parents: a167be5 + a24415a)
Update on "[ET-VK][ez] Allow logit linear layer to be lowered to Vulkan"
## Context

Due to the poor performance of Vulkan's int4 linear operator, the final logit layer of the transformer model was not being delegated to Vulkan; it was instead quantized and executed with the XNNPACK delegate. However, with D72412950 / #9883, decent performance can now be achieved with Vulkan's int4 linear op, so the final logit layer can now be lowered to Vulkan.

## Changes

* Remove the limit in `VkInt4WeightOnlyQuantizer` that was causing it to ignore the logit layer of the transformer
* Do not apply the XNNPACK partitioner and quantizer when lowering with Vulkan (the resulting export flow is sketched below)

Differential Revision: [D72480177](https://our.internmc.facebook.com/intern/diff/D72480177/)

cc manuelcandales cbilgin

[ghstack-poisoned]
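For the second change, here is a minimal sketch of what lowering with only the Vulkan partitioner might look like, assuming ExecuTorch's `to_edge_transform_and_lower` API. The model, example inputs, and exact import paths are placeholders/assumptions and should be checked against the ExecuTorch version in use.

```python
# Sketch only: lowering a model to Vulkan without the XNNPACK partitioner.
# Import paths and model/input names are assumptions, not taken from this diff.
import torch
from executorch.backends.vulkan.partitioner.vulkan_partitioner import VulkanPartitioner
from executorch.exir import to_edge_transform_and_lower

model = MyTransformer().eval()                          # hypothetical model
example_inputs = (torch.randint(0, 32000, (1, 128)),)   # hypothetical token ids

exported = torch.export.export(model, example_inputs)

# Previously the XNNPACK partitioner was also passed here so the logit layer
# could fall back to XNNPACK; with the int4 linear improvements, only the
# Vulkan partitioner is needed when targeting Vulkan.
edge = to_edge_transform_and_lower(
    exported,
    partitioner=[VulkanPartitioner()],
)
executorch_program = edge.to_executorch()
```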

6 files changed: +725, −996 lines


Diff for: backends/vulkan/_passes/squeeze_unsqueeze_inputs.py (+6 lines)
```diff
@@ -30,6 +30,12 @@ class SqueezeUnsqueezeInputs(ExportPass):
     def should_squeeze(self, op, shape: List[int]) -> bool:
         if len(shape) == 3:
             return shape[1] == 1 and shape[0] > 1
+        if len(shape) == 4:
+            # No need to squeeze if all dims are 1 except the width dim
+            if all(dim == 1 for dim in shape[:-1]):
+                return False
+            # Otherwise, check for squeezable dim
+            return 1 in shape[:-1]
 
         # Prefer not to introduce additional orchestration ops by default
         return False
```
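For illustration, a standalone sketch of the new 4-D branch of the heuristic; this reproduces just the shape logic outside the pass and is not the actual `ExportPass` code:

```python
from typing import List

def should_squeeze_4d(shape: List[int]) -> bool:
    # Mirrors the added 4-D branch: skip squeezing when every dim except the
    # innermost (width) dim is already 1; otherwise squeeze if any leading
    # dim equals 1.
    if all(dim == 1 for dim in shape[:-1]):
        return False
    return 1 in shape[:-1]

print(should_squeeze_4d([1, 1, 1, 4096]))    # False: only the width dim is non-trivial
print(should_squeeze_4d([1, 32, 1, 4096]))   # True: a leading dim of 1 can be squeezed
print(should_squeeze_4d([2, 32, 16, 4096]))  # False: no squeezable leading dim
```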
