Commit 5232a22

Update base for Update on "[ET-VK][ez] Allow logit linear layer to be lowered to Vulkan"
## Context

Due to poor performance of Vulkan's int4 linear operator, the final logit layer of the transformer model was not being delegated to Vulkan, and was instead quantized and executed with the XNNPACK delegate. However, with D72412950 / #9883, decent performance can now be achieved with Vulkan's int4 linear op. Therefore, the final logit layer can be lowered to Vulkan.

## Changes

* Remove limit from `VkInt4WeightOnlyQuantizer` that was causing it to ignore the logit layer of the transformer
* Do not apply XNNPACK partitioner and quantizer when lowering with Vulkan

Differential Revision: [D72480177](https://our.internmc.facebook.com/intern/diff/D72480177/)

cc manuelcandales cbilgin

[ghstack-poisoned]
1 parent d874244 commit 5232a22

1 file changed, 0 additions and 12 deletions
Diff for: backends/vulkan/runtime/graph/ops/glsl/q_4w_linear.glsl
@@ -33,18 +33,6 @@ layout(local_size_x_id = 0, local_size_y_id = 1, local_size_z_id = 2) in;
 
 layout(constant_id = 3) const int group_size = 64;
 
-uint8_t get_first(const uint8_t packed) {
-  return uint8_t((packed & 0xF0) >> 4);
-}
-
-uint8_t get_second(const uint8_t packed) {
-  return uint8_t(packed & 0x0F);
-}
-
-uint8_t combine(const uint8_t first, const uint8_t second) {
-  return uint8_t(first << 4 | second);
-}
-
 /*
  * This shader computes a linear operator between a floating point input matrix
  * x and a weights matrix that is quantized to 4 bits.
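
For readers unfamiliar with the helpers removed in this hunk, the following is a minimal Python sketch of what `get_first`, `get_second`, and `combine` computed: two 4-bit values stored in one byte, with the first value in the high nibble. The Python function names mirror the GLSL ones for illustration only; the `& 0xFF` mask is an assumption added to imitate `uint8_t` truncation, which Python integers do not do on their own.

```python
def get_first(packed: int) -> int:
    """Return the high nibble (upper 4 bits) of a packed byte."""
    return (packed & 0xF0) >> 4


def get_second(packed: int) -> int:
    """Return the low nibble (lower 4 bits) of a packed byte."""
    return packed & 0x0F


def combine(first: int, second: int) -> int:
    """Pack two 4-bit values into one byte, first value in the high nibble.

    The final mask imitates the uint8_t cast in the GLSL version.
    """
    return ((first << 4) | second) & 0xFF


if __name__ == "__main__":
    packed = combine(0xA, 0x3)
    # Unpacking returns the two original 4-bit values.
    print(hex(packed), get_first(packed), get_second(packed))
```

Packing and unpacking are inverses for any byte value, which is why the three helpers could be removed together once nothing in the shader called them.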
