Commit 20ef8b4

Change GEMM kernels to natively handle broader range of row counts (#406)
The outer GEMM loop repeatedly calls the inner GEMM kernel with a row count (the M parameter to GEMM), and the inner kernel decides how many of those rows it will actually handle. The FMA3 kernel previously handled only row counts of 1, 3, or 6 to keep code size down. To be competitive, however, the FMA3 kernel needs to handle any row count from 1 to 6. One example model issued a GEMM with M=11; this was previously broken up into kernel calls of 6, 3, 1, and 1 rows, but can now be handled as 6 and 5. The kernels have been templatized MASM-style to avoid the copy-and-pasted code of the original implementation. The Linux variants will be updated after some additional work on the MASM variants.
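The row splitting described above can be modelled with a short C++ sketch. This is only an illustration of the arithmetic, not the MLAS code: the names `RowsHandledOld`, `RowsHandledNew`, and `SplitRows` are hypothetical, and the real kernels are written in MASM and perform the actual multiply as well.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of the old FMA3 kernel: it only supports 1, 3, or 6 rows
// per call, so it picks the largest supported count that fits.
inline std::size_t RowsHandledOld(std::size_t rows_remaining) {
    if (rows_remaining >= 6) return 6;
    if (rows_remaining >= 3) return 3;
    return 1;
}

// Hypothetical model of the new kernel: any row count from 1 to 6 is supported,
// so it handles up to 6 rows and otherwise everything that remains.
inline std::size_t RowsHandledNew(std::size_t rows_remaining) {
    return rows_remaining >= 6 ? 6 : rows_remaining;
}

// Models the outer GEMM loop: keep calling the kernel until all M rows are
// consumed, recording how many rows each call handled.
template <typename Kernel>
std::vector<std::size_t> SplitRows(std::size_t m, Kernel kernel) {
    std::vector<std::size_t> calls;
    while (m > 0) {
        const std::size_t handled = kernel(m);
        calls.push_back(handled);
        m -= handled;
    }
    return calls;
}
```

Under these assumptions, `SplitRows(11, RowsHandledOld)` yields four kernel calls (6, 3, 1, 1), while `SplitRows(11, RowsHandledNew)` yields two (6, 5), matching the M=11 example in the commit message.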
1 parent d75bdc5 commit 20ef8b4

7 files changed, +847 -1124 lines changed

onnxruntime/core/mlas/lib/amd64/LogisticKernelFma3.asm

-1
@@ -19,7 +19,6 @@
 
 .xlist
 INCLUDE mlasi.inc
-INCLUDE SgemmKernelCommon.inc
 .list
 
 EXTERN MlasMaskMoveAvx:NEAR
