perf: tweak the pipeline design of mla kernel (#901)
1. defer barrier sync for `p_smem`
2. change unroll number from 1 to 2
We found that synchronizing the two consumers in the qk stage still
incurs significant overhead. Using only one warpgroup for qk resolves
the issue.
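
For readers unfamiliar with item 2, here is a minimal host-side C++ sketch of what moving a loop from an unroll factor of 1 to 2 means. It is not the kernel code: `process_tile`, `sum_unroll1`, and `sum_unroll2` are hypothetical names, and the real kernel presumably leaves the unrolling to the compiler rather than writing it by hand.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the work done in one loop iteration.
inline float process_tile(float x) { return x * 2.0f + 1.0f; }

// Unroll factor 1: one tile is processed per loop trip.
float sum_unroll1(const std::vector<float>& tiles) {
  float acc = 0.0f;
  for (std::size_t i = 0; i < tiles.size(); ++i) {
    acc += process_tile(tiles[i]);
  }
  return acc;
}

// Unroll factor 2: two tiles are processed per loop trip, which halves the
// loop overhead and exposes two independent chains of work per trip.
float sum_unroll2(const std::vector<float>& tiles) {
  float acc0 = 0.0f;
  float acc1 = 0.0f;
  std::size_t i = 0;
  for (; i + 1 < tiles.size(); i += 2) {
    acc0 += process_tile(tiles[i]);
    acc1 += process_tile(tiles[i + 1]);
  }
  // Remainder iteration when the tile count is odd.
  if (i < tiles.size()) {
    acc0 += process_tile(tiles[i]);
  }
  return acc0 + acc1;
}
```

Both functions compute the same result. The sketch only shows the change in loop structure and says nothing about the speedup measured for the actual kernel.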