What's Changed
- Fix compilation with FP16_QK_REDUCTION enabled by @diptorupd in #962
- misc: Use environment variable to control JIT verbose flag by @yzh119 in #981
- Triton `rms_norm` kernels by @nandor in #983
- Allow passing workspace base directory via environment variable by @jsuchome in #973
- [CHORE] Rename `output_emitted_token_num` -> `output_emitted_draft_token_num` by @jon-chuang in #977
- ci: switch to on-demand instances if spot instance is interrupted by @yzh119 in #987
- misc: update devcontainer by @yzh119 in #986
- ci: add torch 2.6+cu126 wheel by @yzh119 in #985
- misc: fix devcontainer conda path by @yzh119 in #989
- perf: prefetch page indices for mla kernel by @yzh119 in #991
- SM-constraint-GEMM by triton persistent kernel by @yyihuang in #982
- 3rdparty: upgrade cutlass to 3.9 by @yzh119 in #997
- perf: add `-DNDEBUG` compilation flag by @yzh119 in #998
- release: bump version to v0.2.5 by @yzh119 in #999
New Contributors
- @jsuchome made their first contribution in #973
- @jon-chuang made their first contribution in #977
- @yyihuang made their first contribution in #982
Full Changelog: v0.2.4...v0.2.5