v0.2.5

yzh119 released this 04 Apr 00:41
592b110

What's Changed

  • Fix compilation with FP16_QK_REDUCTION enabled. by @diptorupd in #962
  • misc: Use environment variable to control JIT verbose flag by @yzh119 in #981
  • Triton rms_norm kernels by @nandor in #983
  • Allow passing workspace base directory via environment variable by @jsuchome in #973
  • [CHORE] Rename output_emitted_token_num -> output_emitted_draft_token_num by @jon-chuang in #977
  • ci: switch to on-demand instances if spot instance is interrupted by @yzh119 in #987
  • misc: update devcontainer by @yzh119 in #986
  • ci: add torch 2.6+cu126 wheel by @yzh119 in #985
  • misc: fix devcontainer conda path by @yzh119 in #989
  • perf: prefetch page indices for mla kernel by @yzh119 in #991
  • SM-constraint-GEMM by triton persistent kernel by @yyihuang in #982
  • 3rdparty: upgrade cutlass to 3.9 by @yzh119 in #997
  • perf: add -DNDEBUG compilation flag by @yzh119 in #998
  • release: bump version to v0.2.5 by @yzh119 in #999
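
For context on #983, the following is a minimal pure-Python reference for the conventional RMSNorm computation, y_i = x_i / sqrt(mean(x^2) + eps) * w_i, which can be handy for spot-checking kernel outputs on small inputs. It is a sketch that assumes the standard formulation. The function name `rms_norm_reference` and the default `eps` are hypothetical choices made for illustration. They are not FlashInfer's API and may not match the Triton kernels' exact semantics, such as dtype handling or fused residual variants.

```python
import math
from typing import List, Sequence


def rms_norm_reference(
    x: Sequence[float], weight: Sequence[float], eps: float = 1e-6
) -> List[float]:
    """Hypothetical reference RMSNorm for a single row.

    Divides each element by the root mean square of the row (with eps added
    inside the square root for numerical stability), then scales elementwise
    by `weight`. Assumes the conventional RMSNorm definition.
    """
    if len(x) != len(weight):
        raise ValueError("x and weight must have the same length")
    if not x:
        raise ValueError("x must be non-empty")
    mean_sq = sum(v * v for v in x) / len(x)    # mean of squared elements
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)    # reciprocal root mean square
    return [v * inv_rms * w for v, w in zip(x, weight)]


if __name__ == "__main__":
    # With unit weights and eps=0, the output row has a mean square of 1.
    out = rms_norm_reference([3.0, 4.0], [1.0, 1.0], eps=0.0)
    print(out)
```

Computing the reciprocal once and multiplying keeps the reference close to how such kernels are typically written, while staying simple enough to compare against on CPU.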

New Contributors

Full Changelog: v0.2.4...v0.2.5