v0.2.1.post2
What's Changed
- use 3 latest pytorch versions by @youkaichao in #835
- docs: update installation by @zhyncs in #839
- Update README.md: fixing a typo for "hierical" by @didier-durand in #836
- Update page.rst: fixing 1 typo by @didier-durand in #841
- Update README.md: fixing 1 typo by @didier-durand in #842
- adds TensorRT-LLM to the list of projects adopting FlashInfer by @yzh119 in #843
- perf: MLA decode kernel implemented by CuTe targeted to SM80 by @tsu-bin in #844
- Update installation.rst: fixing 2 typos by @didier-durand in #840
- fix: Pass backend in BatchPrefillWith*KVCacheWrapper.plan() by @sfc-gh-yewang in #808
- bugfix: Fix inline RoPE in decode kernels by @MasterJH5574 in #847
- misc: Remove duplicate param set in MLA kernel by @MasterJH5574 in #850
- feat: adding out and lse parameters to run functions to allow user allocated output buffer by @yzh119 in #854
- Unique the symbol of maybe_q_rope_offset_v. by @foreverlms in #855
- typo: update decode_maybe_q_rope_offset by @MasterJH5574 in #856
- update ci by @zhyncs in #857
- fix some compiler pre-check. by @foreverlms in #859
- perf: dynamic split-k for MLA by @yzh119 in #863
- Revert "fix: Pass backend in BatchPrefillWith*KVCacheWrapper.plan() (… by @zhyncs in #864
- chore: bump v0.2.1.post2 by @zhyncs in #865
- fix compile by @zhyncs in #866
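Two of the changes above, the out/lse parameters (#854) and split-k for MLA (#863), relate to a common pattern: attention over a long KV sequence can be computed in chunks, each chunk producing a partial output plus the log-sum-exp (lse) of its scores, and the partials are then merged exactly. The sketch below is a minimal pure-Python illustration of that pattern under stated assumptions. It is not FlashInfer's API or kernel code. `split_k_decode` and `_partial_attention` are hypothetical names, the number of splits is fixed by the caller (the heuristic behind the "dynamic" split in #863 is not shown), lse uses the natural log, and the optional `out`/`lse` arguments only mimic the idea of caller-allocated buffers.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def _partial_attention(q: Sequence[float], keys: Sequence[Sequence[float]],
                       values: Sequence[Sequence[float]],
                       scale: float) -> Tuple[Vector, float]:
    """Attention of one query over one KV chunk: returns (output, lse of scores)."""
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    denom = sum(weights)
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) / denom for d in range(dim)]
    return out, m + math.log(denom)


def split_k_decode(q: Sequence[float], keys: Sequence[Sequence[float]],
                   values: Sequence[Sequence[float]], num_splits: int,
                   out: Optional[Vector] = None,
                   lse: Optional[Vector] = None) -> Tuple[Vector, Vector]:
    """Hypothetical split-k decode for a single query (illustration only).

    If `out` (length = value dim) or `lse` (length 1) are given, results are
    written into those caller-allocated lists instead of new ones.
    """
    if not keys or len(keys) != len(values) or num_splits < 1:
        raise ValueError("need matching non-empty keys/values and num_splits >= 1")
    scale = 1.0 / math.sqrt(len(q))
    chunk = math.ceil(len(keys) / num_splits)

    # Each split computes its own normalized partial output and lse.
    partials = [
        _partial_attention(q, keys[i:i + chunk], values[i:i + chunk], scale)
        for i in range(0, len(keys), chunk)
    ]

    # Merge: total lse is the logsumexp of the per-split lse values, and each
    # partial output is reweighted by exp(lse_i - total).
    m = max(l for _, l in partials)
    total = m + math.log(sum(math.exp(l - m) for _, l in partials))

    dim = len(values[0])
    if out is None:
        out = [0.0] * dim
    if lse is None:
        lse = [0.0]
    if len(out) != dim or len(lse) != 1:
        raise ValueError("out must have length dim and lse must have length 1")
    for d in range(dim):
        out[d] = sum(math.exp(l - total) * o[d] for o, l in partials)
    lse[0] = total
    return out, lse
```

Because the merged result only needs each split's output and lse, splitting the sequence changes how work is parallelized but not the final values, up to floating-point rounding. Reusing preallocated `out`/`lse` buffers avoids a fresh allocation per call.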
New Contributors
- @didier-durand made their first contribution in #836
- @sfc-gh-yewang made their first contribution in #808
- @foreverlms made their first contribution in #855
Full Changelog: v0.2.1.post1...v0.2.1.post2