# Changelog

## [0.2.1](https://github.com/flashinfer-ai/flashinfer/compare/v0.2.0.post2...v0.2.1)

### What's Changed
* misc: addressing the package renaming issues by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/770
* feat: support deepseek prefill attention shape by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/765
* refactor: change the structure of attention updater by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/772
* hotfix: follow up of #772 by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/773
* bugfix: Ensure Loop Termination by Enforcing IEEE-754 Compliance in Sampling Kernels by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/774
* bugfix: fix the JIT warmup arguments in unittests by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/775
* ci: change whl folder to flashinfer-python by @abcdabcd987 in https://github.com/flashinfer-ai/flashinfer/pull/779
* perf: refactor fa2 prefill template by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/776
* feat: Separate QK/VO head dim dispatch for sm90 AOT by @abcdabcd987 in https://github.com/flashinfer-ai/flashinfer/pull/778
* bugfix: fix batch prefill attention kernel unittests by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/781
* misc: remove head dimension 64 from AOT by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/782
* misc: allow head_dim=64 for sm90 AOT by @abcdabcd987 in https://github.com/flashinfer-ai/flashinfer/pull/783
* bugfix: drop CTA_TILE_Q=32 by @abcdabcd987 in https://github.com/flashinfer-ai/flashinfer/pull/785
* refactor: make `group_size` a part of params by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/786
* bugfix: MLA decode should multiply sm_scale by math::log2e by @tsu-bin in https://github.com/flashinfer-ai/flashinfer/pull/787
* fix rope logic in mla decoding by @zhyncs in https://github.com/flashinfer-ai/flashinfer/pull/793
* Fix arguments of `plan` for split QK/VO head dims by @abmfy in https://github.com/flashinfer-ai/flashinfer/pull/795
* test: add unittest comparing deepseek prefill fa2 & 3 implementation by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/797
* bugfix: fix aot build not compatible with cmake command by @tsu-bin in https://github.com/flashinfer-ai/flashinfer/pull/796
* Fix the type annotation of q_dtype and kv_dtype on ragged prefill by @nandor in https://github.com/flashinfer-ai/flashinfer/pull/798
* feat: support f32 attention output in FA2 template by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/799
* feat: apply sm_scale at logits instead of q in FA2 template by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/801
* bugfix: mla decode failed under cuda graph mode, and update test case by @tsu-bin in https://github.com/flashinfer-ai/flashinfer/pull/803
* perf: memory efficient deepseek mla fused page-attention kernel by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/804
* bugfix: mla page-attention kernel for different page sizes by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/810
* doc: add documentation to new MLA interface by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/811
* feat: unlocking MLA for A100 by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/812
* feat: cudagraph-compatible MLA API by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/813
* feat: unlock MLA attention for sm89 (L40/L40s/4090) by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/814
* misc: fix sphinx by @abcdabcd987 in https://github.com/flashinfer-ai/flashinfer/pull/815
* bugfix: fix the behavior of mla plan function when provided with host tensors by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/816
* doc: improve mla related documentation by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/818

### New Contributors
* @abmfy made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/795

## [0.2.0.post2](https://github.com/flashinfer-ai/flashinfer/compare/v0.2.0.post1...v0.2.0.post2)

### What's Changed
* ci: fix the update_whl_index script to recognize version number with "post" and add torch2.5 by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/694
* bugfix: casting int array to int32 for rope input arguments by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/697
* bugfix: only use sm90 group gemm when torch cuda >= 12.3 by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/699
* misc: remove release-please workflow by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/705
* Customizable SM90 prefill kernels. by @hyhieu in https://github.com/flashinfer-ai/flashinfer/pull/704
* hotfix: revert torch.library register by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/709
* Improve compatibility with pytorch 2.5 by @zifeitong in https://github.com/flashinfer-ai/flashinfer/pull/711
* misc: add bibtex reference by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/712
* sampling: simplify min-p sampling by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/713
* perf: fix the iteration bound of SWA in FA2 prefill template by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/714
* bugfix: fix min-p AOT compilation in #713 by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/717
* Triton implementation of `silu_and_mul` by @nandor in https://github.com/flashinfer-ai/flashinfer/pull/716
* bugfix: FusedAddRMSNorm kernels might require more than 48KB shared memory when d is large. by @bobboli in https://github.com/flashinfer-ai/flashinfer/pull/718
* bugfix: Choose sm90 kernels only for Hopper GPUs. by @bobboli in https://github.com/flashinfer-ai/flashinfer/pull/719
* Finer-grained control over fp16/fp8 builds by @nandor in https://github.com/flashinfer-ai/flashinfer/pull/722
* Align KV chunk size binary search with actual KV chunk splitting. by @timzsu in https://github.com/flashinfer-ai/flashinfer/pull/728
* ci: rename python package name to `flashinfer-python` by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/729
* Add a note about int32/int64 datatypes to the `kv_layout` tutorial by @fergusfinn in https://github.com/flashinfer-ai/flashinfer/pull/737
* fix return type of cuBLAS by @zhyncs in https://github.com/flashinfer-ai/flashinfer/pull/749
* [Refactor] Unify JIT/Customization/AOT mode by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/748
* Move allocations out of torch ops by @nandor in https://github.com/flashinfer-ai/flashinfer/pull/740
* [Lint] Fix some linting issues and provide automatic format check script by @LeiWang1999 in https://github.com/flashinfer-ai/flashinfer/pull/743
* Filter out unsupported head dim for sm90 by @abcdabcd987 in https://github.com/flashinfer-ai/flashinfer/pull/751
* bugfix: various AOT issues by @abcdabcd987 in https://github.com/flashinfer-ai/flashinfer/pull/752
* [bugfix] Fix cpp tests/benchmarks by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/753
* fix pin memory device by @youkaichao in https://github.com/flashinfer-ai/flashinfer/pull/755
* Add dev container for easier development by @ByronHsu in https://github.com/flashinfer-ai/flashinfer/pull/680
* hotfix: bugfix to #756 by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/757
* Change `apply_rope_with_cos_sin_cache` to accept `cos_sin_cache` by @ByronHsu in https://github.com/flashinfer-ai/flashinfer/pull/754
* fix: match statement not supported in Python 3.8 by @xslingcn in https://github.com/flashinfer-ai/flashinfer/pull/759
* bugfix: use actual sm count for num_sm90_ctas by @LLLLKKKK in https://github.com/flashinfer-ai/flashinfer/pull/762
* bugfix: Fix block-sparse attention API by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/767
* Version bump: v0.2.0.post2 by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/768

### New Contributors
* @hyhieu made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/704
* @zifeitong made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/711
* @bobboli made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/718
* @timzsu made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/728
* @fergusfinn made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/737
* @LeiWang1999 made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/743
* @youkaichao made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/755
* @LLLLKKKK made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/762

## [0.2.0.post1](https://github.com/flashinfer-ai/flashinfer/compare/v0.2.0...v0.2.0.post1) (2024-12-22)

### Bug Fixes