
Commit 986dc07

release: bump version to v0.2.1 (#819)
1 parent 88fa03f commit 986dc07

2 files changed (+88 −1 lines)


CHANGELOG.md

+87 lines
@@ -1,5 +1,92 @@
# Changelog

## [0.2.1](https://github.com/flashinfer-ai/flashinfer/compare/v0.2.0.post2...v0.2.1)

### What's Changed

* misc: addressing the package renaming issues by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/770
* feat: support deepseek prefill attention shape by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/765
* refactor: change the structure of attention updater by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/772
* hotfix: follow up of #772 by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/773
* bugfix: Ensure Loop Termination by Enforcing IEEE-754 Compliance in Sampling Kernels by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/774
* bugfix: fix the JIT warmup arguments in unittests by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/775
* ci: change whl folder to flashinfer-python by @abcdabcd987 in https://github.com/flashinfer-ai/flashinfer/pull/779
* perf: refactor fa2 prefill template by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/776
* feat: Separate QK/VO head dim dispatch for sm90 AOT by @abcdabcd987 in https://github.com/flashinfer-ai/flashinfer/pull/778
* bugfix: fix batch prefill attention kernel unittests by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/781
* misc: remove head dimension 64 from AOT by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/782
* misc: allow head_dim=64 for sm90 AOT by @abcdabcd987 in https://github.com/flashinfer-ai/flashinfer/pull/783
* bugfix: drop CTA_TILE_Q=32 by @abcdabcd987 in https://github.com/flashinfer-ai/flashinfer/pull/785
* refactor: make `group_size` a part of params by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/786
* bugfix: MLA decode should multiply sm_scale by math::log2e by @tsu-bin in https://github.com/flashinfer-ai/flashinfer/pull/787
* fix rope logic in mla decoding by @zhyncs in https://github.com/flashinfer-ai/flashinfer/pull/793
* Fix arguments of `plan` for split QK/VO head dims by @abmfy in https://github.com/flashinfer-ai/flashinfer/pull/795
* test: add unittest comparing deepseek prefill fa2 & 3 implementation by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/797
* bugfix: fix aot build not compatible with cmake command by @tsu-bin in https://github.com/flashinfer-ai/flashinfer/pull/796
* Fix the type annotation of q_dtype and kv_dtype on ragged prefill by @nandor in https://github.com/flashinfer-ai/flashinfer/pull/798
* feat: support f32 attention output in FA2 template by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/799
* feat: apply sm_scale at logits instead of q in FA2 template by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/801
* bugfix: mla decode failed under cuda graph mode, and update test case by @tsu-bin in https://github.com/flashinfer-ai/flashinfer/pull/803
* perf: memory efficient deepseek mla fused page-attention kernel by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/804
* bugfix: mla page-attention kernel for different page sizes by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/810
* doc: add documentation to new MLA interface by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/811
* feat: unlocking MLA for A100 by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/812
* feat: cudagraph-compatible MLA API by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/813
* feat: unlock MLA attention for sm89 (L40/L40s/4090) by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/814
* misc: fix sphinx by @abcdabcd987 in https://github.com/flashinfer-ai/flashinfer/pull/815
* bugfix: fix the behavior of mla plan function when provided with host tensors by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/816
* doc: improve mla related documentation by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/818

### New Contributors

* @abmfy made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/795

## [0.2.0.post2](https://github.com/flashinfer-ai/flashinfer/compare/v0.2.0.post1...v0.2.0.post2)

### What's Changed

* ci: fix the update_whl_index script to recognize version numbers with "post" and add torch2.5 by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/694
* bugfix: casting int array to int32 for rope input arguments by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/697
* bugfix: only use sm90 group gemm when torch cuda >= 12.3 by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/699
* misc: remove release-please workflow by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/705
* Customizable SM90 prefill kernels by @hyhieu in https://github.com/flashinfer-ai/flashinfer/pull/704
* hotfix: revert torch.library register by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/709
* Improve compatibility with pytorch 2.5 by @zifeitong in https://github.com/flashinfer-ai/flashinfer/pull/711
* misc: add bibtex reference by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/712
* sampling: simplify min-p sampling by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/713
* perf: fix the iteration bound of SWA in FA2 prefill template by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/714
* bugfix: fix min-p AOT compilation in #713 by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/717
* Triton implementation of `silu_and_mul` by @nandor in https://github.com/flashinfer-ai/flashinfer/pull/716
* bugfix: FusedAddRMSNorm kernels might require more than 48KB shared memory when d is large by @bobboli in https://github.com/flashinfer-ai/flashinfer/pull/718
* bugfix: Choose sm90 kernels only for Hopper GPUs by @bobboli in https://github.com/flashinfer-ai/flashinfer/pull/719
* Finer-grained control over fp16/fp8 builds by @nandor in https://github.com/flashinfer-ai/flashinfer/pull/722
* Align KV chunk size binary search with actual KV chunk splitting by @timzsu in https://github.com/flashinfer-ai/flashinfer/pull/728
* ci: rename python package name to `flashinfer-python` by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/729
* Add a note about int32/int64 datatypes to the `kv_layout` tutorial by @fergusfinn in https://github.com/flashinfer-ai/flashinfer/pull/737
* fix return type of cuBLAS by @zhyncs in https://github.com/flashinfer-ai/flashinfer/pull/749
* [Refactor] Unify JIT/Customization/AOT mode by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/748
* Move allocations out of torch ops by @nandor in https://github.com/flashinfer-ai/flashinfer/pull/740
* [Lint] Fix some linting issues and provide automatic format check script by @LeiWang1999 in https://github.com/flashinfer-ai/flashinfer/pull/743
* Filter out unsupported head dim for sm90 by @abcdabcd987 in https://github.com/flashinfer-ai/flashinfer/pull/751
* bugfix: various AOT issues by @abcdabcd987 in https://github.com/flashinfer-ai/flashinfer/pull/752
* [bugfix] Fix cpp tests/benchmarks by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/753
* fix pin memory device by @youkaichao in https://github.com/flashinfer-ai/flashinfer/pull/755
* Add dev container for easier development by @ByronHsu in https://github.com/flashinfer-ai/flashinfer/pull/680
* hotfix: bugfix to #756 by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/757
* Change `apply_rope_with_cos_sin_cache` to accept `cos_sin_cache` by @ByronHsu in https://github.com/flashinfer-ai/flashinfer/pull/754
* fix: match statement not supported in Python 3.8 by @xslingcn in https://github.com/flashinfer-ai/flashinfer/pull/759
* bugfix: use actual sm count for num_sm90_ctas by @LLLLKKKK in https://github.com/flashinfer-ai/flashinfer/pull/762
* bugfix: Fix block-sparse attention API by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/767
* Version bump: v0.2.0.post2 by @yzh119 in https://github.com/flashinfer-ai/flashinfer/pull/768

### New Contributors

* @hyhieu made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/704
* @zifeitong made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/711
* @bobboli made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/718
* @timzsu made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/728
* @fergusfinn made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/737
* @LeiWang1999 made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/743
* @youkaichao made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/755
* @LLLLKKKK made their first contribution in https://github.com/flashinfer-ai/flashinfer/pull/762

## [0.2.0.post1](https://github.com/flashinfer-ai/flashinfer/compare/v0.2.0...v0.2.0.post1) (2024-12-22)

### Bug Fixes

version.txt

+1 −1 lines

@@ -1 +1 @@
-0.2.0.post2
+0.2.1
