
Commit c18745b

doc: fix the description of logits cap in docstring (#299)
The logits cap is applied to the pre-attention logits, not to the attention scores.
1 parent ab1e2ad · commit c18745b
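
For context, the cap the corrected docstrings describe is the Grok-1 style soft cap applied to the pre-attention logits (the scaled q·k values before softmax). A minimal sketch of that formula, using illustrative names rather than the library's internals:

    import torch

    def soft_cap_logits(logits: torch.Tensor, cap: float = 30.0) -> torch.Tensor:
        # Grok-1 style cap on pre-attention logits: 30 * tanh(x / 30).
        # Smoothly bounds the logits in (-cap, cap) before softmax is applied.
        return cap * torch.tanh(logits / cap)

    # Attention weights would then be, e.g.,
    # softmax(soft_cap_logits(q @ k.T / head_dim**0.5)).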

2 files changed: +24 −24 lines changed

python/flashinfer/decode.py

+12 −12

@@ -86,8 +86,8 @@ def single_decode_with_kv_cache(
         ``NONE``/``ROPE_LLAMA`` (LLAMA style rotary embedding) /``ALIBI``.
         Defaults to ``NONE``.
     logits_cap : bool
-        Whether to apply logits cap to attention scores.
-        If ``True``, the attention scores will be capped according to formula (proposed in
+        Whether to apply logits cap to pre-attention logits.
+        If ``True``, the logits will be capped according to formula (proposed in
         Grok-1): :math:`30 \times \mathrm{tanh}(x / 30)`, where :math:`x` is the input logits.
         Defaults to ``False``.
     q_scale : Optional[float]
@@ -199,8 +199,8 @@ def batch_decode_with_padded_kv_cache(
         ``NONE``/``ROPE_LLAMA`` (LLAMA style rotary embedding) /``ALIBI``.
         Defaults to ``NONE``.
     logits_cap : bool
-        Whether to apply logits cap to attention scores.
-        If ``True``, the attention scores will be capped according to formula (proposed in
+        Whether to apply logits cap to pre-attention logits.
+        If ``True``, the logits will be capped according to formula (proposed in
         Grok-1): :math:`30 \times \mathrm{tanh}(x / 30)`, where :math:`x` is the input logits.
         Defaults to ``False``.
     q_scale : Optional[float]
@@ -312,8 +312,8 @@ def batch_decode_with_padded_kv_cache_return_lse(
         ``NONE``/``ROPE_LLAMA`` (LLAMA style rotary embedding) /``ALIBI``.
         Defaults to ``NONE``.
     logits_cap : bool
-        Whether to apply logits cap to attention scores.
-        If ``True``, the attention scores will be capped according to formula (proposed in
+        Whether to apply logits cap to pre-attention logits.
+        If ``True``, the logits will be capped according to formula (proposed in
         Grok-1): :math:`30 \times \mathrm{tanh}(x / 30)`, where :math:`x` is the input logits.
         Defaults to ``False``.
     q_scale : Optional[float]
@@ -592,8 +592,8 @@ def begin_forward(
         ``NONE``/``ROPE_LLAMA`` (LLAMA style rotary embedding) /``ALIBI``.
         Defaults to ``NONE``.
     logits_cap: bool
-        Whether to apply logits cap to attention scores.
-        If ``True``, the attention scores will be capped according to formula (proposed in
+        Whether to apply logits cap to pre-attention logits.
+        If ``True``, the logits will be capped according to formula (proposed in
         Grok-1): :math:`30 \times \mathrm{tanh}(x / 30)`, where :math:`x` is the input logits.
         Defaults to ``False``.
     data_type : Union[str, torch.dtype]
@@ -704,8 +704,8 @@ def forward(
         ``NONE``/``ROPE_LLAMA`` (LLAMA style rotary embedding) /``ALIBI``.
         Defaults to ``NONE``.
     logits_cap: bool
-        Whether to apply logits cap to attention scores.
-        If ``True``, the attention scores will be capped according to formula (proposed in
+        Whether to apply logits cap to pre-attention logits.
+        If ``True``, the logits will be capped according to formula (proposed in
         Grok-1): :math:`30 \times \mathrm{tanh}(x / 30)`, where :math:`x` is the input logits.
         Defaults to ``False``.
     q_scale : Optional[float]
@@ -789,8 +789,8 @@ def forward_return_lse(
         ``NONE``/``ROPE_LLAMA`` (LLAMA style rotary embedding) /``ALIBI``.
         Defaults to ``NONE``.
     logits_cap: bool
-        Whether to apply logits cap to attention scores.
-        If ``True``, the attention scores will be capped according to formula (proposed in
+        Whether to apply logits cap to pre-attention logits.
+        If ``True``, the logits will be capped according to formula (proposed in
         Grok-1): :math:`30 \times \mathrm{tanh}(x / 30)`, where :math:`x` is the input logits.
         Defaults to ``False``.
     q_scale : Optional[float]
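
With the corrected wording, enabling the cap from the decode API is just a matter of passing the flag. A hedged usage sketch for the single-request decode path (tensor shapes follow the usual one-query-token-per-request decode convention and are assumptions for illustration, not taken from this commit):

    import torch
    import flashinfer

    num_qo_heads, num_kv_heads, head_dim, kv_len = 32, 32, 128, 2048
    q = torch.randn(num_qo_heads, head_dim, dtype=torch.float16, device="cuda")
    k = torch.randn(kv_len, num_kv_heads, head_dim, dtype=torch.float16, device="cuda")
    v = torch.randn(kv_len, num_kv_heads, head_dim, dtype=torch.float16, device="cuda")

    # logits_cap=True applies the 30 * tanh(x / 30) cap to the pre-attention
    # logits before softmax, as the updated docstring describes.
    o = flashinfer.single_decode_with_kv_cache(q, k, v, logits_cap=True)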

python/flashinfer/prefill.py

+12 −12

@@ -96,8 +96,8 @@ def single_prefill_with_kv_cache(
         ``NONE``/``ROPE_LLAMA`` (LLAMA style rotary embedding) /``ALIBI``.
         Default is ``NONE``.
     logits_cap : bool
-        Whether to apply logits cap to attention scores.
-        If ``True``, the attention scores will be capped according to formula (proposed in
+        Whether to apply logits cap to pre-attention logits.
+        If ``True``, the logits will be capped according to formula (proposed in
         Grok-1): :math:`30 \times \mathrm{tanh}(x / 30)`, where :math:`x` is the input logits.
         Defaults to ``False``.
     allow_fp16_qk_reduction : bool
@@ -240,8 +240,8 @@ def single_prefill_with_kv_cache_return_lse(
         ``NONE``/``ROPE_LLAMA`` (LLAMA style rotary embedding) /``ALIBI``.
         Default is ``NONE``.
     logits_cap : bool
-        Whether to apply logits cap to attention scores.
-        If ``True``, the attention scores will be capped according to formula (proposed in
+        Whether to apply logits cap to pre-attention logits.
+        If ``True``, the logits will be capped according to formula (proposed in
         Grok-1): :math:`30 \times \mathrm{tanh}(x / 30)`, where :math:`x` is the input logits.
         Defaults to ``False``.
     allow_fp16_qk_reduction : bool
@@ -770,8 +770,8 @@ def forward(
         ``NONE``/``ROPE_LLAMA`` (LLAMA style rotary embedding) /``ALIBI``.
         Default is ``NONE``.
     logits_cap : bool
-        Whether to apply logits cap to attention scores.
-        If ``True``, the attention scores will be capped according to formula (proposed in
+        Whether to apply logits cap to pre-attention logits.
+        If ``True``, the logits will be capped according to formula (proposed in
         Grok-1): :math:`30 \times \mathrm{tanh}(x / 30)`, where :math:`x` is the input logits.
         Defaults to ``False``.
     allow_fp16_qk_reduction : bool
@@ -874,8 +874,8 @@ def forward_return_lse(
         ``NONE``/``ROPE_LLAMA`` (LLAMA style rotary embedding) /``ALIBI``.
         Default is ``NONE``.
     logits_cap : bool
-        Whether to apply logits cap to attention scores.
-        If ``True``, the attention scores will be capped according to formula (proposed in
+        Whether to apply logits cap to pre-attention logits.
+        If ``True``, the logits will be capped according to formula (proposed in
         Grok-1): :math:`30 \times \mathrm{tanh}(x / 30)`, where :math:`x` is the input logits.
         Defaults to ``False``.
     allow_fp16_qk_reduction : bool
@@ -1276,8 +1276,8 @@ def forward(
         ``NONE``/``ROPE_LLAMA`` (LLAMA style rotary embedding) /``ALIBI``.
         Default is ``NONE``.
     logits_cap : bool
-        Whether to apply logits cap to attention scores.
-        If ``True``, the attention scores will be capped according to formula (proposed in
+        Whether to apply logits cap to pre-attention logits.
+        If ``True``, the logits will be capped according to formula (proposed in
         Grok-1): :math:`30 \times \mathrm{tanh}(x / 30)`, where :math:`x` is the input logits.
         Defaults to ``False``.
     allow_fp16_qk_reduction : bool
@@ -1378,8 +1378,8 @@ def forward_return_lse(
         ``NONE``/``ROPE_LLAMA`` (LLAMA style rotary embedding) /``ALIBI``.
         Default is ``NONE``.
     logits_cap : bool
-        Whether to apply logits cap to attention scores.
-        If ``True``, the attention scores will be capped according to formula (proposed in
+        Whether to apply logits cap to pre-attention logits.
+        If ``True``, the logits will be capped according to formula (proposed in
         Grok-1): :math:`30 \times \mathrm{tanh}(x / 30)`, where :math:`x` is the input logits.
         Defaults to ``False``.
     allow_fp16_qk_reduction : bool
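
The prefill entry points take the same flag; a similar hedged sketch for the single-request prefill path (again, tensor shapes are assumptions for illustration):

    import torch
    import flashinfer

    qo_len, kv_len, num_qo_heads, num_kv_heads, head_dim = 128, 2048, 32, 32, 128
    q = torch.randn(qo_len, num_qo_heads, head_dim, dtype=torch.float16, device="cuda")
    k = torch.randn(kv_len, num_kv_heads, head_dim, dtype=torch.float16, device="cuda")
    v = torch.randn(kv_len, num_kv_heads, head_dim, dtype=torch.float16, device="cuda")

    # Same Grok-1 style cap on the pre-attention logits, here for prefill/append attention.
    o = flashinfer.single_prefill_with_kv_cache(q, k, v, causal=True, logits_cap=True)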
