@@ -96,7 +96,7 @@ def single_prefill_with_kv_cache(
``NONE``/``ROPE_LLAMA`` (LLAMA style rotary embedding) /``ALIBI``.
Default is ``NONE``.
logits_cap : bool
- Whether to apply logits cap to pre-attention logits.
+ Whether to apply logits cap to pre-softmax logits.
If ``True``, the logits will be capped according to formula (proposed in
Grok-1): :math:`30 \times \mathrm{tanh}(x / 30)`, where :math:`x` is the input logits.
Defaults to ``False``.
@@ -240,7 +240,7 @@ def single_prefill_with_kv_cache_return_lse(
``NONE``/``ROPE_LLAMA`` (LLAMA style rotary embedding) /``ALIBI``.
Default is ``NONE``.
logits_cap : bool
- Whether to apply logits cap to pre-attention logits.
+ Whether to apply logits cap to pre-softmax logits.
If ``True``, the logits will be capped according to formula (proposed in
Grok-1): :math:`30 \times \mathrm{tanh}(x / 30)`, where :math:`x` is the input logits.
Defaults to ``False``.
@@ -770,7 +770,7 @@ def forward(
``NONE``/``ROPE_LLAMA`` (LLAMA style rotary embedding) /``ALIBI``.
Default is ``NONE``.
logits_cap : bool
- Whether to apply logits cap to pre-attention logits,
+ Whether to apply logits cap to pre-softmax logits,
If ``True``, the logits will be capped according to formula (proposed in
Grok-1): :math:`30 \times \mathrm{tanh}(x / 30)`, where :math:`x` is the input logits.
Defaults to ``False``.
@@ -874,7 +874,7 @@ def forward_return_lse(
``NONE``/``ROPE_LLAMA`` (LLAMA style rotary embedding) /``ALIBI``.
Default is ``NONE``.
logits_cap : bool
- Whether to apply logits cap to pre-attention logits.
+ Whether to apply logits cap to pre-softmax logits.
If ``True``, the logits will be capped according to formula (proposed in
Grok-1): :math:`30 \times \mathrm{tanh}(x / 30)`, where :math:`x` is the input logits.
Defaults to ``False``.
@@ -1276,7 +1276,7 @@ def forward(
``NONE``/``ROPE_LLAMA`` (LLAMA style rotary embedding) /``ALIBI``.
Default is ``NONE``.
logits_cap : bool
- Whether to apply logits cap to pre-attention logits.
+ Whether to apply logits cap to pre-softmax logits.
If ``True``, the logits will be capped according to formula (proposed in
Grok-1): :math:`30 \times \mathrm{tanh}(x / 30)`, where :math:`x` is the input logits.
Defaults to ``False``.
@@ -1378,7 +1378,7 @@ def forward_return_lse(
``NONE``/``ROPE_LLAMA`` (LLAMA style rotary embedding) /``ALIBI``.
Default is ``NONE``.
logits_cap : bool
- Whether to apply logits cap to pre-attention logits.
+ Whether to apply logits cap to pre-softmax logits.
If ``True``, the logits will be capped according to formula (proposed in
Grok-1): :math:`30 \times \mathrm{tanh}(x / 30)`, where :math:`x` is the input logits.
Defaults to ``False``.