[ORT 1.17.0 Release] Cherry-pick Final Round (#19327)
### Description
Cherry-pick Final Round
### Motivation and Context
---------
Co-authored-by: Adrian Lizarraga <[email protected]>
Co-authored-by: Changming Sun <[email protected]>
Co-authored-by: Chi Lo <[email protected]>
Co-authored-by: rachguo <[email protected]>
Co-authored-by: Edward Chen <[email protected]>
Co-authored-by: kunal-vaishnavi <[email protected]>
Co-authored-by: aciddelgado <[email protected]>
Co-authored-by: Yufeng Li <[email protected]>
docs/ContribOperators.md (+12, -4)
@@ -2398,24 +2398,28 @@ This version of the operator has been available since version 1 of the 'com.micr
 #### Attributes
 
 <dl>
+<dt><tt>do_rotary</tt> : int</dt>
+<dd>Whether to use rotary position embedding. Default value is 0.</dd>
 <dt><tt>kv_num_heads</tt> : int (required)</dt>
 <dd>Number of attention heads for k and v</dd>
 <dt><tt>local_window_size</tt> : int</dt>
 <dd>left_window_size for local attention (like Mistral). Default value is -1 meaning unused.</dd>
 <dt><tt>num_heads</tt> : int (required)</dt>
 <dd>Number of attention heads for q</dd>
+<dt><tt>rotary_interleaved</tt> : int</dt>
+<dd>Rotate using interleaved pattern. Default value is 0 (False).</dd>
 <dt><tt>scale</tt> : float</dt>
 <dd>Custom scale will be used if specified. Default value is 1/sqrt(head_size)</dd>
 </dl>
 
-#### Inputs
+#### Inputs (7 - 9)
 
 <dl>
 <dt><tt>query</tt> : T</dt>
-<dd>Query with shape (batch_size, sequence_length, hidden_size)</dd>
-<dt><tt>key</tt> : T</dt>
+<dd>Query with shape (batch_size, sequence_length, hidden_size), or packed QKV with shape(batch_size, sequence_length, d) where d is (num_heads * head_size + 2 * kv_num_heads * head_size).</dd>
+<dt><tt>key</tt> (optional) : T</dt>
 <dd>Key with shape (batch_size, kv_sequence_length, kv_hidden_size)</dd>
-<dt><tt>value</tt> : T</dt>
+<dt><tt>value</tt> (optional) : T</dt>
 <dd>Value with shape (batch_size, kv_sequence_length, kv_hidden_size)</dd>
 <dt><tt>past_key</tt> (optional) : T</dt>
 <dd>past state key with support for format BNSH. When past_key uses same tensor as present_key(k-v cache), it is of length max_sequence_length... otherwise of length past_sequence_length.</dd>
@@ -2425,6 +2429,10 @@ This version of the operator has been available since version 1 of the 'com.micr
 <dd>1d Tensor of shape (batch_size). Indicates past sequence lengths for token generation case.</dd>
 <dt><tt>total_sequence_length</tt> : M</dt>
 <dd>Scalar tensor of total sequence length (past + new).</dd>
+<dt><tt>cos_cache</tt> (optional) : T</dt>
+<dd>2D tensor with shape (max_sequence_length, head_size / 2).</dd>
+<dt><tt>sin_cache</tt> (optional) : T</dt>
+<dd>2D tensor with shape (max_sequence_length, head_size / 2).</dd>
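For reference, below is a minimal sketch of how the updated GroupQueryAttention schema could be exercised from Python with onnx.helper. The head counts and shapes are illustrative assumptions; the past_value input slot, the seqlens_k input name, and the output names (output, present_key, present_value) are not visible in the diff excerpt above and are assumed from the rest of the operator's documentation.

```python
from onnx import helper

# Illustrative sizes (assumptions, not values from the doc excerpt above).
num_heads, kv_num_heads, head_size = 8, 2, 64
hidden_size = num_heads * head_size                           # 512
kv_hidden_size = kv_num_heads * head_size                     # 128
packed_qkv_dim = (num_heads + 2 * kv_num_heads) * head_size   # the "d" in the query description

gqa = helper.make_node(
    "GroupQueryAttention",
    inputs=[
        "query",                  # (batch, seq_len, hidden_size), or packed QKV of width packed_qkv_dim
        "key",                    # (batch, kv_seq_len, kv_hidden_size); "" when QKV is packed into query
        "value",                  # (batch, kv_seq_len, kv_hidden_size); "" when QKV is packed into query
        "past_key",               # k-cache in BNSH format
        "past_value",             # v-cache in BNSH format (assumed slot, not shown in the excerpt)
        "seqlens_k",              # (batch,) past sequence lengths (name assumed)
        "total_sequence_length",  # scalar, past + new
        "cos_cache",              # (max_sequence_length, head_size / 2), used when do_rotary=1
        "sin_cache",              # (max_sequence_length, head_size / 2), used when do_rotary=1
    ],
    outputs=["output", "present_key", "present_value"],  # assumed output names
    domain="com.microsoft",
    num_heads=num_heads,
    kv_num_heads=kv_num_heads,
    do_rotary=1,            # new attribute: apply rotary position embedding
    rotary_interleaved=0,   # new attribute: 0 = non-interleaved rotation pattern
)
```

With do_rotary=1, cos_cache and sin_cache supply the rotary tables; with packed QKV, key and value would be passed as empty strings and query carries d = (num_heads + 2 * kv_num_heads) * head_size channels.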