  kv_chunk_size: Optional[int]: key/value chunks size. if None: defaults to sqrt(key_tokens)
  kv_chunk_size_min: Optional[int]: key/value minimum chunk size. only considered when kv_chunk_size is None. changes `sqrt(key_tokens)` into `max(sqrt(key_tokens), kv_chunk_size_min)`, to ensure our chunk sizes don't get too small (smaller chunks = more chunks = less concurrent work done).
- use_checkpoint: bool: whether to use checkpointing (recommended True for training, False for inference)
+ use_checkpoint: Optional[bool]: whether to use checkpointing (recommended True for training, False for inference)
+ key_needs_transpose: Optional[bool]: whether key needs a transpose. defaults to True
  Returns:
  Output of shape `[batch * num_heads, query_tokens, channels_per_head]`.
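The chunk-size rule described in the docstring can be sketched as follows. This is a minimal illustration, not the code from this commit: the helper name `resolve_kv_chunk_size` is hypothetical, the integer truncation of the square root is an assumption, and so is the final clamp to the range `[1, key_tokens]`.

```python
import math
from typing import Optional


def resolve_kv_chunk_size(
    key_tokens: int,
    kv_chunk_size: Optional[int] = None,
    kv_chunk_size_min: Optional[int] = None,
) -> int:
    """Hypothetical helper: choose a key/value chunk size per the docstring."""
    if kv_chunk_size is None:
        # Default: sqrt(key_tokens). Using the integer square root is an assumption.
        chunk = math.isqrt(key_tokens)
        if kv_chunk_size_min is not None:
            # The minimum only applies when no explicit chunk size was given.
            chunk = max(chunk, kv_chunk_size_min)
    else:
        # An explicit chunk size is used as given; kv_chunk_size_min is ignored.
        chunk = kv_chunk_size
    # Assumption: keep the result between 1 and the number of key tokens.
    return max(1, min(chunk, key_tokens))
```

Under these assumptions, `resolve_kv_chunk_size(4096)` gives 64, and `resolve_kv_chunk_size(4096, kv_chunk_size_min=128)` gives 128, which yields fewer, larger chunks.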