```diff
 # int32_t n_gpu_layers; // number of layers to store in VRAM
 # enum llama_split_mode split_mode; // how to split the model across multiple GPUs
@@ -684,6 +700,7 @@ class llama_model_params(ctypes.Structure):
     """Parameters for llama_model

     Attributes:
+        tensor_buft_overrides(llama_model_tensor_buft_override): NULL-terminated list of buffer types to use for tensors that match a pattern
         n_gpu_layers (int): number of layers to store in VRAM
         split_mode (int): how to split the model across multiple GPUs
         main_gpu (int): the GPU that is used for the entire model. main_gpu interpretation depends on split_mode: LLAMA_SPLIT_NONE: the GPU that is used for the entire model LLAMA_SPLIT_ROW: the GPU that is used for small tensors and intermediate results LLAMA_SPLIT_LAYER: ignored
@@ -697,6 +714,7 @@ class llama_model_params(ctypes.Structure):
         check_tensors (bool): validate model tensor data"""
```
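
For context, here is a minimal sketch of how these `llama_model_params` fields might be set through the low-level `llama_cpp` ctypes bindings. It is not part of this diff: the model path and layer count are placeholders, and the names `llama_model_default_params`, `LLAMA_SPLIT_MODE_LAYER`, and `llama_load_model_from_file` are assumed from current versions of the bindings and may differ between releases.

```python
# Minimal sketch, assuming the low-level llama_cpp bindings; constant and
# function names (e.g. LLAMA_SPLIT_MODE_LAYER, llama_load_model_from_file)
# may vary between versions of llama-cpp-python.
import llama_cpp

params = llama_cpp.llama_model_default_params()
params.n_gpu_layers = 35                               # number of layers to store in VRAM
params.split_mode = llama_cpp.LLAMA_SPLIT_MODE_LAYER   # how to split the model across GPUs
params.main_gpu = 0                                    # ignored when splitting by layer
params.check_tensors = True                            # validate model tensor data
# tensor_buft_overrides is left at its default (NULL) here; per the docstring
# above it takes a NULL-terminated list of llama_model_tensor_buft_override
# entries matching tensor name patterns to buffer types.

model = llama_cpp.llama_load_model_from_file(
    b"/path/to/model.gguf",  # placeholder path
    params,
)
```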