Commit b4b0d53

server: docs: --no-mul-mat-q,-nommq
1 parent 78aacf3 commit b4b0d53

2 files changed: 2 additions and 0 deletions

examples/server/README.md

Lines changed: 1 addition & 0 deletions
@@ -27,6 +27,7 @@ The project is under active development, and we are [looking for feedback and co
 - `-b N`, `--batch-size N`: Set the batch size for prompt processing. Default: `512`.
 - `--memory-f32`: Use 32-bit floats instead of 16-bit floats for memory key+value. Not recommended.
 - `--mlock`: Lock the model in memory, preventing it from being swapped out when memory-mapped.
+- `--no-mul-mat-q,-nommq`: Disable mul_mat_q kernels
 - `--no-mmap`: Do not memory-map the model. By default, models are mapped into memory, which allows the system to load only the necessary parts of the model as needed.
 - `--numa STRATEGY`: Attempt one of the below optimization strategies that help on some NUMA systems
 - `--numa distribute`: Spread execution evenly over all nodes

examples/server/server.cpp

Lines changed: 1 addition & 0 deletions
@@ -2085,6 +2085,7 @@ static void server_print_usage(const char *argv0, const gpt_params &params,
 {
 printf(" --no-mmap do not memory-map model (slower load but may reduce pageouts if not using mlock)\n");
 }
+printf(" --no-mul-mat-q,-nommq Disable mul_mat_q kernels\n");
 printf(" --numa TYPE attempt optimizations that help on some NUMA systems\n");
 printf(" - distribute: spread execution evenly over all nodes\n");
 printf(" - isolate: only spawn threads on CPUs on the node that execution started on\n");
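
This commit only touches the usage text, so the flag handling itself is not shown. As a rough illustration of what the documented option implies, here is a minimal sketch of an argument check that accepts both spellings. The names `sketch_params` and `sketch_parse_arg` are hypothetical and do not come from `server.cpp`, and the default of `true` for `mul_mat_q` is an assumption, not something the diff confirms.

```cpp
#include <string>

// Hypothetical stand-in for the server's parameter struct; only the single
// field relevant to this flag is modeled here.
struct sketch_params {
    bool mul_mat_q = true; // assumption: mul_mat_q kernels are enabled by default
};

// Handles one command-line argument. Both spellings from the usage text
// ("--no-mul-mat-q" and "-nommq") disable the mul_mat_q kernels.
// Returns true if the argument was recognized, false otherwise.
static bool sketch_parse_arg(const std::string & arg, sketch_params & params) {
    if (arg == "--no-mul-mat-q" || arg == "-nommq") {
        params.mul_mat_q = false;
        return true;
    }
    return false; // not one of the two spellings this sketch knows about
}
```

Passing either `--no-mul-mat-q` or `-nommq` to `sketch_parse_arg` clears the field, and any other argument leaves it untouched. The actual parser may well do more than this, for example depending on how the binary was built.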
