examples/main/README.md (+4)
@@ -282,6 +282,10 @@ These options help improve the performance and memory usage of the LLaMA models.
-`--no-mmap`: Do not memory-map the model. By default, models are mapped into memory, which allows the system to load only the necessary parts of the model as needed. However, if the model is larger than your total amount of RAM or if your system is low on available memory, using mmap might increase the risk of pageouts, negatively impacting performance. Disabling mmap results in slower load times but may reduce pageouts if you're not using `--mlock`. Note that if the model is larger than the total amount of RAM, turning off mmap would prevent the model from loading at all.
### Direct I/O
-`--direct-io`: Use direct I/O. Potentially faster uncached loading, fewer pageouts, no page cache pollution. You may benefit from this option if you load a model for the first time (or after some time), load several different models consecutively, or simply want to keep the page cache clean. The faster your storage device is, the greater the gain you can expect. The effect may be greater on Linux due to Transparent HugePage support.
### NUMA support
-`--numa distribute`: Pin an equal proportion of the threads to the cores on each NUMA node. This spreads the load amongst all cores on the system, utilizing all memory channels at the expense of potentially requiring memory to travel over the slow links between nodes.
examples/server/README.md (+1)
@@ -34,6 +34,7 @@ The project is under active development, and we are [looking for feedback and co
-`-ub N`, `--ubatch-size N`: Physical maximum batch size. Default: `512`
-`--mlock`: Lock the model in memory, preventing it from being swapped out when memory-mapped.
-`--no-mmap`: Do not memory-map the model. By default, models are mapped into memory, which allows the system to load only the necessary parts of the model as needed.
-`--direct-io`: Use direct I/O. Potentially faster uncached loading, fewer pageouts, no page cache pollution.
-`--numa STRATEGY`: Attempt one of the optimization strategies below that may help on some NUMA systems.
-`--numa distribute`: Spread execution evenly over all nodes
-`--numa isolate`: Only spawn threads on CPUs on the node that execution started on