12 files changed: +77 -17 lines changed
File tree
- all_models
- gpt/postprocessing
- inflight_batcher_llm
- postprocessing
- tensorrt_llm
- 1
- dockerfile
- inflight_batcher_llm
- client
- src
- tools
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| 1 | 1 | | |
| 2 | 2 | | |
| 3 | 3 | | |
| | 4 | + | |
| 4 | 5 | | |
| 5 | 6 | | |
| 6 | 7 | | |

| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| 27 | 27 | | |
| 28 | 28 | | |
| 29 | 29 | | |
| | 30 | + | |
| 30 | 31 | | |
| 31 | 32 | | |
| 32 | 33 | | |

| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| 175 | 175 | | |
| 176 | 176 | | |
| 177 | 177 | | |
| 178 | | - | |
| 179 | | - | |
| 180 | | - | |
| 181 | | - | |
| | 178 | + | |
| 182 | 179 | | |
| 183 | 180 | | |
| 184 | 181 | | |
| 185 | 182 | | |
| 186 | 183 | | |
| 187 | 184 | | |
| 188 | 185 | | |
| 189 | | - | |
| 190 | 186 | | |
| 191 | 187 | | |
| 192 | 188 | | |
| | | | |
| 312 | 308 | | |
| 313 | 309 | | |
| 314 | 310 | | |
| 315 | | - | |
| 316 | | - | |
| | 311 | + | |
| | 312 | + | |
| | 313 | + | |
| | 314 | + | |
| | 315 | + | |
| | 316 | + | |
| | 317 | + | |
| | 318 | + | |
| | 319 | + | |
| | 320 | + | |
| | 321 | + | |
| | 322 | + | |
| 317 | 323 | | |
| 318 | 324 | | |
| 319 | 325 | | |

| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| 253 | 253 | | |
| 254 | 254 | | |
| 255 | 255 | | |
| | 256 | + | |
| | 257 | + | |
| | 258 | + | |
| | 259 | + | |
| | 260 | + | |
| | 261 | + | |
| | 262 | + | |
| 256 | 263 | | |
| 257 | 264 | | |
| 258 | 265 | | |

| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| 42 | 42 | | |
| 43 | 43 | | |
| 44 | 44 | | |
| | 45 | + | |
| 45 | 46 | | |
| 46 | 47 | | |
| 47 | 48 | | |

| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| 8 | 8 | | |
| 9 | 9 | | |
| 10 | 10 | | |
| | 11 | + | |
| | 12 | + | |
| | 13 | + | |
| | 14 | + | |
| | 15 | + | |
| | 16 | + | |
| | 17 | + | |
| | 18 | + | |
| | 19 | + | |
| | 20 | + | |
| | 21 | + | |
| | 22 | + | |
| | 23 | + | |
| | 24 | + | |
| 11 | 25 | | |
| 12 | 26 | | |
| 13 | 27 | | |
| | | | |
| 20 | 34 | | |
| 21 | 35 | | |
| 22 | 36 | | |
| 23 | | - | |
| 24 | 37 | | |
| 25 | 38 | | |
| 26 | 39 | | |
| | | | |
| 76 | 89 | | |
| 77 | 90 | | |
| 78 | 91 | | |
| 79 | | - | |
| | 92 | + | |
| | 93 | + | |
| | 94 | + | |
| | 95 | + | |
| | 96 | + | |
| | 97 | + | |
| 80 | 98 | | |
| 81 | 99 | | |
| 82 | 100 | | |

| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| 123 | 123 | | |
| 124 | 124 | | |
| 125 | 125 | | |
| 126 | | - | |
| | 126 | + | |
| | 127 | + | |
| 127 | 128 | | |
| 128 | 129 | | |
| 129 | 130 | | |
| | | | |
| 185 | 186 | | |
| 186 | 187 | | |
| 187 | 188 | | |
| | 189 | + | |
| | 190 | + | |
| | 191 | + | |
| | 192 | + | |
| 188 | 193 | | |
| 189 | 194 | | |
| 190 | 195 | | |
| | | | |
| 665 | 670 | | |
| 666 | 671 | | |
| 667 | 672 | | |
| | 673 | + | |
| | 674 | + | |
| | 675 | + | |
| | 676 | + | |
| | 677 | + | |
| 668 | 678 | | |
| 669 | 679 | | |
| 670 | 680 | | |
| | | | |
| 690 | 700 | | |
| 691 | 701 | | |
| 692 | 702 | | |
| 693 | | - | |
| | 703 | + | |
| 694 | 704 | | |
| 695 | 705 | | |
| 696 | 706 | | |

| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| 211 | 211 | | |
| 212 | 212 | | |
| 213 | 213 | | |
| | 214 | + | |
| | 215 | + | |
| | 216 | + | |
| 214 | 217 | | |
| 215 | 218 | | |
| 216 | 219 | | |
| | | | |
| 978 | 981 | | |
| 979 | 982 | | |
| 980 | 983 | | |
| 981 | | - | |
| | 984 | + | |
| 982 | 985 | | |
| 983 | 986 | | |
| 984 | 987 | | |

| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| 535 | 535 | | |
| 536 | 536 | | |
| 537 | 537 | | |
| 538 | | - | |
| 539 | 538 | | |
| 540 | 539 | | |
| 541 | 540 | | |
| | | | |
| 628 | 627 | | |
| 629 | 628 | | |
| 630 | 629 | | |
| 631 | | - | |
| | 630 | + | |
| 632 | 631 | | |
| 633 | 632 | | |
| 634 | 633 | | |
| | | | |
| 644 | 643 | | |
| 645 | 644 | | |
| 646 | 645 | | |
| 647 | | - | |
| | 646 | + | |
| | 647 | + | |
| | 648 | + | |
| | 649 | + | |
| | 650 | + | |
| | 651 | + | |
| | 652 | + | |
| | 653 | + | |
| | 654 | + | |
| | 655 | + | |
| | 656 | + | |
| | 657 | + | |
| | 658 | + | |
| | 659 | + | |
| 648 | 660 | | |
| 649 | 661 | | |
| 650 | 662 | | |

| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| 62 | 62 | | |
| 63 | 63 | | |
| 64 | 64 | | |
| | 65 | + | |
| 65 | 66 | | |
| 66 | 67 | | |
| 67 | 68 | | |

Submodule tensorrt_llm updated 98 files
- README.md +12 -6
- cpp/include/tensorrt_llm/batch_manager/llmRequest.h +30 -33
- cpp/tensorrt_llm/batch_manager/aarch64-linux-gnu/libtensorrt_llm_batch_manager_static.a +2 -2
- cpp/tensorrt_llm/batch_manager/aarch64-linux-gnu/libtensorrt_llm_batch_manager_static.pre_cxx11.a +2 -2
- cpp/tensorrt_llm/batch_manager/aarch64-linux-gnu/version.txt +3 -3
- cpp/tensorrt_llm/batch_manager/x86_64-linux-gnu/libtensorrt_llm_batch_manager_static.a +2 -2
- cpp/tensorrt_llm/batch_manager/x86_64-linux-gnu/libtensorrt_llm_batch_manager_static.pre_cxx11.a +2 -2
- cpp/tensorrt_llm/batch_manager/x86_64-linux-gnu/version.txt +3 -3
- cpp/tensorrt_llm/batch_manager/x86_64-windows-msvc/tensorrt_llm_batch_manager_static.lib +2 -2
- cpp/tensorrt_llm/batch_manager/x86_64-windows-msvc/version.txt +2 -2
- cpp/tensorrt_llm/common/mpiUtils.cpp +21 -22
- cpp/tensorrt_llm/executor/aarch64-linux-gnu/libtensorrt_llm_executor_static.a +2 -2
- cpp/tensorrt_llm/executor/aarch64-linux-gnu/libtensorrt_llm_executor_static.pre_cxx11.a +2 -2
- cpp/tensorrt_llm/executor/aarch64-linux-gnu/version.txt +3 -3
- cpp/tensorrt_llm/executor/x86_64-linux-gnu/libtensorrt_llm_executor_static.a +2 -2
- cpp/tensorrt_llm/executor/x86_64-linux-gnu/libtensorrt_llm_executor_static.pre_cxx11.a +2 -2
- cpp/tensorrt_llm/executor/x86_64-linux-gnu/version.txt +3 -3
- cpp/tensorrt_llm/executor/x86_64-windows-msvc/tensorrt_llm_executor_static.lib +2 -2
- cpp/tensorrt_llm/executor/x86_64-windows-msvc/version.txt +2 -2
- cpp/tensorrt_llm/kernels/decoderMaskedMultiheadAttention/decoderXQAImplJIT/nvrtcWrapper/aarch64-linux-gnu/version.txt +1 -1
- cpp/tensorrt_llm/kernels/decoderMaskedMultiheadAttention/decoderXQAImplJIT/nvrtcWrapper/x86_64-linux-gnu/version.txt +1 -1
- cpp/tensorrt_llm/kernels/decoderMaskedMultiheadAttention/decoderXQAImplJIT/nvrtcWrapper/x86_64-windows-msvc/tensorrt_llm_nvrtc_wrapper.dll +1 -1
- cpp/tensorrt_llm/kernels/decoderMaskedMultiheadAttention/decoderXQAImplJIT/nvrtcWrapper/x86_64-windows-msvc/tensorrt_llm_nvrtc_wrapper.lib +1 -1
- cpp/tensorrt_llm/kernels/decoderMaskedMultiheadAttention/decoderXQAImplJIT/nvrtcWrapper/x86_64-windows-msvc/version.txt +3 -3
- cpp/tensorrt_llm/kernels/internal_cutlass_kernels/aarch64-linux-gnu/libtensorrt_llm_internal_cutlass_kernels_static.a +1 -1
- cpp/tensorrt_llm/kernels/internal_cutlass_kernels/aarch64-linux-gnu/libtensorrt_llm_internal_cutlass_kernels_static.pre_cxx11.a +1 -1
- cpp/tensorrt_llm/kernels/internal_cutlass_kernels/aarch64-linux-gnu/version.txt +3 -3
- cpp/tensorrt_llm/kernels/internal_cutlass_kernels/x86_64-linux-gnu/libtensorrt_llm_internal_cutlass_kernels_static.a +1 -1
- cpp/tensorrt_llm/kernels/internal_cutlass_kernels/x86_64-linux-gnu/libtensorrt_llm_internal_cutlass_kernels_static.pre_cxx11.a +1 -1
- cpp/tensorrt_llm/kernels/internal_cutlass_kernels/x86_64-linux-gnu/version.txt +3 -3
- cpp/tensorrt_llm/kernels/internal_cutlass_kernels/x86_64-windows-msvc/tensorrt_llm_internal_cutlass_kernels_static.lib +2 -2
- cpp/tensorrt_llm/kernels/internal_cutlass_kernels/x86_64-windows-msvc/version.txt +2 -2
- cpp/tensorrt_llm/layers/lookaheadDecodingUtils.h +82 -6
- cpp/tensorrt_llm/pybind/bindings.cpp +7 -7
- cpp/tensorrt_llm/runtime/lookaheadBuffers.cpp +1 -1
- cpp/tensorrt_llm/runtime/medusaModule.cpp +1 -1
- cpp/tests/CMakeLists.txt +8
- cpp/tests/kernels/mixtureOfExpertsTest.cu +114 -297
- cpp/tests/resources/scripts/case_report_wrapper.py +42
- cpp/tests/resources/scripts/test_cpp.py +51 -1
- docs/source/advanced/executor.md
- docs/source/advanced/kv-cache-reuse.md +2
- docs/source/advanced/speculative-decoding.md +9 -7
- docs/source/architecture/core-concepts.md +2 -2
- docs/source/index.rst +3
- docs/source/release-notes.md +2 -2
- examples/baichuan/requirements.txt +1 -1
- examples/bloom/requirements.txt +1 -1
- examples/chatglm/requirements.txt +1 -1
- examples/dbrx/requirements.txt +1 -1
- examples/deepseek_v1/README.md +77
- examples/deepseek_v1/__init__.py +14
- examples/deepseek_v1/convert_checkpoint.py +215
- examples/deepseek_v1/requirements.txt +5
- examples/falcon/requirements.txt +1 -1
- examples/gemma/requirements.txt +1 -1
- examples/gpt/requirements.txt +1 -1
- examples/gptj/requirements.txt +1 -1
- examples/gptneox/requirements.txt +1 -1
- examples/grok/requirements.txt +1 -1
- examples/internlm/requirements.txt +1 -1
- examples/jais/requirements.txt +1 -1
- examples/llama/requirements.txt +1 -1
- examples/llm-api/requirements.txt +1 -1
- examples/mamba/requirements.txt +1 -1
- examples/medusa/requirements.txt +1 -1
- examples/mixtral/requirements.txt +1 -1
- examples/mpt/requirements.txt +1 -1
- examples/nemotron/requirements.txt +1 -1
- examples/opt/requirements.txt +1 -1
- examples/phi/requirements.txt +1 -1
- examples/quantization/requirements.txt +1 -1
- examples/qwen/requirements.txt +1 -1
- examples/qwenvl/requirements.txt +1 -1
- examples/recurrentgemma/requirements.txt +1 -1
- examples/redrafter/requirements.txt +1 -1
- examples/skywork/requirements.txt +1 -1
- examples/smaug/requirements.txt +1 -1
- examples/whisper/requirements.txt +1 -1
- requirements-dev.txt +1
- tensorrt_llm/commands/build.py +1 -1
- tensorrt_llm/functional.py +18 -18
- tensorrt_llm/layers/__init__.py +2 -1
- tensorrt_llm/layers/embedding.py +1 -6
- tensorrt_llm/layers/moe.py +51
- tensorrt_llm/models/__init__.py +3
- tensorrt_llm/models/deepseek_v1/__init__.py +14
- tensorrt_llm/models/deepseek_v1/convert.py +361
- tensorrt_llm/models/deepseek_v1/model.py +257
- tensorrt_llm/models/llama/model.py +1 -2
- tensorrt_llm/models/model_weights_loader.py +41 -8
- tensorrt_llm/models/modeling_utils.py +3
- tensorrt_llm/models/qwen/model.py +8 -1
- tensorrt_llm/module.py +61
- tensorrt_llm/version.py +1 -1
- tests/bindings/test_bindings_ut.py +1 -1
- tests/conftest.py +106
- tests/test_module.py +2
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| 1 | | - | |
| | 1 | + | |