Commit 73d6af7

Update 0.23.0 - OSS release

1 parent 111b2da commit 73d6af7

510 files changed: +71652 -2110 lines changed

.dockerignore (+3 -2)

```diff
@@ -1,6 +1,7 @@
 docker
-**/.git
-llm_ptq/saved_models*
+examples/**/.git
+examples/llm_ptq/saved_models*
+**/experimental

 ##### Copied from .gitignore #####
 # Byte-compiled / optimized / DLL files
```

.pre-commit-config.yaml (+145, new file)

```yaml
# NOTE: Make sure to update version in dev requirements (setup.py) as well!
exclude: >
  (?x)^(
      experimental/.*|
  )$

repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.6.0
    hooks:
      - id: trailing-whitespace
      - id: mixed-line-ending
        args: [--fix=lf]
      - id: end-of-file-fixer
      - id: check-merge-conflict
      - id: requirements-txt-fixer
      - id: debug-statements
      - id: check-json
        exclude: ^.vscode/.*.json  # vscode files can take comments
      - id: check-yaml
        args: [--allow-multiple-documents]
      - id: check-toml
      - id: check-added-large-files
        args: [--maxkb=500, --enforce-all]
        exclude: >
          (?x)^(
              examples/diffusers/quantization/assets/.*.png|
              examples/diffusers/cache_diffusion/assets/.*.png|
          )$

  - repo: https://github.com/executablebooks/mdformat
    rev: 0.7.17
    hooks:
      - id: mdformat

  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.6.4
    hooks:
      - id: ruff
        args: [--fix, --exit-non-zero-on-fix]
      - id: ruff-format

  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.11.2
    hooks:
      - id: mypy

  - repo: https://github.com/pre-commit/mirrors-clang-format
    rev: v16.0.4
    hooks:
      - id: clang-format
        types_or: [c++, c, c#, cuda, java, javascript, objective-c, proto]  # no json!
        args: ["--style={ColumnLimit: 100}"]

  - repo: https://github.com/pre-commit/pygrep-hooks
    rev: v1.10.0
    hooks:
      - id: rst-backticks
      - id: rst-directive-colons
      - id: rst-inline-touching-normal

  - repo: https://github.com/jumanjihouse/pre-commit-hook-yamlfmt
    rev: 0.2.3
    hooks:
      - id: yamlfmt
        args: [--mapping=2, --sequence=4, --offset=2, --implicit_start, --implicit_end, --preserve-quotes]
        exclude: ^.github/workflows/

  # Instructions to change license file if ever needed:
  # https://github.com/Lucas-C/pre-commit-hooks#removing-old-license-and-replacing-it-with-a-new-one
  - repo: https://github.com/Lucas-C/pre-commit-hooks
    rev: v1.5.5
    hooks:
      # Default hook for Apache 2.0 in core python files
      - id: insert-license
        alias: insert-license-py
        args:
          - --license-filepath
          - ./LICENSE
          - --comment-style
          - "#"
          - --allow-past-years
        types: [python]
        # NOTE: Exclude files that have copyright or license headers from another company or individual
        # since we want to keep those above the license header added by this hook.
        # Instead, we should manually add the license header to those files after the original header.
        exclude: >
          (?x)^(
              modelopt/onnx/quantization/operators.py|
              modelopt/onnx/quantization/ort_patching.py|
              modelopt/torch/export/transformer_engine.py|
              modelopt/torch/quantization/export_onnx.py|
              modelopt/torch/quantization/plugins/attention.py|
              modelopt/torch/speculative/plugins/transformers.py|
              modelopt/torch/speculative/eagle/utils.py|
              modelopt/torch/_deploy/utils/onnx_utils.py|
              examples/chained_optimizations/bert_prune_distill_quantize.py|
              examples/diffusers/quantization/onnx_utils/export.py|
              examples/diffusers/cache_diffusion/pipeline/models/sdxl.py|
              examples/llm_eval/gen_model_answer.py|
              examples/llm_eval/humaneval.py|
              examples/llm_eval/lm_eval_hf.py|
              examples/llm_eval/mmlu.py|
              examples/llm_eval/modeling.py|
              examples/llm_sparsity/finetune.py|
              examples/llm_qat/main.py|
              examples/speculative_decoding/main.py|
              examples/speculative_decoding/medusa_utils.py|
              examples/speculative_decoding/vllm_generate.py|
          )$

      # Default hook for Apache 2.0 in core c/c++/cuda files
      - id: insert-license
        alias: insert-license-c
        args:
          - --license-filepath
          - ./LICENSE
          - --comment-style
          - "/*| *| */"
          - --allow-past-years
        types_or: [c++, cuda, c]

      # Default hook for Apache 2.0 in shell files
      - id: insert-license
        alias: insert-license-sh
        args:
          - --license-filepath
          - ./LICENSE
          - --comment-style
          - "#"
          - --allow-past-years
        types_or: [shell]

  - repo: https://github.com/keith/pre-commit-buildifier
    rev: 6.4.0
    hooks:
      - id: buildifier
      - id: buildifier-lint

  - repo: https://github.com/PyCQA/bandit
    rev: 1.7.9
    hooks:
      - id: bandit
        args: ["-c", "pyproject.toml", "-q"]
        additional_dependencies: ["bandit[toml]"]
```
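A note on the `exclude` entries in this config: pre-commit matches each pattern against the full repository-relative file path, and the `(?x)` prefix switches the regex into verbose mode, so whitespace and newlines inside the pattern are ignored and alternatives can be listed one per line. A minimal sketch of how the top-level exclude behaves, using Python's `re` module directly (the file paths are invented for illustration):

```python
import re

# Same shape as the top-level exclude above. In (?x) verbose mode the
# whitespace and newlines are ignored, leaving ^(experimental/.*|)$.
# The trailing "|" adds an empty alternative, which is harmless because
# the pattern is anchored to the whole path.
exclude = re.compile(
    r"""(?x)^(
        experimental/.*|
    )$"""
)

print(bool(exclude.match("experimental/new_feature.py")))  # excluded from hooks
print(bool(exclude.match("modelopt/onnx/utils.py")))       # still checked
```

The same mechanism drives the per-hook excludes (large-file check, license insertion): one anchored verbose-mode regex with one path alternative per line.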

.vscode/extensions.json (+28, new file)

```jsonc
{
  // See https://go.microsoft.com/fwlink/?LinkId=827846 to learn about workspace recommendations.
  // Extension identifier format: ${publisher}.${name}. Example: vscode.csharp
  // List of extensions which should be recommended for users of this workspace.
  "recommendations": [
    "ms-vscode.cpptools",
    "ms-azuretools.vscode-docker",
    "tamasfe.even-better-toml",
    "GitHub.copilot",
    "GitLab.gitlab-workflow",
    "eamodio.gitlens",
    "VisualStudioExptTeam.vscodeintellicode",
    "ms-toolsai.jupyter",
    "ms-python.vscode-pylance",
    "ms-python.python",
    "ms-vscode-remote.remote-ssh",
    "ms-vscode.remote-explorer",
    "charliermarsh.ruff",
    "redhat.vscode-yaml",
  ],
  // List of extensions recommended by VS Code that should not be recommended for users of this workspace.
  "unwantedRecommendations": [
    "ms-python.black-formatter",
    "ms-python.mypy-type-checker",
    "ms-python.pylint",
    "ms-python.flake8",
  ]
}
```

.vscode/settings.json (+43, new file)

```jsonc
// VSCode workspace settings for modelopt
{
  "editor.rulers": [
    100,
    120
  ], // 100 for black auto-formatter, 120 for hard limit in ruff
  "[python]": {
    "editor.defaultFormatter": "charliermarsh.ruff",
    "editor.formatOnSave": true,
    "editor.codeActionsOnSave": {
      "source.fixAll": "explicit"
    },
  },
  "files.exclude": {
    "build": true,
  },
  "files.watcherExclude": {
    ".ipynb_checkpoints": true,
    ".mypy_cache": true,
    ".pytest_cache": true,
    ".ruff_cache": true,
    ".tox": true,
    "**/__pycache__/**": true,
    "**/*.pyc": true,
    "**/runs": true,
    "build": true
  },
  "[yaml]": {
    "editor.defaultFormatter": "redhat.vscode-yaml",
  },
  "yaml.format.enable": true,
  "yaml.format.printWidth": 150,
  "yaml.format.bracketSpacing": false,
  "yaml.customTags": [
    "!reference sequence"
  ],
  "python.testing.pytestEnabled": true,
  "python.testing.pytestArgs": [
    "./tests",
    "--no-cov",
  ],
  "evenBetterToml.schema.enabled": false, // disable toml/json schema since we have custom fields
}
```

CHANGELOG-Windows.rst (+18, new file)

```rst
===================================
Model Optimizer Changelog (Windows)
===================================

0.19 (2024-11-18)
^^^^^^^^^^^^^^^^^

**New Features**

- This is the first official release of TensorRT Model Optimizer for Windows.
- **ONNX INT4 Quantization:** :meth:`modelopt.onnx.quantization.quantize_int4 <modelopt.onnx.quantization.int4.quantize>` now supports ONNX INT4 quantization for DirectML and TensorRT* deployment. See :ref:`Support_Matrix` for details about supported features and models.
- **LLM Quantization with Olive:** Enabled LLM quantization through Olive, streamlining model optimization workflows. Refer to the `example <https://github.com/microsoft/Olive/tree/main/examples/phi3#quantize-models-with-nvidia-tensorrt-model-optimizer>`_.
- **DirectML Deployment Guide:** Added a DirectML (DML) deployment guide. Refer to :ref:`DirectML_Deployment`.
- **MMLU Benchmark for Accuracy Evaluations:** Introduced `MMLU benchmarking <https://github.com/NVIDIA/TensorRT-Model-Optimizer/tree/main/examples/windows/accuracy_benchmark/README.md>`_ for accuracy evaluation of ONNX models on DirectML.
- **Published quantized ONNX models collection:** Published quantized ONNX models at HuggingFace `NVIDIA collections <https://huggingface.co/collections/nvidia/optimized-onnx-models-for-nvidia-rtx-gpus-67373fe7c006ebc1df310613>`_.

\* *This version includes experimental features such as TensorRT deployment of ONNX INT4 models, PyTorch quantization and sparsity. These are currently unverified on Windows.*
```
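For context on what INT4 quantization means in the first bullet: 4-bit integer quantization maps floating-point weights onto the 16 representable values in [-8, 7], typically with one floating-point scale per small block of weights. The toy sketch below shows symmetric per-block INT4 rounding; it is an illustration only, not modelopt's implementation (which uses calibration-based methods), and the function name and block size are invented for the example:

```python
def quantize_int4_symmetric(weights, block=8):
    """Toy symmetric INT4 quantizer: each block gets one scale chosen so
    its largest magnitude maps to 7; valid INT4 values are -8..7."""
    out = []
    for i in range(0, len(weights), block):
        blk = weights[i:i + block]
        scale = max(abs(x) for x in blk) / 7.0 or 1.0  # guard all-zero blocks
        q = [max(-8, min(7, round(x / scale))) for x in blk]
        out.append((q, scale))
    return out

weights = [0.5, -1.2, 0.03, 2.1, -0.7, 0.0, 1.4, -2.0]
(q, scale), = quantize_int4_symmetric(weights)
dequantized = [v * scale for v in q]
# Per-element reconstruction error is bounded by scale / 2.
```

Real INT4 pipelines additionally pick scales using calibration data and pack two 4-bit values per byte; see the Support Matrix referenced above for what modelopt actually supports.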
