Rebasing with msft commits #607

Merged: 47 commits merged into ovep-develop on Mar 10, 2025
Conversation

jatinwadhwa921

Rebasing with msft commits

sushraja-msft and others added 30 commits February 28, 2025 08:02
### Description
This change fixes GQA for Flash Attention on Nvidia GPUs. The root cause
appears to be the
`k_start + capped_sg_id < seq_causal_length`
check. This is either because:

a. seq_causal_length varies per lane, so the check becomes non-uniform
control flow, which interacts badly with subgroupShuffle;

or

b. the check itself is incorrect and wipes out values of v based on the
source lane's seq_causal_length, when in actuality values of v need to
be causal as per the lane that is going to multiply them with qkt.

qkt is already causal because earlier values of qk for out-of-bounds k
are set to min_value, and exp(<-4) is 0.

This fix removes that causal check and relies on qk being wiped out
earlier. Documentation of the causality behavior for GQA is missing, so
it is unclear which of these reasons is the true one.

Prior to this fix, prompts with sequence length between 16 and 32, or
around 1k, would break with Phi 4, while smaller prompts would work.
Tested on Intel Alderlake and Nvidia 4070.
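
A simplified, self-contained C++ model of the reasoning above (the real
kernel is a WGSL shader; names and shapes here are illustrative): once
out-of-bounds qk values are forced to min_value, their softmax weights
underflow to zero, so a separate per-lane causal check on v is redundant.

```
#include <algorithm>
#include <cmath>
#include <vector>

// Illustrative scalar model of the flash-attention inner step.
// Out-of-bounds/future positions already had qk forced to a large
// negative value, so exp(qk - max) underflows to ~0 and the matching
// v values contribute nothing -- no per-lane check on v is needed.
void AccumulateRow(const std::vector<float>& qk,  // one lane's scores
                   const std::vector<float>& v,   // values, same length
                   size_t seq_causal_length,      // this lane's bound
                   float& out) {
  constexpr float kMinValue = -3.4e38f;
  std::vector<float> masked(qk);
  for (size_t k = 0; k < masked.size(); ++k) {
    if (k >= seq_causal_length) masked[k] = kMinValue;  // done earlier in the shader
  }
  const float mx = *std::max_element(masked.begin(), masked.end());
  float sum = 0.f, acc = 0.f;
  for (size_t k = 0; k < masked.size(); ++k) {
    const float w = std::exp(masked[k] - mx);  // ~0 for masked positions
    sum += w;
    acc += w * v[k];  // previously also guarded by the removed causal check
  }
  out = acc / sum;
}
```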
### Description
Supports creating a model programmatically using the ORT C or C++ API. 
Supports augmenting an existing model to add nodes.

### Description
Fixed a typo in function names related to the Upsample CUDA kernel.
Changed the incorrect spelling `Upample` to `Upsample` across relevant
functions.

### Motivation and Context
This change is necessary to maintain consistency and prevent potential
confusion caused by incorrect function names.

### Description
Fix typos in csharp/src/Microsoft.ML.OnnxRuntime/
…oft#23788)

Change the logic to generate the default EP context file name

### Description
Applies to all EPs: replace the `.onnx` suffix with `_ctx.onnx` instead of directly appending the extra string `_ctx.onnx` to the existing model path. In QNN EP, also make the context binary `.bin` file name shorter by removing `QNNExecutionProvider_` from the file name.
### Description
Make
[QNN_Nuget_Windows](https://aiinfra.visualstudio.com/Lotus/_build?definitionId=1234)
1ES compliant.
…osoft#23827)

### Description

Resolve microsoft#23817



### Motivation and Context
This PR fixes the errors in the ConvTranspose optimization and adds
tests to ensure the correctness of the implementation.
### Description
Fix a warning with std::move usage.

### Motivation and Context
Possibly allows building without the --compile_no_warning_as_error
flag, and keeps us compatible with the latest GSL library. Without this
fix we get:

```
onnxruntime\core\providers\cpu\controlflow\loop.cc(247): error C4996: 'gsl::byte': Use std::byte instead.
```
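
The fix is essentially a type substitution; a hedged sketch of the
pattern (the actual diff in loop.cc may differ):

```
#include <cstddef>
#include <gsl/span>

// gsl::byte is deprecated in recent GSL releases (error C4996 under MSVC
// with warnings-as-errors); std::byte is the drop-in replacement.
gsl::span<const std::byte> AsByteSpan(const std::byte* data, size_t size) {
  return gsl::span<const std::byte>(data, size);  // was gsl::span<const gsl::byte>
}
```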
### Description

#### Background

From code search, the following EPs use
`onnxruntime::GetCpuPreferredNodes()` in their `GetCapability()`
methods:
- CANN
- CUDA
- DML
- JS
- ROCM
- WebGPU

However, the source file that implements
`onnxruntime::GetCpuPreferredNodes()` is excluded when minimal build is
ON:
https://github.com/microsoft/onnxruntime/blob/6df0973e58ba5399fcaa98686f70ed9a9e59aaef/cmake/onnxruntime_framework.cmake#L38-L42

This means that none of the EPs mentioned above can compile with a
minimal build.

#### Solution

The excluded file `core/framework/fallback_cpu_capability.cc` cannot
build in minimal build because some of its dependencies are not included
in the minimal build. However, in extended minimal build mode, all
dependencies are available.

This PR loosens the restriction and allows this file to compile in an
extended minimal build. After this change, those EPs are able to compile
in the extended minimal build.
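
The equivalent compile-time guard, shown as a hedged sketch (the actual
change is in the CMake file linked above; the macro names follow ORT's
usual minimal-build conventions but are assumptions here):

```
// core/framework/fallback_cpu_capability.cc is now built whenever we are
// either not in a minimal build, or in an extended minimal build where
// all of its dependencies are available.
#if !defined(ORT_MINIMAL_BUILD) || defined(ORT_EXTENDED_MINIMAL_BUILD)
// ... implementation of onnxruntime::GetCpuPreferredNodes() ...
#endif
```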
### Description

Add `dawn` to ThirdPartyNotices.
…3702)

### Description
Enable QNN EP weight sharing generation using the public API instead of
internal interfaces, so that users can integrate it into their own
toolchains. The change shares the QnnBackendManager across ORT sessions
when ep.share_ep_contexts is enabled. An extra option ends the sharing,
so that we know when to remove the shared QnnBackendManager from the
singleton.

Change the tool name from onnxruntime_qnn_ctx_gen to
ep_weight_sharing_ctx_gen, so that it can be shared by other EPs.
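
A hedged C++ sketch of how an application might opt sessions into
sharing via session options. The `ep.share_ep_contexts` key comes from
the description above; the key used to end the sharing is an assumption
here:

```
#include <onnxruntime_cxx_api.h>

// Sketch: two sessions sharing one QnnBackendManager via session options.
void CreateSharingSessions(Ort::Env& env) {
  Ort::SessionOptions so1;
  so1.AddConfigEntry("ep.share_ep_contexts", "1");  // opt in to sharing
  Ort::Session s1(env, ORT_TSTR("model1_ctx.onnx"), so1);

  Ort::SessionOptions so2;
  so2.AddConfigEntry("ep.share_ep_contexts", "1");
  // Hypothetical key marking the last session, so ORT knows when to drop
  // the shared QnnBackendManager from the singleton:
  so2.AddConfigEntry("ep.stop_share_ep_contexts", "1");
  Ort::Session s2(env, ORT_TSTR("model2_ctx.onnx"), so2);
}
```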
…microsoft#23892)

### Description
When using the enable_htp_shared_memory feature, the address of the
buffer passed to rpcmem_free is incorrect, so the RPC buffers are not
freed, leading to memory exhaustion.

### Motivation and Context
When using the enable_htp_shared_memory_allocator feature for QNN in
GenAI extensions, it leads to inference failures during the second
prompt. As GenAI memory demands are higher, the issue surfaces sooner in
GenAI use cases.

Co-authored-by: Ashish Garg <[email protected]>
The build option --enable_pix_capture is broken. This fixes the problem.

---------

Co-authored-by: wp <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
…t#23887)

### Description
* Add dynamo export for the Sam2 image encoder
* Verify the fp32 onnx model with CPU EP (to avoid error messages from
the TRT EP).
* Update the benchmark script:
  - output ORT profiling
  - output torch compiled code and a unique kernel name for each
compiled kernel
  - add an option for nightly package installation
  - uninstall existing ort packages before installing

The node metadata of the dynamo-exported model can help map nodes in the
onnx model back to the pytorch modeling script. Currently, graph
optimization is not done on the dynamo-exported model, so it is
experimental for now.

### Motivation and Context

To support profiling of torch compiled CUDA kernel.
### Description
This PR improves the workaround for bundlers in onnxruntime-web.
Specifically, the following changes have been made:

- Use [this
workaround](xenova@9c50aa2)
as suggested by @xenova in
huggingface/transformers.js#1161 (comment)

- Use `url > "file:" && url < "file;"` instead of
`url.startsWith("file:")` to allow minifiers to remove dead code
correctly.

This change allows Vite to remove unnecessary file dependencies parsed
from `new URL("ort.bundle.min.js", import.meta.url)`, and lets
webpack/terser optimize code like
`if ("file://filepath.js".startsWith("file:")) { do_sth1(); } else { do_sth2(); }`
into `do_sth1()`.

Resolves huggingface/transformers.js#1161

### Description
This change restores the MatMulNBits workgroup size from (8, 8, 1) back
to (16, 8, 1) to resolve a performance regression observed on Intel
iGPUs during token generation (M=1).

### Motivation and Context
As above.

Signed-off-by: Jianhui Dai <[email protected]>
…icrosoft#23894)

Float16Array is now shipping, and the WebNN Chromium implementation has
accepted it. We should allow it in the WebNN EP as well.
…icrosoft#23888)

### Description
CMake 4.0 release candidate 2 is available, and it cannot compile all of
OnnxRuntime out of the box. There are portions of the OnnxRuntime
codebase that specify a `cmake_minimum_required` version of 3.0, and
CMake 4.0 has removed support for compatibility with CMake < 3.5; the
following error is reported:

```
CMake Error at winml_sdk_helpers.cmake:4 (cmake_minimum_required):
  Compatibility with CMake < 3.5 has been removed from CMake.

  Update the VERSION argument <min> value.  Or, use the <min>...<max> syntax
  to tell CMake that the project requires at least <min> but has been updated
  to work with policies introduced by <max> or earlier.

  Or, add -DCMAKE_POLICY_VERSION_MINIMUM=3.5 to try configuring anyway.
```

Since CMake 3.5 appears to have shipped in 2016, it seems reasonable to
set that as the minimum version to fix the error. The root CMakeLists.txt
does ask for a minimum version of 3.28, so we could snap to that, but
I'm still ramping up on the build, so I wanted to propose a minimally
sufficient fix.

### Motivation and Context
Being able to build with the latest CMake, when it ships, reduces the
barrier to entry to building OnnxRuntime and allows OnnxRuntime to
leverage the latest and greatest tooling.
…icrosoft#23898)

This PR removes the deprecated subgroups-f16 feature from the WebGPU
native and JS EPs, and also removes the unused deviceInfo from the
WebGPU JS EP.
### Description
Fixed an error in the softmax dispatch.

### Motivation and Context
Produces expected results for the LLaMA model.
### Description

This PR is the first step for migrating the webgpu backend of
onnxruntime-web from JSEP based to WebGPU EP based.

In this change, we enable building the WebGPU EP in a wasm build (i.e.
`--build_wasm` `--use_webgpu` `--use_jsep`). However, the old build
flags still keep the previous behavior.
### Description

Enable an OpenVINO Windows CI pipeline. This includes:
- Downloading the OpenVINO toolkit for Windows from an external source.
- Setting up OpenVINO environment variables.
- Building the ONNX Runtime OpenVINO Execution Provider.
- Running unit tests.

### Motivation and Context

This change is required to run checks on precommit and commit in the
ONNX Runtime project. It ensures that the code is tested with the
OpenVINO toolkit on Windows, improving the reliability and compatibility
of the project.
### Description
Use the 'desktop only' solution in GPU C# packaging builds. We don't
need to include any MAUI support for those builds.

sushraja-msft and others added 17 commits March 6, 2025 17:44
…23907)

### Description
Simple change:
1. The DP4A shader actually supports all block sizes that are multiples
of 32; relax the restriction and make a small tweak to support sizes
other than 32 (a sketch of the relaxed check follows below).
2. Move the shader to a separate file for maintainability.
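
A hedged C++ sketch of the relaxed eligibility check; the real check
lives in the WebGPU MatMulNBits DP4A code and the name below is
illustrative:

```
#include <cstdint>

// Before this change, only block_size == 32 could take the DP4A path;
// now any positive multiple of 32 qualifies.
bool CanUseDP4APath(uint32_t block_size) {
  return block_size != 0 && block_size % 32 == 0;
}
```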

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
### Description
Add an example of a custom op that is required to do type inference for
the output type so that model load works.
Also acts as an example of how to override an ONNX op with a custom
implementation.

### Motivation and Context
microsoft#23891
There are some requirements to modify the graph which are specific to
the EP/hardware.
ORT has a hardcoded EP list for optimizations, but that can't scale and
is hard to extend to enable EP-specific custom optimizations.

Here is the prototype to enable L2+ optimizations for EPs (the original
overview was provided by @skottmckay), as well as the TRT EP
implementation of the ConstantFoldingDQ optimization.

Signatures for selection and optimization functions:
````
  - Selection: std::function<std::vector<std::unique_ptr<ComputeCapability>>(const GraphViewer&, const KeyValueConfig&)>
  - Optimization: std::function<Status(const Graph&, const ComputeCapability& this_optimization, ComputeCapability& cc_to_update)>
````
GetCapability

- Call the (new) provider bridge API to look up a pre-defined optimizer
by name and get its selection function.
- `ComputeCapability.optimize_func`, i.e. the optimization function,
would be set by the optimizer to the function that does the
optimization.
- The EP has to update the returned ComputeCapability to include the
optimization ComputeCapability in `nodes_to_optimize`, so that ORT can
later perform the optimization/transformation accordingly.

GraphPartitioner

- After assigning the ComputeCapability to the EP and prior to Compile,
if the ComputeCapability has `nodes_to_optimize`, iterate that list.
  - The optimization function needs to be called with:
    - a mutable Graph instance
    - the ComputeCapability for the individual optimization
    - the overall ComputeCapability, so it can be updated

A sketch of this wiring follows below.
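
As a rough illustration, here is a minimal C++ sketch of the EP side of
this flow, using the signatures above. All types here are hypothetical
stand-ins, and the sketch uses a mutable `Graph&` in the optimization
function (the bullets call for a mutable instance, though the prototype
signature above shows `const Graph&`):

```
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical minimal stand-ins for the ORT types named above.
struct GraphViewer {};
struct Graph {};
struct KeyValueConfig {};
struct Status {};
struct ComputeCapability;

using SelectionFunc = std::function<std::vector<std::unique_ptr<ComputeCapability>>(
    const GraphViewer&, const KeyValueConfig&)>;
using OptimizationFunc = std::function<Status(
    Graph&, const ComputeCapability& this_optimization, ComputeCapability& cc_to_update)>;

struct ComputeCapability {
  OptimizationFunc optimize_func;  // set by the pre-defined optimizer
  std::vector<std::unique_ptr<ComputeCapability>> nodes_to_optimize;
};

// Sketch of the EP side of GetCapability: obtain a pre-defined optimizer's
// selection function (here passed in directly), then attach the returned
// optimization capabilities so GraphPartitioner can invoke optimize_func
// before Compile.
std::vector<std::unique_ptr<ComputeCapability>> GetCapability(
    const GraphViewer& graph_viewer, const KeyValueConfig& config,
    const SelectionFunc& constant_folding_dq_selection) {
  auto optimizations = constant_folding_dq_selection(graph_viewer, config);

  auto ep_capability = std::make_unique<ComputeCapability>();
  for (auto& opt : optimizations) {
    ep_capability->nodes_to_optimize.push_back(std::move(opt));
  }

  std::vector<std::unique_ptr<ComputeCapability>> result;
  result.push_back(std::move(ep_capability));
  return result;
}
```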
### Description
Fix ConvInteger handling of optional inputs: check Exists() rather than
just the number of inputs.
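
A minimal self-contained C++ sketch of the distinction, with a stand-in
for ORT's NodeArg (illustrative, not ORT's actual types): an optional
input can occupy a slot yet be absent.

```
#include <vector>

// Minimal stand-in for ORT's NodeArg (illustrative only).
struct NodeArg {
  bool Exists() const { return exists_; }
  bool exists_ = false;
};

// An optional input can occupy a slot yet be an "absent" NodeArg, so
// counting inputs is not sufficient; Exists() must also be checked.
bool HasOptionalInput(const std::vector<const NodeArg*>& inputs, size_t index) {
  return inputs.size() > index && inputs[index] != nullptr && inputs[index]->Exists();
}
```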



### Motivation and Context
microsoft#23927
### Description
This PR updates the OpenVINO version used in the pipeline from 2024.5.0
to 2025.0.0

Co-authored-by: jatinwadhwa921 <[email protected]>
### Description
On big-endian (BE) systems, model tensor data coming from an external
file is not handled properly.
This was found during debugging of
microsoft/onnxruntime-genai#1104.

This PR performs the endianness conversion of data loaded from an
external file on BE systems.
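
A hedged C++ sketch of the kind of conversion involved: ONNX external
data is stored little-endian, so on a big-endian host each element's
bytes are reversed in place (names here are illustrative):

```
#include <algorithm>
#include <cstddef>

// Reverse the byte order of each element in a raw buffer loaded from an
// external data file. elem_size is the size of the tensor's element type
// (e.g. 4 for float); buffers of 1-byte elements need no conversion.
void SwapByteOrderInPlace(std::byte* data, size_t byte_count, size_t elem_size) {
  if (elem_size <= 1) return;
  for (size_t i = 0; i + elem_size <= byte_count; i += elem_size) {
    std::reverse(data + i, data + i + elem_size);
  }
}
```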
…soft#23926)

### Description

`gsl::narrow` does not work in a no-exceptions build.
- Use `onnxruntime::narrow` if a check is necessary;
- or change to `static_cast` if the conversion is obviously safe.

Also apply the changes to usages of `gsl::narrow_cast`, which does not
apply any checks.
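
A hedged sketch of the substitution. The `Narrow` helper below is an
illustrative stand-in for `onnxruntime::narrow`, assumed here to be a
checked narrowing that terminates instead of throwing:

```
#include <cstdint>
#include <cstdlib>

// Illustrative stand-in for onnxruntime::narrow: checked narrowing that
// aborts instead of throwing, so it works with exceptions disabled.
template <typename To, typename From>
To Narrow(From value) {
  const To result = static_cast<To>(value);
  if (static_cast<From>(result) != value) std::abort();  // lossy conversion
  return result;
}

int32_t Example(size_t n, size_t obviously_small) {
  int32_t checked = Narrow<int32_t>(n);                       // was gsl::narrow
  int32_t unchecked = static_cast<int32_t>(obviously_small);  // obviously safe
  return checked + unchecked;
}
```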
### Description
1. Set  VCPKG_OSX_DEPLOYMENT_TARGET for macOS targets
2. Enable VCPKG in more pipelines.
…kg (microsoft#23946)

### Description
Allow using a different version of flatbuffers when building with vcpkg,
so that users do not need to pin flatbuffers' version. This provides
more flexibility in the build process.

Delete utf8_range from the dependencies, because it is an indirect
dependency of protobuf, which is already included in the build process.
### Description
Make [Python packaging
pipeline](https://aiinfra.visualstudio.com/530acbc4-21bc-487d-8cd8-348ff451d2ff/_build?definitionId=841)
1ES compliant




### Checklist

- [x] Make Onnxruntime-QNNEP-Windows-2022-CPU stateless
…osoft.ML.OnnxRuntime.FasterRcnnSample (microsoft#23924)

Bumps [SixLabors.ImageSharp](https://github.com/SixLabors/ImageSharp)
from 2.1.9 to 2.1.10.
Release notes, sourced from [SixLabors.ImageSharp's releases](https://github.com/SixLabors/ImageSharp/releases):

**v2.1.10**
- Backport [#2859](https://redirect.github.com/SixLabors/ImageSharp/issues/2859) to release/2.1.x by [@antonfirsov](https://github.com/antonfirsov) in [SixLabors/ImageSharp#2890](https://redirect.github.com/SixLabors/ImageSharp/pull/2890)
- Backport [#2701](https://redirect.github.com/SixLabors/ImageSharp/issues/2701) to 2.1.x [copy] by [@antonfirsov](https://github.com/antonfirsov) in [SixLabors/ImageSharp#2891](https://redirect.github.com/SixLabors/ImageSharp/pull/2891)

**Full Changelog**: https://github.com/SixLabors/ImageSharp/compare/v2.1.9...v2.1.10
Commits:
- [`d133ef9`](https://github.com/SixLabors/ImageSharp/commit/d133ef99e8becfc3b924b0bb4315e63b8681d307) Set lang version
- [`5dfe5a8`](https://github.com/SixLabors/ImageSharp/commit/5dfe5a800367581239de442cc18de659da6e9b1d) Missed cache action update
- [`4d3a851`](https://github.com/SixLabors/ImageSharp/commit/4d3a85112b03c89d2cb8616a5b747684b6e73730) Use latest cache action
- [`4cb9f40`](https://github.com/SixLabors/ImageSharp/commit/4cb9f40a722ab2b837157862f0320c6a652da4d0) Merge pull request [#2891](https://redirect.github.com/SixLabors/ImageSharp/issues/2891) from SixLabors/af/backport-2701
- [`bb82f79`](https://github.com/SixLabors/ImageSharp/commit/bb82f79db0197166271d4355b5fb5ceda370a906) [#2701](https://redirect.github.com/SixLabors/ImageSharp/issues/2701) to 2.1.x [copy]
- [`627b5f7`](https://github.com/SixLabors/ImageSharp/commit/627b5f721f30f6d529acb50bd81f92bd3db754eb) Merge pull request [#2890](https://redirect.github.com/SixLabors/ImageSharp/issues/2890) from SixLabors/af/backport-2859
- [`67f7848`](https://github.com/SixLabors/ImageSharp/commit/67f7848d6e975e7956c8056823555de49a5fdf6d) try to fix LFS for *.BMP
- [`44d294e`](https://github.com/SixLabors/ImageSharp/commit/44d294e06606111195152ead3006452357ef1bb9) 8.0.x is not needed
- [`adb85d9`](https://github.com/SixLabors/ImageSharp/commit/adb85d9e66aa3a588a86f4a4ef9a0539a8502117) Another attempt for a Linux-specific skip
- [`efc3fc4`](https://github.com/SixLabors/ImageSharp/commit/efc3fc4ee15eec4e523c26f7130e786541b00df2) Disable BmpDecoder_CanDecode_Os2BitmapArray on Linux
- Additional commits viewable in the [compare view](https://github.com/SixLabors/ImageSharp/compare/v2.1.9...v2.1.10)


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=SixLabors.ImageSharp&package-manager=nuget&previous-version=2.1.9&new-version=2.1.10)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.



Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
@jatinwadhwa921 jatinwadhwa921 requested a review from ankitm3k March 10, 2025 06:40
@ankitm3k

lgtm

@ankitm3k ankitm3k merged commit a6cdf62 into ovep-develop Mar 10, 2025
6 of 12 checks passed
ankitm3k added a commit that referenced this pull request Mar 10, 2025:
Revert "Rebasing with msft commits"

jatinwadhwa921 added a commit that referenced this pull request Mar 10, 2025:
This reverts commit 920ed58, reversing changes made to a6cdf62.
@jatinwadhwa921 jatinwadhwa921 deleted the sync_msft_10_3_25 branch March 13, 2025 12:04