Move Linux GPU CI pipeline to A10 (microsoft#23235)

snnn · tarekziade · commit 785ae21632fc · 2025-01-10T15:33:40.000+01:00
Move Linux GPU CI pipeline to A10 machines which are more advanced.
Retire onnxruntime-Linux-GPU-T4 machine pool.
Disable run_lean_attention test because the new machines do not have
enough shared memory.

```
skip loading trt attention kernel fmha_mhca_fp16_128_256_sm86_kernel because no enough shared memory
[E:onnxruntime:, sequential_executor.cc:505 ExecuteKernel] Non-zero status code returned while running MultiHeadAttention node. Name:'MultiHeadAttention_0' Status Message: CUDA error cudaErrorInvalidValue:invalid argument
```
diff --git a/onnxruntime/test/python/transformers/test_mha.py b/onnxruntime/test/python/transformers/test_mha.py
@@ -892,7 +892,7 @@ def test_all(self):
         # Run tests sequentially to avoid out of memory issue.
         self.run_mha_cpu()
         self.run_mha_cuda()
-        self.run_lean_attention()
+        # self.run_lean_attention()
         self.run_mha_cuda_multi_threading_default()
         self.run_mha_cuda_multi_threading_cudnn()
         self.run_mha_cuda_multi_threading_efficient()
diff --git a/tools/ci_build/github/azure-pipelines/linux-gpu-ci-pipeline.yml b/tools/ci_build/github/azure-pipelines/linux-gpu-ci-pipeline.yml
@@ -137,7 +137,7 @@ stages:
       skipComponentGovernanceDetection: true
     workspace:
       clean: all
-    pool: onnxruntime-Linux-GPU-T4
+    pool: Onnxruntime-Linux-A10-24G
     steps:
     - checkout: self
       clean: true
diff --git a/tools/ci_build/github/linux/build_cuda_ci.sh b/tools/ci_build/github/linux/build_cuda_ci.sh
@@ -21,7 +21,7 @@ BUILD_ARGS=('--config'
             "--enable_pybind"
             "--build_java"
             "--cmake_extra_defines"
-            "CMAKE_CUDA_ARCHITECTURES=75"
+            "CMAKE_CUDA_ARCHITECTURES=86"
             "onnxruntime_BUILD_UNIT_TESTS=ON"
             "onnxruntime_ENABLE_CUDA_EP_INTERNAL_TESTS=ON")
 if [ -x "$(command -v ninja)" ]; then