
Longvec mega change #7188


Draft
wants to merge 46 commits into main

Conversation

pow2clk
Member

@pow2clk pow2clk commented Mar 10, 2025

A collection of incompletely tested and probably buggy implementations of long/native vector functionality.

pow2clk and others added 30 commits February 17, 2025 21:49
Remove errors in Sema diagnostics for vectors longer than 4 in 6.9.
Test for failures using long vectors in unsupported contexts and for correct codegen in
supported contexts. Verify errors persist in pre-6.9 shader models.

The type buffer cache expects a max vector size of 4. By just skipping the cache for longer vectors, we don't overrun and store float7 vectors in the double3 slot or retrieve the double3 in place of float7.
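To make the shape of that guard concrete, here is a minimal C++ sketch; the cache layout and the helper names (VectorTypeCache, GetEltKindIndex, BuildHLSLVectorType, kNumEltKinds) are hypothetical stand-ins rather than the actual DXC code.

```cpp
// Hypothetical cache shape: one slot per element kind, with columns for
// vector lengths 1-4 only.
static QualType VectorTypeCache[kNumEltKinds][4]; // kNumEltKinds illustrative

QualType GetOrCreateVectorType(ASTContext &Ctx, QualType EltTy, unsigned Cols) {
  constexpr unsigned kMaxCachedCols = 4; // the cache only covers lengths 1-4
  if (Cols <= kMaxCachedCols) {
    QualType &Slot = VectorTypeCache[GetEltKindIndex(EltTy)][Cols - 1];
    if (Slot.isNull())
      Slot = BuildHLSLVectorType(Ctx, EltTy, Cols); // illustrative builder
    return Slot;
  }
  // Long vectors (>4 elements) bypass the fixed-size cache entirely, so a
  // float7 never overwrites or reads back the double3 slot.
  return BuildHLSLVectorType(Ctx, EltTy, Cols);
}
```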

Testing is for acceptance, mangling, and the basic copying that takes place
at the high level, to ensure these constructs are being accepted and recognized
correctly. The intent is not to fully test the passing of data, as that
requires enabling vector operations to do properly. This test is also used to
verify that these same constructs are disallowed in 6.8 and earlier.

A separate test verifies that disallowed contexts produce the
appropriate errors.

Fixes microsoft#7117
Disallow long vectors, and arrays or structs containing long vectors, in
cbuffers, entry functions, node records, tessellation patches, or special intrinsic parameters with
user-defined struct parameters.
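A minimal sketch of the kind of check this implies, assuming a ContainsLongVector helper and a long-vector diagnostic ID; both names are illustrative rather than the identifiers the PR actually uses.

```cpp
// Illustrative only: reject long vectors (and aggregates containing them) in
// a disallowed context such as an entry-function signature.
void DiagnoseEntryParamForLongVector(Sema &S, ParmVarDecl *PD) {
  QualType Ty = PD->getType().getNonReferenceType();
  // ContainsLongVector is assumed to walk arrays, fields, and base classes.
  if (ContainsLongVector(Ty))
    S.Diag(PD->getLocation(),
           diag::err_hlsl_unsupported_long_vector) // diagnostic ID illustrative
        << "entry function parameters";            // context name for the message
}
```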
Expand the resource attribute to all resource types by adding reskind and
resclass arguments indicating the specific resource type. Change detection
in HlslTypes to use these attribute arguments. Similarly, add vertex
number arguments to the output stream attribute and a boolean input/output
indicator for tessellation patches.

Add geomstream attr to detect those objects

Use the attribute to detect tessellation patches.
Removes template arg counts and startswith strings used to identify
tessellation patches and distinguish them from multisampled textures.
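A before/after sketch of that detection change; HLSLTessPatchAttr stands in for whichever attribute class the PR actually adds.

```cpp
// Illustrative: attribute-based detection replaces name matching and
// template-argument counting.
bool IsTessellationPatchType(const CXXRecordDecl *RD) {
  // Old approach (fragile): string prefixes plus template arg counts.
  //   return RD->getName().startswith("InputPatch") ||
  //          RD->getName().startswith("OutputPatch");
  // New approach: check the attribute attached when the builtin is declared.
  return RD->hasAttr<HLSLTessPatchAttr>(); // attribute name is illustrative
}
```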
Add setting for max vec size.

Determine long vector presence using the DefinitionData bit?
OR
Rename the function that tests for long vectors?

Add attribute for geometry streams, produce and test errors for long vectors there.

Add and test errors for > 1024 element vectors.

Add vector size to error messages.

Good test changes.
Go for consistent test filename formatting. Most LLVM tests have dashes,
so dashes it is. Remove the redundant sm68 test.
Expand existing tests to different targets and contexts. Add thorough
testing for geometry streams and tessellation patches.

Add a too-long vector test. Verify that vectors over the maximum
for 6.9 fail.

Add subobjects and template classes to tests. These are unfortunately
disabled because the code to make them work causes other tests to fail.
Use RequireCompleteType to force specialization of templates encountered
in global and other scopes where detecting long vectors is necessary, where
possible. This populates the DefinitionData, which contains the base
class chain needed to detect when a base class has disqualifying long
vectors. It was also needed to detect when dependent types in a template
class result in long vectors.

Work graph node types didn't check their base classes for failures. This
affects base classes with long vectors whose subclasses are used for
node objects, which should fail for having long vector members.

Respond to feedback about iterating through fields in a clunky manner,
which was left out of the last reviewer feedback response.
I guess it was about time. This should simplify some things later as well as at present, and it was too easy not to do. Specifically, I was going to need to add another string check to the template instantiation code to identify long vectors. This is cleaner.

Incidentally, convert another feedback texture string check to use attributes.

Incidentally, re-sort the recently added attributes so they don't break up the node shader attributes.
Vector types can be cached in a 2D array that has a column for lengths 1-4. This uses the added constant to indicate the length and for the checks that confirm it isn't exceeded.
By setting the bit when the vector template is instantiated and then propagating it through members, be they standard members or base classes, the bit will be set correctly for any struct or struct-like type. For arrays, the array dimensions are peeled away in a utility function to get at the elements.
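A minimal sketch of that query; only the array-peeling and record lookup mirror the description above, and isHLSLLongVector is a made-up name for the DefinitionData bit accessor.

```cpp
// Sketch: arrays are peeled to their element type, then the record bit seeded
// at vector-template instantiation (and propagated through fields and base
// classes) answers the question directly.
bool TypeContainsLongVector(QualType Ty) {
  while (const ArrayType *AT = Ty->getAsArrayTypeUnsafe())
    Ty = AT->getElementType(); // peel away array dimensions
  if (const CXXRecordDecl *RD = Ty->getAsCXXRecordDecl())
    return RD->hasDefinition() &&
           RD->isHLSLLongVector(); // hypothetical DefinitionData bit accessor
  return false;
}
```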

Decided to separate the check for completeness from the check for long vectors. Even though the latter almost always requires the former, they are separate concepts, and embedding the first in the second would be unexpected.
Output streams, tessellation patches, and global variables should be complete when receiving other correctness checks. If they cannot be made complete, they should produce an error. This was omitted for several of these, including non-template globals, which was fine, but it meant that redundant errors were produced for templates and not for standard globals, likely just because that was what was tested. This removes that distinction and adds testing for all of the above to the existing incomplete-type.hlsl test.
Remove some stale elements. Add some new HLSL type helper functions. Make resource type retrievals type-safe. Add some parameter comments and names to make their effects clearer. Pass the resource attribute to cbuffer/tbuffer creation. Clean up and clarify error messages. Remove redundant type canonicalization from type queries. Correct the resclass of tbuffers. Use the multimatch utility of verify to condense checks.
Disables various forms of scalarization and vector elimination to permit
vectors to pass through to final DXIL when used in native LLVM
operations and loading/storing.

Introduces a few vector manipulation LLVM instructions to DXIL, allowing
them to appear in output DXIL.

Skips passes for 6.9 that scalarize, convert to arrays, or otherwise eliminate vectors.
This eliminates the element-by-element loading of the vectors.
In many cases, this required plumbing the shader model information to
passes that didn't have it before.
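A sketch of how that gating might look in the pass setup; the pipeline location and the creator signatures are approximations, while IsSM69Plus and the pass names come from this PR.

```cpp
// Illustrative wiring only; the real pipeline setup lives in DXC's pass
// management code.
void AddVectorEliminationPasses(llvm::legacy::PassManager &PM,
                                const hlsl::ShaderModel *SM) {
  if (SM->IsSM69Plus())
    return; // 6.9+ keeps native vectors through to final DXIL.
  // Pre-6.9 targets still scalarize and turn dynamically indexed vectors
  // into arrays, preserving the old element-by-element form.
  PM.add(llvm::createScalarizerPass());
  PM.add(llvm::createDynamicIndexingVectorToArrayPass(
      /*ReplaceAllVectors=*/false)); // creator signature approximated
}
```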

Many changes were needed for the MatrixBitcastLower pass related to
linking, both to avoid converting matrix vectors and to perform the
conversion if a shader was compiled for 6.9+ but then linked to an
earlier target.
This now adapts to the linker target to either preserve vectors for 6.9 or use arrays for previous versions.
This requires running the DynamicIndexing VectorToArray pass during linking, since 6_x and 6_9+ will not have run it in the initial compile but will still need to lower vectors to arrays.

Ternary conditional/select operators were extracted element by element in codegen.
Removing this allows 6.9 to preserve the vectors while maintaining
behavior for previous shader models, because the operations get
scalarized later anyway.
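A minimal sketch of the resulting codegen shape: one vector select instead of per-element extracts and selects. The helper name is illustrative and the surrounding CGExprScalar plumbing is omitted.

```cpp
// With an <N x i1> condition, LLVM's select already performs an element-wise
// choice, so no manual extraction is needed.
llvm::Value *EmitHLSLTernary(llvm::IRBuilder<> &Builder, llvm::Value *Cond,
                             llvm::Value *LHS, llvm::Value *RHS) {
  return Builder.CreateSelect(Cond, LHS, RHS, "hlsl.ternary");
}
```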

Keep groupshared variables as vectors for 6.9. They are no longer represented as individual groupshared scalars.

Adds extensive tests for these operations using different types and
sizes and testing them appropriately. Booleans produce significantly
different code, so they get their own test.

Fixes microsoft#7123
Disentangles the raw buffer lowering implementation into an isolated
function. Alters the various places where lowering took place to call
into the common function. This function will be expanded to handle other
lowering later.

When raw buffers use a templated load with a struct, they reuse the
subscript path also used for subscripted structured buffers. Such loads
with structs containing vectors or matrices will invoke the load
lowering from within this recursive call that traverses GEPs and other
users of the original call to set up correct offsets etc.

This adapts that code to use the common load lowering that enables long
vectors within structs to be correctly loaded.

Since the code expects byte address buffers, it is not (yet) adapted to
structured buffers, so those code paths are kept as they were.

This requires the ability to explicitly override the type used by the
ResLoadHelper, so a member is added to accommodate the matrices' vector
representation that doesn't match the types of the load call.

This also requires removing the bufIdx and offset swapping that was
done, confusingly, throughout the TranslateStructBufSubscriptUser code to
account for the fact that byte address buffers have to represent offsets
using the main coord parameter. Instead, the Resource Kind is passed
down so that the right parameter receives the increment when
necessary for longer types such as matrices. This is also enabled by
adding ResKind-appropriate offset calculation in the ResLoadHelper.
ResLoadHelper also gets an opcode set based on the ResKind for both
overloads, in preparation for further expansion to different resource
kinds.
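The offset handling this describes is visible in the clang-format diff below; here is a condensed sketch of the pattern (the helper name is illustrative, while kCoordIdx/kOffsetIdx match the argument indices used there).

```cpp
// Advance to the next chunk of a long load: raw (byte address) buffers have
// no offset parameter, so the coord index is bumped instead.
void AdvanceBufLoadOffset(SmallVectorImpl<Value *> &Args,
                          DxilResource::Kind RK, unsigned ByteSize,
                          hlsl::OP *OP, IRBuilder<> &Builder) {
  if (RK == DxilResource::Kind::RawBuffer)
    Args[kCoordIdx] =
        Builder.CreateAdd(Args[kCoordIdx], OP->GetU32Const(ByteSize));
  else // structured buffers increment the separate offset parameter
    Args[kOffsetIdx] =
        Builder.CreateAdd(Args[kOffsetIdx], OP->GetU32Const(ByteSize));
}
```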
The default offset behavior for non-call instructions was different in every location, so making it explicit and letting each call location calculate it as appropriate cleaned things up. It turns out the only case of matrix loading where the type was thought to be different from the type of the value to replace was actually always the same, so the type member is removed. The only case where offset calculation was different for the call instruction constructor was also where the replaced instruction needed to be explicit, so the boolean parameter was replaced by the explicit replacement instruction, which sets that boolean and explicitly sets the replaced instruction to either the matrix load instruction or a load instruction of the result of a subscript operator.
Make the TranslateResourceBuffer call into the correct buffer
lowerer.

Adapt ResLoadHelper to set the mip level correctly for MS textures.

Rename some typed buffer load utility functions. With recent changes to
load lowering, these function names are misleading, implying that they are used for loads instead of just subscript operators, or that they are used more broadly than just for typed buffers.

Add testing for texture load/stores.

Move arg collection to a separate function to better enable iteration control and handle the complicated typed buffer arguments.
Clarify some naming and reduce some redundancy between GenerateRawBufLd
and TranslateBufLoad. Part of this involved passing the correct vector
of i32 type for loading boolean vectors in sm69 raw buffer loads.
Add a new native vector overload type to DXIL intrinsics and the corresponding generation.
Add new raw buffer vector load/store intrinsics that use that overload type.
When the loaded/stored type is a vector of more than 1 element, the
shader model is 6.9 or higher, and the operation is on a raw buffer,
enable the generation of a native vector raw buffer load or store.
Added structured buffer support to TranslateStore and used it for all
such lowerings.
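The load-side condition is visible in the clang-format diff below; it is wrapped here as a small helper (the wrapper name is the only invention) to show exactly when the native-vector opcode is chosen.

```cpp
// Swap in the native-vector opcode only for multi-element vectors on raw
// buffers when targeting SM 6.9 or higher.
static OP::OpCode SelectRawBufferLoadOpcode(CallInst *CI, OP::OpCode opcode) {
  if (opcode == OP::OpCode::RawBufferLoad && CI->getType()->isVectorTy() &&
      CI->getType()->getVectorNumElements() > 1 &&
      CI->getModule()->GetHLModule().GetShaderModel()->IsSM69Plus())
    return OP::OpCode::RawBufferVectorLoad;
  return opcode;
}
```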
Add the vector overload type and apply it to the relevant builtins.

For 6.9, don't lower vectors when lowering exp.

Ugly fix for opcode reordering.

Note: loading from BABuffers still needs to be cleaned up too.
pow2clk added 5 commits March 10, 2025 07:53
Preliminary groupshared support

Just adds groupshared to the test and performs the switch to CS to allow it. Additionally required storing output to a buffer, which was something that needed testing anyway.

Keep groupshared as vectors for 6.9.

They are no longer represented as individual groupshared scalars, but they are still retrieved one element at a time. I'm not sure we have another way to do it just yet.
Support dot product on long vectors by expanding the intrinsic into
mul/mad ops, as is done with integer dot products.
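The expansion follows the shape of the ExpandDot function visible in the clang-format diff below; a condensed sketch for the floating-point case:

```cpp
// dot(a, b) for an N-element vector: multiply element 0, then fold in the
// remaining elements with FMad (acc = a[i] * b[i] + acc).
Value *ExpandDotSketch(Value *A, Value *B, unsigned VecSize, hlsl::OP *HlslOP,
                       IRBuilder<> &Builder) {
  Value *E0 = Builder.CreateExtractElement(A, (uint64_t)0);
  Value *E1 = Builder.CreateExtractElement(B, (uint64_t)0);
  Value *Acc = Builder.CreateFMul(E0, E1);
  for (unsigned I = 1; I < VecSize; ++I) {
    E0 = Builder.CreateExtractElement(A, I);
    E1 = Builder.CreateExtractElement(B, I);
    Acc = TrivialDxilTrinaryOperationRet(DXIL::OpCode::FMad, E0, E1, Acc,
                                         E0->getType(), HlslOP, Builder);
  }
  return Acc;
}
```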
Since the or() and and() intrinsics did their own scalarization, the or/and operators would never be applied to full vectors. This leaves the scalarization to the scalarization pass, which will skip it for 6.9.
Contributor

github-actions bot commented Mar 10, 2025

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff 4d3a2f5489fd9f438f13b2308e767a93882d4728 5674a1848801ac94fe473fe5d1512920ebbc612b -- include/dxc/DXIL/DxilConstants.h include/dxc/DXIL/DxilInstructions.h include/dxc/DXIL/DxilOperations.h include/dxc/HlslIntrinsicOp.h lib/DXIL/DxilOperations.cpp lib/DXIL/DxilUtil.cpp lib/DxilValidation/DxilValidation.cpp lib/HLSL/DxilLinker.cpp lib/HLSL/HLMatrixBitcastLowerPass.cpp lib/HLSL/HLOperationLower.cpp lib/Transforms/Scalar/DxilEliminateVector.cpp lib/Transforms/Scalar/LowerTypePasses.cpp lib/Transforms/Scalar/ScalarReplAggregatesHLSL.cpp lib/Transforms/Scalar/Scalarizer.cpp tools/clang/include/clang/AST/DeclCXX.h tools/clang/include/clang/AST/HlslTypes.h tools/clang/include/clang/Basic/LangOptions.h tools/clang/include/clang/Sema/SemaHLSL.h tools/clang/lib/AST/ASTContextHLSL.cpp tools/clang/lib/AST/DeclCXX.cpp tools/clang/lib/AST/HlslTypes.cpp tools/clang/lib/CodeGen/CGExprScalar.cpp tools/clang/lib/Sema/SemaDXR.cpp tools/clang/lib/Sema/SemaHLSL.cpp tools/clang/lib/Sema/SemaHLSLDiagnoseTU.cpp tools/clang/lib/Sema/SemaTemplateInstantiate.cpp tools/clang/tools/dxcompiler/dxcompilerobj.cpp tools/clang/unittests/HLSL/LinkerTest.cpp
View the diff from clang-format here.
diff --git a/lib/DxilValidation/DxilValidation.cpp b/lib/DxilValidation/DxilValidation.cpp
index 9c93b70c..b7c0da8a 100644
--- a/lib/DxilValidation/DxilValidation.cpp
+++ b/lib/DxilValidation/DxilValidation.cpp
@@ -2119,10 +2119,10 @@ static bool IsDxilBuiltinStructType(StructType *ST, hlsl::OP *hlslOP) {
   case 4:
   case 8: // 2 for doubles, 8 for halfs.
     return ST == hlslOP->GetCBufferRetType(EltTy);
-  break;
+    break;
   case 5:
     return ST == hlslOP->GetResRetType(EltTy);
-  break;
+    break;
   default:
     return false;
   }
diff --git a/lib/HLSL/HLOperationLower.cpp b/lib/HLSL/HLOperationLower.cpp
index d218136b..463433c4 100644
--- a/lib/HLSL/HLOperationLower.cpp
+++ b/lib/HLSL/HLOperationLower.cpp
@@ -481,13 +481,12 @@ Value *TrivialDxilOperation(OP::OpCode opcode, ArrayRef<Value *> refArgs,
   return TrivialDxilOperation(opcode, refArgs, Ty, Inst->getType(), hlslOP, B);
 }
 
-
 Value *TrivialDxilVectorOperation(Function *dxilFunc, OP::OpCode opcode,
-                            ArrayRef<Value *> refArgs, Type *Ty,
-                            OP *hlslOP, IRBuilder<> &Builder) {
+                                  ArrayRef<Value *> refArgs, Type *Ty,
+                                  OP *hlslOP, IRBuilder<> &Builder) {
   if (!Ty->isVoidTy()) {
     Value *retVal =
-      Builder.CreateCall(dxilFunc, refArgs, hlslOP->GetOpCodeName(opcode));
+        Builder.CreateCall(dxilFunc, refArgs, hlslOP->GetOpCodeName(opcode));
     return retVal;
   } else {
     // Cannot add name to void.
@@ -495,20 +494,22 @@ Value *TrivialDxilVectorOperation(Function *dxilFunc, OP::OpCode opcode,
   }
 }
 
-
-Value *TrivialDxilVectorUnaryOperationRet(OP::OpCode opcode, Value *src, Type *Ty,
-					  OP *hlslOP, IRBuilder<> &Builder) {
+Value *TrivialDxilVectorUnaryOperationRet(OP::OpCode opcode, Value *src,
+                                          Type *Ty, OP *hlslOP,
+                                          IRBuilder<> &Builder) {
 
   Constant *opArg = hlslOP->GetU32Const((unsigned)opcode);
   Value *args[] = {opArg, src};
 
   Function *dxilFunc = hlslOP->GetOpFunc(opcode, Ty);
 
-  return TrivialDxilVectorOperation(dxilFunc, opcode, args, Ty, hlslOP, Builder);
+  return TrivialDxilVectorOperation(dxilFunc, opcode, args, Ty, hlslOP,
+                                    Builder);
 }
 
-Value *TrivialDxilVectorBinaryOperation(OP::OpCode opcode, Value *src0, Value *src1,
-                                  hlsl::OP *hlslOP, IRBuilder<> &Builder) {
+Value *TrivialDxilVectorBinaryOperation(OP::OpCode opcode, Value *src0,
+                                        Value *src1, hlsl::OP *hlslOP,
+                                        IRBuilder<> &Builder) {
   Type *Ty = src0->getType();
 
   Constant *opArg = hlslOP->GetU32Const((unsigned)opcode);
@@ -516,7 +517,8 @@ Value *TrivialDxilVectorBinaryOperation(OP::OpCode opcode, Value *src0, Value *s
 
   Function *dxilFunc = hlslOP->GetOpFunc(opcode, Ty);
 
-  return TrivialDxilVectorOperation(dxilFunc, opcode, args, Ty, hlslOP, Builder);
+  return TrivialDxilVectorOperation(dxilFunc, opcode, args, Ty, hlslOP,
+                                    Builder);
 }
 
 Value *TrivialDxilUnaryOperationRet(OP::OpCode opcode, Value *src, Type *RetTy,
@@ -545,24 +547,26 @@ Value *TrivialDxilBinaryOperation(OP::OpCode opcode, Value *src0, Value *src1,
   return TrivialDxilOperation(opcode, args, Ty, Ty, hlslOP, Builder);
 }
 
-Value *TrivialDxilTrinaryOperationRet(OP::OpCode opcode, Value *src0, Value *src1,
-				      Value *src2, Type *Ty, hlsl::OP *hlslOP,
-				      IRBuilder<> &Builder) {
+Value *TrivialDxilTrinaryOperationRet(OP::OpCode opcode, Value *src0,
+                                      Value *src1, Value *src2, Type *Ty,
+                                      hlsl::OP *hlslOP, IRBuilder<> &Builder) {
   Constant *opArg = hlslOP->GetU32Const((unsigned)opcode);
   Value *args[] = {opArg, src0, src1, src2};
 
   return TrivialDxilOperation(opcode, args, Ty, Ty, hlslOP, Builder);
 }
 
-Value *TrivialDxilVectorTrinaryOperationRet(OP::OpCode opcode, Value *src0, Value *src1,
-					    Value *src2, Type *Ty, hlsl::OP *hlslOP,
-					    IRBuilder<> &Builder) {
+Value *TrivialDxilVectorTrinaryOperationRet(OP::OpCode opcode, Value *src0,
+                                            Value *src1, Value *src2, Type *Ty,
+                                            hlsl::OP *hlslOP,
+                                            IRBuilder<> &Builder) {
   Constant *opArg = hlslOP->GetU32Const((unsigned)opcode);
   Value *args[] = {opArg, src0, src1, src2};
 
   Function *dxilFunc = hlslOP->GetOpFunc(opcode, Ty);
 
-  return TrivialDxilVectorOperation(dxilFunc, opcode, args, Ty, hlslOP, Builder);
+  return TrivialDxilVectorOperation(dxilFunc, opcode, args, Ty, hlslOP,
+                                    Builder);
 }
 
 Value *TrivialUnaryOperation(CallInst *CI, IntrinsicOp IOP, OP::OpCode opcode,
@@ -577,22 +581,20 @@ Value *TrivialUnaryOperation(CallInst *CI, IntrinsicOp IOP, OP::OpCode opcode,
   return retVal;
 }
 
-Value *TrivialVectorizableUnaryOperation(CallInst *CI, IntrinsicOp IOP, OP::OpCode opcode,
-					 HLOperationLowerHelper &helper,
-					 HLObjectOperationLowerHelper *pObjHelper,
-					 bool &Translated) {
+Value *TrivialVectorizableUnaryOperation(
+    CallInst *CI, IntrinsicOp IOP, OP::OpCode opcode,
+    HLOperationLowerHelper &helper, HLObjectOperationLowerHelper *pObjHelper,
+    bool &Translated) {
   Value *src0 = CI->getArgOperand(HLOperandIndex::kUnaryOpSrc0Idx);
   Type *Ty = CI->getType();
   IRBuilder<> Builder(CI);
   hlsl::OP *hlslOP = &helper.hlslOP;
 
-  if (Ty->isVectorTy() &&
-      helper.M.GetShaderModel()->IsSM69Plus())
-    return TrivialDxilVectorUnaryOperationRet(opcode, src0, Ty,
-					      hlslOP, Builder);
+  if (Ty->isVectorTy() && helper.M.GetShaderModel()->IsSM69Plus())
+    return TrivialDxilVectorUnaryOperationRet(opcode, src0, Ty, hlslOP,
+                                              Builder);
   else
-    return TrivialDxilUnaryOperationRet(opcode, src0, Ty,
-					hlslOP, Builder);
+    return TrivialDxilUnaryOperationRet(opcode, src0, Ty, hlslOP, Builder);
 }
 
 Value *TrivialBinaryOperation(CallInst *CI, IntrinsicOp IOP, OP::OpCode opcode,
@@ -609,10 +611,11 @@ Value *TrivialBinaryOperation(CallInst *CI, IntrinsicOp IOP, OP::OpCode opcode,
   return binOp;
 }
 
-Value *TrivialVectorBinaryOperation(CallInst *CI, IntrinsicOp IOP, OP::OpCode opcode,
-				    HLOperationLowerHelper &helper,
-				    HLObjectOperationLowerHelper *pObjHelper,
-				    bool &Translated) {
+Value *TrivialVectorBinaryOperation(CallInst *CI, IntrinsicOp IOP,
+                                    OP::OpCode opcode,
+                                    HLOperationLowerHelper &helper,
+                                    HLObjectOperationLowerHelper *pObjHelper,
+                                    bool &Translated) {
   hlsl::OP *hlslOP = &helper.hlslOP;
   Value *src0 = CI->getArgOperand(HLOperandIndex::kBinaryOpSrc0Idx);
   Value *src1 = CI->getArgOperand(HLOperandIndex::kBinaryOpSrc1Idx);
@@ -624,9 +627,9 @@ Value *TrivialVectorBinaryOperation(CallInst *CI, IntrinsicOp IOP, OP::OpCode op
 }
 
 Value *TranslateFMA(CallInst *CI, IntrinsicOp IOP, OP::OpCode opcode,
-		    HLOperationLowerHelper &helper,
-		    HLObjectOperationLowerHelper *pObjHelper,
-		    bool &Translated) {
+                    HLOperationLowerHelper &helper,
+                    HLObjectOperationLowerHelper *pObjHelper,
+                    bool &Translated) {
   hlsl::OP *hlslOP = &helper.hlslOP;
   Type *Ty = CI->getType();
   Value *src0 = CI->getArgOperand(HLOperandIndex::kTrinaryOpSrc0Idx);
@@ -634,11 +637,12 @@ Value *TranslateFMA(CallInst *CI, IntrinsicOp IOP, OP::OpCode opcode,
   Value *src2 = CI->getArgOperand(HLOperandIndex::kTrinaryOpSrc2Idx);
   IRBuilder<> Builder(CI);
 
-  if (Ty->isVectorTy() &&
-      helper.M.GetShaderModel()->IsSM69Plus())
-    return TrivialDxilVectorTrinaryOperationRet(opcode, src0, src1, src2, Ty, hlslOP, Builder);
+  if (Ty->isVectorTy() && helper.M.GetShaderModel()->IsSM69Plus())
+    return TrivialDxilVectorTrinaryOperationRet(opcode, src0, src1, src2, Ty,
+                                                hlslOP, Builder);
   else
-    return TrivialDxilTrinaryOperationRet(opcode, src0, src1, src2, Ty, hlslOP, Builder);
+    return TrivialDxilTrinaryOperationRet(opcode, src0, src1, src2, Ty, hlslOP,
+                                          Builder);
 }
 
 Value *TrivialIsSpecialFloat(CallInst *CI, IntrinsicOp IOP, OP::OpCode opcode,
@@ -1984,15 +1988,16 @@ Value *TranslateClamp(CallInst *CI, IntrinsicOp IOP, OP::OpCode opcode,
 
   IRBuilder<> Builder(CI);
   // min(max(x, minVal), maxVal).
-  if (Ty->isVectorTy() &&
-      helper.M.GetShaderModel()->IsSM69Plus()) {
+  if (Ty->isVectorTy() && helper.M.GetShaderModel()->IsSM69Plus()) {
     Value *maxXMinVal =
-      TrivialDxilVectorBinaryOperation(maxOp, x, minVal, hlslOP, Builder);
-    return TrivialDxilVectorBinaryOperation(minOp, maxXMinVal, maxVal, hlslOP, Builder);
+        TrivialDxilVectorBinaryOperation(maxOp, x, minVal, hlslOP, Builder);
+    return TrivialDxilVectorBinaryOperation(minOp, maxXMinVal, maxVal, hlslOP,
+                                            Builder);
   } else {
     Value *maxXMinVal =
-      TrivialDxilBinaryOperation(maxOp, x, minVal, hlslOP, Builder);
-    return TrivialDxilBinaryOperation(minOp, maxXMinVal, maxVal, hlslOP, Builder);
+        TrivialDxilBinaryOperation(maxOp, x, minVal, hlslOP, Builder);
+    return TrivialDxilBinaryOperation(minOp, maxXMinVal, maxVal, hlslOP,
+                                      Builder);
   }
 }
 
@@ -2306,11 +2311,12 @@ Value *TranslateExp(CallInst *CI, IntrinsicOp IOP, OP::OpCode opcode,
         ConstantVector::getSplat(Ty->getVectorNumElements(), log2eConst);
   }
   val = Builder.CreateFMul(log2eConst, val);
-  if (Ty->isVectorTy() &&
-      helper.M.GetShaderModel()->IsSM69Plus())
-    return TrivialDxilVectorUnaryOperationRet(OP::OpCode::Exp, val, Ty, hlslOP, Builder);
+  if (Ty->isVectorTy() && helper.M.GetShaderModel()->IsSM69Plus())
+    return TrivialDxilVectorUnaryOperationRet(OP::OpCode::Exp, val, Ty, hlslOP,
+                                              Builder);
   else
-    return TrivialDxilUnaryOperationRet(OP::OpCode::Exp, val, Ty, hlslOP, Builder);
+    return TrivialDxilUnaryOperationRet(OP::OpCode::Exp, val, Ty, hlslOP,
+                                        Builder);
 }
 
 Value *TranslateLog(CallInst *CI, IntrinsicOp IOP, OP::OpCode opcode,
@@ -2326,11 +2332,12 @@ Value *TranslateLog(CallInst *CI, IntrinsicOp IOP, OP::OpCode opcode,
     ln2Const = ConstantVector::getSplat(Ty->getVectorNumElements(), ln2Const);
   }
   Value *log = nullptr;
-  if (Ty->isVectorTy() &&
-      helper.M.GetShaderModel()->IsSM69Plus())
-    log = TrivialDxilVectorUnaryOperationRet(OP::OpCode::Log, val, Ty, hlslOP, Builder);
+  if (Ty->isVectorTy() && helper.M.GetShaderModel()->IsSM69Plus())
+    log = TrivialDxilVectorUnaryOperationRet(OP::OpCode::Log, val, Ty, hlslOP,
+                                             Builder);
   else
-    log = TrivialDxilUnaryOperationRet(OP::OpCode::Log, val, Ty, hlslOP, Builder);
+    log =
+        TrivialDxilUnaryOperationRet(OP::OpCode::Log, val, Ty, hlslOP, Builder);
 
   return Builder.CreateFMul(ln2Const, log);
 }
@@ -2390,13 +2397,12 @@ Value *TranslateFUIBinary(CallInst *CI, IntrinsicOp IOP, OP::OpCode opcode,
       break;
     }
   }
-  if (CI->getType()->isVectorTy() &&
-      helper.M.GetShaderModel()->IsSM69Plus())
+  if (CI->getType()->isVectorTy() && helper.M.GetShaderModel()->IsSM69Plus())
     return TrivialVectorBinaryOperation(CI, IOP, opcode, helper, pObjHelper,
-					Translated);
+                                        Translated);
   else
     return TrivialBinaryOperation(CI, IOP, opcode, helper, pObjHelper,
-				  Translated);
+                                  Translated);
 }
 
 Value *TranslateFUITrinary(CallInst *CI, IntrinsicOp IOP, OP::OpCode opcode,
@@ -2421,7 +2427,8 @@ Value *TranslateFUITrinary(CallInst *CI, IntrinsicOp IOP, OP::OpCode opcode,
   Value *src2 = CI->getArgOperand(HLOperandIndex::kTrinaryOpSrc2Idx);
   IRBuilder<> Builder(CI);
 
-  return TrivialDxilTrinaryOperationRet(opcode, src0, src1, src2, Ty, hlslOP, Builder);
+  return TrivialDxilTrinaryOperationRet(opcode, src0, src1, src2, Ty, hlslOP,
+                                        Builder);
 }
 
 Value *TranslateFrexp(CallInst *CI, IntrinsicOp IOP, OP::OpCode opcode,
@@ -2545,9 +2552,8 @@ Value *TrivialDotOperation(OP::OpCode opcode, Value *src0, Value *src1,
 
 // Instead of using a DXIL intrinsic, implement a dot product operation using
 // multiply and add operations. Used for integer dots and long vectors.
-Value *ExpandDot(Value *arg0, Value *arg1, unsigned vecSize,
-		 hlsl::OP *hlslOP, IRBuilder<> &Builder,
-		 bool Unsigned = false) {
+Value *ExpandDot(Value *arg0, Value *arg1, unsigned vecSize, hlsl::OP *hlslOP,
+                 IRBuilder<> &Builder, bool Unsigned = false) {
   auto madOpCode = Unsigned ? DXIL::OpCode::UMad : DXIL::OpCode::IMad;
   if (arg0->getType()->getScalarType()->isFloatingPointTy())
     madOpCode = DXIL::OpCode::FMad;
@@ -2557,8 +2563,8 @@ Value *ExpandDot(Value *arg0, Value *arg1, unsigned vecSize,
   for (unsigned Elt = 1; Elt < vecSize; ++Elt) {
     Elt0 = Builder.CreateExtractElement(arg0, Elt);
     Elt1 = Builder.CreateExtractElement(arg1, Elt);
-    Result = TrivialDxilTrinaryOperationRet(madOpCode, Elt0, Elt1, Result, Elt0->getType(), hlslOP,
-					    Builder);
+    Result = TrivialDxilTrinaryOperationRet(madOpCode, Elt0, Elt1, Result,
+                                            Elt0->getType(), hlslOP, Builder);
   }
 
   return Result;
@@ -2596,11 +2602,12 @@ Value *TranslateDot(CallInst *CI, IntrinsicOp IOP, OP::OpCode opcode,
   unsigned vecSize = Ty->getVectorNumElements();
   Value *arg1 = CI->getArgOperand(HLOperandIndex::kBinaryOpSrc1Idx);
   IRBuilder<> Builder(CI);
-  if (Ty->getScalarType()->isFloatingPointTy() && Ty->getVectorNumElements() <= 4) {
+  if (Ty->getScalarType()->isFloatingPointTy() &&
+      Ty->getVectorNumElements() <= 4) {
     return TranslateFDot(arg0, arg1, vecSize, hlslOP, Builder);
   } else {
     return ExpandDot(arg0, arg1, vecSize, hlslOP, Builder,
-                         IOP == IntrinsicOp::IOP_udot);
+                     IOP == IntrinsicOp::IOP_udot);
   }
 }
 
@@ -2783,8 +2790,9 @@ Value *TranslateMSad4(CallInst *CI, IntrinsicOp IOP, OP::OpCode opcode,
   byteSrc = Builder.CreateInsertElement(byteSrc, byteSrcElt, 3);
 
   // Msad on vecref and byteSrc.
-  return TrivialDxilTrinaryOperationRet(DXIL::OpCode::Msad, vecRef, byteSrc, accum,
-					vecRef->getType(), hlslOP, Builder);
+  return TrivialDxilTrinaryOperationRet(DXIL::OpCode::Msad, vecRef, byteSrc,
+                                        accum, vecRef->getType(), hlslOP,
+                                        Builder);
 }
 
 Value *TranslateRCP(CallInst *CI, IntrinsicOp IOP, OP::OpCode opcode,
@@ -3167,7 +3175,7 @@ Value *TranslateMul(CallInst *CI, IntrinsicOp IOP, OP::OpCode opcode,
         return TranslateFDot(arg0, arg1, vecSize, hlslOP, Builder);
       } else {
         return ExpandDot(arg0, arg1, vecSize, hlslOP, Builder,
-                             IOP == IntrinsicOp::IOP_umul);
+                         IOP == IntrinsicOp::IOP_umul);
       }
     } else {
       // mul(vector, scalar) == vector * scalar-splat
@@ -4187,8 +4195,7 @@ ResLoadHelper::ResLoadHelper(CallInst *CI, DxilResource::Kind RK,
         status = CI->getArgOperand(kStatusIdx);
     }
   } else {
-    if (opcode == OP::OpCode::RawBufferLoad &&
-        CI->getType()->isVectorTy() &&
+    if (opcode == OP::OpCode::RawBufferLoad && CI->getType()->isVectorTy() &&
         CI->getType()->getVectorNumElements() > 1 &&
         CI->getModule()->GetHLModule().GetShaderModel()->IsSM69Plus())
       opcode = OP::OpCode::RawBufferVectorLoad;
@@ -4306,7 +4313,6 @@ static SmallVector<Value *, 12> GetBufLoadArgs(ResLoadHelper helper,
       // RawBufferVectorLoad takes no mask argument.
       Args.emplace_back(alignmentVal); // alignment @4
     }
-
   }
   return Args;
 }
@@ -4375,11 +4381,11 @@ Value *TranslateBufLoad(ResLoadHelper &helper, HLResource::Kind RK,
         if (RK == DxilResource::Kind::RawBuffer)
           // Raw buffers can't use offset param. Add to coord index.
           Args[kCoordIdx] =
-            Builder.CreateAdd(Args[kCoordIdx], OP->GetU32Const(4 * LdSize));
+              Builder.CreateAdd(Args[kCoordIdx], OP->GetU32Const(4 * LdSize));
         else
           // Structured buffers increment the offset parameter.
           Args[kOffsetIdx] =
-            Builder.CreateAdd(Args[kOffsetIdx], OP->GetU32Const(4 * LdSize));
+              Builder.CreateAdd(Args[kOffsetIdx], OP->GetU32Const(4 * LdSize));
       }
     }
     retValNew = ScalarizeElements(Ty, elts, Builder);
@@ -6505,7 +6511,8 @@ IntrinsicLower gLowerTable[] = {
     {IntrinsicOp::IOP_asint16, TranslateBitcast, DXIL::OpCode::NumOpCodes},
     {IntrinsicOp::IOP_asuint, TranslateAsUint, DXIL::OpCode::SplitDouble},
     {IntrinsicOp::IOP_asuint16, TranslateAsUint, DXIL::OpCode::NumOpCodes},
-    {IntrinsicOp::IOP_atan, TrivialVectorizableUnaryOperation, DXIL::OpCode::Atan},
+    {IntrinsicOp::IOP_atan, TrivialVectorizableUnaryOperation,
+     DXIL::OpCode::Atan},
     {IntrinsicOp::IOP_atan2, TranslateAtan2, DXIL::OpCode::NumOpCodes},
     {IntrinsicOp::IOP_ceil, TrivialUnaryOperation, DXIL::OpCode::Round_pi},
     {IntrinsicOp::IOP_clamp, TranslateClamp, DXIL::OpCode::NumOpCodes},
@@ -6596,7 +6603,8 @@ IntrinsicLower gLowerTable[] = {
     {IntrinsicOp::IOP_sqrt, TrivialUnaryOperation, DXIL::OpCode::Sqrt},
     {IntrinsicOp::IOP_step, TranslateStep, DXIL::OpCode::NumOpCodes},
     {IntrinsicOp::IOP_tan, TrivialUnaryOperation, DXIL::OpCode::Tan},
-    {IntrinsicOp::IOP_tanh, TrivialVectorizableUnaryOperation, DXIL::OpCode::Htan},
+    {IntrinsicOp::IOP_tanh, TrivialVectorizableUnaryOperation,
+     DXIL::OpCode::Htan},
     {IntrinsicOp::IOP_tex1D, EmptyLower, DXIL::OpCode::NumOpCodes},
     {IntrinsicOp::IOP_tex1Dbias, EmptyLower, DXIL::OpCode::NumOpCodes},
     {IntrinsicOp::IOP_tex1Dgrad, EmptyLower, DXIL::OpCode::NumOpCodes},
@@ -8464,9 +8472,9 @@ void TranslateStructBufSubscript(CallInst *CI, Value *handle, Value *status,
 namespace {
 
 Value *TranslateTypedBufSubscript(CallInst *CI, DXIL::ResourceKind RK,
-                             DXIL::ResourceClass RC, Value *handle,
-                             LoadInst *ldInst, IRBuilder<> &Builder,
-                             hlsl::OP *hlslOP, const DataLayout &DL) {
+                                  DXIL::ResourceClass RC, Value *handle,
+                                  LoadInst *ldInst, IRBuilder<> &Builder,
+                                  hlsl::OP *hlslOP, const DataLayout &DL) {
   // The arguments to the call instruction are used to determine the access,
   // the return value and type come from the load instruction.
   ResLoadHelper ldHelper(CI, RK, RC, handle, IntrinsicOp::MOP_Load, ldInst);
@@ -8514,8 +8522,8 @@ Value *UpdateVectorElt(Value *VecVal, Value *EltVal, Value *EltIdx,
 }
 
 void TranslateTypedBufferSubscript(CallInst *CI, HLOperationLowerHelper &helper,
-                               HLObjectOperationLowerHelper *pObjHelper,
-                               bool &Translated) {
+                                   HLObjectOperationLowerHelper *pObjHelper,
+                                   bool &Translated) {
   Value *ptr = CI->getArgOperand(HLOperandIndex::kSubscriptObjectOpIdx);
 
   hlsl::OP *hlslOP = &helper.hlslOP;
@@ -8533,7 +8541,7 @@ void TranslateTypedBufferSubscript(CallInst *CI, HLOperationLowerHelper &helper,
     Value *UndefI = UndefValue::get(Builder.getInt32Ty());
     if (LoadInst *ldInst = dyn_cast<LoadInst>(user)) {
       TranslateTypedBufSubscript(CI, RK, RC, handle, ldInst, Builder, hlslOP,
-                            helper.dataLayout);
+                                 helper.dataLayout);
     } else if (StoreInst *stInst = dyn_cast<StoreInst>(user)) {
       Value *val = stInst->getValueOperand();
       TranslateStore(RK, handle, val,
diff --git a/tools/clang/lib/Sema/SemaHLSL.cpp b/tools/clang/lib/Sema/SemaHLSL.cpp
index 555b0ba4..1824b7c1 100644
--- a/tools/clang/lib/Sema/SemaHLSL.cpp
+++ b/tools/clang/lib/Sema/SemaHLSL.cpp
@@ -1017,16 +1017,19 @@ static const ArBasicKind g_UIntCT[] = {AR_BASIC_UINT32, AR_BASIC_LITERAL_INT,
 // AR_BASIC_INT32 should be the default for any int since min precision integers
 // should map to int32, not int16 or int64
 static const ArBasicKind g_AnyIntCT[] = {
-    AR_BASIC_INT32, AR_BASIC_INT16,  AR_BASIC_UINT32,      AR_BASIC_UINT16,
-    AR_BASIC_INT64, AR_BASIC_UINT64, AR_BASIC_INT8_4PACKED, AR_BASIC_UINT8_4PACKED,
-    AR_BASIC_LITERAL_INT, AR_BASIC_UNKNOWN};
+    AR_BASIC_INT32,        AR_BASIC_INT16,         AR_BASIC_UINT32,
+    AR_BASIC_UINT16,       AR_BASIC_INT64,         AR_BASIC_UINT64,
+    AR_BASIC_INT8_4PACKED, AR_BASIC_UINT8_4PACKED, AR_BASIC_LITERAL_INT,
+    AR_BASIC_UNKNOWN};
 
 static const ArBasicKind g_AnyInt32CT[] = {
-  AR_BASIC_INT32, AR_BASIC_UINT32, AR_BASIC_INT8_4PACKED, AR_BASIC_UINT8_4PACKED, AR_BASIC_LITERAL_INT, AR_BASIC_UNKNOWN};
+    AR_BASIC_INT32,         AR_BASIC_UINT32,      AR_BASIC_INT8_4PACKED,
+    AR_BASIC_UINT8_4PACKED, AR_BASIC_LITERAL_INT, AR_BASIC_UNKNOWN};
 
-static const ArBasicKind g_UIntOnlyCT[] = {AR_BASIC_UINT32, AR_BASIC_UINT64,AR_BASIC_INT8_4PACKED, AR_BASIC_UINT8_4PACKED, 
-                                           AR_BASIC_LITERAL_INT,
-                                           AR_BASIC_NOCAST, AR_BASIC_UNKNOWN};
+static const ArBasicKind g_UIntOnlyCT[] = {
+    AR_BASIC_UINT32,        AR_BASIC_UINT64,      AR_BASIC_INT8_4PACKED,
+    AR_BASIC_UINT8_4PACKED, AR_BASIC_LITERAL_INT, AR_BASIC_NOCAST,
+    AR_BASIC_UNKNOWN};
 
 static const ArBasicKind g_FloatCT[] = {
     AR_BASIC_FLOAT32, AR_BASIC_FLOAT32_PARTIAL_PRECISION,
@@ -1064,20 +1067,21 @@ static const ArBasicKind g_NumericCT[] = {
     AR_BASIC_UINT16,        AR_BASIC_UINT32,
     AR_BASIC_MIN12INT,      AR_BASIC_MIN16INT,
     AR_BASIC_MIN16UINT,     AR_BASIC_INT64,
-    AR_BASIC_UINT64,        AR_BASIC_INT8_4PACKED, AR_BASIC_UINT8_4PACKED, AR_BASIC_UNKNOWN};
+    AR_BASIC_UINT64,        AR_BASIC_INT8_4PACKED,
+    AR_BASIC_UINT8_4PACKED, AR_BASIC_UNKNOWN};
 
 static const ArBasicKind g_Numeric32CT[] = {
     AR_BASIC_FLOAT32,       AR_BASIC_FLOAT32_PARTIAL_PRECISION,
     AR_BASIC_LITERAL_FLOAT, AR_BASIC_LITERAL_INT,
     AR_BASIC_INT32,         AR_BASIC_UINT32,
-AR_BASIC_INT8_4PACKED, AR_BASIC_UINT8_4PACKED,     
+    AR_BASIC_INT8_4PACKED,  AR_BASIC_UINT8_4PACKED,
     AR_BASIC_UNKNOWN};
 
 static const ArBasicKind g_Numeric32OnlyCT[] = {
     AR_BASIC_FLOAT32,       AR_BASIC_FLOAT32_PARTIAL_PRECISION,
     AR_BASIC_LITERAL_FLOAT, AR_BASIC_LITERAL_INT,
     AR_BASIC_INT32,         AR_BASIC_UINT32,
-AR_BASIC_INT8_4PACKED, AR_BASIC_UINT8_4PACKED, 
+    AR_BASIC_INT8_4PACKED,  AR_BASIC_UINT8_4PACKED,
     AR_BASIC_NOCAST,        AR_BASIC_UNKNOWN};
 
 static const ArBasicKind g_AnyCT[] = {
@@ -1090,7 +1094,7 @@ static const ArBasicKind g_AnyCT[] = {
     AR_BASIC_MIN12INT,      AR_BASIC_MIN16INT,
     AR_BASIC_MIN16UINT,     AR_BASIC_BOOL,
     AR_BASIC_INT64,         AR_BASIC_UINT64,
-AR_BASIC_INT8_4PACKED, AR_BASIC_UINT8_4PACKED, 
+    AR_BASIC_INT8_4PACKED,  AR_BASIC_UINT8_4PACKED,
     AR_BASIC_UNKNOWN};
 
 static const ArBasicKind g_AnySamplerCT[] = {
@@ -1153,10 +1157,10 @@ static const ArBasicKind g_Numeric16OnlyCT[] = {
     AR_BASIC_LITERAL_FLOAT, AR_BASIC_LITERAL_INT, AR_BASIC_NOCAST,
     AR_BASIC_UNKNOWN};
 
-static const ArBasicKind g_Int32OnlyCT[] = {AR_BASIC_INT32, AR_BASIC_UINT32,
-AR_BASIC_INT8_4PACKED, AR_BASIC_UINT8_4PACKED, 
-                                            AR_BASIC_LITERAL_INT,
-                                            AR_BASIC_NOCAST, AR_BASIC_UNKNOWN};
+static const ArBasicKind g_Int32OnlyCT[] = {
+    AR_BASIC_INT32,         AR_BASIC_UINT32,      AR_BASIC_INT8_4PACKED,
+    AR_BASIC_UINT8_4PACKED, AR_BASIC_LITERAL_INT, AR_BASIC_NOCAST,
+    AR_BASIC_UNKNOWN};
 
 static const ArBasicKind g_Float32OnlyCT[] = {
     AR_BASIC_FLOAT32, AR_BASIC_LITERAL_FLOAT, AR_BASIC_NOCAST,
@@ -1178,14 +1182,13 @@ static const ArBasicKind g_UInt8_4PackedCT[] = {
     AR_BASIC_UNKNOWN};
 
 static const ArBasicKind g_AnyInt16Or32CT[] = {
-    AR_BASIC_INT32,  AR_BASIC_UINT32,      AR_BASIC_INT16,
-    AR_BASIC_UINT16, 
-AR_BASIC_INT8_4PACKED, AR_BASIC_UINT8_4PACKED, AR_BASIC_LITERAL_INT, AR_BASIC_UNKNOWN};
+    AR_BASIC_INT32,       AR_BASIC_UINT32,       AR_BASIC_INT16,
+    AR_BASIC_UINT16,      AR_BASIC_INT8_4PACKED, AR_BASIC_UINT8_4PACKED,
+    AR_BASIC_LITERAL_INT, AR_BASIC_UNKNOWN};
 
 static const ArBasicKind g_SInt16Or32OnlyCT[] = {
-    AR_BASIC_INT32, AR_BASIC_INT16, AR_BASIC_LITERAL_INT, 
-AR_BASIC_INT8_4PACKED, AR_BASIC_UINT8_4PACKED, 
-AR_BASIC_NOCAST,
+    AR_BASIC_INT32,        AR_BASIC_INT16,         AR_BASIC_LITERAL_INT,
+    AR_BASIC_INT8_4PACKED, AR_BASIC_UINT8_4PACKED, AR_BASIC_NOCAST,
     AR_BASIC_UNKNOWN};
 
 static const ArBasicKind g_ByteAddressBufferCT[] = {
@@ -8619,7 +8622,6 @@ ExprResult HLSLExternalSource::LookupVectorMemberExprForHLSL(
     llvm_unreachable("Unknown VectorMemberAccessError value");
   }
 
-
   if (colCount > 4)
     msg = diag::err_hlsl_vector_member_on_long_vector;
 

@VladM1076

Hi @pow2clk, is this the latest PR for LongVectors, and is the longvecs.hlsl (tools/clang/test/CodeGenDXIL/hlsl/types/longvecs.hlsl) failure in DXC expected?

I am using this to start testing things on our side and want to make sure I am testing the right stuff.

pow2clk and others added 10 commits March 25, 2025 18:00
This change makes hlsl::IntrinsicOp enum values stable by:
- adding hlsl_intrinsic_opcodes.json to capture assigned indices
- adding this file to the set generated by hctgen
- assigning new indices after the last index during generation
- giving hlsl::IntrinsicOp enum values explicit assignments
- removing ENABLE_SPIRV_CODEGEN ifdefs around opcode definitions and
lowering table entries to keep these stable whether or not the SPIR-V
build setting is enabled.

Fixes microsoft#7230
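For illustration, the resulting enum takes roughly this form; the declaration style and numeric values here are approximations, and the real indices are recorded in hlsl_intrinsic_opcodes.json.

```cpp
namespace hlsl {
enum class IntrinsicOp {
  IOP_AcceptHitAndEndSearch = 0, // example value
  IOP_AddUint64 = 1,             // example value
  // ... every existing entry keeps its recorded index ...
  IOP_udot = 361, // example value; new intrinsics append after the last index
};
} // namespace hlsl
```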
This is just the diffs for the new version that was meant to fix the
warnings.