
Longvec mega change #7188


Draft
wants to merge 46 commits into main

Conversation

pow2clk
Member

@pow2clk pow2clk commented Mar 10, 2025

A collection of incompletely tested and probably buggy implementations of long/native vector functionality.

pow2clk and others added 30 commits February 17, 2025 21:49
Remove errors in Sema diagnostics for vectors longer than 4 in 6.9.
Test for failures using long vectors in unsupported contexts and for correct codegen in
supported contexts. Verify errors persist in pre-6.9 shader models.

The type buffer cache expects a max vector size of 4. By just skipping the cache for longer vectors, we don't overrun and store float7 vectors in the double3 slot or retrieve the double3 in place of float7.
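To make the shape of that guard concrete, here is a minimal C++ sketch; the cache layout and the helper names (VectorTypeCache, GetEltKindIndex, BuildHLSLVectorType, kNumEltKinds) are hypothetical stand-ins rather than the actual DXC code.

```cpp
// Hypothetical cache shape: one slot per element kind, with columns for
// vector lengths 1-4 only.
static QualType VectorTypeCache[kNumEltKinds][4]; // kNumEltKinds illustrative

QualType GetOrCreateVectorType(ASTContext &Ctx, QualType EltTy, unsigned Cols) {
  constexpr unsigned kMaxCachedCols = 4; // the cache only covers lengths 1-4
  if (Cols <= kMaxCachedCols) {
    QualType &Slot = VectorTypeCache[GetEltKindIndex(EltTy)][Cols - 1];
    if (Slot.isNull())
      Slot = BuildHLSLVectorType(Ctx, EltTy, Cols); // illustrative builder
    return Slot;
  }
  // Long vectors (>4 elements) bypass the fixed-size cache entirely, so a
  // float7 never overwrites or reads back the double3 slot.
  return BuildHLSLVectorType(Ctx, EltTy, Cols);
}
```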

Testing is for acceptance, mangling, and the basic copying that takes place
at the high level, to ensure these constructs are being accepted and recognized
correctly. The intent is not to fully test the passing of data, as that
requires enabling vector operations to do properly. This test is also used to
verify that these same constructs are disallowed in 6.8 and earlier.

A separate test verifies that disallowed contexts produce the
appropriate errors.

Fixes microsoft#7117
Disallow long vectors, and arrays or structs containing long vectors, in
cbuffers, entry functions, node records, tessellation patches, or special intrinsic parameters with
user-defined struct parameters.
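A minimal sketch of the kind of check this implies, assuming a ContainsLongVector helper and a long-vector diagnostic ID; both names are illustrative rather than the identifiers the PR actually uses.

```cpp
// Illustrative only: reject long vectors (and aggregates containing them) in
// a disallowed context such as an entry-function signature.
void DiagnoseEntryParamForLongVector(Sema &S, ParmVarDecl *PD) {
  QualType Ty = PD->getType().getNonReferenceType();
  // ContainsLongVector is assumed to walk arrays, fields, and base classes.
  if (ContainsLongVector(Ty))
    S.Diag(PD->getLocation(),
           diag::err_hlsl_unsupported_long_vector) // diagnostic ID illustrative
        << "entry function parameters";            // context name for the message
}
```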
Expand the resource attribute to all resource types by adding reskind and
resclass arguments indicating the specific resource type. Change detection
in HlslTypes to use these attribute arguments. Similarly, add vertex
number arguments to the output stream attribute and a boolean input/output
indicator for tessellation patches.

Add geomstream attr to detect those objects

Use the attribute to detect tessellation patches.
Removes template arg counts and startswith strings used to identify
tessellation patches and distinguish them from multisampled textures.
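A before/after sketch of that detection change; HLSLTessPatchAttr stands in for whichever attribute class the PR actually adds.

```cpp
// Illustrative: attribute-based detection replaces name matching and
// template-argument counting.
bool IsTessellationPatchType(const CXXRecordDecl *RD) {
  // Old approach (fragile): string prefixes plus template arg counts.
  //   return RD->getName().startswith("InputPatch") ||
  //          RD->getName().startswith("OutputPatch");
  // New approach: check the attribute attached when the builtin is declared.
  return RD->hasAttr<HLSLTessPatchAttr>(); // attribute name is illustrative
}
```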
Add setting for max vec size.

Determine long vector presence using the DefinitionData bit?
OR
Rename the function that tests for long vectors?

Add attribute for geometry streams, produce and test errors for long vectors there.

Add and test errors for > 1024 element vectors.

Add vector size to error messages.

Good test changes.
Go for consistent test filename formatting. Most LLVM tests have dashes,
so dashes it is. Remove the redundant sm68 test.
Expand existing tests to different targets and contexts. Add thorough
testing for geometry streams and tessellation patches.

Add a too-long vector test. Verify that vectors over the maximum
for 6.9 fail.

Add subobjects and template classes to tests. These are unfortunately
disabled because the code to make them work causes other tests to fail.
Use RequireCompleteType to force specialization of templates encountered
in global and other scopes where detecting long vectors is necessary, where
possible. This populates the DefinitionData, which contains the base
class chain needed to detect when a base class has disqualifying long
vectors. It was also needed to detect when dependent types in a template
class result in long vectors.

Work graph node types didn't check their base classes for failures. This
affects base classes with long vectors whose subclasses are used for
node objects, which should fail for having long vector members.

Respond to feedback about iterating through fields in a clunky manner,
which was left out of the last reviewer feedback response.
I guess it was about time. This should simplify some things later as well as at present, and it was too easy not to do. Specifically, I was going to need to add another string check to the template instantiation code to identify long vectors. This is cleaner.

Incidentally, convert another feedback texture string check to use attributes.

Incidentally, re-sort the recently added attributes so they don't break up the node shader attributes.
Vector types can be cached in a 2D array that has a column for lengths 1-4. This uses the added constant to indicate the length and for the checks that confirm it isn't exceeded.
By setting the bit when the vector template is instantiated and then propagating it through members, be they standard members or base classes, the bit will be set correctly for any struct or struct-like type. For arrays, the array dimensions are peeled away in a utility function to get at the elements.
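A minimal sketch of that query; only the array-peeling and record lookup mirror the description above, and isHLSLLongVector is a made-up name for the DefinitionData bit accessor.

```cpp
// Sketch: arrays are peeled to their element type, then the record bit seeded
// at vector-template instantiation (and propagated through fields and base
// classes) answers the question directly.
bool TypeContainsLongVector(QualType Ty) {
  while (const ArrayType *AT = Ty->getAsArrayTypeUnsafe())
    Ty = AT->getElementType(); // peel away array dimensions
  if (const CXXRecordDecl *RD = Ty->getAsCXXRecordDecl())
    return RD->hasDefinition() &&
           RD->isHLSLLongVector(); // hypothetical DefinitionData bit accessor
  return false;
}
```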

Decided to separate the check for completeness from the check for long vectors. Even though the latter almost always requires the former, they are separate concepts, and embedding the first in the second would be unexpected.
Output streams, tessellation patches, and global variables should be complete when receiving other correctness checks. If they cannot be made complete, they should produce an error. This was omitted for several of these, including non-template globals, which was fine, but it meant that redundant errors were produced for templates and not for standard globals, likely just because that was what was tested. This removes that distinction and adds testing for all of the above to the existing incomplete-type.hlsl test.
Remove some stale elements. Add some new HLSL type helper functions. Make resource type retrievals type-safe. Add some parameter comments and names to make their effects clearer. Pass the resource attribute to cbuffer/tbuffer creation. Clean up and clarify error messages. Remove redundant type canonicalization from type queries. Correct the resclass of tbuffers. Use the multimatch utility of verify to condense checks.
Disables various forms of scalarization and vector elimination to permit
vectors to pass through to final DXIL when used in native LLVM
operations and loading/storing.

Introduces a few vector manipulation LLVM instructions to DXIL, allowing
them to appear in output DXIL.

Skips passes for 6.9 that scalarize, convert to arrays, or otherwise eliminate vectors.
This eliminates the element-by-element loading of the vectors.
In many cases, this required plumbing the shader model information to
passes that didn't have it before.
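A sketch of how that gating might look in the pass setup; the pipeline location and the creator signatures are approximations, while IsSM69Plus and the pass names come from this PR.

```cpp
// Illustrative wiring only; the real pipeline setup lives in DXC's pass
// management code.
void AddVectorEliminationPasses(llvm::legacy::PassManager &PM,
                                const hlsl::ShaderModel *SM) {
  if (SM->IsSM69Plus())
    return; // 6.9+ keeps native vectors through to final DXIL.
  // Pre-6.9 targets still scalarize and turn dynamically indexed vectors
  // into arrays, preserving the old element-by-element form.
  PM.add(llvm::createScalarizerPass());
  PM.add(llvm::createDynamicIndexingVectorToArrayPass(
      /*ReplaceAllVectors=*/false)); // creator signature approximated
}
```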

Many changes were needed for the MatrixBitcastLower pass related to
linking, both to avoid converting matrix vectors and to perform the
conversion if a shader was compiled for 6.9+ but then linked to an
earlier target.
This now adapts to the linker target to either preserve vectors for 6.9 or use arrays for previous versions.
This requires running the DynamicIndexing VectorToArray pass during linking, since 6_x and 6_9+ will not have run it in the initial compile but will still need to lower vectors to arrays.

Ternary conditional/select operators were extracted element by element in codegen.
Removing this allows 6.9 to preserve the vectors while maintaining
behavior for previous shader models, because the operations get
scalarized later anyway.
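A minimal sketch of the resulting codegen shape: one vector select instead of per-element extracts and selects. The helper name is illustrative and the surrounding CGExprScalar plumbing is omitted.

```cpp
// With an <N x i1> condition, LLVM's select already performs an element-wise
// choice, so no manual extraction is needed.
llvm::Value *EmitHLSLTernary(llvm::IRBuilder<> &Builder, llvm::Value *Cond,
                             llvm::Value *LHS, llvm::Value *RHS) {
  return Builder.CreateSelect(Cond, LHS, RHS, "hlsl.ternary");
}
```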

Keep groupshared variables as vectors for 6.9. They are no longer represented as individual groupshared scalars.

Adds extensive tests for these operations using different types and
sizes and testing them appropriately. Booleans produce significantly
different code, so they get their own test.

Fixes microsoft#7123
Disentangles the raw buffer lowering implementation into an isolated
function. Alters the various places where lowering took place to call
into the common function. This function will be expanded to handle other
lowering later.

When raw buffers use a templated load with a struct, they reuse the
subscript path also used for subscripted structured buffers. Such loads
with structs containing vectors or matrices will invoke the load
lowering from within this recursive call that traverses GEPs and other
users of the original call to set up correct offsets etc.

This adapts that code to use the common load lowering that enables long
vectors within structs to be correctly loaded.

Since the code expects byte address buffers, it is not (yet) adapted to
structured buffers, so those code paths are kept as they were.

This requires the ability to explicitly override the type used by the
ResLoadHelper, so a member is added to accommodate the matrices' vector
representation that doesn't match the types of the load call.

This also requires removing the bufIdx and offset swapping that was
done, confusingly, throughout the TranslateStructBufSubscriptUser code to
account for the fact that byte address buffers have to represent offsets
using the main coord parameter. Instead, the Resource Kind is passed
down so that the right parameter receives the increment when
necessary for longer types such as matrices. This is also enabled by
adding ResKind-appropriate offset calculation in the ResLoadHelper.
ResLoadHelper also gets an opcode set based on the ResKind for both
overloads, in preparation for further expansion to different resource
kinds.
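The offset handling this describes is visible in the clang-format diff below; here is a condensed sketch of the pattern (the helper name is illustrative, while kCoordIdx/kOffsetIdx match the argument indices used there).

```cpp
// Advance to the next chunk of a long load: raw (byte address) buffers have
// no offset parameter, so the coord index is bumped instead.
void AdvanceBufLoadOffset(SmallVectorImpl<Value *> &Args,
                          DxilResource::Kind RK, unsigned ByteSize,
                          hlsl::OP *OP, IRBuilder<> &Builder) {
  if (RK == DxilResource::Kind::RawBuffer)
    Args[kCoordIdx] =
        Builder.CreateAdd(Args[kCoordIdx], OP->GetU32Const(ByteSize));
  else // structured buffers increment the separate offset parameter
    Args[kOffsetIdx] =
        Builder.CreateAdd(Args[kOffsetIdx], OP->GetU32Const(ByteSize));
}
```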
The default offset behavior for non-call instructions was different in every location, so making it explicit and letting each call location calculate it as appropriate cleaned things up. It turns out the only case of matrix loading where the type was thought to be different from the type of the value to replace was actually always the same, so the type member is removed. The only case where offset calculation was different for the call instruction constructor was also where the replaced instruction needed to be explicit, so the boolean parameter was replaced by the explicit replacement instruction, which sets that boolean and explicitly sets the replaced instruction to either the matrix load instruction or a load instruction of the result of a subscript operator.
Make the TranslateResourceBuffer call into the correct buffer
lowerer.

Adapt ResLoadHelper to set the mip level correctly for MS textures.

Rename some typed buffer load utility functions. With recent changes to
load lowering, these function names are misleading, implying that they are used for loads instead of just subscript operators, or that they are used more broadly than just for typed buffers.

Add testing for texture load/stores.

Move arg collection to a separate function to better enable iteration control and handle the complicated typed buffer arguments.
Clarify some naming and reduce some redundancy between GenerateRawBufLd
and TranslateBufLoad. Part of this involved passing the correct vector
of i32 type for loading boolean vectors in sm69 raw buffer loads.
Add a new native vector overload type to DXIL intrinsics and the corresponding generation.
Add new raw buffer vector load/store intrinsics that use that overload type.
When the loaded/stored type is a vector of more than 1 element, the
shader model is 6.9 or higher, and the operation is on a raw buffer,
enable the generation of a native vector raw buffer load or store.
Added structured buffer support to TranslateStore and used it for all
such lowerings.
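The load-side condition is visible in the clang-format diff below; it is wrapped here as a small helper (the wrapper name is the only invention) to show exactly when the native-vector opcode is chosen.

```cpp
// Swap in the native-vector opcode only for multi-element vectors on raw
// buffers when targeting SM 6.9 or higher.
static OP::OpCode SelectRawBufferLoadOpcode(CallInst *CI, OP::OpCode opcode) {
  if (opcode == OP::OpCode::RawBufferLoad && CI->getType()->isVectorTy() &&
      CI->getType()->getVectorNumElements() > 1 &&
      CI->getModule()->GetHLModule().GetShaderModel()->IsSM69Plus())
    return OP::OpCode::RawBufferVectorLoad;
  return opcode;
}
```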
Add the vector overload type and apply it to the relevant builtins.

For 6.9, don't lower vectors when lowering exp.

Ugly fix for opcode reordering.

Note: loading from BABuffers still needs to be cleaned up too.
pow2clk added 5 commits March 10, 2025 07:53
Preliminary groupshared support

Just adds groupshared to the test and performs the switch to CS to allow it. Additionally required storing output to a buffer, which was something that needed testing anyway.

Keep groupshared as vectors for 6.9.

They are no longer represented as individual groupshared scalars, but they are still retrieved one element at a time. I'm not sure we have another way to do it just yet.
Support dot product on long vectors by expanding the intrinsic into
mul/mad ops, as is done with integer dot products.
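The expansion follows the shape of the ExpandDot function visible in the clang-format diff below; a condensed sketch for the floating-point case:

```cpp
// dot(a, b) for an N-element vector: multiply element 0, then fold in the
// remaining elements with FMad (acc = a[i] * b[i] + acc).
Value *ExpandDotSketch(Value *A, Value *B, unsigned VecSize, hlsl::OP *HlslOP,
                       IRBuilder<> &Builder) {
  Value *E0 = Builder.CreateExtractElement(A, (uint64_t)0);
  Value *E1 = Builder.CreateExtractElement(B, (uint64_t)0);
  Value *Acc = Builder.CreateFMul(E0, E1);
  for (unsigned I = 1; I < VecSize; ++I) {
    E0 = Builder.CreateExtractElement(A, I);
    E1 = Builder.CreateExtractElement(B, I);
    Acc = TrivialDxilTrinaryOperationRet(DXIL::OpCode::FMad, E0, E1, Acc,
                                         E0->getType(), HlslOP, Builder);
  }
  return Acc;
}
```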
Since the or() and and() intrinsics did their own scalarization, the or/and operators would never be applied to full vectors. This leaves the scalarization to the scalarization pass, which will skip it for 6.9.
Contributor

github-actions bot commented Mar 10, 2025

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff 4d3a2f5489fd9f438f13b2308e767a93882d4728 5674a1848801ac94fe473fe5d1512920ebbc612b -- include/dxc/DXIL/DxilConstants.h include/dxc/DXIL/DxilInstructions.h include/dxc/DXIL/DxilOperations.h include/dxc/HlslIntrinsicOp.h lib/DXIL/DxilOperations.cpp lib/DXIL/DxilUtil.cpp lib/DxilValidation/DxilValidation.cpp lib/HLSL/DxilLinker.cpp lib/HLSL/HLMatrixBitcastLowerPass.cpp lib/HLSL/HLOperationLower.cpp lib/Transforms/Scalar/DxilEliminateVector.cpp lib/Transforms/Scalar/LowerTypePasses.cpp lib/Transforms/Scalar/ScalarReplAggregatesHLSL.cpp lib/Transforms/Scalar/Scalarizer.cpp tools/clang/include/clang/AST/DeclCXX.h tools/clang/include/clang/AST/HlslTypes.h tools/clang/include/clang/Basic/LangOptions.h tools/clang/include/clang/Sema/SemaHLSL.h tools/clang/lib/AST/ASTContextHLSL.cpp tools/clang/lib/AST/DeclCXX.cpp tools/clang/lib/AST/HlslTypes.cpp tools/clang/lib/CodeGen/CGExprScalar.cpp tools/clang/lib/Sema/SemaDXR.cpp tools/clang/lib/Sema/SemaHLSL.cpp tools/clang/lib/Sema/SemaHLSLDiagnoseTU.cpp tools/clang/lib/Sema/SemaTemplateInstantiate.cpp tools/clang/tools/dxcompiler/dxcompilerobj.cpp tools/clang/unittests/HLSL/LinkerTest.cpp
View the diff from clang-format here.
diff --git a/lib/DxilValidation/DxilValidation.cpp b/lib/DxilValidation/DxilValidation.cpp
index 9c93b70c..b7c0da8a 100644
--- a/lib/DxilValidation/DxilValidation.cpp
+++ b/lib/DxilValidation/DxilValidation.cpp
@@ -2119,10 +2119,10 @@ static bool IsDxilBuiltinStructType(StructType *ST, hlsl::OP *hlslOP) {
   case 4:
   case 8: // 2 for doubles, 8 for halfs.
     return ST == hlslOP->GetCBufferRetType(EltTy);
-  break;
+    break;
   case 5:
     return ST == hlslOP->GetResRetType(EltTy);
-  break;
+    break;
   default:
     return false;
   }
diff --git a/lib/HLSL/HLOperationLower.cpp b/lib/HLSL/HLOperationLower.cpp
index d218136b..463433c4 100644
--- a/lib/HLSL/HLOperationLower.cpp
+++ b/lib/HLSL/HLOperationLower.cpp
@@ -481,13 +481,12 @@ Value *TrivialDxilOperation(OP::OpCode opcode, ArrayRef<Value *> refArgs,
   return TrivialDxilOperation(opcode, refArgs, Ty, Inst->getType(), hlslOP, B);
 }
 
-
 Value *TrivialDxilVectorOperation(Function *dxilFunc, OP::OpCode opcode,
-                            ArrayRef<Value *> refArgs, Type *Ty,
-                            OP *hlslOP, IRBuilder<> &Builder) {
+                                  ArrayRef<Value *> refArgs, Type *Ty,
+                                  OP *hlslOP, IRBuilder<> &Builder) {
   if (!Ty->isVoidTy()) {
     Value *retVal =
-      Builder.CreateCall(dxilFunc, refArgs, hlslOP->GetOpCodeName(opcode));
+        Builder.CreateCall(dxilFunc, refArgs, hlslOP->GetOpCodeName(opcode));
     return retVal;
   } else {
     // Cannot add name to void.
@@ -495,20 +494,22 @@ Value *TrivialDxilVectorOperation(Function *dxilFunc, OP::OpCode opcode,
   }
 }
 
-
-Value *TrivialDxilVectorUnaryOperationRet(OP::OpCode opcode, Value *src, Type *Ty,
-					  OP *hlslOP, IRBuilder<> &Builder) {
+Value *TrivialDxilVectorUnaryOperationRet(OP::OpCode opcode, Value *src,
+                                          Type *Ty, OP *hlslOP,
+                                          IRBuilder<> &Builder) {
 
   Constant *opArg = hlslOP->GetU32Const((unsigned)opcode);
   Value *args[] = {opArg, src};
 
   Function *dxilFunc = hlslOP->GetOpFunc(opcode, Ty);
 
-  return TrivialDxilVectorOperation(dxilFunc, opcode, args, Ty, hlslOP, Builder);
+  return TrivialDxilVectorOperation(dxilFunc, opcode, args, Ty, hlslOP,
+                                    Builder);
 }
 
-Value *TrivialDxilVectorBinaryOperation(OP::OpCode opcode, Value *src0, Value *src1,
-                                  hlsl::OP *hlslOP, IRBuilder<> &Builder) {
+Value *TrivialDxilVectorBinaryOperation(OP::OpCode opcode, Value *src0,
+                                        Value *src1, hlsl::OP *hlslOP,
+                                        IRBuilder<> &Builder) {
   Type *Ty = src0->getType();
 
   Constant *opArg = hlslOP->GetU32Const((unsigned)opcode);
@@ -516,7 +517,8 @@ Value *TrivialDxilVectorBinaryOperation(OP::OpCode opcode, Value *src0, Value *s
 
   Function *dxilFunc = hlslOP->GetOpFunc(opcode, Ty);
 
-  return TrivialDxilVectorOperation(dxilFunc, opcode, args, Ty, hlslOP, Builder);
+  return TrivialDxilVectorOperation(dxilFunc, opcode, args, Ty, hlslOP,
+                                    Builder);
 }
 
 Value *TrivialDxilUnaryOperationRet(OP::OpCode opcode, Value *src, Type *RetTy,
@@ -545,24 +547,26 @@ Value *TrivialDxilBinaryOperation(OP::OpCode opcode, Value *src0, Value *src1,
   return TrivialDxilOperation(opcode, args, Ty, Ty, hlslOP, Builder);
 }
 
-Value *TrivialDxilTrinaryOperationRet(OP::OpCode opcode, Value *src0, Value *src1,
-				      Value *src2, Type *Ty, hlsl::OP *hlslOP,
-				      IRBuilder<> &Builder) {
+Value *TrivialDxilTrinaryOperationRet(OP::OpCode opcode, Value *src0,
+                                      Value *src1, Value *src2, Type *Ty,
+                                      hlsl::OP *hlslOP, IRBuilder<> &Builder) {
   Constant *opArg = hlslOP->GetU32Const((unsigned)opcode);
   Value *args[] = {opArg, src0, src1, src2};
 
   return TrivialDxilOperation(opcode, args, Ty, Ty, hlslOP, Builder);
 }
 
-Value *TrivialDxilVectorTrinaryOperationRet(OP::OpCode opcode, Value *src0, Value *src1,
-					    Value *src2, Type *Ty, hlsl::OP *hlslOP,
-					    IRBuilder<> &Builder) {
+Value *TrivialDxilVectorTrinaryOperationRet(OP::OpCode opcode, Value *src0,
+                                            Value *src1, Value *src2, Type *Ty,
+                                            hlsl::OP *hlslOP,
+                                            IRBuilder<> &Builder) {
   Constant *opArg = hlslOP->GetU32Const((unsigned)opcode);
   Value *args[] = {opArg, src0, src1, src2};
 
   Function *dxilFunc = hlslOP->GetOpFunc(opcode, Ty);
 
-  return TrivialDxilVectorOperation(dxilFunc, opcode, args, Ty, hlslOP, Builder);
+  return TrivialDxilVectorOperation(dxilFunc, opcode, args, Ty, hlslOP,
+                                    Builder);
 }
 
 Value *TrivialUnaryOperation(CallInst *CI, IntrinsicOp IOP, OP::OpCode opcode,
@@ -577,22 +581,20 @@ Value *TrivialUnaryOperation(CallInst *CI, IntrinsicOp IOP, OP::OpCode opcode,
   return retVal;
 }
 
-Value *TrivialVectorizableUnaryOperation(CallInst *CI, IntrinsicOp IOP, OP::OpCode opcode,
-					 HLOperationLowerHelper &helper,
-					 HLObjectOperationLowerHelper *pObjHelper,
-					 bool &Translated) {
+Value *TrivialVectorizableUnaryOperation(
+    CallInst *CI, IntrinsicOp IOP, OP::OpCode opcode,
+    HLOperationLowerHelper &helper, HLObjectOperationLowerHelper *pObjHelper,
+    bool &Translated) {
   Value *src0 = CI->getArgOperand(HLOperandIndex::kUnaryOpSrc0Idx);
   Type *Ty = CI->getType();
   IRBuilder<> Builder(CI);
   hlsl::OP *hlslOP = &helper.hlslOP;
 
-  if (Ty->isVectorTy() &&
-      helper.M.GetShaderModel()->IsSM69Plus())
-    return TrivialDxilVectorUnaryOperationRet(opcode, src0, Ty,
-					      hlslOP, Builder);
+  if (Ty->isVectorTy() && helper.M.GetShaderModel()->IsSM69Plus())
+    return TrivialDxilVectorUnaryOperationRet(opcode, src0, Ty, hlslOP,
+                                              Builder);
   else
-    return TrivialDxilUnaryOperationRet(opcode, src0, Ty,
-					hlslOP, Builder);
+    return TrivialDxilUnaryOperationRet(opcode, src0, Ty, hlslOP, Builder);
 }
 
 Value *TrivialBinaryOperation(CallInst *CI, IntrinsicOp IOP, OP::OpCode opcode,
@@ -609,10 +611,11 @@ Value *TrivialBinaryOperation(CallInst *CI, IntrinsicOp IOP, OP::OpCode opcode,
   return binOp;
 }
 
-Value *TrivialVectorBinaryOperation(CallInst *CI, IntrinsicOp IOP, OP::OpCode opcode,
-				    HLOperationLowerHelper &helper,
-				    HLObjectOperationLowerHelper *pObjHelper,
-				    bool &Translated) {
+Value *TrivialVectorBinaryOperation(CallInst *CI, IntrinsicOp IOP,
+                                    OP::OpCode opcode,
+                                    HLOperationLowerHelper &helper,
+                                    HLObjectOperationLowerHelper *pObjHelper,
+                                    bool &Translated) {
   hlsl::OP *hlslOP = &helper.hlslOP;
   Value *src0 = CI->getArgOperand(HLOperandIndex::kBinaryOpSrc0Idx);
   Value *src1 = CI->getArgOperand(HLOperandIndex::kBinaryOpSrc1Idx);
@@ -624,9 +627,9 @@ Value *TrivialVectorBinaryOperation(CallInst *CI, IntrinsicOp IOP, OP::OpCode op
 }
 
 Value *TranslateFMA(CallInst *CI, IntrinsicOp IOP, OP::OpCode opcode,
-		    HLOperationLowerHelper &helper,
-		    HLObjectOperationLowerHelper *pObjHelper,
-		    bool &Translated) {
+                    HLOperationLowerHelper &helper,
+                    HLObjectOperationLowerHelper *pObjHelper,
+                    bool &Translated) {
   hlsl::OP *hlslOP = &helper.hlslOP;
   Type *Ty = CI->getType();
   Value *src0 = CI->getArgOperand(HLOperandIndex::kTrinaryOpSrc0Idx);
@@ -634,11 +637,12 @@ Value *TranslateFMA(CallInst *CI, IntrinsicOp IOP, OP::OpCode opcode,
   Value *src2 = CI->getArgOperand(HLOperandIndex::kTrinaryOpSrc2Idx);
   IRBuilder<> Builder(CI);
 
-  if (Ty->isVectorTy() &&
-      helper.M.GetShaderModel()->IsSM69Plus())
-    return TrivialDxilVectorTrinaryOperationRet(opcode, src0, src1, src2, Ty, hlslOP, Builder);
+  if (Ty->isVectorTy() && helper.M.GetShaderModel()->IsSM69Plus())
+    return TrivialDxilVectorTrinaryOperationRet(opcode, src0, src1, src2, Ty,
+                                                hlslOP, Builder);
   else
-    return TrivialDxilTrinaryOperationRet(opcode, src0, src1, src2, Ty, hlslOP, Builder);
+    return TrivialDxilTrinaryOperationRet(opcode, src0, src1, src2, Ty, hlslOP,
+                                          Builder);
 }
 
 Value *TrivialIsSpecialFloat(CallInst *CI, IntrinsicOp IOP, OP::OpCode opcode,
@@ -1984,15 +1988,16 @@ Value *TranslateClamp(CallInst *CI, IntrinsicOp IOP, OP::OpCode opcode,
 
   IRBuilder<> Builder(CI);
   // min(max(x, minVal), maxVal).
-  if (Ty->isVectorTy() &&
-      helper.M.GetShaderModel()->IsSM69Plus()) {
+  if (Ty->isVectorTy() && helper.M.GetShaderModel()->IsSM69Plus()) {
     Value *maxXMinVal =
-      TrivialDxilVectorBinaryOperation(maxOp, x, minVal, hlslOP, Builder);
-    return TrivialDxilVectorBinaryOperation(minOp, maxXMinVal, maxVal, hlslOP, Builder);
+        TrivialDxilVectorBinaryOperation(maxOp, x, minVal, hlslOP, Builder);
+    return TrivialDxilVectorBinaryOperation(minOp, maxXMinVal, maxVal, hlslOP,
+                                            Builder);
   } else {
     Value *maxXMinVal =
-      TrivialDxilBinaryOperation(maxOp, x, minVal, hlslOP, Builder);
-    return TrivialDxilBinaryOperation(minOp, maxXMinVal, maxVal, hlslOP, Builder);
+        TrivialDxilBinaryOperation(maxOp, x, minVal, hlslOP, Builder);
+    return TrivialDxilBinaryOperation(minOp, maxXMinVal, maxVal, hlslOP,
+                                      Builder);
   }
 }
 
@@ -2306,11 +2311,12 @@ Value *TranslateExp(CallInst *CI, IntrinsicOp IOP, OP::OpCode opcode,
         ConstantVector::getSplat(Ty->getVectorNumElements(), log2eConst);
   }
   val = Builder.CreateFMul(log2eConst, val);
-  if (Ty->isVectorTy() &&
-      helper.M.GetShaderModel()->IsSM69Plus())
-    return TrivialDxilVectorUnaryOperationRet(OP::OpCode::Exp, val, Ty, hlslOP, Builder);
+  if (Ty->isVectorTy() && helper.M.GetShaderModel()->IsSM69Plus())
+    return TrivialDxilVectorUnaryOperationRet(OP::OpCode::Exp, val, Ty, hlslOP,
+                                              Builder);
   else
-    return TrivialDxilUnaryOperationRet(OP::OpCode::Exp, val, Ty, hlslOP, Builder);
+    return TrivialDxilUnaryOperationRet(OP::OpCode::Exp, val, Ty, hlslOP,
+                                        Builder);
 }
 
 Value *TranslateLog(CallInst *CI, IntrinsicOp IOP, OP::OpCode opcode,
@@ -2326,11 +2332,12 @@ Value *TranslateLog(CallInst *CI, IntrinsicOp IOP, OP::OpCode opcode,
     ln2Const = ConstantVector::getSplat(Ty->getVectorNumElements(), ln2Const);
   }
   Value *log = nullptr;
-  if (Ty->isVectorTy() &&
-      helper.M.GetShaderModel()->IsSM69Plus())
-    log = TrivialDxilVectorUnaryOperationRet(OP::OpCode::Log, val, Ty, hlslOP, Builder);
+  if (Ty->isVectorTy() && helper.M.GetShaderModel()->IsSM69Plus())
+    log = TrivialDxilVectorUnaryOperationRet(OP::OpCode::Log, val, Ty, hlslOP,
+                                             Builder);
   else
-    log = TrivialDxilUnaryOperationRet(OP::OpCode::Log, val, Ty, hlslOP, Builder);
+    log =
+        TrivialDxilUnaryOperationRet(OP::OpCode::Log, val, Ty, hlslOP, Builder);
 
   return Builder.CreateFMul(ln2Const, log);
 }
@@ -2390,13 +2397,12 @@ Value *TranslateFUIBinary(CallInst *CI, IntrinsicOp IOP, OP::OpCode opcode,
       break;
     }
   }
-  if (CI->getType()->isVectorTy() &&
-      helper.M.GetShaderModel()->IsSM69Plus())
+  if (CI->getType()->isVectorTy() && helper.M.GetShaderModel()->IsSM69Plus())
     return TrivialVectorBinaryOperation(CI, IOP, opcode, helper, pObjHelper,
-					Translated);
+                                        Translated);
   else
     return TrivialBinaryOperation(CI, IOP, opcode, helper, pObjHelper,
-				  Translated);
+                                  Translated);
 }
 
 Value *TranslateFUITrinary(CallInst *CI, IntrinsicOp IOP, OP::OpCode opcode,
@@ -2421,7 +2427,8 @@ Value *TranslateFUITrinary(CallInst *CI, IntrinsicOp IOP, OP::OpCode opcode,
   Value *src2 = CI->getArgOperand(HLOperandIndex::kTrinaryOpSrc2Idx);
   IRBuilder<> Builder(CI);
 
-  return TrivialDxilTrinaryOperationRet(opcode, src0, src1, src2, Ty, hlslOP, Builder);
+  return TrivialDxilTrinaryOperationRet(opcode, src0, src1, src2, Ty, hlslOP,
+                                        Builder);
 }
 
 Value *TranslateFrexp(CallInst *CI, IntrinsicOp IOP, OP::OpCode opcode,
@@ -2545,9 +2552,8 @@ Value *TrivialDotOperation(OP::OpCode opcode, Value *src0, Value *src1,
 
 // Instead of using a DXIL intrinsic, implement a dot product operation using
 // multiply and add operations. Used for integer dots and long vectors.
-Value *ExpandDot(Value *arg0, Value *arg1, unsigned vecSize,
-		 hlsl::OP *hlslOP, IRBuilder<> &Builder,
-		 bool Unsigned = false) {
+Value *ExpandDot(Value *arg0, Value *arg1, unsigned vecSize, hlsl::OP *hlslOP,
+                 IRBuilder<> &Builder, bool Unsigned = false) {
   auto madOpCode = Unsigned ? DXIL::OpCode::UMad : DXIL::OpCode::IMad;
   if (arg0->getType()->getScalarType()->isFloatingPointTy())
     madOpCode = DXIL::OpCode::FMad;
@@ -2557,8 +2563,8 @@ Value *ExpandDot(Value *arg0, Value *arg1, unsigned vecSize,
   for (unsigned Elt = 1; Elt < vecSize; ++Elt) {
     Elt0 = Builder.CreateExtractElement(arg0, Elt);
     Elt1 = Builder.CreateExtractElement(arg1, Elt);
-    Result = TrivialDxilTrinaryOperationRet(madOpCode, Elt0, Elt1, Result, Elt0->getType(), hlslOP,
-					    Builder);
+    Result = TrivialDxilTrinaryOperationRet(madOpCode, Elt0, Elt1, Result,
+                                            Elt0->getType(), hlslOP, Builder);
   }
 
   return Result;
@@ -2596,11 +2602,12 @@ Value *TranslateDot(CallInst *CI, IntrinsicOp IOP, OP::OpCode opcode,
   unsigned vecSize = Ty->getVectorNumElements();
   Value *arg1 = CI->getArgOperand(HLOperandIndex::kBinaryOpSrc1Idx);
   IRBuilder<> Builder(CI);
-  if (Ty->getScalarType()->isFloatingPointTy() && Ty->getVectorNumElements() <= 4) {
+  if (Ty->getScalarType()->isFloatingPointTy() &&
+      Ty->getVectorNumElements() <= 4) {
     return TranslateFDot(arg0, arg1, vecSize, hlslOP, Builder);
   } else {
     return ExpandDot(arg0, arg1, vecSize, hlslOP, Builder,
-                         IOP == IntrinsicOp::IOP_udot);
+                     IOP == IntrinsicOp::IOP_udot);
   }
 }
 
@@ -2783,8 +2790,9 @@ Value *TranslateMSad4(CallInst *CI, IntrinsicOp IOP, OP::OpCode opcode,
   byteSrc = Builder.CreateInsertElement(byteSrc, byteSrcElt, 3);
 
   // Msad on vecref and byteSrc.
-  return TrivialDxilTrinaryOperationRet(DXIL::OpCode::Msad, vecRef, byteSrc, accum,
-					vecRef->getType(), hlslOP, Builder);
+  return TrivialDxilTrinaryOperationRet(DXIL::OpCode::Msad, vecRef, byteSrc,
+                                        accum, vecRef->getType(), hlslOP,
+                                        Builder);
 }
 
 Value *TranslateRCP(CallInst *CI, IntrinsicOp IOP, OP::OpCode opcode,
@@ -3167,7 +3175,7 @@ Value *TranslateMul(CallInst *CI, IntrinsicOp IOP, OP::OpCode opcode,
         return TranslateFDot(arg0, arg1, vecSize, hlslOP, Builder);
       } else {
         return ExpandDot(arg0, arg1, vecSize, hlslOP, Builder,
-                             IOP == IntrinsicOp::IOP_umul);
+                         IOP == IntrinsicOp::IOP_umul);
       }
     } else {
       // mul(vector, scalar) == vector * scalar-splat
@@ -4187,8 +4195,7 @@ ResLoadHelper::ResLoadHelper(CallInst *CI, DxilResource::Kind RK,
         status = CI->getArgOperand(kStatusIdx);
     }
   } else {
-    if (opcode == OP::OpCode::RawBufferLoad &&
-        CI->getType()->isVectorTy() &&
+    if (opcode == OP::OpCode::RawBufferLoad && CI->getType()->isVectorTy() &&
         CI->getType()->getVectorNumElements() > 1 &&
         CI->getModule()->GetHLModule().GetShaderModel()->IsSM69Plus())
       opcode = OP::OpCode::RawBufferVectorLoad;
@@ -4306,7 +4313,6 @@ static SmallVector<Value *, 12> GetBufLoadArgs(ResLoadHelper helper,
       // RawBufferVectorLoad takes no mask argument.
       Args.emplace_back(alignmentVal); // alignment @4
     }
-
   }
   return Args;
 }
@@ -4375,11 +4381,11 @@ Value *TranslateBufLoad(ResLoadHelper &helper, HLResource::Kind RK,
         if (RK == DxilResource::Kind::RawBuffer)
           // Raw buffers can't use offset param. Add to coord index.
           Args[kCoordIdx] =
-            Builder.CreateAdd(Args[kCoordIdx], OP->GetU32Const(4 * LdSize));
+              Builder.CreateAdd(Args[kCoordIdx], OP->GetU32Const(4 * LdSize));
         else
           // Structured buffers increment the offset parameter.
           Args[kOffsetIdx] =
-            Builder.CreateAdd(Args[kOffsetIdx], OP->GetU32Const(4 * LdSize));
+              Builder.CreateAdd(Args[kOffsetIdx], OP->GetU32Const(4 * LdSize));
       }
     }
     retValNew = ScalarizeElements(Ty, elts, Builder);
@@ -6505,7 +6511,8 @@ IntrinsicLower gLowerTable[] = {
     {IntrinsicOp::IOP_asint16, TranslateBitcast, DXIL::OpCode::NumOpCodes},
     {IntrinsicOp::IOP_asuint, TranslateAsUint, DXIL::OpCode::SplitDouble},
     {IntrinsicOp::IOP_asuint16, TranslateAsUint, DXIL::OpCode::NumOpCodes},
-    {IntrinsicOp::IOP_atan, TrivialVectorizableUnaryOperation, DXIL::OpCode::Atan},
+    {IntrinsicOp::IOP_atan, TrivialVectorizableUnaryOperation,
+     DXIL::OpCode::Atan},
     {IntrinsicOp::IOP_atan2, TranslateAtan2, DXIL::OpCode::NumOpCodes},
     {IntrinsicOp::IOP_ceil, TrivialUnaryOperation, DXIL::OpCode::Round_pi},
     {IntrinsicOp::IOP_clamp, TranslateClamp, DXIL::OpCode::NumOpCodes},
@@ -6596,7 +6603,8 @@ IntrinsicLower gLowerTable[] = {
     {IntrinsicOp::IOP_sqrt, TrivialUnaryOperation, DXIL::OpCode::Sqrt},
     {IntrinsicOp::IOP_step, TranslateStep, DXIL::OpCode::NumOpCodes},
     {IntrinsicOp::IOP_tan, TrivialUnaryOperation, DXIL::OpCode::Tan},
-    {IntrinsicOp::IOP_tanh, TrivialVectorizableUnaryOperation, DXIL::OpCode::Htan},
+    {IntrinsicOp::IOP_tanh, TrivialVectorizableUnaryOperation,
+     DXIL::OpCode::Htan},
     {IntrinsicOp::IOP_tex1D, EmptyLower, DXIL::OpCode::NumOpCodes},
     {IntrinsicOp::IOP_tex1Dbias, EmptyLower, DXIL::OpCode::NumOpCodes},
     {IntrinsicOp::IOP_tex1Dgrad, EmptyLower, DXIL::OpCode::NumOpCodes},
@@ -8464,9 +8472,9 @@ void TranslateStructBufSubscript(CallInst *CI, Value *handle, Value *status,
 namespace {
 
 Value *TranslateTypedBufSubscript(CallInst *CI, DXIL::ResourceKind RK,
-                             DXIL::ResourceClass RC, Value *handle,
-                             LoadInst *ldInst, IRBuilder<> &Builder,
-                             hlsl::OP *hlslOP, const DataLayout &DL) {
+                                  DXIL::ResourceClass RC, Value *handle,
+                                  LoadInst *ldInst, IRBuilder<> &Builder,
+                                  hlsl::OP *hlslOP, const DataLayout &DL) {
   // The arguments to the call instruction are used to determine the access,
   // the return value and type come from the load instruction.
   ResLoadHelper ldHelper(CI, RK, RC, handle, IntrinsicOp::MOP_Load, ldInst);
@@ -8514,8 +8522,8 @@ Value *UpdateVectorElt(Value *VecVal, Value *EltVal, Value *EltIdx,
 }
 
 void TranslateTypedBufferSubscript(CallInst *CI, HLOperationLowerHelper &helper,
-                               HLObjectOperationLowerHelper *pObjHelper,
-                               bool &Translated) {
+                                   HLObjectOperationLowerHelper *pObjHelper,
+                                   bool &Translated) {
   Value *ptr = CI->getArgOperand(HLOperandIndex::kSubscriptObjectOpIdx);
 
   hlsl::OP *hlslOP = &helper.hlslOP;
@@ -8533,7 +8541,7 @@ void TranslateTypedBufferSubscript(CallInst *CI, HLOperationLowerHelper &helper,
     Value *UndefI = UndefValue::get(Builder.getInt32Ty());
     if (LoadInst *ldInst = dyn_cast<LoadInst>(user)) {
       TranslateTypedBufSubscript(CI, RK, RC, handle, ldInst, Builder, hlslOP,
-                            helper.dataLayout);
+                                 helper.dataLayout);
     } else if (StoreInst *stInst = dyn_cast<StoreInst>(user)) {
       Value *val = stInst->getValueOperand();
       TranslateStore(RK, handle, val,
diff --git a/tools/clang/lib/Sema/SemaHLSL.cpp b/tools/clang/lib/Sema/SemaHLSL.cpp
index 555b0ba4..1824b7c1 100644
--- a/tools/clang/lib/Sema/SemaHLSL.cpp
+++ b/tools/clang/lib/Sema/SemaHLSL.cpp
@@ -1017,16 +1017,19 @@ static const ArBasicKind g_UIntCT[] = {AR_BASIC_UINT32, AR_BASIC_LITERAL_INT,
 // AR_BASIC_INT32 should be the default for any int since min precision integers
 // should map to int32, not int16 or int64
 static const ArBasicKind g_AnyIntCT[] = {
-    AR_BASIC_INT32, AR_BASIC_INT16,  AR_BASIC_UINT32,      AR_BASIC_UINT16,
-    AR_BASIC_INT64, AR_BASIC_UINT64, AR_BASIC_INT8_4PACKED, AR_BASIC_UINT8_4PACKED,
-    AR_BASIC_LITERAL_INT, AR_BASIC_UNKNOWN};
+    AR_BASIC_INT32,        AR_BASIC_INT16,         AR_BASIC_UINT32,
+    AR_BASIC_UINT16,       AR_BASIC_INT64,         AR_BASIC_UINT64,
+    AR_BASIC_INT8_4PACKED, AR_BASIC_UINT8_4PACKED, AR_BASIC_LITERAL_INT,
+    AR_BASIC_UNKNOWN};
 
 static const ArBasicKind g_AnyInt32CT[] = {
-  AR_BASIC_INT32, AR_BASIC_UINT32, AR_BASIC_INT8_4PACKED, AR_BASIC_UINT8_4PACKED, AR_BASIC_LITERAL_INT, AR_BASIC_UNKNOWN};
+    AR_BASIC_INT32,         AR_BASIC_UINT32,      AR_BASIC_INT8_4PACKED,
+    AR_BASIC_UINT8_4PACKED, AR_BASIC_LITERAL_INT, AR_BASIC_UNKNOWN};
 
-static const ArBasicKind g_UIntOnlyCT[] = {AR_BASIC_UINT32, AR_BASIC_UINT64,AR_BASIC_INT8_4PACKED, AR_BASIC_UINT8_4PACKED, 
-                                           AR_BASIC_LITERAL_INT,
-                                           AR_BASIC_NOCAST, AR_BASIC_UNKNOWN};
+static const ArBasicKind g_UIntOnlyCT[] = {
+    AR_BASIC_UINT32,        AR_BASIC_UINT64,      AR_BASIC_INT8_4PACKED,
+    AR_BASIC_UINT8_4PACKED, AR_BASIC_LITERAL_INT, AR_BASIC_NOCAST,
+    AR_BASIC_UNKNOWN};
 
 static const ArBasicKind g_FloatCT[] = {
     AR_BASIC_FLOAT32, AR_BASIC_FLOAT32_PARTIAL_PRECISION,
@@ -1064,20 +1067,21 @@ static const ArBasicKind g_NumericCT[] = {
     AR_BASIC_UINT16,        AR_BASIC_UINT32,
     AR_BASIC_MIN12INT,      AR_BASIC_MIN16INT,
     AR_BASIC_MIN16UINT,     AR_BASIC_INT64,
-    AR_BASIC_UINT64,        AR_BASIC_INT8_4PACKED, AR_BASIC_UINT8_4PACKED, AR_BASIC_UNKNOWN};
+    AR_BASIC_UINT64,        AR_BASIC_INT8_4PACKED,
+    AR_BASIC_UINT8_4PACKED, AR_BASIC_UNKNOWN};
 
 static const ArBasicKind g_Numeric32CT[] = {
     AR_BASIC_FLOAT32,       AR_BASIC_FLOAT32_PARTIAL_PRECISION,
     AR_BASIC_LITERAL_FLOAT, AR_BASIC_LITERAL_INT,
     AR_BASIC_INT32,         AR_BASIC_UINT32,
-AR_BASIC_INT8_4PACKED, AR_BASIC_UINT8_4PACKED,     
+    AR_BASIC_INT8_4PACKED,  AR_BASIC_UINT8_4PACKED,
     AR_BASIC_UNKNOWN};
 
 static const ArBasicKind g_Numeric32OnlyCT[] = {
     AR_BASIC_FLOAT32,       AR_BASIC_FLOAT32_PARTIAL_PRECISION,
     AR_BASIC_LITERAL_FLOAT, AR_BASIC_LITERAL_INT,
     AR_BASIC_INT32,         AR_BASIC_UINT32,
-AR_BASIC_INT8_4PACKED, AR_BASIC_UINT8_4PACKED, 
+    AR_BASIC_INT8_4PACKED,  AR_BASIC_UINT8_4PACKED,
     AR_BASIC_NOCAST,        AR_BASIC_UNKNOWN};
 
 static const ArBasicKind g_AnyCT[] = {
@@ -1090,7 +1094,7 @@ static const ArBasicKind g_AnyCT[] = {
     AR_BASIC_MIN12INT,      AR_BASIC_MIN16INT,
     AR_BASIC_MIN16UINT,     AR_BASIC_BOOL,
     AR_BASIC_INT64,         AR_BASIC_UINT64,
-AR_BASIC_INT8_4PACKED, AR_BASIC_UINT8_4PACKED, 
+    AR_BASIC_INT8_4PACKED,  AR_BASIC_UINT8_4PACKED,
     AR_BASIC_UNKNOWN};
 
 static const ArBasicKind g_AnySamplerCT[] = {
@@ -1153,10 +1157,10 @@ static const ArBasicKind g_Numeric16OnlyCT[] = {
     AR_BASIC_LITERAL_FLOAT, AR_BASIC_LITERAL_INT, AR_BASIC_NOCAST,
     AR_BASIC_UNKNOWN};
 
-static const ArBasicKind g_Int32OnlyCT[] = {AR_BASIC_INT32, AR_BASIC_UINT32,
-AR_BASIC_INT8_4PACKED, AR_BASIC_UINT8_4PACKED, 
-                                            AR_BASIC_LITERAL_INT,
-                                            AR_BASIC_NOCAST, AR_BASIC_UNKNOWN};
+static const ArBasicKind g_Int32OnlyCT[] = {
+    AR_BASIC_INT32,         AR_BASIC_UINT32,      AR_BASIC_INT8_4PACKED,
+    AR_BASIC_UINT8_4PACKED, AR_BASIC_LITERAL_INT, AR_BASIC_NOCAST,
+    AR_BASIC_UNKNOWN};
 
 static const ArBasicKind g_Float32OnlyCT[] = {
     AR_BASIC_FLOAT32, AR_BASIC_LITERAL_FLOAT, AR_BASIC_NOCAST,
@@ -1178,14 +1182,13 @@ static const ArBasicKind g_UInt8_4PackedCT[] = {
     AR_BASIC_UNKNOWN};
 
 static const ArBasicKind g_AnyInt16Or32CT[] = {
-    AR_BASIC_INT32,  AR_BASIC_UINT32,      AR_BASIC_INT16,
-    AR_BASIC_UINT16, 
-AR_BASIC_INT8_4PACKED, AR_BASIC_UINT8_4PACKED, AR_BASIC_LITERAL_INT, AR_BASIC_UNKNOWN};
+    AR_BASIC_INT32,       AR_BASIC_UINT32,       AR_BASIC_INT16,
+    AR_BASIC_UINT16,      AR_BASIC_INT8_4PACKED, AR_BASIC_UINT8_4PACKED,
+    AR_BASIC_LITERAL_INT, AR_BASIC_UNKNOWN};
 
 static const ArBasicKind g_SInt16Or32OnlyCT[] = {
-    AR_BASIC_INT32, AR_BASIC_INT16, AR_BASIC_LITERAL_INT, 
-AR_BASIC_INT8_4PACKED, AR_BASIC_UINT8_4PACKED, 
-AR_BASIC_NOCAST,
+    AR_BASIC_INT32,        AR_BASIC_INT16,         AR_BASIC_LITERAL_INT,
+    AR_BASIC_INT8_4PACKED, AR_BASIC_UINT8_4PACKED, AR_BASIC_NOCAST,
     AR_BASIC_UNKNOWN};
 
 static const ArBasicKind g_ByteAddressBufferCT[] = {
@@ -8619,7 +8622,6 @@ ExprResult HLSLExternalSource::LookupVectorMemberExprForHLSL(
     llvm_unreachable("Unknown VectorMemberAccessError value");
   }
 
-
   if (colCount > 4)
     msg = diag::err_hlsl_vector_member_on_long_vector;
 

@VladM1076

Hi @pow2clk, is this the latest PR for LongVectors, and is the longvecs.hlsl (tools/clang/test/CodeGenDXIL/hlsl/types/longvecs.hlsl) failure in DXC expected?

I am using this to start testing things on our side and want to make sure I am testing the right stuff.

pow2clk and others added 10 commits March 25, 2025 18:00
This change makes hlsl::IntrinsicOp enum values stable by:
- adding hlsl_intrinsic_opcodes.json to capture assigned indices
- adding this file to the set generated by hctgen
- assigning new indices after the last index during generation
- giving hlsl::IntrinsicOp enum values explicit assignments
- removing ENABLE_SPIRV_CODEGEN ifdefs around opcode definitions and
lowering table entries to keep these stable whether or not the SPIR-V
build setting is enabled.

Fixes microsoft#7230
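For illustration, the resulting enum takes roughly this form; the declaration style and numeric values here are approximations, and the real indices are recorded in hlsl_intrinsic_opcodes.json.

```cpp
namespace hlsl {
enum class IntrinsicOp {
  IOP_AcceptHitAndEndSearch = 0, // example value
  IOP_AddUint64 = 1,             // example value
  // ... every existing entry keeps its recorded index ...
  IOP_udot = 361, // example value; new intrinsics append after the last index
};
} // namespace hlsl
```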
This is just the diffs for the new version that was meant to fix the
warnings.