Skip to content

[HLSL][Docs] Add documentation for HLSL functions #75397

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jan 10, 2024

Conversation

llvm-beanz
Copy link
Collaborator

This adds a new document that covers the HLSL approach to function calls and parameter semantics. At time of writing this document is a proposal for the implementation.

@llvmbot llvmbot added clang Clang issues not falling into any other category HLSL HLSL Language Support labels Dec 13, 2023
@llvmbot
Copy link
Member

llvmbot commented Dec 13, 2023

@llvm/pr-subscribers-hlsl

@llvm/pr-subscribers-clang

Author: Chris B (llvm-beanz)

Changes

This adds a new document that covers the HLSL approach to function calls and parameter semantics. At time of writing this document is a proposal for the implementation.


Full diff: https://github.com/llvm/llvm-project/pull/75397.diff

2 Files Affected:

  • (added) clang/docs/HLSL/FunctionCalls.rst (+300)
  • (modified) clang/docs/HLSL/HLSLDocs.rst (+1)
diff --git a/clang/docs/HLSL/FunctionCalls.rst b/clang/docs/HLSL/FunctionCalls.rst
new file mode 100644
index 00000000000000..aeddac087cb89d
--- /dev/null
+++ b/clang/docs/HLSL/FunctionCalls.rst
@@ -0,0 +1,300 @@
+===================
+HLSL Function Calls
+===================
+
+.. contents::
+   :local:
+
+Introduction
+============
+
+This document descries the design and implementation of HLSL's function call
+semantics in Clang. This includes details related to argument conversion and
+parameter lifetimes.
+
+This document does not seek to serve as official documentation for HLSL's
+call semantics, but does provide an overview to assist a reader. The
+authoritative documentation for HLSL's language semantics is the `draft language
+specification<https://microsoft.github.io/hlsl-specs/specs/hlsl.pdf>`_.
+
+Argument Semantics
+==================
+
+In HLSL, all function arguments are passed by value in and out of functions.
+HLSL has 3 keywords which denote the parameter semantics (``in``, ``out`` and
+``inout``). In a function declaration a parameter may be annotated any of the
+following ways:
+
+#. <no parameter annotation> - denotes input
+#. ``in`` - denotes input
+#. ``out`` - denotes output
+#. ``in out`` - denotes input and output
+#. ``out in`` - denotes input and output
+#. ``inout`` - denotes input and output
+
+Parameters that are exclusively input behave like C/C++ parameters that are
+passed by value.
+
+For parameters that are output (or input and output), a temporary value is
+created in the caller. The temporary value is then passed by-address. For
+output-only parameters, the temporary is uninitialized when passed (it is
+undefined behavior to not explicitly initialize an ``out`` parameter inside a
+function). For input and output parameters, the temporary is initialized from
+the lvalue argument expression through implicit or explicit casting from the
+lvalue argument type to the parameter type.
+
+On return of the function, the values of any parameter temporaries are written
+back to the argument expression through an inverted conversion sequence (if an
+``out`` parameter was not initialized in the function, the uninitialized value
+may be written back).
+
+Parameters of constant-sized array type, are also passed with value semantics.
+This requires input parameters of arrays to construct temporaries and the
+temporaries go through array-to-pointer decay when initializing parameters.
+
+Implementations are allowed to avoid unnecessary temporaries, and HLSL's strict
+no-alias rules can enable some trivial optimizations.
+
+Array Temporaries
+-----------------
+
+Given the following example:
+
+.. code-block:: c++
+  void fn(float a[4]) {
+    a[0] = a[1] + a[2] + a[3];
+  }
+
+  float4 main() : SV_Target {
+    float arr[4] = {1, 1, 1, 1};
+    fn(arr);
+    return float4(a[0], a[1], a[2], a[3]);
+  }
+
+In C or C++, the array parameter decays to a pointer, so after the call to
+``fn``, the value of ``a[0]`` is ``3``. In HLSL, the array is passed by value,
+so modifications inside ``fn`` do not propagate out.
+
+.. note::
+  DXC supports unsized arrays passed directly as decayed pointers, which is an
+  unfortunate behavior divergence.
+
+Out Parameter Temporaries
+-------------------------
+
+.. code-block:: c++
+  void Init(inout int X, inout int Y) {
+    Y = 2;
+    X = 1;
+  }
+
+  void main() {
+    int V;
+    Init(V, V); // MSVC ABI V == 2, Itanium V == 1
+  }
+
+In the above example the ``Init`` function's behavior depends on the C++ ABI
+implementation. In the MSVC C++ ABI (used for the HLSL DXIL target), call
+arguments are emitted right-to-left and destroyed left-to-right. This means that
+the parameter initialization and destruction occurs in the order: {``Y``,
+``X``, ``~X``, ``~Y``}. This causes the write-back of the value of ``Y`` to occur
+last, so the resulting value of ``V`` is ``2``. In the Itanium C++ ABI, the
+parameter ordering is reversed, so the initialization and destruction occurs in
+the order: {``X``, ``Y``, ``~Y``, ``X``}. This causes the write-back of the
+value ``X`` to occur last, resulting in the value of ``V`` being set to ``1``.
+
+.. code-block:: c++
+  void Trunc(inout int3 V) { }
+
+
+  void main() {
+    float3 F = {1.5, 2.6, 3.3};
+    Trunc(F); // F == {1.0, 2.0, 3.0}
+  }
+
+In the above example, the argument expression ``F`` undergoes element-wise
+conversion from a float vector to an integer vector to create a temporary
+``int3``. On expiration the temporary undergoes elementwise conversion back to
+the floating point vector type ``float3``. This results in an implicit
+truncation of the vector even if the value is unused in the function.
+
+
+.. code-block:: c++
+  void UB(out int X) {}
+
+  void main() {
+    int X = 7;
+    UB(X); // X is undefined!
+  }
+
+In this example an initialized value is passed to an ``out`` parameter.
+Parameters marked ``out`` are not initialized by the argument expression or
+implicitly by the function. They must be explicitly initialized. In this case
+the argument is not initialized in the function so the temporary is still
+uninitialized when it is copied back to the argument expression. This is
+undefined behavior in HLSL, and may be illegal in generated programs.
+
+Clang Implementation 
+====================
+
+.. note::
+  The implementation described here is a proposal. It has not yet been fully
+  implemented, so the current state of Clang's sources may not reflect this
+  design. A prototype implementation was built on DXC which is Clang-3.7 based.
+  The prototype can be found
+  `here<https://github.com/microsoft/DirectXShaderCompiler/pull/5249>`_. A lot
+  of the changes in the prototype implementation are restoring Clang-3.7 code
+  that was previously modified to its original state.
+
+The implementation in clang depends on two new AST nodes and minor extensions to
+Clang's existing support for Objective-C write-back arguments. The goal of this
+design is to capture the semantic details of HLSL function calls in the AST, and
+minimize the amount of magic that needs to occur during IR generation.
+
+The two new AST nodes are ``HLSLArrayTemporaryExpr`` and ``HLSLOutParamExpr``,
+which respectively represent the temporaries used for passing arrays by value
+and the temporaries created for function outputs.
+
+Array Temporaries
+-----------------
+
+The ``HLSLArrayTemporaryExpr`` represents temporary values for input
+constant-sized array arguments. This applies for all constant-sized array
+arguments regardless of whether or not the parameter is constant-sized or
+unsized.
+
+.. code-block::c++
+  void SizedArray(float a[4]);
+  void UnsizedArray(float a[]);
+
+  void main() {
+    float arr[4] = {1, 1, 1, 1};
+    SizedArray(arr);
+    UnsizedArray(arr);
+  }
+
+In the example above, the following AST is generated for the call to
+``SizedArray``:
+
+.. code-block:: text
+  CallExpr 'void'
+  |-ImplicitCastExpr 'void (*)(float [4])' <FunctionToPointerDecay>
+  | `-DeclRefExpr 'void (float [4])' lvalue Function 'SizedArray' 'void (float [4])'
+  `-HLSLArrayTemporaryExpr 'float [4]'
+    `-DeclRefExpr 'float [4]' lvalue Var 'arr' 'float [4]'
+
+In the example above, the following AST is generated for the call to
+``UnsizedArray``:
+
+.. code-block:: text
+  CallExpr 'void'
+  |-ImplicitCastExpr 'void (*)(float [])' <FunctionToPointerDecay>
+  | `-DeclRefExpr 'void (float [])' lvalue Function 'UnsizedArray' 'void (float [])'
+  `-HLSLArrayTemporaryExpr 'float [4]'
+    `-DeclRefExpr 'float [4]' lvalue Var 'arr' 'float [4]'
+
+In both of these cases the argument expression is of known array size so we can
+initialize an appropriately sized temporary.
+
+It is illegal in HLSL to convert an unsized array to a sized array:
+
+.. code-block::c++
+  void SizedArray(float a[4]);
+  void UnsizedArray(float a[]) {
+    SizedArray(a); // Cannot convert float[] to float[4]
+  }
+
+When converting a sized array to an unsized array, an array temporary can also
+be inserted. Given the following code:
+
+.. code-block::c++
+  void UnsizedArray(float a[]);
+  void SizedArray(float a[4]) {
+    UnsizedArray(a);
+  }
+
+An expected AST should be something like:
+
+.. code-block:: text
+  CallExpr 'void'
+  |-ImplicitCastExpr 'void (*)(float [])' <FunctionToPointerDecay>
+  | `-DeclRefExpr 'void (float [])' lvalue Function 'UnsizedArray' 'void (float [])'
+  `-HLSLArrayTemporaryExpr 'float [4]'
+    `-DeclRefExpr 'float [4]' lvalue Var 'arr' 'float [4]'
+
+Out Parameter Temporaries
+-------------------------
+
+Output parameters are defined in HLSL as *casting expiring values* (cx-values),
+which is a term made up for HLSL. A cx-value is a temporary value which may be
+the result of a cast, and stores its value back to an lvalue when the value
+expires.
+
+To represent this concept in Clang we introduce a new ``HLSLOutParamExpr``. An
+``HLSLOutParamExpr`` has two forms, one with a single sub-expression and one
+with two sub-expressions.
+
+The single sub-expression form is used when the argument expression and the
+function parameter are the same type, so no cast is required. As in this
+example:
+
+.. code-block::c++
+  void Init(inout int X) {
+    X = 1;
+  }
+
+  void main() {
+    int V;
+    Init(V);
+  }
+
+The expected AST formulation for this code would be something like:
+
+.. code-block::text
+  CallExpr 'void'
+  |-ImplicitCastExpr 'void (*)(int &)' <FunctionToPointerDecay>
+  | `-DeclRefExpr 'void (int &)' lvalue Function  'Init' 'void (int &)'
+  |-HLSLOutParamExpr 'int' lvalue inout
+    `-DeclRefExpr 'int' lvalue Var 'V' 'int'
+
+The ``HLSLOutParamExpr`` captures that the value is ``inout`` vs ``out`` to
+denote whether or not the temporary is initialized from the sub-expression. If
+no casting is required the sub-expression denotes the lvalue expression that the
+cx-value will be copied to when the value expires.
+
+The two sub-expression form of the AST node is required when the argument type
+is not the same as the parameter type. Given this example:
+
+.. code-block:: c++
+  void Trunc(inout int3 V) { }
+
+
+  void main() {
+    float3 F = {1.5, 2.6, 3.3};
+    Trunc(F);
+  }
+
+For this case the ``HLSLOutParamExpr`` will have sub-expressions to record both
+casting expression sequences for the initialization and write back:
+
+.. code-block::text
+  -CallExpr 'void'
+    |-ImplicitCastExpr 'void (*)(int3 &)' <FunctionToPointerDecay>
+    | `-DeclRefExpr 'void (int3 &)' lvalue Function 'inc_i32' 'void (int3 &)'
+    `-HLSLOutParamExpr 'int3' lvalue inout
+      |-ImplicitCastExpr 'float3' <IntegralToFloating>
+      | `-ImplicitCastExpr 'int3' <LValueToRValue>
+      |   `-OpaqueValueExpr 'int3' lvalue
+      `-ImplicitCastExpr 'int3' <FloatingToIntegral>
+        `-ImplicitCastExpr 'float3' <LValueToRValue>
+          `-DeclRefExpr 'float3' lvalue 'F' 'float3'
+
+In this formation the write-back casts are captured as the first sub-expression
+and they cast from an ``OpaqueValueExpr``. In IR generation we can use the
+``OpaqueValueExpr`` as a placeholder for the ``HLSLOutParamExpr``'s temporary
+value on function return.
+
+In code generation this can be implemented with some targeted extensions to the
+Objective-C write-back support. Specifically extending CGCall.cpp's
+``EmitWriteback`` function to support casting expressions and emission of
+aggregate lvalues.
diff --git a/clang/docs/HLSL/HLSLDocs.rst b/clang/docs/HLSL/HLSLDocs.rst
index a02dd2e8a96261..1f232129548d0b 100644
--- a/clang/docs/HLSL/HLSLDocs.rst
+++ b/clang/docs/HLSL/HLSLDocs.rst
@@ -14,3 +14,4 @@ HLSL Design and Implementation
    HLSLIRReference
    ResourceTypes
    EntryFunctions
+   FunctionCalls

This adds a new document that covers the HLSL approach to function
calls and parameter semantics. At time of writing this document is a
proposal for the implementation.
@llvm-beanz llvm-beanz force-pushed the cbieneman/hlsl-functions-docs branch from d6f57c2 to 4ebc0c5 Compare December 13, 2023 23:13
Introduction
============

This document descries the design and implementation of HLSL's function call
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: descries -> describes

}

In the above example the ``Init`` function's behavior depends on the C++ ABI
implementation. In the MSVC C++ ABI (used for the HLSL DXIL target), call
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this defined behavior that we have to ensure continues to work so HLSL developers targeting DXIL can depend on it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we explicitly define it, but we need the Clang DXIL generation to match otherwise we could expose subtle bugs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is mostly me being pedantic, but I don't believe this ordering is specified in the Itanium ABI at all (and I don't see anything obvious here: https://itanium-cxx-abi.github.io/cxx-abi/abi.html). I'm not sure about the MSVC ABI. Would it be more accurate to point out that C++ has historically not defined this ordering, describe the two alternatives, and point out that DXC behaves in the MSVC way and so we want to match that?

Copy link
Contributor

@rjmccall rjmccall Dec 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, this is a general compiler difference and not actually ABI-specific except that the MSVC C++ ABI does effectively require that arguments of class type are evaluated right-to-left (because they're destroyed left-to-right in the callee).

Justin's comment captures the right spirit here, I think. The order in which write-backs occur can definitely be semantically important, and while we could argue that it's UB to depend on it, in practice we probably just need to imitate current implementations.

UnsizedArray(arr);
}

In the example above, the following AST is generated for the call to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the rst renderer is confused here and thinks that line 176 is still part of the code block? Unsure if this is a real problem with the source or a bug in the renderer?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had the code-blocks formatted wrong... rst needs a blank line there. Update coming.

Comment on lines 40 to 42
output-only parameters, the temporary is uninitialized when passed (it is
undefined behavior to not explicitly initialize an ``out`` parameter inside a
function). For input and output parameters, the temporary is initialized from
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(it is undefined behavior to not explicitly initialize an out parameter inside a function)

Perhaps it's better to say that an out parameter has an undefined value if not initialized inside the function, rather than this being considered undefined behavior?

float4 main() : SV_Target {
float arr[4] = {1, 1, 1, 1};
fn(arr);
return float4(a[0], a[1], a[2], a[3]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return float4(a[0], a[1], a[2], a[3]); references the wrong array. It should be arr[0], ..., right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doh!

}

In C or C++, the array parameter decays to a pointer, so after the call to
``fn``, the value of ``a[0]`` is ``3``. In HLSL, the array is passed by value,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this say: "the value of arr[0] is 3"? (wrong array again)

Comment on lines 81 to 82
DXC supports unsized arrays passed directly as decayed pointers, which is an
unfortunate behavior divergence.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, not reliably. This area is weird and buggy, in both DXC and FXC. This was originally supported in DXC because it appeared that FXC supported it. However, support of unsized array parameters in FXC was limited to resource arrays (and only on SM 5.1). These can also be buggy in FXC, with differences in behavior between SM 5.0 and SM 5.1.

In DXC, use of unsized arrays can lead to asserts or crashes. The primary area where they are likely to be useful is when passing an unbounded resource array to a function, where you do not want a copy of the resource handles, but want the original resource range declared globally to be passed to a function and indexed there. This scenario works for SM 5.1 in FXC, but appears to assert in DXC.

For these reasons, we should consider removing support for them in DXC.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we should remove these from the language, but I think our current implementation needs to assume that they are around since... they are.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I filed an issue on HLSL Specs to track considering approaches for unsized arrays at the language level.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the note should say something like "DXC may pass unsized arrays directly as decayed pointers, which is an unfortunate behavior divergence" then, rather than claiming support?

Comment on lines 122 to 123
the floating point vector type ``float3``. This results in an implicit
truncation of the vector even if the value is unused in the function.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This results in an implicit float to int conversion for each component of the vector, not a vector truncation (which would be a reduction in the size of the vector).

implicitly by the function. They must be explicitly initialized. In this case
the argument is not initialized in the function so the temporary is still
uninitialized when it is copied back to the argument expression. This is
undefined behavior in HLSL, and may be illegal in generated programs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and may be illegal in generated programs

Can we clarify what this means? In FXC, it would (usually try to) generate an error if you didn't initialize the output parameter, whether or not the caller used that output, but this might depend on static control flow, and whether or not you called the function at all from an active entry point.

For this statement, are we saying that it may simply result in code that tries to read the undefined value, which in some cases will be caught as illegal by the DXIL validator?

I think we need to refine our rules for this, and what the compiler is expected to do. That doesn't have to be done here, but maybe we should make note that this area needs improvement.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try and wordsmith so that it is more clear. The intent was to capture that we would generate invalid output resulting in a late compiler error or validation failure.

@llvm-beanz
Copy link
Collaborator Author

@rjmccall, I'm curious if you have any thoughts on the proposed implementation approach here?

The TL;DR for the gnarly bit is to have AST nodes representing parameters that need temporary values, and for "output" parameters where there may be cast sequences involved the AST node will capture the cast sequences for both parameter initialization and writing back to the argument lvalue. Then we can slightly tweak the CGCall write back support to support having a cast sequence in the AST.

``out`` parameter was not initialized in the function, the uninitialized value
may be written back).

Parameters of constant-sized array type, are also passed with value semantics.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spurious comma

parameter is not explicitly initialized inside the function an undefined value
is stored back to the argument expression). For input and output parameters, the
temporary is initialized from the lvalue argument expression through implicit
or explicit casting from the lvalue argument type to the parameter type.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a little bit of ambiguity in this paragraph when you use the term "input and output", where it could be read as "parameters that are input and parameters that are output" rather than "parameters that are both input and output". I'm not sure the best way to improve this without getting overly wordy though - I suppose we could spell it "input-and-output" to be clear that it's atomic, or maybe we could use "both" and update that in the list above as well.

Comment on lines 81 to 82
DXC supports unsized arrays passed directly as decayed pointers, which is an
unfortunate behavior divergence.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the note should say something like "DXC may pass unsized arrays directly as decayed pointers, which is an unfortunate behavior divergence" then, rather than claiming support?

}

In the above example the ``Init`` function's behavior depends on the C++ ABI
implementation. In the MSVC C++ ABI (used for the HLSL DXIL target), call
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is mostly me being pedantic, but I don't believe this ordering is specified in the Itanium ABI at all (and I don't see anything obvious here: https://itanium-cxx-abi.github.io/cxx-abi/abi.html). I'm not sure about the MSVC ABI. Would it be more accurate to point out that C++ has historically not defined this ordering, describe the two alternatives, and point out that DXC behaves in the MSVC way and so we want to match that?

@rjmccall
Copy link
Contributor

@rjmccall, I'm curious if you have any thoughts on the proposed implementation approach here?

The TL;DR for the gnarly bit is to have AST nodes representing parameters that need temporary values, and for "output" parameters where there may be cast sequences involved the AST node will capture the cast sequences for both parameter initialization and writing back to the argument lvalue. Then we can slightly tweak the CGCall write back support to support having a cast sequence in the AST.

Yeah, I think having the argument being some kind of special expression that stores the cast sequences back and forth makes a lot of sense.

I've frequently felt that our Expr-centric representation for cast sequences is unnecessarily awkward to work with; we'd probably benefit from some sort of explicit Cast representation that would just get stored in a CastExpr. I can't in good conscience ask you to implement that and then refactor all our conversion code in CodeGen around it, though. Given our current representations, you probably just need to do something with OpaqueValueExpr.

@llvm-beanz llvm-beanz merged commit 183eae0 into llvm:main Jan 10, 2024
of an undefined value which may be illegal in the target (DXIL programs with
used or potentially used ``undef`` or ``poison`` values fail validation).

Clang Implementation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a trailing space here which is causing errors

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. I'm not sure why the doc build didn't encounter any issues for me locally, but should be fixed now.

justinfargnoli pushed a commit to justinfargnoli/llvm-project that referenced this pull request Jan 28, 2024
This adds a new document that covers the HLSL approach to function calls
and parameter semantics. At time of writing this document is a proposal
for the implementation.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang Clang issues not falling into any other category HLSL HLSL Language Support
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants