[CIR][WIP] Add ABI lowering pass #1471


Draft · wants to merge 2,342 commits into main

Conversation

@Lancern (Member) commented Mar 12, 2025

This PR attempts to add a new pass, cir-abi-lowering, to the CIR dialect. The pass runs before the CallConvLowering pass and expands all ABI-dependent types and operations inside a function into their ABI-independent equivalents according to the ABI specification.

The patch also moves the lowering code for the following types and operations from the LLVM lowering conversion to the new pass:

  • The pointer-to-data-member type cir.data_member;
  • The pointer-to-member-function type cir.method;
  • All operations working on operands of the above types.
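For context, the Itanium C++ ABI prescribes concrete layouts for both member-pointer kinds, which is what an ABI lowering pass would expand them into. A minimal C++ sketch of those layouts (the struct names are illustrative, not actual CIR types):

```cpp
#include <cstddef>

// Pointer to data member: a byte offset into the object (ptrdiff_t);
// the Itanium ABI uses -1 as the null value.
struct DataMemberPtrRepr {
  std::ptrdiff_t offset;
};

// Pointer to member function: two pointer-sized words. `ptr` holds a
// function address or, for virtual functions, a vtable offset
// (discriminated by a low bit on most targets); `adj` is the
// adjustment to apply to the `this` pointer.
struct MethodPtrRepr {
  std::ptrdiff_t ptr;
  std::ptrdiff_t adj;
};
```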

bcardosolopes and others added 30 commits January 27, 2025 12:54
It was always the intention for `cir.cmp` operations to return a bool result. Due to missing constraints, a bug slipped into codegen that created `cir.cmp` operations whose result type matched the original AST expression type. In C, as opposed to C++, boolean expressions have type "int". This resulted in extra operations being codegened around boolean expressions and their usage.

This commit both enforces the bool result in the `cir.cmp` op definition and fixes the mentioned bug.
This is the first patch to support TBAA, following the discussion at
llvm#1076 (comment)

- add skeleton for CIRGen, utilizing `decorateOperationWithTBAA`
- add empty implementation in `CIRGenTBAA`
- introduce `CIR_TBAAAttr` with empty body
- attach `CIR_TBAAAttr` to `LoadOp` and `StoreOp`
- no handling of vtable pointer
- no LLVM lowering

The title describes the purpose of the PR. It adds initial support for
structures with padding to the call convention lowering for AArch64.

I have also added _initial support_ for the missing feature [FinishLayout](https://github.com/llvm/clangir/blob/5c5d58402bebdb1e851fb055f746662d4e7eb586/clang/lib/AST/RecordLayoutBuilder.cpp#L786) for records; the logic is taken from the original codegen.

Finally, I added a test for verification.
…m#1152)

The function `populateCIRToLLVMConversionPatterns` contains a spaghetti of LLVM dialect conversion patterns, which leads to merge conflicts very easily. Besides, a few patterns are even registered more than once, possibly due to careless resolution of merge conflicts.

This PR attempts to mitigate the problem. Pattern names are now sorted in alphabetical order, and each source line lists exactly one pattern name, reducing potential merge conflicts.
…ltinExpr (llvm#1133)

This PR is an NFC: we just mark every builtin ID of Neon SISD as NYI. We will implement them in subsequent PRs.
…CFG (llvm#1147)

This PR implements an NYI in CIRScopeOpFlattening. It seems to me the best way is to forward the results of ScopeOp as block arguments of the last block split off from the `cir.scope` block, as sketched below.
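A rough sketch of that shape in pattern code (hypothetical; assumes `scopeOp` is the `mlir::Operation *` for the `cir.scope` op and `rewriter` is the pattern rewriter):

```cpp
// Split the parent block right at the scope op; the continuation block
// receives one argument per scope result, so every cir.yield can be
// rewritten into a branch that carries its operands there.
mlir::Block *parentBlock = scopeOp->getBlock();
mlir::Block *continueBlock =
    rewriter.splitBlock(parentBlock, mlir::Block::iterator(scopeOp));
for (mlir::Type resultType : scopeOp->getResultTypes())
  continueBlock->addArgument(resultType, scopeOp->getLoc());
// ... inline the scope's region between the two blocks, rewrite each
// cir.yield into a branch to continueBlock, then:
rewriter.replaceOp(scopeOp, continueBlock->getArguments());
```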
…der (llvm#1157)

With [llvm-project#116090](llvm/llvm-project#116090)
merged, we can get rid of `#include "../../../../Basic/Targets.h"` now.
This PR moves the lowering code for data member pointers from the
conversion patterns to the implementation of CXXABI because this part
should be ABI-specific.
…ant arrays or pointers (llvm#1136)

This PR adds support for function arguments with structs that contain
constant arrays or pointers for AArch64.

For example, 
```
typedef struct {
  int a[42];
} CAT;

void pass_cat(CAT a) {}
```

As usual, the main ideas are taken from the original [CodeGen](https://github.com/llvm/clangir/blob/3aed38cf52e72cb51a907fad9dd53802f6505b81/clang/lib/AST/ASTContext.cpp#L1823), and I have added a couple of tests.
…ge (64, 128) (llvm#1141)

This PR adds support for the lowering of AArch64 calls with structs whose size is greater than 64 bits and less than 128 bits.

The idea is from the original [CodeGen](https://github.com/llvm/clangir/blob/da601b374deea6665f710f7e432dfa82f457059e/clang/lib/CodeGen/CGCall.cpp#L1329), where we perform a coercion through memory for these types of calls; a concrete example is sketched below.

I have added a test for this.
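As a concrete illustration (my example, not taken from the patch), a 96-bit struct falls in this range:

```cpp
// 12 bytes = 96 bits: greater than 64, less than 128.
typedef struct {
  int a[3];
} S;

// Small enough for the AArch64 PCS to pass in registers, but not an
// exact multiple of 64 bits, so CodeGen spills it to a stack temporary
// and reloads it as a pair of 64-bit chunks (the "coercion through
// memory") before the call.
void pass_s(S s) {}
```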
Recommit llvm#1101

I am not sure what happened, but that merged PR doesn't show up in the git log. Maybe the stacked PR didn't land successfully? In any case, we need to land it again.

Following are the original commit messages:

---

This is the follow-up to llvm#1100.

After llvm#1100, when we want to use LongDouble for VAArg, we will be in trouble due to details of the X86_64 ABI, and this patch tries to address that.

The practical impact of the patch: after it lands, together with llvm#1088 and a small follow-up fix, we can build and run all of the C benchmarks in SPEC CPU 2017. I think that is a milestone.
…vm#1151)

They are rounding shifts of vectors, and the shift amount comes from the least significant byte of the corresponding element of the second input vector. Thus, they are implemented in [their own ASM](https://godbolt.org/z/v65sbeKaW). This makes them unsuitable for lowering to CIR ShiftOp, even though it supports vector types now.
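For reference, one intrinsic of this family as a plain usage example (not code from this PR):

```cpp
#include <arm_neon.h>

// vrshlq_s32 is a rounding shift: each lane of `v` is shifted by the
// signed amount in the least significant byte of the corresponding
// lane of `amt`; negative amounts give a rounding right shift.
int32x4_t rounding_shift(int32x4_t v, int32x4_t amt) {
  return vrshlq_s32(v, amt);
}
```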
Caught by ASAN. We were creating a reference to an object going out of
scope. Incidentally, this feels like the sort of issue the lifetime
checker will be great for detecting :)
…ut` (llvm#1156)

Since an LLVM-specific data layout string is not appropriate in ClangIR, this PR replaces it with the existing MLIR DLTI equivalent and eliminates the redundancy.

Although the constructor of `LowerModule` in the TargetLowering library requires an LLVM data layout string, it is not currently used. (I believe it also won't matter in the future.) Therefore, this PR has no functional change.
Based on https://mlir.llvm.org/docs/Tutorials/UnderstandingTheIRStructure/#traversing-the-def-use-chains, the users of an op are linked together in a doubly-linked list, so replacing a user while iterating over the users causes a use-after-free. Store the users in a worklist and process them afterwards instead to avoid this.
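A minimal sketch of the worklist fix (hypothetical helper; assumes the users of an `mlir::Value` are being rewritten):

```cpp
#include "mlir/IR/Operation.h"
#include "llvm/ADT/SmallVector.h"

// Snapshot the users before mutating: the use-list is doubly linked,
// so replacing a user while iterating it is a use-after-free.
void rewriteUsers(mlir::Value value) {
  llvm::SmallVector<mlir::Operation *> worklist(value.getUsers().begin(),
                                                value.getUsers().end());
  for (mlir::Operation *user : worklist) {
    // ... rewrite `user`; the snapshot is unaffected by the mutation ...
  }
}
```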
We need to perform all erasures via the rewrite API instead of directly
for the framework to work correctly. This was detected by a combination
of `-DMLIR_ENABLE_EXPENSIVE_PATTERN_API_CHECKS=ON` [1] and ASAN.

[1]
https://mlir.llvm.org/getting_started/Debugging/#detecting-invalid-api-usage
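Schematically, the rule inside a pattern looks like this (`SomeOp` stands in for the actual op):

```cpp
mlir::LogicalResult
matchAndRewrite(SomeOp op, mlir::PatternRewriter &rewriter) const override {
  // Erase through the rewriter so the driver can track the change;
  // op->erase() would bypass the framework, which is exactly what the
  // expensive API checks and ASAN caught here.
  rewriter.eraseOp(op);
  return mlir::success();
}
```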
…#1081)

Now implement the same as [OG](https://github.com/llvm/clangir/blob/7619b20d7461b2d46c17a3154ec4b2f12ca35ea5/clang/lib/CodeGen/CGBuiltin.cpp#L7886), which is to call the LLVM AArch64 intrinsic that would eventually become [an ARM64 instruction](https://developer.arm.com/documentation/ddi0596/2021-03/SIMD-FP-Instructions/ABS--Absolute-value--vector--?lang=en). However, there is clearly an alternative: extend CIR::AbsOp and CIR::FAbsOp to support vector types and only lower them at the LLVM lowering stage to either [LLVM::FAbsOP](https://mlir.llvm.org/docs/Dialects/LLVM/#llvmintrfabs-llvmfabsop) or [LLVM::AbsOP](https://mlir.llvm.org/docs/Dialects/LLVM/#llvmintrabs-llvmabsop), provided the LLVM dialect can do the right TargetLowering thing by eventually translating to the LLVM AArch64 intrinsic.

The question is whether it is worth doing.

Anyway, putting up this diff for suggestions and ideas.
…f -fstrict-vtable-pointers (llvm#1138)

Without the flag `-fstrict-vtable-pointers`, `__builtin_launder` is a noop. This PR implements that, and leaves the implementation for the `-fstrict-vtable-pointers` case to the future, when there is a need.

This PR also adapts most of the test cases from the [OG test case](https://github.com/llvm/clangir/blob/3aed38cf52e72cb51a907fad9dd53802f6505b81/clang/test/CodeGenCXX/builtin-launder.cpp#L1). I didn't use the test cases in the namespace [pessimizing_cases](https://github.com/llvm/clangir/blob/3aed38cf52e72cb51a907fad9dd53802f6505b81/clang/test/CodeGenCXX/builtin-launder.cpp#L269), as they show no difference even when `-fstrict-vtable-pointers` is on.
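For reference, the no-op case handled here looks like this (example mine, not from the adapted tests):

```cpp
struct S {
  int x;
};

// Without -fstrict-vtable-pointers there is nothing to launder, so
// CIRGen can forward the pointer operand unchanged.
S *forward(S *p) {
  return __builtin_launder(p);
}
```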
…1165)

If a record type contains an array of non-record types, we can generate
a copy for it inside the copy constructor, as CodeGen does. CodeGen does
so for arrays of record types where applicable as well, but we'll want
to represent the construction of those explicitly, as outlined in
llvm#1055.
… semantics of C++ initializer list (llvm#1121)

I haven't finished all of the work on `cir.initlist`, but I want to get some feedback on the CIR design to make sure I am on the right track.

Fixed: llvm#777

This PR also changed the implementation of BI__builtin_neon_vshlq_v to use CIR ShiftOp.
xlauko and others added 27 commits February 28, 2025 17:04
Cleans up the default linkage query implementations. Removes duplication from `extraClassDeclaration` that is now provided through `CIRGlobalValueInterface`. This makes it more consistent with the `llvm::GlobalValue` methods.
… functions (llvm#1424)

This PR adds CIRGen and LLVM lowering support for base-to-derived and
derived-to-base cast operations on pointers to member functions.

This PR includes a new operation `cir.update_member` to help the LLVM lowering of such cast operations.

Resolves llvm#973.
CIR didn't work on structs with a destructor but without a constructor. Now that is fixed.

Moreover, CUDA kernels must be emitted if they are referred to in the destructor of a non-device variable. That seems to be working already, so I just unblocked the code path.
Currently, the following code snippet fails during CodeGen, using
`clang++ tmp.cpp -fclangir -Xclang -emit-cir -S`:
```
#include <fstream>

void foo(const char *path) {
  std::ofstream fout1(path);
  fout1 << path;
  std::ofstream fout2(path);
  fout2 << path;
}
```
It fails with: 
```
error: 'cir.yield' op expects parent op to be one of 'cir.if, cir.scope, cir.switch, cir.while, cir.for, cir.await, cir.ternary, cir.global, cir.do, cir.try, cir.array.ctor, cir.array.dtor, cir.call, cir.case'
```
The relevant part of the CIR dump before verification looks like: 
```
    "cir.br"()[^bb1] : () -> ()
  ^bb1:  // pred: ^bb0
    "cir.yield"() : () -> ()
    "cir.return"() : () -> ()
  }) : () -> ()
```
Two things are wrong: the YieldOp has `cir.func` as a parent, and there is a `cir.return` too. These come right after the second destructor call for `basic_ofstream`.

This PR fixes this by checking whether there is already a terminator and removing it (if it exists) before adding the implicit return, as sketched below. I have also added a test that mimics the behavior of `std::basic_ofstream`.
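A sketch of that check in CIRGen-style code (hypothetical names; assumes `fn` is the function being finalized and `builder`/`loc` are in scope):

```cpp
// Drop an existing terminator (the stray cir.yield above) before
// emitting the implicit return at the end of the function.
mlir::Block &lastBlock = fn.getBody().back();
if (!lastBlock.empty() &&
    lastBlock.back().hasTrait<mlir::OpTrait::IsTerminator>())
  lastBlock.back().erase();
builder.setInsertionPointToEnd(&lastBlock);
builder.create<cir::ReturnOp>(loc, mlir::ValueRange{});
```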
Currently `__shared__` and `__constant__` variables are ignored by
CodeGen. This patch fixes this.
(It is also fixed in llvm#1436 .)

Device and constant variables should be marked as `externally_initialized`, as they might be initialized by the host rather than on the device. We can't identify which variables are device ones at the lowering stage, so this patch adds a new attribute for them in CodeGen.

Similar to `__global__` functions, global variables on the device correspond to "shadow" variables on the host, and they must be registered with their counterparts. I added a `CUDAShadowNameAttr` in this patch for later use, but I didn't insert code to actually generate it. The generation is quite complicated, so I plan to split it into several parts.

The registration function should look like:
```cpp
const char *__cuda_fatbin_str = /* Raw content of file in -fcuda-include-gpubinary */;
struct {
  int magicNum, version;
  void *binaryData, *unused;
} __cuda_fatbin_wrapper = { /*CUDA Magic Num*/, 1, (void *)__cuda_fatbin_str, nullptr };

void __cuda_module_ctor() {
  void **handle = __cudaRegisterFatBinary(&__cuda_fatbin_wrapper);
  __cuda_register_globals();
}
```
In this PR, we generate everything except the `__cuda_register_globals`
function.

OG doesn't give a name to `__cuda_fatbin_str`, which isn't allowed for
cir::GlobalOp, so I invented a name for it. Other names are kept
consistent with OG.
This PR adds the flag `-emit-mlir-llvm` to allow emitting MLIR in the LLVM dialect (cc @xlauko, who asked me to do this). I'm not sure the flag name is the best choice; maybe someone will have a better idea. Another solution would be to make the `-emit-mlir` flag take a value that specifies the target dialect (CIR / MLIR std dialects / LLVM dialect).
GCC, unlike Clang, issues a warning when one virtual function is overridden in a derived class but one or more other virtual functions with the same name and a different signature from a base class are not overridden. This leads to many warnings in the MLIR and ClangIR code when using the OpConversionPattern<>::matchAndRewrite() function in the ordinary way. The "hiding" behavior is what we want.
This patch introduces support for pointer TBAA, controlled by the `-fpointer-tbaa` flag. The feature is enabled by default.

To ensure test compatibility, the tests (`tbaa-enum.cpp`, `tbaa-enum.c`,
and `tbaa-struct.cpp`) have been updated to include the
`-fno-pointer-tbaa` flag.

Related Pull Requests of OG:
- llvm/llvm-project#76612
- llvm/llvm-project#116991
This patch adds a new pass cir-abi-lowering to the CIR dialect. This pass runs before the CallConvLowering pass, and it expands all ABI-dependent types and operations inside a function into their ABI-independent equivalents according to the ABI specification.

This patch also moves the lowering code for the following types and operations from the LLVM lowering conversion to the new pass:
  - The pointer-to-data-member type `cir.data_member`;
  - The pointer-to-member-function type `cir.method`;
  - All operations working on operands of the above types.
@Lancern (Member, Author) commented Mar 12, 2025

The direct motivation for this new pass is the proper CallConvLowering of the !cir.method type. Currently, this type is lowered during LLVM lowering, which is too late to achieve a proper lowering. Consider the following CIR code:

cir.func @test(%arg0: !cir.method) {
  // ...
}
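In C++ source terms, this corresponds to a function taking a pointer-to-member-function parameter, roughly:

```cpp
struct S {};

// %arg0 : !cir.method above models a parameter like this:
void test(void (S::*m)()) {
  // ...
}
```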

Following the current lowering approach, which keeps !cir.method until we arrive at LLVM lowering, we would eventually get the following LLVM IR:

define dso_local void @test({ i64, i64 } %0) {
  ; ...
}

But we actually expect the following LLVM IR (note the difference in the function signatures):

define dso_local void @test(i64 %0, i64 %1) {
  ; ...
}

To achieve this, I have 3 choices:

  1. Teach the CallConvLowering pass about the !cir.method type.
  2. Move the lowering of !cir.method to the LoweringPrepare pass, which runs before CallConvLowering.
  3. Add a new pass before CallConvLowering that lowers !cir.method to !cir.struct.

At the beginning I thought option 1 would be the easiest way, but as I dug down the rabbit hole I found some tricky stuff behind the scenes. The problem comes from the CodeGen of the function prologue and epilogue. In the prologue, each argument is assigned a stack slot and stored there. An argument of type !cir.method, after CallConvLowering, expands into two arguments of type !s64i. Thus in the function prologue I would have to come up with a way to store two !s64i values into the stack slot allocated for a !cir.method value, which is tricky. Similar problems exist in the epilogue.

The problem with option 2 is that the LoweringPrepare pass is not a conversion pass, which makes it really tricky to do type conversion there. In my case I have to convert every appearance of !cir.method to !cir.struct, and that kind of job is better suited to a conversion pass.

Anyway, this PR is still very incomplete and under construction; I'd like to hear some early comments about it from the community.
