[CIR] [workaround] Don't use InactiveUnionFieldAttr for non-largest member #1160

ChuanqiXu9 · 2024-11-25T06:23:39Z

I took more time than I thought on this. The problem seems to be more complex than I thought. And the patch itself is a workaround to stop the regression.

Following off are some details that we can look back future:

The direct problem is, when we had paddings for the initializers for a union, with the use of InactiveUnionFieldAttr, the line to compare the synthesized type and desired type (the union type) fails.

clangir/clang/lib/CIR/CodeGen/CIRGenExprConst.cpp

Lines 456 to 458 in 3aed38c

    
           if (auto desired = dyn_cast<cir::StructType>(DesiredTy)) 
        
             if (desired.isLayoutIdentical(strType)) 
        
               strType = desired;

then the type of the initializer is not considered to be an union, so the computed size is not correct, which is the reason for the crash.

It is worth to note that, before we introduce InactiveUnionFieldAttr, the comparison between the synthesized type and the desired type would fail too. But we can have the right size after all. So the reproducer in Union Initializer Fails During CIRGen #1131 works.
It doesn't work if we don't add these padding bits if we used InactiveUnionFieldAttr. It can avoid the crash. But then when lowering the code, we have to adding these padding bits by our selves. The problem is the type mismatch.

In such manner, in

clangir/clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp

Line 433 in 3aed38c

auto llvmTy = converter->convertType(constStruct.getType());

, the generated llvm type for the initializer of the union is struct { i64 }. But the generated type for the init value (in

clangir/clang/lib/CIR/Lowering/DirectToLLVM/LowerToLLVM.cpp

Line 447 in 3aed38c

mlir::Value init = lowerCirAttrAsValue(parentOp, elt, rewriter, converter);

) is i32. Then we have to consider how to instantiate a struct { i64 } with an i32. The specific example is easy by using zext instruction. But how about the generic case? Especially the type of the struct can be not compatible with the init type. This may not be super hard but I'd like to not fix it in the patch.

What's the problematic case now? It is similar to [ClangIR][CIRGen] Handle nested union in arrays of struct #1007 but the size of types in the union are not the same and we tried to init the union with the smaller types. Then we failed to recognize them as the same type so that we will failed to recognize their context as an array.

…ember

bruteforceboy · 2024-11-25T13:59:43Z

Hi! Thanks for looking into this again. For the case below the current implementation still fails to lower to LLVM, but works fine without InactiveUnionFieldAttr

typedef struct {
  int a, b, c, d;
} T;

typedef union {
  struct {
    int a;
    long b;
  };
  T c;
} S;

S s = {.c = {1, 2, 3, 4}};

Is this expected?

bcardosolopes · 2024-11-25T21:28:00Z

then the type of the initializer is not considered to be an union, so the computed size is not correct, which is the reason for the crash.

Have you considered making isLayoutIdentical more robust? It's possible we might need to use layoutInfo for taking some of the decisions (it's lazily computed and can be augmented to do more work), etc. It's probably best if we can hide this extra logic and keep some more similarity to OG skeleton to the extend we can.

ChuanqiXu9 · 2024-11-26T01:59:49Z

Hi! Thanks for looking into this again. For the case below the current implementation still fails to lower to LLVM, but works fine without InactiveUnionFieldAttr
typedef struct {
  int a, b, c, d;
} T;

typedef union {
  struct {
    int a;
    long b;
  };
  T c;
} S;

S s = {.c = {1, 2, 3, 4}};
Is this expected?

Yes, this is more or less "expected". I took a quick look and find it is due to the incorrect result from isLayoutIdentical too

ChuanqiXu9 · 2024-11-26T05:11:26Z

then the type of the initializer is not considered to be an union, so the computed size is not correct, which is the reason for the crash.

Have you considered making isLayoutIdentical more robust? It's possible we might need to use layoutInfo for taking some of the decisions (it's lazily computed and can be augmented to do more work), etc. It's probably best if we can hide this extra logic and keep some more similarity to OG skeleton to the extend we can.

It may not solve the root problem. I spent some time to think about this and my conclusion is:

We may not have very similar skeleton for the particular problem. Since the type for union in LLVM and CIR are fundamentally different.
To solve the problem, we might have to have two type systems. One for CIRGen and one for LLVM lowering.

Here is the story:

In CIRGen, we hope the different expression to initialize the union to have the same type (isLayoutIdentical). This is what addressed by the previous PR.
But the problem comes when the initialized value is not the largest member, then we have to have some paddings.
Then in LLVM lowering, we will convert the union type to the largest type (i64 in this example) and we have members like i32 and paddings (array i8 * 4). And now we have problems for how to initialize i64 with i32 and a array of i8 * 4.

Previously, it just works because the type of the initializer is not considered to be an union type. It makes sense in LLVM since union type is not a thing in LLVM. But it causes problem in the previous PR.

ChuanqiXu9 · 2024-11-26T07:44:19Z

I sent #1166 which is another perspective of what I called "two type systems".

bcardosolopes · 2024-11-26T23:45:23Z

To solve the problem, we might have to have two type systems. One for CIRGen and one for LLVM lowering.

This makes sense, as long as we can still emit the same LLVM, feels like CIR could still be more generic.

Then in LLVM lowering, we will convert the union type to the largest type (i64 in this example) and we have members like i32 and paddings (array i8 * 4). And now we have problems for how to initialize i64 with i32 and a array of i8 * 4.

Yea. I guess we could do casts followed by memcpy's or memsets? What does OG codegen does?

I sent #1166 which is another perspective of what I called "two type systems".

I think #1166 sounds a bit more simple and probably prefered if it's a complete replacement for this PR, what's your take? My understanding is that by reverting the largest member stuff we emit code that is closer to LLVM? or is there anything else extra in that PR?

ChuanqiXu9 · 2024-11-27T01:39:11Z

Yea. I guess we could do casts followed by memcpy's or memsets? What does OG codegen does?

Yeah, for local variables, we can do that (we have comments in LowerToLLVM.cpp right now for this). But for global variable with constant initializer, it looks not the case. For global variables, it looks like its type is not the union but the initializer type: https://godbolt.org/z/rvMEKcEYh and when we want to get its value, due to the opaque pointer, we can load it directly in LLVM.

But in CIR, the story looks sightly different, we will decide the type of the global variable first and decide the type of the initializer later. Then if the type mismatches, the problem will be thrown directly. So I feel this is some fundamental differences.

I think #1166 sounds a bit more simple and probably prefered if it's a complete replacement for this PR, what's your take?

Yes, it can be a replacement for this. I should make it clear. If we like that, we can close this.

…1166) Close #1131 This is another solution to #1160 This patch revert #1007 and remain its test. The problem described in #1007 is workaround by skipping the check of equivalent of element types in arrays. We can't mock such checks simply by adding another attribute to `ConstStructAttr` since the types are aggregated. e.g., we have to handle the cases like `struct { union { ... } }` and `struct { struct { union { ... } } }` and so on. To make it, we have to introduce what I called "two type systems" in #1160. This is not very good giving it removes a reasonable check. But it might not be so problematic since the Sema part has already checked it. (Of course, we still need face the risks to introduce new bugs any way)

bcardosolopes · 2024-11-27T22:43:11Z

Landed #1166, closing this one

…1166) Close #1131 This is another solution to #1160 This patch revert #1007 and remain its test. The problem described in #1007 is workaround by skipping the check of equivalent of element types in arrays. We can't mock such checks simply by adding another attribute to `ConstStructAttr` since the types are aggregated. e.g., we have to handle the cases like `struct { union { ... } }` and `struct { struct { union { ... } } }` and so on. To make it, we have to introduce what I called "two type systems" in #1160. This is not very good giving it removes a reasonable check. But it might not be so problematic since the Sema part has already checked it. (Of course, we still need face the risks to introduce new bugs any way)

…lvm#1166) Close llvm/clangir#1131 This is another solution to llvm/clangir#1160 This patch revert llvm/clangir#1007 and remain its test. The problem described in llvm/clangir#1007 is workaround by skipping the check of equivalent of element types in arrays. We can't mock such checks simply by adding another attribute to `ConstStructAttr` since the types are aggregated. e.g., we have to handle the cases like `struct { union { ... } }` and `struct { struct { union { ... } } }` and so on. To make it, we have to introduce what I called "two type systems" in llvm/clangir#1160. This is not very good giving it removes a reasonable check. But it might not be so problematic since the Sema part has already checked it. (Of course, we still need face the risks to introduce new bugs any way)

[CIR] [workaround] Don't use InactiveUnionFieldAttr for non-largest m…

0c0842c

…ember

ChuanqiXu9 requested review from lanza and bcardosolopes as code owners November 25, 2024 06:23

ChuanqiXu9 mentioned this pull request Nov 25, 2024

Union Initializer Fails During CIRGen #1131

Closed

ChuanqiXu9 mentioned this pull request Nov 26, 2024

[CIR] Remove InavalidUnionFieldAttr and remove a check in array attr #1166

Merged

bcardosolopes closed this Nov 27, 2024

ChuanqiXu9 mentioned this pull request Nov 28, 2024

Reinstate #1007 equality of array types #1178

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[CIR] [workaround] Don't use InactiveUnionFieldAttr for non-largest member #1160

[CIR] [workaround] Don't use InactiveUnionFieldAttr for non-largest member #1160

Uh oh!

ChuanqiXu9 commented Nov 25, 2024

Uh oh!

bruteforceboy commented Nov 25, 2024

Uh oh!

bcardosolopes commented Nov 25, 2024 •

edited

Loading

Uh oh!

ChuanqiXu9 commented Nov 26, 2024

Uh oh!

ChuanqiXu9 commented Nov 26, 2024

Uh oh!

ChuanqiXu9 commented Nov 26, 2024

Uh oh!

bcardosolopes commented Nov 26, 2024

Uh oh!

ChuanqiXu9 commented Nov 27, 2024

Uh oh!

bcardosolopes commented Nov 27, 2024

Uh oh!

Uh oh!

	if (auto desired = dyn_cast<cir::StructType>(DesiredTy))
	if (desired.isLayoutIdentical(strType))
	strType = desired;

[CIR] [workaround] Don't use InactiveUnionFieldAttr for non-largest member #1160

[CIR] [workaround] Don't use InactiveUnionFieldAttr for non-largest member #1160

Uh oh!

Conversation

ChuanqiXu9 commented Nov 25, 2024

Uh oh!

bruteforceboy commented Nov 25, 2024

Uh oh!

bcardosolopes commented Nov 25, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ChuanqiXu9 commented Nov 26, 2024

Uh oh!

ChuanqiXu9 commented Nov 26, 2024

Uh oh!

ChuanqiXu9 commented Nov 26, 2024

Uh oh!

bcardosolopes commented Nov 26, 2024

Uh oh!

ChuanqiXu9 commented Nov 27, 2024

Uh oh!

bcardosolopes commented Nov 27, 2024

Uh oh!

Uh oh!

bcardosolopes commented Nov 25, 2024 •

edited

Loading