Skip to content

[ClangIR][CIRGen] Handle nested union in arrays of struct #1007

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Nov 4, 2024

Conversation

ChuanqiXu9
Copy link
Member

@ChuanqiXu9 ChuanqiXu9 commented Oct 25, 2024

Reproducer:

struct nested
{
  union {
    const char *single;
    const char *const *multi;
  } output;
};
static const char * const test[] = {
  "test",
};
const struct nested data[] = 
{
    {
        {
            .multi = test,
        },
    },
    {
        {
            .single = "hello",
        },
    },
};

ClangIR now failed to recognize data as an array since it failed to recognize the initializer for union. This comes from a fundamental difference between CIR and LLVM IR. In LLVM IR, the union is simply a struct with the largest member. So it is fine to have only one init element. But in CIR, the union has the information for all members. So if we only pass a single init element, we may be in trouble. We solve the problem by appending placeholder attribute for the uninitialized fields.

@bcardosolopes
Copy link
Member

Just a heads up: I usually don't review patches failing to pass tests, let me know when this is ready

@bcardosolopes bcardosolopes changed the title [clangir] handle nested union in struct in array [ClangIR][CIRGen] Handle nested union in arrays of struct Oct 28, 2024
@ChuanqiXu9
Copy link
Member Author

(Triggering rerunning)

@ChuanqiXu9
Copy link
Member Author

@bcardosolopes the CI is green now

@smeenai
Copy link
Collaborator

smeenai commented Oct 29, 2024

I'm working on adding an undef attr in #993. Do you think we could reuse that here instead of a new placeholder attr? I haven't fully thought through the LLVM semantics implications of that.

@ChuanqiXu9
Copy link
Member Author

I'm working on adding an undef attr in #993. Do you think we could reuse that here instead of a new placeholder attr? I haven't fully thought through the LLVM semantics implications of that.

Yes, this was the reason I called it as a placeholder attr instead of undef. Since undef has a corresponding part to LLVM and placeholder attr won't. My thought to placeholder is, it is literally a place holder for CIR to complete its semantics. We should never lower a place holder. So I think this is the key difference. WDYT?

@ChuanqiXu9
Copy link
Member Author

I think one motivating example may be, if in some transformation we changed the init of a union into a undef, then it looks slightly prettier:

(undef, placeholder, placeholder)

than

(undef, undef, undef)

at least we know the expected initialized field.

@smeenai
Copy link
Collaborator

smeenai commented Oct 29, 2024

Yeah, that makes sense to me, though I'll wait for Bruno to weigh in too :)

Copy link
Member

@bcardosolopes bcardosolopes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this! Comments inline

Copy link
Member

@bcardosolopes bcardosolopes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates, one more round of reviews

@bcardosolopes
Copy link
Member

looks like new CI failures!

@ChuanqiXu9 ChuanqiXu9 force-pushed the handling_union_in_array_struct branch from 949f67d to 55374fa Compare November 4, 2024 03:16
@ChuanqiXu9
Copy link
Member Author

It looks like the windows failure is not relevant.

@bcardosolopes bcardosolopes merged commit 6f15a2e into llvm:main Nov 4, 2024
5 of 6 checks passed
bcardosolopes pushed a commit that referenced this pull request Nov 27, 2024
…1166)

Close #1131

This is another solution to #1160

This patch revert #1007 and remain
its test. The problem described in
#1007 is workaround by skipping the
check of equivalent of element types in arrays.

We can't mock such checks simply by adding another attribute to
`ConstStructAttr` since the types are aggregated. e.g., we have to
handle the cases like `struct { union { ... } }` and `struct { struct {
union { ... } } }` and so on. To make it, we have to introduce what I
called "two type systems" in #1160.

This is not very good giving it removes a reasonable check. But it might
not be so problematic since the Sema part has already checked it. (Of
course, we still need face the risks to introduce new bugs any way)
lanza pushed a commit that referenced this pull request Mar 18, 2025
Reproducer:

```
struct nested
{
  union {
    const char *single;
    const char *const *multi;
  } output;
};
static const char * const test[] = {
  "test",
};
const struct nested data[] = 
{
    {
        {
            .multi = test,
        },
    },
    {
        {
            .single = "hello",
        },
    },
};
```

ClangIR now failed to recognize `data` as an array since it failed to
recognize the initializer for union. This comes from a fundamental
difference between CIR and LLVM IR. In LLVM IR, the union is simply a
struct with the largest member. So it is fine to have only one init
element. But in CIR, the union has the information for all members. So
if we only pass a single init element, we may be in trouble. We solve
the problem by appending placeholder attribute for the uninitialized
fields.
lanza pushed a commit that referenced this pull request Mar 18, 2025
…1166)

Close #1131

This is another solution to #1160

This patch revert #1007 and remain
its test. The problem described in
#1007 is workaround by skipping the
check of equivalent of element types in arrays.

We can't mock such checks simply by adding another attribute to
`ConstStructAttr` since the types are aggregated. e.g., we have to
handle the cases like `struct { union { ... } }` and `struct { struct {
union { ... } } }` and so on. To make it, we have to introduce what I
called "two type systems" in #1160.

This is not very good giving it removes a reasonable check. But it might
not be so problematic since the Sema part has already checked it. (Of
course, we still need face the risks to introduce new bugs any way)
xlauko pushed a commit to trailofbits/instafix-llvm that referenced this pull request Mar 28, 2025
…lvm#1166)

Close llvm/clangir#1131

This is another solution to llvm/clangir#1160

This patch revert llvm/clangir#1007 and remain
its test. The problem described in
llvm/clangir#1007 is workaround by skipping the
check of equivalent of element types in arrays.

We can't mock such checks simply by adding another attribute to
`ConstStructAttr` since the types are aggregated. e.g., we have to
handle the cases like `struct { union { ... } }` and `struct { struct {
union { ... } } }` and so on. To make it, we have to introduce what I
called "two type systems" in llvm/clangir#1160.

This is not very good giving it removes a reasonable check. But it might
not be so problematic since the Sema part has already checked it. (Of
course, we still need face the risks to introduce new bugs any way)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants