[WIP] Correct lowering of fp128 intrinsics #76558

Draft: wants to merge 7 commits into base: main

tgross35
Contributor

Currently fp128 math intrinsics are lowered to functions expecting long double, which is a problem when long double and f128 do not have the same layout (e.g. long double on x86 is f80).
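To make the layout mismatch concrete, here is a minimal sketch that classifies the host's long double by its significand width. The function names are made up for illustration and are not part of the patch; the widths (53, 64, 106, 113 bits) are properties of the formats themselves.

```cpp
#include <cassert>
#include <limits>
#include <string>

// Classify the host's `long double` by significand width in bits.
// Hypothetical helper for illustration; not part of LLVM or Clang.
std::string hostLongDoubleFormat() {
  // `digits` counts radix digits, so the widths below assume radix 2.
  assert(std::numeric_limits<long double>::radix == 2);
  switch (std::numeric_limits<long double>::digits) {
  case 53:  return "binary64 (same as double)";
  case 64:  return "x87 80-bit extended";
  case 106: return "PowerPC double-double";
  case 113: return "binary128 (same layout as fp128)";
  default:  return "unknown";
  }
}

// A `long double` libcall such as sinl only matches an fp128 argument
// when the two types share a format.
bool longDoubleMatchesF128() {
  return std::numeric_limits<long double>::digits == 113;
}
```

On a host where longDoubleMatchesF128() returns false, passing an fp128 value to sinl hands the callee an argument in a format it does not expect.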

This patchset does the following:

  1. Move long double layout logic from Clang to LLVM so we can use it on all targets
  2. Default to lowering to __float128 math calls rather than long double calls (sinf128 instead of sinl)
  3. Add logic to still emit long double calls on platforms where long double == f128
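
The selection in steps 2 and 3 can be sketched roughly as follows. This is a simplified model, not the code in the patch: LongDoubleKind and f128LibcallName are hypothetical names standing in for the layout information and the libcall table.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the long double layout information that
// step 1 makes available to LLVM.
enum class LongDoubleKind { Binary64, X87Extended80, DoubleDouble, Binary128 };

// Pick the library symbol for an fp128 math intrinsic.
// `base` is the unsuffixed C name, e.g. "sin" or "sqrt".
std::string f128LibcallName(const std::string &base, LongDoubleKind ld) {
  assert(!base.empty() && "expected a math function name");
  // Step 3: long double has the fp128 layout, so the `l` variant is correct.
  if (ld == LongDoubleKind::Binary128)
    return base + "l";
  // Step 2: otherwise default to the __float128 variant.
  return base + "f128";
}
```

For example, f128LibcallName("sin", LongDoubleKind::X87Extended80) selects sinf128, while LongDoubleKind::Binary128 selects sinl.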

I still need to figure out how to support -mlong-double-128 and similar flags, and need to add a test for a target where ld == f128 such as aarch64. A quick review at this point would still be appreciated to make sure I am on the right track.

Fixes: #44744
Discourse discussion: https://discourse.llvm.org/t/fp128-math-functions-strange-results/72708
Initial patchset: https://reviews.llvm.org/D157836 / http://108.170.204.19/D157836

@tgross35
Contributor Author

@efriedma-quic was looking at this on phabricator

@tgross35 force-pushed the f128-math-lowering branch 7 times, most recently from 16f30b5 to f6b6ca7 (December 30, 2023 01:39)

github-actions bot commented Dec 30, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

@efriedma-quic
Collaborator

This is basically the approach I was expecting: we check the type of "long double" when we build the TargetLowering, and pick appropriate names based on that.

I expect that for -mlong-double-128, you just want to add a module flag that overrides the default choice.

I think I'd prefer to keep the clang type information computation independent from the backend's type information, even if it overlaps. We try to layer the clang frontend so it isn't directly tied to LLVM IR outside of CodeGen.

My first thought was that the computation of the defaults should be in the backend, not Triple.h, since nothing else needs it at the moment. But I guess it could be useful outside the backend, so maybe that's fine. (At the moment, all the relevant optimizations just check the type of the call itself, but I can imagine certain optimizations could benefit from being able to compute the type without an existing signature to consult.)

@tgross35 force-pushed the f128-math-lowering branch 5 times, most recently from c00254c to 67033b2 (January 13, 2024 09:57)
@tgross35 force-pushed the f128-math-lowering branch 4 times, most recently from 2208d1c to 76e30ed (January 20, 2024 12:36)
@tgross35
Contributor Author

I'm struggling a bit with how to handle ABI information since that affects layout (e.g. ARM aapcs), which I think explains most of the errors in https://buildkite.com/llvm-project/github-pull-requests/builds/31198#018d26e2-fd17-4e15-a1eb-08580c189056. This needs to be available at TargetLoweringBase::InitLibcalls, which calls getCLayouts.

TargetMachine is available at that time, so would it be better to move CLayouts from Triple to TargetMachine? If so, subclasses could be used rather than the if block, which more closely follows the Clang side.

Also, are there currently any module flags that make it to TargetLowering? I'm looking for a reference on how to get the -mlong-double-128 information.

@efriedma-quic
Collaborator

Putting a function in TargetMachine seems reasonable.

@efriedma-quic
Collaborator

For the question about querying module flags, we do that in a few different places in codegen; grep for "getModuleFlag". Not sure if there's anything specifically in TargetLowering.

`f128` intrinsic functions sometimes lower to `long double` library
calls when they instead need to be `f128` versions. Add a test
demonstrating current behavior.

Information about the size and alignment of `long double` is currently part of
clang. Copy this logic to LLVM so it can be used to control lowering of
intrinsics.

Additionally, add an assertion to make sure Clang and LLVM agree.

Switch from emitting long double functions to using `f128`-specific functions.

Fixes llvm#44744.

@tgross35
Contributor Author

Finally getting around to this after more than a year. @efriedma-quic as an alternative to the current implementation of duplicating long double layout information from Clang to LLVM, would it work if LLVM lowers to *f128 calls but provides a module flag fp128_use_long_double_libcalls to prefer the *l versions? So if Clang or other frontends know that their long double is _Float128, it can select those libcalls.

The advantages are that this avoids code duplication and the logic is easier to follow. It also avoids problems when linking a library built with an unexpected -mlong-double- configuration.

The disadvantage is that frontends that don't know about C's long double can't benefit from the more common *l symbols. I don't think this is too big of a problem though: it makes no difference with glibc (the f128 aliases have been around sufficiently long) or on any platforms where long double is not _Float128. And it is easy enough for frontends to set fp128_use_long_double_libcalls on a case-by-case basis if they know what math library is being used (e.g. aarch64 musl).

(I handle the f128 support for Rust and would much rather never think about *l symbols; I can alias them to *f128 if needed or set the flag.)
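
A rough sketch of the semantics proposed for that flag, under the assumption that it is a simple boolean. ModuleFlags here is a plain map standing in for a module's flag table; it is not LLVM's metadata API, and f128LibcallWithFlag is a hypothetical name.

```cpp
#include <cassert>
#include <map>
#include <string>

// Stand-in for a module's flag table. Real module flags live in LLVM IR
// metadata; this map only models "flag name -> integer value".
using ModuleFlags = std::map<std::string, int>;

// Default to the *f128 symbol; switch to the *l symbol only when the
// frontend has set fp128_use_long_double_libcalls to a nonzero value.
std::string f128LibcallWithFlag(const std::string &base,
                                const ModuleFlags &flags) {
  assert(!base.empty() && "expected a math function name");
  auto it = flags.find("fp128_use_long_double_libcalls");
  bool useLongDouble = it != flags.end() && it->second != 0;
  return base + (useLongDouble ? "l" : "f128");
}
```

A frontend that knows nothing about C's long double leaves the flag unset and gets sqrtf128; Clang targeting aarch64 musl would set it and get sqrtl.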

@tgross35
Contributor Author

tgross35 commented Mar 1, 2025

In either case, I need to have the module flags available pretty early and I'm not sure how to do that. Ideally they would be available when TargetLowering is constructed or sometime before it is used for lowering, but it only gets a TargetMachine as a parameter. All values in TargetOptions seem to be configured once and don't pay attention to module flags or take the module as a parameter. Is there a reason for that? I'm wondering if TargetMachine is intended to be unchanging across different modules.

Comment on lines +11 to +15
; REQUIRES: aarch64-registered-target
; REQUIRES: powerpc-registered-target
; REQUIRES: riscv-registered-target
; REQUIRES: systemz-registered-target
; REQUIRES: x86-registered-target
Contributor Author

Todo: replace this with %if somehow so this test still runs if only a subset of architectures is available. See https://llvm.org/docs/TestingGuide.html

@tgross35
Copy link
Contributor Author

tgross35 commented Apr 22, 2025

Talked to arsenm on Discord; the long discussion starts around here: https://discord.com/channels/636084430946959380/636732535434510338/1362207130559578185. The outcome is that this is effectively a target option and needs to be tied to the triple rather than set per-module, which makes sense and avoids the above problem.

So, I'll be doing the following:

  1. Make LLVM assume that sqrtf128 (and similar) libcalls are available by default
  2. On 64-bit arm, loongarch, mips, risc-v, and s390x musl targets, use sqrtl instead
  3. Add some way to make musl targets also use sqrtf128 if a custom libm is provided, like -nolongdouble in the target triple

This should work because calling sqrtf128 is correct on most platforms:

  • On Windows, Apple, and 32-bit platforms, long double is f64 so sqrtf128 is the only correct call
  • On x86, long double is the x87 80-bit float so sqrtf128 is the only correct call
  • On anything glibc, sqrtf128 is an alias to sqrtl on platforms where that works, so sqrtf128 can always be called
  • That leaves 64-bit musl on platforms where long double is f128 as the only platforms where sqrtl has to be called (otherwise a call to sqrtl from C would get intercepted and relowered as sqrtf128, resulting in a linker error)
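
The plan above can be sketched as a small decision function. This is a simplified model under stated assumptions: Arch, Libc, and the customLibm parameter are hypothetical stand-ins (not LLVM's Triple API), and the OS dimension (Windows, Apple) is left out because those targets always take the default branch.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified view of a target; not LLVM's Triple API.
enum class Arch { AArch64, LoongArch64, Mips64, RiscV64, SystemZ, X86_64, Arm32 };
enum class Libc { Glibc, Musl, Other };

// True for the 64-bit architectures the plan names for musl, where
// long double has the f128 layout.
bool longDoubleIsF128(Arch arch) {
  switch (arch) {
  case Arch::AArch64:
  case Arch::LoongArch64:
  case Arch::Mips64:
  case Arch::RiscV64:
  case Arch::SystemZ:
    return true;
  case Arch::X86_64: // long double is the x87 80-bit format
  case Arch::Arm32:  // 32-bit: long double is f64
    return false;
  }
  return false;
}

// `customLibm` models step 3: an opt-out for musl targets that ship a
// libm providing the *f128 symbols.
std::string f128Libcall(const std::string &base, Arch arch, Libc libc,
                        bool customLibm = false) {
  assert(!base.empty() && "expected a math function name");
  if (libc == Libc::Musl && longDoubleIsF128(arch) && !customLibm)
    return base + "l"; // step 2: musl only provides the long double symbol
  return base + "f128"; // step 1: default everywhere else
}
```

Under this model, f128Libcall("sqrt", Arch::AArch64, Libc::Musl) picks sqrtl, while the same architecture on glibc, or x86-64 on any libc, picks sqrtf128.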

Successfully merging this pull request may close these issues.

Invalid lowering of llvm.*.f128 intrinsics
2 participants