Commit 16ff91e

Introduce intrinsic llvm.isnan
Clang has the builtin function '__builtin_isnan', which implements the C library function 'isnan'. This function is currently implemented entirely in clang codegen, which expands it into a set of IR operations. There are three mechanisms by which the expansion can be made.

* The most common mechanism is an unordered comparison made by the instruction 'fcmp uno'. This simple solution is target-independent and works well in most cases. It is, however, not suitable if floating point exceptions are tracked. The corresponding IEEE 754 operation and C function must never raise an FP exception, even if the argument is a signaling NaN. Compare instructions usually do not have this property: they raise the 'invalid' exception in such a case. So this mechanism is unsuitable when exception behavior is strict. In particular, it could result in unexpected trapping if the argument is an SNaN.

* Another solution was implemented in https://reviews.llvm.org/D95948. It is used in cases where raising FP exceptions by 'isnan' is not allowed. This solution implements 'isnan' using integer operations. It solves the problem of exceptions, but offers a single solution for all targets, although some targets can do the check in a more efficient way.

* The solution implemented by https://reviews.llvm.org/D96568 introduced a hook 'clang::TargetCodeGenInfo::testFPKind', which injects target-specific code into IR. Currently only SystemZ implements this hook, and it generates a call to a target-specific intrinsic function.

Although these mechanisms allow 'isnan' to be implemented with enough efficiency, expanding 'isnan' in clang has drawbacks:

* The operation 'isnan' is hidden behind generic integer operations or target-specific intrinsics. This complicates analysis and can prevent some optimizations.

* IR can be created by tools other than clang; in this case the treatment of 'isnan' has to be duplicated in that tool.

Another issue with the current implementation of 'isnan' comes from the use of the options '-ffast-math' or '-fno-honor-nans'. If such an option is specified, 'fcmp uno' may be optimized to 'false'. This is a valid optimization in general, but it results in 'isnan' always returning 'false'. For example, in some libc++ implementations the following code returns 'false':

    std::isnan(std::numeric_limits<float>::quiet_NaN())

The options '-ffast-math' and '-fno-honor-nans' imply that the operands of FP operations are never NaNs. This assumption, however, should not be applied to functions that check FP number properties, including 'isnan'. If such a function returns the expected result instead of actually making the check, it becomes useless in many cases. The option '-ffast-math' is often used for performance-critical code, as it can speed up execution at the expense of manual treatment of corner cases. If 'isnan' returns the assumed result, a user cannot use it in the manual treatment of NaNs and has to invent replacements, such as making the check using integer operations. There is a discussion in https://reviews.llvm.org/D18513#387418, which also expresses the opinion that the limitations imposed by '-ffast-math' should be applied only to 'math' functions but not to 'tests'.

To overcome these drawbacks, this change introduces a new IR intrinsic function 'llvm.isnan', which realizes the check as specified by the IEEE-754 and C standards in a target-agnostic way. During IR transformations it does not undergo undesirable optimizations. It reaches instruction selection, where it is lowered in a target-dependent way. The lowering can vary depending on options like '-ffast-math' or '-ffp-model', so the resulting code satisfies the requested semantics.

Differential Revision: https://reviews.llvm.org/D104854
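The kind of replacement a user "has to invent" under '-ffast-math', a NaN check built from integer operations, might look like the minimal sketch below. `isnan_bits` and `check_isnan_bits` are hypothetical names, not anything this commit adds, and the sketch assumes `float` is the 32-bit IEEE-754 binary32 format.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper (not part of the commit): NaN test on the raw bits of
// a float. Assumes float is IEEE-754 binary32. No floating-point comparison
// is involved, so no FP exception is raised by the check itself.
inline bool isnan_bits(float x) {
  static_assert(sizeof(float) == sizeof(std::uint32_t),
                "sketch assumes a 32-bit float");
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);               // well-defined type pun
  const std::uint32_t abs_bits = bits & 0x7FFFFFFFu; // clear the sign bit
  // +inf is 0x7F800000; anything larger has all exponent bits set and a
  // non-zero significand, which is exactly the NaN encoding.
  return abs_bits > 0x7F800000u;
}

// Small self-check exercising the helper.
inline void check_isnan_bits() {
  assert(isnan_bits(std::numeric_limits<float>::quiet_NaN()));
  assert(!isnan_bits(std::numeric_limits<float>::infinity()));
  assert(!isnan_bits(1.5f));
}
```

Because the test is expressed purely in integer arithmetic, a "no NaNs" floating-point assumption should not give the optimizer grounds to fold it to 'false'. The cost is that every user has to hand-roll it, which is the duplication the message argues against.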
1 parent 2f00281 commit 16ff91e

24 files changed: +2597 -145 lines changed

clang/lib/CodeGen/CGBuiltin.cpp

+4 -24
@@ -3068,37 +3068,17 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
     // ZExt bool to int type.
     return RValue::get(Builder.CreateZExt(LHS, ConvertType(E->getType())));
   }
+
   case Builtin::BI__builtin_isnan: {
     CodeGenFunction::CGFPOptionsRAII FPOptsRAII(*this, E);
     Value *V = EmitScalarExpr(E->getArg(0));
-    llvm::Type *Ty = V->getType();
-    const llvm::fltSemantics &Semantics = Ty->getFltSemantics();
-    if (!Builder.getIsFPConstrained() ||
-        Builder.getDefaultConstrainedExcept() == fp::ebIgnore ||
-        !Ty->isIEEE()) {
-      V = Builder.CreateFCmpUNO(V, V, "cmp");
-      return RValue::get(Builder.CreateZExt(V, ConvertType(E->getType())));
-    }
 
     if (Value *Result = getTargetHooks().testFPKind(V, BuiltinID, Builder, CGM))
       return RValue::get(Result);
 
-    // NaN has all exp bits set and a non zero significand. Therefore:
-    // isnan(V) == ((exp mask - (abs(V) & exp mask)) < 0)
-    unsigned bitsize = Ty->getScalarSizeInBits();
-    llvm::IntegerType *IntTy = Builder.getIntNTy(bitsize);
-    Value *IntV = Builder.CreateBitCast(V, IntTy);
-    APInt AndMask = APInt::getSignedMaxValue(bitsize);
-    Value *AbsV =
-        Builder.CreateAnd(IntV, llvm::ConstantInt::get(IntTy, AndMask));
-    APInt ExpMask = APFloat::getInf(Semantics).bitcastToAPInt();
-    Value *Sub =
-        Builder.CreateSub(llvm::ConstantInt::get(IntTy, ExpMask), AbsV);
-    // V = sign bit (Sub) <=> V = (Sub < 0)
-    V = Builder.CreateLShr(Sub, llvm::ConstantInt::get(IntTy, bitsize - 1));
-    if (bitsize > 32)
-      V = Builder.CreateTrunc(V, ConvertType(E->getType()));
-    return RValue::get(V);
+    Function *F = CGM.getIntrinsic(Intrinsic::isnan, V->getType());
+    Value *Call = Builder.CreateCall(F, V);
+    return RValue::get(Builder.CreateZExt(Call, ConvertType(E->getType())));
   }
 
   case Builtin::BI__builtin_matrix_transpose: {
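The integer expansion removed by this hunk can be restated on a plain `double` as a rough sketch. `isnan_by_subtraction` and `check_isnan_by_subtraction` are hypothetical names, the constants play the roles of `AndMask` and `ExpMask` from the removed code, and the sketch assumes `double` is IEEE-754 binary64.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical restatement of the removed expansion for a 64-bit double.
// Assumes double is IEEE-754 binary64.
inline bool isnan_by_subtraction(double v) {
  static_assert(sizeof(double) == sizeof(std::uint64_t),
                "sketch assumes a 64-bit double");
  std::uint64_t bits;
  std::memcpy(&bits, &v, sizeof bits);                  // the "bitcast" step
  const std::uint64_t and_mask = 0x7FFFFFFFFFFFFFFFull; // signed max: drops the sign bit
  const std::uint64_t exp_mask = 0x7FF0000000000000ull; // bit pattern of +inf
  const std::uint64_t abs_v = bits & and_mask;
  // For a NaN, abs_v > exp_mask, so the subtraction wraps around and the
  // top bit of the result is set. For every other value the result is in
  // [0, exp_mask] and the top bit is clear.
  const std::uint64_t sub = exp_mask - abs_v;
  return (sub >> 63) != 0;                              // logical shift extracts the sign bit
}

// Small self-check exercising the helper.
inline void check_isnan_by_subtraction() {
  assert(isnan_by_subtraction(std::numeric_limits<double>::quiet_NaN()));
  assert(!isnan_by_subtraction(std::numeric_limits<double>::infinity()));
  assert(!isnan_by_subtraction(0.0));
}
```

The same shape appears in the old CHECK lines of the test files further down (`and`, `sub`, `lshr`, `trunc`), only at 80 or 128 bits instead of 64.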

clang/test/CodeGen/X86/strictfp_builtins.c

+17 -20
@@ -17,7 +17,7 @@ int printf(const char *, ...);
 // CHECK-NEXT: store i32 [[X:%.*]], i32* [[X_ADDR]], align 4
 // CHECK-NEXT: [[TMP0:%.*]] = load i8*, i8** [[STR_ADDR]], align 8
 // CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[X_ADDR]], align 4
-// CHECK-NEXT: [[CALL:%.*]] = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([8 x i8], [8 x i8]* @.str, i64 0, i64 0), i8* [[TMP0]], i32 [[TMP1]]) [[ATTR4:#.*]]
+// CHECK-NEXT: [[CALL:%.*]] = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([8 x i8], [8 x i8]* @.str, i64 0, i64 0), i8* [[TMP0]], i32 [[TMP1]]) #[[ATTR3:[0-9]+]]
 // CHECK-NEXT: ret void
 //
 void p(char *str, int x) {
@@ -29,13 +29,13 @@ void p(char *str, int x) {
 // CHECK-LABEL: @test_long_double_isinf(
 // CHECK-NEXT: entry:
 // CHECK-NEXT: [[LD_ADDR:%.*]] = alloca x86_fp80, align 16
-// CHECK-NEXT: store x86_fp80 [[D:%.*]], x86_fp80* [[LD_ADDR]], align 16
+// CHECK-NEXT: store x86_fp80 [[LD:%.*]], x86_fp80* [[LD_ADDR]], align 16
 // CHECK-NEXT: [[TMP0:%.*]] = load x86_fp80, x86_fp80* [[LD_ADDR]], align 16
-// CHECK-NEXT: [[BITCAST:%.*]] = bitcast x86_fp80 [[TMP0]] to i80
-// CHECK-NEXT: [[SHL1:%.*]] = shl i80 [[BITCAST]], 1
-// CHECK-NEXT: [[CMP:%.*]] = icmp eq i80 [[SHL1]], -18446744073709551616
-// CHECK-NEXT: [[RES:%.*]] = zext i1 [[CMP]] to i32
-// CHECK-NEXT: call void @p(i8* getelementptr inbounds ([10 x i8], [10 x i8]* @.str.[[#STRID:1]], i64 0, i64 0), i32 [[RES]]) [[ATTR4]]
+// CHECK-NEXT: [[TMP1:%.*]] = bitcast x86_fp80 [[TMP0]] to i80
+// CHECK-NEXT: [[TMP2:%.*]] = shl i80 [[TMP1]], 1
+// CHECK-NEXT: [[TMP3:%.*]] = icmp eq i80 [[TMP2]], -18446744073709551616
+// CHECK-NEXT: [[TMP4:%.*]] = zext i1 [[TMP3]] to i32
+// CHECK-NEXT: call void @p(i8* getelementptr inbounds ([10 x i8], [10 x i8]* @.str.1, i64 0, i64 0), i32 [[TMP4]]) #[[ATTR3]]
 // CHECK-NEXT: ret void
 //
 void test_long_double_isinf(long double ld) {
@@ -47,13 +47,13 @@ void test_long_double_isinf(long double ld) {
 // CHECK-LABEL: @test_long_double_isfinite(
 // CHECK-NEXT: entry:
 // CHECK-NEXT: [[LD_ADDR:%.*]] = alloca x86_fp80, align 16
-// CHECK-NEXT: store x86_fp80 [[D:%.*]], x86_fp80* [[LD_ADDR]], align 16
+// CHECK-NEXT: store x86_fp80 [[LD:%.*]], x86_fp80* [[LD_ADDR]], align 16
 // CHECK-NEXT: [[TMP0:%.*]] = load x86_fp80, x86_fp80* [[LD_ADDR]], align 16
-// CHECK-NEXT: [[BITCAST:%.*]] = bitcast x86_fp80 [[TMP0]] to i80
-// CHECK-NEXT: [[SHL1:%.*]] = shl i80 [[BITCAST]], 1
-// CHECK-NEXT: [[CMP:%.*]] = icmp ult i80 [[SHL1]], -18446744073709551616
-// CHECK-NEXT: [[RES:%.*]] = zext i1 [[CMP]] to i32
-// CHECK-NEXT: call void @p(i8* getelementptr inbounds ([13 x i8], [13 x i8]* @.str.[[#STRID:STRID+1]], i64 0, i64 0), i32 [[RES]]) [[ATTR4]]
+// CHECK-NEXT: [[TMP1:%.*]] = bitcast x86_fp80 [[TMP0]] to i80
+// CHECK-NEXT: [[TMP2:%.*]] = shl i80 [[TMP1]], 1
+// CHECK-NEXT: [[TMP3:%.*]] = icmp ult i80 [[TMP2]], -18446744073709551616
+// CHECK-NEXT: [[TMP4:%.*]] = zext i1 [[TMP3]] to i32
+// CHECK-NEXT: call void @p(i8* getelementptr inbounds ([13 x i8], [13 x i8]* @.str.2, i64 0, i64 0), i32 [[TMP4]]) #[[ATTR3]]
 // CHECK-NEXT: ret void
 //
 void test_long_double_isfinite(long double ld) {
@@ -65,14 +65,11 @@ void test_long_double_isfinite(long double ld) {
 // CHECK-LABEL: @test_long_double_isnan(
 // CHECK-NEXT: entry:
 // CHECK-NEXT: [[LD_ADDR:%.*]] = alloca x86_fp80, align 16
-// CHECK-NEXT: store x86_fp80 [[D:%.*]], x86_fp80* [[LD_ADDR]], align 16
+// CHECK-NEXT: store x86_fp80 [[LD:%.*]], x86_fp80* [[LD_ADDR]], align 16
 // CHECK-NEXT: [[TMP0:%.*]] = load x86_fp80, x86_fp80* [[LD_ADDR]], align 16
-// CHECK-NEXT: [[BITCAST:%.*]] = bitcast x86_fp80 [[TMP0]] to i80
-// CHECK-NEXT: [[ABS:%.*]] = and i80 [[BITCAST]], 604462909807314587353087
-// CHECK-NEXT: [[TMP1:%.*]] = sub i80 604453686435277732577280, [[ABS]]
-// CHECK-NEXT: [[ISNAN:%.*]] = lshr i80 [[TMP1]], 79
-// CHECK-NEXT: [[RES:%.*]] = trunc i80 [[ISNAN]] to i32
-// CHECK-NEXT: call void @p(i8* getelementptr inbounds ([10 x i8], [10 x i8]* @.str.[[#STRID:STRID+1]], i64 0, i64 0), i32 [[RES]]) [[ATTR4]]
+// CHECK-NEXT: [[TMP1:%.*]] = call i1 @llvm.isnan.f80(x86_fp80 [[TMP0]]) #[[ATTR3]]
+// CHECK-NEXT: [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
+// CHECK-NEXT: call void @p(i8* getelementptr inbounds ([10 x i8], [10 x i8]* @.str.3, i64 0, i64 0), i32 [[TMP2]]) #[[ATTR3]]
 // CHECK-NEXT: ret void
 //
 void test_long_double_isnan(long double ld) {
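The `isinf` and `isfinite` checks in this test keep their integer form: shift the bit pattern left by one to discard the sign, then compare against the shifted pattern of infinity. A rough 64-bit analogue is sketched below. `is_inf_shift`, `is_finite_shift`, and `check_shift_classification` are hypothetical names, and the sketch assumes `double` is IEEE-754 binary64 rather than the `x86_fp80` type the test exercises.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical 64-bit analogue of the shl/icmp pattern in the CHECK lines.
// Assumes double is IEEE-754 binary64.
inline std::uint64_t shifted_bits(double v) {
  static_assert(sizeof(double) == sizeof(std::uint64_t),
                "sketch assumes a 64-bit double");
  std::uint64_t bits;
  std::memcpy(&bits, &v, sizeof bits);
  return bits << 1;                       // shifting left by one drops the sign bit
}

// +inf is 0x7FF0000000000000; shifted left by one it becomes this constant.
constexpr std::uint64_t kShiftedInf = 0xFFE0000000000000ull;

inline bool is_inf_shift(double v) {
  return shifted_bits(v) == kShiftedInf;  // counterpart of 'icmp eq'
}

inline bool is_finite_shift(double v) {
  return shifted_bits(v) < kShiftedInf;   // counterpart of 'icmp ult'
}

// Small self-check exercising both helpers.
inline void check_shift_classification() {
  assert(is_inf_shift(std::numeric_limits<double>::infinity()));
  assert(!is_inf_shift(1.0));
  assert(is_finite_shift(1.0));
  assert(!is_finite_shift(std::numeric_limits<double>::quiet_NaN()));
}
```

In the test itself the constant `-18446744073709551616` is the `i80` value 2^80 - 2^64, which is the `x86_fp80` encoding of infinity shifted left by one bit.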

clang/test/CodeGen/aarch64-strictfp-builtins.c

+18 -20
@@ -1,3 +1,4 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py
 // RUN: %clang_cc1 %s -emit-llvm -ffp-exception-behavior=maytrap -fexperimental-strict-floating-point -o - -triple arm64-none-linux-gnu | FileCheck %s
 
 // Test that the constrained intrinsics are picking up the exception
@@ -15,7 +16,7 @@ int printf(const char *, ...);
 // CHECK-NEXT: store i32 [[X:%.*]], i32* [[X_ADDR]], align 4
 // CHECK-NEXT: [[TMP0:%.*]] = load i8*, i8** [[STR_ADDR]], align 8
 // CHECK-NEXT: [[TMP1:%.*]] = load i32, i32* [[X_ADDR]], align 4
-// CHECK-NEXT: [[CALL:%.*]] = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([8 x i8], [8 x i8]* @.str, i64 0, i64 0), i8* [[TMP0]], i32 [[TMP1]]) [[ATTR4:#.*]]
+// CHECK-NEXT: [[CALL:%.*]] = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([8 x i8], [8 x i8]* @.str, i64 0, i64 0), i8* [[TMP0]], i32 [[TMP1]]) #[[ATTR3:[0-9]+]]
 // CHECK-NEXT: ret void
 //
 void p(char *str, int x) {
@@ -27,13 +28,13 @@ void p(char *str, int x) {
 // CHECK-LABEL: @test_long_double_isinf(
 // CHECK-NEXT: entry:
 // CHECK-NEXT: [[LD_ADDR:%.*]] = alloca fp128, align 16
-// CHECK-NEXT: store fp128 [[D:%.*]], fp128* [[LD_ADDR]], align 16
+// CHECK-NEXT: store fp128 [[LD:%.*]], fp128* [[LD_ADDR]], align 16
 // CHECK-NEXT: [[TMP0:%.*]] = load fp128, fp128* [[LD_ADDR]], align 16
-// CHECK-NEXT: [[BITCAST:%.*]] = bitcast fp128 [[TMP0]] to i128
-// CHECK-NEXT: [[SHL1:%.*]] = shl i128 [[BITCAST]], 1
-// CHECK-NEXT: [[CMP:%.*]] = icmp eq i128 [[SHL1]], -10384593717069655257060992658440192
-// CHECK-NEXT: [[RES:%.*]] = zext i1 [[CMP]] to i32
-// CHECK-NEXT: call void @p(i8* getelementptr inbounds ([10 x i8], [10 x i8]* @.str.[[#STRID:1]], i64 0, i64 0), i32 [[RES]]) [[ATTR4]]
+// CHECK-NEXT: [[TMP1:%.*]] = bitcast fp128 [[TMP0]] to i128
+// CHECK-NEXT: [[TMP2:%.*]] = shl i128 [[TMP1]], 1
+// CHECK-NEXT: [[TMP3:%.*]] = icmp eq i128 [[TMP2]], -10384593717069655257060992658440192
+// CHECK-NEXT: [[TMP4:%.*]] = zext i1 [[TMP3]] to i32
+// CHECK-NEXT: call void @p(i8* getelementptr inbounds ([10 x i8], [10 x i8]* @.str.1, i64 0, i64 0), i32 [[TMP4]]) #[[ATTR3]]
 // CHECK-NEXT: ret void
 //
 void test_long_double_isinf(long double ld) {
@@ -45,13 +46,13 @@ void test_long_double_isinf(long double ld) {
 // CHECK-LABEL: @test_long_double_isfinite(
 // CHECK-NEXT: entry:
 // CHECK-NEXT: [[LD_ADDR:%.*]] = alloca fp128, align 16
-// CHECK-NEXT: store fp128 [[D:%.*]], fp128* [[LD_ADDR]], align 16
+// CHECK-NEXT: store fp128 [[LD:%.*]], fp128* [[LD_ADDR]], align 16
 // CHECK-NEXT: [[TMP0:%.*]] = load fp128, fp128* [[LD_ADDR]], align 16
-// CHECK-NEXT: [[BITCAST:%.*]] = bitcast fp128 [[TMP0]] to i128
-// CHECK-NEXT: [[SHL1:%.*]] = shl i128 [[BITCAST]], 1
-// CHECK-NEXT: [[CMP:%.*]] = icmp ult i128 [[SHL1]], -10384593717069655257060992658440192
-// CHECK-NEXT: [[RES:%.*]] = zext i1 [[CMP]] to i32
-// CHECK-NEXT: call void @p(i8* getelementptr inbounds ([13 x i8], [13 x i8]* @.str.[[#STRID:STRID+1]], i64 0, i64 0), i32 [[RES]]) [[ATTR4]]
+// CHECK-NEXT: [[TMP1:%.*]] = bitcast fp128 [[TMP0]] to i128
+// CHECK-NEXT: [[TMP2:%.*]] = shl i128 [[TMP1]], 1
+// CHECK-NEXT: [[TMP3:%.*]] = icmp ult i128 [[TMP2]], -10384593717069655257060992658440192
+// CHECK-NEXT: [[TMP4:%.*]] = zext i1 [[TMP3]] to i32
+// CHECK-NEXT: call void @p(i8* getelementptr inbounds ([13 x i8], [13 x i8]* @.str.2, i64 0, i64 0), i32 [[TMP4]]) #[[ATTR3]]
 // CHECK-NEXT: ret void
 //
 void test_long_double_isfinite(long double ld) {
@@ -63,14 +64,11 @@ void test_long_double_isfinite(long double ld) {
 // CHECK-LABEL: @test_long_double_isnan(
 // CHECK-NEXT: entry:
 // CHECK-NEXT: [[LD_ADDR:%.*]] = alloca fp128, align 16
-// CHECK-NEXT: store fp128 [[D:%.*]], fp128* [[LD_ADDR]], align 16
+// CHECK-NEXT: store fp128 [[LD:%.*]], fp128* [[LD_ADDR]], align 16
 // CHECK-NEXT: [[TMP0:%.*]] = load fp128, fp128* [[LD_ADDR]], align 16
-// CHECK-NEXT: [[BITCAST:%.*]] = bitcast fp128 [[TMP0]] to i128
-// CHECK-NEXT: [[ABS:%.*]] = and i128 [[BITCAST]], 170141183460469231731687303715884105727
-// CHECK-NEXT: [[TMP1:%.*]] = sub i128 170135991163610696904058773219554885632, [[ABS]]
-// CHECK-NEXT: [[ISNAN:%.*]] = lshr i128 [[TMP1]], 127
-// CHECK-NEXT: [[RES:%.*]] = trunc i128 [[ISNAN]] to i32
-// CHECK-NEXT: call void @p(i8* getelementptr inbounds ([10 x i8], [10 x i8]* @.str.[[#STRID:STRID+1]], i64 0, i64 0), i32 [[RES]])
+// CHECK-NEXT: [[TMP1:%.*]] = call i1 @llvm.isnan.f128(fp128 [[TMP0]]) #[[ATTR3]]
+// CHECK-NEXT: [[TMP2:%.*]] = zext i1 [[TMP1]] to i32
+// CHECK-NEXT: call void @p(i8* getelementptr inbounds ([10 x i8], [10 x i8]* @.str.3, i64 0, i64 0), i32 [[TMP2]]) #[[ATTR3]]
 // CHECK-NEXT: ret void
 //
 void test_long_double_isnan(long double ld) {
