Lex: add support for `i128` and `ui128` suffixes #130993

compnerd · 2025-03-12T17:08:38Z

Microsoft's compiler supports an extension for 128-bit literals. This is referenced in intsafe.h which is included transitievly. When building with modules, the literal parsing causes a failure due to the missing support for the extension. To alleviate this issue, support parsing this literal, especially now that there is the BitInt extension.

Take the opportunity to tighten up the code slightly by ensuring that we do not access out-of-bounds characters when lexing the token.

llvmbot · 2025-03-12T17:09:21Z

@llvm/pr-subscribers-platform-windows

@llvm/pr-subscribers-clang

Author: Saleem Abdulrasool (compnerd)

Changes

Microsoft's compiler supports an extension for 128-bit literals. This is referenced in intsafe.h which is included transitievly. When building with modules, the literal parsing causes a failure due to the missing support for the extension. To alleviate this issue, support parsing this literal, especially now that there is the BitInt extension.

Take the opportunity to tighten up the code slightly by ensuring that we do not access out-of-bounds characters when lexing the token.

Full diff: https://github.com/llvm/llvm-project/pull/130993.diff

4 Files Affected:

(modified) clang/include/clang/Lex/LiteralSupport.h (+2-2)
(modified) clang/lib/Lex/LiteralSupport.cpp (+10-4)
(modified) clang/lib/Sema/SemaExpr.cpp (+12-4)
(modified) clang/test/Lexer/ms-extensions.c (+9-1)

diff --git a/clang/include/clang/Lex/LiteralSupport.h b/clang/include/clang/Lex/LiteralSupport.h
index 705021fcfa5b1..ea5f63bc20399 100644
--- a/clang/include/clang/Lex/LiteralSupport.h
+++ b/clang/include/clang/Lex/LiteralSupport.h
@@ -82,8 +82,8 @@ class NumericLiteralParser {
   bool isAccum : 1;         // 1.0hk/k/lk/uhk/uk/ulk
   bool isBitInt : 1;        // 1wb, 1uwb (C23) or 1__wb, 1__uwb (Clang extension in C++
                             // mode)
-  uint8_t MicrosoftInteger; // Microsoft suffix extension i8, i16, i32, or i64.
-
+  uint8_t MicrosoftInteger; // Microsoft suffix extension i8, i16, i32, i64, or
+                            // i128.
 
   bool isFixedPointLiteral() const {
     return (saw_period || saw_exponent) && saw_fixed_point_suffix;
diff --git a/clang/lib/Lex/LiteralSupport.cpp b/clang/lib/Lex/LiteralSupport.cpp
index 225a6c2d15baa..218b9e2886f54 100644
--- a/clang/lib/Lex/LiteralSupport.cpp
+++ b/clang/lib/Lex/LiteralSupport.cpp
@@ -1074,29 +1074,35 @@ NumericLiteralParser::NumericLiteralParser(StringRef TokSpelling,
     case 'i':
     case 'I':
       if (LangOpts.MicrosoftExt && !isFPConstant) {
-        // Allow i8, i16, i32, and i64. First, look ahead and check if
+        // Allow i8, i16, i32, i64, and i128. First, look ahead and check if
         // suffixes are Microsoft integers and not the imaginary unit.
         uint8_t Bits = 0;
         size_t ToSkip = 0;
+        if (s + 1 >= ThisTokEnd)
+          break;
         switch (s[1]) {
         case '8': // i8 suffix
           Bits = 8;
           ToSkip = 2;
           break;
         case '1':
-          if (s[2] == '6') { // i16 suffix
+          if (s + 2 < ThisTokEnd && s[2] == '6') { // i16 suffix
             Bits = 16;
             ToSkip = 3;
+          } else if (s + 3 < ThisTokEnd && s[2] == '2' &&
+                     s[3] == '8') { // i128 suffix
+            Bits = 128;
+            ToSkip = 4;
           }
           break;
         case '3':
-          if (s[2] == '2') { // i32 suffix
+          if (s + 2 < ThisTokEnd && s[2] == '2') { // i32 suffix
             Bits = 32;
             ToSkip = 3;
           }
           break;
         case '6':
-          if (s[2] == '4') { // i64 suffix
+          if (s + 2 < ThisTokEnd && s[2] == '4') { // i64 suffix
             Bits = 64;
             ToSkip = 3;
           }
diff --git a/clang/lib/Sema/SemaExpr.cpp b/clang/lib/Sema/SemaExpr.cpp
index a36a2f563739e..86a492dd465a0 100644
--- a/clang/lib/Sema/SemaExpr.cpp
+++ b/clang/lib/Sema/SemaExpr.cpp
@@ -3924,10 +3924,18 @@ ExprResult Sema::ActOnNumericConstant(const Token &Tok, Scope *UDLScope) {
     // to get the integer value from an overly-wide APInt is *extremely*
     // expensive, so the naive approach of assuming
     // llvm::IntegerType::MAX_INT_BITS is a big performance hit.
-    unsigned BitsNeeded =
-        Literal.isBitInt ? llvm::APInt::getSufficientBitsNeeded(
-                               Literal.getLiteralDigits(), Literal.getRadix())
-                         : Context.getTargetInfo().getIntMaxTWidth();
+    unsigned BitsNeeded = Context.getTargetInfo().getIntMaxTWidth();
+    if (Literal.isBitInt)
+      BitsNeeded = llvm::APInt::getSufficientBitsNeeded(
+          Literal.getLiteralDigits(), Literal.getRadix());
+    if (Literal.MicrosoftInteger) {
+      if (Literal.MicrosoftInteger == 128 &&
+          !Context.getTargetInfo().hasInt128Type())
+        PP.Diag(Tok.getLocation(), diag::err_integer_literal_too_large)
+            << Literal.isUnsigned;
+      BitsNeeded = Literal.MicrosoftInteger;
+    }
+
     llvm::APInt ResultVal(BitsNeeded, 0);
 
     if (Literal.GetIntegerValue(ResultVal)) {
diff --git a/clang/test/Lexer/ms-extensions.c b/clang/test/Lexer/ms-extensions.c
index f1eed337b8737..41a5c2c182cb6 100644
--- a/clang/test/Lexer/ms-extensions.c
+++ b/clang/test/Lexer/ms-extensions.c
@@ -13,16 +13,24 @@ __int64 w = 0x43ui64;
 __int64 z = 9Li64;  // expected-error {{invalid suffix}}
 __int64 q = 10lli64;  // expected-error {{invalid suffix}}
 
-__complex double c1 = 1i;
+__complex double c1 = 1i; // expected-error {{invalid suffix}}
 __complex double c2 = 1.0i;
 __complex float c3 = 1.0if;
 
+#define UINT128_MAX 0xffffffffffffffffffffffffffffffffui128
 #define ULLONG_MAX 0xffffffffffffffffui64
 #define UINT 0xffffffffui32
 #define USHORT 0xffffui16
 #define UCHAR 0xffui8
 
 void a(void) {
+#if __SIZEOF_INT128__
+        __int128 j = UINT128_MAX;
+#else
+        int j = UINT128_MAX;
+        // expected-warning@-1{{implicit conversion from 'unsigned __int128' to 'int' changes value from 340282366920938463463374607431768211455 to -1}}
+        // expected-error@-2{{integer literal is too large to be represented in any integer type}}
+#endif
 	unsigned long long m = ULLONG_MAX;
 	unsigned int n = UINT;
         unsigned short s = USHORT;

github-actions · 2025-03-12T18:29:44Z

✅ With the latest revision this PR passed the C/C++ code formatter.

Fznamznon

Perhaps a release note makes sense as well.

clang/lib/Sema/SemaExpr.cpp

clang/test/Lexer/ms-extensions.c

zmodem

Take the opportunity to tighten up the code slightly by ensuring that we do not access out-of-bounds characters when lexing the token.

That's a nice touch :-)

Do we already have tests somewhere checking that we can codegen this, or do we need to add that?

compnerd · 2025-03-13T16:31:18Z

Take the opportunity to tighten up the code slightly by ensuring that we do not access out-of-bounds characters when lexing the token.

That's a nice touch :-)

😄

Do we already have tests somewhere checking that we can codegen this, or do we need to add that?

I think that generally the _BitInt work should have given us the appropriate coverage. We use existing support for the Microsoft extension type __int128 which has been there for a long time. I feel like we should have appropriate coverage already, but if you have a specific bit of coverage that you think would be good, I can try to integrate that.

zmodem

We use existing support for the Microsoft extension type __int128 which has been there for a long time. I feel like we should have appropriate coverage already

Sounds good to me.

lgtm with the i128 test that Fznamznon suggested, and a release note is a good idea too.

Microsoft's compiler supports an extension for 128-bit literals. This is referenced in `intsafe.h` which is included transitievly. When building with modules, the literal parsing causes a failure due to the missing support for the extension. To alleviate this issue, support parsing this literal, especially now that there is the BitInt extension. Take the opportunity to tighten up the code slightly by ensuring that we do not access out-of-bounds characters when lexing the token.

compnerd · 2025-03-13T17:26:54Z

New version also includes the release notes for this.

Microsoft's compiler supports an extension for 128-bit literals. This is referenced in `intsafe.h` which is included transitievly. When building with modules, the literal parsing causes a failure due to the missing support for the extension. To alleviate this issue, support parsing this literal, especially now that there is the BitInt extension. Take the opportunity to tighten up the code slightly by ensuring that we do not access out-of-bounds characters when lexing the token. (cherry picked from commit dcec224)

Microsoft's compiler supports an extension for 128-bit literals. This is referenced in `intsafe.h` which is included transitievly. When building with modules, the literal parsing causes a failure due to the missing support for the extension. To alleviate this issue, support parsing this literal, especially now that there is the BitInt extension. Take the opportunity to tighten up the code slightly by ensuring that we do not access out-of-bounds characters when lexing the token.

Lex: add support for `i128` and `ui128` suffixes (llvm#130993)

Microsoft's compiler supports an extension for 128-bit literals. This is referenced in `intsafe.h` which is included transitievly. When building with modules, the literal parsing causes a failure due to the missing support for the extension. To alleviate this issue, support parsing this literal, especially now that there is the BitInt extension. Take the opportunity to tighten up the code slightly by ensuring that we do not access out-of-bounds characters when lexing the token.

llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Mar 12, 2025

compnerd added the platform:windows label Mar 12, 2025

compnerd force-pushed the ui128 branch from 79ad6e9 to ba82162 Compare March 12, 2025 18:26

compnerd requested a review from AaronBallman March 12, 2025 18:36

AaronBallman requested review from rnk, zmodem and cor3ntin March 12, 2025 18:37

Fznamznon reviewed Mar 13, 2025

View reviewed changes

clang/lib/Sema/SemaExpr.cpp Show resolved Hide resolved

clang/test/Lexer/ms-extensions.c Show resolved Hide resolved

zmodem reviewed Mar 13, 2025

View reviewed changes

zmodem approved these changes Mar 13, 2025

View reviewed changes

compnerd force-pushed the ui128 branch from ba82162 to 1021d05 Compare March 13, 2025 17:26

weliveindetail mentioned this pull request Mar 13, 2025

[windows][toolchain] Run non-executable Swift Runtime tests for aarch64 Android swiftlang/swift#79185

Merged

compnerd merged commit dcec224 into llvm:main Mar 13, 2025
12 checks passed

compnerd deleted the ui128 branch March 13, 2025 23:36

compnerd added a commit to swiftlang/llvm-project that referenced this pull request Mar 14, 2025

Merge pull request #10253 from compnerd/ui128

c34071e

Lex: add support for `i128` and `ui128` suffixes (llvm#130993)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Lex: add support for `i128` and `ui128` suffixes #130993

Lex: add support for `i128` and `ui128` suffixes #130993

compnerd commented Mar 12, 2025

llvmbot commented Mar 12, 2025 •

edited

Loading

github-actions bot commented Mar 12, 2025 •

edited

Loading

Fznamznon left a comment

zmodem left a comment

compnerd commented Mar 13, 2025

zmodem left a comment

compnerd commented Mar 13, 2025

Lex: add support for i128 and ui128 suffixes #130993

Lex: add support for i128 and ui128 suffixes #130993

Conversation

compnerd commented Mar 12, 2025

llvmbot commented Mar 12, 2025 • edited Loading

github-actions bot commented Mar 12, 2025 • edited Loading

Fznamznon left a comment

Choose a reason for hiding this comment

zmodem left a comment

Choose a reason for hiding this comment

compnerd commented Mar 13, 2025

zmodem left a comment

Choose a reason for hiding this comment

compnerd commented Mar 13, 2025

Lex: add support for `i128` and `ui128` suffixes #130993

Lex: add support for `i128` and `ui128` suffixes #130993

llvmbot commented Mar 12, 2025 •

edited

Loading

github-actions bot commented Mar 12, 2025 •

edited

Loading