8353266: C2: Wrong execution with Integer.bitCount(int) intrinsic on AArch64 #25551

marc-chevalier · 2025-05-30T15:33:14Z

Problem

On AArch64, using Integer.bitCount can modify its argument. The problem comes from the implementation of popCountI on AArch64. For instance, this is what we get with the reproducer Reduced.java from the related issue:

; Load lFld into local x
ldr  x11,      [x10, #120]
; popCountI
mov  w11,      w11
mov  v16.d[0], x11
cnt  v16.8b,   v16.8b
addv b16,      v16.8b
mov  x13,      v16.d[0]
; [...]
; store local x (which is believed to still contain lFld) into result
str  x11,      [x10, #128]

The instruction mov w11, w11 is used to clear the 32 higher bits of x11, since we use popCountI (from Integer.bitCount): on AArch64 (like on other architectures), writing the 32 lower bits of a register resets the 32 higher bits. In short: the input is modified, but the implementation of popCountI doesn't declare it:

instruct popCountI(iRegINoSp dst, iRegIorL2I src, vRegF tmp) %{
  match(Set dst (PopCountI src));
  effect(TEMP tmp);
  [...]
%}
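
At the Java level, the failing pattern has roughly the following shape. This is a hypothetical sketch, not the actual Reduced.java from the issue: the class name BitCountKeepsArgument, the count field, and the iteration count are assumptions; only lFld, result, and the constant are taken from the regression test in this PR.

```java
// Hypothetical sketch of the failing pattern; not the actual Reduced.java.
final class BitCountKeepsArgument {
    static long lFld = 0xfedc_ba98_7654_3210L;
    static long result;
    static int count;

    static void test() {
        long x = lFld;                      // 64-bit value, held in one register (x11 above)
        count = Integer.bitCount((int) x);  // the L2I feeds popCountI: only the low word is counted
        result = x;                         // must still be the full 64-bit value
    }

    public static void main(String[] args) {
        // Loop so that C2 gets a chance to compile test(); the count is a guess.
        for (int i = 0; i < 20_000; i++) {
            test();
            if (result != lFld) {
                throw new RuntimeException("Wrong result. Expected result = " + lFld
                        + "; Actual result = " + result);
            }
        }
    }
}
```

With the bug, result would presumably end up as the truncated value 0x7654_3210 once test() is compiled by C2 on AArch64.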

But then, why reset the upper word of x11? It all starts with the vector instructions:

cnt  v16.8b,   v16.8b
addv b16,      v16.8b

The 8b specifies that it operates on the 8 lower bytes of v16. It would be nice to simply use 4b, but that doesn't exist: vector instructions can only work on either the whole 128-bit register or its 64 lower bits (by blocks of 1, 2, 4, 8 or 16 bytes). There is no suffix (or encoding) for a vector instruction to work only on the 32 lower bits, so, in order not to pollute the bit count, we need to reset the 32 higher bits of v16.d[0] (aka d16), that is v16.s[1], that is v16[32:63] in a more bit-explicit notation. Moreover, unlike with general-purpose registers, doing

mov  v16.s[0], w11

would set v16[0:31] to w11, but not reset v16[32:63]. Which makes sense! Otherwise, using vector registers would be impractical if writing any piece would reset the rest... So we indeed need to set all of v16[0:63], which

mov  w11,      w11
mov  v16.d[0], x11

does, but by destroying x11.
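
The effect of the old sequence can be modelled in plain Java. This is only a toy model under stated assumptions: registers are ordinary long values, and the helper names cntAddv8b and movW are made up for illustration. It shows that counting over 8 bytes needs a clean upper word, and that getting one via mov w11, w11 loses the upper word of x11.

```java
// Toy model of the old popCountI sequence; registers are plain Java values.
final class OldPopCountModel {
    // cnt v.8b then addv b, v.8b: count the bits of each of the 8 low bytes, then sum.
    static int cntAddv8b(long d0) {
        int sum = 0;
        for (int i = 0; i < 8; i++) {
            sum += Integer.bitCount((int) (d0 >>> (8 * i)) & 0xff);
        }
        return sum;
    }

    // mov w, w: writing the 32-bit view zero-extends into the 64-bit register.
    static long movW(long x) {
        return x & 0xffff_ffffL;
    }

    public static void main(String[] args) {
        long x11 = 0xfedc_ba98_7654_3210L;

        // Without clearing the upper word, all 64 bits get counted.
        System.out.println(cntAddv8b(x11) == Long.bitCount(x11));            // true

        // Old sequence: truncate x11 in place, then copy it to v16.d[0].
        x11 = movW(x11);
        long v16d0 = x11;
        System.out.println(cntAddv8b(v16d0) == Integer.bitCount((int) x11)); // true
        System.out.println(Long.toHexString(x11));                           // 76543210
    }
}
```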

Solution

Simply adding USE_KILL src to the effects would be nice, but it is unfortunately not possible: iRegIorL2I is an operand class (either a 32-bit register or an L2I of a 64-bit register), and operand classes cannot be used in effect lists.

The way I went for is not to modify the source at all, but rather to write the two lower words of v16 that we are interested in separately:

mov  v16.s[1], wzr      ; Reset the 1-indexed word of v16, that is v16[32:63] <- 0
mov  v16.s[0], w11      ; Set the 0-indexed word of v16 to w11, that is v16[0:31] <- w11
cnt  v16.8b,   v16.8b
addv b16,      v16.8b
mov  x13,      v16.d[0]

Unlike other solutions, this is relatively straightforward, as it doesn't write the same bits twice, as this would, for instance:

mov  v16.d[0], xzr      ; Reset the 0-indexed double word of v16, that is v16[0:63] <- 0
mov  v16.s[0], w11      ; Set the 0-indexed word of v16 to w11, that is v16[0:31] <- w11

and it doesn't use additional temporaries, like this would:

mov  w12,      w11      ; Using a fresh register x12
mov  v16.d[0], x12

Using the zero register rather than an immediate is convenient, as it allows setting 32 bits at once, while a 32-bit immediate would not fit in a single instruction.
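
The two lane writes can be modelled the same way. Again a toy sketch with assumed names (insertS, popCountI as a Java method, a stale value standing in for whatever v16 held before); Long.bitCount stands in for cnt + addv over the 8 low bytes. The point is that x11 is only ever read.

```java
// Toy model of the new sequence: two 32-bit lane writes into the low half of v16.
final class NewPopCountModel {
    // mov v.s[lane], w for lane 0 or 1: replace that 32-bit lane, keep the other one.
    static long insertS(long d0, int lane, int w) {
        int shift = 32 * lane;
        long mask = 0xffff_ffffL << shift;
        return (d0 & ~mask) | ((w & 0xffff_ffffL) << shift);
    }

    static int popCountI(long x11, long staleV16d0) {
        long d0 = staleV16d0;              // whatever was left in v16
        d0 = insertS(d0, 1, 0);            // mov v16.s[1], wzr
        d0 = insertS(d0, 0, (int) x11);    // mov v16.s[0], w11
        return Long.bitCount(d0);          // stands in for cnt + addv on 8 bytes
    }

    public static void main(String[] args) {
        long x11 = 0xfedc_ba98_7654_3210L;
        int count = popCountI(x11, -1L);   // start from an all-ones v16
        System.out.println(count == Integer.bitCount((int) x11)); // true
    }
}
```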

Format

The printing of this instruction is not very satisfactory. We used to have something that renders in OptoAssembly as:

movw l2i(R29), l2i(R29)
mov  V16, l2i(R29) # vector (1D)
cnt  V16, V16      # vector (8B)
addv V16, V16      # vector (8B)
mov  R13, V16      # vector (1D)

This is... somewhat arguable. With context, I can understand or guess what movw l2i(R29), l2i(R29) means, but I don't think it's a very nice printout. Also, it's not clear that the second instruction works on the lower word of V16. Alas, my new version is not much better:

mov  V16, zr       # vector (1S)
mov  V16, l2i(R29) # vector (1S)
cnt  V16, V16      # vector (8B)
addv V16, V16      # vector (8B)
mov  R13, V16      # vector (1D)

It's not clear that the first instruction works on the 1-indexed word of V16 while the second works on the 0-indexed word. I couldn't find a nicer example in a similar situation, so I'm open to suggestions! Maybe simply hardcode it in the format, as such:

format %{ "mov    $tmp.s[1], zr\t# vector (1S)\n\t"
          "mov    $tmp.s[0], $src\t# vector (1S)\n\t"
          "cnt    $tmp, $tmp\t# vector (8B)\n\t"
          "addv   $tmp, $tmp\t# vector (8B)\n\t"
          "mov    $dst, $tmp\t# vector (1D)" %}

Not sure what the best practice is here.

Progress

Change must be properly reviewed (1 review required, with at least 1 Reviewer)
Change must not contain extraneous whitespace
Commit message must refer to an issue

Issue

JDK-8353266: C2: Wrong execution with Integer.bitCount(int) intrinsic on AArch64 (Bug - P3) (⚠️ The fixVersion in this issue is [26] but the fixVersion in .jcheck/conf is 25, a new backport will be created when this PR is integrated.)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/25551/head:pull/25551
$ git checkout pull/25551

Update a local copy of the PR:
$ git checkout pull/25551
$ git pull https://git.openjdk.org/jdk.git pull/25551/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 25551

View PR using the GUI difftool:
$ git pr show -t 25551

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/25551.diff

Using Webrev

Link to Webrev Comment

bridgekeeper · 2025-05-30T15:33:40Z

👋 Welcome back mchevalier! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2025-05-30T15:34:42Z

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

openjdk · 2025-05-30T15:35:15Z

@marc-chevalier The following label will be automatically applied to this pull request:

hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

marc-chevalier · 2025-05-30T15:36:30Z

Opinion for people in charge: should I fix the fixVersion in the JBS issue, or wait a bit to integrate?

mlbridge · 2025-05-30T15:42:02Z

Webrevs

00: Full (fb8d64d9)

dean-long · 2025-05-31T00:29:17Z

Opinion for people in charge: should I fix the fixVersion in the JBS issue, or wait a bit to integrate?

I would say yes, change the fixVersion to 25 and try to get this into 25, resulting in one less backport needed.

sendaoYan · 2025-05-31T02:59:48Z

Hi, how was this bug found? It seems the original test case was generated by a fuzzing tool.

sendaoYan · 2025-05-31T03:11:28Z

test/hotspot/jtreg/compiler/intrinsics/BitCountIAarch64PreservesArgument.java

+            test();
+            if (result != 0xfedc_ba98_7654_3210L) {
+                // Wrongly outputs the cut input 0x7654_3210 == 1985229328
+                throw new RuntimeException("Wrong result. lFld=" + lFld + "; result=" + result);


How about:

throw new RuntimeException("Wrong result. Expected result = " + lFld + "; Actual result = " + result);

theRealAph · 2025-05-31T14:29:26Z

src/hotspot/cpu/aarch64/aarch64.ad

+    __ mov($tmp$$FloatRegister, __ S, 1, zr);             // tmp[32:63] <- 0
+    __ mov($tmp$$FloatRegister, __ S, 0, $src$$Register); // tmp[ 0:31] <- src


"Where the entire 128-bit wide register is not fully utilized, the vector or scalar quantity is held in the least significant bits of the register, with the most significant bits being cleared to zero on a write."

Suggested change

-    __ mov($tmp$$FloatRegister, __ S, 1, zr);             // tmp[32:63] <- 0
-    __ mov($tmp$$FloatRegister, __ S, 0, $src$$Register); // tmp[ 0:31] <- src
+    __ fmovs($tmp$$FloatRegister, $src$$Register);

should do it.
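
The difference between the two lane writes and a single scalar write can be sketched in Java as well. This is a toy model, assuming fmovs here emits the scalar fmov s, w form described by the quoted sentence; the 128-bit register is modelled as a two-element long array and the helper names are made up.

```java
// Toy model: a 128-bit vector register as {low 64 bits, high 64 bits}.
final class FmovModel {
    // Scalar write (fmov s, w): the low 32 bits get w, every bit above is cleared.
    static long[] fmovS(long[] v, int w) {
        return new long[] { w & 0xffff_ffffL, 0L };
    }

    // Lane write (mov v.s[lane], w) for lane 0 or 1: only that 32-bit lane changes.
    static long[] insertS(long[] v, int lane, int w) {
        int shift = 32 * lane;
        long mask = 0xffff_ffffL << shift;
        long low = (v[0] & ~mask) | ((w & 0xffff_ffffL) << shift);
        return new long[] { low, v[1] };
    }

    public static void main(String[] args) {
        long[] stale = { -1L, -1L };       // v16 full of leftover bits
        int w11 = 0x7654_3210;

        long[] twoStep = insertS(insertS(stale, 1, 0), 0, w11);
        long[] oneStep = fmovS(stale, w11);

        // Both leave the same low 64 bits, which is all that cnt v.8b reads.
        System.out.println(twoStep[0] == oneStep[0]); // true
    }
}
```

In this model the only difference is in the upper 64 bits, which the 8b arrangement never reads.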

theRealAph · 2025-05-31T14:38:23Z

Opinion for people in charge: should I fix the fixVersion in the JBS issue, or wait a bit to integrate?

Get it in 25. Low risk, significant Java compatibility bug.

marc-chevalier added 4 commits May 30, 2025 11:43

Don't change src, set directly in the vector register

6e47c2b

Add test

9635b52

Adapt test

f5c1122

Add randomization

fb8d64d

openjdk bot added the hotspot-compiler label May 30, 2025

marc-chevalier marked this pull request as ready for review May 30, 2025 15:38

openjdk bot added the rfr Pull request is ready for review label May 30, 2025

sendaoYan reviewed May 31, 2025


theRealAph reviewed May 31, 2025

