-
Notifications
You must be signed in to change notification settings - Fork 5.9k
8353266: C2: Wrong execution with Integer.bitCount(int) intrinsic on AArch64 #25551
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
8353266: C2: Wrong execution with Integer.bitCount(int) intrinsic on AArch64 #25551
Conversation
👋 Welcome back mchevalier! A progress list of the required criteria for merging this PR into |
❗ This change is not yet ready to be integrated. |
@marc-chevalier The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
Opinion for people in charge: should I fix the fixVersion in the JBS issue, or wait a bit to integrate? |
I would say yes, change the fixVersion to 25 and try to get this into 25, resulting it one less backport needed. |
Hi, how does this bug was found, seems the original testcase generated by a fuzz tool. |
test(); | ||
if (result != 0xfedc_ba98_7654_3210L) { | ||
// Wrongly outputs the cut input 0x7654_3210 == 1985229328 | ||
throw new RuntimeException("Wrong result. lFld=" + lFld + "; result=" + result); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about:
throw new RuntimeException("Wrong result. Expected result = " + lFld + "; Actual result = " + result);
__ mov($tmp$$FloatRegister, __ S, 1, zr); // tmp[32:63] <- 0 | ||
__ mov($tmp$$FloatRegister, __ S, 0, $src$$Register); // tmp[ 0:31] <- src |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Where the entire 128-bit wide register is not fully utilized, the vector or scalar quantity is held in the least significant bits of the register, with the most significant bits being cleared to zero on a write."
__ mov($tmp$$FloatRegister, __ S, 1, zr); // tmp[32:63] <- 0 | |
__ mov($tmp$$FloatRegister, __ S, 0, $src$$Register); // tmp[ 0:31] <- src | |
__ fmovs($tmp$$FloatRegister, $src$$Register); |
should do it.
Get it in 25. Low risk, significant Java compatibility bug. |
Problem
On Aarch64, using
Integer.bitCount
can modify its argument. The problem comes from the implementation ofpopCountI
on Aarch64. For instance, that's what we get with the reproducerReduced.java
on the related issue:The instruction
mov w11, w11
is used to cut the 32 higher bits ofx11
since we usepopCountI
(fromInteger.bitCount
): on aarch64 (like other architectures), assigning the 32 lower bits of a register reset the 32 higher bits. Short: the input is modified, but the implementation ofpopCountI
doesn't declare it:But then, why resetting the upper word of
x11
? It all starts with vector instructions:The
8b
specifies that it operates on the 8 lower bytes ofv16
, it would be nice to simply use4b
, but that doesn't exist: vector instructions can only work on either the whole 128-bit register, or the 64 lower bits (by blocks of 1, 2, 4, 8 or 16 bytes). There is no suffix (and encoding) for a vector instruction to work only on the 32 lower bits, so not to pollute the bit count, we need to reset the 32 higher bits ofv16.d[0]
(akad16
), that isv16.s[1]
, that isv16[32:63]
in a more bit-explicit notation. Moreover, unlike with general purpose register doingwould set
v16[0:31]
tow11
, but not resetv16[32:63]
. Which makes sense! Otherwise, using vector registers would be impractical if writing any piece would reset the rest... So we indeed need to set all ofv16[0:63]
, whichdoes, but by destroying
x11
.Solution
Simply adding
USE_KILL src
in the effects would be nice, but unfortunately not possible:iRegIorL2I
is an operand class (either a 32-bit register or a L2I of a 64-bit register) and those cannot be used in effect lists.The way I went for is rather not to modify the source, but rather do write the two lower words of
v16
we are interested in separately:Unlike other solutions, this is relatively straightforward as it doesn't write twice the same bits, as for instance, this would:
and it doesn't use additional temporaries, like this would:
Using the zero register rather than an immediate is convenient as it allows to set 32 bits at once, while a 32-bit immediate would not fit in a single instruction.
Format
The printing of this instruction is not very satisfactory. We used to have something that renders in OptoAssembly
This is... somewhat arguable. With context, I can understand or guess what
movw l2i(R29), l2i(R29)
means, but I don't think it's a very nice printout. Also, it's not clear that the second instruction works on the lower word ofV16
. Alas, my new version is not much better:It's not clear that the first instruction is on the 1-indexed word of
V16
while the second is on the 0-indexed word. I couldn't find a nicer example in a similar situation, so I'm open to suggestions! Maybe simply hardcoding it in the format? as such:Not sure what's the best practice here.
Progress
Issue
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/25551/head:pull/25551
$ git checkout pull/25551
Update a local copy of the PR:
$ git checkout pull/25551
$ git pull https://git.openjdk.org/jdk.git pull/25551/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 25551
View PR using the GUI difftool:
$ git pr show -t 25551
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/25551.diff
Using Webrev
Link to Webrev Comment