Fix reference processing crash #309

wks · 2025-05-09T10:08:05Z

Fixed a bug that when enqueuing references, it swapped the reference pending list with the last node of the new linked list. It should swap with the first node instead.

Worked around a fact that the list given by the method ReferenceGlue::enqueue_references may contain duplicated elements. This is because the reference processor in mmtk-core was designed to work with both of the two use patterns: (1) registering references when created, and (2) discovering references while tracing. Therefore, the reference processor kept the list of Reference instances discovered in the previous GC (in the from-space). But as the OpenJDK binding scans new Reference instances, it will add to-space addresses of the same Reference instances to the reference processor. After the references are processed, both pointers will be normalized to the two-space copy, but as two different entries. We work around this problem in the OpenJDK binding by simply skipping visited entries.

Fixed a bug that when enqueuing references, it swapped the reference pending list with the last node of the new linked list. It should swap with the *first* node instead. Worked around a fact that the list given by the method `ReferenceGlue::enqueue_references` may contain duplicated elements. This is because the reference processor in mmtk-core was designed to work with both of the two use patterns: (1) registering references when created, and (2) discovering references while tracing. Therefore, the reference processor kept the list of Reference instances discovered in the previous GC (in the from-space). But as the OpenJDK binding scans new Reference instances, it will add to-space addresses of the same Reference instances to the reference processor. After the references are processed, both pointers will be normalized to the two-space copy, but as two different entries. We work around this problem in the OpenJDK binding by simply skipping visited entries.

wks · 2025-05-09T10:09:25Z

An interesting side effect of this PR is that lusearch in DaCapo Chopin runs faster with reference processing enabled.

Edit: This is not always true. If I increase the heap size, GenImmix will become slower with ref proc. But StickyImmix is still faster with ref proc. The jython benchmark is the opposite: it is always faster without ref proc.

wks@luna ~/o/dacapo> MMTK_NO_REFERENCE_TYPES=false MMTK_PLAN=GenImmix taskset -c 0-15 ~/projects/mmtk-github/parallels/fix/refproc-discover/openjdk/build/linux-x86_64-normal-server-release/jdk/bin/java -XX:+UseThirdPartyHeap --add-exports java.base/jdk.internal.ref=ALL-UNNAMED -Djava.library.path=$HOME/projects/mmtk-github/probes/out -Dprobes=RustMMTk -cp $HOME/projects/mmtk-github/probes/out/probes.jar:dacapo-23.11-MR2-chopin.jar -Xm{s,x}33M Harness -c probe.DacapoChopinCallback lusearch
Using scaled threading model. 16 processors detected, 16 threads used to drive the workload, in a possible range of [1,2048]
Version: lucene 9.7.0 (use -p to print nominal benchmark stats)
===== DaCapo 23.11-MR2-chopin lusearch starting =====
Starting 524288 requests...
Completing query batches: 100%
Completed requests
============================ MMTk Statistics Totals ============================
GC      time.other      time.stw        majorGC total-work.count        total-work.time.total   total-work.time.max     total-work.time.min
2884    2771.51 1827.13 450     0       0.000   -inf    inf
Total time: 4598.65 ms
------------------------------ End MMTk Statistics -----------------------------
===== DaCapo 23.11-MR2-chopin lusearch PASSED in 4598 msec =====
===== DaCapo processed 524288 requests in 4595 msec, 114099 requests per second =====
===== DaCapo tail latency, simple: 50% 30 usec, 90% 273 usec, 99% 1473 usec, 99.9% 4298 usec, 99.99% 17655 usec, max 52813 usec, measured over 524288 events =====
===== DaCapo tail latency, metered 100ms smoothing: 50% 216 usec, 90% 1857 usec, 99% 8366 usec, 99.9% 23957 usec, 99.99% 36975 usec, max 61184 usec, measured over 524288 events =====
===== DaCapo tail latency, metered full smoothing: 50% 234338 usec, 90% 358797 usec, 99% 373417 usec, 99.9% 376738 usec, 99.99% 377318 usec, max 382308 usec, measured over 524288 events =====
wks@luna ~/o/dacapo> MMTK_NO_REFERENCE_TYPES=true MMTK_PLAN=GenImmix taskset -c 0-15 ~/projects/mmtk-github/parallels/fix/refproc-discover/openjdk/build/linux-x86_64-normal-server-release/jdk/bin/java -XX:+UseThirdPartyHeap --add-exports java.base/jdk.internal.ref=ALL-UNNAMED -Djava.library.path=$HOME/projects/mmtk-github/probes/out -Dprobes=RustMMTk -cp $HOME/projects/mmtk-github/probes/out/probes.jar:dacapo-23.11-MR2-chopin.jar -Xm{s,x}33M Harness -c probe.DacapoChopinCallback lusearch
Using scaled threading model. 16 processors detected, 16 threads used to drive the workload, in a possible range of [1,2048]
Version: lucene 9.7.0 (use -p to print nominal benchmark stats)
===== DaCapo 23.11-MR2-chopin lusearch starting =====
Starting 524288 requests...
Completing query batches: 100%
Completed requests
============================ MMTk Statistics Totals ============================
GC      time.other      time.stw        majorGC total-work.count        total-work.time.total   total-work.time.max     total-work.time.min
3068    2864.54 2866.29 1941    0       0.000   -inf    inf
Total time: 5730.83 ms
------------------------------ End MMTk Statistics -----------------------------
===== DaCapo 23.11-MR2-chopin lusearch PASSED in 5731 msec =====
===== DaCapo processed 524288 requests in 5727 msec, 91546 requests per second =====
===== DaCapo tail latency, simple: 50% 32 usec, 90% 298 usec, 99% 1653 usec, 99.9% 4437 usec, 99.99% 17297 usec, max 47304 usec, measured over 524288 events =====
===== DaCapo tail latency, metered 100ms smoothing: 50% 99 usec, 90% 1670 usec, 99% 6810 usec, 99.9% 26532 usec, 99.99% 35641 usec, max 50804 usec, measured over 524288 events =====
===== DaCapo tail latency, metered full smoothing: 50% 50 usec, 90% 231101 usec, 99% 304568 usec, 99.9% 306834 usec, 99.99% 309198 usec, max 326663 usec, measured over 524288 events =====

Reference processing tests now incclude all test cases in `ci-test-only-normal.sh`.

wks · 2025-05-12T06:23:29Z

@qinsoon I marked this as ready. Since it is a correctness fix, and we plan to move to OpenJDK's own ref processing, I'll not do detailed performance analysis now. It should have no performance impact if reference processing is not enabled. This PR does not change the default value of the option MMTK_NO_REFERENCE_TYPES.

qinsoon

LGTM. Thanks for the fix.

Run the same tests with and without ref proc

cb63c2f

Reference processing tests now incclude all test cases in `ci-test-only-normal.sh`.

wks marked this pull request as ready for review May 12, 2025 06:18

wks requested a review from qinsoon May 12, 2025 06:19

qinsoon approved these changes May 12, 2025

View reviewed changes

wks merged commit cf7879f into mmtk:master May 12, 2025
56 of 57 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix reference processing crash #309

Fix reference processing crash #309

Uh oh!

wks commented May 9, 2025

Uh oh!

wks commented May 9, 2025 •

edited

Loading

Uh oh!

wks commented May 12, 2025

Uh oh!

qinsoon left a comment

Uh oh!

Uh oh!

Uh oh!

Fix reference processing crash #309

Fix reference processing crash #309

Uh oh!

Conversation

wks commented May 9, 2025

Uh oh!

wks commented May 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

wks commented May 12, 2025

Uh oh!

qinsoon left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

wks commented May 9, 2025 •

edited

Loading