Skip to content

Optimize VM's String.== implementation #50190

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
mkustermann opened this issue Oct 13, 2022 · 4 comments
Closed

Optimize VM's String.== implementation #50190

mkustermann opened this issue Oct 13, 2022 · 4 comments
Assignees
Labels
area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. P1 A high priority bug; for example, a single project is unusable or has many test failures type-performance Issue relates to performance or code size

Comments

@mkustermann
Copy link
Member

Currently the Dart VM's String equality implementation is based on assembly intrinsics which load & compare each single byte (see asm_intrinsifier_x64.cc):

 
  // Check contents, no fall-through possible.
  // TODO(srdjan): write a faster check.
  __ SmiUntag(RDI);
  __ Bind(&loop);
  __ decq(RDI);
  __ cmpq(RDI, Immediate(0));
  __ j(LESS, &is_true, Assembler::kNearJump);
  if (string_cid == kOneByteStringCid) {
    __ movzxb(RBX, FieldAddress(RAX, RDI, TIMES_1,
                                target::OneByteString::data_offset()));
    __ movzxb(RDX, FieldAddress(RCX, RDI, TIMES_1,
                                target::OneByteString::data_offset()));
  } else if (string_cid == kTwoByteStringCid) {
    __ movzxw(RBX, FieldAddress(RAX, RDI, TIMES_2,
                                target::TwoByteString::data_offset()));
    __ movzxw(RDX, FieldAddress(RCX, RDI, TIMES_2,
                                target::TwoByteString::data_offset()));
  } else {
    UNIMPLEMENTED();
  }
  __ cmpq(RBX, RDX);
  __ j(NOT_EQUAL, &is_false, Assembler::kNearJump);
  __ jmp(&loop, Assembler::kNearJump);

We should consider optimizing this to use word-based loads & compares followed by remainder byte based loads & compares.

We may also consider introducing a MemoryCompareInstr (similar to our existing MemoryCopyInstr).

/cc @aam Maybe interested in picking this up?

@mkustermann mkustermann added area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. type-performance Issue relates to performance or code size labels Oct 13, 2022
@mkustermann
Copy link
Member Author

mkustermann commented Oct 13, 2022

/cc @rakudrama this shows up in dart2js profile (it may indicate some strings would benefit from canonicalization - to more often hit the fast if (identical(this, other)) return true path)

@rakudrama
Copy link
Member

A string fits in a number of words. What is stored in the bytes of the last word beyond the length (or two words with heap alignment)? If it is deterministic and consistent (e.g. zero-fill or part of the bit-pattern for null), the words can be compared for the last word too, leaving no per-character loop.

Increasingly esoteric optimizations:

If the length is stored immediately before the characters, the block comparison can start with the length word, since if the lengths are different, the block comparison will fail before accessing beyond the length of the shorter string. If the 'internal fragmentation' dead bytes are nondeterministic, it might still be better to do an unaligned load of the last characters rather than loop. In a short string, the word load at mem[string+characters_offset+length-8] might underflow the characters, but this would be ok provided the length is stored at the underflow. An alternative to per-character testing or unaligned load is to shift out the uninitialized bytes of the last word, the shift count being a function of the low bits of the byte-length. Maybe there is a SIMD idiom for this.

Strings also have a cached hash-code. If the hash-codes are different it may be because the strings are different, providing an early exit for long strings with a common prefix (e.g. Uri strings), or one of the hash-codes has not been computed. If two strings with different hash-codes are equal it means that one is computed and the other can have the hash-code set by copying from the string with the initialized hash-code.

@mkustermann If it is easy, can you add the call stack where you saw this in the dart2js profile?

@aam
Copy link
Contributor

aam commented Oct 13, 2022

thanks @mkustermann , sounds good

@mkustermann
Copy link
Member Author

@mkustermann If it is easy, can you add the call stack where you saw this in the dart2js profile?

From profile of dart2js's global analysis phase:

- 1.61%  DartWorker       [.] String.== (#4)                                                                                                                                                                                       ▒
   - 1.39% String.== (#4)                                                                                                                                                                                                             ▒
      - 0.91% PrivateName.matches                                                                                                                                                                                                     ▒
         - Selector.appliesUnnamed                                                                                                                                                                                                    ▒
            - 0.90% FunctionSetNode.query                                                                                                                                                                                             ▒
                 FunctionSet.query                                                                                                                                                                                                    ▒
                 FunctionSet.filter                                                                                                                                                                                                   ▒
                 JsClosedWorld.locateMembersInDomain                                                                                                                                                                                  ▒
               + JsClosedWorld.locateMembers                                                                                                                                                                                          ▒
      - 0.24% _Uri.port                                                                                                                                                                                                               ▒
         - _Uri.==                                                                                                                                                                                                                    ▒
            - 0.23% PrivateName.matches                                                                                                                                                                                               ▒
               - Selector.appliesUnnamed                                                                                                                                                                                              ▒
                  + 0.23% FunctionSetNode.query    

(Not sure why it shows _Uri.port - may be misattributed somwhow)

@aam aam self-assigned this Oct 28, 2022
@aam aam added the P1 A high priority bug; for example, a single project is unusable or has many test failures label Nov 1, 2022
copybara-service bot pushed a commit that referenced this issue Nov 11, 2022
…arison performance.

BUG=#50190
TEST=ci

Change-Id: I1cb93455283b19cf1a712132920b7d3e1dabcd8a
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/269380
Commit-Queue: Alexander Aprelev <[email protected]>
Reviewed-by: Siva Annamalai <[email protected]>
copybara-service bot pushed a commit that referenced this issue Nov 16, 2022
Follow-up to address comments on https://dart-review.git.corp.google.com/c/sdk/+/269380

BUG=#50190
TEST=ci

Change-Id: I5a979a4504e205f493ee71c74d405f1b65246781
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/269602
Reviewed-by: Stephen Adams <[email protected]>
Commit-Queue: Alexander Aprelev <[email protected]>
copybara-service bot pushed a commit that referenced this issue Nov 21, 2022
Bug: #50190
Change-Id: If9d350622217100a6882c10978e4434d51f4fba3
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/271081
Commit-Queue: Jonas Termansen <[email protected]>
Reviewed-by: Jonas Termansen <[email protected]>
Auto-Submit: William Hesse <[email protected]>
copybara-service bot pushed a commit that referenced this issue Nov 29, 2022
Ensure padding around data and length fields for these strings is zeroed-out, allowing for comparison to be done in word-sized chunks.
For configurations where the hash is not in the header, swap the length and the hash in string class so that the length is adjacent to the data.

x64
===
LongStringCompare.2.3000reps 109.2%
LongStringCompare.32.1000reps 526.4%
LongStringCompare.1024.30reps 670.1%
===

ia32
===
LongStringCompare.2.3000reps 90.68%
LongStringCompare.32.1000reps 274.5%
LongStringCompare.1024.30reps 211.4%
===

arm64
===
LongStringCompare.2.3000reps 89.28%
LongStringCompare.32.1000reps 454.5%
LongStringCompare.1024.30reps 677.7%
===

arm
===
LongStringCompare.2.3000reps 68.76%
LongStringCompare.32.1000reps 330.9%
LongStringCompare.1024.30reps 426.8%
===


BUG=#50190
TEST=string_equals_test

Change-Id: Ia4ab9bcb4daca5d4f403e0ece364bb6aafb68577
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/269600
Commit-Queue: Alexander Aprelev <[email protected]>
Reviewed-by: Ryan Macnak <[email protected]>
@aam aam closed this as completed Nov 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. P1 A high priority bug; for example, a single project is unusable or has many test failures type-performance Issue relates to performance or code size
Projects
None yet
Development

No branches or pull requests

4 participants