[benchmark] Janitor Duty: Sweep Quadratic #22673

Merged
merged 3 commits into from
Feb 20, 2019

palimondo
Contributor

@palimondo commented Feb 16, 2019

To enable robust performance measurements by minimizing the accumulated error, this splits the composite tests from DictionaryCopy and DictionaryFilter into smaller individual benchmarks by dictionary size. These benchmarks guard against quadratic behavior (SR-3268). It also removes the older, disabled HashQuadratic benchmark, which covered the same issue.
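A minimal sketch of what the two split benchmark bodies measure, assuming `Int` keys and values and assuming "k" means 1024 elements. `makeDictionary`, `copyKeyValue` and `filterAllMatch` are hypothetical names; the real suite registers each size as its own benchmark through its harness rather than looping over sizes like this:

```swift
/// Hypothetical setup: a dictionary whose keys and values are 0..<count.
func makeDictionary(count: Int) -> [Int: Int] {
    var dictionary: [Int: Int] = [:]
    for i in 0..<count { dictionary[i] = i }
    return dictionary
}

/// Copies every key/value pair, in the source's iteration order, into a
/// fresh dictionary that starts empty and grows as it is filled.
@inline(never)
func copyKeyValue(_ source: [Int: Int]) -> [Int: Int] {
    var copy: [Int: Int] = [:]
    for (key, value) in source {
        copy[key] = value
    }
    return copy
}

/// Filters with a predicate that keeps every element.
@inline(never)
func filterAllMatch(_ source: [Int: Int]) -> [Int: Int] {
    return source.filter { _ in true }
}

// One benchmark per size instead of one composite run over all sizes.
// "k" is assumed to mean 1024 elements here.
let sizes = [16, 20, 24, 28].map { $0 << 10 }

for size in sizes {
    let source = makeDictionary(count: size)
    precondition(copyKeyValue(source) == source)
    precondition(filterAllMatch(source) == source)
    print("Dict.CopyKeyValue.\(size >> 10)k and Dict.FilterAllMatch.\(size >> 10)k: \(size) elements")
}
```

Giving each size its own benchmark means a spike at a single size shows up as its own row instead of being averaged into one long composite measurement.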

[Image: quadratictests]

`HashQuadratic` was obsoleted by `DictionaryCopy`.
Split the composite tests from `DictionaryCopy` and `DictionaryFilter` into individual benchmarks by dictionary size. Lowered the workloads so they run faster, which gives more stable results.
@palimondo force-pushed the a-tall-white-fountain-played branch from bfd3c74 to 8a19a98 on February 17, 2019 06:11
@palimondo
Contributor Author

palimondo commented Feb 17, 2019

@swift-ci please benchmark

Running with 8a19a985c61cce4dd1964b522049b6204f0c41e4 ([DNM] Disable randomized hashSeed (test Quadratic)), a commit that turns off hashSeed randomization, to demonstrate that the rewritten tests are still able to catch the regression.

@swiftlang deleted a comment from swift-ci Feb 17, 2019
@swift-ci
Contributor

Performance: -O

TEST OLD NEW DELTA RATIO
Regression
SetSubtractingInt100 151 167 +10.6% 0.90x (?)
ObjectiveCBridgeFromNSDictionaryAnyObjectToString 74000 80500 +8.8% 0.92x (?)
StringHashing_ascii 39 42 +7.7% 0.93x (?)
Improvement
ObjectiveCBridgeStubFromNSDateRef 5050 4190 -17.0% 1.21x (?)
ObjectiveCBridgeStubFromNSString 1020 903 -11.5% 1.13x (?)
ObjectiveCBridgeStubFromArrayOfNSString2 3730 3410 -8.6% 1.09x (?)
DictionaryRemove 3280 3030 -7.6% 1.08x (?)
Added (MIN MAX MEAN, μs)
Dict.CopyKeyValue.16k 1137 1529 1268
Dict.CopyKeyValue.20k 3644 3717 3679
Dict.CopyKeyValue.24k 1252 1252 1252
Dict.CopyKeyValue.28k 1863 2376 2034
Dict.FilterAllMatch.16k 711 711 711
Dict.FilterAllMatch.20k 849 851 850
Dict.FilterAllMatch.24k 843 861 853
Dict.FilterAllMatch.28k 1333 1337 1335
Removed (MIN MAX MEAN, μs)
DictionaryCopy 53497 58134 55079
DictionaryFilter 44472 44768 44581

Code size: -O

TEST OLD NEW DELTA RATIO
Regression
DictionaryCopy.o 7885 10182 +29.1% 0.77x

Performance: -Osize

TEST OLD NEW DELTA RATIO
Regression
SetIsSubsetBox0 362 462 +27.6% 0.78x
ObjectiveCBridgeFromNSSetAnyObjectToStringForced 96000 110500 +15.1% 0.87x (?)
SetSubtractingInt100 159 183 +15.1% 0.87x
ObjectiveCBridgeFromNSDictionaryAnyObject 40300 45700 +13.4% 0.88x (?)
SetSubtractingInt0 73 81 +11.0% 0.90x (?)
ObjectiveCBridgeFromNSSetAnyObject 53400 59200 +10.9% 0.90x (?)
ObjectiveCBridgeFromNSDictionaryAnyObjectToString 73500 80500 +9.5% 0.91x (?)
ObjectiveCBridgeFromNSDictionaryAnyObjectToStringForced 80000 87500 +9.4% 0.91x (?)
SetSubtractingInt25 101 110 +8.9% 0.92x (?)
SetSymmetricDifferenceInt100 220 239 +8.6% 0.92x (?)
ObjectiveCBridgeStubNSDateRefAccess 343 371 +8.2% 0.92x (?)
Improvement
Set.isDisjoint.Box25 713 610 -14.4% 1.17x
DictionaryRemove 5790 5340 -7.8% 1.08x (?)
Added (MIN MAX MEAN, μs)
Dict.CopyKeyValue.16k 1695 2069 1820
Dict.CopyKeyValue.20k 4302 4365 4329
Dict.CopyKeyValue.24k 2057 2074 2067
Dict.CopyKeyValue.28k 2920 3426 3102
Dict.FilterAllMatch.16k 733 734 734
Dict.FilterAllMatch.20k 871 873 872
Dict.FilterAllMatch.24k 871 888 881
Dict.FilterAllMatch.28k 1380 1382 1381
Removed (MIN MAX MEAN, μs)
DictionaryCopy 73998 79288 75766
DictionaryFilter 46360 46391 46373

Code size: -Osize

TEST OLD NEW DELTA RATIO
Regression
DictionaryCopy.o 6177 7958 +28.8% 0.78x

Performance: -Onone

TEST OLD NEW DELTA RATIO
Regression
SetIsSubsetBox0 879 1715 +95.1% 0.51x
SetIsSubsetInt0 633 733 +15.8% 0.86x (?)
Improvement
Set.isDisjoint.Box25 2611 2040 -21.9% 1.28x
ObjectiveCBridgeStubFromNSDateRef 5480 4510 -17.7% 1.22x (?)
DictionarySwapOfObjects 22200 18800 -15.3% 1.18x (?)
DictionarySwapAtOfObjects 17600 14920 -15.2% 1.18x (?)
ObjectiveCBridgeStubFromNSDate 7470 6450 -13.7% 1.16x (?)
SetExclusiveOr_OfObjects 37210 33090 -11.1% 1.12x
SetSymmetricDifferenceBox0 3721 3311 -11.0% 1.12x
DictionarySwap 4780 4352 -9.0% 1.10x (?)
SetUnion_OfObjects 25940 23910 -7.8% 1.08x (?)
SetUnionBox0 2594 2392 -7.8% 1.08x (?)
Added (MIN MAX MEAN, μs)
Dict.CopyKeyValue.16k 6090 6692 6294
Dict.CopyKeyValue.20k 26754 27239 26962
Dict.CopyKeyValue.24k 6917 7033 6956
Dict.CopyKeyValue.28k 8536 9281 8821
Dict.FilterAllMatch.16k 4733 4862 4777
Dict.FilterAllMatch.20k 24734 24899 24808
Dict.FilterAllMatch.24k 4824 4901 4853
Dict.FilterAllMatch.28k 6166 6198 6178
Removed (MIN MAX MEAN, μs)
DictionaryCopy 266292 271700 268214
DictionaryFilter 214044 216822 214972
Benchmark Check Report
⚠️ Dict.FilterAllMatch.28k execution took at least 1333 μs.
Decrease the workload of Dict.FilterAllMatch.28k by a factor of 2 (10), to be less than 1000 μs.
⚠️Ⓜ️ Dict.FilterAllMatch.28k has very wide range of memory used between independent, repeated measurements.
Dict.FilterAllMatch.28k mem_pages [i1, i2]: min=[757, 756] 𝚫=1 R=[39, 41]
⚠️ Dict.CopyKeyValue.20k execution took at least 3640 μs.
Decrease the workload of Dict.CopyKeyValue.20k by a factor of 4 (10), to be less than 1000 μs.
⚠️Ⓜ️ Dict.CopyKeyValue.20k has very wide range of memory used between independent, repeated measurements.
Dict.CopyKeyValue.20k mem_pages [i1, i2]: min=[368, 368] 𝚫=0 R=[78, 78]
⚠️ Dict.CopyKeyValue.16k execution took at least 1119 μs.
Decrease the workload of Dict.CopyKeyValue.16k by a factor of 2 (10), to be less than 1000 μs.
⚠️Ⓜ️ Dict.CopyKeyValue.16k has very wide range of memory used between independent, repeated measurements.
Dict.CopyKeyValue.16k mem_pages [i1, i2]: min=[369, 369] 𝚫=0 R=[39, 40]
⚠️ Dict.CopyKeyValue.28k execution took at least 1860 μs.
Decrease the workload of Dict.CopyKeyValue.28k by a factor of 2 (10), to be less than 1000 μs.
⚠️Ⓜ️ Dict.CopyKeyValue.28k has very wide range of memory used between independent, repeated measurements.
Dict.CopyKeyValue.28k mem_pages [i1, i2]: min=[756, 756] 𝚫=0 R=[39, 0]
⚠️Ⓜ️ Dict.FilterAllMatch.24k has very wide range of memory used between independent, repeated measurements.
Dict.FilterAllMatch.24k mem_pages [i1, i2]: min=[369, 369] 𝚫=0 R=[39, 0]
⚠️ Dict.CopyKeyValue.24k execution took at least 1247 μs.
Decrease the workload of Dict.CopyKeyValue.24k by a factor of 2 (10), to be less than 1000 μs.
⚠️Ⓜ️ Dict.CopyKeyValue.24k has very wide range of memory used between independent, repeated measurements.
Dict.CopyKeyValue.24k mem_pages [i1, i2]: min=[368, 368] 𝚫=0 R=[39, 39]
⚠️Ⓜ️ Dict.FilterAllMatch.20k has very wide range of memory used between independent, repeated measurements.
Dict.FilterAllMatch.20k mem_pages [i1, i2]: min=[369, 369] 𝚫=0 R=[39, 39]
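The suggested factors in this report are consistent with taking the smallest integer power of 2 (and, for the number in parentheses, of 10) that brings the measured runtime under the 1000 μs target: 1333 μs needs a factor of 2, while 3640 μs needs 4. A small Swift sketch of that arithmetic; the function name is hypothetical, and it reproduces the numbers above rather than the checker's actual source:

```swift
import Foundation

/// Smallest integer power of `base` that, dividing `runtime`, brings it to or
/// below `threshold` (both in μs). Returns 1 when no reduction is needed.
func suggestedWorkloadFactor(runtime: Double, threshold: Double = 1_000, base: Double) -> Int {
    guard runtime > threshold else { return 1 }
    let exponent = (log(runtime / threshold) / log(base)).rounded(.up)
    return Int(pow(base, exponent))
}

// Runtimes copied from the report above.
for (name, runtime) in [("Dict.FilterAllMatch.28k", 1333.0), ("Dict.CopyKeyValue.20k", 3640.0)] {
    let by2 = suggestedWorkloadFactor(runtime: runtime, base: 2)
    let by10 = suggestedWorkloadFactor(runtime: runtime, base: 10)
    print("\(name): decrease the workload by a factor of \(by2) (\(by10))")
}
```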
How to read the data: The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).
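The DELTA and RATIO columns can be reproduced from OLD and NEW: DELTA is (NEW − OLD) / OLD and RATIO is OLD / NEW, so `SetSubtractingInt100` going from 151 to 167 gives +10.6% and 0.90x. A minimal Swift sketch of that arithmetic and of the 8% (performance) and 1% (code size) cut-offs; the type and function names are hypothetical, and this is not the bot's actual implementation:

```swift
/// One row of a comparison table: OLD and NEW values for a single test.
struct Comparison {
    let name: String
    let old: Double
    let new: Double

    /// Relative change in percent; positive means NEW is larger (slower or bigger).
    var deltaPercent: Double { (new - old) / old * 100 }

    /// The RATIO column: values above 1 mean NEW is smaller (faster or smaller).
    var ratio: Double { old / new }
}

enum Verdict {
    case regression, improvement, belowThreshold
}

/// Classifies a row against a symmetric percentage threshold
/// (8 for run times, 1 for code size in these reports).
func classify(_ row: Comparison, thresholdPercent: Double) -> Verdict {
    if row.deltaPercent > thresholdPercent { return .regression }
    if row.deltaPercent < -thresholdPercent { return .improvement }
    return .belowThreshold
}

let runtime = Comparison(name: "SetSubtractingInt100", old: 151, new: 167)
let codeSize = Comparison(name: "DictionaryCopy.o", old: 7885, new: 10182)
print(runtime.name, classify(runtime, thresholdPercent: 8))
print(codeSize.name, classify(codeSize, thresholdPercent: 1))
```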

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

@palimondo force-pushed the a-tall-white-fountain-played branch from 8a19a98 to 8e67642 on February 17, 2019 07:12
@palimondo
Contributor Author

@swift-ci benchmark

@palimondo
Contributor Author

@swift-ci please smoke test

@swift-ci
Contributor

Performance: -O

TEST OLD NEW DELTA RATIO
Regression
DataAppendDataLargeToLarge 38600 51200 +32.6% 0.75x (?)
Improvement
StringBuilderLong 1370 1230 -10.2% 1.11x (?)
Added (MIN MAX MEAN, μs)
Dict.CopyKeyValue.16k 990 1387 1123
Dict.CopyKeyValue.20k 1106 1106 1106
Dict.CopyKeyValue.24k 1245 1246 1246
Dict.CopyKeyValue.28k 1893 2403 2064
Dict.FilterAllMatch.16k 715 715 715
Dict.FilterAllMatch.20k 781 782 781
Dict.FilterAllMatch.24k 848 849 848
Dict.FilterAllMatch.28k 1394 1440 1410
Removed (MIN MAX MEAN, μs)
DictionaryCopy 53472 58353 55193
DictionaryFilter 44538 44639 44583

Code size: -O

TEST OLD NEW DELTA RATIO
Regression
DictionaryCopy.o 7885 10182 +29.1% 0.77x

Performance: -Osize

TEST OLD NEW DELTA RATIO
Regression
SetSubtractingInt25 102 111 +8.8% 0.92x (?)
ObjectiveCBridgeStubNSDateRefAccess 343 371 +8.2% 0.92x (?)
Improvement
DataAppendDataLargeToLarge 51000 37400 -26.7% 1.36x (?)
Added (MIN MAX MEAN, μs)
Dict.CopyKeyValue.16k 1587 1961 1717
Dict.CopyKeyValue.20k 1745 1762 1754
Dict.CopyKeyValue.24k 1946 1967 1960
Dict.CopyKeyValue.28k 3050 3572 3225
Dict.FilterAllMatch.16k 742 745 744
Dict.FilterAllMatch.20k 811 812 811
Dict.FilterAllMatch.24k 879 887 882
Dict.FilterAllMatch.28k 1449 1464 1455
Removed (MIN MAX MEAN, μs)
DictionaryCopy 73938 78856 75645
DictionaryFilter 46272 46457 46371

Code size: -Osize

TEST OLD NEW DELTA RATIO
Regression
DictionaryCopy.o 6177 7958 +28.8% 0.78x

Performance: -Onone

TEST OLD NEW DELTA RATIO
Improvement
ObjectiveCBridgeStubFromNSDateRef 5480 4510 -17.7% 1.22x (?)
ObjectiveCBridgeStubFromNSDate 7460 6450 -13.5% 1.16x (?)
Added (MIN MAX MEAN, μs)
Dict.CopyKeyValue.16k 4662 5314 4881
Dict.CopyKeyValue.20k 5548 5683 5636
Dict.CopyKeyValue.24k 6502 6699 6570
Dict.CopyKeyValue.28k 8619 9410 8884
Dict.FilterAllMatch.16k 3235 3280 3252
Dict.FilterAllMatch.20k 3774 3816 3788
Dict.FilterAllMatch.24k 4376 4424 4392
Dict.FilterAllMatch.28k 6096 6120 6110
Removed (MIN MAX MEAN, μs)
DictionaryCopy 297362 305285 300035
DictionaryFilter 214145 214664 214334
Benchmark Check Report
⚠️ Dict.FilterAllMatch.28k execution took at least 1392 μs.
Decrease the workload of Dict.FilterAllMatch.28k by a factor of 2 (10), to be less than 1000 μs.
⚠️Ⓜ️ Dict.FilterAllMatch.28k has very wide range of memory used between independent, repeated measurements.
Dict.FilterAllMatch.28k mem_pages [i1, i2]: min=[757, 757] 𝚫=0 R=[39, 39]
⚠️ Dict.CopyKeyValue.20k execution took at least 1105 μs.
Decrease the workload of Dict.CopyKeyValue.20k by a factor of 2 (10), to be less than 1000 μs.
⚠️Ⓜ️ Dict.CopyKeyValue.20k has very wide range of memory used between independent, repeated measurements.
Dict.CopyKeyValue.20k mem_pages [i1, i2]: min=[368, 368] 𝚫=0 R=[39, 40]
⚠️Ⓜ️ Dict.CopyKeyValue.16k has very wide range of memory used between independent, repeated measurements.
Dict.CopyKeyValue.16k mem_pages [i1, i2]: min=[369, 368] 𝚫=1 R=[39, 40]
⚠️ Dict.CopyKeyValue.28k execution took at least 1888 μs.
Decrease the workload of Dict.CopyKeyValue.28k by a factor of 2 (10), to be less than 1000 μs.
⚠️Ⓜ️ Dict.CopyKeyValue.28k has very wide range of memory used between independent, repeated measurements.
Dict.CopyKeyValue.28k mem_pages [i1, i2]: min=[756, 755] 𝚫=1 R=[39, 40]
⚠️Ⓜ️ Dict.FilterAllMatch.24k has very wide range of memory used between independent, repeated measurements.
Dict.FilterAllMatch.24k mem_pages [i1, i2]: min=[369, 369] 𝚫=0 R=[0, 39]
⚠️Ⓜ️ Dict.FilterAllMatch.16k has very wide range of memory used between independent, repeated measurements.
Dict.FilterAllMatch.16k mem_pages [i1, i2]: min=[369, 369] 𝚫=0 R=[39, 0]
⚠️ Dict.CopyKeyValue.24k execution took at least 1244 μs.
Decrease the workload of Dict.CopyKeyValue.24k by a factor of 2 (10), to be less than 1000 μs.
⚠️Ⓜ️ Dict.CopyKeyValue.24k has very wide range of memory used between independent, repeated measurements.
Dict.CopyKeyValue.24k mem_pages [i1, i2]: min=[368, 368] 𝚫=0 R=[39, 39]
⚠️Ⓜ️ Dict.FilterAllMatch.20k has very wide range of memory used between independent, repeated measurements.
Dict.FilterAllMatch.20k mem_pages [i1, i2]: min=[369, 369] 𝚫=0 R=[0, 39]
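The `mem_pages` lines appear to report, for the series measured with one and with two iterations (`i1`, `i2`), the minimum number of memory pages used, the difference between those two minimums (𝚫), and the range (max − min) within each series (R); the warnings fire on ranges such as 39 pages. A sketch of a check along these lines, where the interpretation, the 15-page tolerance, the footprint-shift check and all names are assumptions rather than the checker's documented behavior:

```swift
/// Memory use (in pages) over repeated measurements of one series.
struct PageSeries {
    let minPages: Int
    let maxPages: Int
    var range: Int { maxPages - minPages }
}

/// Assumed tolerance for run-to-run variation, in pages (hypothetical value).
let normalRangePages = 15

/// Warnings a checker along these lines might emit for series measured
/// with one (`i1`) and two (`i2`) iterations.
func memoryWarnings(name: String, i1: PageSeries, i2: PageSeries) -> [String] {
    var warnings: [String] = []
    let delta = abs(i1.minPages - i2.minPages)
    if delta > max(i1.range, i2.range, normalRangePages) {
        warnings.append("\(name) shifts its base memory footprint with the iteration count.")
    }
    if max(i1.range, i2.range) > normalRangePages {
        warnings.append("\(name) has very wide range of memory used between independent, repeated measurements.")
    }
    return warnings
}

/// Formats a summary line in the same shape as the report above.
func summary(name: String, i1: PageSeries, i2: PageSeries) -> String {
    let delta = abs(i1.minPages - i2.minPages)
    return "\(name) mem_pages [i1, i2]: min=[\(i1.minPages), \(i2.minPages)] 𝚫=\(delta) R=[\(i1.range), \(i2.range)]"
}

// Values matching the Dict.FilterAllMatch.28k lines above: min=[757, 757], R=[39, 39].
let oneIteration = PageSeries(minPages: 757, maxPages: 796)
let twoIterations = PageSeries(minPages: 757, maxPages: 796)
memoryWarnings(name: "Dict.FilterAllMatch.28k", i1: oneIteration, i2: twoIterations).forEach { print($0) }
print(summary(name: "Dict.FilterAllMatch.28k", i1: oneIteration, i2: twoIterations))
```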

Change the test to work after removal of `HashQuadratic`.
@palimondo
Contributor Author

palimondo commented Feb 18, 2019

@swift-ci please test

Running full tests to verify the adjusted Benchmark_O tests pass after the removal of HashQuadratic.

@palimondo changed the title from [benchmark] Janitor Duty: Swipe Quadratic to [benchmark] Janitor Duty: Sweep Quadratic Feb 18, 2019
@palimondo
Contributor Author

@eeckstein @lorentey Please review 🙏

@swift-ci

This comment has been minimized.

@palimondo
Contributor Author

@swift-ci smoke test linux platform

@palimondo
Contributor Author

palimondo commented Feb 18, 2019

@eeckstein I believe this is now mergeable. The full test that runs Benchmark_O.test.md passed on macOS, and the previously broken Linux smoke test (unrelated to this PR) now passes too. A full test is not required on Linux, because the benchmark validation does not run there.

As for the new benchmarks, the first benchmark run (with hashSeed randomization disabled) demonstrates how the 20k variants in particular still catch the quadratic behavior, even though they run in substantially less time than the 2 removed benchmarks. Many of these did not make it under 1000 μs, but that's OK: in the second -O run they are all below 2000 μs, 5 times lower than the 10 ms (10 000 μs) scheduler quantum. This makes them much more robust against system noise than before.
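The arithmetic behind this argument, checked against the MIN column of the second -O run (after the force-push to 8e67642): four benchmarks exceed the 1000 μs target, none reaches 2000 μs, and the slowest (1893 μs) still fits about five times into a 10 ms quantum. A small Swift sketch; reading "10k" as 10 000 μs is an assumption, and the names are hypothetical:

```swift
/// One benchmark's MIN time (μs) from the second -O run's "Added" table.
struct Sample {
    let name: String
    let minMicros: Double
}

let runtimeTarget = 1_000.0     // μs, the target used by the Benchmark Check Report
let schedulerQuantum = 10_000.0 // μs, reading "10k" as a 10 ms quantum (assumption)

let secondRunO = [
    Sample(name: "Dict.CopyKeyValue.16k", minMicros: 990),
    Sample(name: "Dict.CopyKeyValue.20k", minMicros: 1106),
    Sample(name: "Dict.CopyKeyValue.24k", minMicros: 1245),
    Sample(name: "Dict.CopyKeyValue.28k", minMicros: 1893),
    Sample(name: "Dict.FilterAllMatch.16k", minMicros: 715),
    Sample(name: "Dict.FilterAllMatch.20k", minMicros: 781),
    Sample(name: "Dict.FilterAllMatch.24k", minMicros: 848),
    Sample(name: "Dict.FilterAllMatch.28k", minMicros: 1394),
]

let overTarget = secondRunO.filter { $0.minMicros > runtimeTarget }.map { $0.name }
let slowest = secondRunO.map { $0.minMicros }.max() ?? 0
// How many of the slowest samples would fit into one scheduler quantum.
let headroom = schedulerQuantum / slowest

print("Above \(runtimeTarget) μs: \(overTarget)")
print("Slowest: \(slowest) μs, about \(Int(headroom))x shorter than the quantum")
```

The four names above the target are the same four that the second Benchmark Check Report flags for runtime.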

@eeckstein
Contributor

I'll let @lorentey review this

Member

@lorentey left a comment

The code reorganization looks good to me! I don't much mind losing the old variants here -- although I guess they did have some value in release-to-release comparisons, I expect we have other benchmarks for insertion at least.

I have one question about the results, though: If I read the benchmark results correctly, the Quadratic Surprise doesn't show up in the filter benchmarks. I can see the bump in Dict.CopyKeyValue.20k, but there is no corresponding increase for Dict.FilterAllMatch.20k -- all the filter results seem to be roughly the same.

The implementation of filter will change at some point soon, but I'm wondering if 16k--28k is large enough to trigger the issue.

@palimondo
Contributor Author

I have also noticed that the filter's response has been much flatter, but the quadratic behavior is still fully visible in the -Onone build. In optimized builds, the filter has had a less pronounced peak at the low end since PR #19213, when you switched it to work directly with the native dictionary.
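One way to check this exchange against the numbers already posted: normalize each MIN time by dictionary size and flag any size whose per-element cost is more than twice the cheapest of the other sizes. This is a crude, hypothetical heuristic (`spikes(in:factor:)` is not part of the benchmark tooling), applied to the -O and -Onone rows from the first run, which had hashSeed randomization disabled, and to the -O copy rows from the second run:

```swift
/// One row of an "Added" table: dictionary size (in k elements) and MIN time in μs.
struct Point {
    let sizeK: Int
    let micros: Double
}

/// Flags sizes whose cost per k elements exceeds `factor` times the cheapest
/// per-k cost among the other sizes. A crude, hypothetical heuristic.
func spikes(in series: [Point], factor: Double = 2) -> [Int] {
    let perK = series.map { $0.micros / Double($0.sizeK) }
    var flagged: [Int] = []
    for (i, point) in series.enumerated() {
        let others = perK.indices.filter { $0 != i }.map { perK[$0] }
        guard let cheapest = others.min() else { continue }
        if perK[i] > factor * cheapest {
            flagged.append(point.sizeK)
        }
    }
    return flagged
}

/// Pairs the 16k/20k/24k/28k sizes with MIN times copied from the tables above.
func makeSeries(_ micros: [Double]) -> [Point] {
    return zip([16, 20, 24, 28], micros).map { Point(sizeK: $0.0, micros: $0.1) }
}

// First run (hashSeed randomization disabled).
let copyO = makeSeries([1137, 3644, 1252, 1863])
let filterO = makeSeries([711, 849, 843, 1333])
let filterOnone = makeSeries([4733, 24734, 4824, 6166])
// Second run.
let copyOSecondRun = makeSeries([990, 1106, 1245, 1893])

print(spikes(in: copyO))          // [20]
print(spikes(in: filterO))        // []
print(spikes(in: filterOnone))    // [20]
print(spikes(in: copyOSecondRun)) // []
```

The 20k copy spike stands out in the first -O run, the first-run -O filter rows stay flat, and the -Onone filter rows show the same 20k spike.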

@lorentey
Member

Ah, that makes sense 👍

@palimondo
Contributor Author

@lorentey Thank you!

@palimondo merged commit 4ca08a5 into swiftlang:master Feb 20, 2019
Labels
None yet
Projects
None yet

4 participants