Add regex benchmarker #491

rctcwyvrn · 2022-06-15T22:24:41Z

Simple benchmarking setup reusing the timing code from swift-collections-benchmark

top level code is real weird, let's not talk about it

lorentey · 2022-06-16T04:33:19Z

It may make sense to move the benchmark target out of the top-level package and into a hidden sub-package in a subdirectory, like with swift-collections.

This will let you be more flexible about bringing in extra dependencies for benchmarking without burdening the top-level package with them.

Granted, this isn't a package that is designed to be a dependency of anything else, so it may not matter that much!

rxwei · 2022-06-16T15:41:35Z

Package.swift

+            name: "RegexBenchmark",
+            dependencies: [
+                .product(name: "ArgumentParser", package: "swift-argument-parser"),
+                "_RegexParser",


I don't think you need this dependency

milseman

Let's get this in asap and we'll fix things in a follow up PR

milseman · 2022-06-16T15:19:03Z

Sources/RegexBenchmark/Benchmark.swift

+    var times: [Time] = []
+
+    // initial run to make sure the regex has been compiled
+    benchmark.run()


Future work: We'll want to know how much time is spent compiling vs not
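
A rough sketch of one way to separate the two costs, assuming (as the warm-up comment in the diff suggests) that compilation happens lazily on first use. The helper `coldAndWarm` is hypothetical and not part of the PR; it needs Swift 5.7+ for `ContinuousClock` and `Regex`. The cold-minus-warm difference is only an approximation of compile time, since caches and other warm-up effects land in the same bucket.

```swift
// Hypothetical helper: time the first (cold) call and a second (warm) call
// of the same workload. Any lazy one-off work shows up only in the cold run.
func coldAndWarm(_ body: () -> Void) -> (cold: Duration, warm: Duration) {
  let clock = ContinuousClock()
  let cold = clock.measure(body)  // includes any lazy compilation
  let warm = clock.measure(body)  // same work, already warmed up
  return (cold, warm)
}

// Placeholder pattern and input, chosen only for illustration.
let regex = try! Regex("a+b")
let input = String(repeating: "a", count: 10_000) + "b"

let t = coldAndWarm { _ = input.firstMatch(of: regex) }
print("cold: \(t.cold), warm: \(t.warm), approx. one-off cost: \(t.cold - t.warm)")
```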

milseman · 2022-06-16T15:19:56Z

Sources/RegexBenchmark/Benchmark.swift

+
+    // return median time
+    times.sort()
+    return times[samples/2]


Future work: provide the times as a type which the caller can ask for the median of.
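
A minimal sketch of what such a type could look like. `Measurements` is a hypothetical name, and plain seconds (`Double`) stand in for the `Time` type that the PR takes from the reused timing code. The median uses the same `count / 2` index as the diff above.

```swift
// Hypothetical wrapper around a set of timing samples.
struct Measurements {
  // Samples are kept sorted so order statistics are simple index lookups.
  private(set) var samples: [Double]

  init(_ samples: [Double]) {
    self.samples = samples.sorted()
  }

  /// Median of the samples (upper median for even counts), nil when empty.
  var median: Double? {
    guard !samples.isEmpty else { return nil }
    return samples[samples.count / 2]
  }
}

let m = Measurements([0.30, 0.10, 0.20])
print(m.median ?? .nan)
```

Returning the whole collection leaves room for the caller to ask for other statistics later without changing the runner.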

milseman · 2022-06-16T15:22:20Z

Sources/RegexBenchmark/CLI.swift

+  var specificBenchmarks: [String] = []
+
+  @Option(help: "Run only once for profiling purposes")
+  var profile = false


Hmm.... Sometimes it helps profiling to run many samples since you'll see what warm behavior looks like and the hot parts become more pronounced. What's the difference between this flag and a sample count of 1?

milseman · 2022-06-16T15:30:05Z

Sources/RegexBenchmark/CLI.swift

+    benchmark.addBacktracking()
+    benchmark.addCSS()
+    benchmark.addFirstMatch()
+    return benchmark


Could we make it an array or some other way to make registration easier? Seems like we could then filter that array when creating the runner
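
One possible shape for this, as a sketch only: the `Benchmark` struct, the `allBenchmarks` array, and the `select` function below are hypothetical stand-ins, not the PR's actual types, and the benchmark bodies are empty placeholders.

```swift
// Hypothetical stand-in for the PR's benchmark type.
struct Benchmark {
  let name: String
  let run: () -> Void
}

// Registering a new benchmark is one more entry in this array.
let allBenchmarks: [Benchmark] = [
  Benchmark(name: "BasicBacktrack") { /* placeholder workload */ },
  Benchmark(name: "CSS") { /* placeholder workload */ },
  Benchmark(name: "FirstMatch") { /* placeholder workload */ },
]

// An empty filter means "run everything"; otherwise keep only the named ones.
func select(_ all: [Benchmark], only names: [String]) -> [Benchmark] {
  names.isEmpty ? all : all.filter { names.contains($0.name) }
}

for benchmark in select(allBenchmarks, only: ["CSS"]) {
  print("running \(benchmark.name)")
  benchmark.run()
}
```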

milseman · 2022-06-16T15:32:56Z

Sources/RegexBenchmark/Suite/Backtracking.swift

+    let basicBacktrack = Benchmark(
+      name: "BasicBacktrack",
+      regex: try! Regex(r),
+      ty: .allMatches,


Go ahead and give the ty argument label a full name. That is ok as an argument name, but the label should clarify the use site.
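
To illustrate the distinction, Swift lets the external label differ from the internal parameter name. In this sketch the label `type` and the simplified `Benchmark` and `MatchType` definitions are assumptions made for the example, not the PR's code.

```swift
// Simplified, hypothetical versions of the PR's types.
enum MatchType { case first, allMatches }

struct Benchmark {
  let name: String
  let type: MatchType

  // `type` is the label callers write; `ty` remains the internal name.
  init(name: String, type ty: MatchType) {
    self.name = name
    self.type = ty
  }
}

// The call site now reads clearly without knowing the abbreviation.
let basicBacktrack = Benchmark(name: "BasicBacktrack", type: .allMatches)
print(basicBacktrack.name, basicBacktrack.type)
```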

milseman · 2022-06-16T15:38:51Z

Sources/RegexBenchmark/Suite/Backtracking.swift

+      ty: .first,
+      target: s
+    )
+


Hmm... we want whole-match benchmarks, match-from-front benchmarks, first-match, and all-matches (which is repeated first-match calls). We also want to make the NSRegularExpression equivalent of each, at least if there can be an equivalent.
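
A sketch of how the four kinds could map onto the standard library's matching methods (Swift 5.7+). The enum cases and the `matchCount` helper are hypothetical names, and the pattern and input are placeholders.

```swift
// Hypothetical enum covering the four kinds of benchmark listed above.
enum MatchType { case whole, prefix, first, allMatches }

// Runs the regex in the requested mode and reports how many matches it found.
func matchCount(
  of regex: Regex<AnyRegexOutput>,
  in target: String,
  as type: MatchType
) -> Int {
  switch type {
  case .whole:      return target.wholeMatch(of: regex) == nil ? 0 : 1
  case .prefix:     return target.prefixMatch(of: regex) == nil ? 0 : 1
  case .first:      return target.firstMatch(of: regex) == nil ? 0 : 1
  case .allMatches: return target.matches(of: regex).count
  }
}

let regex = try! Regex("a+")
let target = "aa b aaa"
print(matchCount(of: regex, in: target, as: .whole))       // 0: the whole string is not all "a"s
print(matchCount(of: regex, in: target, as: .prefix))      // 1: "aa" at the front
print(matchCount(of: regex, in: target, as: .first))       // 1
print(matchCount(of: regex, in: target, as: .allMatches))  // 2: "aa" and "aaa"
```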

milseman · 2022-06-16T15:41:59Z

Sources/RegexBenchmark/Suite/CssRegex.swift

+    let r = "--([a-zA-Z0-9_-]+)\\s*:\\s*(.*?):"
+
+    // sorry
+    let css = """


Move this string into its own file and give it a better internal name (or make it a static var on a CSS benchmark type). We could have an Inputs folder for these kinds of things

milseman · 2022-06-16T15:44:20Z

Sources/RegexBenchmark/Suite/ReluctantQuant.swift

+import RegexBuilder
+
+extension BenchmarkRunner {
+  mutating func addReluctantQuant() {


The regex can be .*? so that we can compare to NSRegularExpression
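
A sketch of why a string pattern helps here: the same pattern text can be handed to both engines. The pattern and input below are placeholders rather than the benchmark's actual workload.

```swift
import Foundation

// Placeholder pattern with a reluctant quantifier, shared by both engines.
let pattern = "a.*?b"
let input = "aXbYYb"

// NSRegularExpression: ranges are NSRange, so convert back to String indices.
let ns = try! NSRegularExpression(pattern: pattern)
let nsRange = NSRange(input.startIndex..., in: input)
let nsText = ns.firstMatch(in: input, options: [], range: nsRange)
  .flatMap { Range($0.range, in: input) }
  .map { String(input[$0]) }

// Swift Regex built at run time from the same pattern string.
let regex = try! Regex(pattern)
let swiftText = input.firstMatch(of: regex).map { String(input[$0.range]) }

// Both stop at the first "b" because ".*?" matches as little as possible.
print(nsText ?? "no match", swiftText ?? "no match")
```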

milseman · 2022-06-16T15:45:55Z

Sources/RegexBenchmark/Suite/ReluctantQuant.swift

+extension BenchmarkRunner {
+  mutating func addReluctantQuant() {
+    let size = 500000
+    let s = String(repeating: "a", count: size)


Future work: These are very, very micro (nano?) and would be trivialized through some analysis and optimizations that wouldn't necessarily improve the state of real regexes or components of real regexes. I wonder if we should distinguish these from more realistic ones.

milseman · 2022-06-16T15:47:33Z

@swift-ci please test

rctcwyvrn added 7 commits June 15, 2022 15:13

v1 benchmarker

98992b0

Remove top level code

3e650cb

top level code is real weird, let's not talk about it

Adjust benchmark loads for release builds (oops)

4ea430c

Benchmarking housekeeping

1384a69

Add basic CLI

fb7459d

Remove redundant string inits

4292b7a

Remove some comments

73b0482

rctcwyvrn requested review from milseman and lorentey June 15, 2022 22:24

rctcwyvrn marked this pull request as draft June 16, 2022 00:17

rxwei reviewed Jun 16, 2022

milseman approved these changes Jun 16, 2022

milseman marked this pull request as ready for review June 16, 2022 15:47

rctcwyvrn merged commit 005e0fb into swiftlang:main Jun 16, 2022

milseman mentioned this pull request Jun 23, 2022

[5.7] Merge basic performance enhancements and unmerged prior dependencies #499

Merged