-
Notifications
You must be signed in to change notification settings - Fork 49
Add regex benchmarker #491
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
top level code is real weird, let's not talk about it
It may make sense to move the benchmark target out of the top-level package and into a hidden sub-package in a subdirectory, like with swift-collections. This will let you be more flexible about bringing in extra dependencies for benchmarking without burdening the top-level package with them. Granted, this isn't a package that is designed to be a dependency of anything else, so it may not matter that much! |
name: "RegexBenchmark", | ||
dependencies: [ | ||
.product(name: "ArgumentParser", package: "swift-argument-parser"), | ||
"_RegexParser", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need this dependency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's get this in asap and we'll fix things in a follow up PR
var times: [Time] = [] | ||
|
||
// initial run to make sure the regex has been compiled | ||
benchmark.run() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Future work: We'll want to know how much time is spent compiling vs not
|
||
// return median time | ||
times.sort() | ||
return times[samples/2] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Future work: provide the times as a type which the caller can ask for the median of.
var specificBenchmarks: [String] = [] | ||
|
||
@Option(help: "Run only once for profiling purposes") | ||
var profile = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm.... Sometimes it helps profiling to run many samples since you'll see what warm behavior looks like and the hot parts become more pronounced. What's the difference between this flag and a sample count of 1?
benchmark.addBacktracking() | ||
benchmark.addCSS() | ||
benchmark.addFirstMatch() | ||
return benchmark |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we make it an array or some other way to make registration easier? Seems like we could then filter that array when creating the runner
let basicBacktrack = Benchmark( | ||
name: "BasicBacktrack", | ||
regex: try! Regex(r), | ||
ty: .allMatches, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Go ahead and give the ty
argument label a full name. That is ok as a argument name, but the label should clarify the use site.
ty: .first, | ||
target: s | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... we want whole-match benchmarks, match-from-front benchmarks, first-match, and all-matches (which is repeated first-match calls). We also want to make the NSRegularExpression equivalent of each, at least if there can be an equivalent.
let r = "--([a-zA-Z0-9_-]+)\\s*:\\s*(.*?):" | ||
|
||
// sorry | ||
let css = """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this string into its own file and give it a better internal name (or make it a static var on a CSS benchmark type). We could have an Inputs
folder for these kinds of things
import RegexBuilder | ||
|
||
extension BenchmarkRunner { | ||
mutating func addReluctantQuant() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The regex can be .*?
so that we can compare to NSRegularExpression
extension BenchmarkRunner { | ||
mutating func addReluctantQuant() { | ||
let size = 500000 | ||
let s = String(repeating: "a", count: size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Future work: These are very, very micro (nano?) and would be trivialized through some analysis and optimizations that wouldn't necessarily improve the state of real regexes or components of real regexes. I wonder if we should distinguish these from more realistic ones.
@swift-ci please test |
Simple benchmarking setup reusing the timing code from
swift-collections-benchmark