Skip to content

Add regex benchmarker #491

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jun 16, 2022
Merged

Add regex benchmarker #491

merged 7 commits into from
Jun 16, 2022

Conversation

rctcwyvrn
Copy link
Contributor

Simple benchmarking setup reusing the timing code from swift-collections-benchmark

@rctcwyvrn rctcwyvrn requested review from milseman and lorentey June 15, 2022 22:24
@rctcwyvrn rctcwyvrn marked this pull request as draft June 16, 2022 00:17
@lorentey
Copy link
Member

It may make sense to move the benchmark target out of the top-level package and into a hidden sub-package in a subdirectory, like with swift-collections.

This will let you be more flexible about bringing in extra dependencies for benchmarking without burdening the top-level package with them.

Granted, this isn't a package that is designed to be a dependency of anything else, so it may not matter that much!

name: "RegexBenchmark",
dependencies: [
.product(name: "ArgumentParser", package: "swift-argument-parser"),
"_RegexParser",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need this dependency

Copy link
Member

@milseman milseman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's get this in asap and we'll fix things in a follow up PR

var times: [Time] = []

// initial run to make sure the regex has been compiled
benchmark.run()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Future work: We'll want to know how much time is spent compiling vs not


// return median time
times.sort()
return times[samples/2]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Future work: provide the times as a type which the caller can ask for the median of.

var specificBenchmarks: [String] = []

@Option(help: "Run only once for profiling purposes")
var profile = false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm.... Sometimes it helps profiling to run many samples since you'll see what warm behavior looks like and the hot parts become more pronounced. What's the difference between this flag and a sample count of 1?

benchmark.addBacktracking()
benchmark.addCSS()
benchmark.addFirstMatch()
return benchmark
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we make it an array or some other way to make registration easier? Seems like we could then filter that array when creating the runner

let basicBacktrack = Benchmark(
name: "BasicBacktrack",
regex: try! Regex(r),
ty: .allMatches,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go ahead and give the ty argument label a full name. That is ok as a argument name, but the label should clarify the use site.

ty: .first,
target: s
)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... we want whole-match benchmarks, match-from-front benchmarks, first-match, and all-matches (which is repeated first-match calls). We also want to make the NSRegularExpression equivalent of each, at least if there can be an equivalent.

let r = "--([a-zA-Z0-9_-]+)\\s*:\\s*(.*?):"

// sorry
let css = """
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this string into its own file and give it a better internal name (or make it a static var on a CSS benchmark type). We could have an Inputs folder for these kinds of things

import RegexBuilder

extension BenchmarkRunner {
mutating func addReluctantQuant() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regex can be .*? so that we can compare to NSRegularExpression

extension BenchmarkRunner {
mutating func addReluctantQuant() {
let size = 500000
let s = String(repeating: "a", count: size)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Future work: These are very, very micro (nano?) and would be trivialized through some analysis and optimizations that wouldn't necessarily improve the state of real regexes or components of real regexes. I wonder if we should distinguish these from more realistic ones.

@milseman milseman marked this pull request as ready for review June 16, 2022 15:47
@milseman
Copy link
Member

@swift-ci please test

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants