Support for custom metrics reported in the Benchmarks #735

adamsitnik · 2018-08-27T04:41:44Z

This PR enables two things:

executing every benchmark in an isolated process
reporting custom metrics per benchmark

Why should we run every benchmark in a separate process?

Most ML.NET benchmarks allocate a lot of memory, which affects GC generation sizes and therefore the final results. The GC is self-tuning, so if we run all the benchmarks in the same process it won't be able to find a configuration that works well for all of them.
Most ML.NET benchmarks can have side effects. Example: running a train benchmark after a predict benchmark in the same process can affect the results. With a new process per benchmark, we always start from the same state and get repeatable results.

Results when running all the benchmarks in the same process:

Type	Method	Mean	Error	StdDev	Gen 0	Gen 1	Gen 2	Allocated
KMeansAndLogisticRegressionBench	TrainKMeansAndLR	2,134.265 ms	164.3370 ms	189.2507 ms	16000.0000	9000.0000	3000.0000	49949.23 KB
StochasticDualCoordinateAscentClassifierBench	TrainSentiment	2,130.503 ms	24.8173 ms	23.2141 ms	122000.0000	35000.0000	5000.0000	759772.8 KB
StochasticDualCoordinateAscentClassifierBench	TrainIris	834.229 ms	254.5284 ms	293.1152 ms	6000.0000	1000.0000	-	12173.28 KB
StochasticDualCoordinateAscentClassifierBench	PredictIris	2.472 ms	0.1202 ms	0.1384 ms	35.1563	15.6250	3.9063	123.24 KB
StochasticDualCoordinateAscentClassifierBench	PredictIrisBatchOf1	2.712 ms	0.3276 ms	0.3773 ms	35.1563	15.6250	3.9063	123.2 KB
StochasticDualCoordinateAscentClassifierBench	PredictIrisBatchOf2	2.370 ms	0.1334 ms	0.1482 ms	35.1563	15.6250	3.9063	123.31 KB
StochasticDualCoordinateAscentClassifierBench	PredictIrisBatchOf5	2.492 ms	0.1678 ms	0.1865 ms	35.1563	15.6250	3.9063	123.61 KB

When running every benchmark in a dedicated process:

Type	Method	Mean	Error	StdDev	Gen 0	Gen 1	Gen 2	Allocated
KMeansAndLogisticRegressionBench	TrainKMeansAndLR	1,968.326 ms	84.3827 ms	97.1753 ms	16000.0000	9000.0000	3000.0000	50027.36 KB
StochasticDualCoordinateAscentClassifierBench	TrainIris	604.496 ms	238.4849 ms	274.6396 ms	59000.0000	1000.0000	-	76697.5 KB
StochasticDualCoordinateAscentClassifierBench	TrainSentiment	1,829.670 ms	10.9792 ms	10.2699 ms	123000.0000	35000.0000	6000.0000	759758.03 KB
StochasticDualCoordinateAscentClassifierBench	PredictIris	1.895 ms	0.0132 ms	0.0111 ms	35.1563	15.6250	3.9063	121.87 KB
StochasticDualCoordinateAscentClassifierBench	PredictIrisBatchOf1	1.941 ms	0.0145 ms	0.0121 ms	35.1563	15.6250	3.9063	119.94 KB
StochasticDualCoordinateAscentClassifierBench	PredictIrisBatchOf2	1.960 ms	0.0676 ms	0.0751 ms	35.1563	15.6250	3.9063	121.94 KB
StochasticDualCoordinateAscentClassifierBench	PredictIrisBatchOf5	1.870 ms	0.0043 ms	0.0036 ms	37.1094	17.5781	3.9063	120.35 KB

To run every benchmark in a standalone, dedicated process, BenchmarkDotNet needs to be able to create, build and run a new executable.

So far this was not possible out of the box due to an MSBuild limitation: when Microsoft.ML.Benchmarks references a native assembly and the auto-generated BenchmarkDotNet project references Microsoft.ML.Benchmarks, the native dependencies are not copied to the output folder of the auto-generated project. This is why I had to implement ProjectGenerator, which copies them for us.
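To make the copy step concrete, here is a rough sketch of what such a helper could look like. It is not the PR's ProjectGenerator: the class name, the method name and the extension filter are assumptions made for illustration, and the real code may select native dependencies differently.

```csharp
using System;
using System.IO;
using System.Linq;

public static class NativeDependencyCopier
{
    // Assumed filter: the PR's real selection logic is not shown here.
    // Note that ".dll" also matches managed assemblies, so this is deliberately coarse.
    private static readonly string[] NativeExtensions = { ".dll", ".so", ".dylib" };

    // Copies matching files from sourceFolder into destinationFolder and returns how many were copied.
    public static int CopyNativeDependencies(string sourceFolder, string destinationFolder)
    {
        Directory.CreateDirectory(destinationFolder);

        var copied = 0;
        foreach (var file in Directory.EnumerateFiles(sourceFolder))
        {
            var extension = Path.GetExtension(file);
            if (!NativeExtensions.Contains(extension, StringComparer.OrdinalIgnoreCase))
                continue;

            var target = Path.Combine(destinationFolder, Path.GetFileName(file));
            File.Copy(file, target, overwrite: true);
            copied++;
        }

        return copied;
    }

    public static void Main()
    {
        // Temporary folders stand in for the benchmark output and the auto-generated project output.
        var source = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        var destination = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        Directory.CreateDirectory(source);
        File.WriteAllText(Path.Combine(source, "native.so"), "placeholder");
        File.WriteAllText(Path.Combine(source, "notes.txt"), "placeholder");

        Console.WriteLine(CopyNativeDependencies(source, destination));

        Directory.Delete(source, recursive: true);
        Directory.Delete(destination, recursive: true);
    }
}
```

In the PR the two folders come from artifactsPaths.ExecutablePath and from the location of the assembly that contains ProjectGenerator, as the review excerpt further down shows.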

@eerhardt we had a conversation about making it possible for BenchmarkDotNet to compile ML.NET stuff a long time ago and the blocker was the native dependency.

The other thing is custom metrics. BenchmarkDotNet does not support them out of the box, so I had to implement it. How it works:

If a given type wants to report custom metrics, it has to derive from WithExtraMetrics and implement the IEnumerable<Metric> GetMetrics() method.
After running the benchmarks, WithExtraMetrics prints the custom metrics to the console in the child process.
ExtraMetricColumn parses the output in the parent process.
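The three steps above can be sketched without BenchmarkDotNet as follows. This is a simplified illustration, not the PR's code: the shape of Metric, the "// Metric" line prefix, the MetricParser helper and the PredictIrisSketch class are all assumptions, and a StringWriter stands in for the child process console.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;

// Hypothetical shape; the PR's actual Metric type may differ.
public readonly struct Metric
{
    public Metric(string name, double value) { Name = name; Value = value; }
    public string Name { get; }
    public double Value { get; }
}

public abstract class WithExtraMetrics
{
    // Assumed marker that lets the parent tell metric lines from other output.
    public const string Prefix = "// Metric";

    protected abstract IEnumerable<Metric> GetMetrics();

    // In the PR the reporting method is marked [GlobalCleanup], so BenchmarkDotNet
    // runs it in the child process once the benchmark has finished.
    public void ReportMetrics(TextWriter output)
    {
        foreach (var metric in GetMetrics())
        {
            output.WriteLine(string.Format(
                CultureInfo.InvariantCulture, "{0} {1}: {2}", Prefix, metric.Name, metric.Value));
        }
    }
}

public static class MetricParser
{
    // Parent side: keep only the marked lines and strip the marker.
    public static IReadOnlyList<string> Parse(IEnumerable<string> childOutput) =>
        childOutput
            .Where(line => line.StartsWith(WithExtraMetrics.Prefix, StringComparison.Ordinal))
            .Select(line => line.Substring(WithExtraMetrics.Prefix.Length).Trim())
            .ToList();
}

// Hypothetical benchmark type that reports one metric.
public sealed class PredictIrisSketch : WithExtraMetrics
{
    protected override IEnumerable<Metric> GetMetrics()
    {
        yield return new Metric("AccuracyMacro", 0.98);
    }
}

public static class Program
{
    public static void Main()
    {
        var childOutput = new StringWriter();
        childOutput.WriteLine("unrelated benchmark output");
        new PredictIrisSketch().ReportMetrics(childOutput);

        var lines = childOutput.ToString()
            .Split(new[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries);

        foreach (var parsed in MetricParser.Parse(lines))
            Console.WriteLine(parsed);
    }
}
```

Writing the value with the invariant culture keeps the text the parent has to parse independent of the machine's regional settings.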

Sample results:

Type	Method	Extra Metric
KMeansAndLogisticRegressionBench	TrainKMeansAndLR	-
StochasticDualCoordinateAscentClassifierBench	TrainIris	-
StochasticDualCoordinateAscentClassifierBench	TrainSentiment	-
StochasticDualCoordinateAscentClassifierBench	PredictIris	AccuracyMacro: 0.98
StochasticDualCoordinateAscentClassifierBench	PredictIrisBatchOf1	AccuracyMacro: 0.98
StochasticDualCoordinateAscentClassifierBench	PredictIrisBatchOf2	AccuracyMacro: 0.98
StochasticDualCoordinateAscentClassifierBench	PredictIrisBatchOf5	AccuracyMacro: 0.98

Other changes: so far the benchmarks were using currentAssemblyLocation.Directory.Parent.Parent.Parent.Parent.FullName to get the path to the folder with input files. I believe it's better to reference them as links in the csproj and "copy to output directory if newer". This solution is cleaner and more future-proof.

/cc @eerhardt @danmosemsft @briancylui @KrzysztofCwalina

shauheen · 2018-08-28T23:05:51Z

Thanks @adamsitnik , can you please associate this with the relevant issue?

eerhardt · 2018-08-28T23:46:26Z

test/Microsoft.ML.Benchmarks/Program.cs

-        public int PriorityInCategory => 1;
-        public UnitType UnitType => UnitType.Dimensionless;
+            // enforce Neutral Language as "en-us" because the input data files use dot as decimal separator (and it fails for cultures with ",")
+            Thread.CurrentThread.CurrentCulture = CultureInfo.InvariantCulture;


This line of code is a bit surprising in a method that is supposed to return a data path. Maybe it would be better to do this in the Main method, or a GlobalSetup method?

@eerhardt I agree that I am breaking CQRS here. My only excuse is that I have named the method GetInvariantCultureDataPath so people can expect that.

I was thinking about moving it to a [GlobalSetup] method, but I am afraid that people won't follow this pattern in new benchmarks. By having it here I guarantee that whoever uses the files will be reading them with CultureInfo.InvariantCulture.

I also wonder how ML.NET samples deal with the culture info problem. Does anybody know?
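For readers unfamiliar with the culture problem, the following self-contained sketch shows why the thread culture matters when text such as "5.1" is parsed. It is an illustration only: the helper names are made up, and the comma culture is built by cloning the invariant culture so the example does not depend on which cultures the OS provides.

```csharp
using System;
using System.Globalization;
using System.Threading;

public static class CultureDemo
{
    // Parses with the current thread culture, the way culture-sensitive file-reading code would.
    // NumberStyles.Float does not allow group separators.
    public static bool TryParseWithCurrentCulture(string text, out double value) =>
        double.TryParse(text, NumberStyles.Float, CultureInfo.CurrentCulture, out value);

    // Hypothetical culture that uses "," as the decimal separator.
    public static CultureInfo CreateCommaCulture()
    {
        var culture = (CultureInfo)CultureInfo.InvariantCulture.Clone();
        culture.NumberFormat.NumberDecimalSeparator = ",";
        culture.NumberFormat.NumberGroupSeparator = ".";
        return culture;
    }

    public static void Main()
    {
        var original = Thread.CurrentThread.CurrentCulture;
        try
        {
            // With "," as the decimal separator, "5.1" is not a valid float.
            Thread.CurrentThread.CurrentCulture = CreateCommaCulture();
            Console.WriteLine(TryParseWithCurrentCulture("5.1", out _));

            // With the invariant culture, the dot is read as the decimal separator.
            Thread.CurrentThread.CurrentCulture = CultureInfo.InvariantCulture;
            var ok = TryParseWithCurrentCulture("5.1", out var parsed);
            Console.WriteLine(ok + " " + parsed.ToString(CultureInfo.InvariantCulture));
        }
        finally
        {
            Thread.CurrentThread.CurrentCulture = original;
        }
    }
}
```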

eerhardt

briancylui · 2018-08-28T23:56:50Z

test/Microsoft.ML.Benchmarks/Harness/Metrics.cs

+        [GlobalCleanup]
+        public void ReportMetrics()
+        {
+            foreach (var metric in GetMetrics())


nit: Would it improve perf to set var metrics = GetMetrics(); right before the foreach loop and then write the condition as var metric in metrics? Not sure...

@briancylui no, it would not.

Whenever you are not sure about something you can benchmark it with BenchmarkDotNet ;)
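The reason it would not help is that C# evaluates the collection expression of a foreach statement once, before the first iteration, and then only works with the resulting enumerator. A small sketch with a call counter makes this visible; the names here are illustrative and unrelated to the PR's types.

```csharp
using System;
using System.Collections.Generic;

public static class ForeachDemo
{
    public static int Calls;

    // Stand-in for a GetMetrics-style method; counts how often it is invoked.
    public static IEnumerable<int> GetValues()
    {
        Calls++;
        return new[] { 1, 2, 3 };
    }

    // Iterates over GetValues() directly in the foreach header and reports the number of calls.
    public static int CountCallsDuringForeach()
    {
        Calls = 0;
        var sum = 0;
        foreach (var value in GetValues())
            sum += value;
        return Calls;
    }

    public static void Main()
    {
        Console.WriteLine(CountCallsDuringForeach()); // prints 1
    }
}
```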

briancylui · 2018-08-29T00:00:32Z

test/Microsoft.ML.Benchmarks/Harness/ProjectGenerator.cs

+            var foldeWithAutogeneratedExe = Path.GetDirectoryName(artifactsPaths.ExecutablePath);
+            var folderWithNativeDependencies = Path.GetDirectoryName(typeof(ProjectGenerator).Assembly.Location);
+
+            foreach(var nativeDependency in Directory


nit: missing space between foreach and the succeeding (

briancylui · 2018-08-29T00:10:29Z

test/Microsoft.ML.Benchmarks/StochasticDualCoordinateAscentClassifierBench.cs

+                for (int bi = 0; bi < batch.Length; bi++)
+                {
+                    batch[bi] = _example;
+                }


Does this for loop change elements of _batches[i] or only elements of the local variable batch? Not an expert so not sure whether batch is a ref-type.

this is done on purpose, it's a Setup method
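On the reference-type question: arrays in C# are reference types, so a local variable that holds one row of a jagged array refers to the same array object as that row, and writes through the local are visible through the outer array. The sketch below assumes a jagged array layout and uses string as a stand-in element type; neither is confirmed by the excerpt above.

```csharp
using System;

public static class BatchAliasDemo
{
    // Builds one batch per requested size and fills every slot with the same example.
    public static string[][] FillBatches(int[] sizes, string example)
    {
        var batches = new string[sizes.Length][];
        for (int i = 0; i < batches.Length; i++)
        {
            var batch = new string[sizes[i]];
            batches[i] = batch; // batch and batches[i] now refer to the same array

            for (int bi = 0; bi < batch.Length; bi++)
            {
                batch[bi] = example; // also visible as batches[i][bi]
            }
        }
        return batches;
    }

    public static void Main()
    {
        var batches = FillBatches(new[] { 1, 2, 5 }, "example");
        Console.WriteLine(batches[2][4]); // prints example
    }
}
```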

briancylui

Feel free to merge after responding to PR comments - I don't have write access so can't hit merge unfortunately. Not an expert in BenchmarkDotNet, but this PR looks good to me! Thanks @adamsitnik

briancylui · 2018-08-29T00:19:40Z

More reviewers are needed for this PR to be merged - my review doesn't count towards mergeability since I don't have write access.

adamsitnik · 2018-08-29T10:39:29Z

@briancylui @eerhardt thank you for your reviews! I don't have write access myself, so who could merge it?

@shauheen there is no issue, but there was an email thread. Do you want me to create an issue for that?

eerhardt · 2018-08-29T16:22:14Z

test OSX10.13 Debug

briancylui · 2018-08-29T21:12:45Z

test OSX10.13 Debug please
test public-CI please

adamsitnik added 11 commits August 26, 2018 16:45

simplify and cleanup the code, remove dead code

f18f927

use Target to specify that given setup method should be executed fo…

058ab89

…r selected benchmarks, not all

consume the result of Predict to make sure it does not get dead-code …

c09024f

…eliminated

reference input files from .csproj and copy them to output dir, don't…

1ea36e2

… rely on hardcoded folder hierarchy

every ML.NET benchmark allocates a lot of memory and should be execut…

a6ff27a

…ed in a dedicated process

make it possible for every type to report different metrics

da080f1

enforce current culture as "en-us" because the input data files use d…

4e90e06

…ot as decimal separator (and it fails for cultures with ",")

for our time consuming benchmarks 1 warmup iteration is enough

fe38aeb

workaround for the auto-generated code to avoid name coflict for Micr…

282b088

…osoft.ML.Runtime.IHost and BenchmarkDotNet.Engines.IHost..

add comment about why we need a custom toolchain

8c4e9b9

update BDN version to allow benchmarking with CoreRun

5bfa492

adamsitnik mentioned this pull request Aug 27, 2018

Add new benchmarks to test\Microsoft.ML.Benchmarks #722

Merged

danmoseley requested review from eerhardt and briancylui August 28, 2018 22:16

eerhardt reviewed Aug 28, 2018

eerhardt approved these changes Aug 28, 2018

briancylui reviewed Aug 28, 2018

briancylui reviewed Aug 29, 2018

briancylui approved these changes Aug 29, 2018

code review fix: spacing

ec3df9f

Merge remote-tracking branch 'upstream/master' into benchmarksPolishing

f209d96

# Conflicts: # build/Dependencies.props # test/Microsoft.ML.Benchmarks/KMeansAndLogisticRegressionBench.cs

safern approved these changes Aug 30, 2018

safern merged commit dfe9f3a into dotnet:master Aug 30, 2018

ghost locked as resolved and limited conversation to collaborators Mar 29, 2022
