
Remove parsing perf bottleneck in WordEmbeddingsTransform #1608


Closed
adamsitnik opened this issue Nov 13, 2018 · 0 comments

@adamsitnik
Member

I am currently profiling ML.NET to find performance bottlenecks and places where the .NET framework could do a better job on performance.

I started with the two most time-consuming benchmarks from our public benchmark suite.

After some profiling, it turned out that parsing the large text files used by WordEmbeddingsTransform is a performance bottleneck.

In the profiler histogram below, red box 2 is the parsing work.

[profiler histogram image]

I optimized the code and then parallelized it.
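For context, the "optimize, then parallelize" combination can look roughly like the sketch below. This is not the actual change from #1599; the file layout (one token followed by its vector values per line), the `EmbeddingsParser` type, and all other names are assumptions made purely for illustration, and the span-based `float.TryParse` overload assumes .NET Core 2.1 or later.

```csharp
using System;
using System.Collections.Concurrent;
using System.Globalization;
using System.IO;
using System.Threading.Tasks;

static class EmbeddingsParser
{
    // Hypothetical loader: each line holds "token v1 v2 ... vN".
    public static ConcurrentDictionary<string, float[]> Load(string path, int dimension)
    {
        var vectors = new ConcurrentDictionary<string, float[]>();

        // Lines are independent, so they can be parsed in parallel.
        Parallel.ForEach(File.ReadLines(path), line =>
        {
            ReadOnlySpan<char> span = line.AsSpan();

            int firstSpace = span.IndexOf(' ');
            if (firstSpace <= 0)
                return; // skip malformed lines

            string token = span.Slice(0, firstSpace).ToString();
            var values = new float[dimension];
            ReadOnlySpan<char> rest = span.Slice(firstSpace + 1);

            // Slice the remaining fields off the span instead of using string.Split,
            // which would allocate one string per number.
            for (int i = 0; i < dimension; i++)
            {
                int next = rest.IndexOf(' ');
                ReadOnlySpan<char> field = next < 0 ? rest : rest.Slice(0, next);
                if (!float.TryParse(field, NumberStyles.Float, CultureInfo.InvariantCulture, out values[i]))
                    return; // skip lines that fail to parse
                rest = next < 0 ? ReadOnlySpan<char>.Empty : rest.Slice(next + 1);
            }

            vectors[token] = values;
        });

        return vectors;
    }
}
```

The two wins in a sketch like this are that each line can be parsed independently (so `Parallel.ForEach` over `File.ReadLines` is safe) and that slicing the line as a span avoids the per-field string allocations that `string.Split` would cause.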

Before:

| Method | Mean |
|--------|------|
| WikiDetox_WordEmbeddings_OVAAveragedPerceptron | 286.7 s |
| WikiDetox_WordEmbeddings_SDCAMC | 184.1 s |

After:

| Method | Mean |
|--------|------|
| WikiDetox_WordEmbeddings_OVAAveragedPerceptron | 169.02 s |
| WikiDetox_WordEmbeddings_SDCAMC | 65.32 s |
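
That works out to roughly a 1.7x speedup for the OVAAveragedPerceptron benchmark and a 2.8x speedup for the SDCAMC one.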

The PR is #1599.

/cc @shauheen

@adamsitnik adamsitnik added the perf Performance and Benchmarking related label Nov 13, 2018
@adamsitnik adamsitnik self-assigned this Nov 13, 2018
@shauheen shauheen added this to the 1118 milestone Nov 21, 2018
@ghost ghost locked as resolved and limited conversation to collaborators Mar 26, 2022