
Remove parsing perf bottleneck in WordEmbeddingsTransform #1608


Closed
adamsitnik opened this issue Nov 13, 2018 · 0 comments

@adamsitnik
Member

I am currently profiling ML.NET to find performance bottlenecks and places where the .NET framework could do a better job on performance.

I started with the two most time-consuming benchmarks from our public benchmark suite.

After some profiling, it turned out that parsing the large text files used by WordEmbeddingsTransform is a performance bottleneck.

In the profiler histogram below, red box 2 is the parsing work.

[profiler histogram image]

I optimized the code and then parallelized it.
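For context, the "optimize, then parallelize" combination can look roughly like the sketch below. This is not the actual change from #1599; the file layout (one token followed by its vector values per line), the `EmbeddingsParser` type, and all other names are assumptions made purely for illustration, and the span-based `float.TryParse` overload assumes .NET Core 2.1 or later.

```csharp
using System;
using System.Collections.Concurrent;
using System.Globalization;
using System.IO;
using System.Threading.Tasks;

static class EmbeddingsParser
{
    // Hypothetical loader: each line holds "token v1 v2 ... vN".
    public static ConcurrentDictionary<string, float[]> Load(string path, int dimension)
    {
        var vectors = new ConcurrentDictionary<string, float[]>();

        // Lines are independent, so they can be parsed in parallel.
        Parallel.ForEach(File.ReadLines(path), line =>
        {
            ReadOnlySpan<char> span = line.AsSpan();

            int firstSpace = span.IndexOf(' ');
            if (firstSpace <= 0)
                return; // skip malformed lines

            string token = span.Slice(0, firstSpace).ToString();
            var values = new float[dimension];
            ReadOnlySpan<char> rest = span.Slice(firstSpace + 1);

            // Slice the remaining fields off the span instead of using string.Split,
            // which would allocate one string per number.
            for (int i = 0; i < dimension; i++)
            {
                int next = rest.IndexOf(' ');
                ReadOnlySpan<char> field = next < 0 ? rest : rest.Slice(0, next);
                if (!float.TryParse(field, NumberStyles.Float, CultureInfo.InvariantCulture, out values[i]))
                    return; // skip lines that fail to parse
                rest = next < 0 ? ReadOnlySpan<char>.Empty : rest.Slice(next + 1);
            }

            vectors[token] = values;
        });

        return vectors;
    }
}
```

The two wins in a sketch like this are that each line can be parsed independently (so `Parallel.ForEach` over `File.ReadLines` is safe) and that slicing the line as a span avoids the per-field string allocations that `string.Split` would cause.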

Before:

| Method | Mean |
|--------|------|
| WikiDetox_WordEmbeddings_OVAAveragedPerceptron | 286.7 s |
| WikiDetox_WordEmbeddings_SDCAMC | 184.1 s |

After:

| Method | Mean |
|--------|------|
| WikiDetox_WordEmbeddings_OVAAveragedPerceptron | 169.02 s |
| WikiDetox_WordEmbeddings_SDCAMC | 65.32 s |
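
That works out to roughly a 1.7x speedup for the OVAAveragedPerceptron benchmark and a 2.8x speedup for the SDCAMC one.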

The PR is #1599.

/cc @shauheen

@adamsitnik adamsitnik added the perf Performance and Benchmarking related label Nov 13, 2018
@adamsitnik adamsitnik self-assigned this Nov 13, 2018
@shauheen shauheen added this to the 1118 milestone Nov 21, 2018
@ghost ghost locked as resolved and limited conversation to collaborators Mar 26, 2022