Skip to content

Optimize natural_break on large inputs #562

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Oct 4, 2021

Conversation

alex-soklev
Copy link
Contributor

Optimized natural break on CPU.

  • Removed slow random sample generation with a much faster one.
  • Removed a slow and unnecessary nan removal
  • Optimized the _run_numpy_jenks_matrices by making it multithreaded.

Initial performance I benchmarked took 11.27s to calculate
New version takes 3.2s

Alexander Soklev and others added 3 commits September 29, 2021 21:55
A uv array is being calculated that strips all NaNs and Infs from the input data,
but then instead of the uv array we pass on the original data.
@thuydotm
Copy link
Contributor

thuydotm commented Oct 4, 2021

Thanks! LGTM

@thuydotm thuydotm merged commit e5a97cc into makepath:master Oct 4, 2021
thuydotm pushed a commit that referenced this pull request Oct 8, 2021
* Fixes in CPU version of normal_breaks.

* Made the code a bit more readable
* Fixed a bug where due to input data array dtype we could end up with
  numeric overflow and produce invalid results
* Removed parallel for as it produces wrong results.
  This algorithm is highly sequential and cannot be parallelized

* Fixes for the linter

* More fixes for the linter

* Another attempt to please the linter

* Fix a bug - we need to pass the full sample_data to _run_numpy_jenks instead of the filtered uv array.

Unfortunately this costs most of the optimization that I was hoping for.

* Add a determinism test for natural_breaks to the classify test suite

* Style changes to conform to code style

Co-authored-by: Alexander Soklev <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants