
[ENH] functional n_jobs parameter for knn classifier #2478


Open
baraline opened this issue Jan 1, 2025 · 12 comments · May be fixed by #2687
baraline (Member) commented Jan 1, 2025

Describe the feature or idea you want to propose

The current n_jobs parameter does not do anything in the KNN classifier.

Describe your proposed solution

Make use of it! How exactly is TBD. If that is not possible, a warning should at least be raised.

Describe alternatives you've considered, if relevant

No response

Additional context

I was curious why we were so slow after looking at the sequentia benchmark...

baraline added the enhancement and classification labels Jan 1, 2025
Ramana-Raja (Contributor) commented Jan 6, 2025

I think parallelization could potentially be applied in the _kneighbors method, but, as shown in the screenshots below, it actually worsens execution time. This is likely because the task itself is too lightweight to benefit from parallel processing. So I'd say it's safe to conclude that parallelization isn't really useful here.

As for the warning, I’m not entirely sure what you mean, but maybe we can just remove the n_jobs parameter and turn off multithreading?
[screenshots: timing results with and without parallelization]

Testing on data: [screenshot]

baraline (Member, Author) commented Jan 7, 2025

Thanks for the benchmarking job! Could you try switching the joblib backend to threading and see if it makes a difference? As the distance functions are njit numba functions, we shouldn't need to use the loky backend. Otherwise, defining the kneighbors function as a numba function with parallel=True might work better.
Additionally, using more computationally intensive functions like dtw instead of euclidean might change the results.
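
A rough sketch of the threading-backend experiment being suggested; `pairwise_threaded` and `_query_row` are illustrative names, not aeon API, and any speed-up assumes the compiled distance kernel does not hold the GIL:

```python
import numpy as np
from joblib import Parallel, delayed

from aeon.distances import dtw_distance  # njit-compiled distance


def _query_row(q, X_train):
    # distances from one query series to every training series
    return np.array([dtw_distance(q, t) for t in X_train])


def pairwise_threaded(X_query, X_train, n_jobs=1):
    # "threading" avoids loky's process start-up cost; it only pays off
    # if the compiled distance kernel runs without holding the GIL
    rows = Parallel(n_jobs=n_jobs, backend="threading")(
        delayed(_query_row)(q, X_train) for q in X_query
    )
    return np.stack(rows)
```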

aadya940 (Contributor) commented Jan 7, 2025

@baraline @Ramana-Raja How about parallelizing the loop in _predict in addition to what @Ramana-Raja has already done? It doesn't seem to have any loop dependencies, so we could try?
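
A minimal sketch of that idea, with illustrative names rather than the actual class internals (a plain k-NN majority vote stands in for the real per-sample logic):

```python
from collections import Counter

import numpy as np
from joblib import Parallel, delayed

from aeon.distances import dtw_distance


def _predict_one(x, X_train, y_train, k):
    # stand-in for the real per-sample logic: find k nearest, majority vote
    dists = np.array([dtw_distance(x, t) for t in X_train])
    nearest = np.argsort(dists)[:k]
    return Counter(np.asarray(y_train)[nearest]).most_common(1)[0][0]


def predict_parallel(X_test, X_train, y_train, k=1, n_jobs=1):
    # each test sample is independent, so this loop is embarrassingly parallel
    return np.array(
        Parallel(n_jobs=n_jobs)(
            delayed(_predict_one)(x, X_train, y_train, k) for x in X_test
        )
    )
```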

baraline (Member, Author) commented Jan 7, 2025

It could work as well, yes, but then you need to tune the number of threads (for kneighbors) per process (for each sample or group of samples to predict) to find the right balance.

aadya940 (Contributor) commented Jan 7, 2025

Maybe we can limit the number of threads per process to the max CPU count divided by n_jobs?
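
A sketch of that budgeting, assuming numba's set_num_threads is the knob each worker applies:

```python
import os

from numba import set_num_threads

n_jobs = 4  # joblib workers requested by the user

# cap numba threads per worker so workers * threads does not oversubscribe
# the machine; each worker would need to apply this at the start of its task
threads_per_worker = max(1, (os.cpu_count() or 1) // n_jobs)
set_num_threads(threads_per_worker)
```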

Ramana-Raja (Contributor) commented Jan 7, 2025

> @baraline @Ramana-Raja How about parallelizing the loop in _predict in addition to what @Ramana-Raja has already done? It doesn't seem to have any loop dependencies, so we could try?

This approach also doesn't seem to work as intended when the data is small. It actually ends up making the execution time worse, as you can see below.
[screenshots: timing results]
Testing done with dtw: [screenshot]
But if the data is large, it performs better:
[screenshots: timing results]
Testing done on data: [screenshot]

aadya940 (Contributor) commented Jan 7, 2025

@Ramana-Raja That behavior is expected. It occurs because the time required for process creation and context switching exceeds the compute time. To address this, analyze how the problem scales and plot execution times with and without parallelization; the intersection point indicates the input size beyond which parallelism becomes beneficial. Ideally, this function should switch dynamically between single-threaded and multithreaded execution.
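
A rough way to run that scaling study, reusing the `predict_parallel` sketch from earlier in the thread (the sizes and job counts below are arbitrary):

```python
import time

import numpy as np

rng = np.random.default_rng(0)
for n in (50, 100, 200, 400, 800):
    # synthetic data just to probe scaling behaviour
    X_train = rng.normal(size=(n, 100))
    y_train = np.arange(n) % 2
    X_test = X_train[: max(1, n // 5)]
    timings = {}
    for jobs in (1, 4):
        t0 = time.perf_counter()
        predict_parallel(X_test, X_train, y_train, k=1, n_jobs=jobs)
        timings[jobs] = time.perf_counter() - t0
    # the crossover is where the parallel time drops below the serial time
    print(f"n={n}: serial {timings[1]:.3f}s, parallel {timings[4]:.3f}s")
```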

Ramana-Raja (Contributor) commented

> @Ramana-Raja That behavior is expected. It occurs because the time required for process creation and context switching exceeds the compute time. To address this, analyze how the problem scales and plot execution times with and without parallelization; the intersection point indicates the input size beyond which parallelism becomes beneficial. Ideally, this function should switch dynamically between single-threaded and multithreaded execution.

This seems to depend on the CPU, right? The optimal data size might vary for users with different CPUs. I think the best approach is to leave it configurable as a hyperparameter.

aadya940 (Contributor) commented Jan 7, 2025

Makes sense. @baraline, wdyt?

baraline (Member, Author) commented Jan 7, 2025

I think the simplest option would be to offload such hassle to the numba compiler. We could use the existing functions of the distance module (e.g. euclidean_pairwise_distance) to compute the distance matrix in parallel, by adding an n_jobs=1 optional parameter, using numba's set_num_threads function, and making _euclidean_pairwise_distance and _euclidean_from_multiple_to_multiple_distance use parallel=True with prange on both loops. This would not change the default behaviour but would allow us to make use of n_jobs without much work.

This would necessitate a change in such functions for all distances, though. @chrisholder, what do you think?
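
A rough sketch of the shape of that change, using a standalone squared-Euclidean example rather than the actual aeon internals:

```python
import numpy as np
from numba import njit, prange, set_num_threads


@njit(parallel=True)
def _pairwise_sq_euclidean(X, Y):
    # numba parallelises the outermost prange; the inner loops stay serial
    out = np.zeros((X.shape[0], Y.shape[0]))
    for i in prange(X.shape[0]):
        for j in range(Y.shape[0]):
            acc = 0.0
            for t in range(X.shape[1]):
                diff = X[i, t] - Y[j, t]
                acc += diff * diff
            out[i, j] = acc
    return out


def pairwise_sq_euclidean(X, Y, n_jobs=1):
    # n_jobs maps onto numba's thread count; the default of 1 leaves
    # behaviour unchanged, as proposed above
    if n_jobs > 1:
        set_num_threads(n_jobs)
    return _pairwise_sq_euclidean(X, Y)
```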

TonyBagnall changed the title from "[ENH] functionnal n_jobs parameter for knn classifier" to "[ENH] functional n_jobs parameter for knn classifier" Jan 10, 2025
steenrotsman (Contributor) commented

I've tried two changes to KNeighborsTimeSeriesClassifier and profiled both:

  1. Use n_jobs in _predict
  2. Use dtaidistance instead of Aeon's dtw_distance

[screenshots: code changes and profiling script]

The left and middle screens show the parallelization and the dtaidistance integration into the class; the right screen shows the profiling code. These are the results of profiling on my 11th Gen Intel Core i7-1165G7 @ 2.80GHz × 4 CPU:

| name | avg | std | min |
| --- | --- | --- | --- |
| Aeon ACSF1 (1) | 110.2201 | 2.4291 | 107.7615 |
| Aeon ACSF1 (8) | 49.2053 | 1.0859 | 48.3538 |
| Aeon ArrowHead (1) | 1.8508 | 0.0506 | 1.7939 |
| Aeon ArrowHead (8) | 0.5634 | 0.0114 | 0.5545 |
| Aeon GunPoint (1) | 0.8112 | 0.0238 | 0.7918 |
| Aeon GunPoint (8) | 1.0206 | 1.6650 | 0.2700 |
| DTAI ACSF1 (1) | 9.7684 | 0.3365 | 9.5679 |
| DTAI ACSF1 (8) | 3.2668 | 0.0646 | 3.1645 |
| DTAI ArrowHead (1) | 0.2433 | 0.0058 | 0.2370 |
| DTAI ArrowHead (8) | 0.0840 | 0.0090 | 0.0744 |
| DTAI GunPoint (1) | 0.1359 | 0.0022 | 0.1332 |
| DTAI GunPoint (8) | 0.0551 | 0.0016 | 0.0530 |

Conclusion: dtaidistance helps more than parallelization, but parallelization certainly helps on larger data sets.
dtaidistance is implemented in C, but its performance gains are almost entirely due to it allowing an upper bound. I've parallelized in _predict instead of _kneighbors because the upper bound means _kneighbors is no longer embarrassingly parallel, while _predict still is.

For the issue at hand, I'd put the parallelization in _predict and open a new issue to either include dtaidistance as a (soft) dependency or implement max_dist in dtw_distance and possibly others. Let me know what you think!
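
A sketch of how the upper bound prunes work in a 1-NN scan; `nearest_with_bound` is an illustrative helper, not the classifier's actual code:

```python
import numpy as np
from dtaidistance import dtw


def nearest_with_bound(query, X_train):
    # keep the best distance so far as the bound; dtw.distance returns
    # infinity as soon as the accumulated cost exceeds max_dist, so
    # hopeless candidates are abandoned early
    best, best_idx = np.inf, -1
    for i, cand in enumerate(X_train):
        d = dtw.distance(query, cand, max_dist=best)
        if d < best:
            best, best_idx = d, i
    return best_idx, best
```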

MatthewMiddlehurst (Member) commented

Sounds good generally. There is some discussion to be had on how exactly to implement the soft-dependency distance and whether we want to go that route to begin with, but we can leave that to its own issue.

There is a PR to parallelise the aeon distances in #2545. This is only being held up due to a notebook issue, so it may be worth testing that as well.
