
[MRG] Add fast kernel classifier/regressor (see #11039) #11694


Closed
wants to merge 63 commits

Conversation

EigenPro

This pull request implements the feature in issue #11039

ToDo:

  1. Fix several unittest failures
  2. Add a user guide page
  3. Add an example

@jnothman
Member

jnothman commented Jul 29, 2018 via email

@amueller
Member

amueller commented Aug 8, 2018

maybe it's a tabs vs spaces thing? Sorry we still haven't released, but we'll look at this soon.

There are a bunch of errors related to

AttributeError: 'module' object has no attribute 'multi_dot'

Please check which NumPy version multi_dot was added in; we might need a backport.
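A minimal sketch of the kind of guard that could address this, assuming the failures come from numpy.linalg.multi_dot (which only exists in NumPy >= 1.9); the fallback helper below is illustrative, not code from this PR.

```python
# Hypothetical backport guard for numpy.linalg.multi_dot (NumPy >= 1.9).
# Older NumPy versions raise the AttributeError quoted above.
import numpy as np

try:
    from numpy.linalg import multi_dot
except ImportError:
    def multi_dot(arrays):
        # Fallback: chain plain dot products, ignoring the optimal
        # parenthesization that multi_dot would compute.
        result = arrays[0]
        for a in arrays[1:]:
            result = np.dot(result, a)
        return result
```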

@sklearn-lgtm

This pull request introduces 4 alerts when merging 46488c3 into 53069c2 - view on LGTM.com

new alerts:

  • 1 for Comparison using is when operands support __eq__
  • 1 for Unused local variable
  • 1 for Unused import
  • 1 for Implicit string concatenation in a list

Comment posted by LGTM.com

@amueller
Member

amueller commented Nov 5, 2018

How long do the examples run?
Also, it might be interesting to add this to some of the comparisons in the benchmarks folder.
Having some comparisons with SVC and kernel approximation would be nice.

@EigenPro
Author

EigenPro commented Nov 5, 2018

We did compare the fast kernel method with SVC in three examples (MNIST, noisy MNIST, and synthetic). Training SVC on one dataset (noisy MNIST) can take as long as 27 minutes. The fast kernel method normally completes training in 10–25% of the time used by SVC. It also shows consistently better test accuracy and a nearly 10x speedup over SVC at prediction time. Note that we ran these experiments on a server with one Intel Xeon E5-1620 CPU (4 cores).

@amueller as to the kernel approximation, do you mean kernel ridge regression? For now we have implemented a kernel classifier and a kernel regressor. Notably, our fast kernel regression and kernel ridge regression (without the ridge penalty :) converge to the same optimal solution.

See results for noisy mnist here:
https://github.com/scikit-learn/scikit-learn/blob/b8e32885dbfd06a534be8d4c2a5c16233188a688/doc/images/fast_kernel_noisy_mnist.png

@amueller
Member

I meant using either sklearn.kernel_approximation.Nystroem or sklearn.kernel_approximation.RBFSampler (which implements Rahimi and Recht) and then RidgeClassifier.
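For reference, a minimal sketch of the baseline being suggested here, using scikit-learn's existing APIs; the dataset and hyperparameters (gamma, n_components, alpha) are placeholders, not tuned values from this discussion.

```python
# Kernel-approximation baselines: Nystroem or RBFSampler (random Fourier
# features, Rahimi & Recht) followed by a linear RidgeClassifier.
from sklearn.datasets import make_classification
from sklearn.kernel_approximation import Nystroem, RBFSampler
from sklearn.linear_model import RidgeClassifier
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

nystroem_ridge = make_pipeline(
    Nystroem(kernel="rbf", gamma=0.1, n_components=300, random_state=0),
    RidgeClassifier(alpha=1.0),
)
rff_ridge = make_pipeline(
    RBFSampler(gamma=0.1, n_components=300, random_state=0),
    RidgeClassifier(alpha=1.0),
)

for name, model in [("Nystroem + Ridge", nystroem_ridge),
                    ("RBFSampler + Ridge", rff_ridge)]:
    print(name, model.fit(X, y).score(X, y))
```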

@EigenPro
Author

Sorry for the late update. We have compared sklearn.kernel_approximation.Nystroem followed by sklearn.linear_model.RidgeClassifier against our method (run for 1 epoch) on the full MNIST dataset. The results can be seen in the attached file. We can add this test to the source if needed.

mnist-nystrom-epro.pdf

@GaelVaroquaux
Member

I am sorry, but the paper behind the method is cited 23 times on Google Scholar. This is far below our citation criterion for inclusion (https://scikit-learn.org/stable/faq.html#what-are-the-inclusion-criteria-for-new-algorithms).

Hence, this method cannot be contributed to scikit-learn. It should be contributed as a package in scikit-learn contrib.

@EigenPro
Author

@GaelVaroquaux: I see your concern. So would you consider the other criterion for inclusion (https://scikit-learn.org/stable/faq.html#what-are-the-inclusion-criteria-for-new-algorithms)?

"A technique that provides a clear-cut improvement on a widely-used method will also be considered for inclusion."

The improvement in this MRG is clear-cut. It is a preconditioned iterative method that is theoretically guaranteed to improve performance (see an arXiv version of the paper here: https://arxiv.org/abs/1703.10622). We also show strong empirical evidence on various datasets (in both the paper and this MRG).

The methods we aim to improve are kernel machines (e.g., SVM and kernel regression), which are widely used.
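To make the preconditioning idea described above concrete, here is a toy, self-contained sketch (not the PR's implementation): damp the top-k eigendirections of the kernel matrix so that a plain Richardson/gradient iteration on K a = y can safely use a much larger step size. The data, kernel width, and k are arbitrary illustrative choices.

```python
# Toy eigen-preconditioned Richardson iteration for kernel least squares.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.RandomState(0)
X = rng.randn(300, 5)
y = np.sin(X[:, 0]) + 0.1 * rng.randn(300)

K = rbf_kernel(X, X, gamma=0.05)
n, k = K.shape[0], 10

eigvals, eigvecs = np.linalg.eigh(K)           # eigenvalues in ascending order
top_vals, top_vecs = eigvals[-k:], eigvecs[:, -k:]
tail_max = eigvals[-k - 1]                     # largest eigenvalue outside the top k

def run(n_iter, step, precondition):
    a = np.zeros(n)
    for _ in range(n_iter):
        g = K @ a - y                          # gradient direction for the residual
        if precondition:
            # Shrink g along the top eigendirections so that the effective
            # spectrum of the iteration is bounded by tail_max.
            c = top_vecs.T @ g
            g = g - top_vecs @ ((1.0 - tail_max / top_vals) * c)
        a -= step * g
    return np.linalg.norm(K @ a - y) / np.linalg.norm(y)

print("plain, step 1/lambda_max   :", run(200, 1.0 / eigvals[-1], False))
print("preconditioned, 1/tail_max :", run(200, 1.0 / tail_max, True))
```

With the preconditioner the safe step size grows from roughly 1/λ_1 to 1/λ_{k+1}, which is where the speed-up in this sketch comes from.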

@GaelVaroquaux
Member

The sentence about providing a clear-cut improvement would be for a method that has feature parity, for instance doing the same thing but faster.

I see two problems with considering the inclusion of EigenPro:

  • First, it exposes us to many similar requests. There are at least several dozen papers a year that contribute a specific improvement to an established method and end up weakly used and weakly cited. To ensure the future of scikit-learn, we need to focus on a small number of methods; we just cannot address them all. This is why we focus on well-cited papers: it is an indication that there is strong interest, albeit an imperfect one. Of course, there is a chicken-and-egg problem: papers get well cited when they have an easily accessible implementation, as in scikit-learn. This is why we created scikit-learn-contrib.

  • EigenPro is not merely a fast solver for a classic problem. It optimizes a squared-loss classifier, which is not one of the most popular losses for classification. Reading the paper, it seems to me that the optimization strategy also introduces an implicit regularization via the optimization. Hence, it corresponds to a new learning problem, the popularity of which needs to be established.

@EigenPro
Author

EigenPro commented Mar 6, 2019

@GaelVaroquaux: After some discussions on your last reply, we would like to clarify a few points in regard to our method, EigenPro, to show that it is a clear-cut improvement over existing solvers for kernel regression.

EigenPro is a fast solver for the classical problem of kernel regression, which is quite central to machine learning and statistics. Kernel regression is also implemented in scikit-learn
(https://scikit-learn.org/stable/modules/kernel_ridge.html). The EigenPro solution is mathematically equivalent to that of the original regression problem, but the algorithm is much faster due to preconditioning. At this time we are not aware of any other method with a comparable speed-up. Thus, we believe it is a "clear-cut improvement" over a classical and widely used method.

The use of the square loss for classification has a long history as well (see, e.g., http://cbcl.mit.edu/publications/ps/rlsc.pdf for a discussion, references, and experimental results). The square loss typically performs as well as or better than the hinge loss in terms of test error (see the reference above or, for example, Tables 1 and 2 in http://www.jmlr.org/proceedings/papers/v51/que16-supp.pdf). It is not clear that the hinge loss has any systematic advantage over the square loss for classification in kernel methods; perhaps it is used primarily for historical reasons. While the cross-entropy (logistic) loss is commonly used with neural networks, it is not a typical choice for kernel machines.
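As a concrete illustration of the square-loss classifier discussed above, here is a minimal sketch using scikit-learn's existing estimators: kernel ridge regression fit on ±1-encoded labels, with the sign of the prediction used as the class (the classical RLSC recipe), compared against a hinge-loss SVC. The dataset and hyperparameters are placeholders, not results from this thread.

```python
# Square-loss kernel classification (RLSC-style) vs. hinge-loss SVC.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
y_pm = 2 * y - 1                    # encode the two classes as -1 / +1
X_tr, X_te, y_tr, y_te = train_test_split(X, y_pm, random_state=0)

# Square loss: regress the +/-1 labels with kernel ridge, classify by sign.
krr = KernelRidge(kernel="rbf", gamma=0.1, alpha=1e-3).fit(X_tr, y_tr)
acc_square = np.mean(np.sign(krr.predict(X_te)) == y_te)

# Hinge loss: standard kernel SVM with the same kernel.
svm = SVC(kernel="rbf", gamma=0.1, C=1.0).fit(X_tr, y_tr)
acc_hinge = svm.score(X_te, y_te)

print("square loss accuracy:", acc_square)
print("hinge loss accuracy :", acc_hinge)
```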

@agramfort
Member

a few remarks:

  • we cannot create a new module for a single estimator.
  • if it's an improvement to an existing estimator, it should be exposed as an option (like a "solver" parameter) of that estimator.

my 2c

@GaelVaroquaux
Member

If it's a new faster solver, it must solve the same exact mathematical problem, as checked by convergence tests.

Reading the paper, I had the impression that the problem solved by EigenPro has additional regularization effects.
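A sketch of the kind of convergence test meant here: when run to convergence on the same kernel matrix and the same (possibly vanishing) ridge penalty, a new solver should reproduce KernelRidge's dual coefficients. The plain gradient (Richardson) iteration below is a generic stand-in for the new solver, not the PR's code.

```python
# Convergence check: iterative solver vs. KernelRidge's closed-form solution.
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.RandomState(0)
X = rng.randn(200, 5)
y = np.sin(X[:, 0]) + 0.1 * rng.randn(200)

gamma, alpha = 0.5, 1.0
K = rbf_kernel(X, X, gamma=gamma)
A = K + alpha * np.eye(len(X))      # KernelRidge solves A @ dual_coef = y

# Reference solution from scikit-learn.
ref = KernelRidge(kernel="rbf", gamma=gamma, alpha=alpha).fit(X, y).dual_coef_

# Stand-in iterative solver: plain gradient iteration on A @ a = y.
a = np.zeros(len(X))
step = 1.0 / np.linalg.eigvalsh(A)[-1]
for _ in range(2000):
    a -= step * (A @ a - y)

np.testing.assert_allclose(a, ref, atol=1e-6)
print("iterative solver matches KernelRidge")
```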

@amueller
Member

@GaelVaroquaux you mentioned a related paper that you considered more mature, can you remind me what that was?

@GaelVaroquaux
Member

GaelVaroquaux commented Apr 23, 2019 via email

@amueller
Member

amueller commented Aug 6, 2019

Closing as merged into scikit-learn-extra.

@amueller amueller closed this Aug 6, 2019