FEA Add macro-averaged mean absolute error #780
Conversation
Hello @AurelienMassiot! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2021-02-08 23:15:32 UTC
Codecov Report
@@            Coverage Diff             @@
##           master     #780      +/-   ##
==========================================
- Coverage   98.55%   98.55%   -0.01%
==========================================
  Files          89       89
  Lines        5681     5680       -1
  Branches      475      477       +2
==========================================
- Hits         5599     5598       -1
  Misses         81       81
  Partials        1        1
Continue to review full report at Codecov.
I made a quick first pass.
imblearn/metrics/_classification.py (outdated)
all_mae = []
y_true = np.array(y_true)
y_pred = np.array(y_pred)
for class_to_predict in np.unique(y_true):
I think that we should introduce a labels parameter, to be able either to pass classes that are not present in y_true or to select a subset of labels, as in precision-recall: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_recall_fscore_support.html#sklearn.metrics.precision_recall_fscore_support
Would it make sense?
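For reference, here is a minimal sketch of how the labels parameter behaves in scikit-learn's precision_recall_fscore_support; the arrays and the label list below are made up for illustration:

# Illustrative only: shows the behaviour the reviewer suggests mirroring.
from sklearn.metrics import precision_recall_fscore_support

y_true = [1, 2, 2, 3]
y_pred = [1, 2, 2, 2]

# labels can restrict scoring to a subset of classes or include classes absent
# from y_true; zero_division=0 avoids warnings for labels with no samples.
precision, recall, fscore, support = precision_recall_fscore_support(
    y_true, y_pred, labels=[2, 3, 4], zero_division=0
)
print(support)  # supports are 2, 1, 0 -> label 4 has no samples in y_true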
I think it does not make sense, because all we want is the MAE for each class actually present in y_true.
For example, if I have:
y_true = [1, 2, 2, 3]
I want the average of the per-class MAEs, which will be the MAE for classes 1, 2, 3. And if a class is missing in y_pred, it doesn't matter, for example with:
y_pred = [1, 2, 2, 2]
my MAEs will be 0 for class 1, 0 for class 2, 1 for class 3, and the MA-MAE will then be 0.33.
WDYT?
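A minimal sketch of that computation using scikit-learn's mean_absolute_error, just to check the numbers above (this is not the PR's implementation):

# One MAE per class present in y_true, then an unweighted average over classes.
import numpy as np
from sklearn.metrics import mean_absolute_error

y_true = np.array([1, 2, 2, 3])
y_pred = np.array([1, 2, 2, 2])

all_mae = [
    mean_absolute_error(y_true[y_true == klass], y_pred[y_true == klass])
    for klass in np.unique(y_true)
]
print(all_mae)           # per-class MAEs: 0.0, 0.0, 1.0
print(np.mean(all_mae))  # MA-MAE: 0.333...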
],
)
def test_macro_averaged_mean_absolute_error(y_true, y_pred, expected_ma_mae):
If we introduce labels, we will need another test with a few more corner cases.
Otherwise, I think this is good.
See comment above for labels.
See the answer about labels above.
Could anyone make a final review? :-)
@AurelienMassiot I promise you that it will be merged in the next release, which will follow the scikit-learn 0.24 release.
Thanks @glemaitre! This is not urgent, good luck with the scikit-learn release ;-).
Thanks @AurelienMassiot. Good to go.
Reference Issue
As detailed in issue #18901, which I opened in the scikit-learn main repository, macro-averaged MAE should be added to the imbalanced-learn repository instead.
For ordinal classification, we can use several metrics, for example MAE or MSE, as we would for regression.
But when the classes are imbalanced, one way to deal with the imbalance is to use the macro-averaged MAE instead, as described on StackExchange and in the original paper.
The macro-averaged MAE is like the "classic" MAE, except that we compute the MAE for each class and average them, giving equal weight to each class. Note that the macro-averaged MAE equals the micro-averaged (classic) MAE when the classes are balanced.
To illustrate this, let's consider:
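Here is a minimal sketch of such an example; the values are illustrative (not the ones from the original description), and the import assumes the macro_averaged_mean_absolute_error name used by the tests in this PR:

# Imbalanced ordinal example: always predicting the majority class gives a low
# classic (micro) MAE, but the macro-averaged MAE exposes the error on the
# minority class. Values are illustrative.
from sklearn.metrics import mean_absolute_error
from imblearn.metrics import macro_averaged_mean_absolute_error

y_true = [1, 1, 1, 1, 1, 2]
y_pred = [1, 1, 1, 1, 1, 1]

print(mean_absolute_error(y_true, y_pred))                 # classic MAE: 0.166...
print(macro_averaged_mean_absolute_error(y_true, y_pred))  # macro-averaged MAE: 0.5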
Any other comments?