
Commit 1f43fd2

saldanhad, yuanx749, lucyleeow authored and committed
DOC: Updates to Macro vs micro-averaging in plot_roc.py (#29845)
Co-authored-by: Xiao Yuan <[email protected]>
Co-authored-by: Lucy Liu <[email protected]>
1 parent ea8a725 commit 1f43fd2


1 file changed, +20 -4 lines changed


examples/model_selection/plot_roc.py

@@ -218,6 +218,12 @@
 # Obtaining the macro-average requires computing the metric independently for
 # each class and then taking the average over them, hence treating all classes
 # equally a priori. We first aggregate the true/false positive rates per class:
+#
+# :math:`TPR=\frac{1}{C}\sum_{c}\frac{TP_c}{TP_c + FN_c}` ;
+#
+# :math:`FPR=\frac{1}{C}\sum_{c}\frac{FP_c}{FP_c + TN_c}` .
+#
+# where `C` is the total number of classes.
 
 for i in range(n_classes):
     fpr[i], tpr[i], _ = roc_curve(y_onehot_test[:, i], y_score[:, i])
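
The two macro-average formulas added above can be checked by hand. What follows is a minimal sketch, assuming plain Python lists for the labels and per-class scores and a single shared decision threshold. That threshold is an illustrative simplification, not the example's curve-building code. The function names `per_class_rates` and `macro_average_rates` are hypothetical. Each class is scored one-vs-rest, and the per-class rates are averaged with weight 1/C:

```python
def per_class_rates(y_true, y_score, threshold):
    """One-vs-rest TPR_c and FPR_c for every class c at a single threshold.

    y_true:  list of integer labels in {0, ..., C-1}
    y_score: list of per-sample score lists, one score per class
    Assumes every class has at least one positive and one negative sample.
    """
    n_classes = len(y_score[0])
    tprs, fprs = [], []
    for c in range(n_classes):
        tp = fn = fp = tn = 0
        for label, scores in zip(y_true, y_score):
            is_positive = label == c            # one-vs-rest ground truth
            predicted = scores[c] >= threshold  # hypothetical shared threshold
            if is_positive:
                if predicted:
                    tp += 1
                else:
                    fn += 1
            else:
                if predicted:
                    fp += 1
                else:
                    tn += 1
        tprs.append(tp / (tp + fn))  # TP_c / (TP_c + FN_c)
        fprs.append(fp / (fp + tn))  # FP_c / (FP_c + TN_c)
    return tprs, fprs


def macro_average_rates(y_true, y_score, threshold):
    """Average the per-class rates, giving each class the same weight 1/C."""
    tprs, fprs = per_class_rates(y_true, y_score, threshold)
    n_classes = len(tprs)
    return sum(tprs) / n_classes, sum(fprs) / n_classes


# Toy data made up for illustration: 3 classes, 2 samples each.
y_true = [0, 0, 1, 1, 2, 2]
y_score = [
    [0.8, 0.1, 0.1],
    [0.4, 0.5, 0.1],
    [0.2, 0.7, 0.1],
    [0.3, 0.3, 0.4],
    [0.1, 0.2, 0.7],
    [0.6, 0.1, 0.3],
]
tprs, fprs = per_class_rates(y_true, y_score, threshold=0.5)
macro_tpr, macro_fpr = macro_average_rates(y_true, y_score, threshold=0.5)
print(tprs, fprs)            # [0.5, 0.5, 0.5] [0.25, 0.25, 0.0]
print(macro_tpr, macro_fpr)  # 0.5 0.16666666666666666
```

Each class contributes 1/3 to the averages here, regardless of how many samples it has.
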
@@ -441,7 +447,17 @@
 # global performance of a classifier can still be summarized via a given
 # averaging strategy.
 #
-# Micro-averaged OvR ROC is dominated by the more frequent class, since the
-# counts are pooled. The macro-averaged alternative better reflects the
-# statistics of the less frequent classes, and then is more appropriate when
-# performance on all the classes is deemed equally important.
+# When dealing with imbalanced datasets, choosing the appropriate metric based on
+# the business context or problem you are addressing is crucial.
+# It is also essential to select an appropriate averaging method (micro vs. macro)
+# depending on the desired outcome:
+#
+# - Micro-averaging aggregates metrics across all instances, treating each
+#   individual instance equally, regardless of its class. This approach is useful
+#   when evaluating overall performance, but note that it can be dominated by
+#   the majority class in imbalanced datasets.
+#
+# - Macro-averaging calculates metrics for each class independently and then
+#   averages them, giving equal weight to each class. This is particularly useful
+#   when you want under-represented classes to be considered as important as highly
+#   populated classes.
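
Hand-picked confusion counts make the contrast between the two bullets concrete. The sketch below uses made-up one-vs-rest counts for an imbalanced 3-class problem and hypothetical helper names (`ClassCounts`, `micro_rates`, `macro_rates`). The micro average pools TP, FN, FP and TN over all classes before dividing. The macro average computes each class's rates first and then averages them with equal weight:

```python
from dataclasses import dataclass


@dataclass
class ClassCounts:
    """One-vs-rest confusion counts for a single class (hypothetical helper)."""
    tp: int
    fn: int
    fp: int
    tn: int


def micro_rates(counts):
    """Pool the counts over all classes, then compute one TPR and one FPR."""
    tp = sum(c.tp for c in counts)
    fn = sum(c.fn for c in counts)
    fp = sum(c.fp for c in counts)
    tn = sum(c.tn for c in counts)
    return tp / (tp + fn), fp / (fp + tn)


def macro_rates(counts):
    """Compute TPR and FPR per class, then average with equal class weights."""
    n = len(counts)
    tpr = sum(c.tp / (c.tp + c.fn) for c in counts) / n
    fpr = sum(c.fp / (c.fp + c.tn) for c in counts) / n
    return tpr, fpr


# Made-up imbalanced problem with 100 samples: 80 / 15 / 5 positives per class.
counts = [
    ClassCounts(tp=76, fn=4, fp=2, tn=18),  # majority class: TPR 0.95
    ClassCounts(tp=9, fn=6, fp=17, tn=68),  # TPR 0.6
    ClassCounts(tp=1, fn=4, fp=19, tn=76),  # minority class: TPR 0.2
]
micro_tpr, micro_fpr = micro_rates(counts)
macro_tpr, macro_fpr = macro_rates(counts)
print(f"micro: TPR={micro_tpr:.3f}, FPR={micro_fpr:.3f}")  # TPR=0.860, FPR=0.190
print(f"macro: TPR={macro_tpr:.3f}, FPR={macro_fpr:.3f}")  # TPR=0.583, FPR=0.167
```

The pooled TPR equals the per-class TPRs weighted by each class's share of positives (here 80/15/5). This is why the micro value stays close to the majority class's 0.95, while the macro value is pulled down by the poorly detected minority class.
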
