Imbalanced Domains and Rare Event Detection: Performance Evaluation
(Torgo et al.) LIDTA2020, September 2020

An Example from Regression: Forecasting Stock Market Returns

- Very high or low returns (% variations of prices) are interesting.
- Near-zero returns are very common but uninteresting for traders, as they are unable to cover transaction costs.
- Examples:
  - Forecasting a future return of 3% when -5% actually happens is a very bad error!
  - Forecasting a return of 3% when 11% actually happens has the same error amplitude, but it is not a serious error.
  - Forecasting 0.2% for a true value of 0.4% is reasonably accurate but irrelevant!
  - Forecasting -7.5% for a true value of -8% is a good and useful prediction.
- Because near-zero returns are very common, a model that always forecasts 0 is hard to beat in terms of Mean Squared Error. But this model is useless!

Metrics and the Available Information

- Different applications may involve different types of information on the user preferences.
- This may have an impact on the metrics you can and/or should calculate.
- Independently of this, there are two classes of metrics: scalar and graphical.

Evaluation with Full Utility Information: Utility Matrices

- A table where each entry specifies the cost (negative benefit) or benefit of each type of prediction:

                Pred. c1    Pred. c2    Pred. c3
    Obs. c1     B_{1,1}     C_{1,2}     C_{1,3}
    Obs. c2     C_{2,1}     B_{2,2}     C_{2,3}
    Obs. c3     C_{3,1}     C_{3,2}     B_{3,3}

- Models are then evaluated by the total utility of their predictions, i.e. the sum of the benefits minus the costs.
- A similar setting exists for regression, using Utility Surfaces (Ribeiro, 2011).

R. Ribeiro (2011). "Utility-based Regression". PhD thesis in Computer Science, Univ. Porto.

The Precision/Recall Framework: Classification

- Problems with two classes.
- One of the classes is much less frequent, and it is also the most relevant one.

                Pred. Pos                  Pred. Neg
    Obs. Pos    True Positives (TP)        False Negatives (FN)
    Obs. Neg    False Positives (FP)       True Negatives (TN)

- Precision: the proportion of the signals (events) of the model that are correct:

    Prec = \frac{TP}{TP + FP}

- Recall: the proportion of the real events that are captured by the model:

    Rec = \frac{TP}{TP + FN}

The F-Measure

- Combines Precision and Recall into a single measure.
- A single measure is useful, e.g. for optimization within a search procedure.
- Maximizing one of them at the cost of the other is easy (it is easy to have 100% recall: always predict "P"). What is difficult is to have high values for both.
- The F-measure is a statistic based on the values of precision and recall that allows establishing a trade-off between the two using a user-defined parameter \beta:

    F_\beta = \frac{(\beta^2 + 1) \cdot Prec \cdot Rec}{\beta^2 \cdot Prec + Rec}

  where \beta controls the relative importance of Prec and Rec:
  - if \beta = 1, F is the harmonic mean of Prec and Rec;
  - when \beta \to 0, the weight of Rec decreases;
  - when \beta \to \infty, the weight of Prec decreases.

The G-Mean and Adjusted G-Mean

    Gm = \sqrt{\frac{TP}{TP + FN} \times \frac{TN}{TN + FP}} = \sqrt{sensitivity \times specificity}

    AGm = \begin{cases} \frac{Gm + Specificity \cdot N_n}{1 + N_n} & \text{if } sensitivity > 0 \\ 0 & \text{if } sensitivity = 0 \end{cases}

  where N_n is the proportion of majority class examples in the data set.

M. Kubat and S. Matwin. "Addressing the curse of imbalanced training sets: one-sided selection." In Proc. of the 14th Int. Conf. on Machine Learning, Nashville, USA, pp. 179-186, 1997.
R. Batuwita and V. Palade. "A new performance measure for class imbalance learning. Application to bioinformatics problems." In ICMLA'09, pp. 545-550. IEEE, 2009.

Metrics for Multiclass Imbalance Problems

- \phi(i) is the relevance of class i. There are different ways to obtain \phi() depending on the available domain information (Branco et al., 2017).

    Rec_\phi = \frac{1}{\sum_{i=1}^{C} \phi(i)} \sum_{i=1}^{C} \phi(i) \cdot recall_i

    Prec_\phi = \frac{1}{\sum_{i=1}^{C} \phi(i)} \sum_{i=1}^{C} \phi(i) \cdot precision_i

    F_\phi = \frac{(1 + \beta^2) \cdot Prec_\phi \cdot Rec_\phi}{(\beta^2 \cdot Prec_\phi) + Rec_\phi}

    AvF_\beta = \frac{1}{\sum_{i=1}^{C} \phi(i)} \sum_{i=1}^{C} \frac{\phi(i) \cdot (1 + \beta^2) \cdot precision_i \cdot recall_i}{(\beta^2 \cdot precision_i) + recall_i}

    CBA = \sum_{i=1}^{C} \frac{\phi(i) \cdot mat_{i,i}}{\max\left(\sum_{j=1}^{C} mat_{i,j},\ \sum_{j=1}^{C} mat_{j,i}\right)}

P. Branco, L. Torgo, and R. Ribeiro. "Relevance-based evaluation metrics for multi-class imbalanced domains." In PAKDD, pp. 698-710. Springer, Cham, 2017.

The Precision/Recall Framework: Regression

- For forecasting rare extreme values, the concepts of Precision and Recall were also adapted to regression (Torgo and Ribeiro, 2009; Branco, 2014):

    prec = \frac{\sum_{\phi(\hat{y}_i) > t_R} (1 + U(\hat{y}_i, y_i))}{\sum_{\phi(\hat{y}_i) > t_R} (1 + \phi(\hat{y}_i))}

    rec = \frac{\sum_{\phi(y_i) > t_R} (1 + U(\hat{y}_i, y_i))}{\sum_{\phi(y_i) > t_R} (1 + \phi(y_i))}

L. Torgo and R. P. Ribeiro (2009). "Precision and Recall for Regression". In: Discovery Science 2009. Springer.
P. Branco (2014). "Re-sampling Approaches for Regression Tasks under Imbalanced Domains". MSc thesis in Computer Science, Univ. Porto.

Summary of Scalar Metrics for Imbalanced Domains

Adapted from:
P. Branco, L. Torgo, and R. Ribeiro. "A Survey of Predictive Modeling on Imbalanced Domains". ACM Comput. Surv. 49(2), 1-31 (2016).
P. Branco (2018). "Utility-based Predictive Analytics". PhD thesis in Computer Science, Univ. Porto.
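The binary scalar metrics above (precision, recall, F_\beta, G-mean and adjusted G-mean) can be sketched in a few lines of Python. This is a minimal illustration only; the confusion-matrix counts in the example are made-up numbers, not taken from the slides:

```python
import math

def scalar_metrics(tp, fn, fp, tn, beta=1.0):
    """Scalar metrics from a binary confusion matrix (positive = rare class)."""
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn)                    # recall == sensitivity
    spec = tn / (tn + fp)                   # specificity
    f = ((beta**2 + 1) * prec * rec / (beta**2 * prec + rec)
         if prec + rec else 0.0)
    gm = math.sqrt(rec * spec)              # geometric mean
    nn = (tn + fp) / (tp + fn + fp + tn)    # N_n: proportion of majority (negative) class
    agm = (gm + spec * nn) / (1 + nn) if rec > 0 else 0.0
    return {"Prec": prec, "Rec": rec, "F": f, "Gm": gm, "AGm": agm}

# Hypothetical model on a 1000-case set with 50 positives:
m = scalar_metrics(tp=35, fn=15, fp=60, tn=890)
```

Note how AGm rewards specificity more heavily as the majority proportion N_n grows, while collapsing to 0 when no positives are captured, matching the case analysis in the formula above.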
ROC Curve and Precision-Recall Curve: Classification

[Figure: ROC curves (False Positive Rate vs. True Positive Rate) and Precision-Recall curves for three models A, B and C, together with the random classifier diagonal and the ideal model point. AUC-ROC(A) = 0.87, AUC-ROC(B) = 0.83, AUC-ROC(C) = 0.64; AUC-PR(A) = 0.97, AUC-PR(B) = 0.80, AUC-PR(C) = 0.76.]

Taken from: P. Branco (2018). "Utility-based Predictive Analytics". PhD thesis in Computer Science, Univ. Porto.

ROC Curve and Precision-Recall Curve: Regression

Taken from: R. Ribeiro (2011). "Utility-based Regression". PhD thesis in Computer Science, Univ. Porto.

Summary of Graphical Metrics for Imbalanced Domains

Adapted from:
P. Branco, L. Torgo, and R. Ribeiro. "A Survey of Predictive Modeling on Imbalanced Domains". ACM Comput. Surv. 49(2), 1-31 (2016).
P. Branco (2018). "Utility-based Predictive Analytics". PhD thesis in Computer Science, Univ. Porto.
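As a complement to the curves discussed above, the AUC-ROC values reported in such figures can be computed directly from model scores: AUC-ROC equals the probability that a randomly chosen positive example is ranked above a randomly chosen negative one (ties counting one half). A minimal sketch, with made-up scores and labels for illustration:

```python
def roc_auc(scores, labels):
    """AUC-ROC as the probability that a random positive outranks a
    random negative (ties count 1/2); equivalent to the area under
    the ROC curve."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical scores for 4 positive and 4 negative cases:
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1,   1,   0,   1,   0,    0,   1,   0]
auc = roc_auc(scores, labels)  # 0.75: three quarters of pos/neg pairs ranked correctly
```

This pairwise formulation makes explicit why AUC-ROC is insensitive to class imbalance in itself, which is one reason the Precision-Recall curve is often preferred when the positive class is rare.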