Imbalanced Domains and Rare Event Detection: Performance Evaluation
(Torgo et al.) LIDTA2020, September 2020

An Example from Regression: Forecasting Stock Market Returns

- Very high or low returns (% variations of prices) are interesting.
- Near-zero returns are very common but uninteresting for traders, as they are unable to cover transaction costs.
- Examples:
  - Forecasting a future return of 3% when -5% actually happens is a very bad error!
  - Forecasting a return of 3% when 11% actually happens has the same error amplitude, but it is not a serious error.
  - Forecasting 0.2% for a true value of 0.4% is reasonably accurate but irrelevant!
  - Forecasting -7.5% for a true value of -8% is a good and useful prediction.
- Because near-zero returns are very common, a model that always forecasts 0 is hard to beat in terms of Mean Squared Error. But this model is useless!

Metrics and the Available Information

- Different applications may involve different types of information on the user preferences.
- This may have an impact on the metrics you can and/or should calculate.
- Independently of this, there are two classes of metrics: scalar and graphical.

Evaluation with Full Utility Information: Utility Matrices

- A table where each entry specifies the cost (negative benefit) or benefit of each type of prediction:

                Pred. c1    Pred. c2    Pred. c3
    Obs. c1     B_{1,1}     C_{1,2}     C_{1,3}
    Obs. c2     C_{2,1}     B_{2,2}     C_{2,3}
    Obs. c3     C_{3,1}     C_{3,2}     B_{3,3}

- Models are then evaluated by the total utility of their predictions, i.e. the sum of the benefits minus the costs.
- A similar setting exists for regression, using Utility Surfaces (Ribeiro, 2011).

R. Ribeiro (2011). "Utility-based Regression". PhD thesis in Computer Science, Univ. Porto.

The Precision/Recall Framework: Classification

- Problems with two classes.
- One of the classes is much less frequent, and it is also the most relevant one.

                Pred. Pos                  Pred. Neg
    Obs. Pos    True Positives (TP)        False Negatives (FN)
    Obs. Neg    False Positives (FP)       True Negatives (TN)

- Precision: the proportion of the signals (events) of the model that are correct:

    Prec = \frac{TP}{TP + FP}

- Recall: the proportion of the real events that are captured by the model:

    Rec = \frac{TP}{TP + FN}

The F-Measure

- Combines Precision and Recall into a single measure.
- A single measure is useful, e.g. for optimization within a search procedure.
- Maximizing one of them at the cost of the other is easy (it is easy to have 100% recall: always predict "P"). What is difficult is to have high values for both.
- The F-measure is a statistic based on the values of precision and recall that allows establishing a trade-off between the two using a user-defined parameter \beta:

    F_\beta = \frac{(\beta^2 + 1) \cdot Prec \cdot Rec}{\beta^2 \cdot Prec + Rec}

  where \beta controls the relative importance of Prec and Rec:
  - if \beta = 1, F is the harmonic mean of Prec and Rec;
  - when \beta \to 0, the weight of Rec decreases;
  - when \beta \to \infty, the weight of Prec decreases.

The G-Mean and Adjusted G-Mean

    Gm = \sqrt{\frac{TP}{TP + FN} \times \frac{TN}{TN + FP}} = \sqrt{sensitivity \times specificity}

    AGm = \begin{cases} \frac{Gm + Specificity \cdot N_n}{1 + N_n} & \text{if } sensitivity > 0 \\ 0 & \text{if } sensitivity = 0 \end{cases}

  where N_n is the proportion of majority class examples in the data set.

M. Kubat and S. Matwin. "Addressing the curse of imbalanced training sets: one-sided selection." In Proc. of the 14th Int. Conf. on Machine Learning, Nashville, USA, pp. 179-186, 1997.
R. Batuwita and V. Palade. "A new performance measure for class imbalance learning. Application to bioinformatics problems." In ICMLA'09, pp. 545-550. IEEE, 2009.

Metrics for Multiclass Imbalance Problems

- \phi(i) is the relevance of class i. There are different ways to obtain \phi() depending on the available domain information (Branco et al., 2017).

    Rec_\phi = \frac{1}{\sum_{i=1}^{C} \phi(i)} \sum_{i=1}^{C} \phi(i) \cdot recall_i

    Prec_\phi = \frac{1}{\sum_{i=1}^{C} \phi(i)} \sum_{i=1}^{C} \phi(i) \cdot precision_i

    F_\phi = \frac{(1 + \beta^2) \cdot Prec_\phi \cdot Rec_\phi}{(\beta^2 \cdot Prec_\phi) + Rec_\phi}

    AvF_\beta = \frac{1}{\sum_{i=1}^{C} \phi(i)} \sum_{i=1}^{C} \frac{\phi(i) \cdot (1 + \beta^2) \cdot precision_i \cdot recall_i}{(\beta^2 \cdot precision_i) + recall_i}

    CBA = \sum_{i=1}^{C} \frac{\phi(i) \cdot mat_{i,i}}{\max\left(\sum_{j=1}^{C} mat_{i,j},\ \sum_{j=1}^{C} mat_{j,i}\right)}

P. Branco, L. Torgo, and R. Ribeiro. "Relevance-based evaluation metrics for multi-class imbalanced domains." In PAKDD, pp. 698-710. Springer, Cham, 2017.

The Precision/Recall Framework: Regression

- For forecasting rare extreme values, the concepts of Precision and Recall were also adapted to regression (Torgo and Ribeiro, 2009; Branco, 2014):

    prec = \frac{\sum_{\phi(\hat{y}_i) > t_R} (1 + U(\hat{y}_i, y_i))}{\sum_{\phi(\hat{y}_i) > t_R} (1 + \phi(\hat{y}_i))}

    rec = \frac{\sum_{\phi(y_i) > t_R} (1 + U(\hat{y}_i, y_i))}{\sum_{\phi(y_i) > t_R} (1 + \phi(y_i))}

L. Torgo and R. P. Ribeiro (2009). "Precision and Recall for Regression". In: Discovery Science 2009. Springer.
P. Branco (2014). "Re-sampling Approaches for Regression Tasks under Imbalanced Domains". MSc thesis in Computer Science, Univ. Porto.

Summary of Scalar Metrics for Imbalanced Domains

Adapted from:
P. Branco, L. Torgo, and R. Ribeiro. "A Survey of Predictive Modeling on Imbalanced Domains". ACM Comput. Surv. 49(2), 1-31 (2016).
P. Branco (2018). "Utility-based Predictive Analytics". PhD thesis in Computer Science, Univ. Porto.
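The binary scalar metrics above (precision, recall, F_\beta, G-mean and adjusted G-mean) can be sketched in a few lines of Python. This is a minimal illustration only; the confusion-matrix counts in the example are made-up numbers, not taken from the slides:

```python
import math

def scalar_metrics(tp, fn, fp, tn, beta=1.0):
    """Scalar metrics from a binary confusion matrix (positive = rare class)."""
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn)                    # recall == sensitivity
    spec = tn / (tn + fp)                   # specificity
    f = ((beta**2 + 1) * prec * rec / (beta**2 * prec + rec)
         if prec + rec else 0.0)
    gm = math.sqrt(rec * spec)              # geometric mean
    nn = (tn + fp) / (tp + fn + fp + tn)    # N_n: proportion of majority (negative) class
    agm = (gm + spec * nn) / (1 + nn) if rec > 0 else 0.0
    return {"Prec": prec, "Rec": rec, "F": f, "Gm": gm, "AGm": agm}

# Hypothetical model on a 1000-case set with 50 positives:
m = scalar_metrics(tp=35, fn=15, fp=60, tn=890)
```

Note how AGm rewards specificity more heavily as the majority proportion N_n grows, while collapsing to 0 when no positives are captured, matching the case analysis in the formula above.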
ROC Curve and Precision-Recall Curve: Classification

[Figure: ROC curves (False Positive Rate vs. True Positive Rate) and Precision-Recall curves for three models A, B and C, together with the random classifier diagonal and the ideal model point. AUC-ROC(A) = 0.87, AUC-ROC(B) = 0.83, AUC-ROC(C) = 0.64; AUC-PR(A) = 0.97, AUC-PR(B) = 0.80, AUC-PR(C) = 0.76.]

Taken from: P. Branco (2018). "Utility-based Predictive Analytics". PhD thesis in Computer Science, Univ. Porto.

ROC Curve and Precision-Recall Curve: Regression

Taken from: R. Ribeiro (2011). "Utility-based Regression". PhD thesis in Computer Science, Univ. Porto.

Summary of Graphical Metrics for Imbalanced Domains

Adapted from:
P. Branco, L. Torgo, and R. Ribeiro. "A Survey of Predictive Modeling on Imbalanced Domains". ACM Comput. Surv. 49(2), 1-31 (2016).
P. Branco (2018). "Utility-based Predictive Analytics". PhD thesis in Computer Science, Univ. Porto.
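As a complement to the curves discussed above, the AUC-ROC values reported in such figures can be computed directly from model scores: AUC-ROC equals the probability that a randomly chosen positive example is ranked above a randomly chosen negative one (ties counting one half). A minimal sketch, with made-up scores and labels for illustration:

```python
def roc_auc(scores, labels):
    """AUC-ROC as the probability that a random positive outranks a
    random negative (ties count 1/2); equivalent to the area under
    the ROC curve."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical scores for 4 positive and 4 negative cases:
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1,   1,   0,   1,   0,    0,   1,   0]
auc = roc_auc(scores, labels)  # 0.75: three quarters of pos/neg pairs ranked correctly
```

This pairwise formulation makes explicit why AUC-ROC is insensitive to class imbalance in itself, which is one reason the Precision-Recall curve is often preferred when the positive class is rare.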