Explanation of rare events

Need for explanation of outliers
- A user needs to understand why an instance is detected as an outlier
- For many applications, explanation (interpretation, description, outlying property detection, characterization) of outliers is as important as identification
- The outlier factor (degree) and the ranking provide only quantitative information
- We need qualitative information, and not only for high-dimensional data
- Based also on ODD v5.0: Outlier Detection De-constructed, ACM SIGKDD 2018 Workshop keynote speeches, namely "Making sense of unusual suspects - Finding and Characterizing Outliers" (Ira Assent) and "Outlier Description and Interpretation" (Jian Pei)

(Torgo et. al.) LIDTA2020 September, 2020 97 / 127

How to generate explanation?
- Compare with inlying data as well as with confirmed outlying data
- Find the outlier explanatory component / outlying property / outlier context / outlier characteristic
- Help domain experts verify outliers and understand how the outlier detection method works

What is a meaningful explanation?
- A method for finding explanations must be helpful for a user, namely:
  - easy to understand, e.g. the smallest subset of attributes
  - efficient, scalable
- Most frequent approaches:
  - visual
  - look for a subset of attributes; each outlier has its own explanatory subspace

Finding the most important attributes
- For an object q, find the subspaces where q is most unusual compared to the rest of the data
[Figure: A 3D space {x, y, z} and all its 2D projections. {x, z} is an explanatory subspace (Micenkova 2015)]

Strongest, weak and trivial outliers (Knorr and Ng 1998)
- Non-trivial outliers: P is a non-trivial outlier in space A if P is not an outlier in any subspace of A.
- Strongest outlier: the space A containing one or more outliers is called a strongest outlying space if no outlier exists in any subspace of A.
  Any P that is an outlier in A is called a strongest outlier.
- Weak outlier: any non-trivial outlier that is not strongest is called a weak outlier.

Example: NHL ice hockey players (Knorr and Ng 1999)
- 5-D space {A, B, C, D, E} of power-play goals, short-handed goals, game-winning goals, game-tying goals, and games played
[Figure: lattice representation of the subspaces]

Explaining outliers by subspace separability (Micenkova and Ng 2013)
- The explanatory subspace cannot be derived just by analyzing the vicinity of the point in the full space ⇒ different subspace projections need to be considered
- There is no monotonicity property for outliers w.r.t. subspaces
- Heuristics are needed because of the exponential complexity: look for a subspace A where the outlier factor is high and the dimension of A is low
- Separability: the outlierness of an instance is related to its separability from the rest of the data
B. Micenková, R. T. Ng, X. H. Dang, and I. Assent. Explaining outliers by subspace separability. In IEEE ICDM 2013.

Outlierness as accuracy of classification (Micenkova and Ng 2013)
- Separability as classification error: the better the outlier can be separated from the rest of the data, the higher its outlierness
- Assume that the data follows a distribution f
- Original data = inlier class; outlier + artificial points = outlier class
- Use standard feature selection methods to find explanatory subspaces
[Figure: Measuring outlierness by separability. p1, p2 are points from the distribution f(x), and the normal distributions g_p1(x) and g_p2(x) were artificially generated.]

RF-OEX: Analysis of Random Forest
- Two methods: 1. search for frequent branches and 2. reduction of trees
NEZVALOVÁ, Leona et al. Class-Based Outlier Detection: Staying Zombies or Awaiting for Resurrection? In Proceedings of IDA 2015.
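The separability idea of Micenková et al. (outlier plus artificial points as one class, the original data as the other, classification quality as outlierness) can be illustrated with a minimal sketch. It rests on several assumptions that are ours, not the paper's: the artificial points come from an isotropic Gaussian with a fixed `sigma`, the classifier is a leave-one-out k-NN scored by balanced accuracy, and instead of standard feature selection the sketch simply enumerates every subspace of up to `max_dim` attributes, which is only feasible for a handful of attributes. All names and parameters (`separability`, `rank_subspaces`, `n_synthetic`, `sigma`, `k`) are hypothetical.

```python
import itertools
import math
import random


def loo_knn_balanced_accuracy(points, labels, k=3):
    """Leave-one-out k-NN accuracy averaged over class 0 (inliers) and class 1 (outlier side)."""
    correct = [0, 0]
    total = [0, 0]
    for i, (p, y) in enumerate(zip(points, labels)):
        # Labels of the k nearest other points (Euclidean distance).
        nearest = sorted(
            (math.dist(p, q), labels[j]) for j, q in enumerate(points) if j != i
        )[:k]
        outlier_votes = sum(label for _, label in nearest)
        predicted = 1 if 2 * outlier_votes > k else 0
        total[y] += 1
        correct[y] += int(predicted == y)
    # Balanced accuracy, so the large inlier class does not dominate the score.
    return (correct[0] / total[0] + correct[1] / total[1]) / 2


def separability(inliers, outlier, subspace, n_synthetic=20, sigma=0.1, k=3, seed=0):
    """Hypothetical helper: outlierness of `outlier` in `subspace` as classifier separability."""
    rng = random.Random(seed)

    def project(x):
        return tuple(x[d] for d in subspace)

    inlier_points = [project(x) for x in inliers]
    centre = project(outlier)
    # Artificial points drawn from an isotropic Gaussian centred on the outlier (assumption).
    synthetic = [
        tuple(rng.gauss(c, sigma) for c in centre) for _ in range(n_synthetic)
    ]
    points = inlier_points + [centre] + synthetic
    labels = [0] * len(inlier_points) + [1] * (1 + n_synthetic)
    return loo_knn_balanced_accuracy(points, labels, k)


def rank_subspaces(inliers, outlier, max_dim=2, **kwargs):
    """Score all subspaces with at most `max_dim` attributes; most separable (then smallest) first."""
    scored = [
        (separability(inliers, outlier, sub, **kwargs), sub)
        for size in range(1, max_dim + 1)
        for sub in itertools.combinations(range(len(outlier)), size)
    ]
    scored.sort(key=lambda item: (-item[0], len(item[1])))
    return scored


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy data: attribute 2 closely follows attribute 0, attribute 1 is independent noise.
    inliers = []
    for _ in range(200):
        x = rng.random()
        inliers.append((x, rng.random(), x + rng.gauss(0, 0.02)))
    # Ordinary in every single attribute, but far from the diagonal in {0, 2}.
    for score, sub in rank_subspaces(inliers, (0.2, 0.5, 0.8))[:3]:
        print(sub, round(score, 3))
```

In this toy setting the outlier only stands out in the joint subspace of attributes 0 and 2, mirroring the {x, z} explanatory subspace from the figure, so that pair is expected to rank first. Ties in separability are broken in favour of lower-dimensional subspaces, following the preference for a high outlier factor in a low-dimensional subspace.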
RF-OEX: examples of explanation
Form: (Condition, certainty factor)
- Zoo dataset, instance number 64, class: mammal
  - eggs=true, 0.51
  - toothed=false, 0.49
- Iris dataset, instance number 19, class: Iris-setosa
  - sepallength >= 5.5 && sepalwidth < 4, 0.53
  - sepallength >= 5.5, 0.47

Recent work
- Beyond Outlier Detection: LookOut for Pictorial Explanation. ECML PKDD (Gupta et al. 2018)
- Explaining anomalies in groups with characterizing subspace rules. Data Mining and Knowledge Discovery 32 (2018) (Macha and Akoglu 2018)
- Oui! Outlier Interpretation on Multi-dimensional Data via Visual Analytics. Eurographics Conference on Visualization (EuroVis) (Xun Zhao et al. 2019)
- Sequential Feature Explanation for Anomaly Detection. ACM Transactions on Knowledge Discovery from Data, Vol. 13, No. 1 (Siddiqui et al. 2019)
- Towards explaining anomalies: A deep Taylor decomposition of one-class models. Pattern Recognition 101 (2020) 107198 (Kauffmann et al. 2020)
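To make the Knorr and Ng (1998) definitions of non-trivial, strongest and weak outliers concrete, here is a minimal sketch that walks the whole subspace lattice, as in the NHL example. Assumptions: the outliers in each subspace are Knorr and Ng's distance-based DB(p, D)-outliers (at least a fraction p of the objects lie farther than D) with Euclidean distance and one global p and D; an outlier that is also an outlier in some proper subspace is labelled "trivial", i.e. the complement of non-trivial; the function names and the toy data are hypothetical. The lattice has 2^d - 1 subspaces, so exhaustive enumeration only suits small d.

```python
import itertools
import math


def db_outliers(data, subspace, p, D):
    """Indices of DB(p, D)-outliers of `data` projected onto `subspace`.

    Object O is a DB(p, D)-outlier if at least a fraction p of all objects
    lie at (Euclidean) distance greater than D from O.
    """
    proj = [tuple(x[a] for a in subspace) for x in data]
    n = len(proj)
    outliers = set()
    for i, x in enumerate(proj):
        far = sum(1 for j, y in enumerate(proj) if j != i and math.dist(x, y) > D)
        if far >= p * n:
            outliers.add(i)
    return outliers


def classify_outliers(data, p, D):
    """Label each (object index, subspace) outlier pair as strongest, weak or trivial."""
    n_attrs = len(data[0])
    lattice = [
        sub
        for size in range(1, n_attrs + 1)
        for sub in itertools.combinations(range(n_attrs), size)
    ]
    outliers = {sub: db_outliers(data, sub, p, D) for sub in lattice}

    labels = {}
    for space in lattice:
        # All proper, non-empty subspaces of `space` (lower levels of the lattice).
        subspaces = [
            sub
            for size in range(1, len(space))
            for sub in itertools.combinations(space, size)
        ]
        # Strongest outlying space: it contains outliers, none of its subspaces does.
        is_strongest_space = bool(outliers[space]) and not any(
            outliers[sub] for sub in subspaces
        )
        for i in outliers[space]:
            if any(i in outliers[sub] for sub in subspaces):
                labels[(i, space)] = "trivial"
            elif is_strongest_space:
                labels[(i, space)] = "strongest"
            else:
                labels[(i, space)] = "weak"
    return labels


if __name__ == "__main__":
    # Toy 2-D data: five points on a diagonal, one off-diagonal point (index 5)
    # and one point far away along attribute 0 only (index 6).
    data = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (0, 4), (20, 2)]
    for (i, space), label in sorted(classify_outliers(data, p=0.8, D=2.5).items()):
        print(i, space, label)
    # prints:
    # 5 (0, 1) weak
    # 6 (0,) strongest
    # 6 (0, 1) trivial
```

Point 6 is already an outlier on attribute 0 alone, so {0} is a strongest outlying space and point 6 is only a trivial outlier in {0, 1}. Point 5 stands out only in the full space, but {0, 1} is not a strongest outlying space because its subspace {0} contains an outlier, so point 5 is a weak outlier.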