Explanation of rare events
Need for explanation of outliers
A user needs to understand why an instance is detected as an outlier
For many applications, explanation (interpretation, description,
outlying property detection, characterization) of outliers is as
important as identification
Outlier factor (degree) and ranking are only quantitative information
We also need qualitative information, and not only for high-dimensional data
Based also on the keynote talks of the ODD v5.0: Outlier Detection De-constructed
workshop (ACM SIGKDD 2018), namely Making sense of unusual suspects - Finding and
Characterizing Outliers (Ira Assent) and Outlier Description and Interpretation (Jian
Pei)
(Torgo et. al.) LIDTA2020 September, 2020 97 / 127
How to generate an explanation?
Compare with inlying data as well as conﬁrmed outlying data
Find outlier explanatory component / outlying property / outlier
context / outlier characteristic
Help domain expert in verifying outliers and understanding how the
outlier method works
What is a meaningful explanation?
A method for finding an explanation must be
helpful for the user, namely easy to understand, e.g. the smallest
subset of attributes
efficient, scalable
Most frequent approaches
visual
look for a subset of attributes where each outlier has its own
explanatory subspace
Finding the most important attributes
For an object q, find the subspaces where q is most unusual compared to
the rest of the data
A 3D space {x, y, z} and all its 2D projections. {x, z} is an explanatory subspace
(Micenkova 2015)
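The idea of the figure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the outlier score is the distance to the k-th nearest neighbour in each projection (a stand-in, not the score of the cited work), the data is a synthetic grid in which z equals x, and the names knn_distance and rank_subspaces are hypothetical.

```python
from itertools import combinations
import math

def knn_distance(q, data, dims, k=5):
    """Distance from q to its k-th nearest neighbour, using only the attributes in dims."""
    dists = sorted(
        math.dist([q[d] for d in dims], [p[d] for d in dims]) for p in data
    )
    return dists[k - 1]

def rank_subspaces(q, data, names, size=2, k=5):
    """Score q in every projection of the given size; most unusual projection first."""
    scores = [
        (tuple(names[d] for d in dims), knn_distance(q, data, dims, k))
        for dims in combinations(range(len(names)), size)
    ]
    return sorted(scores, key=lambda s: s[1], reverse=True)

# Assumed toy data: x and y lie on a regular grid and z equals x,
# so every inlier sits on the diagonal of the {x, z} projection.
data = [(i / 9, j / 9, i / 9) for i in range(10) for j in range(10)]

# q looks ordinary in {x, y} and {y, z} but lies far from the diagonal in {x, z}.
q = (0.2, 0.5, 0.8)

ranking = rank_subspaces(q, data, ("x", "y", "z"))
for subspace, score in ranking:
    print(subspace, round(score, 3))
```

The projection ranked first is the candidate explanatory subspace. Raw k-NN distances are only comparable between projections of the same dimensionality, which is why the sketch fixes the size at 2.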
Strongest, weak and trivial outliers
Knorr and Ng 1998
Non-trivial outliers
P is a non-trivial outlier in space A if P is an outlier in A and is not
an outlier in any proper subspace of A. Otherwise the outlier is trivial.
Strongest outlier
The space A containing one or more outliers is called a strongest outlying
space if no outlier exists in any proper subspace of A.
Any P that is an outlier in A is called a strongest outlier.
Any non-trivial outlier that is not strongest is called a weak outlier.
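The definitions can be checked mechanically once we know which points are outliers in which subspaces. The sketch below assumes that this information is already available as a dictionary (here filled with made-up values; in practice it would come from an outlier detector run on each subspace). An outlier in A that is also an outlier in some proper subspace of A is reported as trivial. The function names are hypothetical.

```python
from itertools import combinations

def proper_subspaces(space):
    """Yield every non-empty proper subset of the attribute set `space`."""
    items = sorted(space)
    for r in range(1, len(items)):
        for combo in combinations(items, r):
            yield frozenset(combo)

def classify(point, space, outliers_in):
    """Classify `point` in `space` as strongest, weak or trivial outlier.

    outliers_in maps a subspace (frozenset of attribute names) to the set of
    points detected as outliers there; missing keys mean "no outliers".
    """
    space = frozenset(space)
    if point not in outliers_in.get(space, set()):
        return "not an outlier"
    subs = list(proper_subspaces(space))
    # Trivial: the point is already an outlier in a smaller subspace.
    if any(point in outliers_in.get(s, set()) for s in subs):
        return "trivial"
    # Strongest: no outlier at all exists in any proper subspace.
    if all(not outliers_in.get(s) for s in subs):
        return "strongest"
    # Non-trivial, but some other point is an outlier in a proper subspace.
    return "weak"

# Hypothetical detection results for illustration only.
outliers_in = {
    frozenset({"A"}): {"p1"},
    frozenset({"A", "B"}): {"p1", "p2"},
    frozenset({"B", "C"}): {"p3"},
}

results = {
    ("p1", "A"): classify("p1", {"A"}, outliers_in),
    ("p1", "AB"): classify("p1", {"A", "B"}, outliers_in),
    ("p2", "AB"): classify("p2", {"A", "B"}, outliers_in),
    ("p3", "BC"): classify("p3", {"B", "C"}, outliers_in),
}
print(results)
```

In this toy input p1 is strongest in {A} and trivial in {A, B}, p2 is weak in {A, B} because {A} already contains an outlier, and p3 is strongest in {B, C}.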
Example: NHL ice hockey players
Knorr and Ng 1999
5-D space {A, B, C, D, E} of power-play goals, short-handed goals,
game-winning goals, game-tying goals, and games played
Lattice representation
Explaining outliers by subspace separability
(Micenkova and Ng 2013)
An explanatory subspace cannot be derived just by analyzing the vicinity of the
point in the full space ⇒ need to consider different subspace projections
no monotonicity property for outliers w.r.t. subspaces
need for heuristics because of exponential complexity
look for a subspace A where the outlier factor is high and the
dimension of A is low
separability - instance outlierness is related to its separability from the
rest of the data
B. Micenková, R. T. Ng, X. H. Dang, and I. Assent. Explaining outliers by subspace
separability. In IEEE ICDM 2013
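One simple heuristic in the spirit of "high outlier factor, low dimension" is to search subspaces bottom-up, cap the dimensionality, and stop at the first size where some subspace scores above a threshold. The sketch below is not the method of the cited paper. It assumes a relative k-NN distance as outlier factor (the k-NN distance of q divided by the mean k-NN distance of the data, so that scores are roughly comparable across dimensionalities), and the parameters max_dim and threshold as well as all function names are hypothetical.

```python
from itertools import combinations
import math

def kth_nn_dist(p, data, dims, k):
    """Distance from p to its k-th nearest neighbour in the projection dims."""
    d = sorted(
        math.dist([p[i] for i in dims], [o[i] for i in dims])
        for o in data if o is not p
    )
    return d[k - 1]

def relative_score(q, data, dims, k=5):
    """Assumed outlier factor: k-NN distance of q relative to the data average."""
    baseline = sum(kth_nn_dist(p, data, dims, k) for p in data) / len(data)
    return kth_nn_dist(q, data, dims, k) / baseline

def low_dim_explanation(q, data, n_attrs, max_dim=2, threshold=2.0, k=5):
    """Return (dims, score) for the lowest-dimensional subspace scoring above threshold."""
    for size in range(1, max_dim + 1):
        scored = [
            (relative_score(q, data, dims, k), dims)
            for dims in combinations(range(n_attrs), size)
        ]
        best_score, best_dims = max(scored)
        if best_score >= threshold:
            return best_dims, best_score
    return None

# Assumed toy data: 101 points with distinct values per attribute; attribute 2 equals attribute 0.
data = []
for i in range(101):
    x = (37 * i % 101) / 101
    y = (53 * i % 101) / 101
    data.append((x, y, x))

# q is unremarkable in every single attribute but violates the x = z relation.
q = (0.2, 0.5, 0.8)
result = low_dim_explanation(q, data, n_attrs=3)
print(result)
```

No single attribute flags q here, yet the pair (0, 2) does, which illustrates the missing monotonicity: a search that only extends outlying 1-D subspaces would miss this explanation. The cap max_dim keeps the number of examined subspaces polynomial instead of exponential.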
Outlierness as accuracy of classification
(Micenkova and Ng 2013)
separability as classification error. Assume that the data follows a
distribution f
original data = inlier class; outlier + artificial points = outlier class
use standard feature selection methods to find explanatory subspaces
Measuring outlierness by separability. p1, p2 are points from the distribution f (x) and
the normal distributions gp1(x) and gp2(x) were artificially generated.
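A minimal sketch of this construction, under several assumptions: the inliers are uniform random points, the artificial points are drawn from a normal distribution centred at the outlier q with a hand-picked standard deviation of 0.1, and the "feature selection" step is replaced by a simple stand-in that scores each single attribute by the leave-one-out accuracy of a 1-nearest-neighbour classifier. The function name loo_1nn_accuracy is hypothetical.

```python
import random

def loo_1nn_accuracy(points, labels, dims):
    """Leave-one-out accuracy of a 1-NN classifier restricted to the attributes in dims."""
    correct = 0
    for i, p in enumerate(points):
        nearest = min(
            (j for j in range(len(points)) if j != i),
            key=lambda j: sum((p[d] - points[j][d]) ** 2 for d in dims),
        )
        correct += labels[nearest] == labels[i]
    return correct / len(points)

rng = random.Random(0)

# Inlier class: the original data (assumed uniform in the unit cube).
inliers = [tuple(rng.random() for _ in range(3)) for _ in range(100)]

# Outlier class: the outlier q plus artificial points sampled around it.
q = (0.5, 0.5, 3.0)
artificial = [tuple(rng.gauss(c, 0.1) for c in q) for _ in range(99)]

points = inliers + [q] + artificial
labels = [0] * len(inliers) + [1] * (1 + len(artificial))

# The better the two classes separate in an attribute, the more outlying q is there.
accuracy = {d: loo_1nn_accuracy(points, labels, (d,)) for d in range(3)}
best_attr = max(accuracy, key=accuracy.get)
print(accuracy, best_attr)
```

Attribute 2 separates the two classes perfectly, so it is the explanation for q, while in attributes 0 and 1 the artificial points overlap with the inliers and the classifier makes errors. Scoring single attributes is the simplest case; the same accuracy function accepts larger attribute tuples in dims.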
RF-OEX: Analysis of Random Forest
two methods: 1. search for frequent branches and 2. reduction of trees
NEZVALOVÁ, Leona et al. Class-Based Outlier Detection: Staying Zombies or Awaiting
for Resurrection? In Proceedings of IDA 2015.
RF-OEX
Examples of explanation
Form: (Condition, certainty factor)
Zoo dataset
Instance number: 64, Class: mammal
eggs=true, 0.51
toothed=false, 0.49
Iris dataset
Instance number: 19, Class: Iris-setosa
sepallength >= 5.5 && sepalwidth < 4, 0.53
sepallength >= 5.5, 0.47
Recent work
Beyond Outlier Detection: LookOut for Pictorial Explanation, ECML
PKDD. (Gupta et al. 2018)
Explaining anomalies in groups with characterizing subspace rules. Data
Mining and Knowledge Discovery (2018) 32 (Macha and Akoglu 2018)
Oui! Outlier Interpretation on Multi-dimensional Data via Visual Analytics
Eurographics Conference on Visualization (EuroVis) (Xun Zhao et al.
2019)
Sequential Feature Explanation for Anomaly Detection. ACM Transactions
on Knowledge Discovery from Data, Vol. 13, No. 1, (Siddiqui et al. 2019)
Towards explaining anomalies: A deep Taylor decomposition of one-class
models. Pattern Recognition 101 (2020) 107198 (Kauffmann et al. 2020)