Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re Data Streams P. Kosina LIAAD-INESC Porto, FI MU Brno Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re Contents Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re Traditional (Off-line) Data Mining • ’Data entered by people into the computers’ • Deals with limited training set size, the whole learning set is available • Stationary distribution • Speed of processing data is not the top priority factor • Aims to make use of all the available data to create the best model possible • Problems: high variance and over-fitting Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re Traditional (Off-line) Data Mining Movie on DVD - you can analyze (evaluate) it, random access... Was it good or bad? Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re Data Streams • ’Computers feed data into computers’ • Streams are potentially unbounded in size • Examples are usually incoming very fast • May have non-stationary distribution • Processing speed per example is very important • Utilize summarized information from data rather then all the information possible • Represent most of the real problems • TCP/IP traffic, GPS data, emails, sensor networks etc. Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re Data Streams TV series episode - summarized information from previous episodes, sequential access (next episode to be aired) Good or bad? You can evaluate the episode or what you have seen so far. Whole series only when finished. Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re Illustrative Example • Gama (2010) • Electrical power distribution networks use sensors placed along network which produce streams of data at high speed • Load forecasting is a very important task • Planning for different time horizons • Support economic decisions to buy/sell electricity (price variation depending on time of trade) Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re Illustrative Example • Relevant data mining tasks • Cluster analysis • Identify profiles: urban, rural, industrial etc. • Predictive analysis • Predict the value measured by each sensor for different time horizons • Predict peaks in demand • Monitoring evolution • Detect changes in the behavior of sensors • Detect failures and abnormal activities • Identify peaks in demand • Identify critical points in load evolution Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re Basic Approaches and Models Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re Simple Statistics from Data Streams • Incremental mean • Only two variables in memory to maintain • number of observations n • sum of values seen so far n t=0 xt xn = (n−1)×xn−1+xn n • Incremental standard deviation • One more variable • sum of the squares n t=0 x2 t σn = ( n t=0 x2 t )− ( n t=0 xt )2 n n−1 • You can directly apply these statistics to compute incremental Na¨ıve Bayes Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re Simple Statistics from Data Streams • Correlation coefficient • Given two streams x and y we need to maintain • sum of each stream n t=0 xt and n t=0 yt • sum of the squares n t=0 x2 t and n t=0 y2 t • sum of the crossproduct n t=0(xt × yt ) corr(a, b) = n t=0(xt ×yt )− n t=0 xt × n t=0 yt n n t=0 x2 t − n t=0 x2 t n n t=0 y2 t − n t=0 y2 t n Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re Sliding Windows • Situations where recent past is more interesting than complete history • FIFO structure: element xt observed, element xt−w forgotten; w is window size • Two basic types • Sequence based • size defined by number of observations • Timestamp based • size defined by duration: elements with timestamp within a time interval T • Necessary to maintain all the examples within the window in memory • e.g., S = 150 t=101 x2 t updated as Si+1 = Si + x2 151 − x2 101 Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re Sliding Windows • Many variations of windows • Landmark window • ’growing’ window • typically, landmark is a new day • Natural tilted time window • Logarithmic tilted time window • different granularity of stored information (different levels of approximation) • recent history more precise (more points), less points for older • Adaptive sliding window Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re Sliding Windows Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re Reducing Data Size • Alternative to sliding windows are data reduction techniques • Sampling • Common method to reduce data size and computational costs • How to obtain an unbiased sample of data? • random sampling • reservoir sampling • ... • Synopsis and histograms • Data transformation: Wavelets, Discrete Fourier Transformation Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re Data Stream Classifiers • It must require small constant time per record • It must use only a fixed amount of main memory • It must be able to build a model using at most one scan of the data • It must make a usable model available at any point in time since it may never be done processing Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re Data Stream Classifiers • It should ideally produce a model that is equivalent (or nearly identical) to the one that would be obtained by corresponding ordinary database mining algorithm, operating without the above constraints. • When the data-generating phenomenon is changing over time (e.g. when concept drift is present), the model at any time should be up-to-date, but also include all the information from the past that has not become outdated. Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re Decision Trees • Tree - directed acyclic graph • Node • Decision node - has two or more successors, condition base on attribute values • Leaf node - class label • High degree of interpretability • Divide-and-conquer strategy • Complex problem recursively divided into simpler sub-problems • Solutions combined to provide solution of the complex problem Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re Decision Trees - tree structure Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re Decision Trees • Conditions in decision nodes include a splitting attribute, branches are defined by operator (e.g., ≤, =, ∈ etc.) and value from attribute’s domain • e.g., Color = green • Test creates hyperplane orthogonal to axis of tested attribute and parallel to all the others • Each leaf corresponds to a region (hyper-rectangle) • Regions are mutually exclusive and exhaustive (i.e., fully cover the instance space) Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re Decision Trees - instance space Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re Hoeffding Trees a.k.a Very Fast Decision Trees • Domingos (2000) • Tree grows over time • Each incoming example traverses from root to leaf and updates sufficient statistic • Statistics for heuristic function to compute the merit of split tests • Leaf node is replaced by decision node when indicated by statistical support • i.e., enough evidence that one split is superior to the others Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re Fruit Data Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re VFDT Statistics - Nominal Attributes • υAi =< n1, n2, . . . , nk >, • υAi vector for value i of attribute A of counts for each class 1, 2, . . . , k Table: Attribute Color Class Value Apple Plum Strawberry Red 20 0 10 Blue 0 10 0 Green 10 0 0 • Summary of observed instances • And provides counts necessary to compute entropy (or other measures) Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re VFDT Statistics - Numerical Attributes • Numerical attributes are very common and are more challenging to handle • Different methods for deciding cut point with different demands • On-line methods avoid sort operation • Examples of Techniques • Gaussian distribution • Discriminant analysis • Exhaustive method Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re VFDT Statistics - Exhaustive method • Gama et al. (2003) • Each leaf contains binary tree structure for each attribute A • Node in binary tree is identified by value i of the attribute • Node has two vectors of dimension k, VE and VH, containing counts (for each class) of values ≤i and >i that passed via the node • To evaluate merit of given attribute it computes information gain for each possible cut point Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re VFDT Statistics - Exhaustive method computation • Minimize: info(Aj (i)) = P(Aj ≤ i) × iLow(Aj (i)) + P(Aj > i) × iHigh(Aj (i)) i is the value of cut point, iLow(Aj (i)) information of Aj ≤ i and iHigh(Aj (i)) information of Aj > i • iLow(Aj (i)) = − K P(K = k|Aj ≤ i) × log2(P(K = k|Aj ≤ i)) • iHigh(Aj (i)) = − K P(K = k|Aj > i) × log2(P(K = k|Aj > i)) Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re VFDT Statistics - Exhaustive method algorithms Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re Hoeffding Bound • After n independent observations of a real-valued variable r with range R, HB ensures with confidence 1 − δ that the true mean of r is at least r − where r is the observed mean. • Statistical guarantee that one split attribute is better than others • It decides sufficient sample size of observed examples • Independent of distribution, but requires more examples • H(·) evaluation function, R range (R = 1 for probability,R = log2(#classes) for information gain), confidence 1 − δ • = R2 ln(1/δ) 2n • After n observed instances in leaf l, xa and xb are the attributes with highest H(·) respectively Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re Hoeffding Bound • If H(xa) − H(xb) > , xa is the best split attribute with confidence 1 − δ otherwise need to observe more examples • Problem 1: two attributes are equally ’important’ and constantly achieve the same H(·) • Solution: tie-breaking constant. Split for small enough • Problem 2: computationally demanding evaluate split with every example • Solution: compute after number of observations. Not likely to change with every example. Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re Extensions • Popular classifier ⇒ many variation, extensions • Change reaction (Hulten et al., 2001) • Functional leaves (Gama et al., 2003) • Exploit information in the leaves to use Na¨ıve Bayes prediction • Option trees (Holmes et al., 2003) • ...and many more Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re Change Detection Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re Introduction to Drift • Previous assumptions: examples generated at random from some stationary probability distribution • Processes generating data in real world are dynamic and may change due to some external or internal influence • ’Modern’ approaches focus on adaptation • Examples • Seasonal changes • Wear of tools in industrial processes • Gained resistance to certain type of antibiotics in medicine Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re Specification of the Problem • Data stream as sequences < S1, S2, . . . , Sm, . . . > where each Si is generated by some stationary distribution Di • Sequence S is context • Problem is to detect change points • Transition phase - examples from both distributions appear • Example from Di is noise for Di+1 • Noise vs. change, combine/balance robustness and sensitivity • Persistence of distribution Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re Change Characteristics • Can be caused by change in one or more attributes • Or hidden changes (not reflected directly) • Changes ca be of various rates - commonly two basic types • Concept drift - gradual change • Concept shift - abrupt/sudden change Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re Types of Approaches Data management ’Forgetting’ Detection methods ’Detectors’ Decision model management ’Dynamic ensembles’ Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re Data Management • Information in memory for maintaining consistent model • Full memory • Statistics over all examples • Impact of examples weighted by age • e.g., exponential wλ(x) = exp(−λtx ), λ decay factor, if 0 all x have equal weight • Gradual forgetting • Partial memory • Only most recent examples • Sliding windows (fixed, adaptive...) • small - fast adaptation to changes • large - slower adaptation, better performance for long stable contexts • Abrupt forgetting Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re Drift Detection Methods • Previous methods are blind • Detection methods are explicit, provide information about • Time points when change appeared • Small time windows in which drift occurred • Two examples of DDM in predictive learning • Statistical Process Control • Page-Hinkley test Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re Statistical Process Control (SPC) • Grant and Leavenworth (1996); Gama et al. (2004) • Hypothesis: item distribution of data is stationary • Error-rate decreases with increasing number of examples • Error-rate increases significantly ⇒ not stationary distribution Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re Statistical Process Control (SPC) • pt is error-rate, st is standard deviation, pmin, smin respective values corresponding to min(pt + st) • Out-of-Control pt + st > pmin + α × smin • In-Control pt + st < pmin + β × smin • Warning pmin + α × smin > pt + st > pmin + β × smin • Constants α and β define confidence level (α = 3 and β = 2 for 99% and 95% resp.) Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re Statistical Process Control (SPC) Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re Page-Hinkley • Hinkley (1970) • Change detection in signal processing • Cumulative difference between error-rate values and its mean till n: mn = n t=1 (pt − pt − δ) where pt = 1 t t i=1 pi and parameter δ is magnitude of change not raising alarm. • Minimum: Mn = min (mt, t = 1 . . . n) • Change reported when: mn − Mn > λ Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re Page-Hinkley Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re Decision model management • Data generated from multiple distributions ⇒ multiple decision models • How many models keep in memory? • Which models keep in memory? • Dynamic Weighted Majority (DWM) algorithm Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re DWM • Kolter and Maloof (2003) • Ensemble weighted predicting using majority vote • Models use same algorithm but trained on different data • Each model has associated weight • Incorrect prediction decreases weight of model • Dynamically creates and removes models based on the performance • New model is added when global prediction is incorrect • Remove models when weight drops below given threshold Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re Evaluation of Stream Mining Algorithms Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re Adaptive Learning Evaluation Methods • Different goals • High accuracy • Efficiency • Time • Space • Resources - battery consumption • Change detection quality • Detection delay • False positive/negative drift signals Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re Traditional Evaluation vs. Stream Evaluation • Off-line training phase terminates within ’reasonable’ time, learned model is the ’final product’ • Testing after training process is finished • Various methods and metrics • On-line task can never be finished learning • Must be interactive • Outputs from learned model are demanded during the process • Performance is required to be evaluated throughout the process Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re On-line Evaluation • Gama et al. (2009) • Holdout evaluation • Independent examples kept for test set • Current model is evaluated using holdout set periodically • Prequential (Predictive Sequential) evaluation • With every training example, prediction is made before training the model (using only attribute values) • Cumulative loss is computed S = n t=1 L(yt, ˆyt) • Than train the model with given example • Prequential evaluation can employ fading factor α to elevate the information about most recent performance or compute prequential error using a window of recent examples Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re Fading Factor Prequential • Prequential error with correction factor Et = St Bt = Lt +α×St−1 1+α×Bt−1 where B1 = 1 Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re RAM Hours • Bifet et al. (2010) • Stream algorithms aim to achieve high time and space efficiency (more often than off-line algorithms) because of • Limited resources • Big data • Time and space measure in one metric • Every GB of RAM deployed for 1 hour • Cloud computing rental cost inspiration Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re Application Examples Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re Sensors • Gama and Gaber (2007) • Usually small devices with very limited resources • Memory • Computational power • Power consumption (battery life) • Prone to failures or to providing otherwise corrupted data • Usually set to execute many readings per second (but highly depend on type of application) • Processing can be done (to some extend) on the device to reduce communication costs Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re Sensors in Predicting Natural Disasters • Include avalanche, drought, earthquake, flood, hurricane, tsunami, volcano, wildfire... • Different sensors remote sensors, satellite images, radars, earth observing sensors • Various related - measure temperature, rainfall, wind, land cover... • Represent huge volumes of data incoming at fast rate • Needs to be processed in real time to provide timely prediction of natural disaster threat Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re Food Sales • Having too much of perishable goods ⇒ loss when not sold • Empty shelves ⇒ unsatisfied customers • Static model is not appropriate • Holidays • Season, temperature • Events (sport, cultural) • Advertisement, promotions • Multiple models • Contextual (meta-learning) approach • External contextual data (weather etc.) Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re Marketing Applications • Recommendation • User preferences and interests evolve • Influenced by media, season, work etc. • Web clickstream analysis Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re Security Applications • Intrusion detection / Fraud detection • Millions of connections/transactions need to be processed • New types of attacks appear • Crime volume prediction • Planning support Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re Spam Detection • Enormous amounts of Spam sent/received every day • Feedback from users (’Report unrecognized Spam’) • Spam content is evolving (Viagra, fake Rolex, help a guy from Nigeria, etc.) Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re Summary • Data stream setting • Most data generated nowadays are continuously incoming, potentially unbounded in size • Changes over time • Characteristics of data stream mining algorithms • Sub-linear space, sequential access to data, constant time per example, any-time • Very Fast Decision Tree • Reaction to changes • Importance of adaptation • Different techniques • Algorithm assessment during the process • Variety of applications (many to be explored yet!) Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re Conclusion Stream mining is the way to go! Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re References I Bifet, A., G. Holmes, B. Pfahringer, and E. Frank (2010). Fast perceptron decision tree learning from evolving data streams. In Proceedings of the 14th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part II, PAKDD’10, Berlin, Heidelberg, pp. 299–310. Springer-Verlag. Domingos, P. (2000). Mining high-speed data streams. pp. 71–80. ACM Press. Gama, J. (2010). Knowledge Discovery from Data Streams. Chapman and Hall/CRC. Gama, J. and M. Gaber (2007). Learning from Data Streams – Processing techniques in Sensor Networks. Springer. Gama, J., P. Medas, G. Castillo, and P. Rodrigues (2004). Learning with drift detection. In SBIA Brazilian Symposium on Artificial Intelligence, pp. 286–295. Springer. Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re References II Gama, J., R. Rocha, and P. Medas (2003). Accurate decision trees for mining high-speed data streams. In Proceedings of the Ninth International Conference on Knowledge Discovery and Data Mining. ACM Press, New York, NY. Gama, J., R. Sebastiao, and P. P. Rodrigues (2009). Issues in evaluation of stream learning algorithms. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD ’09, New York, NY, USA, pp. 329–338. ACM. Grant, E. and R. Leavenworth (1996). Statistical Quality Control. McGraw-Hill. Hinkley, D. (1970). Inference about the change point from cumulative sum-tests. Biometrika 58, 509–523. Holmes, G., R. Kirkby, and B. Pfahringer (2003). Mining data streams using option trees. Technical report, States of America. Introduction Basic Approaches and Models Change Detection Evaluation of Stream Mining Algorithms Application Examples Re References III Hulten, G., L. Spencer, and P. Domingos (2001). Mining time-changing data streams. In Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 97–106. ACM, New York, NY, USA. Kolter, J. Z. and M. A. Maloof (2003). Dynamic weighted majority: A new ensemble method for tracking concept drift. In Proceedings of the 3th International IEEE Conference on Data Mining, pp. 123–130. IEEE Computer Society.