Short introduction to stream analysis using MOA
Massive Online Analysis

Martin Juřen
Faculty of Informatics, Masaryk University
Brno, 2011

Content
1. Data stream evaluation
2. Work with MOA
  • Basic steps
  • Data stream generators
  • Data stream classifiers
  • Data stream clustering
  • Data stream algorithm evaluation
3. MOA Demonstration
4. Bibliography

Data stream evaluation
Requirements on algorithms
• Process one example at a time and inspect it only once
• Use a limited amount of memory and time
• Be ready to predict at any time

Work with MOA: Basic steps
1. Choose and configure a data stream generator
2. Choose and configure an algorithm
3. Choose and configure an evaluation method

Work with MOA: Data stream generators
• ArffFileStream: reads the stream from a file
• ConceptDriftStream: generates a stream with concept drift
• FilteredStream:AddNoiseFilter: generates a stream with noise
• AgrawalGenerator: based on Rakesh Agrawal, Tomasz Imielinski, Arun Swami: Database Mining: A Performance Perspective. IEEE Transactions on Knowledge and Data Engineering, 1993.
• Some other generators, and their concept drift variants

Work with MOA: Data stream classifiers
• Majority class: predicts the most frequently observed class
• Hoeffding tree (and variants): relies on the fact that a small sample can be enough to find an optimal splitting attribute
• Naive Bayes
• Decision Stump: single-level decision trees

Work with MOA: Data stream clustering
• CobWeb: not quite a data stream algorithm
• CluStream: temporal extension of the cluster feature vector (micro-clusters); snapshots are stored in a pyramidal pattern
• ClusTree: parameter-free algorithm.
  Capable of detecting concept drift.

Work with MOA: Data stream algorithm evaluation
• Holdout: periodically tests the model on a single test set
• Test-Then-Train: each new example is first used to test the model and only then to train it

MOA Demonstration
• It is new software and still has many bugs
• Written in Java; an open-source project
• It can be linked to WEKA
• Goal: running experiments, evaluating algorithms, and comparing algorithms

Bibliography
• MOA team: moa.cs.waikato.ac.nz
• Bifet A., Kirkby R., Kranen P., Reutemann P.: Massive Online Analysis: Manual. Online at moa.cs.waikato.ac.nz, May 2011.
• Bifet A., Kirkby R.: Tutorial 1. Introduction to MOA: Massive Online Analysis. Online at moa.cs.waikato.ac.nz, January 2011.