Tracking Recurring Concepts with Meta-Learners

J. Gama (LIAAD-INESC Porto, FEP, University of Porto)
P. Kosina (Faculty of Informatics, Masaryk University, Brno)

Fourteenth Portuguese Conference on Artificial Intelligence

Contents
* Introduction
* Main Work
* Evaluation
* Conclusions

Introduction
* Meta-learning
  * Information about the relation between tasks/domains and learning strategies
  * Finding a proper model
* Data streams
  * Real-world problems
  * Continuous data
* Concept drift
  * Change over time
* Recurrent concepts
  * Seasonal change

Drift Detection
* While the distribution of the data is stationary, the error-rate decreases with an increasing number of examples
* When the error-rate increases, a warning/drift is reported
  * warning: p_i + s_i >= p_min + 2 * s_min
  * drift: p_i + s_i >= p_min + 3 * s_min
  * where p_i is the error-rate, s_i = sqrt(p_i (1 - p_i) / i) is its standard deviation, and p_min, s_min are the values recorded when p_i + s_i reached its minimum

Motivation
* Presence of delay
  * Between the arrival of an example and obtaining its label
  * Unlabeled items are usually unused
* Could we use just the attributes to predict change?
  * Referees

Referee
* What is a referee
  * A meta-learning model (level-1 classifier)
  * Makes decisions about the performance of the primary (level-0) classifier
* How it learns
  * Examples with new class labels
    * false when the level-0 prediction is incorrect
    * true when the level-0 prediction is correct

Overview of Learning the Referee
* (figure)

Method Strategy
* One referee for one concept model
* Before concept drift - ask the referees
  * After the warning level is reached
* Proactive approach
  * Select a historical model (in advance) - does not need class labels
  * or continue and learn a new one
* After concept drift, store the old model together with its referee (a code sketch of this strategy follows the Problems slide)

Overview of Strategy
* (figure)

Problems
* The distribution of the referee's examples mirrors the error-rate of the level-0 classifier
  * Skewness of the data
* Classes were not very discriminative
  * Means of the attributes
* Better to start a new classifier than to use a wrong one
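A minimal sketch, not the authors' implementation, of the pieces described above: a DDM-style detector over the level-0 error-rate, a referee trained on correct/incorrect labels, and referee-based selection of a stored model from recent unlabeled examples. The classifier choice (GaussianNB), the names DriftDetector, StoredConcept, train_referee, select_historical and the 0.7 accuracy threshold are illustrative assumptions.

```python
import math
from dataclasses import dataclass

from sklearn.naive_bayes import GaussianNB


class DriftDetector:
    """Warning when p_i + s_i >= p_min + 2*s_min, drift when >= p_min + 3*s_min."""

    def __init__(self, min_examples=30):
        self.i = 0
        self.errors = 0
        self.min_examples = min_examples
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, is_error):
        self.i += 1
        self.errors += int(is_error)
        p = self.errors / self.i
        s = math.sqrt(p * (1.0 - p) / self.i)
        if self.i < self.min_examples:
            return "in-control"
        if p + s < self.p_min + self.s_min:      # remember the best point seen so far
            self.p_min, self.s_min = p, s
        if p + s >= self.p_min + 3 * self.s_min:
            return "drift"
        if p + s >= self.p_min + 2 * self.s_min:
            return "warning"
        return "in-control"


@dataclass
class StoredConcept:
    model: GaussianNB    # level-0 classifier learned on one concept
    referee: GaussianNB  # level-1 classifier: attributes -> "will the model be correct?"


def train_referee(model, X, y):
    """Referee examples keep the original attributes; the label is 1 iff the
    level-0 prediction was correct, 0 otherwise."""
    correct = (model.predict(X) == y).astype(int)
    return GaussianNB().fit(X, correct)


def select_historical(history, X_recent, min_estimated_accuracy=0.7):
    """At the warning level, ask every stored referee how well its model would do
    on the recent unlabeled examples; re-use the best one if it looks good enough,
    otherwise return None and keep training a brand-new classifier."""
    best, best_score = None, min_estimated_accuracy
    for concept in history:
        estimated_accuracy = concept.referee.predict(X_recent).mean()
        if estimated_accuracy > best_score:
            best, best_score = concept, estimated_accuracy
    return best
```

On the stream, one would feed each level-0 error into DriftDetector, pass the examples buffered since a warning to select_historical (no class labels needed), and, once a drift is confirmed, store the retired classifier together with its referee in the history and reset the detector.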
Evaluation - data
* SEA Concepts (a small generator sketch is appended at the end of the deck)
  * Frequently used benchmark dataset with concept drift
  * 3 attributes, 2 of them relevant (class given by sum > threshold)
  * 4 different concepts (thresholds), repeated twice
  * 120,000 examples

Evaluation - data
* Hyperplane
  * Represents a continuously moving hyperplane in d-dimensional space
  * Recurrence?
* LED data
* Proteins
* STAGGER
* Intrusion

Evaluation Hypotheses
* After drift detection, a new model always takes its place
* Referees are asked and an older model can be re-used
* The models themselves are asked and an older model can be re-used

Evaluation - referees
* (figure)

Evaluation - notes
* Re-used models came from similar concepts; the difference in error was not very significant
* Detection was faster
  * A model was re-used 4 times, and 3 of those times the drift was detected sooner (by 183.5 examples on average)
  * Considering all the warning phases, the number of examples in them decreased by 80 on average (9.25 %)

Evaluation - true models
* (figure)

Evaluation - notes
* Manually re-used models
  * Slower increase of the error-rate
  * Early warning - learning from examples of the previous concept
* Drift times were not better than with the referees (except the last one)

Evaluation - Hyperplane
* (figure)

Conclusions
* It is not an easy task to estimate performance without class labels - unusable for certain types of data
* We worked with only one classifier; an ensemble could improve performance
* Pros
  * Can detect change faster
  * Can improve accuracy
* Cons
  * A wrong decision can lead to a considerable decrease in accuracy

Thank you for your attention!
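Appendix: a small generator for a SEA-like stream with recurring concepts, matching the "Evaluation - data" slide (3 attributes, 2 relevant, class given by whether their sum exceeds a threshold, 4 thresholds each appearing twice, 120,000 examples). The attribute range [0, 10], the particular threshold values and the even 15,000-example blocks are illustrative assumptions, not necessarily the exact experimental settings.

```python
import numpy as np


def sea_stream(thresholds=(8.0, 9.0, 7.0, 9.5), repeats=2,
               block_size=15_000, noise=0.0, seed=0):
    """Yield (x, y, concept_id) one example at a time; 8 blocks of 15,000 = 120,000."""
    rng = np.random.default_rng(seed)
    schedule = list(thresholds) * repeats            # 4 concepts repeated twice
    for concept_id, threshold in enumerate(schedule):
        for _ in range(block_size):
            x = rng.uniform(0.0, 10.0, size=3)       # third attribute is irrelevant
            y = int(x[0] + x[1] > threshold)
            if noise and rng.random() < noise:       # optional label noise
                y = 1 - y
            yield x, y, concept_id


if __name__ == "__main__":
    stream = sea_stream()
    first_x, first_y, _ = next(stream)
    print(first_x, first_y)
```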