ANALYSIS OF RESOLUTION PROOFS (CURRENT STATE) Karel Vaculík IA008 Computational Logic INTRODUCTION Source: VRÁBEL, Patrik. Webové prostředí pro vstup rezolučních důkazů [online]. 2013 [cit. 2013-11-26]. Diplomová práce. Masarykova univerzita, Fakulta informatiky. DATA SOURCES  Spring 2013: IB101 Introduction to Logic ~ 400 students  Fall 2013: IA008 Computational Logic ~ 65 students CURRENT STATE OF ANALYSIS  General overview  Graph mining (Use of time component in future work) GENERAL OVERVIEW GENERAL OVERVIEW  Summary of entities: *without deleted nodes and edges GRAPHS NODES* EDGES* SPRING 1385 13259 11709 FALL 410 4213 3764 TOTAL 1795 17472 15473 GENERAL OVERVIEW  Errors  Simple scripts for finding errors 1 Zle priradený typ klauzúl k typu rezolúcie 2 Chýba druhý rodič 3 Rezolvovanie na dvoch literáloch súčasne (nebo i na více literálech) 4 Opakovanie rovnakého literálu v množine 5 Rezolvovanie na rovnakom literály 6 Preklep, alebo vlastné literály 7 Rezolvovanie v rámci jednej klauzuly 8 Žiadna rezolúcia iba spojenie množín GENERAL OVERVIEW  Errors  Simple scripts for finding errors COUNT % OF GRAPHS SPRING 321 16.10 FALL 66 23.18 TOTAL 387 21.56 1 Zle priradený typ klauzúl k typu rezolúcie 2 Chýba druhý rodič 3 Rezolvovanie na dvoch literáloch súčasne (nebo i na více literálech) 4 Opakovanie rovnakého literálu v množine 5 Rezolvovanie na rovnakom literály 6 Preklep, alebo vlastné literály 7 Rezolvovanie v rámci jednej klauzuly 8 Žiadna rezolúcia iba spojenie množín *Each error type counted in graph only once GRAPH MINING GRAPH MINING – INTRODUCTION  Typical data structure for learning algorithms:  Resolution proofs: attribute 1 Attribute 2 ... Attribute n Class val 11 val 12 ... val 1n class i1 ... ... ... ... ... val m1 val m2 ... val mn class im PREPROCESSING  Create graphs from tables with nodes and edges  Replace malformed nodes with “unallowed” PREPROCESSING  Weakly connected components: PREPROCESSING  Keep graphs with 1 component  Keep only binary trees  New summary: BINARY TREES ORIG. GRAPHS SPRING 1130 1385 FALL 336 410 TOTAL 1466 1795 PREPROCESSING  Append to each tree:  List of errors  Correctness flag (no error)  Quetion type (SLD, linear, general, ...)  Used clause type (set, ordered list) CLASSIFICATION  Using updated application from VACULÍK, Karel. Dolování z grafů pro podporu výuky [online]. 2013 [cit. 2013-11-27]. Diplomová práce. Masarykova univerzita, Fakulta informatiky.  New data structure for learning algorithms: pattern 1 pattern 2 ... pattern m Addit. attr. 1 ... Addit. attr. k Class true false ... false Val 11 ... Val 1k class i1 ... ... ... ... ... ... ... ... false true ... true Val mn ... Val mk class im CLASSIFICATION  Patterns – two approaches:  Frequent subgraphs  Generalized subgraphs FREQUENT SUBGRAPHS  Algorithm for frequent subgraph mining  Specifically, mining from unordered rooted trees  Given the minimum support, find all frequent subtrees in a set of trees  Example (min. sup. 25%):  Application employs Sleuth algorithm (Mohammed J. Zaki, Efficiently Mining Frequent Embedded Unordered Trees. Fundamenta Informaticae, 66(1-2):33-52. Mar/Apr 2005.) GENERALIZED SUBGRAPHS  Find all 3-node subgraphs representing the resolution step  Generalize found subtrees and merge them  Filter out infrequent subtrees EXPERIMENTS  All binary trees (1466)  Classes:  Incorrect proof (187)  Correct proof (1279)  Additional attributes:  Clause type (set, ordered list)  Resolution type (SLD, linear, general, ...)  Generalized patterns (min support 0%)  Oversampling EXPERIMENTS  Classification algorithms  J48  Naive Bayes  SMO  IBk  Evaluation: 10-fold cross validation  Best result so far:  J48  Accuracy: 96.9% FUTURE WORK  Additional analyses  Other graph representations  Sequence mining