Facility management and mining spatio-temporal data

Luboš Popelínský^1 and Petr Glos^2

^1Knowledge Discovery Lab, Faculty of Informatics, Masaryk University
^2Institute of Computer Science, Masaryk University
Botanická 68a, 60200 Brno
popel@fi.muni.cz, glos@ics.muni.cz

Abstract. We present an ongoing project on mining spatio-temporal frequent patterns from facility management data. We introduce the facility management data of Masaryk University (MU). After introducing spatio-temporal first-order patterns, we focus on two tasks that are important for facility management: mining frequent patterns and mining rare events in spatio-temporal data.

Key words: facility management, data mining, spatio-temporal data, frequent patterns, association rules

1 Facility management

The definition of facility management (FM) provided by the European Committee for Standardization (CEN) and ratified by BSI British Standards is: “Facilities management is the integration of processes within an organization to maintain and develop the agreed services which support and improve the effectiveness of its primary activities”. According to the definition provided by the International Facility Management Association (IFMA, http://www.ifma.org/), facility management is a profession that encompasses multiple disciplines to ensure functionality of the built environment by integrating people, place, process and technology.

Masaryk University maintains a digital building passport that currently covers approximately 200 buildings and 17,000 rooms. For their representation in a geodatabase, several distinct constructions (building primitives) were defined, and every building is composed of these objects. The resulting passport data are available to university employees, students and even the public via the internet/intranet, and they are also used by other information systems of the university. The university also plans to create a technological passport, which is closely related to the building passport.
Additionally, the building passport is used to generate 3D models of the buildings. Masaryk University also implements a Building Management System (BMS) based on the open BACnet protocol. The BMS provides a means of storing historical building operation data in a relational database. The BMS database contains data on the building environment (e.g. room temperature, room humidity, room air pressure difference), the status of technologies (e.g. run/stop, fault) and consumption (electric energy, cold and hot water, gas).

The common goal of facility management methods is to reduce building operation costs. We need to recognize dependencies and relationships in the BMS data to find out where and how operation costs can be cut. It is not surprising that a need for deeper analysis of these data has arisen. Besides visual analytics tools, and in collaboration with them, we aim at applying data mining methods, namely frequent pattern mining and association rule mining [5, 6].

2 Mining frequent patterns

Frequent patterns (also called large itemsets) were originally defined as propositional formulas that are true for at least a given fraction of records (transactions) in a database [1]. This fraction is called the minimal support. A typical example is a set of customers' baskets in a supermarket; here, frequent patterns carry information about products that frequently appear together in those baskets. A frequent pattern in predicate logic is a conjunction of elementary formulas (atoms) that is frequent in the given data. Here we focus on spatio-temporal logic, which extends predicate logic with temporal operators (e.g. AFTER, BEFORE, ALWAYS, SOMETIMES) and spatial functions (e.g. LEFT-TO, INCLUDED, SOUTH-OF). An example (from windstorm data analysis) is the formula “AFTER a wind K, in the period 1971-72 ALWAYS a wind was strong”.

3 Spatiotemporal frequent patterns

For this work, spatiotemporal data are assumed to be a sequence of events.
An event has a unique identifier and is associated with an explicit time instant. If the data do not contain an explicit time attribute, it can be substituted by the order of the event in the sequence. At least one attribute must be spatial. It can be given by x- and y-coordinates (e.g. in windstorm data, the coordinates of a place) or by the identifier of an area (e.g. the name of a district). There is no limit on the number of events with the same time stamp. We also allow attributes of complex type: not only atomic attributes but also lists [9]. Domain knowledge is a set of predicate definitions.

A spatiotemporal pattern (or shortly pattern) is a conjunction of non-spatiotemporal and spatiotemporal atoms. Negation is not allowed in a pattern. A non-spatiotemporal atom either has the form Attribute Operator Value, where the operator is '=' for categorical attributes and '=', '=<' or '<' for numerical attributes, or is defined by a predicate from the domain knowledge that does not have a temporal attribute as its argument. A spatiotemporal atom can be temporal (NEXT, ALWAYS, SOMETIMES) or spatial, e.g. dc(X,Y) (X is disconnected from Y). The problem of mining spatiotemporal maximal frequent patterns is then to find all frequent spatiotemporal patterns, i.e. those that cover at least M examples (M is usually called the minimal support), that cannot be further refined without decreasing their support below M.

In [9] we introduced a new version of the refinement (specialization) operator for efficient mining in spatiotemporal data. It has been implemented in the ILP system RAP. RAP [3] is a tool for mining first-order maximal frequent patterns that employs different search strategies for mining long patterns. Frequent patterns learned with RAP have been successfully used as new features for knowledge discovery in medical data (STULONG), in information extraction from biomedical texts, and for classification of short pieces of text in flood reports.
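To make the notions of coverage, minimal support M and maximality concrete, here is a minimal Python sketch under simplifying assumptions: each event counts as one example, a pattern is a set of named atoms evaluated on a time-ordered event list, the only temporal atom is a hypothetical "NEXT event in the same area" operator, and refinement simply adds one atom from a fixed pool. All names (Event, attr_eq, attr_leq, next_in_area, support, maximal_frequent) and the sample BMS-like data are illustrative; this is not the RAP refinement operator of [9].

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Event:
    eid: int                  # unique event identifier
    time: int                 # explicit time instant (or position in the sequence)
    area: str                 # spatial attribute: identifier of a room/area
    attrs: Dict[str, object]  # non-spatiotemporal attributes


# An atom is evaluated on the whole (time-ordered) sequence at event index i,
# so that temporal atoms can inspect other events.
Atom = Callable[[List[Event], int], bool]


def attr_eq(name: str, value: object) -> Atom:
    """Non-spatiotemporal atom 'Attribute = Value' (categorical)."""
    return lambda seq, i: seq[i].attrs.get(name) == value


def attr_leq(name: str, value: float) -> Atom:
    """Non-spatiotemporal atom 'Attribute =< Value' (numerical)."""
    def atom(seq: List[Event], i: int) -> bool:
        v = seq[i].attrs.get(name)
        return isinstance(v, (int, float)) and v <= value
    return atom


def next_in_area(inner: Atom) -> Atom:
    """Hypothetical temporal atom NEXT: the next event in the same area
    satisfies `inner`. Assumes `seq` is sorted by time."""
    def atom(seq: List[Event], i: int) -> bool:
        for j in range(i + 1, len(seq)):
            if seq[j].area == seq[i].area:
                return inner(seq, j)
        return False
    return atom


def support(pattern, atoms: Dict[str, Atom], seq: List[Event]) -> int:
    """Number of events (examples) covered by the conjunction of atoms in `pattern`."""
    return sum(all(atoms[a](seq, i) for a in pattern) for i in range(len(seq)))


def maximal_frequent(atoms: Dict[str, Atom], seq: List[Event],
                     min_sup: int) -> List[Tuple[str, ...]]:
    """Level-wise search: refine frequent patterns by adding one atom at a time,
    then keep the frequent patterns that have no frequent refinement."""
    names = sorted(atoms)
    frequent = []
    level = [frozenset([n]) for n in names if support([n], atoms, seq) >= min_sup]
    while level:
        frequent.extend(level)
        refinements = {p | {n} for p in level for n in names if n not in p}
        # Support is anti-monotone, so infrequent refinements are never extended.
        level = [r for r in refinements if support(r, atoms, seq) >= min_sup]
    maximal = [p for p in frequent if not any(p < q for q in frequent)]
    return sorted(tuple(sorted(p)) for p in maximal)


# Hypothetical BMS-like events: two rooms, room temperature and device status.
SEQ = [
    Event(1, 1, "A101", {"temp": 21, "status": "run"}),
    Event(2, 2, "B202", {"temp": 22, "status": "run"}),
    Event(3, 3, "A101", {"temp": 24, "status": "fault"}),
    Event(4, 4, "B202", {"temp": 23, "status": "fault"}),
    Event(5, 5, "A101", {"temp": 28, "status": "run"}),
    Event(6, 6, "B202", {"temp": 30, "status": "run"}),
]

ATOMS = {
    "temp=<25": attr_leq("temp", 25),
    "status=run": attr_eq("status", "run"),
    "NEXT(status=fault)": next_in_area(attr_eq("status", "fault")),
}

if __name__ == "__main__":
    print(maximal_frequent(ATOMS, SEQ, 2))
    # [('NEXT(status=fault)', 'status=run', 'temp=<25')]
```

With M = 2 the single maximal pattern reads "a running device in a room at a temperature of at most 25 is next followed by a fault in the same room"; with M = 3 only the two single-atom patterns status=run and temp=<25 remain, both maximal. A first-order miner such as RAP refines clauses with variables and domain-knowledge predicates instead of enumerating a fixed pool of ground atoms, so the sketch only illustrates the coverage and maximality definitions.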
4 Spatial co-location patterns

A co-location pattern is a group of spatial features/events that are frequently co-located in the same region [8]. More formally, a set of spatial features forms a pattern if, for each spatial feature, at least s% of the instances of that feature form a clique with some instance of each of the other features in the pattern under a given neighborhood relationship. The parameter s is called the participation index. A neighborhood relation can be defined by a distance (e.g. Euclidean), by a topological relation or otherwise (see [7] for various neighborhood relations). Mining spatial co-location patterns differs from frequent pattern mining: instead of itemsets and a minimal support, we have spatial feature sets and a spatial interestingness measure.

5 Mining rare events

In facility management data mining it is also important to find patterns that are rare but important for prevention. Examples of such events are a fire (or an increase in temperature), fast repeated switching on/off of a device, or a water pipe rupture. In this case we are not looking for a frequent pattern but rather for a frequent correlation, mostly spatial or temporal, of two or more attributes. Huang et al. [8] described a novel method based on the Apriori algorithm [1]. They introduced a new measure, the maximal participation ratio, which allows finding spatial co-location patterns in the presence of rare spatial features.

6 Finding spatio-temporal patterns in facility management data

We will adapt the two approaches described above to facility management data, focusing both on mining frequent patterns and on mining rare events. In the case of frequent pattern mining, the first goal is to choose or develop an appropriate spatiotemporal logic that will be a refinement of the general case introduced in [9].
It should contain a definition of neighborhood relations (and consequently spatial relations) that are the most appropriate for facility management data, general enough and efficient to evaluate. Multirelational data mining methods [4] seem to be the most convenient because, first, they allow domain knowledge to be used in a natural way and, second, they can easily be combined with, e.g., constraint logic programming. We will also explore how the existing methods for mining spatiotemporal patterns [3] are related to, or can be modified for, mining co-location patterns.

We will also look for search strategies and post-processing methods that allow existing algorithms to be adapted for finding rare events efficiently. We will extend the concept of the maximal participation ratio introduced in [8] to learning in a more expressive spatiotemporal logic. Following [2], we will look for pattern and rule measures that make it possible to prune the search space and to select the most interesting patterns.

References

1. Agrawal R., Srikant R. Fast Algorithms for Mining Association Rules. In Proc. 20th Int. Conf. Very Large Data Bases, VLDB 1994, pp. 487-499.
2. Azevedo P., Jorge A. M. Comparing Rule Measures for Predictive Association Rules. In Proc. ECML 2007, pp. 510-517.
3. Blaťák J., Popelínský L. Mining first-order maximal frequent patterns. Neural Network World 5, 4, pp. 381-390.
4. Džeroski S., Lavrač N. Relational Data Mining. Springer-Verlag, 2001.
5. Glos P. Building and Technology Passport of Masaryk University. In Proc. 2008 ESRI International User Conference. San Diego, California: ESRI Press, 2008.
6. Glos P. Using ArcGIS for Visualizing Historical Data from BMS. In Proc. 2009 ESRI International User Conference. San Diego, California: ESRI Press, 2009.
7. Ester M. et al. Spatial Data Mining: A Database Approach. In Advances in Spatial Databases: 5th International Symposium, SSD '97, Berlin, 1997.
8. Huang Y., Pei J., Xiong H. Mining Co-Location Patterns with Rare Events from Spatial Data Sets. GeoInformatica 10, 2006, pp. 239-260.
9. Popelínský L., Blaťák J. Toward mining of spatiotemporal maximal frequent patterns. In Proc. ECML/PKDD Workshop on Mining Spatio-Temporal Data (MSTD), Porto, 2005.
10. Simoff S. J., Böhlen M. H., Mazeika A. (eds.) Visual Data Mining. LNCS 4404, Springer-Verlag, 2008.