Information Sciences 269 (2014) 35-47

A novel approach for change detection of remotely sensed images using semi-supervised multiple classifier system

Moumita Roy a, Susmita Ghosh a, Ashish Ghosh b,*

a Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
b Center for Soft Computing Research, Indian Statistical Institute, Kolkata, India

* Corresponding author. Tel.: +91 33 2575 3110/3100; fax: +91 33 2578 3357. E-mail address: ash@isical.ac.in (A. Ghosh).

ARTICLE INFO

Article history:
Received 17 July 2013
Received in revised form 23 December 2013
Accepted 25 January 2014
Available online 14 February 2014

Keywords:
Change detection
Elliptical basis function neural network
Fuzzy k-nearest neighbor classifier
Multilayer perceptron
Ensemble classifier
Semi-supervised learning

ABSTRACT

In this article, a novel approach using an ensemble of semi-supervised classifiers is proposed for change detection in remotely sensed images. Unlike other traditional methodologies for detecting changes in land cover, the present work uses a multiple classifier system in a semi-supervised (learning) framework instead of a single weak classifier. Iterative learning of the base classifiers is continued using the selected unlabeled patterns along with a few labeled patterns. Ensemble agreement is utilized for choosing the unlabeled patterns for the next training step. Finally, each unlabeled pattern is assigned to a specific class by fusing the outcomes of the base classifiers using a combination rule. For the present investigation, multilayer perceptron (MLP), elliptical basis function neural network (EBFNN) and fuzzy k-nearest neighbor (k-nn) techniques are used as base classifiers. Experiments are carried out on multi-temporal and multi-spectral images, and the results are compared with change detection techniques using MLP, EBFNN, fuzzy k-nn, an unsupervised modified self-organizing feature map and a semi-supervised MLP. Results show that the proposed method has an edge over the other state-of-the-art techniques for change detection.

© 2014 Elsevier Inc. All rights reserved.

1. Introduction

Change detection is the process of detecting temporal effects in multi-temporal images [1,2]. It is used for finding changes in land cover over time by analyzing remotely sensed images of a geographical area captured at different time instants. Changes can occur due to natural hazards (e.g., disaster, earthquake), urban growth, deforestation, etc. [1-5]. Change detection is one of the most challenging tasks in the field of pattern recognition and machine learning [6]. It can be viewed as an image segmentation problem, where two groups of pixels are to be formed, one for the changed class and the other for the unchanged one. The process of change detection can be broadly classified into two categories: supervised [7-9] and unsupervised [10-17]. Supervised techniques have certain advantages: they can explicitly recognize the kinds of changes that have occurred, and they are robust to the different atmospheric and light conditions of the acquisition dates. Various methodologies exist in the literature to carry out supervised change detection, e.g., post classification [1,8,18], direct multi-date classification [1], kernel based methods [9], etc.
Despite these advantages, the applicability of supervised methods to change detection is limited by the mandatory requirement of a sufficient amount of ground truth information, the collection of which is expensive, hard and monotonous. On the contrary, in the unsupervised approach [10-17], there is no need of additional information like ground truth. In the absence of labeled patterns, unsupervised techniques become compulsory for change detection. Unsupervised change detection processes can be of two types: context insensitive (spectral based) [1,12] and context sensitive (spatial based) [10,11,13-16,19].

In change detection, it may so happen that the category information of a few labeled patterns can be collected easily by experts [20]. However, if the number of these labeled patterns is small, this information may not be sufficient for developing any supervised method. In such a scenario, the knowledge of the labeled patterns, though not much in amount, would remain completely unutilized if an unsupervised approach were carried out. Under this circumstance, a semi-supervised approach [21,22] can be opted for instead of an unsupervised or supervised one. Semi-supervision uses a small amount of labeled patterns together with abundant unlabeled ones for learning, and integrates the merits of both supervised and unsupervised strategies to make full utilization of the collected patterns. Semi-supervision has been used successfully for improving the performance of clustering and classification [23-26] when a sufficient amount of labeled data is not present.

Semi-supervised approaches have also been explored for multiple classifier systems (MCS) [27-30]. Many real-life applications, e.g., change detection, medical image analysis and face recognition, suffer from the unavailability of labeled information. Therefore, semi-supervised MCS are required and have been studied in the past [27-30] (see Table 1). To the knowledge of the authors, no such application exists in the change detection domain using a semi-supervised MCS. This motivated us to explore the capacity of an ensemble classifier embedded in a semi-supervision framework to improve the performance of the change detection process when a few labeled patterns are available.

In the proposed method, the merits of both semi-supervised learning and ensemble learning are integrated into a single platform for detecting changes in remotely sensed images. The traditional algorithms [1,8,9,18,31] for change detection mainly rely on a single classifier in either a supervised or a semi-supervised framework. Unlike these, in the present work a set of semi-supervised classifiers is used for change detection. In the present investigation, multilayer perceptron (MLP) [32], elliptical basis function neural network (EBFNN) [32-34] and fuzzy k-nearest neighbor (k-nn) [35] techniques are used as the base classifiers. To assess the effectiveness of the proposed method, experiments are carried out on two multi-temporal and multi-spectral images of the Mexico area and the Island of Sardinia. The present study concludes that the proposed semi-supervised MCS is better suited to the task of change detection than the other state-of-the-art techniques.

The rest of the article is organized into four sections.
Section 2 describes the proposed methodology. A description of the data sets used to carry out the investigation is provided in Section 3. In Section 4, implementation details and experimental results are discussed. Conclusions are drawn in Section 5.

2. The proposed algorithm

In the present work, an ensemble of semi-supervised classifiers is proposed for change detection. The contribution of the present work is twofold: first, an algorithm is designed to integrate semi-supervised learning and ensemble learning in a single platform; then, the proposed algorithm is used for the betterment of the change detection process when a few labeled patterns are available. Unlike the other state-of-the-art techniques in the literature on semi-supervised multiple classifier systems (e.g., co-training [29], tri-training [30], co-forest [44]), the proposed algorithm utilizes, during the iterative learning process, the agreement among all the classifiers in the ensemble for collecting the most confident labeled patterns.

Table 1
State-of-the-art techniques (application area: technique and tools).

Water quality prediction: the greedy ensemble selection family of algorithms for ensembles of regression models was used to search for the best subset of regressors by taking local greedy decisions [36].

Image recognition: a hybrid approach combining type-2 fuzzy logic, modular neural networks and the Sugeno integral was used [37].

Sentiment classification: a detailed comparative study of sentiment classification using ensemble techniques was carried out; ensembles of classifiers were designed at three different levels of combination, i.e., the classification level, the feature level and the combination level [38].

Prediction of chaotic time series: an ensemble of adaptive network based fuzzy inference systems was used; average and weighted average combination rules were applied for the final decision [39].

Cardiac arrhythmia classification: fuzzy k-nearest neighbors, a multilayer perceptron with gradient descent and momentum backpropagation, and a multilayer perceptron with scaled conjugate gradient backpropagation were used as base classifiers; finally, a Mamdani type fuzzy inference system was used as the combiner [40].

Hyperspectral image classification: a combination of two discriminative classifiers, sparse multinomial logistic regression and quadratic discriminant analysis, was used; initially, both classifiers were trained using a few labeled samples, and the set of unlabeled samples for the next training step was then collected by combining the estimates obtained by the two classifiers [41].

Classification of multi-annual remote sensing data: an ensemble of classifiers (random forests) was trained on a multi-spectral image captured over an agricultural region and adapted, in a semi-supervised framework, to classify the image from another year [42].

Remotely sensed image segmentation: a co-training strategy was used under a variational Bayesian framework; there were two disjoint feature sets, each used by a Gaussian mixture model, and co-training in a bootstrap mode was utilized for parameter estimation [43].

As already mentioned, multilayer perceptron (MLP) [32], elliptical basis function neural network (EBFNN) [32-34] and fuzzy k-nearest neighbor (k-nn) [35] techniques are used as the base classifiers. Here, a few labeled patterns are required for semi-supervised learning. These labeled patterns can be collected in many ways. In the proposed method, for experimental purposes, an equal number of labeled patterns from both classes (changed and unchanged) are picked up randomly from the ground truth. For the labeled patterns, the target values, support values and membership values are assigned to either [1,0] or [0,1] depending on their class labels. A detailed description of the proposed change detection technique is presented in the subsequent sections; a high-level sketch of the overall loop is given below.
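Before the step-by-step description, the following runnable toy sketch previews how the pieces fit together. It is a simplified illustration, not the authors' implementation: scikit-learn classifiers stand in for the MLP, EBFNN and fuzzy k-nn base learners of Sections 2.2-2.4, the thresholds alpha and beta are fixed constants instead of the semi-automatically estimated, class-wise parameters of Section 2.7, and the data are synthetic.

```python
# Toy preview of the proposed loop: train an ensemble on a few labeled
# patterns, fuse supports with the 'maximum' rule, select confidently and
# unanimously labeled patterns, and retrain on disjoint augmented sets.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X_lab = rng.normal(loc=[[0, 0]] * 20 + [[2, 2]] * 20, scale=0.5)
y_lab = np.array([0] * 20 + [1] * 20)           # a few labeled patterns
X_unl = rng.normal(loc=[[0, 0]] * 200 + [[2, 2]] * 200, scale=0.7)

clfs = [MLPClassifier(max_iter=2000, random_state=0),
        KNeighborsClassifier(n_neighbors=5),
        LogisticRegression()]                   # stand-in base classifiers
train = [(X_lab, y_lab)] * 3
alpha, beta = 0.7, 0.4                          # fixed here; see Section 2.7

for step in range(5):
    supports = []
    for clf, (Xt, yt) in zip(clfs, train):
        clf.fit(Xt, yt)
        supports.append(clf.predict_proba(X_unl))   # degrees of support
    dp = np.stack(supports)                     # decision profiles (Sec. 2.5)
    target = dp.max(axis=0)                     # 'maximum' combination rule
    target /= target.sum(axis=1, keepdims=True) # normalized soft labels
    winner = target.argmax(axis=1)
    # Selection conditions of Section 2.6: unanimity, strong support, margin.
    agree = (dp.argmax(axis=2) == winner[None, :]).all(axis=0)
    strong = (dp[:, np.arange(len(X_unl)), winner] > alpha).all(axis=0)
    clear = np.abs(target[:, 0] - target[:, 1]) > beta
    conf = np.flatnonzero(agree & strong & clear)
    rng.shuffle(conf)
    # Three mutually exclusive subsets, one per classifier, for diversity.
    for i, part in enumerate(np.array_split(conf, 3)):
        train[i] = (np.vstack([X_lab, X_unl[part]]),
                    np.concatenate([y_lab, winner[part]]))

final_labels = winner                           # fused decision after the loop
```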
2.1. Generation of input pattern

The difference image DI = {l_{mn}, 1 \le m \le p, 1 \le n \le q} is produced by the change vector analysis technique [1] from two co-registered and radiometrically corrected \gamma-spectral-band images Y_1 and Y_2, each of size p \times q, of the same geographical area captured at different times T_1 and T_2. Here, the gray value of the difference image DI at spatial position (m, n), denoted as l_{mn}, is calculated as

l_{mn} = (\mathrm{int})\sqrt{\sum_{i=1}^{\gamma}\left(\ell_{mn}^{i}(Y_1) - \ell_{mn}^{i}(Y_2)\right)^{2}},   (1)

where \ell_{mn}^{i}(Y_1) and \ell_{mn}^{i}(Y_2) are the gray values of the pixels at the spatial position (m, n) in the ith band of the images Y_1 and Y_2, respectively. From the difference image DI, the input pattern for a particular pixel position is generated by considering the gray value of the said pixel as well as those of its neighboring ones, to exploit (spatial) contextual information from the neighbors. In the present methodology, the 2nd order neighborhood system [31] is used. Here, each input pattern consists of two features: (i) the pixel's own gray value and (ii) the average of the gray values of its neighboring pixels including its own value. The \gamma-dimensional input pattern of the (m,n)th pixel position of DI is denoted by X_{mn} = [x_{mn1}, x_{mn2}, \ldots, x_{mn\gamma}]. Here, a mapping algorithm is used to normalize the feature values of the input pattern to [0,1]. The ith feature value (i = 1, 2, \ldots, \gamma) of the \gamma-dimensional input pattern X_{mn} is normalized as

x_{mn,i} = \frac{x_{mn,i} - c_{\min}}{c_{\max} - c_{\min}},   (2)

where c_{\max} and c_{\min}, respectively, are the maximum and the minimum gray values of DI.

2.2. Support value estimation using EBFNN

The elliptical basis function neural network [34] uses the full covariance matrix to improve the performance of the conventional radial basis function network with a few basis functions. The network consists of three layers: one input layer, one hidden layer and one output layer. The center of each basis function is initialized with the mean value of the labeled patterns from the corresponding class, and the centers are kept fixed during each training step. There are as many neurons in the hidden layer as the number of basis functions, and no weighted connection is present between the neurons of the input layer and the hidden layer. In the present work, only two basis functions are used, corresponding to the changed and unchanged classes. The network is trained by updating the weights using the least mean square (LMS) algorithm [32] to minimize the error between the target value (or soft class label) and the predicted output value of the input patterns. Initially, the network is trained with a few labeled patterns only. After the training of the network converges, each unlabeled pattern is passed through the trained EBFNN to predict its output values for both classes. Let \mu_r(m,n) = [\mu_{r1}(m,n), \mu_{r2}(m,n)] be the two degrees of support estimated by the EBFNN classifier, where \mu_{r1}(m,n) and \mu_{r2}(m,n) are the support values of the (m,n)th pattern in the unchanged and changed classes, respectively. The output value of the (m,n)th unlabeled pattern for the ith class (here, i = 1 or 2), y_{ri}(m,n), is assigned to \mu_{ri}(m,n). The unlabeled pattern is more likely to belong to the class for which its support value is higher.
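As a concrete illustration of the pattern-generation step of Section 2.1, here is a minimal numpy sketch of Eqs. (1) and (2). Border handling is an assumption (the paper does not specify it), so edge padding is used for the neighborhood average.

```python
import numpy as np

def difference_image(y1, y2):
    """Eq. (1): change vector analysis magnitude for two co-registered
    images of shape (bands, p, q), truncated to int as in the paper."""
    diff = y1.astype(float) - y2.astype(float)
    return np.sqrt((diff ** 2).sum(axis=0)).astype(int)

def input_patterns(di):
    """Two features per pixel of the difference image DI: its own gray
    value and the mean over its 2nd-order (3 x 3) neighborhood including
    itself, both normalized to [0, 1] by Eq. (2)."""
    p, q = di.shape
    padded = np.pad(di.astype(float), 1, mode='edge')   # assumed border rule
    neigh = sum(padded[1 + dm:1 + dm + p, 1 + dn:1 + dn + q]
                for dm in (-1, 0, 1) for dn in (-1, 0, 1)) / 9.0
    feats = np.stack([di.astype(float), neigh], axis=-1)
    return (feats - di.min()) / (di.max() - di.min())   # c_min, c_max of DI
```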
2.3. Support value estimation using MLP

The MLP [31,32] has one input layer, one output layer and one or more hidden layers. It is trained by updating the weights using the backpropagation algorithm [32] to minimize the error between the target value and the predicted output value of the patterns. Let a be the number of layers in the network, and let the jth neuron in the rth layer (r \ne 1) receive a total input v_{mrj}(m,n) from the (r-1)th layer. Then the jth neuron in the rth layer (r \ne 1) produces an output of the form

y_{mrj}(m,n) = \frac{1}{1 + \exp(-v_{mrj}(m,n))}.

Initially, the connection weights are updated using the labeled patterns only. After convergence, each unlabeled pattern is passed through the trained MLP to predict its output values for both the changed and the unchanged classes. Let \mu_m(m,n) = [\mu_{m1}(m,n), \mu_{m2}(m,n)] be the two degrees of support estimated by the network, where \mu_{m1}(m,n) and \mu_{m2}(m,n) are the support values of the (m,n)th pattern in the unchanged and changed classes, respectively. The output value of the (m,n)th unlabeled pattern for the ith class (here, i = 1 or 2), y_{mi}(m,n), is assigned to \mu_{mi}(m,n).

2.4. Support value estimation using fuzzy k-nn

The fuzzy k-nn classifier assigns membership values in both classes to each unlabeled pattern. Here, these membership values are treated as the predicted support values. Let \mu_k(m,n) = [\mu_{k1}(m,n), \mu_{k2}(m,n)] be the two degrees of support of the (m,n)th unlabeled pattern, where \mu_{k1}(m,n) and \mu_{k2}(m,n) are the membership values estimated by the classifier for the unchanged and changed classes, respectively. For each unlabeled pattern, its k nearest labeled patterns are determined. To search for the k nearest neighbors, instead of using all the labeled patterns, we consider only those which lie within a neighborhood (window) around the unlabeled pattern; this reduces the time required for searching. Let W be the set of k nearest labeled patterns of the (m,n)th unlabeled pattern. The membership value \mu_{ki}(m,n) of the (m,n)th unlabeled pattern in the ith class (here, i = 1 or 2) is calculated as

\mu_{ki}(m,n) = \frac{\sum_{X_{ef} \in W} \mu_i(X_{ef})\left(1/\|X_{mn} - X_{ef}\|^{2/(f_m-1)}\right)}{\sum_{X_{ef} \in W}\left(1/\|X_{mn} - X_{ef}\|^{2/(f_m-1)}\right)},   (3)

where f_m is a parameter, called the fuzzifier, which determines the weighting factor of the distance to control a neighbor's contribution to the membership value.
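A minimal numpy sketch of Eq. (3) follows. For brevity it searches all labeled patterns instead of restricting the search to a window around the unlabeled pattern, and the small epsilon guarding against zero distances is an added assumption.

```python
import numpy as np

def fuzzy_knn_support(x, labeled_x, labeled_mu, k=5, fm=2.0):
    """Eq. (3): membership of pattern x in the two classes.

    labeled_x  : (L, d) array of labeled patterns
    labeled_mu : (L, 2) array of their memberships ([1,0] or [0,1] initially)
    fm         : the fuzzifier controlling the distance weighting
    """
    d2 = ((labeled_x - x) ** 2).sum(axis=1)       # squared distances
    nn = np.argsort(d2)[:k]                       # the set W of k neighbors
    w = 1.0 / np.maximum(d2[nn], 1e-12) ** (1.0 / (fm - 1.0))
    return (labeled_mu[nn] * w[:, None]).sum(axis=0) / w.sum()
```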
2.5. Estimation of soft class label using the 'maximum' combination rule

First, the support values (or output values) for each unlabeled pattern are obtained from the base classifiers using only a few labeled patterns (the same labeled patterns for all the base classifiers). Here, the two degrees of support for a particular pattern (m,n) from the three base classifiers are organized into a matrix called the decision profile, DP(m,n) [45]:

DP(m,n) = \begin{pmatrix} d_{11}(m,n) & d_{12}(m,n) \\ d_{21}(m,n) & d_{22}(m,n) \\ d_{31}(m,n) & d_{32}(m,n) \end{pmatrix} = \begin{pmatrix} \mu_{r1}(m,n) & \mu_{r2}(m,n) \\ \mu_{m1}(m,n) & \mu_{m2}(m,n) \\ \mu_{k1}(m,n) & \mu_{k2}(m,n) \end{pmatrix}.   (4)

Now, the soft class labels (or target values) for each unlabeled pattern are calculated by applying the 'maximum' combination rule [46] to these support values. A soft class label is assigned to the unlabeled patterns because we do not want to commit to a class label at this stage. The target value of the (m,n)th unlabeled pattern in the ith class, target_i(m,n), is calculated as

\mathit{target}_i(m,n) = \max\{d_{1i}(m,n), d_{2i}(m,n), d_{3i}(m,n)\}.   (5)

Here, the estimated target value in the ith class for the (m,n)th unlabeled pattern is normalized as \mathit{target}_i(m,n) = \mathit{target}_i(m,n)/\sum_{j=1}^{2}\mathit{target}_j(m,n), so that the target values summed over the two classes equal 1.

2.6. Selection of unlabeled patterns for the next training step

After this, the most confident unlabeled patterns are selected for the next training step. Here a method is suggested to obtain a set of most confident unlabeled patterns (denoted as U). The (m,n)th unlabeled pattern is selected as a most confident pattern for the ith class (here, i = 1 or 2) and placed into the set U if it satisfies all of the following conditions: (i) for all the base classifiers, the computed support value of the pattern in the ith class is larger than that obtained for the other class; (ii) for all the base classifiers, the support value of the pattern in the ith class is greater than \alpha_i; and (iii) the absolute difference between the estimated target values of the pattern in the two classes is greater than \beta_i. Conditions (i) and (ii) enforce high agreement among all the base classifiers. Condition (iii) is used to avoid the selection of confusing unlabeled patterns. To increase the diversity of the base classifiers, three mutually exclusive sets of unlabeled patterns of the same size (denoted as U_R, U_M and U_K), one for the (iterative) training of each base classifier, are generated by randomly selecting the most confident unlabeled patterns from the set U.

2.7. Semi-automatic computation of the parameters \alpha_i and \beta_i

After estimating the soft class label values of the unlabeled patterns, the most confident unlabeled patterns for the ith class are collected using two important parameters, \alpha_i and \beta_i. In the present work, a technique is suggested for the (semi-)automatic estimation of these parameters; a sketch is given at the end of this subsection. For computing both parameters, at the onset, the set of unlabeled patterns whose estimated soft class label value is maximum in the ith class is selected. Then, for calculating the value of \alpha_i, the average, minimum and maximum (denoted by avg_{ti}, min_{ti} and max_{ti}, respectively) of the estimated soft class labels in the ith class over the selected unlabeled patterns are computed. Now, if the average value is nearer to the minimum value, it implies that the estimated soft class label values of most of the selected unlabeled patterns in the ith class are close to the minimum; if \alpha_i were then assigned a value higher than the average, there would be a high chance of selecting only a small number of most confident unlabeled patterns, so in this situation \alpha_i is fixed at the average value. On the other hand, to avoid selecting too many patterns when the average value is nearer to the maximum value, \alpha_i is fixed at (avg_{ti} + max_{ti})/2, i.e., a value higher than the average. For obtaining \beta_i, the difference between the estimated soft class label values in the two classes is computed for each of the selected unlabeled patterns in the ith class, and \beta_i is fixed at the average of these computed differences.
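The following minimal numpy sketch implements this parameter estimation; the function name and the (N, 2) soft-label input format are illustrative assumptions.

```python
import numpy as np

def estimate_alpha_beta(target):
    """Section 2.7: class-wise alpha_i and beta_i from the (N, 2) array of
    normalized soft class labels of the unlabeled patterns."""
    alpha, beta = np.empty(2), np.empty(2)
    winner = target.argmax(axis=1)
    for i in (0, 1):
        vals = target[winner == i, i]     # soft labels maximal in class i
        avg, lo, hi = vals.mean(), vals.min(), vals.max()
        # If the average sits nearer the minimum, a threshold above the
        # average would select too few patterns, so alpha_i = avg;
        # otherwise alpha_i = (avg + max)/2 to avoid selecting too many.
        alpha[i] = avg if (avg - lo) < (hi - avg) else (avg + hi) / 2.0
        # beta_i: mean absolute gap between the two class labels.
        beta[i] = np.abs(target[winner == i, 0] - target[winner == i, 1]).mean()
    return alpha, beta
```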
2.8. Iterative learning of ensemble classifiers until convergence

After the selection of the most confident unlabeled patterns, the learning of EBFNN and MLP is carried out again using the labeled patterns along with the unlabeled patterns (from U_R and U_M, respectively). For the next training step, the centers, covariance matrices and smoothness parameters of all the basis functions of the EBFNN are also updated using the unlabeled patterns (in the set U_R) along with the few labeled patterns. Now, the support values of all the unlabeled patterns in both classes are re-estimated using the EBFNN, MLP and fuzzy k-nn classifiers. In the case of the fuzzy k-nn classifier, the membership values in both classes for all the unlabeled patterns are estimated again, considering the labeled patterns as well as the unlabeled patterns from the set U_K. Here, the membership values of the selected unlabeled patterns in both classes are assigned using the target values estimated for those patterns in the previous training step. In the present investigation, it has been noticed that the membership values of the selected unlabeled patterns participate in estimating the membership values of the other unlabeled patterns while, at the same time, their own membership values are re-estimated using the other selected unlabeled patterns along with the labeled patterns. Therefore, the membership values of the selected unlabeled patterns are updated only after the membership values of all the unlabeled patterns have been calculated. At the end of each training step itr, the sum of squared error \xi_{itr} between the target values estimated at the itr-th and (itr-1)-th training steps is calculated as

\xi_{itr} = \sum_{m=1,n=1}^{p,q}\sum_{i=1}^{2}\left(\mathit{tfinal}_i^{itr}(m,n) - \mathit{tfinal}_i^{itr-1}(m,n)\right)^{2},   (6)

where \mathit{tfinal}_i^{itr}(m,n) stores the estimated target value target_i(m,n) of the (m,n)th pattern in the ith class at the itr-th training step. Iterative learning of the ensemble classifier, re-estimation of the target values of the unlabeled patterns, updating the value of