Chinese Journal of Computers
  Title: An Analysis of Diversity Measures in Clustering Ensembles
  Authors: LUO Hui-Lan 1),2), KONG Fan-Sheng 1), LI Yi-Xiao 1)
  Address: 1) (Institute of Artificial Intelligence, Zhejiang University, Hangzhou 310027)
2) (School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000)
  Year: 2007
  Issue: No. 8 (1315-1324)
  Abstract & Background
Abstract: The diversity of an ensemble is known to be an important factor in determining its performance. Although there are a number of ways to quantify diversity in ensembles of classifiers, little research has been done on diversity in clustering ensembles. This paper compares seven diversity measures for clustering ensembles with regard to their possible use in ensemble design. Five experiments are designed to examine the relationship between the accuracy of clustering ensembles and the diversity measures under different ensemble methods, different ensemble sizes, and different data distributions. The experiments show that the relationships between these diversity measures and ensemble performance are not monotonic. However, when ensembles of moderate size are constructed with suitable clustering algorithms for a data set with a uniform cluster distribution, the correlation coefficients between the diversity measures and ensemble performance are relatively high. Finally, the authors give some suggestions on the use of diversity measures in building clustering ensembles.

Keywords: ensemble learning; clustering ensemble; diversity; measure

Background: Data clustering is a difficult inverse problem and, as such, is ill-posed when prior information about the underlying data distribution is not well defined. Different clustering algorithms can produce different partitions of the same data, each capturing distinct aspects of the data. The exploratory nature of clustering tasks calls for efficient methods that combine the strengths of many individual clustering algorithms. This is the focus of research on clustering ensembles, which seeks a combination of multiple partitions that provides an improved overall clustering of the given data.
One challenging issue in combining multiple clusterings is the choice of method for generating the component partitions of the ensemble. Diversity among the member clusterings is deemed important when constructing a clustering ensemble. Numerous algorithms have been proposed to construct a good clustering ensemble by seeking diversity among its members. However, there is no generally accepted definition of diversity, and measuring diversity explicitly is very difficult. While a number of ways are known to quantify diversity in ensembles of classifiers, little research has been done on clustering ensembles.
This paper focuses on diversity measures for clustering ensembles.
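
To make the notion of diversity among member clusterings concrete, the following is a minimal sketch of one possible pairwise measure. It is an illustration only: this page does not list the seven measures the paper compares, so the choice of the Rand index here, and the function names rand_index and ensemble_diversity, are assumptions rather than the authors' definitions. Diversity is taken as the average of (1 - Rand index) over all pairs of partitions in the ensemble.

```python
from itertools import combinations


def rand_index(a, b):
    """Fraction of point pairs on which two partitions agree.

    Two partitions agree on a pair of points if both put the pair in the
    same cluster, or both put it in different clusters. Cluster label
    values themselves do not matter, only the grouping they induce.
    """
    if len(a) != len(b):
        raise ValueError("partitions must label the same number of points")
    if len(a) < 2:
        raise ValueError("need at least two points")
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def ensemble_diversity(partitions):
    """Mean pairwise disagreement (1 - Rand index) over an ensemble.

    Illustrative choice of measure, not taken from the paper.
    """
    if len(partitions) < 2:
        raise ValueError("need at least two partitions")
    pairs = list(combinations(partitions, 2))
    return sum(1.0 - rand_index(p, q) for p, q in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical ensemble of three partitions of four points.
    ensemble = [
        [0, 0, 1, 1],
        [1, 1, 0, 0],  # same grouping as the first, labels swapped
        [0, 1, 0, 1],  # a genuinely different grouping
    ]
    print(ensemble_diversity(ensemble))
```

Under this sketch, an ensemble whose members all induce the same grouping has diversity 0, and the value grows as members disagree on more point pairs. Relating such a score to ensemble accuracy, as the paper's experiments do, would additionally require a consensus function and ground-truth labels, which are not shown here.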