Chinese Journal of Computers
  Title: One-Cluster Clustering Based Data Description
  Authors: CHEN Bin 1),2), FENG Ai-Min 1), CHEN Song-Can 1), LI Bin 2),3)
  Address: 1) College of Information Science and Technology, Nanjing University of Aeronautics & Astronautics, Nanjing 210016
  2) Information Engineering College, Yangzhou University, Yangzhou, Jiangsu 225009
  3) State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093
  Year: 2007
  Issue: No. 8 (pp. 1325-1332)
  Abstract & Background
Abstract: This paper proposes a one-cluster clustering based data description method (OCCDD) for one-class classification. In training, the one-cluster Possibilistic C-Means (P1M) algorithm is first run on the target training samples to obtain each sample's membership in the target class, and a threshold on these memberships then defines the data description. In testing, the membership of each test sample is computed; samples whose membership falls below the threshold are treated as outliers, and the rest as target objects. The proposed method has the same parameter configuration as two prevalent methods, Support Vector Data Description (SVDD) and the Parzen-window method, and thus offers an alternative one-class classifier. Notably, although P1M is a special case of the traditional PCM algorithm, it can obtain a globally optimal solution, whereas traditional PCM generally cannot. This global optimality is of great importance for practical implementation.
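
The training and testing procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The abstract gives no formulas, so the sketch assumes the standard PCM update rules with a single cluster: membership u = 1 / (1 + (d^2 / eta)^(1/(m-1))), and a center equal to the u^m-weighted mean of the samples. The function names, the initialization at the sample mean, the parameters `eta` and `m`, and the choice of threshold by rejecting a fixed fraction of training samples are all assumptions made for illustration.

```python
import math


def membership(x, center, eta, m=2.0):
    """Possibilistic membership of x in the single cluster (assumed standard PCM form)."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, center))
    return 1.0 / (1.0 + (d2 / eta) ** (1.0 / (m - 1.0)))


def p1m_fit(samples, eta, m=2.0, iters=100, tol=1e-9):
    """Alternate membership and center updates until the center stops moving."""
    n, dim = len(samples), len(samples[0])
    # Assumption: start from the sample mean; the abstract does not specify this.
    center = [sum(x[k] for x in samples) / n for k in range(dim)]
    for _ in range(iters):
        weights = [membership(x, center, eta, m) ** m for x in samples]
        total = sum(weights)
        new_center = [
            sum(w * x[k] for w, x in zip(weights, samples)) / total
            for k in range(dim)
        ]
        shift = math.dist(center, new_center)
        center = new_center
        if shift < tol:
            break
    return center


def fit_threshold(samples, center, eta, m=2.0, reject=0.1):
    """Pick a membership threshold that rejects roughly `reject` of the training set."""
    scores = sorted(membership(x, center, eta, m) for x in samples)
    return scores[min(int(reject * len(scores)), len(scores) - 1)]


def is_target(x, center, threshold, eta, m=2.0):
    """Samples with membership below the threshold are treated as outliers."""
    return membership(x, center, eta, m) >= threshold


# Hypothetical toy data: a tight cluster near the origin plus one stray point.
train = [(0.0, 0.0), (0.1, 0.0), (-0.1, 0.0), (0.0, 0.1),
         (0.0, -0.1), (0.05, 0.05), (-0.05, -0.05), (3.0, 3.0)]
eta = 0.5
center = p1m_fit(train, eta)
threshold = fit_threshold(train, center, eta, reject=0.125)
```

In this toy run the stray training point receives a very small membership, so it contributes little to the center; this is the kind of robustness to outliers usually attributed to possibilistic clustering. The two parameters exposed here (a width-like `eta` and a rejection fraction) mirror the kind of configuration used with SVDD and Parzen-window one-class classifiers, though the paper's exact parameterization may differ.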

Keywords: data description; clustering; one-cluster; PCM; one-class classification

Background: This project is partially supported by the National Science Foundation of China (grant No. 60603029), the Science Foundation of Jiangsu Province (grant Nos. BK2004052, BK2007074), and the High School Science Foundation of Jiangsu Province (grant No. 06KJB520132). The project aims to design a novel classifier for one-class classification and to apply it in fields such as handwritten digit recognition, intrusion detection, and fault detection. To this end, the paper uses the one-cluster Possibilistic C-Means algorithm for clustering, formulates a data description for one-class classification, and thereby proposes a new clustering-based one-class classifier. The proposed method has a parameter configuration similar to that of Support Vector Data Description (SVDD) and the Parzen-window method, and achieves comparable performance on some UCI benchmark datasets and on USPS handwritten digits, while being very efficient to train. It also offers a new methodology for tailoring traditional clustering algorithms to one-class problems, which widens the range of options for designing one-class classifiers. The project builds on previous research on intrusion detection with Self-Organizing Maps and Grey theory, and graduated members of the group have developed a prototype intrusion detection system. However, the generalization performance of that intrusion detector is still weak, and it is not yet sufficiently robust. A further objective of the project is therefore to design a one-class classifier with good generalization performance and robustness to outliers. The method proposed in this paper is a step toward these goals: it shows performance comparable to the SVDD and Parzen-window methods, and relatively strong robustness to outliers owing to the intrinsic robustness of the Possibilistic 1-Means algorithm.