Chinese Journal of Computers
  Title: A Chinese Web Page Classifier Based on Support Vector Machine and Unsupervised Clustering
  Authors: LI Xiao-Li, LIU Ji-Min, SHI Zhong-Zhi
  Address: Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080
  Year: 2001
  Issue: No.1 (62-68)
  Abstract & Background
This paper presents a new algorithm that combines the Support Vector Machine (SVM) with unsupervised clustering. After analyzing the characteristics of web pages, it proposes a new vector representation of web pages and applies it to web page classification. Given a training set, the algorithm clusters the positive and negative examples separately with an unsupervised clustering algorithm (UC), producing a number of positive and negative cluster centers. It then selects only a subset of the examples as input to the SVM, according to the ISUC algorithm, and finally constructs a classifier through SVM learning. A text can be classified either by comparing its distances to the cluster centers or by the SVM. If the text is near a cluster center of one category and far from all the cluster centers of the other categories, UC can classify it correctly with high probability; otherwise the SVM is employed to decide which category it belongs to. The algorithm thus combines the strengths of SVM and unsupervised clustering. The experiment shows that it not only improves training efficiency but also achieves good precision.
Keywords: support vector machine, clustering, text classification
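
The two-stage decision rule described in the abstract (trust the cluster centers when a text is clearly near one category and far from the others, otherwise defer to the SVM) could be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the Euclidean distance, the `near` and `far` thresholds, the function names, and the `svm_predict` callable standing in for a trained SVM are all assumptions that do not come from the paper.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_center_distance(x: Vector, centers: List[Vector]) -> float:
    """Distance from x to the closest cluster center of one category."""
    return min(euclidean(x, c) for c in centers)


def classify(
    x: Vector,
    centers_by_category: Dict[str, List[Vector]],
    svm_predict: Callable[[Vector], str],
    near: float,
    far: float,
) -> str:
    """Decide by cluster centers when the case is clear-cut, else by the SVM.

    `near` and `far` are hypothetical thresholds, not values from the paper;
    `svm_predict` is a placeholder for whatever trained SVM is available.
    """
    dist = {
        cat: nearest_center_distance(x, centers)
        for cat, centers in centers_by_category.items()
    }
    best = min(dist, key=dist.get)
    others_far = all(d >= far for cat, d in dist.items() if cat != best)
    if dist[best] <= near and others_far:
        return best        # clear-cut: decided by clustering alone
    return svm_predict(x)  # ambiguous: decided by the SVM


# Toy data: two positive centers, one negative center, and a stand-in "SVM".
centers = {"positive": [(0.0, 0.0), (1.0, 0.0)], "negative": [(10.0, 10.0)]}


def toy_svm(v: Vector) -> str:
    # Hypothetical linear rule used only to make the fallback path visible.
    return "svm:positive" if v[0] + v[1] < 10 else "svm:negative"


clear_case = classify((0.2, 0.1), centers, toy_svm, near=1.0, far=5.0)
ambiguous_case = classify((5.0, 5.0), centers, toy_svm, near=1.0, far=5.0)
```

In this toy setup the first point lies close to a positive center and far from the negative one, so it is labelled by the clustering stage; the second point is not near any center, so the call falls through to the stand-in SVM. How the thresholds would be chosen, and how the ISUC step selects training examples for the SVM, are not specified in the abstract and are left out of the sketch.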