| | Chinese Journal of Computers Full Text |
| Title | A Chinese Web Page Classifier Based on Support Vector Machine and Unsupervised Clustering |
| Authors | LI Xiao-Li, LIU Ji-Min, SHI Zhong-Zhi |
| Address | (Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080) |
| Year | 2001 |
| Issue | No.1(62-68) |
| Abstract & Background | This paper presents a new algorithm that combines the Support Vector Machine (SVM) with unsupervised clustering. After analyzing the characteristics of web pages, it proposes a new vector representation of web pages and applies it to web page classification. Given a training set, the algorithm clusters the positive and negative examples separately using an unsupervised clustering algorithm (UC), which produces a number of positive and negative cluster centers. It then selects only a subset of the examples as input to the SVM, according to the ISUC algorithm. Finally, it constructs a classifier through SVM learning. Any text can be classified either by comparing its distances to the cluster centers or by the SVM. If the text is near a cluster center of one category and far from all the cluster centers of the other categories, UC can classify it correctly with high probability; otherwise, the SVM is used to decide which category it belongs to. The algorithm thus exploits the strengths of both SVM and unsupervised clustering. Experiments show that it not only improves training efficiency but also achieves good precision.
Keywords: support vector machine, clustering, text classification |
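
The two-stage decision rule described in the abstract (classify by cluster centers when the case is clear-cut, otherwise fall back to the SVM) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the Euclidean distance, the `near` and `far` thresholds, the two-class setup, and the `svm_decide` callable, which stands in for a trained SVM. The ISUC example-selection step is not shown because the abstract does not specify it.

```python
import math
from typing import Callable, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_distance(x: Vector, centers: Sequence[Vector]) -> float:
    """Distance from x to the closest center in the given set."""
    return min(euclidean(x, c) for c in centers)


def classify(
    x: Vector,
    pos_centers: Sequence[Vector],
    neg_centers: Sequence[Vector],
    svm_decide: Callable[[Vector], int],
    near: float = 0.5,   # hypothetical "near a center" threshold
    far: float = 1.5,    # hypothetical "far from all centers" threshold
) -> Tuple[int, str]:
    """Return (label, stage), where stage is "UC" or "SVM".

    UC decides only when x is near some center of one class and far from
    every center of the other class; all remaining cases go to the SVM.
    """
    d_pos = nearest_distance(x, pos_centers)
    d_neg = nearest_distance(x, neg_centers)
    # The minimum distance being >= far means x is far from ALL centers.
    if d_pos <= near and d_neg >= far:
        return 1, "UC"
    if d_neg <= near and d_pos >= far:
        return -1, "UC"
    return svm_decide(x), "SVM"


# Toy usage with made-up centers and a stub in place of a trained SVM.
pos_centers = [(0.0, 0.0)]
neg_centers = [(4.0, 4.0)]


def svm_stub(x: Vector) -> int:
    # Placeholder linear rule; a real system would call the trained SVM here.
    return 1 if x[0] + x[1] < 4.0 else -1


clear_case = classify((0.1, 0.1), pos_centers, neg_centers, svm_stub)
ambiguous_case = classify((2.0, 2.5), pos_centers, neg_centers, svm_stub)
```

In this toy setup the first point lies close to the positive center and is resolved by the clustering stage, while the second lies between the centers and is passed to the SVM stand-in. How the thresholds would actually be chosen is left open here.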