| | Chinese Journal of Computers Full Text |
| Title | An Approximate Markov Blanket Feature Selection Algorithm |
| Authors | CUI Zi-Feng 1), XU Bao-Wen 1), ZHANG Wei-Feng 2), XU Jun-Ling 1) |
| Address | 1)(School of Computer Science & Engineering, Southeast University, Nanjing 211189) 2)(School of Computer, Nanjing University of Posts & Telecommunication, Nanjing 210003) |
| Year | 2007 |
| Issue | No.12 (2074-2081) |
| Abstract & Background | Abstract: Feature selection (FS) can effectively improve the speed and accuracy of classification. Traditional FS approaches usually score features individually and do not evaluate feature subsets. Based on research on feature relevance, features can be further divided into four categories: strong relevance, weak relevance, irrelevance, and redundancy. This paper proposes a forward selection algorithm, approximate Markov Blanket (MB) feature selection, based on MB theory and the Chi-Square test, which obtains an approximately optimal feature subset. Experiments on the datasets suggest that the feature subset obtained by the proposed approach is much smaller than the original feature set, and that classification performance with it is better than or as good as with the original feature set. Moreover, in high-dimensional feature spaces such as text categorization, the proposed method clearly outperforms the traditional feature selection approaches OCFS, DF, CHI, and IG on the 20 Newsgroup dataset. Keywords: feature selection; relevance; Markov Blanket; Chi-Square test; categorization. Background: This work is part of the project "Feature Extraction in Email and Spam Email Recognition based on Agent", supported by the National Natural Science Foundation of China under grant No. 60503020. Feature selection focuses on finding as small a feature subspace as possible to substitute for the original space. Many inductive learning methods lose no performance in the subspace; meanwhile, feature selection can make those learning methods more effective and efficient and save storage space. Feature selection now faces challenges in high-dimensional spaces, such as text categorization in information retrieval. Many currently popular methods are good at selecting single features that separate the classes better than other features.
However, most of these methods consider only the relation between a feature and the class and ignore the relations among features, which results in redundancy. To address this problem, the authors propose a new approach based on Markov Blanket theory to remove the redundancy. |
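
The idea summarized in the abstract, scoring relevance against the class and then discarding features that an already selected feature makes redundant, can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify in detail. It assumes discrete features, uses the raw Pearson chi-square statistic with no significance threshold, and uses a hypothetical redundancy rule: a candidate is dropped when some kept feature is at least as strongly associated with it as it is with the class. The names `chi_square` and `select_features` are illustrative only.

```python
from collections import Counter


def chi_square(xs, ys):
    """Pearson chi-square statistic for two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    x_counts = Counter(xs)
    y_counts = Counter(ys)
    stat = 0.0
    for x, cx in x_counts.items():
        for y, cy in y_counts.items():
            expected = cx * cy / n          # count expected under independence
            observed = joint.get((x, y), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def select_features(columns, labels):
    """Forward selection with a chi-square redundancy check.

    columns: dict mapping feature name -> list of discrete values.
    Returns the names of the kept features, most relevant first.
    """
    # Relevance of each feature to the class.
    relevance = {name: chi_square(col, labels) for name, col in columns.items()}
    # Visit features from most to least relevant; a zero statistic is
    # treated as irrelevant and skipped.
    ranked = sorted(
        (name for name in relevance if relevance[name] > 0),
        key=lambda name: relevance[name],
        reverse=True,
    )
    selected = []
    for name in ranked:
        # Assumed heuristic: a kept feature "covers" the candidate when it is
        # at least as associated with the candidate as the candidate is with
        # the class.
        redundant = any(
            chi_square(columns[kept], columns[name]) >= relevance[name]
            for kept in selected
        )
        if not redundant:
            selected.append(name)
    return selected
```

On a toy dataset with two independent, partially informative features, an exact copy of one of them, and a noise feature, this sketch keeps only the two independent features. A fuller implementation would likely compare the statistic against a chi-square critical value for the appropriate degrees of freedom rather than compare raw statistics.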