计算机学报

	Chinese Journal of Computers Full Text
Title	Merging-Branches Impact on Decision Tree Induction
Authors	WANG Xi-Zhao YANG Chen-Xiao
Address	(College of Mathematics and Computer Science, Hebei University, Baoding, Hebei 071002)
Year	2007
Issue	No.8(1251—1258)
Abstract & Background	Abstract Since inductive bias exists during the process of selection of expanded attributes, attributes with more values are usually preferred to be selected. It consequently results in a decision tree with large scale and with poor generalization capability. Therefore it is necessary to simplify the decision tree including pre-pruning and post-pruning. This paper focuses on the pre-pruning. A new strategy of pre-pruning is given, that is, at the process of tree growth, two branches (or more) from the same node are merged into one branch and then the tree growth process continues. This paper investigates the impact of merging branches on decision tree induction. The main concerns are whether the comprehensibility, the size and the generalization accuracy of a decision tree can be improved if an appropriate merging strategy is selected and applied. Based on information gain, this paper analyzes the complexity of a decision tree before and after merging branches, and designs two algorithms of merging branches, SSID (based on the proportion of positive samples) and MCID (based on the most gain compensation). Experimental results show that with respect to the comprehensibility and the generalization capability, either SSID or MCID is significantly superior to the frequently used See5 system (the improved version of C4.5). keywords decision tree induction; induction bias; pruning; merging branches; information gain; gain compensation background The paper investigates an old but very important issue in the filed of Machine Learning. The investigated issue, which belongs to the area of inductive learning, is how to simplify a decision tree by merging two or more branches such that the rules extracted from the tree has better generalization capability. It is a very difficult problem and so far no more results on the branch-merging have been found from International Journals. This paper gives an analysis of the relationship between merging two branches and the entropy of related nodes and, to some extent, it solves the dependence of generalization capability on the tree size (i.e., the number of leaves). The investigated issue is coming from a project of National Natural Science Foundation of China (60473045，60573069), which aims to improve the generalization capability of weighted fuzzy production rules by adjusting weight values. The research of the group in the area of inductive learning has lasted over 10 years. So far the group has published more than 100 papers and 3 books related to machine learning and developed 3 practical Expert Systems.