计算机学报

	Chinese Journal of Computers Full Text
Title	Quick Attribute Reduction Algorithm with Hash
Authors	LIU Yong XIONG Rong CHU Jian
Address	(State Key Laboratory of Industrial Control Technology, Zhejiang University, Hangzhou 310027) (Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou 310027)
Year	2009
Issue	No.8(1493—1499)
Abstract & Background	Abstract This paper presents the concept and property of inconsistency from the inconsistent condition of decision system. It also presents the relationship between the positive region and inconsistent records. A hash based algorithm calculating positive region has been presented and its temporal complexity decreases to O(｜U｜). Based on the characteristics of inconsistency, a new attribute measure has been introduced, then a corresponding reduction algorithm with twice-hash is presented, and its temporal complexity is O(｜C｜2｜U/C｜), this paper also proves this algorithm is complete. The efficiency of the algorithms is proved by the experiments. Keywords rough set; positive region; reduction; Hash; inconsistency Background The attribute reduction is one of the most important concepts in rough set theory. With the attribute reduction, the irrelevant attributes can be removed from the original attribute set and the remained attributes can keep the discriminability as same as the original attribute set and also keep the internal semantic correlations among attributes. The attribute reduction has been widely used in data mining, which normally needs to deal with very large data set, so a fast and easy attribute reduction algorithm is especially important when implementing the reduction concept in data mining applications with huge data sets. The mainly reduction algorithms fall in two categories: the discernibility matrix based approaches and positive region based approaches. The previous approaches are almost not able to implement in large data set condition due to the huge spatial requirement; and the latter approaches need to calculate the positive region iteratively. The procedure of positive region is a calculation intensive and tactical task. With the increasing of the attribute dimension and the number of instances, the processing time for positive region has grown tremendously. So it is significant to present a fast positive region method in attribute reduction algorithm. The approach in this paper belongs to the latter. This paper addresses the problem to design fast and easy attribute reduction algorithm which is suit for the large data set. Firstly, we propose the concept of inconsistency and also present the properties of inconsistency, based on the inconsistency, we present a hash based quick positive region calculation method with linear temporal complexity. Based on the characteristics of inconsistency, a new attribute measure for attribute reduction has been introduced, then a corresponding reduction algorithm with twice-hash is presented, and its temporal complexity is O(｜C｜2｜U/C｜). Finally, comparable experiments between our algorithm and other two fast reduction algorithms are carried out under UCI data set. The experimental results show the efficiency of our novel, fast hash based attribute reduction algorithm. This paper is supported by National Nature Science Foundation of China (grant Nos.60803053, 60675049), Nature Science Foundation of Zhejiang Province (grant No.Y106414), National High Technology Research and Develop Program (863 Program) of China (grant No.2008AA04Z209), Excellent Postdoctoral Science Foundation of China (grant No.20081459), and Defense Advanced Research Foundation of the General Armaments Department of the PLA (grant No.9140A06050609JW0402).