计算机学报

	Chinese Journal of Computers Full Text
Title	Distributed Neural Network Learning Algorithm Based on Hebb Rule
Authors	TIAN Da-Xin 1),2); LIU Yan-Heng 1),2); LI Bin 3); WU Jing 1),2)
Address	1) College of Computer Science and Technology, Jilin University, Changchun 130012; 2) Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012; 3) College of Mathematics, Jilin University, Changchun 130012
Year	2007
Issue	No. 8 (1379-1388)
Abstract & Background	Abstract: In knowledge discovery and data mining, the amount of data available for building classifiers or regression models is growing very fast. There is therefore a great need to scale up inductive learning algorithms so that they can handle very large datasets while remaining computationally efficient. This paper presents a distributed neural network based on the Hebb rule to improve the speed and scalability of inductive learning. Speed is improved by running the algorithm on disjoint subsets instead of the entire dataset. To keep accuracy from degrading relative to a single algorithm run on the entire dataset, a growing and pruning policy is adopted, based on an analysis of the completeness and risk bounds of competitive Hebb learning. In the experiments, the accuracy of the algorithm is tested on a small benchmark (circle-in-the-square) and compared with SVM, ARTMAP, and BP neural networks. Performance on a large dataset (USCensus1990Data) from the UCI repository is also evaluated.
Keywords: scaling up; data partition; Hebb rule; distributed learning; competitive learning
Background: The knowledge discovery and data mining community has challenged itself to develop inductive learning algorithms that scale up to large datasets. Many diverse techniques have been proposed and implemented for scaling up inductive algorithms. The three main approaches are to design a fast algorithm, to partition the data, and to use a relational representation. Two important characteristics of neural networks are that they are distributed (knowledge representation is spread across many processing units) and parallel (computations take place in parallel across these distributed representations).
Although a neural network's knowledge representation is distributed, its learning algorithm is centralized, since it requires all the training data to be presented to the network one by one until the network stabilizes after one or more epochs. For many realistic problems and databases, such as astronomy, biomedical, and bioinformatics data, this is clearly untenable. This paper presents a distributed neural network learning algorithm. This research is supported by the National Natural Science Foundation of China under grant No. 60573128 and the National Research Foundation for the Doctoral Program of Higher Education of China under grant No. 20060183043.
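The general idea in the abstract (train on disjoint subsets, then combine the results) can be illustrated with a minimal Python sketch. It is not the paper's algorithm. It assumes a standard winner-take-all competitive update as a stand-in for competitive Hebb learning, and a very simple grow/prune rule in place of the policy the authors derive from completeness and risk bounds. The function names and the parameters `radius`, `lr`, and `min_wins` are hypothetical, and the data generator assumes the usual circle-in-the-square setup in which the circle covers half of the unit square.

```python
import math
import random

def make_circle_in_square(n, rng):
    """Points in the unit square; label 1 inside a centred circle of half the square's area."""
    r2 = 1.0 / (2.0 * math.pi)  # radius squared, so that pi * r^2 = 0.5
    data = []
    for _ in range(n):
        x, y = rng.random(), rng.random()
        label = 1 if (x - 0.5) ** 2 + (y - 0.5) ** 2 <= r2 else 0
        data.append(((x, y), label))
    return data

def partition(data, k):
    """Split the data into k disjoint subsets (round-robin)."""
    return [data[i::k] for i in range(k)]

def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def train_local(subset, radius=0.15, lr=0.2, min_wins=2):
    """One pass of winner-take-all learning with an illustrative grow/prune rule."""
    protos = []  # each prototype is [weights, label, win_count]
    for x, label in subset:
        same = [p for p in protos if p[1] == label]
        winner = min(same, key=lambda p: dist2(p[0], x)) if same else None
        if winner is None or dist2(winner[0], x) > radius ** 2:
            # Grow: no same-label prototype is close enough (assumed criterion).
            protos.append([list(x), label, 1])
        else:
            # Competitive update: move only the winner toward the input.
            winner[0] = [w + lr * (xi - w) for w, xi in zip(winner[0], x)]
            winner[2] += 1
    # Prune: drop prototypes that rarely won (assumed criterion).
    return [p for p in protos if p[2] >= min_wins]

def merge(models):
    """Combine local models by taking the union of their prototypes."""
    return [p for m in models for p in m]

def predict(protos, x):
    """Label of the nearest prototype."""
    return min(protos, key=lambda p: dist2(p[0], x))[1]

def accuracy(protos, data):
    return sum(predict(protos, x) == y for x, y in data) / len(data)

# Usage: four local learners, each seeing only its own disjoint subset.
rng = random.Random(0)
train = make_circle_in_square(4000, rng)
test = make_circle_in_square(1000, rng)
combined = merge(train_local(s) for s in partition(train, 4))
print(len(combined), accuracy(combined, test))
```

Each local learner touches only its own subset, so the four calls to `train_local` could run on separate machines; only the pruned prototypes need to be exchanged. Taking the union of prototypes is the simplest possible way to combine the local models and is used here purely for illustration.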