|  | Chinese Journal of Computers Full Text |
| Title | An Efficient Genetic Algorithm for System-Level Diagnosis |
| Authors | DENG Wei, YANG Xiao-Fan, WU Zhong-Fu |
| Address | (College of Computer Science, Chongqing University, Chongqing 400044) |
| Year | 2007 |
| Issue | No. 7 (1115-1124) |
| Abstract & Background | Abstract: Elhadef and Ayeb devised a genetic algorithm for the system-level diagnosis of multicomputers in which the fitness of a guessed fault set is calculated by comparing the given syndrome with a syndrome randomly generated from that fault set. One drawback of this algorithm is that the fitness function considers only one of the many syndromes the guessed fault set could produce, which leads to a high probability of incorrect diagnosis. In this paper, the authors describe a set of equations that govern the statuses of the units in a system. Based on these equations, the authors design a novel fitness function and present a new genetic algorithm for the fault diagnosis of diagnosable systems. Theoretical analysis and simulation results both show that the new algorithm is markedly superior to the Elhadef-Ayeb diagnosis algorithm in terms of the number of iterations. The initial population generation process proposed by Elhadef and Ayeb is also justified.
Keywords: system-level diagnosis; genetic algorithms; diagnosability; PMC model
Background: The system-level diagnosis of multicomputers, which aims to identify faulty processors by conducting tests on processors and interpreting the test outcomes, is an important research topic in fault-tolerant computing. The central task of system-level diagnosis is to develop efficient diagnosis algorithms. Some typical fault identification problems are known to be NP-hard, so various heuristic diagnosis algorithms have been proposed. This paper presents a new diagnosis algorithm based on carefully designed genetic heuristics. Both theoretical analysis and simulation results show that the algorithm is superior to a recently proposed diagnosis algorithm. 
This work is part of a larger project entitled "Parallel Computing and Fault Tolerance", which is supported by the Program of the Educational Ministry of China for New Century Excellent Talents (Grant No. NCET-05-0759), the Doctorate Foundation of the Educational Ministry of China (Grant No. 20050611001), and the Natural Science Foundation of Chongqing CSTC (Grant Nos. 2006BB2231 and 2005BB2191). The research group has published more than 20 academic papers in distinguished international journals in this research area. |
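
The contrast drawn in the abstract can be illustrated with a minimal Python sketch. It is not the paper's fitness function, whose equations are not reproduced in this abstract. It assumes the standard PMC rule: a fault-free tester reports 1 for a faulty unit and 0 for a fault-free one, while a faulty tester reports an arbitrary outcome. The names `generate_syndrome`, `sampled_fitness`, and `consistency_fitness`, the 5-unit test graph, and the fault set `{1}` are all hypothetical choices made for illustration. Candidate fault sets are assumed to respect the system's fault bound, since under the PMC rule an "all faulty" guess is trivially consistent with any syndrome.

```python
import random


def generate_syndrome(tests, faulty, rng):
    """Return one PMC syndrome that the fault set `faulty` could produce."""
    syndrome = {}
    for u, v in tests:
        if u in faulty:
            # A faulty tester reports an arbitrary outcome.
            syndrome[(u, v)] = rng.randint(0, 1)
        else:
            # A fault-free tester reports the true status of the tested unit.
            syndrome[(u, v)] = 1 if v in faulty else 0
    return syndrome


def sampled_fitness(tests, given, guess, rng):
    """Sampling-style fitness: agreement between the given syndrome and
    ONE randomly generated syndrome of the guessed fault set."""
    sampled = generate_syndrome(tests, guess, rng)
    return sum(sampled[t] == given[t] for t in tests) / len(tests)


def consistency_fitness(tests, given, guess):
    """Deterministic alternative (an assumption of this sketch): fraction
    of tests whose outcome is compatible with the guess under the PMC rule."""
    compatible = 0
    for u, v in tests:
        expected = 1 if v in guess else 0
        # Outcomes of testers assumed faulty are unconstrained.
        if u in guess or given[(u, v)] == expected:
            compatible += 1
    return compatible / len(tests)


# Hypothetical 5-unit system: unit i tests units i+1 and i+2 (mod 5).
tests = [(i, (i + k) % 5) for i in range(5) for k in (1, 2)]
rng = random.Random(0)
true_faults = {1}
given = generate_syndrome(tests, true_faults, rng)

# The true fault set is always fully compatible with the given syndrome.
print(consistency_fitness(tests, given, true_faults))
# The sampling-style score for the same guess can vary from call to call.
print([sampled_fitness(tests, given, true_faults, rng) for _ in range(5)])
```

In this sketch the sampling-style score for the correct fault set depends on which random syndrome happens to be drawn, whereas the consistency-based score is the same on every evaluation.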