Chinese Journal of Computers
  Title Chip Multiprocessor Execution Model for Soft Error Tolerance
  Authors GONG Rui, DAI Kui, WANG Zhi-Ying
  Address (School of Computer, National University of Defense Technology, Changsha 410073)
  Year 2008
  Issue No.11 (2047-2059)
  Abstract & Background
Abstract As integrated circuit technology advances, microprocessors are becoming increasingly susceptible to soft errors. This paper proposes two chip multiprocessor execution models for soft error tolerance. Dual Core Redundancy (DCR) executes two redundant threads of a given program on separate cores, with a certain slack between them. Store instructions cannot be committed until they have been compared. The redundant cores are enhanced with hardware-implemented context saving and recovery, so that soft errors can be recovered by re-execution from the last context saving point. The context saving point chosen in this paper efficiently hides the saving latency. A dedicated mechanism guarantees load coherence between the original execution and the re-execution, avoiding undesirable faults. Triple Core Redundancy (TCR) applies triple modular redundancy at the core level, exploiting core resources for soft error masking. TCR executes three redundant threads on separate cores. Once a soft error is detected, TCR can be reconfigured to mask the erroneous results of the corrupted core. Experimental results show that, compared with the traditional soft error recovery execution model CRTR, DCR and TCR reduce inter-core communication bandwidth demand by 57.5% and 54.2%, respectively. The performance loss of DCR caused by re-execution is 5.2%, while reconfiguration in TCR incurs a 1.3% performance overhead. Fault injection experiments show that DCR can recover from 99.69% of soft errors, while TCR can mask all SEU (Single Event Upset) faults.
Keywords chip multiprocessor; execution model; soft error recovery; soft error masking; dual core redundancy; triple core redundancy
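
The two ideas summarized in the abstract can be illustrated with a small software sketch. This is only a toy model under stated assumptions, not the hardware design of the paper: the function names (dcr_execute, tcr_vote, flip_bit), the fixed checkpoint interval, and the single injected bit flip are all hypothetical and chosen for illustration. The DCR part compares each store from two redundant executions before committing it and rolls back to the last saved point on a mismatch; the TCR part takes a majority vote over three redundant values and reports which core disagreed.

```python
from collections import Counter


def flip_bit(value, bit=0):
    """Model a single event upset by flipping one bit of an integer."""
    return value ^ (1 << bit)


def dcr_execute(compute, n_steps, fault_step=None, checkpoint_every=4):
    """Toy model of dual-core redundancy (all names are hypothetical).

    compute(i) stands for the store value produced at step i.
    A store is committed only after both redundant results agree;
    on a mismatch, execution restarts from the last checkpoint.
    """
    memory = {}                      # committed stores only
    checkpoint = 0                   # last context saving point
    reexecutions = 0
    fault_pending = fault_step is not None
    i = 0
    while i < n_steps:
        if i % checkpoint_every == 0:
            checkpoint = i           # save context (assumed fixed interval)
        leading = compute(i)
        trailing = compute(i)
        if fault_pending and i == fault_step:
            trailing = flip_bit(trailing)   # transient fault, occurs once
            fault_pending = False
        if leading != trailing:
            i = checkpoint           # roll back and re-execute
            reexecutions += 1
            continue
        memory[i] = leading          # compared, so safe to commit
        i += 1
    return memory, reexecutions


def tcr_vote(values):
    """Majority vote over three redundant values.

    Returns (voted_value, index_of_disagreeing_core_or_None).
    """
    winner, count = Counter(values).most_common(1)[0]
    if count < 2:
        raise RuntimeError("no majority among the three cores")
    faulty = next((i for i, v in enumerate(values) if v != winner), None)
    return winner, faulty
```

For example, dcr_execute(lambda i: i * i, 8, fault_step=5) commits the same stores as a fault-free run at the cost of one re-execution, and tcr_vote([7, 7, 6]) masks the corrupted value and identifies the third core as the one to reconfigure around.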