Chinese Journal of Computers
  Title: Software/Hardware Co-Design for 1-D FFT Optimization on Many-Core Architecture
  Authors: ZHOU Yong-Bin, ZHANG Jun-Chao, ZHANG Shuai, ZHANG Hao
  Address: Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190
  Year: 2008
  Issue: No.11 (2005-2014)
Abstract: With the growing demand for high-performance computing, many-core architecture is becoming the trend for future processor design. The Fast Fourier Transform (FFT), which is both compute-intensive and bandwidth-intensive, is one of the most important applications in high-performance computing. Implementing an efficient and scalable FFT algorithm on a many-core processor is a challenge for both software and hardware developers. Based on the Godson-T processor, the authors developed an optimized implementation of the 1-D FFT that hides the matrix transpose implicitly and overlaps computation with communication. The optimized 1-D FFT performs more than 3 times better and reduces L2 cache consumption by almost 1/3. After analyzing the on-chip network congestion problem, the authors suggest that increasing the access bandwidth of the L2 cache can alleviate the negative impact that bursts of L2 cache accesses have on the on-chip network and on the L2 cache itself. As a result, the performance and scalability of memory-bandwidth-limited applications such as FFT can be further improved.
Keywords: many-core; Godson-T; fast Fourier transform; computation/communication overlapping
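
The abstract does not spell out how the matrix transpose is hidden, so the following is only a minimal sketch of the general idea, not the authors' Godson-T implementation. It assumes the common decomposition of an N-point 1-D FFT (N = n1 * n2) into two passes of short transforms with a twiddle multiplication in between, and folds the transposes into strided reads and writes instead of performing them as separate passes. The names `dft` and `fft_implicit_transpose` are hypothetical, and the naive `dft` merely stands in for an optimized per-core FFT kernel.

```python
import cmath


def dft(seq):
    """Naive O(n^2) DFT; a stand-in for an optimized short FFT kernel."""
    n = len(seq)
    return [
        sum(seq[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def fft_implicit_transpose(x, n1, n2):
    """1-D DFT of length n1 * n2 without an explicit transpose pass.

    Illustrative assumption: input index n = r + n1 * c and
    output index k = n2 * k1 + k2 (Cooley-Tukey index mapping).
    """
    n = n1 * n2
    assert len(x) == n

    # Pass 1: n1 transforms of length n2. Reading x[r::n1] gathers the
    # strided elements directly, which replaces the first transpose.
    rows = [dft(x[r::n1]) for r in range(n1)]

    # Twiddle multiplication by W_N^(r * k2).
    for r in range(n1):
        for k2 in range(n2):
            rows[r][k2] *= cmath.exp(-2j * cmath.pi * r * k2 / n)

    # Pass 2: n2 transforms of length n1 down the columns. Writing with
    # stride n2 places each result at index n2 * k1 + k2, which replaces
    # the final transpose.
    out = [0j] * n
    for k2 in range(n2):
        out[k2::n2] = dft([rows[r][k2] for r in range(n1)])
    return out
```

On a many-core processor each short transform in a pass could be assigned to a different core; how the work is actually partitioned and how communication is overlapped with computation on Godson-T is not described in the abstract and is not modeled here.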