Chinese Journal of Computers
  Title: Loop Unrolling and Data-Path Generation of Sliding-Window Operation
  Authors: DONG Ya-Zhuo 1), LIU Ming-Zheng 2), XIA Fei 1), DOU Yong 1)
  Address: 1) (School of Computer, National University of Defense Technology, Changsha 410073)
2) (Logistics Science Research Institute, Beijing 100071)
  Year: 2008
  Issue: No.6 (989-997)
  Abstract & Background
Abstract Window operations, which are both computationally intensive and data intensive, are frequently used in image compression, pattern recognition and digital signal processing. Reconfigurable hardware boards provide a convenient and flexible way to speed up these algorithms. Building on a memory and data scheduling method and a data-path generation method, this paper studies the effect of loop unrolling on area, clock speed and throughput for sliding-window operations. The results indicate that, owing to the design of the compilation framework, inner-loop unrolling makes the controllers more complicated than outer-loop unrolling does and also requires more area. Outer-loop unrolling, however, demands more memory elements to hold the reused data. The clock speed begins to decrease once the number of RAM modules grows beyond a certain size, and the throughput increases to different degrees for different operations.
Keywords sliding-window operations; high level synthesis; loop unrolling; data-path; memory architecture
Background This paper is mainly supported by the National Natural Science Foundation of China (60633050). The project aims to address the nation's urgent scientific computing requirements and to research the key technologies of high-efficiency parallel computer architecture. Based on the analysis of some typical applications, a series of studies has been carried out, covering memory architecture, hardware acceleration and high level synthesis. The team has published more than 20 papers in international conferences and journals, including FPGA2005, ASP-DAC2007 and ASAP2007. The authors' work is used in high level synthesis. High Level Synthesis (HLS) tools provide a bridge between an algorithm written in a high level language (Matlab, C, C++, etc.) and a lower level Hardware Description Language (HDL). The authors concentrate on one class of applications called window operations. This kind of application is widely used in signal, image and video processing and requires much computation and data manipulation. Reconfigurable hardware boards provide a convenient and flexible way to speed up these algorithms. High level synthesis is increasingly recognized as the key to reducing the complexity of hardware design. However, the memory structure has become the performance bottleneck. This paper presents a parameterized memory architecture for high level synthesis that automatically generates the hardware frames for all window processing applications, gives a design for a three-level memory structure that fully realizes inner-loop and outer-loop data reuse, and at the same time uses shift registers to make the hardware design simpler. Building on this memory and data scheduling method and on the data-path generation method, this paper studies the effect of loop unrolling on area, clock speed and throughput for sliding-window operations.
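
As a rough software illustration of the outer-loop data reuse described above, the following Python sketch compares a plain 3 x 3 sliding-window sum with a version whose outer (row) loop is unrolled by two. It is only an analogy under stated assumptions, not the paper's hardware data path: the function names, the choice of a sum as the window operation and the unroll factor of 2 are all hypothetical. The point it tries to show is that two vertically adjacent windows share k-1 image rows, so the unrolled version can compute the shared part once, at the cost of holding more intermediate data.

```python
def window_sum(img, k=3):
    """Baseline: k x k sliding-window sum, no unrolling."""
    h, w = len(img), len(img[0])
    out = [[0] * (w - k + 1) for _ in range(h - k + 1)]
    for r in range(h - k + 1):          # outer loop: window row
        for c in range(w - k + 1):      # inner loop: window column
            out[r][c] = sum(img[r + i][c + j]
                            for i in range(k) for j in range(k))
    return out


def window_sum_outer_unrolled(img, k=3):
    """Outer loop unrolled by 2 (hypothetical factor).

    Each iteration produces two vertically adjacent windows. They share
    k-1 image rows, so the shared partial sum is computed once and reused;
    this is the extra storage that outer-loop unrolling asks for.
    """
    h, w = len(img), len(img[0])
    rows, cols = h - k + 1, w - k + 1
    out = [[0] * cols for _ in range(rows)]
    r = 0
    while r + 1 < rows:
        for c in range(cols):
            # Row sums for the k+1 image rows covered by the two windows.
            row_sums = [sum(img[r + i][c:c + k]) for i in range(k + 1)]
            shared = sum(row_sums[1:k])     # rows common to both windows
            out[r][c] = row_sums[0] + shared
            out[r + 1][c] = shared + row_sums[k]
        r += 2
    if r < rows:                            # leftover row when rows is odd
        for c in range(cols):
            out[r][c] = sum(img[r + i][c + j]
                            for i in range(k) for j in range(k))
    return out
```

Both functions return the same output for any image at least k x k in size; the unrolled one simply trades additional temporaries (`row_sums`, `shared`) for fewer repeated reads of the shared rows.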