计算机学报

	Chinese Journal of Computers Full Text
Title	Loop Kernel Pipelining Mapping onto Coarse-Grained Reconfigurable Architectures
Authors	WANG Da-Wei DOU Yong LI Si-Kun
Address	(College of Computer Science, National University of Defence Technology, Changsha 410073)
Year	2009
Issue	No.6(1089—1099)
Abstract & Background	Abstract Coarse-grained reconfigurable architectures (CGRAs) provide a flexible and efficient solution for data-intensive applications. The loop kernels of these applications typically account for a large share of the program's execution time. However, mapping loop kernels onto CGRAs while meeting performance and cost constraints remains difficult. This paper proposes a novel approach for mapping loop kernels onto CGRAs with loop self-pipelining. The problem formulation is given first. The resource sharing and pipelining features of loop self-pipelining CGRAs (lspCGRAs) are then presented, together with their template standard. A field-specific, application-driven mapping flow is described, and a loop kernel pipelining mapping algorithm is proposed. The results show that the proposed approach reduces resource utilization by 16.3% and increases throughput by 169.1% compared with SPKM, a previous advanced approach.
Keywords reconfigurable computing; CGRA; data-intensive application; loop self-pipelining
Background The problem this paper addresses belongs to the high-level design of reconfigurable SoCs. Reconfigurable computing combines the efficiency of custom computation with the flexibility of general-purpose computation, so it can accelerate most kinds of applications. CGRAs can provide greater speedup at lower energy cost. However, although CGRAs offer high performance and flexibility, meeting strict performance and cost constraints when developing field-specific applications remains a hard problem. Handling data-intensive applications (DIA) requires high throughput and parallelism, while mapping DIA onto a CGRA manually is error-prone and time-consuming. In addition, these applications contain many loop kernels that consume substantial system resources, so it is necessary to speed up and optimize these critical loop kernels. Because existing work on mapping loop kernels onto CGRAs is insufficient, this paper proposes a novel loop kernel pipelining mapping approach for coarse-grained reconfigurable architectures.
The authors developed a loop self-pipelining coarse-grained reconfigurable architecture (lspCGRA). The lspCGRA uses a fixed-instruction, multiple-dataflow computation model, which supports loop self-pipelining and achieves high computation throughput. The authors also detail the resource-sharing and pipelining features of lspCGRA templates, and present a novel loop kernel pipelining mapping approach.
This paper is partly supported by the National Natural Science Foundation of China under the projects “Research on fixed instruction and multi-data stream computation model based coarse grained reconfigurable array architecture” (grant No.90307001) and “Embedded Stream Media Process SOC Design Platform and Design Technology” (grant No.90707003). With the support of these projects, the research group has developed a 32-bit RISC microprocessor and a coarse-grained reconfigurable architecture that accelerates applications through the loop self-pipelining technique. This paper contributes to the field of high-level design, supporting the whole SoC design flow and its optimization. It proposes a loop kernel pipelining mapping approach together with its automatic design flow. Compared with the advanced SPKM approach, it reduces resource occupation by 16.3% and increases throughput by 169.1%.