| ¡¡ | Chinese Journal of Computers Full Text |
| Title | An Optimized Strategy for Collective Communication in Data Parallelism |
| Authors | WANG Jue HU Chang-Jun ZHANG Ji-Lin LI Jian-Jiang |
| Address | (School of Information Engineering, University of Science and Technology Beijing, Beijing 100083) |
| Year | 2008 |
| Issue | No.2(318¡ª328) |
| Abstract & Background | Abstract Collective communication significantly influences the performance of data parallel applications. It is required often in two situations: One is array redistribution from phase to phase; another is data remapping after loop partition. Nevertheless, an important factor that influences the efficiency of collective communication is often neglected: When there is node contention and difference among message lengths during one particular communication step, a larger communication idle time may occur. In previous works, researchers can¡¯t completely avoid communication conflict and focus on some special cases. This paper is devoted to develop an universal and efficient communication scheduling strategy (CSS) concerning with the situation where array distributions are Block_Cyclic(k). Base on the proof for the recursive theorem of communication table elements, this strategy generates a communication scheduling table so that each column is a permutation of receiving node number in each communication step. And the messages with the close size are put into a communication step as near as possible. This indicates that the strategy not only avoids inter-processor contention, but it also minimizes real communication cost in each communication step. Finally, experimental results show that CSS has better performance than the general method and the implementation of MPI_Alltoallv. keywords parallel compiling; data parallelism; collective communication; array redistribution; distributed memory background The work in this paper belongs to communication optimization of parallel compilation. Recently, many researchers have concentrated on the problem of collective communication. Some studies have pay attention to efficiently generate communication message using compilation technique, while some have addressed the communication optimization. Many researches on communication scheduling, concentrate mainly on some special cases or dynamic redistribution from phase to phase. However, little researchers concern with communication optimization of array remapping by using array distribution pattern and assignment statements in loop. The authors pay attention to messages scheduling by making use of the compilation information provided by array subscripts, array distribution pattern and array access period from parallel application on single-layer switch network. Comparing the previous researches, the main contributions of the work are as follow: (1)Proposing the approach to schedule messages for data remapping after loop partition. (2)It can be applied to data redistribution from phase to phase. (3)Minimizing the overhead of communication schedule generation by the period theory of array access. It can be completely generated at compile-time phase, so there is no overhead at run-time phase. This work is supported by the Hi-Tech Research and Development Program (863) of China under grant No.2006AA01Z105, the National Natural Science Foundation of China under grant No.60373008 and the Ministry of Education of the People¡¯s Republic of China Major Foundation under grant No.106019. The objective of the Hi-Tech Research and Development Program (863) is to improve the productivity of parallel programmer using parallel compiling techniques. The National Natural Science Foundation of China pays attentions to OpenMP extension on clusters. The Ministry of Education of the People¡¯s Republic of China Major Foundation focuses on a difficult problem in parallel computing, i.e., out-of-core irregular problem. In communication optimization area, we have proposed a communication generation approach, which integrates the benefits from the algebra method and the integer lattice method, to derive an algebraic solution of communication set enumeration. The work in this paper focuses on improving efficiency of collective communication using compiling techniques. The authors will extend this technique to their prototype compiler for irregular applications. |