Chinese Journal of Computers

Title	Multi-Modal Multi-View Video Coding Based on Correlation Analysis
Authors	JIANG Gang-Yi (1,2), ZHANG Yun (1,2), YU Mei (1,3)
Address	(1) Faculty of Information Science and Engineering, Ningbo University, Ningbo, Zhejiang 315211; (2) Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080; (3) National Key Laboratory of Machine Perception, Peking University, Beijing 100871
Year	2007
Issue	No. 12, pp. 2205-2211
Abstract & Background	Abstract: Multi-view video coding (MVC) should support view random access, temporal random access, spatial random access, low coding delay and view scalability, as well as high compression efficiency and low complexity. The correlation characteristics of a multi-view video signal vary over time and are influenced by video content, illumination changes, the speed of moving objects and cameras, the view interval, the sampling frequency and other factors. Unlike conventional MVC schemes, which use a single prediction mode, this paper proposes a multi-modal multi-view video coding (MMVC) scheme based on correlation analysis. Several MVC prediction structures with good performance are integrated into the scheme, and the encoder dynamically selects among them according to the correlation characteristics of the current multi-view video. Experimental results show that the proposed MMVC scheme reduces computational complexity and improves random access performance while maintaining high coding efficiency.
Keywords: multi-view video coding (MVC); multi-mode; random access; computational complexity; correlation analysis; mode update
Background: Multi-view video is a new type of multimedia that provides a stereoscopic impression and interactive functionality, and multi-view video coding (MVC) is one of the key techniques of 3D Audio-Visual (3DAV) systems. Multi-view video signals exhibit different temporal and inter-view correlations owing to camera spacing, illumination, camera and object motion, and other factors. In MVC, motion estimation and motion compensation are employed to remove temporal redundancy within a single view, while disparity estimation and disparity compensation are used to reduce inter-view redundancy among neighboring views. Various view-temporal prediction structures have been proposed for MVC based on multi-reference-frame prediction.
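The trade-off between richer prediction structures and random access can be made concrete with a small sketch. It models a prediction structure as a directed graph in which each frame, labeled by a hypothetical (view, time) pair, lists the frames it is predicted from; randomly accessing a frame then requires decoding that frame and all of its direct and indirect references. The two structures built here (per-view temporal prediction only, and the same structure with one extra reference to the neighboring view) are illustrative assumptions, not the prediction structures used in the paper.

```python
from typing import Dict, List, Set, Tuple

Frame = Tuple[int, int]              # hypothetical (view index, time index) label
RefGraph = Dict[Frame, List[Frame]]  # frame -> frames it is predicted from


def decode_set(target: Frame, refs: RefGraph) -> Set[Frame]:
    """All frames a decoder must reconstruct to display `target` (target plus all ancestors)."""
    needed: Set[Frame] = set()
    stack = [target]
    while stack:
        frame = stack.pop()
        if frame in needed:
            continue
        needed.add(frame)
        stack.extend(refs.get(frame, []))
    return needed


def random_access_cost(target: Frame, refs: RefGraph) -> int:
    """Number of frames that must be decoded to access `target`."""
    return len(decode_set(target, refs))


def temporal_only(num_views: int, num_times: int) -> RefGraph:
    """Each view coded independently: frame (v, t) references only (v, t - 1)."""
    return {(v, t): ([(v, t - 1)] if t > 0 else [])
            for v in range(num_views) for t in range(num_times)}


def temporal_plus_inter_view(num_views: int, num_times: int) -> RefGraph:
    """As above, plus a reference to the neighboring view (v - 1, t) for every v > 0."""
    refs = temporal_only(num_views, num_times)
    for v in range(1, num_views):
        for t in range(num_times):
            refs[(v, t)].append((v - 1, t))
    return refs


print(random_access_cost((3, 7), temporal_only(4, 8)))             # 8
print(random_access_cost((3, 7), temporal_plus_inter_view(4, 8)))  # 32
```

With 4 views and 8 time instants, accessing the last frame of view 3 costs 8 decoded frames without inter-view prediction but 32 with it: the extra references can exploit more correlation, but they enlarge the set of frames the decoder must process.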
Although these prediction modes select different temporal and inter-view reference frames, each mode has a fixed prediction structure that does not adapt to variations in the correlation characteristics of the multi-view video signal. To achieve good rate-distortion performance, a mode with a fixed structure usually requires a complex prediction structure; that is, many temporal and inter-view reference frames are used without regard to how the correlation of the multi-view video sequence varies. This can greatly increase computational complexity and also degrades random access and partial decoding. Conversely, if a simple prediction structure is used, the encoder may not sufficiently exploit the correlation within the multi-view video signal, so high rate-distortion performance cannot be achieved. Moreover, unlike traditional video coding schemes, MVC should support view random access, temporal random access, spatial random access, low coding delay and view scalability, as well as high compression efficiency and low complexity. Some of these requirements conflict with one another, which means that a prediction mode with a fixed structure is not flexible enough to meet the different requirements placed on a multi-view video codec.
This work is supported by the Natural Science Foundation of China (Nos. 60472100, 60672073), the Program for New Century Excellent Talents in University (NCET-06-0537), the Natural Science Foundation of Zhejiang Province (grant No. Y105577), and the Key Project of the Chinese Ministry of Education (grant No. 206059).
The proposed multi-modal multi-view video coding scheme is based on an analysis of temporal and inter-view correlations. Different prediction modes are designed for multi-view video signals with different correlation characteristics.
A suitable prediction mode is then dynamically selected from the designed modes according to the correlation characteristics of the current multi-view video signal. The goals of this project are to propose a framework for a multi-modal multi-view video coding scheme based on the analysis of temporal and inter-view correlations, together with a corresponding method for analyzing these correlations, the design of multiple prediction modes, and methods for mode selection and updating, so as to obtain consistently high coding efficiency and optimize the overall efficiency of the multi-view video codec.
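The correlation-driven mode selection and per-GOP mode update can be sketched as follows. This section does not state which correlation measure, thresholds or mode set the scheme uses, so everything here is an assumption: the mean absolute difference (MAD) between co-located pixels serves as an inverse proxy for correlation, measured temporally (same view, consecutive time instants) and between neighboring views (same time instant), and the mode names `TEMPORAL`, `INTER_VIEW` and `JOINT`, the `ratio` threshold and the decision rule are all hypothetical. A practical encoder would more likely measure correlation after motion or disparity compensation.

```python
from typing import List, Tuple

import numpy as np

# Hypothetical mode set (not the paper's):
#   "TEMPORAL"   - mainly temporal references (motion compensation within each view)
#   "INTER_VIEW" - mainly inter-view references (disparity compensation between views)
#   "JOINT"      - both temporal and inter-view references


def mad(a: np.ndarray, b: np.ndarray) -> float:
    """Mean absolute difference of two equally sized frames (lower = more correlated)."""
    return float(np.mean(np.abs(a.astype(np.float64) - b.astype(np.float64))))


def analyze_correlation(gop: np.ndarray) -> Tuple[float, float]:
    """gop has shape (views, times, height, width); returns (temporal MAD, inter-view MAD)."""
    num_views, num_times = gop.shape[:2]
    if num_views < 2 or num_times < 2:
        raise ValueError("need at least two views and two time instants")
    temporal = [mad(gop[v, t], gop[v, t - 1])
                for v in range(num_views) for t in range(1, num_times)]
    inter_view = [mad(gop[v, t], gop[v - 1, t])
                  for v in range(1, num_views) for t in range(num_times)]
    return float(np.mean(temporal)), float(np.mean(inter_view))


def select_mode(temporal_mad: float, inter_view_mad: float, ratio: float = 1.5) -> str:
    """Favor the clearly more correlated reference direction; otherwise use both."""
    if inter_view_mad > ratio * temporal_mad:
        return "TEMPORAL"
    if temporal_mad > ratio * inter_view_mad:
        return "INTER_VIEW"
    return "JOINT"


def update_modes(gops: List[np.ndarray]) -> List[str]:
    """Re-run the analysis for every GOP so the selected mode follows correlation changes."""
    return [select_mode(*analyze_correlation(gop)) for gop in gops]
```

Re-running the analysis for each group of pictures (GOP) is what lets the selected mode follow changes in content, illumination or camera motion. For example, a GOP showing a static scene from clearly different viewpoints favors the temporal mode, while a GOP of moving content captured from identical viewpoints favors the inter-view mode.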