计算机学报

	Chinese Journal of Computers
Title	Cross-Media Retrieval Method Based on Content Correlations
Authors	ZHANG Hong1),2); WU Fei2); ZHUANG Yue-Ting2); CHEN Jian-Xun1)
Address	1)(College of Computer Science & Technology, Wuhan University of Science & Technology, Wuhan 430081) 2)(Institute of Artificial Intelligence, Zhejiang University, Hangzhou 310027)
Year	2008
Issue	No. 5 (pp. 820-826)
Abstract & Background	Abstract: Most traditional content-based multimedia retrieval methods, such as image retrieval, audio retrieval, and video retrieval, are designed for multimedia data of a single modality. This paper proposes a novel cross-media retrieval approach that can process multimedia data of different modalities and measure cross-media similarity, such as image-audio similarity. First, a statistical method is used to learn canonical correlations between the low-level feature spaces of different modalities. Then, subspace mapping is designed to build an isomorphic subspace and solve the heterogeneity problem between different low-level feature vectors. This subspace contains media objects of different modalities, and each media object is represented by an isomorphic vector. Since the canonical correlations among multimedia objects are preserved as far as possible during the mapping process, cross-media similarity can be estimated with a defined distance metric. Furthermore, relevance feedback provided by users is used to learn prior knowledge and refine the multimedia topology in the subspace. With user interaction incorporated in this way, cross-media similarity becomes more consistent with human perception. Image and audio data are selected for experiments and comparisons. Given the same visual and auditory features, the new approach outperforms the ICA, PCA, and PLS methods in both precision and recall. Overall, the cross-media retrieval results between images and audio clips are very encouraging. Keywords: cross-media retrieval; heterogeneity; canonical correlation; subspace mapping; relevance feedback. Background: Cross-media retrieval, as discussed in this paper, is a new research topic in the area of content-based multimedia analysis and retrieval. Most researchers focus on how to calculate the similarity between two multimedia objects of the same modality.
Cross-media similarity between multimedia objects of different modalities is difficult to measure because of content heterogeneity. This paper addresses the problem of cross-media similarity measurement with semi-supervised learning methods and supports user interaction through relevance feedback. It implements a preliminary cross-media retrieval system. The main limitation is that cross-media indexing strategies need to be incorporated when the multimedia database is very large. This work is supported by the National Natural Science Foundation of China (Nos. 60533090, 60525108), the Key Technology R&D Program (2006BAH02A13-4), the National High Technology Research and Development Program (863 Program) of China (2006AA010107), and the Program for Changjiang Scholars and Innovative Research Team in University (IRT0652, PCSIRT). Heterogeneous multimedia data stored in digital libraries and data centers are semi-structured or unstructured, and these data are connected at both the semantic level and the content level. The above projects focus on intelligent processing and integrative retrieval techniques to make better use of multimedia resources. The research team works on multimedia semantic learning by content analysis, cross-media retrieval algorithms, multimedia database indexing, etc., and has published some papers in these areas. This paper focuses on the cross-media retrieval algorithm, which solves the problem of cross-media similarity measurement.
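
The first two steps described in the abstract (learning canonical correlations between two feature spaces, then mapping both modalities into a shared subspace where similarity can be measured) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes paired image and audio feature matrices, uses textbook canonical correlation analysis computed with NumPy, and uses cosine similarity as a stand-in for the distance metric, which the abstract does not specify. The function names, the ridge term `reg`, and the synthetic data are all hypothetical, and the relevance-feedback step is not covered.

```python
import numpy as np


def fit_cca(X, Y, k, reg=1e-3):
    """Learn a k-dimensional CCA subspace from paired feature matrices.

    X: (n, dx) image features; Y: (n, dy) audio features; row i of X is
    paired with row i of Y. `reg` is a small ridge term added here for
    numerical stability (an assumption, not taken from the paper).
    """
    mean_x, mean_y = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mean_x, Y - mean_y
    n = X.shape[0]
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)

    # Whiten both views with Cholesky factors: Cxx = Lx Lx^T, Cyy = Ly Ly^T.
    Lx = np.linalg.cholesky(Cxx)
    Ly = np.linalg.cholesky(Cyy)
    # M = Lx^{-1} Cxy Ly^{-T}; its singular values are the canonical correlations.
    M = np.linalg.solve(Lx, Cxy)
    M = np.linalg.solve(Ly, M.T).T
    U, corrs, Vt = np.linalg.svd(M, full_matrices=False)

    # Map the singular vectors back to the original feature coordinates.
    Wx = np.linalg.solve(Lx.T, U[:, :k])
    Wy = np.linalg.solve(Ly.T, Vt[:k].T)
    return mean_x, mean_y, Wx, Wy, corrs[:k]


def project(F, mean, W):
    """Map feature vectors of one modality into the shared subspace."""
    return (F - mean) @ W


def cosine_similarity(q, Z):
    """Cosine similarity between one query vector q and each row of Z."""
    q = q / (np.linalg.norm(q) + 1e-12)
    Z = Z / (np.linalg.norm(Z, axis=1, keepdims=True) + 1e-12)
    return Z @ q


def rank_audio_for_image(x_img, Y_audio, model):
    """Return audio indices ordered from most to least similar to one image."""
    mean_x, mean_y, Wx, Wy, _ = model
    q = project(x_img, mean_x, Wx)
    Z = project(Y_audio, mean_y, Wy)
    return np.argsort(-cosine_similarity(q, Z))


if __name__ == "__main__":
    # Synthetic stand-in data: two views generated from one shared latent signal.
    rng = np.random.default_rng(0)
    latent = rng.normal(size=(200, 2))
    X = latent @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(200, 6))
    Y = latent @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(200, 4))
    model = fit_cca(X, Y, k=2)
    print("canonical correlations:", model[4])
    print("top audio matches for image 0:", rank_audio_for_image(X[0], Y, model)[:5])
```

Because both modalities end up as vectors of the same length `k`, an image query can be compared directly with audio objects, which is the sense in which the subspace is isomorphic. A real system would replace the synthetic matrices with extracted visual and auditory features.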