Chinese Journal of Computers
  Title: Study of Informative Gene Selection for Tissue Classification Based on Tumor Gene Expression Profiles
  Authors: LI Ying-Xin, LI Jian-Geng, RUAN Xiao-Gang
  Address: School of Electronic Information and Control Engineering, Beijing University of Technology, Beijing 100022
  Year: 2006
  Issue: No. 2, pp. 324-330
  Abstract & Background
Informative gene selection is of great importance in the analysis of microarray expression data, which have very high dimensionality but relatively few samples; it also provides a systematic and promising way to reveal tumor gene expression patterns from large-scale gene expression profiles. In this paper, the authors analyze the Multi-Class tumor gene expression profile dataset, which contains 218 tumor samples spanning 14 common tumor types together with 90 normal tissue samples, to find a small subset of genes that distinguishes tumor from normal tissues. First, a Relief-based feature selection algorithm is applied to generate candidate feature subsets, and the subset with the best classification performance is selected as the informative gene subset for classification. Then, a sensitivity analysis based on a support vector machine (SVM) classifier with a radial basis function (RBF) kernel is used to eliminate redundant genes. As a result, 52 informative genes are selected as markers for distinguishing the different tumor tissues from their normal counterparts, and their expression levels are analyzed to explore tumor gene expression patterns. Finally, several informative gene selection methods are analyzed and compared to validate the feasibility and effectiveness of the method employed in this work.
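The Relief-based candidate-subset step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the basic two-class Relief algorithm (every sample serves once as the reference instance, with Manhattan distance on range-scaled genes), ranks genes by Relief weight, and scores each top-k subset. As a plainly swapped-in stand-in for the SVM classification performance used in the paper, each subset is scored by leave-one-out 1-nearest-neighbor accuracy. The function names `relief_weights`, `loo_1nn_accuracy`, and `select_gene_subset`, the `sizes` parameter, and the toy data are all hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def relief_weights(X: Matrix, y: Sequence[int]) -> List[float]:
    """Basic two-class Relief: for each reference sample, reward genes that
    differ on its nearest miss and penalize genes that differ on its nearest hit.
    Every sample serves once as the reference, so the result is deterministic.
    Assumes each class has at least two samples."""
    n, d = len(X), len(X[0])
    # Per-gene value range, used to scale diff() into [0, 1].
    span = [(max(r[j] for r in X) - min(r[j] for r in X)) or 1.0 for j in range(d)]

    def diff(j: int, a: Sequence[float], b: Sequence[float]) -> float:
        return abs(a[j] - b[j]) / span[j]

    def dist(a: Sequence[float], b: Sequence[float]) -> float:
        # Manhattan distance over range-scaled genes.
        return sum(diff(j, a, b) for j in range(d))

    w = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        h = min(hits, key=lambda k: dist(X[i], X[k]))    # nearest hit
        m = min(misses, key=lambda k: dist(X[i], X[k]))  # nearest miss
        for j in range(d):
            w[j] += (diff(j, X[i], X[m]) - diff(j, X[i], X[h])) / n
    return w


def loo_1nn_accuracy(X: Matrix, y: Sequence[int], genes: Sequence[int]) -> float:
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier restricted to
    `genes` (squared Euclidean distance). Stand-in for the paper's SVM."""
    correct = 0
    for i in range(len(X)):
        others = [k for k in range(len(X)) if k != i]
        nearest = min(others, key=lambda k: sum((X[i][j] - X[k][j]) ** 2 for j in genes))
        correct += int(y[nearest] == y[i])
    return correct / len(X)


def select_gene_subset(X: Matrix, y: Sequence[int],
                       sizes: Sequence[int]) -> Tuple[List[int], float]:
    """Rank genes by Relief weight, score each top-k candidate subset, and
    return the best subset with its accuracy. With `sizes` ascending, the
    strict '>' keeps the smaller subset on ties."""
    w = relief_weights(X, y)
    ranking = sorted(range(len(w)), key=lambda j: w[j], reverse=True)
    best: List[int] = []
    best_acc = -1.0
    for k in sizes:
        candidate = ranking[:k]
        acc = loo_1nn_accuracy(X, y, candidate)
        if acc > best_acc:
            best, best_acc = candidate, acc
    return best, best_acc


# Hypothetical toy data: 6 samples x 3 genes; gene 0 separates the classes.
X_TOY = [
    [1.0, 0.5, 0.3], [1.2, 0.1, 0.9], [0.9, 0.8, 0.2],  # class 0 (e.g. "normal")
    [5.0, 0.4, 0.8], [5.3, 0.9, 0.1], [4.8, 0.2, 0.5],  # class 1 (e.g. "tumor")
]
Y_TOY = [0, 0, 0, 1, 1, 1]

genes, acc = select_gene_subset(X_TOY, Y_TOY, sizes=[1, 2, 3])
print(genes, acc)  # → [0] 1.0
```

Passing `sizes` in ascending order means that, when two candidate subsets reach the same accuracy, the smaller one is kept, which suits the goal of finding a compact marker set. On real microarray data the 1-NN scorer would be replaced by the SVM-based evaluation described above.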
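For the redundancy-elimination step, one common way to define the sensitivity of an RBF-kernel SVM to a single gene is through the partial derivative of its decision function. The derivation below is a sketch under that assumption and is not necessarily the exact measure used by the authors; the symbols $\alpha_i$, $y_i$, $b$, $\gamma$, the training samples $\mathbf{x}_i$, and the evaluation set $D$ are introduced here for illustration.

```latex
f(\mathbf{x}) = \sum_{i=1}^{n} \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b,
\qquad
K(\mathbf{x}_i, \mathbf{x}) = \exp\!\left(-\gamma \lVert \mathbf{x} - \mathbf{x}_i \rVert^2\right)

\frac{\partial f(\mathbf{x})}{\partial x_j}
  = -2\gamma \sum_{i=1}^{n} \alpha_i y_i \, K(\mathbf{x}_i, \mathbf{x}) \, (x_j - x_{ij})

S_j = \frac{1}{|D|} \sum_{\mathbf{x} \in D} \left| \frac{\partial f(\mathbf{x})}{\partial x_j} \right|
```

Here $x_{ij}$ is the expression of gene $j$ in training sample $\mathbf{x}_i$. Under this definition, genes with the smallest average sensitivity $S_j$ change the classifier's output least and are the natural candidates for removal as redundant.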

Keywords: tumor; gene expression; informative genes; tissue classification; feature selection; support vector machine

Background: This work is part of the project "Study of Some Problems in Bioinformatics in View of Complex Systems", which aims to achieve a comprehensive, systemic understanding of biological data from the perspective of complex systems. It is supported by the National Natural Science Foundation of China under grant No. 60234020.
Informative gene selection is a key problem in the analysis of DNA microarray expression data, because each sample has a very large number of gene expression values while the number of samples is relatively small. Its main purpose is to identify genetic markers that classify different tissue types; when it is applied to a dataset of gene expression profiles from tumor tissues and their normal counterparts, it can therefore be used to select genes that are deregulated in tumors. To this end, this work addresses informative gene selection for distinguishing tumor from normal tissues using machine learning methods, and attempts to reveal the gene expression signatures of tumors by analyzing the expression of the selected informative genes. This work may benefit research on mining biological information from gene expression profiles.