Chinese Journal of Computers
Title: Combining Position-Specific-Value Method and SVM for Remote Protein Classification
Authors: LI Yu-Gang 1), ZHANG Fa 2), LIU Zhi-Yong 2)
Address: 1) Beijing Key Laboratory for Intelligent Information Technology, School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081
2) Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080
Year: 2008
Issue: No. 1 (pp. 43-50)
Abstract & Background
Abstract: An important research topic in bioinformatics is understanding the meaning and function of each protein encoded in a genome. One of the most successful approaches to this problem infers function from sequence similarity to one or more proteins whose functions are already known, and SVM-based methods are among the most successful of these. Currently, one of the most accurate homology detection methods is SVM-pairwise, which combines pairwise sequence similarity with a Support Vector Machine (SVM). This paper presents an alternative approach to SVM-based protein classification. The method, SVM-PSV, uses a new sequence similarity kernel, the Position Specific Values (PSV) kernel, with SVMs to solve the protein classification problem. In homology detection experiments based on the SCOP database, the resulting algorithm achieves better recognition accuracy than state-of-the-art methods, including SVM-pairwise. In terms of computational efficiency, it is also significantly better than SVM-pairwise.
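To make the SVM-pairwise baseline concrete: it represents each protein by a vector of pairwise similarity scores against the training proteins and trains an SVM on those vectors. The following is a minimal Python sketch of that feature construction under stated assumptions. It uses a plain Smith-Waterman local alignment with hypothetical scores (match +2, mismatch -1, linear gap -2) instead of a substitution matrix, and a plain dot product as the kernel for simplicity; these choices, the function names, and the toy sequences are illustrative only and are not taken from the paper. The PSV kernel itself is not shown, because the abstract does not define it.

```python
from typing import List

# Hypothetical scoring scheme (an assumption, not from the paper): a fixed
# match/mismatch score and a linear gap penalty stand in for a substitution
# matrix such as BLOSUM62 with affine gaps.
MATCH, MISMATCH, GAP = 2, -1, -2


def smith_waterman(a: str, b: str) -> int:
    """Return the best local alignment score of sequences a and b."""
    prev = [0] * (len(b) + 1)  # previous row of the DP table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            # Local alignment: cell scores are floored at 0 so an alignment can restart.
            curr[j] = max(0, diag, prev[j] + GAP, curr[j - 1] + GAP)
            best = max(best, curr[j])
        prev = curr
    return best


def pairwise_features(seq: str, training_set: List[str]) -> List[int]:
    """SVM-pairwise style vector: one alignment score per training protein."""
    return [smith_waterman(seq, t) for t in training_set]


def linear_kernel(x: List[int], y: List[int]) -> int:
    """Dot product of two feature vectors (a simple stand-in SVM kernel)."""
    return sum(xi * yi for xi, yi in zip(x, y))


if __name__ == "__main__":
    # Toy sequences for illustration only, not data from the paper.
    train = ["HEAGAWGHEE", "PAWHEAE", "MKTAYIAK"]
    vec = pairwise_features("PAWHEAE", train)
    print(vec)
    print(linear_kernel(vec, vec))
```

Building such vectors for n proteins requires on the order of n² alignments, each taking time proportional to the product of the two sequence lengths, which suggests why computational efficiency is a natural point of comparison with SVM-pairwise.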

Keywords: bioinformatics; kernel; PSV; SVM; SCOP

Background: The main aim of this paper is to investigate the accuracy and computational efficiency of existing methods and to lay a sound foundation for further implementation. The authors propose a new protein classification method that is more computationally efficient than existing methods and achieves better accuracy in the experiments.
The project, which aims to find a more accurate protein structure prediction method, is supported in part by the National Natural Science Foundation of China.