¡¡Chinese Journal of Computers   Full Text
  TitleResearch on Parameterized Algorithms of the Individual Haplotyping Problem
  AuthorsXIE Min-Zhu1),2) CHEN Jian-Er1) WANG Jian-Xin1)
  Address1)(School of Information Science and Engineering, Central South University, Changsha 410083) 2)(College of Physics and Information Science, Hunan Normal University, Changsha 410081)
  Year2009
  IssueNo.8(1637¡ª1650)
  Abstract &
  Background
Abstract The Individual Haplotyping Problem is the computing problem of inducing an individual¡¯s haplotypes based on several optimal criteria from one¡¯s DNA sequence fragment data. Limited by current sequencing techniques, the maximum length of a fragment sequenced directly by DNA sequencers is not large, and the maximum number k1 of SNP sites covered by a fragment is usually smaller than 10. In order to save money and time, the maximum number k2 of fragments covering a SNP site is also small and about 10. Compared with the number n of SNPs of interest and the number m of fragments, k1 and k2 are both very small. Based on the above fact, this paper parameterizes MSR(Minimum SNP Removal) and MFR(Minimum Fragment Removal) models of the individual haplotyping problem, and introduces the corresponding exact algorithms to solve the gapless MSR and MFR problems in the time complexity O(nk1k2+mlogm+mk1) and O(mk22+mk1k2+mlogm+nk2) respectively. Compared with Bafna et al.¡¯s algorithms of time complexity O(mn2) and O(m2n+m3) respectively, the algorithms are more efficient and applicable. Keywords SNPs(Single-Nucleotide Polymorphisms); haplotype; parameterized algorithm; Minimum SNP Removal (MSR); Minimum Fragment Removal (MFR)
Background This work is supported by the National Natural Science Foundation of China (No.60773111), the National Basic Research Program(973 Program) of China (No.2008CB317107), Program for Changjiang Scholars and Innovative Research Team in University (No.IRT0661), Hunan Provincial Natural Science Foundation of China (No.09JJ3116), China Postdoctoral Science Foundation and Central South University Postdoctoral Science Foundation.
Haplotypes are widely used in complex diseases association study. However, it is time-consuming and expensive to determine haplotypes only using biological techniques. Therefore, effective computational techniques are in urgent demand to help determine haplotypes. Recently more and more individual genomes have been sequenced and the individual haplotyping problem, i.e., using computational techniques to infer the haplotypes of an individual from his or her DNA sequence fragments, has been a hotspot of bioinformatics.
Lancia et al. introduced two optimal computational models of the individual haplotyping problem: Minimum SNP removal (MSR) and Minimum fragment removal (MFR), and they proved that MFR and MSR are NP-hard if fragments have gaps. When all fragments are gapless, Bafna et al. introduced an O(mn2) algorithm and an O(m2n+m3) algorithm to solve MSR and MFR respectively, where m is the number of fragments and n is the number of SNP sites. When inferring a pair of haplotypes over a large region of a chromosome, m and n are very large, and both algorithms are still slow.
Limited by current sequencing techniques, the maximum length of fragments directly sequenced by modern automated DNA sequencing instruments is small. In order to save money and time, in the real DNA sequence fragments data, the number of fragments covering a SNP site is limited. Based on the above observations, the current paper parameterized the models of MSR and MFR and introduced new parameterized algorithms P_MSR and P_MFR, which solve the gapless MSR and MFR problems with significantly improved efficiency respectively.