Learning scoring schemes for sequence alignment from partial examples
- PMID: 18989042
- DOI: 10.1109/TCBB.2008.57
Abstract
When aligning biological sequences, the choice of parameter values for the alignment scoring function is critical. Small changes in gap penalties, for example, can yield radically different alignments. A rigorous way to compute parameter values that are appropriate for aligning biological sequences is through inverse parametric sequence alignment. Given a collection of examples of biologically correct alignments, this is the problem of finding parameter values that make the scores of the example alignments close to those of optimal alignments for their sequences. We extend prior work on inverse parametric alignment to partial examples, which contain regions where the alignment is left unspecified, and to an improved formulation based on minimizing the average error between the score of an example and the score of an optimal alignment. Experiments on benchmark biological alignments show we can find parameters that generalize across protein families and that boost the accuracy of multiple sequence alignment by as much as 25 percent.
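The quantity the abstract describes, the gap between an example alignment's score and the optimal score for the same sequences, can be illustrated with a small sketch. This is not the paper's method. It assumes a simplified three-parameter scoring function (match, mismatch, and a linear per-column gap score), complete rather than partial examples, and a brute-force grid search in place of whatever optimization the authors use. All names here (`Params`, `example_score`, `optimal_score`, `average_error`, `fit_by_grid`) are hypothetical.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Params:
    """Hypothetical, simplified scoring scheme (higher score is better)."""
    match: float     # score for aligning two identical letters
    mismatch: float  # score for aligning two different letters
    gap: float       # score for each column that contains a gap


def example_score(row_a: str, row_b: str, p: Params) -> float:
    """Score a given alignment, written as two equal-length rows with '-' for gaps."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    total = 0.0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            total += p.gap
        elif x == y:
            total += p.match
        else:
            total += p.mismatch
    return total


def optimal_score(a: str, b: str, p: Params) -> float:
    """Best global alignment score of a and b under p (Needleman-Wunsch, linear gaps)."""
    prev = [j * p.gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * p.gap]
        for j in range(1, len(b) + 1):
            sub = p.match if a[i - 1] == b[j - 1] else p.mismatch
            cur.append(max(prev[j - 1] + sub, prev[j] + p.gap, cur[j - 1] + p.gap))
        prev = cur
    return prev[-1]


def average_error(examples, p: Params) -> float:
    """Mean of (optimal score - example score); zero when every example is optimal."""
    errors = []
    for row_a, row_b in examples:
        a, b = row_a.replace("-", ""), row_b.replace("-", "")
        errors.append(optimal_score(a, b, p) - example_score(row_a, row_b, p))
    return sum(errors) / len(errors)


def fit_by_grid(examples, mismatches, gaps) -> Params:
    """Pick the candidate with the lowest average error; match is fixed at 1.0."""
    candidates = [Params(1.0, m, g) for m, g in product(mismatches, gaps)]
    return min(candidates, key=lambda p: average_error(examples, p))


# Toy example alignments (made up for illustration, not benchmark data).
EXAMPLES = [("AC-GT", "ACT-T"), ("GATTACA", "GA-TACA")]
```

With these toy examples, a scheme whose mismatch score is no worse than two gaps prefers the ungapped alignment of `ACGT` and `ACTT`, so the first example is suboptimal and the error is positive. Making mismatches costlier than two gaps makes the example optimal and drives the error to zero. The match score is fixed in `fit_by_grid` because otherwise setting every parameter to zero would trivially give zero error.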