Aligning protein sequences with predicted secondary structure
- PMID: 20377464
- DOI: 10.1089/cmb.2009.0222
Abstract
Accurately aligning distant protein sequences is notoriously difficult. Since the amino acid sequence alone often does not provide enough information to obtain accurate alignments under the standard alignment scoring functions, a recent approach to improving alignment accuracy is to use additional information such as secondary structure. We make several advances in alignment of protein sequences annotated with predicted secondary structure: (1) more accurate models for scoring alignments, (2) efficient algorithms for optimal alignment under these models, (3) improved learning criteria for setting model parameters through inverse alignment, and (4) in-depth experiments evaluating model variants on benchmark alignments. More specifically, the new models use secondary structure predictions and their confidences to modify the scoring of both substitutions and gaps. All models have efficient algorithms for optimal pairwise alignment that run in near-quadratic time. These models have many parameters, which are rigorously learned using inverse alignment under a new criterion that carefully balances score error and recovery error. We then evaluate these models by studying how accurately an optimal alignment under the model recovers benchmark reference alignments that are based on the known three-dimensional structures of the proteins. The experiments show that these new models provide a significant boost in accuracy over the standard model for distant sequences. The improvement for pairwise alignment is as much as 15% for sequences with less than 25% identity, while for multiple alignment the improvement is more than 20% for difficult benchmarks whose accuracy under standard tools is at most 40%.
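To make concrete the idea of letting secondary structure predictions and their confidences modify both substitution and gap scores, here is a minimal illustrative sketch. It is not the authors' model: it assumes a plain quadratic-time global alignment with a linear gap cost, a toy match/mismatch score in place of a real substitution matrix, and hand-picked weights where the paper learns its parameters by inverse alignment. All names (`sub_score`, `gap_score`, `align`, `SS_BONUS`, `SS_GAP_EXTRA`) are hypothetical. Each residue carries a predicted probability for helix (H), strand (E), and coil (C).

```python
from typing import Dict, List, Tuple

# Hypothetical toy parameters, chosen by hand for illustration only.
MATCH, MISMATCH = 2.0, -1.0
GAP = -2.0           # base score of one gap position
SS_BONUS = 1.5       # weight on predicted secondary-structure agreement
SS_GAP_EXTRA = -1.0  # extra gap cost, scaled by non-coil confidence

Pred = Dict[str, float]  # e.g. {"H": 0.8, "E": 0.1, "C": 0.1}


def sub_score(a: str, b: str, pa: Pred, pb: Pred) -> float:
    """Toy substitution score plus a bonus for structural agreement."""
    base = MATCH if a == b else MISMATCH
    # Probability that both residues are in the same predicted state.
    agreement = sum(pa[state] * pb[state] for state in "HEC")
    return base + SS_BONUS * agreement


def gap_score(p: Pred) -> float:
    """Gaps cost more opposite residues confidently predicted as helix or strand."""
    return GAP + SS_GAP_EXTRA * (1.0 - p["C"])


def align(s: str, t: str, ps: List[Pred], pt: List[Pred]) -> Tuple[float, str, str]:
    """Global alignment by dynamic programming in O(len(s) * len(t)) time."""
    n, m = len(s), len(t)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap_score(ps[i - 1])
        move[i][0] = "up"
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap_score(pt[j - 1])
        move[0][j] = "left"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            options = [
                (score[i - 1][j - 1]
                 + sub_score(s[i - 1], t[j - 1], ps[i - 1], pt[j - 1]), "diag"),
                (score[i - 1][j] + gap_score(ps[i - 1]), "up"),
                (score[i][j - 1] + gap_score(pt[j - 1]), "left"),
            ]
            # On ties, max() keeps the first option, so substitutions win.
            score[i][j], move[i][j] = max(options, key=lambda o: o[0])
    # Trace back from the bottom-right corner to recover the alignment.
    row_s, row_t = [], []
    i, j = n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "diag":
            row_s.append(s[i - 1]); row_t.append(t[j - 1]); i -= 1; j -= 1
        elif step == "up":
            row_s.append(s[i - 1]); row_t.append("-"); i -= 1
        else:
            row_s.append("-"); row_t.append(t[j - 1]); j -= 1
    return score[n][m], "".join(reversed(row_s)), "".join(reversed(row_t))


if __name__ == "__main__":
    helix = {"H": 0.9, "E": 0.0, "C": 0.1}
    coil = {"H": 0.1, "E": 0.0, "C": 0.9}
    total, top, bottom = align("ACDE", "ADE",
                               [helix, coil, helix, helix],
                               [helix, helix, helix])
    print(total)
    print(top)
    print(bottom)
```

In this sketch the structure term only shifts scores inside a standard recurrence, so the running time stays quadratic. Affine gap costs or richer gap contexts would need additional dynamic programming tables.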