Aligning protein sequences with predicted secondary structure

John Kececioglu¹, Eagu Kim, Travis Wheeler

PMID: 20377464
DOI: 10.1089/cmb.2009.0222

J Comput Biol. 2010 Mar;17(3):561-80.

Affiliation

¹ Department of Computer Science, University of Arizona, Tucson, Arizona 85721, USA. kece@cs.arizona.edu

Abstract

Accurately aligning distant protein sequences is notoriously difficult. Since the amino acid sequence alone often does not provide enough information to obtain accurate alignments under the standard alignment scoring functions, a recent approach to improving alignment accuracy is to use additional information such as secondary structure. We make several advances in alignment of protein sequences annotated with predicted secondary structure: (1) more accurate models for scoring alignments, (2) efficient algorithms for optimal alignment under these models, and (3) improved learning criteria for setting model parameters through inverse alignment, as well as (4) in-depth experiments evaluating model variants on benchmark alignments. More specifically, the new models use secondary structure predictions and their confidences to modify the scoring of both substitutions and gaps. All models have efficient algorithms for optimal pairwise alignment that run in near-quadratic time. These models have many parameters, which are rigorously learned using inverse alignment under a new criterion that carefully balances score error and recovery error. We then evaluate these models by studying how accurately an optimal alignment under the model recovers benchmark reference alignments that are based on the known three-dimensional structures of the proteins. The experiments show that these new models provide a significant boost in accuracy over the standard model for distant sequences. The improvement for pairwise alignment is as much as 15% for sequences with less than 25% identity, while for multiple alignment the improvement is more than 20% for difficult benchmarks whose accuracy under standard tools is at most 40%.
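The abstract describes scoring models in which predicted secondary structure and its confidence modify both substitution and gap costs, with optimal pairwise alignment computed by dynamic programming. The following is a minimal sketch of that general idea only, not the models from the paper. It assumes each residue carries a hypothetical confidence distribution over helix (H), strand (E), and coil (C), uses linear rather than affine gap costs, and uses made-up weights (`mismatch`, `gap`, `ss_sub_weight`, `ss_gap_weight`) where the paper would learn parameters by inverse alignment.

```python
from typing import Dict, Sequence

# Hypothetical representation: per-residue prediction confidences for
# 'H' (helix), 'E' (strand), and 'C' (coil), summing to 1.
SSProbs = Dict[str, float]


def disagreement(p: SSProbs, q: SSProbs) -> float:
    """Probability that two residues differ in structure type,
    treating the two predictions as independent."""
    return 1.0 - sum(p[t] * q[t] for t in "HEC")


def align_cost(
    a: str,
    b: str,
    ss_a: Sequence[SSProbs],
    ss_b: Sequence[SSProbs],
    mismatch: float = 1.0,       # illustrative value, not from the paper
    gap: float = 2.0,            # illustrative value, not from the paper
    ss_sub_weight: float = 1.0,  # illustrative value, not from the paper
    ss_gap_weight: float = 1.0,  # illustrative value, not from the paper
) -> float:
    """Minimum global alignment cost in O(len(a) * len(b)) time."""
    n, m = len(a), len(b)

    def sub(i: int, j: int) -> float:
        # Sequence cost plus a penalty for predicted structural disagreement.
        base = 0.0 if a[i] == b[j] else mismatch
        return base + ss_sub_weight * disagreement(ss_a[i], ss_b[j])

    def gap_cost(p: SSProbs) -> float:
        # Gaps opposite residues predicted as helix or strand cost more
        # than gaps opposite residues predicted as coil.
        return gap + ss_gap_weight * (1.0 - p["C"])

    # cost[i][j] = best cost of aligning a[:i] with b[:j]
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = cost[i - 1][0] + gap_cost(ss_a[i - 1])
    for j in range(1, m + 1):
        cost[0][j] = cost[0][j - 1] + gap_cost(ss_b[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + sub(i - 1, j - 1),   # substitution
                cost[i - 1][j] + gap_cost(ss_a[i - 1]),   # gap in b
                cost[i][j - 1] + gap_cost(ss_b[j - 1]),   # gap in a
            )
    return cost[n][m]
```

With these illustrative weights, two identical residues that are both confidently predicted as coil align at zero cost, while the same residues with conflicting predictions (one helix, one coil) incur the full structural penalty.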

Cited by

Crystal structure of Zebrafish interferons I and II reveals conservation of type I interferon structure in vertebrates.
Hamming OJ, Lutfalla G, Levraud JP, Hartmann R. J Virol. 2011 Aug;85(16):8181-7. doi: 10.1128/JVI.00521-11. Epub 2011 Jun 8. PMID: 21653665. Free PMC article.
Boosting the accuracy of protein secondary structure prediction through nearest neighbor search and method hybridization.
Krieger S, Kececioglu J. Bioinformatics. 2020 Jul 1;36(Suppl_1):i317-i325. doi: 10.1093/bioinformatics/btaa336. PMID: 32657384. Free PMC article.
Defining the Domain Arrangement of the Mammalian Target of Rapamycin Complex Component Rictor Protein.
Zhou P, Zhang N, Nussinov R, Ma B. J Comput Biol. 2015 Sep;22(9):876-86. doi: 10.1089/cmb.2015.0103. Epub 2015 Jul 15. PMID: 26176550. Free PMC article.
