New approaches to phylogenetic tree search and their application to large numbers of protein alignments
- PMID: 17849327
- DOI: 10.1080/10635150701611134
Abstract
Phylogenetic tree estimation plays a critical role in a wide variety of molecular studies, including molecular systematics, phylogenetics, and comparative genomics. Finding the optimal tree relating a set of sequences using score-based (optimality criterion) methods, such as maximum likelihood and maximum parsimony, may require all possible trees to be considered, which is not feasible even for modest numbers of sequences. In practice, trees are estimated using heuristics that represent a trade-off between topological accuracy and speed. I present a series of novel algorithms suitable for score-based phylogenetic tree reconstruction that demonstrably improve the accuracy of tree estimates while maintaining high computational speeds. The heuristics function by allowing the efficient exploration of large numbers of trees through novel hill-climbing and resampling strategies. These heuristics, and other computational approximations, are implemented for maximum likelihood estimation of trees in the program Leaphy, and the performance of Leaphy is compared to that of other popular phylogenetic programs. Trees are estimated from 4059 different protein alignments using a selection of phylogenetic programs, and the likelihoods of the tree estimates are compared. Trees estimated using Leaphy are found to have likelihoods equal to or better than those of trees estimated using other phylogenetic programs in 4004 (98.6%) families, and Leaphy provides a unique best tree that no other program found in 1102 (27.1%) families. The improvement is particularly marked for larger families (80 to 100 sequences), where Leaphy finds a unique best tree in 81.7% of families.
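To make the infeasibility of exhaustive search concrete: the number of distinct unrooted, fully resolved (binary) tree topologies on n labeled sequences is (2n-5)!!, a standard combinatorial result that the abstract does not state itself. The following is a minimal Python sketch of that count; the function name is illustrative and is not part of Leaphy or any other program named above.

```python
def num_unrooted_binary_trees(n_leaves: int) -> int:
    """Count unrooted binary tree topologies on n labeled leaves.

    Uses the standard double-factorial formula (2n - 5)!! = 3 * 5 * ... * (2n - 5),
    valid for n >= 3. This is an illustration only, not code from Leaphy.
    """
    if n_leaves < 3:
        raise ValueError("need at least 3 leaves")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5 (empty product when n == 3).
    for k in range(3, 2 * n_leaves - 4, 2):
        count *= k
    return count


if __name__ == "__main__":
    # Family sizes chosen for illustration; 100 matches the largest
    # families mentioned in the abstract.
    for n in (5, 10, 20, 50, 100):
        print(n, num_unrooted_binary_trees(n))
```

With only 10 sequences there are already 2,027,025 candidate topologies, and the count grows super-exponentially from there, which is why score-based methods rely on heuristic search rather than enumeration.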