Getting a Tree Fast: Neighbor Joining, FastME, and Distance‐Based Methods

互联网2013-12-31

1103

Abstract
Table of Contents
Figures
Literature Cited

Abstract

Neighbor Joining (NJ), FastME, and other distance?based programs including BIONJ, WEIGHBOR, and (to some extent) FITCH, are fast methods to build phylogenetic trees. This makes them particularly effective for large?scale studies or for bootstrap analysis, which require runs on multiple data sets. Like maximum likelihood methods, distance methods are based on a sequence evolution model that is used to estimate the matrix of pairwise evolutionary distances. Computer simulations indicate that the topological accuracy of FastME is best, followed by FITCH, WEIGHBOR, and BIONJ, while NJ is worse. Moreover, FastME is even faster than NJ with large data sets. Best?distance methods are equivalent to parsimony in most cases, but become more accurate when the molecular clock is strongly violated or in the presence of long (e.g., outgroup) branches. This unit describes how to use distance?based methods, focusing on NJ (the most popular) and FastME (the most efficient today). It also describes how to estimate evolutionary distances from DNA and proteins, how to perform bootstrap analysis, and how to use CLUSTAL to compute both a sequence alignment and a phylogenetic tree.

Keywords: Phylogenetic trees; evolutionary distances; distance?based methods; NJ; FastME

GO TO THE FULL PROTOCOL:

PDF or HTML at Wiley Online Library

Basic Protocol 1: Using the NEIGHBOR Program from the PHYLIP Package to Construct a Phylogenetic Tree
Support Protocol 1: Distance Matrix Estimation from DNA (or RNA) Sequences Using DNADIST
Support Protocol 2: Distance Matrix Estimation from Proteins Using PROTDIST
Support Protocol 3: Bootstrapping Using SEQBOOT and CONSENSE
Alternate Protocol 1: Using BIONJ, WEIGHBOR, or FITCH to Construct a Tree
Alternate Protocol 2: Using FastME to Construct a Tree
Alternate Protocol 3: Computing NJ Trees Using Clustal
Guidelines for Understanding Results
Commentary
Literature Cited
Figures
Tables

GO TO THE FULL PROTOCOL:

PDF or HTML at Wiley Online Library

Materials

GO TO THE FULL PROTOCOL:

PDF or HTML at Wiley Online Library

Figures

Figure 6.3.1 Flowchart illustrating the relationship between the multiple protocols presented in this unit.

View Image

Figure 6.3.2 Distance matrix in square format.

View Image

Figure 6.3.3 The NEIGHBOR screen showing options for renaming files as well as options for settings and their defaults.

View Image

Figure 6.3.4 Two trees in Newick format, which were obtained from the distance matrix in Figure by BIONJ and NEIGHBOR, respectively. Both trees have identical topologies, but slightly different branch lengths.

View Image

Figure 6.3.5 TreeView representation of the BIONJ tree of Figure .

View Image
Figure 6.3.6 NEIGHBOR tree, as represented in the outfile.

View Image
Figure 6.3.7 Alignment in interleaved PHYLIP format.

View Image
Figure 6.3.8 Alignment in sequential PHYLIP format.

View Image

Figure 6.3.9 The DNADIST screen with options for renaming files and setting parameters. The default parameters are shown.

View Image

Figure 6.3.10 The PROTDIST screen showing options for renaming files and setting parameters. The default parameter settings are shown.

View Image

Figure 6.3.11 The SEQBOOT screen showing options for renaming files and setting parameters. The default parameters are shown.

View Image

Figure 6.3.12 The CONSENSE screen showing options for renaming files and setting parameters. The default parameters are shown.

View Image

Figure 6.3.13 TreeView representation of the bootstrap tree that is obtained with NEIGHBOR with 1000 replicates. Bootstrap supports are associated with internal branches (or clades). For example, Spongipell, Athelia_bo is supported by 790 pseudo‐trees out of 1000.

View Image

Figure 6.3.14 FastME screen (in command‐prompt window) showing options for renaming files and setting parameters.

View Image

Figure 6.3.15 The Trees menu in the program ClustalX showing the menu commands and dialog boxes used to control how the program constructs neighbor‐joining trees. Note that Exclude Positions with Gaps and Correct for Multiple Substitutions are not selected. If they were selected, a check mark would appear next to each option.

View Image

Figure 6.3.16 Nearest Neighbor Interchange (NNI) swapping subtrees B and C .

View Image

Figure 6.3.17 Topological accuracy of NEIGHBOR (N), FastME (M), DNAPARS (D) and PHYML (P) with 5000 randomly generated 40‐taxon Trees. Maximum pairwise divergence is given in number of substitutions per site. Topological accuracy is measured by the number of clades in the inferred tree that are not in the correct tree, divided by the total number of clades; 0.0 corresponds to identical trees, while 1.0 means that both trees have no clade in common.

View Image

Videos

Literature Cited

	Atteson, K. 1997. The performance of the NJ method of phylogeny reconstruction. In Mathematical Hierarchies and Biology (B. Mirkin, F.R. McMorris, F.S. Roberts, and A. Rzhetsky, eds.) pp.133‐148. American Mathematical Society, Providence, R.I.
	Berry, V. and Gascuel, O. 1996. Interpretation of bootstrap trees: Threshold of clade selection and induced gain. Mol. Biol. Evol. 13:999‐1011.
	Bruno, W.J., Socci, N.D., and Halpern, A.L. 2000. Weighted neighbor joining: A likelihood‐based approach to distance‐based phylogeny reconstruction. Mol. Biol. Evol. 17:189‐197.
	Bulmer, M. 1991. Use of the method of generalized least squares in reconstructing phylogenies from sequence data. Mol. Biol. Evol. 8:868‐883.
	Dayhoff, M.O., Schwartz, R.M., and Orcutt, B.C. 1979. A model for evolutionary change in proteins. In Atlas of Protein Sequence and Structure (M.O. Dayhoff, ed.), vol. 5, pp. 345‐352.
	Desper, R. and Gascuel, O. 2002. Fast and accurate phylogeny reconstruction algorithms based on the minimum‐evolution principle. J. Comp. Biol. 9:687‐705.
	Desper, R. and Gascuel, O. 2004. Theoretical foundation of the balanced minimum evolution method of phylogenetic inference and its relationship to weighted least‐squares tree fitting. Mol. Biol. Evol. 21:587‐598.
	Desper, R. and Gascuel, O. 2005. The minimum evolution distance‐based approach to phylogenetic inference. In Mathematics of Evolution and Phylogeny (O. Gascuel, ed.) pp. 1‐32. Oxford University Press, Oxford.
	Felsenstein, J. 1989. PHYLIP–Phylogeny inference package (version 3.2). Cladistics 5:164‐166.
	Felsenstein, J. 1997. An alternating least‐squares approach to inferring phylogenies from pairwise distances. Syst. Biol. 46:101‐111.
	Felsenstein, J. and Churchill, G.A. 1996. A hidden Markov model approach to variation among sites in rate of evolution Mol. Biol. Evol. 13:93‐104
	Fitch, W.M. and Margoliash, E. 1967. Construction of phylogenetic trees. Science 155:279‐284.
	Galtier, N. 2001. Maximum‐likelihood phylogenetic analysis under a covarion‐like model. Mol. Biol. Evol. 18:866‐873.
	Gascuel, O. 1997a. BIONJ: An improved version of the NJ algorithm based on a simple model of sequence data. Mol. Biol. Evol. 14:685‐695.
	Gascuel, O. 1997b. Concerning the NJ algorithm and its unweighted version, UNJ. Mathematical Hierarchies and Biology (B. Mirkin, F.R. McMorris, F.S. Roberts, and A. Rzhetsky, eds.) pp. 149‐170. American Mathematical Society, Providence, R.I.
	Graur, D. and Li, W.‐H. 2000. Fundamentals of Molecular Evolution. Sinauer Associates, Sunderland, Mass.
	Guindon, S. and Gascuel, O. 2002. Efficient biased estimation of evolutionary distances when substitution rates vary across sites. Mol. Biol. Evol. 19:534‐543.
	Guindon, S. and Gascuel, O. 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52:696‐704.
	Howe, K., Bateman, A., and Durbin, R. 2002. QuickTree: Building huge Neighbour‐Joining trees of protein sequences. Bioinformatics 18:1546‐1547.
	Jin, L. and Nei, M. 1990. Limitations of the evolutionary parsimony method of phylogenetic analysis. Mol. Biol. Evol. 7:82‐102.
	Jones, D.T., Taylor, W.R., and Thornton, J.M. 1992. The rapid generation of mutation data matrices from protein sequences. Comput. Appl. Biosci. 8:275‐82.
	Jukes, T.H. and Cantor, C.R. 1969. Evolution of protein molecules. Mammalian Protein Metabolism (H. N. Munro, ed.) pp.21‐132. Academic Press, New York.
	Kimura, M. 1980. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16:111‐120.
	Kimura, M. 1983. The Neutral Theory of Molecular Evolution. Cambridge University Press, Cambridge, U.K.
	Kishino, H. and Hasegawa, M. 1989. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea J. Mol. Evol. 29:170‐179.
	Nei, M. and Jin, L. 1989. Variances of the average numbers of nucleotide substitutions within and between populations. Mol. Biol. Evol. 6:290‐300.
	Page, R.D.M. and Holmes, E.C. 1998. Molecular Evolution: A Phylogenetic Approach. Blackwell Scientific, Oxford, U.K.
	Pauplin, Y. 2000. Direct calculation of a tree length using a distance matrix. J. Mol. Evol. 51:41‐47.
	Perrière, G. and Gouy, M. 1996. WWW‐Query: An on‐line retrieval system for biological sequence banks. Biochimie 78:364‐369.
	Rzhetsky, A. and Nei, M. 1993. Theoretical foundation of the minimum‐evolution method of phylogenetic inference. Mol. Biol. Evol. 10:1073‐1095.
	Saitou, N. and Nei, M. 1987. The neighbor‐joining method: A new method for reconstruction of phylogenetic trees. Mol. Biol. Evol. 4:406‐425.
	Sattath, S. and Tversky, A. 1977. Additive similarity trees. Psychometrika 42:319‐345.
	Sokal, R.R. and Michener, C.D. 1958. A statistical method for evaluating systematic relationships. Univ. Kans. Sci. Bull. 38:1409‐1438.
	Stamatakis, A., Ludwig, T., and Meier, H. 2004. RAxML‐III: A fast program for maximum likelihood‐based inference of large phylogenetic trees. Bioinformatics 21:456‐463.
	Steel, M.A. 1994. Recovering a tree from the Markov leaf colourations it generates under a Markov model. Appl. Math. Lett. 7:19‐23.
	Studier, J.A. and Keppler, K.J. 1988. A note on the neighbor‐joining algorithm of Saitou and Nei. Mol. Biol. Evol. 5:729‐31.
	Swofford, D.L., Olsen, G.L., Waddell, P.J., and Hillis, D.M. 1996. Phylogenetic inference. In Molecular Systematics (D.M. Hillis, C. Moritz, and B.K. Mable, eds.) pp. 407‐514. Sinauer Associates, Sunderland, Mass.
	Thompson, J.D., Gibson, T.J., Plewniak, F., Jeanmougin, F., and Higgins, D.G. 1997. The ClustalX windows interface: Flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25:4876‐4882.
	Veerassamy, S., Smith, A., and Tillier, E.R. 2003. A transition probability model for amino acid substitutions from blocks. J. Comput Biol. 10:997‐1010.
	Vinh, L.S. and von Haeseler, A. 2005. Shortest triplet clustering: Reconstructing large phylogenies using representative sets. BMC Bionfirmatics 6:92.
	Xia, X. and Xie, Z. 2001. DAMBE: Data analysis in molecular biology and evolution. J. Hered. 92:371‐373.
	Yang, Z. 1996. Among‐site rate variation and its impact on phylogenetic analyses. TREE 11:367‐372.
	Zaretskii, K. 1965. Reconstructing a tree from the distances between its leaves. Uspehi Mathematicheskikh Nauk 20:90‐92 (in Russian).
Internet Resources
	http://atgc.lirmm.fr/fastme
	This web page from the authors provides FastME C source code and binaries for Windows, MAC OS and LINUX, as well as several papers to understand in depth the minimum evolution principle, its algorithms and its properties.
	http://atgc.lirmm.fr/phyml
	This web page provides PHYML binaries for Windows, MAC OS and LINUX, and a web server to run PHYML online.
	http://www.cladistics.com/webtnt.html
	Goloboff, P., Farris, S., and Nixon, K. 2000. TNT: Tree analysis using new technology. Beta version, published by the authors, Tucumán, Argentina.
	http://evolution.genetics.washington.edu/phylip/software.html
	Joe Felsenstein's Web page, containing an extensive list of phylogeny software programs, including numerous distance‐based methods.