Molecular Phylogenetics and Evolution 58 (2011) 297–303

Reconstructing the phylogeny of "Buarremon" brush-finches and near relatives (Aves, Emberizidae) from individual gene trees

Alexander Flórez-Rodríguez a,*, Matthew D. Carling b, Carlos Daniel Cadena a

a Laboratorio de Biología Evolutiva de Vertebrados, Departamento de Ciencias Biológicas, Universidad de los Andes, Apartado 4976, Bogotá, Colombia
b Berry Biodiversity Conservation Center, Department of Zoology and Physiology, University of Wyoming, Laramie, WY 82071, USA

Article info

Article history: Received 26 July 2010; Revised 2 November 2010; Accepted 15 November 2010; Available online 25 November 2010

Keywords: Anomalous gene trees; BEST; COAL; Minimizing deep coalescences; Species tree; Topological incongruence

Abstract

Gene trees are often assumed to be equivalent to species trees, but processes such as incomplete lineage sorting can generate incongruence among gene topologies, and analyzing multilocus data in concatenated matrices can be prone to systematic errors. Accordingly, a variety of new methods have been developed to estimate species trees using multilocus data sets. Here, we apply some of these methods to reconstruct the phylogeny of Buarremon and near relatives, a group in which phylogenetic analyses of mitochondrial DNA sequences produced results that were inconsistent with relationships implied by a taxonomy based on variation in external phenotype. Gene genealogies obtained for seven loci (one mitochondrial, six nuclear) were varied, with some supporting and some rejecting the monophyly of Buarremon. Overall, our species-tree analyses tended to support a monophyletic Buarremon, but due to lack of congruence between methodologies, resolution of the phylogeny of this group remains uncertain.
More generally, our study indicates that the number of individuals sampled can have an important effect on phylogenetic reconstruction, that the use of seven markers does not guarantee obtaining a strongly-supported species tree, and that methods for species-tree reconstruction can produce different results using the same data; these are important considerations for researchers using these new phylogenetic approaches in other systems.

© 2010 Elsevier Inc. All rights reserved.

* Corresponding author. Fax: +57(1) 33934949x2817. E-mail addresses: al-fl[email protected] (A. Flórez-Rodríguez), [email protected] cornell.edu (M.D. Carling), [email protected] (C.D. Cadena).

doi:10.1016/j.ympev.2010.11.012

1. Introduction

The accurate reconstruction of phylogenetic relationships among species using molecular data can be complicated for a variety of reasons (Carstens and Knowles, 2007; Brumfield et al., 2008; Degnan and Rosenberg, 2009). One obstacle is the stochastic sorting of ancestral polymorphisms following species divergence at deep or shallow levels of the phylogeny, which can result in discordant topologies between gene and species trees (Pamilo and Nei, 1988; Takahata, 1989; Maddison and Knowles, 2006; McCormack et al., 2009). Furthermore, if reproductive isolation between taxa is not complete, then gene flow can cause incongruent topologies across genes (Meng and Kubatko, 2009). In addition, processes of gene duplication (Fitch, 1970) and horizontal transfer (Cummings, 1994) can also complicate the traditional assumption that gene trees always reflect species trees (Nichols, 2001). Some of these problems are particularly acute when phylogenies are reconstructed from single-locus datasets (e.g. mitochondrial DNA; Jennings and Edwards, 2005).

The ability to obtain sequence data from multiple loci across taxa is one of the major recent breakthroughs in molecular systematics, and has brought with it new opportunities and challenges for phylogeny reconstruction (Brito and Edwards, 2009; Knowles, 2010; Knowles and Kubatko, 2010). To date, the most common approach used in multilocus phylogenetics is concatenation of data using supermatrices, an approach assuming that all gene trees have the same topology (Rokas et al., 2003; Philippe et al., 2009). However, this approach might be positively misleading because of the existence of anomalous gene trees (i.e. gene trees that are more likely than the tree matching the species tree; Degnan and Rosenberg, 2006; Liu and Edwards, 2009). In addition, obtaining well-supported trees consistent with the true phylogeny using concatenated data might require a large number of loci in comparison to other, novel methods for species-tree reconstruction (Edwards et al., 2007; see below). Another often employed approach for phylogeny reconstruction from multilocus data is to construct consensus trees based on genealogies obtained independently for each locus (De Queiroz, 1993), but this requires a greater number of genes than concatenation to obtain a similarly supported tree (Gadagkar et al., 2005) and is also prone to be positively misleading as the number of genes increases when anomalous gene trees exist (Degnan et al., 2009). Owing to these limitations, developing alternatives to concatenation and consensus methods for the reconstruction of robust species trees has become an important priority.

Novel analytical tools have allowed a movement towards multilocus methodologies for phylogenetic reconstruction more robust than concatenation or consensus methods (Jennings and Edwards, 2005; Carling and Brumfield, 2008; Degnan and Rosenberg, 2009; Knowles, 2009, 2010).
For example, one can attempt to reconcile gene trees contained within species trees by minimizing the number of deep coalescences, i.e., the coalescence of two gene copies that predates a particular speciation event (Maddison, 1997; Maddison and Knowles, 2006; Leaché, 2009). Alternatively, the BEST (Bayesian Estimation of Species Trees) method estimates the joint posterior distribution of gene trees for each locus and uses the resulting joint posterior distribution of gene trees to approximate the Bayesian posterior distribution of the species tree based on coalescent theory (Liu and Pearl, 2007; Edwards et al., 2007). Yet another alternative is to calculate the probability of a genealogy given a species tree under the coalescent (Degnan and Salter, 2005). Despite these developments, as the field of multilocus phylogenetics is still maturing, studies using species-tree approaches with empirical data are scarce (Brumfield et al., 2008; Liu et al., 2008; Linnen and Farrell, 2008; Hird et al., 2010; Waters et al., 2010). In this study, we use three methods of species-tree reconstruction based on multilocus data to revisit an empirical phylogenetic question of interest in systematic ornithology. In so doing, we come across some practical issues related to the effect of taxon sampling and the variation in results among methods that should be of interest to developers and users of such methods.

The genus Buarremon (Aves, Emberizidae), as traditionally defined, consists of three morphologically similar species: Buarremon torquatus, Buarremon brunneinucha, and Buarremon virenticeps. However, a recent study rejected the monophyly of the genus. Based on analyses involving sequences of four mitochondrial genes (ND2, cyt b, ATPase 6, ATPase 8) and two nuclear introns (ACOI and MUSK), the clade formed by representatives of multiple populations of B. torquatus was recovered as sister to the monophyletic genus Arremon, whereas B. brunneinucha and B.
virenticeps formed a clade that was sister to a clade formed by species in the genus Lysurus, albeit with low support (Cadena et al., 2007; Fig. 1a). This result was unexpected considering the overall phenotypic similarity of Buarremon taxa, but it led to the merging of the three genera in an expanded genus Arremon (Remsen et al., 2010). However, the results of that study were supported mainly by mitochondrial DNA sequences, and the deep internodes of the mitochondrial topologies were notably shorter than the terminal branches. The existence of short internal branches can lead to retention of ancestral polymorphisms, representing one of the most difficult scenarios for inferring phylogenies from single-locus data sets owing to the high stochasticity in gene sorting. Under such scenarios, mitochondrial DNA can reveal trees with good nodal support that are incongruent with the species tree (Carling and Brumfield, 2008; Leaché, 2009; McCormack et al., 2009). In addition, the mitochondrial DNA topology was at least partly inconsistent with the topologies of two nuclear introns (Cadena et al., 2007). Thus, in this study we revisit the relationships of Buarremon and related genera by reconstructing the species tree based on phylogenetic analyses of sequences from multiple loci. We discuss the implications of our results in relation to challenges in sampling design and in the use of different methods, which might be common to other studies seeking to reconstruct species trees using multilocus sequence data.

2. Materials and methods

2.1. Sampling, PCR amplification and sequencing

We obtained frozen tissue samples from the collections of Instituto Alexander von Humboldt (IAvH) and the Banco de Tejidos of the Museo de Historia Natural, Universidad de los Andes (ANDES-BT) for a single individual of each of four focal taxa (B. torquatus (IAvH-1145), B.
brunneinucha (ANDES-BT-0120), Arremon schlegeli (ANDES-BT-0016) and Lysurus castaneiceps (IAvH-CT-825)) and one outgroup (Atlapetes latinuchus (ANDES-BT-0130)), a valid strategy considering that these species are strongly supported monophyletic groups (Cadena et al., 2007). Note that we did not include B. virenticeps in our study because this species was nested with strong support within a clade formed by populations of B. brunneinucha in phylogenetic analyses of mitochondrial and nuclear DNA sequences (Cadena et al., 2007). Therefore, the inclusion of this taxon (and of populations of B. torquatus that likely merit species status; Cadena and Cuervo, 2010) was not necessary to address the monophyly of Buarremon. The important question in this regard is whether the brunneinucha–virenticeps clade and the torquatus clade form a monophyletic group to the exclusion of the Lysurus and Arremon clades.

Total DNA was extracted from all samples using a DNeasy tissue kit (QIAGEN, Valencia, CA), following the manufacturer's protocol. Subsequently, we amplified six nuclear introns (four autosomal, two Z-linked) and one mitochondrial protein-coding gene (Table 1) using primers published by Slade et al. (1993), Sorenson et al. (1999), and Kimball et al. (2009). The concentrations and conditions used for PCR were those described by Cadena et al. (2007). Amplicons were cleaned using ExoSAP-IT (USB Corporation, Cleveland, Ohio) and then sequenced in both directions. Resulting chromatograms were assembled in Geneious Basic 4.02 (Drummond et al., 2007). In cases where double peaks of equal height were detected in the sequence, the site was considered ambiguous (i.e. we did not attempt to phase haplotypes because sites with double peaks were scarce and because we had data for a single individual per species, which impeded haplotype estimation).

2.2. Alignment and conventional phylogenetic analyses

Sequences were aligned using the MUSCLE algorithm (Edgar, 2004) implemented in Geneious (Drummond et al., 2007) and edited manually. Intralocus recombination was tested using the program RecombiTEST (Piganeau et al., 2004). Genealogies were reconstructed individually for each locus using maximum likelihood (ML) and Bayesian inference (BI) methods, and we also conducted analyses using a concatenated matrix that included sequences of all seven genes and a partitioned matrix specifying a substitution model for each of the seven loci. For each analysis, we implemented the model of nucleotide substitution selected as the best fit to the data (Table 1) based on the Akaike Information Criterion using ModelTest 3.7 for ML (Posada and Crandall, 1998) and MrModelTest 2.3 for BI (Nylander, 2004). Branch-and-bound searches were conducted using PAUP 4.0b10 (Swofford, 2002) for ML analyses; nodal support was assessed with 1000 ML heuristic bootstrap replicates with tree bisection-reconnection (TBR) branch swapping. Bayesian analyses were conducted in MrBayes 3.1.2 (Ronquist and Huelsenbeck, 2003) and consisted of two runs of four MCMC chains for 15,000,000 generations sampled every 1000 generations; the first 25% of the trees sampled were discarded as burn-in. We used the program Tracer v.1.4.1 (Rambaut and Drummond, 2007) to evaluate sampling of the tree and parameter space in Bayesian analyses. Because plots of number of generations vs. likelihood showed stabilization, effective sample sizes for all parameters were always greater than 200, and the average standard deviation of split frequencies across runs was less than 0.002 in all the analyses, chains likely sampled the posterior distributions adequately.
To assess convergence of MCMC runs, we plotted posterior probabilities of clades as a function of generation number and compared results of different runs by plotting the posterior probabilities of all splits for paired runs using AWTY (Wilgenbusch et al., 2004).

Fig. 1. Gene tree topologies obtained for each locus and for the analyses of the concatenated and partitioned matrices. Panels: (a) Cadena et al. (2007), (b) ND2, (c) β-Fib 5, (d) MyO, (e) β-Fib 7, (f) MUSK, (g) TGF, (h) ACOI, (i) Concatenated, (j) Partitioned. Panel (a) is a schematic topology of the results obtained by Cadena et al. (2007); values within the triangles indicate the sampling for ND2/the concatenated matrix of four mitochondrial genes. In panels (b)–(j), values at nodes correspond to Bayesian posterior probability/ML bootstrap.

Table 1
Summary of sequence variation and substitution models determined according to the AIC for different loci employed in phylogenetic analyses.

Locus   Length (bp)   Parsimony-informative characters   Model (ModelTest)   Model (MrModelTest)   Base frequency (A, C, G, T)      Accession number
βFib5   601           0                                  TVM + I             GTR + I               0.2966, 0.1779, 0.211, 0.3144    HQ537396-400
MyO     740           1                                  HKY                 HKY                   0.2686, 0.2374, 0.2191, 0.274    HQ537411-415
βFib7   998           4                                  TVM                 GTR                   0.3128, 0.1829, 0.1657, 0.338    HQ537401-405
MUSK    563           7                                  TVM                 GTR                   0.2792, 0.1712, 0.2221, 0.327    HQ537406-410
ACOI    992           2                                  TrN                 HKY                   0.2587, 0.1735, 0.208, 0.3598    HQ537391-395
TGF     546           12                                 GTR                 GTR                   0.2419, 0.2216, 0.2269, 0.309    HQ537421-425
ND2     1026          99                                 TrN + I             HKY + G               0.3004, 0.3727, 0.0984, 0.228    HQ537416-420
Conc    5456          125                                TVM + I             GTR + I               0.2801, 0.2222, 0.187, 0.3107

Conc = Concatenated sequences of all seven loci.

2.3. Minimizing deep coalescence

We inferred a species tree by reconciling each of the individual gene trees obtained from ML and BI analyses described above with the 15 possible rooted evolutionary histories for the four ingroup taxa by minimizing deep coalescence events using the program Mesquite (Maddison and Maddison, 2004; Maddison and Knowles, 2006). The total cost of each species tree was estimated by summing the number of deep coalescent events over the seven genealogies; the selected species tree was the topology with the lowest score of coalescent steps summed across all genealogies.

2.4. Probabilities of gene trees given the species tree

We calculated the probability of each gene genealogy given all possible species trees (15 species trees for 4 taxa) using COAL (Degnan and Salter, 2005).
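
The count of 15 candidate species trees and the reconciliation score described in Section 2.3 can be illustrated with a short Python sketch. It enumerates the rooted binary trees for the four ingroup taxa and counts deep coalescences for one gene tree against each of them, assuming a single gene copy per species and a fully resolved gene tree rooted by the outgroup. The function names, the taxon labels and the example gene tree (Arremon sister to a monophyletic Buarremon, the relationship reported for TGF) are illustrative assumptions; this is not the algorithm implemented in Mesquite.

```python
from itertools import combinations

# Shorthand labels for the four ingroup taxa (hypothetical identifiers).
INGROUP = ("A_sch", "B_bru", "B_tor", "L_cas")


def rooted_trees(taxa):
    """Yield every rooted binary tree on `taxa` as nested 2-tuples of labels."""
    taxa = tuple(taxa)
    if len(taxa) == 1:
        yield taxa[0]
        return
    first, rest = taxa[0], taxa[1:]
    # Fix `first` on the left of the root so each topology is produced once.
    for k in range(len(rest)):
        for companions in combinations(rest, k):
            left = (first,) + companions
            right = tuple(t for t in rest if t not in companions)
            for left_tree in rooted_trees(left):
                for right_tree in rooted_trees(right):
                    yield (left_tree, right_tree)


def clades(tree):
    """Return (leaf set of `tree`, set of all clades, leaves and root included)."""
    if isinstance(tree, str):
        leaf = frozenset([tree])
        return leaf, {leaf}
    left_leaves, left_clades = clades(tree[0])
    right_leaves, right_clades = clades(tree[1])
    here = left_leaves | right_leaves
    return here, left_clades | right_clades | {here}


def deep_coalescences(gene_tree, species_tree):
    """Count extra gene lineages summed over the branches of the species tree.

    Assumes one gene copy per species and a fully resolved gene tree.
    """
    _, gene_clades = clades(gene_tree)
    all_leaves, species_clades = clades(species_tree)
    cost = 0
    for clade in species_clades:
        if clade == all_leaves:
            continue  # the root has no branch above it
        inside = [g for g in gene_clades if g <= clade]
        # Gene lineages leaving this branch are the maximal gene clades inside it.
        maximal = [g for g in inside if not any(g < h for h in inside)]
        cost += len(maximal) - 1
    return cost


# Example gene tree: Arremon sister to a monophyletic Buarremon.
gene_tree = (("A_sch", ("B_bru", "B_tor")), "L_cas")
species_trees = list(rooted_trees(INGROUP))
scores = {tree: deep_coalescences(gene_tree, tree) for tree in species_trees}
best = min(scores, key=scores.get)
print(len(species_trees), best, scores[best])
```

Under these assumptions the example gene tree scores 0 against a species tree with its own topology and 1 against the topology in which L. castaneiceps and A. schlegeli form a clade sister to Buarremon; summing such scores over loci gives the total cost used to choose among species trees.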
For this analysis, we considered five different internal:terminal branch length ratios (RIT) for the species trees: (1) RIT 1:1 with length (coalescence units) = 1.0, (2) RIT 1:1 with length = 0.5, (3) RIT 1:1 with length = 0.2, (4) RIT 1:100 where branch 1 has length 0.01 and other branches have length 1.0, and (5) RIT 1:100 where branch n − 2 (n is the number of taxa) has length 0.01 and other branches have length 1.0. We chose these options available in COAL to represent different evolutionary scenarios with internal and terminal branches of equal length (1–3), short internal branches (4) and short terminal branches (5). Because COAL requires fully resolved bifurcating topologies and the forced resolution of polytomies can bias results (Carling and Brumfield, 2008), genes resulting in unresolved topologies (MyO and β-Fib 5, see below) were not included in this analysis.

2.5. Bayesian estimation of species tree

The joint posterior distribution of gene trees and the species tree was estimated in BEST 1.6 (Liu et al., 2008). For our analysis, we employed commonly-used flat priors recommended by the authors (Liu et al., 2008), which were applied in an earlier phylogenetic study of birds (Brumfield et al., 2008): an inverse gamma distribution with α = 3 and β = 0.003, and a uniform distribution with bounded values of 0.5 and 1.5, for the prior distribution of population size and mutation rate, respectively. Three runs of two MCMC chains were performed for 200,000,000 generations with a sampling frequency of 20,000; the first 25% of trees were discarded as burn-in. Chain stationarity was assessed using Tracer v.1.4 based on plots of number of generations vs. likelihood and estimates of effective sample sizes for parameters, and convergence across runs was examined using AWTY as described above. We did not observe changes in posterior probabilities after 20 million generations and results of independent runs were remarkably similar to each other, indicating convergence.

3. Results

3.1. Conventional phylogenetic analyses

No recombination events were detected for any of the seven loci. All six nuclear loci exhibited few informative characters (Table 1), which led to unresolved gene trees in some cases (e.g. β-Fib 5 and MyO; Fig. 1c and d). However, several gene trees recovered clades with good branch support (Bayesian posterior probabilities >0.95 and maximum likelihood bootstrap values >85%). Some of these suggested paraphyly of Buarremon (β-Fib 7 and MUSK; Fig. 1e and f) and others supported its monophyly (TGF, ACOI; Fig. 1g and h). The mitochondrial locus (ND2) also supported monophyly of Buarremon (Fig. 1b), a result contrary to that obtained by Cadena et al. (2007; Fig. 1a) for the same locus with a more extensive sample of individuals, but this result was not strongly supported in our analyses (see below). Some of the loci that supported the monophyly of Buarremon did so in different ways (e.g. Lysurus was found as sister to Buarremon in the ACOI tree, whereas TGF placed Arremon and Buarremon as sister groups) and with varying support (0.87–1.00 posterior probability and 63–99% ML bootstrap). Likewise, gene genealogies that recovered a paraphyletic Buarremon (e.g. β-Fib 7 and MUSK) differed topologically. Despite the variation across loci in gene-tree topologies, a partition homogeneity test did not reveal significant inconsistency in phylogenetic signal across loci. The concatenated and partitioned analyses using data from all loci showed strong support for the monophyly of Buarremon, but recovered different relationships of Buarremon with Lysurus and Arremon (Fig. 1i and j).

3.2. Species tree analysis

Buarremon was consistently recovered as a monophyletic group using the three methodologies for species-tree reconstruction. In particular, the species tree obtained by minimizing deep coalescences was the topology that placed a clade formed by L. castaneiceps and A.
schlegeli as sister to a monophyletic Buarremon (six deep coalescence events), followed by the topology that placed A. schlegeli sister to a monophyletic Buarremon (seven deep coalescence events; Table 2). In contrast, the topology showing Buarremon as a paraphyletic group (Cadena et al., 2007) required 11 deep coalescence events. The topology with A. schlegeli sister to a monophyletic Buarremon was the species tree with the highest average probability of genealogies according to all replicates with different RITs in COAL (the highest value was recovered with an RIT of 1:1 and branch lengths of 1.0; Table 3). This topology was also the species tree reconstructed using the Bayesian analysis in BEST, which supported the monophyly of Buarremon with a posterior probability of 0.82 (Fig. 2).

4. Discussion

The monophyly of Buarremon based on traditional taxonomy was brought into question by Cadena et al. (2007) based on phylogenetic analyses of sequences of four mitochondrial loci. This result was partly inconsistent with one of the nuclear gene trees reported by them, but not with another. Based on sequence data from additional loci collected from a small number of individuals, we here found that other nuclear genes recovered different topologies from those reported by these authors, some of which had good support. Because there is only one species history, the variation in gene genealogies demonstrates that in Buarremon and near relatives, phylogenetic reconstructions based on single-locus data sets can result in erroneous inferences of evolutionary relationships.

Incongruence between gene-tree topologies is often an indication of incomplete lineage sorting resulting from rapid radiations, a pattern frequently seen in phylogenies where internal branches are short in comparison to terminal branches (Degnan and Rosenberg, 2009). However, in the case of Buarremon and allies, the results of COAL show the highest average probability for a topology lacking short internodes (i.e.
RIT of 1:1 and branch lengths of 1.0; Table 3). This suggests that a possible source of incongruence between gene genealogies in our study system could be the existence of populations with large effective sizes, in which ancestral polymorphism was retained despite relatively long intervals between speciation events.

Table 2
Coalescence cost required to reconcile gene-tree topologies with the fifteen possible species trees.

Species tree                             MyO     βFib5   MUSK    βFib7   ACOI    TGF     ND2     Total score
                                         ML BI   ML BI   ML BI   ML BI   ML BI   ML BI   ML BI   ML  BI
(((A sch,(B bru,B tor)),L cas),A lat)    0  0    0  0    3  3    2  2    1  1    0  0    1  1    7   7
(((B tor,(B bru,A sch)),L cas),A lat)    0  0    0  0    3  3    2  2    2  2    1  1    2  2    10  10
(((B bru,(B tor,A sch)),L cas),A lat)    0  0    0  0    3  3    1  1    2  2    1  1    2  2    9   9
(((L cas,(B tor,B bru)),A sch),A lat)    0  0    1  1    3  3    3  3    0  0    1  1    1  1    9   9
(((B tor,(L cas,B bru)),A sch),A lat)    0  0    2  2    3  3    3  3    1  1    2  2    2  2    13  13
(((B bru,(L cas,B tor)),A sch),A lat)    0  0    2  2    3  3    3  3    1  1    2  2    2  2    13  13
(((L cas,(B bru,A sch)),B tor),A lat)    0  0    1  1    1  1    3  3    3  3    3  3    2  1    13  12
(((A sch,(B bru,L cas)),B tor),A lat)    0  0    2  2    1  1    3  3    3  3    3  3    2  1    14  13
(((B bru,(A sch,L cas)),B tor),A lat)    0  0    2  2    0  0    3  3    3  3    3  3    1  0    12  11
(((L cas,(A sch,B tor)),B bru),A lat)    0  0    1  1    2  2    0  0    3  3    3  3    2  1    11  10
(((A sch,(L cas,B tor)),B bru),A lat)    0  0    2  2    2  2    1  1    3  3    3  3    2  1    13  12
(((B tor,(L cas,A sch)),B bru),A lat)    0  0    2  2    1  1    1  1    3  3    3  3    1  0    11  10
(((L cas,A sch),(B tor,B bru)),A lat)    0  0    1  1    1  1    2  2    1  1    1  1    0  0    6   6
(((L cas,B tor),(A sch,B bru)),A lat)    0  0    1  1    2  2    2  2    2  2    2  2    2  2    11  11
(((L cas,B bru),(A sch,B tor)),A lat)    0  0    1  1    2  2    1  1    2  2    2  2    2  2    10  10

Bold numbers in the original indicate the lowest values of coalescence cost for each genealogy. Total is the sum across genealogies for each species tree. Abbreviations: A sch (Arremon schlegeli); B tor (Buarremon torquatus); B bru (Buarremon brunneinucha); L cas (Lysurus castaneiceps) and A lat (Atlapetes latinuchus).

Table 3
Probabilities of gene trees given the 15 possible species trees with RIT of 1:1 and branch lengths of 1.0.

Species tree                             MyO     βFib7   ACOI    TGF     ND2     Average
(((A sch,(B bru,B tor)),L cas),A lat)    0.057   0.001   0.063   0.408   0.063   0.118
(((B tor,(B bru,A sch)),L cas),A lat)    0.057   0.001   0.013   0.057   0.014   0.028
(((B bru,(B tor,A sch)),L cas),A lat)    0.408   0.001   0.013   0.057   0.014   0.098
(((L cas,(B tor,B bru)),A sch),A lat)    0.001   0.001   0.408   0.063   0.063   0.107
(((B tor,(L cas,B bru)),A sch),A lat)    0.001   0.001   0.057   0.013   0.014   0.017
(((B bru,(L cas,B tor)),A sch),A lat)    0.001   0.001   0.057   0.013   0.014   0.017
(((L cas,(B bru,A sch)),B tor),A lat)    0.001   0.057   0.001   0.001   0.014   0.014
(((A sch,(B bru,L cas)),B tor),A lat)    0.001   0.057   0.001   0.001   0.014   0.014
(((B bru,(A sch,L cas)),B tor),A lat)    0.001   0.408   0.001   0.001   0.063   0.095
(((L cas,(A sch,B tor)),B bru),A lat)    0.063   0.013   0.001   0.001   0.014   0.018
(((A sch,(L cas,B tor)),B bru),A lat)    0.013   0.013   0.001   0.001   0.014   0.008
(((B tor,(L cas,A sch)),B bru),A lat)    0.013   0.063   0.001   0.001   0.063   0.028
(((L cas,A sch),(B tor,B bru)),A lat)    0.004   0.054   0.054   0.054   0.410   0.115
(((L cas,B tor),(A sch,B bru)),A lat)    0.004   0.004   0.004   0.004   0.009   0.005
(((L cas,B bru),(A sch,B tor)),A lat)    0.054   0.004   0.004   0.004   0.009   0.015

Bold numbers in the original indicate the highest values of probability. Abbreviations: A sch (Arremon schlegeli); B tor (Buarremon torquatus); B bru (Buarremon brunneinucha); L cas (Lysurus castaneiceps) and A lat (Atlapetes latinuchus).

Fig. 2. Species tree reconstructed with BEST. Posterior probability values on nodes were obtained in analysis with 200,000,000 generations.

In contrast to results of the earlier study based on conventional phylogenetic analyses of a few genes (Cadena et al., 2007), three different methods for species-tree reconstruction from multilocus data consistently recovered Buarremon as a monophyletic group in the present study. Therefore, the mitochondrial DNA topology obtained by Cadena et al. (2007) might be an anomalous gene tree or might incorrectly reflect relationships owing to stochastic sorting along short branches deep in the tree (Degnan and Rosenberg, 2006; Degnan et al., 2009), and the true species tree could be one that is consistent with traditional taxonomy based on external phenotypic similarity. However, the relationships among our ingroup taxa varied among different approaches and support was highly variable (see below), demonstrating the difficulty of resolving the phylogeny of this group with certainty.

An important difference between the design of our study and that of Cadena et al. (2007), in addition to the number of loci used, is the number of individuals sequenced per locus. Cadena et al. (2007) generated sequences of one mitochondrial gene (ND2) for 238 samples, including all the subspecies of B. brunneinucha, 13 of 14 subspecies of B. torquatus, eight individuals of B. virenticeps and one individual for three species of Arremon and two of Lysurus. In addition, they sequenced three mitochondrial loci (cyt b, ATPase 6, ATPase 8) for 43 individuals, and two nuclear loci (ACOI and MUSK) for 22 individuals. Conversely, in our study we analyzed sequences of seven genes for a single individual per species, which we considered justifiable given that all the lineages that they represent were recovered in the earlier study as strongly supported monophyletic groups that have evolved in isolation for a considerable period of time (Cadena et al., 2007). In scenarios such as this one, the number of individuals per taxon included in analyses is not expected to strongly influence phylogenetic reconstructions (Leaché, 2009).
In addition, because all the information required to infer species trees is contained in the pattern of gene lineage coalescence across multiple loci, adding loci rather than individuals is more likely to improve phylogenetic accuracy when reciprocal monophyly has been achieved (Maddison and Knowles, 2006; Knowles, 2010).

We note that because individual loci had limited variation, some researchers would advise excluding them from analyses and replacing them with more variable loci. However, the influence of data quality on the accuracy of species-tree methods remains to be fully examined (Huang et al., 2010), and removing loci with few informative characters from species-tree analyses is not recommended because this could lead to ascertainment bias in the estimation of effective population sizes (Wakeley et al., 2001), a parameter on which methods based on coalescence (e.g. BEST) rely to evaluate the probability of alternative species trees (Knowles, 2010).

Our analyses suggest that the density of sampling has an important effect on the conventional reconstruction of gene genealogies: one of the most surprising results from our analyses is that the ND2 topology we recovered (which was weakly supported; Fig. 1b) and the topology reconstructed by Cadena et al. (2007) for this same gene and for a concatenated set of four mitochondrial loci were incongruent. However, when we analyzed the five sequences considered in this study together with the ND2 data of Cadena et al. (2007), the results (tree not shown) matched those of the earlier study (i.e. supported the paraphyly of Buarremon) and not those recovered here for these five individual sequences. An explanation for this result could be that parameter estimation for model-based analyses can be difficult and thus lead to erroneous conclusions with a small sample of sequences (Huelsenbeck et al., 1994; Lewis, 1998).
In any event, this unexpected result suggests that limited sampling may introduce bias in analyses, and because ND2 is the most variable locus in our data set, analyses concerning multilocus data are likely to be influenced strongly by variation in this gene and their results should be interpreted cautiously.

Phylogenetics is moving from the individual reconstruction of gene genealogies to a new paradigm where the explicit reconstruction of species trees is a fundamental aim (Edwards et al., 2007; Knowles, 2009). Amid this transition, empirical analyses of real data obtained from non-model organisms, such as those presented in this study, are valuable to highlight the promises of new approaches and to help identify potential pitfalls in their application. In addition to the issues related to number of individuals sampled, which we described above, our analyses show that the number of loci suggested as sufficient to resolve species trees with accuracy under some simulation conditions (e.g. eight loci, Edwards et al., 2007) might be insufficient in cases such as the Buarremon and allies radiation (see also Brumfield et al., 2008). Our analyses also demonstrate that the same data can lead to different inferences of evolutionary relationships when analyzed with different methods for species-tree reconstruction. Although all three methods for species-tree reconstruction consistently recovered Buarremon as a monophyletic group (Fig. 2, Tables 2 and 3), relationships among taxa varied among different approaches. Specifically, minimizing deep coalescences (MDC) recovered a species tree in which Lysurus and Arremon form a clade that is sister to a monophyletic Buarremon, whereas BEST and COAL placed Buarremon as sister to Arremon, with Lysurus sister to the clade formed by these two. We note, however, that differences between these topologies in terms of the number of coalescent events are minor owing to the small number of individuals in the sample (i.e. 6 vs. 7 deep coalescent events).
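
The 6 vs. 7 comparison can be checked by summing the per-locus costs reported in Table 2. The Python sketch below transcribes the ML columns of Table 2 (one value per candidate species tree, in the row order of the table) and ranks the species trees by total cost. The variable names and the 1-based row indices are shorthand introduced here for illustration, and the assignment of columns to loci simply follows the order of the table header.

```python
# Deep-coalescence costs for the ML gene trees, transcribed from Table 2.
# Each list holds one value per candidate species tree, in the row order of Table 2.
ml_costs = {
    "MyO":   [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "bFib5": [0, 0, 0, 1, 2, 2, 1, 2, 2, 1, 2, 2, 1, 1, 1],
    "MUSK":  [3, 3, 3, 3, 3, 3, 1, 1, 0, 2, 2, 1, 1, 2, 2],
    "bFib7": [2, 2, 1, 3, 3, 3, 3, 3, 3, 0, 1, 1, 2, 2, 1],
    "ACOI":  [1, 2, 2, 0, 1, 1, 3, 3, 3, 3, 3, 3, 1, 2, 2],
    "TGF":   [0, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 3, 1, 2, 2],
    "ND2":   [1, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 1, 0, 2, 2],
}

# Labels for the two species trees discussed in the text (1-based rows of Table 2).
row_labels = {
    1: "((A sch,(B bru,B tor)),L cas)",
    13: "((L cas,A sch),(B tor,B bru))",
}

# The total cost of each species tree is the sum over the seven genealogies.
totals = [sum(row) for row in zip(*ml_costs.values())]

# Rank species trees from lowest to highest total cost.
ranking = sorted(range(1, 16), key=lambda row: totals[row - 1])
for row in ranking[:2]:
    print(row, row_labels.get(row, "(see Table 2)"), totals[row - 1])
```

With these values the lowest total (6) belongs to the tree in which L. castaneiceps and A. schlegeli form a clade sister to Buarremon, and the next lowest (7) to the tree with A. schlegeli sister to Buarremon, a difference of a single deep coalescence.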
Other studies have reported different results using different methods for species-tree reconstruction on the same data set (McCormack et al., 2009), implying (1) that more work involving simulations and analyses of empirical data is necessary to better understand the sources of discrepancy between alternative approaches that use data in different ways, and (2) that users of methods for species-tree reconstruction should examine the robustness of their inferences across different analytical approaches.

In conclusion, although Buarremon was consistently reconstructed as a monophyletic group by our multilocus approach, considerable incongruence was observed across analyses, indicating the difficulty of achieving phylogenetic resolution in this group. Adding molecular data in the way of additional sequences from individual loci (Spinks et al., 2009) or from genome-wide surveys of SNP variation generated using high-throughput technologies (cf. Decker et al., 2009) may allow for increased resolution in the future, but the uncertainty existing at present leads us to recommend that the traditional genera Buarremon, Arremon and Lysurus be maintained in an expanded genus Arremon as suggested by Cadena et al. (2007), the monophyly of which seems clear.

Acknowledgments

This study was financed by the Facultad de Ciencias at Universidad de los Andes. Tissue samples were provided by the Instituto Alexander von Humboldt (IAvH) and the Banco de Tejidos of the Museo de Historia Natural, Universidad de los Andes (ANDES-BT). We thank N. Gutiérrez, C. Salazar, E. Valderrama, F. Velásquez, and other members of the Laboratorio de Biología Evolutiva de Vertebrados for assistance in the use of software and discussions at several stages of this project. Comments from J. McCormack and two anonymous reviewers greatly improved this manuscript. AFR would like to especially thank D.R. Rodríguez, L. Florez, and K.P. Florez, who made a special effort to support him.
References

Brito, P., Edwards, S.V., 2009. Multilocus phylogeography and phylogenetics using sequence-based markers. Genetica 135, 439–455.
Brumfield, R.T., Liu, L., Lum, D.E., Edwards, S.V., 2008. Comparison of species tree methods for reconstructing the phylogeny of bearded Manakins (Aves: Pipridae, Manacus) from multilocus sequence data. Syst. Biol. 57, 719–731.
Cadena, C.D., Cuervo, A.M., 2010. Molecules, morphology, ecology, and songs in concert: How many species is “Arremon torquatus” (Aves, Emberizidae)? Biol. J. Linn. Soc. 99, 152–176.
Cadena, C.D., Klicka, J., Ricklefs, R.E., 2007. Evolutionary differentiation in the Neotropical montane region: molecular phylogenetics and phylogeography of Buarremon brush-finches (Aves, Emberizidae). Mol. Phylogenet. Evol. 44, 993–1016.
Carling, M.D., Brumfield, R.T., 2008. Integrating phylogenetic and population genetic analyses of multiple loci to test species divergence hypothesis in Passerina buntings. Genetics 178, 363–377.
Carstens, B.C., Knowles, L.L., 2007. Estimating phylogeny from gene tree probabilities in Melanoplus grasshoppers despite incomplete lineage sorting. Syst. Biol. 56, 400–411.
Cummings, M.P., 1994. Transmission patterns of eukaryotic transposable elements: arguments for and against horizontal transfer. Trends Ecol. Evol. 9, 141–145.
De Queiroz, A., 1993. For consensus (sometimes). Syst. Biol. 42, 368–372.
Decker, J.E., Pires, J.C., Conant, G.C., McKay, S.D., Heaton, M.P., Chen, K.C., Cooper, A., Vilkki, J., Seabury, C.M., Caetano, A.R., Johnson, G.S., Brenneman, R.A., Hanotte, O., Eggert, L.S., Wiener, P., Kim, J., Kim, K.S., Sonstegard, T.S., Van Tassell, C.P., Neibergs, H.L., McEwan, J.C., Brauning, R., Coutinho, L.L., Babar, M.E., Wilson, G.A., McClure, M.C., Rolf, M.M., Kim, J.W., Schnabel, R.D., Taylor, J.F., 2009. Resolving the evolution of extant and extinct ruminants with high-throughput phylogenomics. Proc. Natl. Acad. Sci. USA 106, 18644–18649.
Degnan, J.H., Rosenberg, N.A., 2006. Discordance of species trees with their most likely gene trees. PLoS Genet. 2, 762–768.
Degnan, J.H., Rosenberg, N.A., 2009. Gene tree discordance, phylogenetic inference, and the multispecies coalescent. Trends Ecol. Evol. 24, 332–340.
Degnan, J.H., Salter, L.A., 2005. Gene tree distributions under the coalescent process. Evolution 59, 24–37.
Degnan, J.H., De Giorgio, M., Bryant, D., Rosenberg, N.A., 2009. Properties of consensus methods for inferring species trees from gene trees. Syst. Biol. 58, 35–54.
Drummond, A.J., Ashton, B., Cheung, M., Heled, J., Kearse, M., Moir, R., Stones-Havas, S., Thierer, T., Wilson, A., 2007. Geneious v3.0. <http://www.geneious.com/>.
Edgar, R.C., 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797.
Edwards, S.V., Liu, L., Pearl, D.K., 2007. High-resolution species tree without concatenation. Proc. Natl. Acad. Sci. USA 104, 5936–5941.
Fitch, W.M., 1970. Distinguishing homologous from analogous proteins. Syst. Zool. 19, 99–113.
Gadagkar, S., Rosenberg, S., Kumar, S., 2005. Inferring species phylogenies from multiple genes: concatenated sequence tree versus consensus gene tree. J. Exp. Zool. 304B, 64–74.
Hird, S., Kubatko, L.S., Carstens, B.C., 2010. Rapid and accurate species tree estimation for phylogeographic investigations using replicated subsampling. Mol. Phylogenet. Evol. 57, 888–898.
Huang, H., He, Q., Kubatko, L.S., Knowles, L.L., 2010. Sources of error inherent in species-tree estimation: impact of mutational and coalescent effects on accuracy and implications for choosing among different methods. Syst. Biol. 59, 573–583.
Huelsenbeck, J.P., Swofford, D.L., Cunningham, C.W., Bull, J.J., Waddell, P., 1994. Is character weighting a panacea for the problem of data heterogeneity in phylogenetic analysis? Syst. Biol. 43, 288–291.
Jennings, W.B., Edwards, S.V., 2005. Speciational history of Australian grass finches (Poephila) inferred from 30 gene trees. Evolution 59, 2033–2047.
Kimball, R.T., Braun, E.L., Barker, F.K., Bowie, R.C.K., Braun, M.J., Chojnowski, J.L., Hackett, S.J., Han, K.L., Harshman, J., Heimer-Torres, V., Holznagel, W., Huddleston, C.J., Marks, B.D., Miglia, K.J., Moore, W.S., Reddy, S., Sheldon, F.H., Smith, J.V., Witt, C.C., Yuri, T., 2009. A well-tested set of primers to amplify regions spread across the avian genome. Mol. Phylogenet. Evol. 50, 654–660.
Knowles, L.L., 2009. Estimating species trees: methods of phylogenetic analysis when there is incongruence across genes. Syst. Biol. 58, 463–467.
Knowles, L.L., 2010. Sampling strategies for species-tree estimation. In: Knowles, L.L., Kubatko, L.S. (Eds.), Estimating Species Trees: Practical and Theoretical Aspects. Wiley-Blackwell, pp. 163–173.
Knowles, L.L., Kubatko, L.S., 2010. Estimating species trees: practical and theoretical aspects. Wiley-Blackwell.
Leaché, A.D., 2009. Species tree discordance traces to phylogeographic clade boundaries in North American fence lizards (Sceloporus). Syst. Biol. 58, 547–559.
Lewis, P.O., 1998. Maximum likelihood as an alternative to parsimony for inferring phylogeny using nucleotide sequence data. In: Soltis, D.E., Soltis, P.S., Doyle, J.J. (Eds.), Molecular Systematics of Plants II. Kluwer, Boston, pp. 132–163.
Linnen, C., Farrell, B., 2008. Comparison of methods for species-tree inference in the sawfly genus Neodiprion (Hymenoptera: Diprionidae). Syst. Biol. 57, 876–890.
Liu, L., Edwards, S.V., 2009. Phylogenetic analysis in the anomaly zone. Syst. Biol. 58, 452–460.
Liu, L., Pearl, D., 2007. Species tree from gene trees: reconstructing Bayesian posterior distribution of species phylogeny using estimated gene tree distributions. Syst. Biol. 56, 504–514.
Liu, L., Pearl, D., Brumfield, R., Edwards, S.V., 2008. Estimating species trees using multiple-allele DNA sequence data. Evolution 62, 2080–2091.
Maddison, W.P., 1997. Gene trees in species trees. Syst. Biol. 46, 523–536.
Maddison, W.P., Knowles, L.L., 2006. Inferring phylogeny despite incomplete lineage sorting. Syst. Biol. 55, 21–30.
Maddison, W.P., Maddison, D.R., 2004. Mesquite: A Modular System for Evolutionary Analysis, Version 1.01. <http://mesquiteproject.org>.
McCormack, J.E., Huang, H., Knowles, L.L., 2009. Maximum-likelihood estimates of species trees: how accuracy of phylogenetic inference depends upon the divergence history and sampling design. Syst. Biol. 58, 501–508.
Meng, C., Kubatko, L.S., 2009. Detecting hybrid speciation in the presence of incomplete lineage sorting using gene tree incongruence: a model. Theor. Pop. Biol. 75, 35–45.
Nichols, R., 2001. Gene trees and species trees are not the same. Trends Ecol. Evol. 16, 358–364.
Nylander, J.A.A., 2004. MrModeltest v2. Program Distributed by the Author. Evolutionary Biology Centre, Uppsala University.
Pamilo, P., Nei, M., 1988. Relationships between gene trees and species trees. Mol. Biol. Evol. 5, 568–583.
Philippe, H., Derelle, R., Lopez, P., Pick, K., Borchiellini, C., Boury-Esnault, N., Vacelet, J., Renard, E., Houliston, E., Queinnec, E., Da Silva, C., Winicker, P., Le Guyader, H., Leys, S., Jackson, D.J., Schrieber, F., Erpenbeck, D., Morgenstern, B., Worheide, G., Manuel, M., 2009. Phylogenomics revives traditional views on deep animal relationships. Curr. Biol. 19, 706–712.
Piganeau, G.V., Gardner, M., Eyre-Walker, A., 2004. A broad survey of recombination in animal mitochondria. Mol. Biol. Evol. 21, 2319–2325.
Posada, D., Crandall, K.A., 1998. Modeltest: testing the model of DNA substitution. Bioinformatics 14, 817–818.
Rambaut, A., Drummond, A., 2007. Tracer v1.4 2003–2008. MCMC Trace.
Remsen Jr., J.V., Cadena, C.D., Jaramillo, A., Nores, M., Pacheco, J.F., Robbins, M.B., Schulenberg, T.S., Stiles, F.G., Stotz, D.F., Zimmer, K.J., 2010. A Classification of the Bird Species of South America. American Ornithologists’ Union. <http://www.museum.lsu.edu/~Remsen/SACCBaseline.html>.
Rokas, A., King, N., Finnerty, J., Carroll, S.B., 2003. Conflicting phylogenetic signals at the base of the metazoan tree. Evol. Dev. 5, 346–359.
Ronquist, F., Huelsenbeck, J.P., 2003. MRBAYES 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19, 1572–1574.
Slade, R.W., Moritz, C., Heideman, A., 1993. Rapid assessment of single-copy nuclear DNA variation in diverse species. Mol. Ecol. 2, 359–373.
Sorenson, M.D., Ast, J.C., Dimcheff, D.E., Yuri, T., Mindell, D.P., 1999. Primers for a PCR-based approach to mitochondrial genome sequencing in birds and other vertebrates. Mol. Phylogenet. Evol. 12, 105–114.
Spinks, P.Q., Thomson, R.C., Lovely, G.A., Shaffer, H.B., 2009. Assessing what is needed to resolve a molecular phylogeny: simulations and empirical data from emydid turtles. BMC Evol. Biol. 9, 56.
Swofford, D.L., 2002. PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods), Version 4. Sinauer Associates, Sunderland, MA.
Takahata, N., 1989. Gene genealogy in three related populations: consistency probability between gene and population trees. Genetics 122, 957–966.
Wakeley, J., Nielsen, R., Liu-Cordero, S.N., Ardlie, K., 2001. The discovery of single-nucleotide polymorphisms and inferences about human demographic history. Am. J. Hum. Genet. 69, 1332–1347.
Waters, J.M., Rowe, D.L., Burridge, C.P., Wallis, G.P., 2010. Gene trees versus species trees: reassessing life-history evolution in a freshwater fish radiation. Syst. Biol. 59, 504–517.
Wilgenbusch, J.C., Warren, D.L., Swofford, D.L., 2004. AWTY: A System for Graphical Exploration of MCMC Convergence in Bayesian Phylogenetic Inference. <http://ceb.csit.fsu.edu/awt>.