Reconstructing the phylogeny of ‘‘Buarremon’’ brush-finches and near relatives

Molecular Phylogenetics and Evolution 58 (2011) 297–303
Reconstructing the phylogeny of ‘‘Buarremon’’ brush-finches and near relatives
(Aves, Emberizidae) from individual gene trees
Alexander Flórez-Rodríguez (a, corresponding author), Matthew D. Carling (b), Carlos Daniel Cadena (a)
(a) Laboratorio de Biología Evolutiva de Vertebrados, Departamento de Ciencias Biológicas, Universidad de los Andes, Apartado 4976, Bogotá, Colombia
(b) Berry Biodiversity Conservation Center, Department of Zoology and Physiology, University of Wyoming, Laramie, WY 82071, USA
Article info
Article history:
Received 26 July 2010
Revised 2 November 2010
Accepted 15 November 2010
Available online 25 November 2010
Keywords:
Anomalous gene trees
BEST
COAL
Minimizing deep coalescences
Species tree
Topological incongruence
Abstract
Gene trees are often assumed to be equivalent to species trees, but processes such as incomplete lineage
sorting can generate incongruence among gene topologies and analyzing multilocus data in concatenated
matrices can be prone to systematic errors. Accordingly, a variety of new methods have been developed to
estimate species trees using multilocus data sets. Here, we apply some of these methods to reconstruct the
phylogeny of Buarremon and near relatives, a group in which phylogenetic analyses of mitochondrial DNA
sequences produced results that were inconsistent with relationships implied by a taxonomy based on
variation in external phenotype. Gene genealogies obtained for seven loci (one mitochondrial, six nuclear)
were varied, with some supporting and some rejecting the monophyly of Buarremon. Overall, our species-tree analyses tended to support a monophyletic Buarremon, but owing to a lack of congruence among methods, the phylogeny of this group remains uncertain. More generally, our study indicates
that the number of individuals sampled can have an important effect on phylogenetic reconstruction, that
the use of seven markers does not guarantee obtaining a strongly supported species tree, and that
methods for species-tree reconstruction can produce different results using the same data; these are
important considerations for researchers using these new phylogenetic approaches in other systems.
© 2010 Elsevier Inc. All rights reserved.
1. Introduction
The accurate reconstruction of phylogenetic relationships
among species using molecular data can be complicated for a variety of reasons (Carstens and Knowles, 2007; Brumfield et al., 2008;
Degnan and Rosenberg, 2009). One obstacle is the stochastic sorting of ancestral polymorphisms following species divergence at
deep or shallow levels of the phylogeny, which can result in discordant topologies between gene and species trees (Pamilo and Nei,
1988; Takahata, 1989; Maddison and Knowles, 2006; McCormack
et al., 2009). Furthermore, if reproductive isolation between taxa
is not complete, then gene flow can cause incongruent topologies
across genes (Meng and Kubatko, 2009). In addition, processes of
gene duplication (Fitch, 1970) and horizontal transfer (Cummings,
1994) can also complicate the traditional assumption that gene
trees always reflect species trees (Nichols, 2001). Some of these
problems are particularly acute when phylogenies are reconstructed from single-locus datasets (e.g. mitochondrial DNA;
Jennings and Edwards, 2005).
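The incongruence described above can be made concrete with the standard three-taxon result of the multispecies coalescent, which is not part of the original analyses: for a rooted species tree ((A,B),C) whose internal branch is t coalescent units long, the A and B lineages fail to coalesce in that branch with probability e^(−t), in which case all three rooted gene trees are equally likely. The minimal sketch below, with a hypothetical function name and one gene copy per species assumed, tabulates the resulting probabilities.

```python
import math


def three_taxon_gene_tree_probs(t: float) -> dict:
    """Rooted gene-tree probabilities for species tree ((A,B),C) under the coalescent.

    t is the length of the internal branch in coalescent units. One gene copy
    per species is assumed.
    """
    if t < 0:
        raise ValueError("branch length must be non-negative")
    # Probability that A and B have not coalesced by the top of the internal branch;
    # in that case each of the three possible first coalescences is equally likely.
    p_no_coalescence = math.exp(-t)
    discordant = p_no_coalescence / 3.0
    return {
        "((A,B),C)": 1.0 - 2.0 * discordant,  # matches the species tree
        "((A,C),B)": discordant,
        "((B,C),A)": discordant,
    }


for t in (0.2, 0.5, 1.0, 2.0):
    probs = three_taxon_gene_tree_probs(t)
    print(f"t = {t:.1f}: concordant {probs['((A,B),C)']:.3f}, "
          f"each discordant {probs['((A,C),B)']:.3f}")
```

With t = 1, for instance, about a quarter of loci (2/3 × e^(−1) ≈ 0.245) are expected to show one of the two discordant topologies.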
Corresponding author (A. Flórez-Rodríguez). Fax: +57(1) 33934949x2817.
doi:10.1016/j.ympev.2010.11.012
The ability to obtain sequence data from multiple loci across taxa is one of the major recent breakthroughs in molecular systematics, and has brought with it new opportunities and challenges for
phylogeny reconstruction (Brito and Edwards, 2009; Knowles,
2010; Knowles and Kubatko, 2010). To date, the most common approach used in multilocus phylogenetics is concatenation of data
using supermatrices, an approach assuming that all gene trees
have the same topology (Rokas et al., 2003; Philippe et al., 2009).
However, this approach might be positively misleading because of the existence of anomalous gene trees (i.e. gene-tree topologies that are more probable than the topology matching the species tree; Degnan and Rosenberg, 2006; Liu and Edwards, 2009). In addition, obtaining well-supported trees consistent with the true phylogeny using concatenated data might require a large number of loci compared with novel methods for species-tree reconstruction (Edwards et al., 2007; see below). Another frequently employed approach for phylogeny reconstruction from multilocus data is to construct consensus trees from genealogies obtained independently for each locus (De Queiroz, 1993), but this requires more genes than concatenation to obtain a similarly supported tree (Gadagkar et al., 2005) and, when anomalous gene trees exist, is also prone to being positively misleading as the number of genes increases (Degnan et al., 2009). Owing to these limitations, developing alternatives to concatenation and consensus methods for the reconstruction of robust species trees has become an important priority.
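To show how anomalous gene trees can arise, here is a small Monte Carlo sketch (an illustration added here, not an analysis from this study) that simulates rooted gene trees under the multispecies coalescent within the asymmetric species tree (((A,B),C),D). It assumes one gene copy per species, so tip branches play no role, and all names and branch lengths are hypothetical. With both internal branches very short, the symmetric gene tree ((A,B),(C,D)) is expected to appear more often than the gene tree matching the species tree, which is the kind of anomaly described by Degnan and Rosenberg (2006).

```python
import math
import random
from collections import Counter

# A species-tree node is a tip name (str) or a tuple (left, right, length), where
# length is the branch above that node in coalescent units (ignored at the root).


def coalesce_along_branch(lineages, length, rng):
    """Kingman coalescent along one branch; returns the lineages reaching its top."""
    lineages = list(lineages)
    remaining = length
    while len(lineages) > 1:
        k = len(lineages)
        wait = rng.expovariate(k * (k - 1) / 2.0)  # time to the next coalescence
        if wait > remaining:
            break  # the branch ends before the next coalescence
        remaining -= wait
        i, j = sorted(rng.sample(range(k), 2))
        merged = (lineages[i], lineages[j])
        del lineages[j], lineages[i]  # delete the higher index first
        lineages.append(merged)
    return lineages


def simulate_gene_tree(node, rng, is_root=True):
    """Return the gene lineages leaving the top of `node` (a single tree at the root)."""
    if isinstance(node, str):
        return [node]
    left, right, length = node
    entering = simulate_gene_tree(left, rng, False) + simulate_gene_tree(right, rng, False)
    return coalesce_along_branch(entering, math.inf if is_root else length, rng)


def canonical(tree):
    """Newick-like string with sorted children, so equal topologies compare equal."""
    if isinstance(tree, str):
        return tree
    a, b = sorted((canonical(tree[0]), canonical(tree[1])))
    return f"({a},{b})"


def gene_tree_frequencies(species_tree, n_reps=20_000, seed=1):
    rng = random.Random(seed)
    counts = Counter(canonical(simulate_gene_tree(species_tree, rng)[0]) for _ in range(n_reps))
    return {topology: c / n_reps for topology, c in counts.items()}


def caterpillar(x, y):
    """Species tree (((A,B),C),D) with internal branches x (above AB) and y (above ABC)."""
    return ((("A", "B", x), "C", y), "D", None)


for x in (2.0, 0.01):
    freqs = gene_tree_frequencies(caterpillar(x, x))
    print(f"internal branches = {x}: matching {freqs.get('(((A,B),C),D)', 0):.3f}, "
          f"symmetric {freqs.get('((A,B),(C,D))', 0):.3f}")
```

Simulation was chosen over exact formulas to keep the sketch short; with long internal branches the matching topology dominates, whereas with very short ones it is overtaken by the symmetric topology.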
Novel analytical tools have enabled a move towards multilocus methods of phylogenetic reconstruction that are more robust than concatenation or consensus methods (Jennings and Edwards, 2005; Carling and Brumfield, 2008; Degnan and Rosenberg, 2009; Knowles, 2009, 2010). For example, one can attempt to reconcile gene trees with the species trees that contain them by minimizing the number of deep coalescences, i.e., coalescences of two gene copies that predate a particular speciation event (Maddison, 1997; Maddison and Knowles, 2006; Leaché, 2009). Alternatively, the BEST (Bayesian Estimation of Species Trees) method estimates the joint posterior distribution of gene trees for each locus and uses it to approximate the Bayesian posterior distribution of the species tree based on coalescent theory (Liu and Pearl, 2007; Edwards et al., 2007). Yet another alternative is to calculate the probability of a genealogy given a species tree under the coalescent (Degnan and Salter, 2005). Because the field of multilocus phylogenetics is still maturing, however, studies applying species-tree approaches to empirical data remain scarce (Brumfield et al., 2008; Liu et al., 2008; Linnen and Farrell, 2008; Hird et al., 2010; Waters et al., 2010). In this study, we use three methods of species-tree reconstruction based on multilocus data to revisit an empirical phylogenetic question of interest in systematic ornithology. In so doing, we encountered practical issues related to the effect of taxon sampling and to variation in results among methods that should be of interest to developers and users of such methods.
The genus Buarremon (Aves, Emberizidae), as traditionally defined, consists of three morphologically similar species: Buarremon
torquatus, Buarremon brunneinucha, and Buarremon virenticeps.
However, a recent study rejected the monophyly of the genus.
Based on analyses involving sequences of four mitochondrial genes
(ND2, cyt b, ATPase 6, ATPase 8) and two nuclear introns (ACOI and
MUSK), the clade formed by representatives of multiple populations of B. torquatus was recovered as sister to the monophyletic
genus Arremon, whereas B. brunneinucha and B. virenticeps formed
a clade that was sister to a clade formed by species in the genus
Lysurus, albeit with low support (Cadena et al., 2007; Fig. 1a). This result was unexpected considering the overall phenotypic similarity of Buarremon taxa, but it led to the merging of the three genera into an expanded genus Arremon (Remsen et al., 2010). However, the results of that study were supported mainly by mitochondrial DNA sequences, and the deep internodes of the mitochondrial topologies were notably shorter than the terminal branches. Short internal branches can lead to the retention of ancestral polymorphisms, one of the most difficult scenarios for inferring phylogenies from single-locus data sets owing to the high stochasticity of gene sorting. Under such scenarios, mitochondrial DNA can yield trees with good nodal support that are incongruent with the species tree (Carling and Brumfield, 2008; Leaché, 2009; McCormack et al., 2009). In addition, the mitochondrial DNA topology was at least partly inconsistent with the topologies of two nuclear introns (Cadena et al., 2007). Thus, in this study we revisit the relationships of Buarremon and related genera by reconstructing the species tree from phylogenetic analyses of sequences from multiple loci. We discuss the implications of our results in relation to challenges in sampling design and in the use of different methods, which might be common to other studies seeking to reconstruct species trees from multilocus sequence data.
2. Materials and methods
2.1. Sampling, PCR amplification and sequencing
We obtained frozen tissue samples from the collections of the Instituto Alexander von Humboldt (IAvH) and the Banco de Tejidos of the Museo de Historia Natural, Universidad de los Andes (ANDES-BT) for a single individual of each of four focal taxa (B. torquatus (IAvH-1145), B. brunneinucha (ANDES-BT-0120), Arremon schlegeli (ANDES-BT-0016) and Lysurus castaneiceps (IAvH-CT-825)) and of one outgroup (Atlapetes latinuchus (ANDES-BT-0130)), a valid strategy considering that these species are strongly supported monophyletic groups (Cadena et al., 2007). Note that we did not include B. virenticeps in our study because this species was nested with strong support within a clade formed by populations of B. brunneinucha in phylogenetic analyses of mitochondrial and nuclear DNA sequences (Cadena et al., 2007). Therefore, the inclusion of this taxon (and of populations of B. torquatus that likely merit species status; Cadena and Cuervo, 2010) was not necessary to address the monophyly of Buarremon. The important question in this regard is whether the brunneinucha–virenticeps clade and the torquatus clade form a monophyletic group to the exclusion of the Lysurus and Arremon clades.
Total DNA was extracted from all samples using a DNeasy tissue
kit (QIAGEN, Valencia, CA), following the manufacturer’s protocol.
Subsequently, we amplified six nuclear introns (four autosomal, two Z-linked) and one mitochondrial protein-coding gene (Table 1) using primers published by Slade et al. (1993), Sorenson et al. (1999), and Kimball et al. (2009). The concentrations and conditions used for PCR were those described by Cadena et al. (2007). Amplicons were cleaned using ExoSAP-IT (USB Corporation, Cleveland, Ohio) and then sequenced in both directions. The resulting chromatograms were assembled in Geneious Basic 4.02 (Drummond et al., 2007). Where double peaks of equal height were detected in a sequence, the site was scored as ambiguous (i.e. we did not attempt to phase haplotypes, because sites with double peaks were scarce and because we had data for a single individual per species, which precluded haplotype estimation).
2.2. Alignment and conventional phylogenetic analyses
Sequences were aligned using the MUSCLE algorithm (Edgar, 2004) implemented in Geneious (Drummond et al., 2007) and edited manually. We tested for intralocus recombination using the program RecombiTEST (Piganeau et al., 2004). Genealogies were reconstructed individually for each locus using maximum likelihood (ML) and Bayesian inference (BI) methods, and we also conducted analyses using a concatenated matrix that included sequences of all seven genes and a partitioned matrix specifying a substitution model for each of the seven loci. For each analysis, we implemented the model of nucleotide substitution selected as the best fit to the data (Table 1) based on the Akaike Information Criterion, using ModelTest 3.7 for ML (Posada and Crandall, 1998) and MrModelTest 2.3 for BI (Nylander, 2004). Branch-and-bound searches were conducted using PAUP* 4.0b10 (Swofford, 2002) for ML analyses; nodal support was assessed with 1000 ML heuristic bootstrap replicates with tree bisection-reconnection (TBR) branch swapping. Bayesian analyses were conducted in MrBayes 3.1.2 (Ronquist and Huelsenbeck, 2003) and consisted of two runs of four MCMC chains for 15,000,000 generations sampled every 1000 generations; the first 25% of the sampled trees were discarded as burn-in.
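For readers unfamiliar with the model-selection step, the Akaike Information Criterion is AIC = 2K − 2 ln L, where K is the number of free parameters and ln L the maximized log-likelihood; the candidate with the lowest AIC is chosen, and Akaike weights summarize relative support. The sketch below applies this arithmetic to some of the model families in Table 1. The log-likelihood values are hypothetical placeholders rather than results of this study, and counting only substitution-model parameters (not branch lengths) in K is a simplifying assumption of the sketch, not necessarily how ModelTest or MrModelTest count them.

```python
import math

# Free substitution-model parameters (branch lengths excluded): exchangeability
# parameters plus 3 free base frequencies; "+I" and "+G" each add one parameter.
BASE_PARAMS = {"HKY": 4, "TrN": 5, "TVM": 7, "GTR": 8}


def n_params(model: str) -> int:
    name, *extras = model.split("+")
    return BASE_PARAMS[name] + len(extras)


def akaike(log_likelihoods: dict):
    """Return (best model, AIC per model, Akaike weight per model)."""
    aic = {m: 2 * n_params(m) - 2 * lnl for m, lnl in log_likelihoods.items()}
    best = min(aic, key=aic.get)
    rel = {m: math.exp(-(a - aic[best]) / 2) for m, a in aic.items()}
    total = sum(rel.values())
    return best, aic, {m: r / total for m, r in rel.items()}


# Hypothetical maximized log-likelihoods for one locus (illustration only).
lnl = {"HKY": -2450.3, "TrN": -2447.9, "TVM": -2444.1,
       "TVM+I": -2443.8, "GTR": -2443.6, "GTR+I": -2443.2}
best, aic, weights = akaike(lnl)
for m in sorted(aic, key=aic.get):
    print(f"{m:6s} K={n_params(m)}  AIC={aic[m]:.1f}  weight={weights[m]:.3f}")
print("selected:", best)
```

Here the parameter-rich GTR+I has the highest likelihood but is not selected, because the penalty of 2 per parameter outweighs its small gain in fit.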
We used the program Tracer v.1.4.1 (Rambaut and Drummond, 2007) to evaluate sampling of tree and parameter space in the Bayesian analyses. Because plots of number of generations vs. likelihood showed stabilization, effective sample sizes for all parameters were always greater than 200, and the average standard deviation of split frequencies across runs was less than 0.002 in all analyses, the chains likely sampled the posterior distributions adequately. To assess convergence of MCMC runs, we plotted posterior probabilities of clades as a function of generation number and compared results of different runs by plotting the posterior probabilities of all splits for paired runs using AWTY (Wilgenbusch et al., 2004).
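The average standard deviation of split frequencies (ASDSF) used above as a convergence criterion can be illustrated with a short sketch. It is a simplified re-implementation for illustration, not MrBayes source code: each sampled tree is represented as a set of splits (here clades written as frozensets of taxon names), the first 25% of each run is discarded as burn-in as in our analyses, and the standard deviation across runs uses the sample (n − 1) formula, which MrBayes is not assumed to share. Splits that are rare in every run are ignored; the `min_freq` cut-off of 0.1 is an assumption meant to mimic MrBayes's behaviour, and all names are hypothetical.

```python
import statistics


def split_frequencies(trees):
    """Fraction of sampled trees that contain each split (trees are sets of splits)."""
    counts = {}
    for tree in trees:
        for split in tree:
            counts[split] = counts.get(split, 0) + 1
    return {split: c / len(trees) for split, c in counts.items()}


def asdsf(runs, burnin_fraction=0.25, min_freq=0.1):
    """Average standard deviation of split frequencies across independent runs."""
    kept = [run[int(len(run) * burnin_fraction):] for run in runs]  # drop burn-in
    freqs = [split_frequencies(run) for run in kept]
    sds = []
    for split in set().union(*freqs):
        per_run = [f.get(split, 0.0) for f in freqs]
        if max(per_run) < min_freq:
            continue  # split is rare in all runs
        sds.append(statistics.stdev(per_run))
    return statistics.mean(sds) if sds else 0.0


# Toy example: two runs of five post-burn-in trees each (hypothetical data).
AB, AC = frozenset({"A", "B"}), frozenset({"A", "C"})
run1 = [{AB}] * 4 + [{AC}]
run2 = [{AB}] * 3 + [{AC}] * 2
print(f"ASDSF = {asdsf([run1, run2], burnin_fraction=0.0):.4f}")
```

Values close to zero indicate that independent runs sample splits at similar frequencies, which is why a threshold such as 0.002 is read as evidence of convergence.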
[Figure 1. Panels: (a) Cadena et al. 2007 (schematic); (b) ND2; (c) β-Fib 5; (d) MyO; (e) β-Fib 7; (f) MUSK; (g) TGF; (h) ACOI; (i) Concatenated; (j) Partitioned. Taxa: A. schlegeli, B. torquatus, B. brunneinucha, L. castaneiceps, A. latinuchus (outgroup).]
Fig. 1. Gene-tree topologies obtained for each locus and from analyses of the concatenated and partitioned matrices. (a) Schematic topology summarizing the results of Cadena et al. (2007); values within the triangles indicate sampling for ND2 / for the concatenated matrix of four mitochondrial genes. (b–j) Values at nodes are Bayesian posterior probabilities / ML bootstrap values.
Table 1
Summary of sequence variation and substitution models determined according to the AIC for different loci employed in phylogenetic analyses.
Locus  Length (bp)  PIC  Model (a)  Model (b)  Base frequencies (c)           Accession numbers
bFib5  601          0    TVM + I    GTR + I    0.2966, 0.1779, 0.211, 0.3144  HQ537396-400
MyO    740          1    HKY        HKY        0.2686, 0.2374, 0.2191, 0.274  HQ537411-415
bFib7  998          4    TVM        GTR        0.3128, 0.1829, 0.1657, 0.338  HQ537401-405
MUSK   563          7    TVM        GTR        0.2792, 0.1712, 0.2221, 0.327  HQ537406-410
ACOI   992          2    TrN        HKY        0.2587, 0.1735, 0.208, 0.3598  HQ537391-395
TGF    546          12   GTR        GTR        0.2419, 0.2216, 0.2269, 0.309  HQ537421-425
ND2    1026         99   TrN + I    HKY + G    0.3004, 0.3727, 0.0984, 0.228  HQ537416-420
Conc   5456         125  TVM + I    GTR + I    0.2801, 0.2222, 0.187, 0.3107
PIC = number of parsimony-informative characters.
Conc = Concatenated sequences of all seven loci.
(a) Substitution model in ModelTest.
(b) Substitution model in MrModelTest.
(c) A, C, G, T.
2.3. Minimizing deep coalescence
We inferred a species tree by reconciling each of the individual gene trees obtained from the ML and BI analyses described above with each of the 15 possible rooted topologies for the four ingroup taxa, minimizing deep coalescence events using the program Mesquite (Maddison and Maddison, 2004; Maddison and Knowles, 2006). The total cost of each species tree was calculated by summing the number of deep coalescence events over the seven genealogies; the selected species tree was the topology with the lowest number of deep coalescences summed across all genealogies.
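The deep-coalescence criterion can be sketched in a few lines. This is an illustrative re-implementation, not Mesquite's code; it assumes fully resolved rooted trees with one gene copy per species, written as nested tuples, and all function names are hypothetical. Each gene-tree node is mapped to the most recent species-tree node containing all of its descendant taxa; for every non-root species-tree branch, the number of gene lineages still distinct at the top of that branch, minus one, is added to the cost. Enumerating every rooted topology of the four ingroup taxa gives the 15 candidate species trees. As a check, a gene tree with the topology (((L cas,A sch),(B tor,B bru)),A lat), the only species tree with zero ND2 (ML) cost in Table 2, reproduces the ND2 ML costs listed there for other species trees.

```python
from itertools import combinations

# Trees are nested 2-tuples with taxon names (str) at the tips.


def taxa(tree):
    """Frozenset of tip labels below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return taxa(tree[0]) | taxa(tree[1])


def clades(tree):
    """Every clade of a rooted tree (tips, internal nodes and root) as a frozenset."""
    if isinstance(tree, str):
        return [taxa(tree)]
    return clades(tree[0]) + clades(tree[1]) + [taxa(tree)]


def edges(tree):
    """(child clade, parent clade) for every edge of the tree."""
    if isinstance(tree, str):
        return []
    parent = taxa(tree)
    out = []
    for child in tree:
        out.append((taxa(child), parent))
        out.extend(edges(child))
    return out


def deep_coalescences(gene_tree, species_tree):
    """Extra gene lineages summed over the non-root branches of the species tree."""
    species_clades = clades(species_tree)
    root = taxa(species_tree)

    def lca(members):
        # Smallest species clade containing all members (LCA mapping).
        return min((c for c in species_clades if members <= c), key=len)

    mapped = [(lca(child), lca(parent)) for child, parent in edges(gene_tree)]
    cost = 0
    for clade in species_clades:
        if clade == root:
            continue
        # Gene lineages that are still distinct at the top of this species branch.
        k = sum(1 for low, high in mapped if low <= clade and high > clade)
        cost += k - 1
    return cost


def rooted_topologies(labels):
    """Yield every rooted binary topology on `labels` (15 for four taxa)."""
    labels = tuple(labels)
    if len(labels) == 1:
        yield labels[0]
        return
    first, rest = labels[0], labels[1:]
    for r in range(len(rest)):  # `first` keeps r companions; the other side is non-empty
        for companions in combinations(rest, r):
            right = tuple(t for t in rest if t not in companions)
            for left_tree in rooted_topologies((first,) + companions):
                for right_tree in rooted_topologies(right):
                    yield (left_tree, right_tree)


def mdc_species_tree(gene_trees, candidates):
    """Candidate species tree with the lowest deep-coalescence cost summed over loci."""
    return min(candidates, key=lambda s: sum(deep_coalescences(g, s) for g in gene_trees))


OUTGROUP = "A lat"
candidates = [(t, OUTGROUP) for t in rooted_topologies(["A sch", "B bru", "B tor", "L cas"])]
nd2 = ((("L cas", "A sch"), ("B tor", "B bru")), OUTGROUP)
print(len(candidates), "candidate species trees")
print("ND2 cost against (((A sch,(B bru,B tor)),L cas),A lat):",
      deep_coalescences(nd2, ((("A sch", ("B bru", "B tor")), "L cas"), OUTGROUP)))
```

Summing `deep_coalescences` over all loci, as `mdc_species_tree` does, corresponds to the total score used to choose among the 15 candidates.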
2.4. Probabilities of gene trees given the species tree
We calculated the probability of each gene genealogy given each of the 15 possible species trees for the four ingroup taxa using COAL (Degnan and Salter, 2005). For this analysis, we considered five different internal:terminal branch length ratios (RIT) for the species trees: (1) RIT 1:1 with all branch lengths (in coalescent units) = 1.0; (2) RIT 1:1 with length = 0.5; (3) RIT 1:1 with length = 0.2; (4) RIT 1:100, where branch 1 has length 0.01 and the other branches have length 1.0; and (5) RIT 1:100, where branch n − 2 (n is the number of taxa) has length 0.01 and the other branches have length 1.0. We chose these options available in COAL to represent different evolutionary scenarios with internal and terminal branches of equal length (1–3), short internal branches (4) and short terminal branches (5). Because COAL requires fully resolved bifurcating topologies and the forced resolution of polytomies can bias results (Carling and Brumfield, 2008), genes yielding unresolved topologies (MyO and β-Fib 5; see below) were excluded from this analysis.
2.5. Bayesian estimation of species tree
The joint posterior distribution of gene trees and the species tree was estimated in BEST 1.6 (Liu et al., 2008). For our analysis, we employed commonly used priors recommended by the authors (Liu et al., 2008), which were applied in an earlier phylogenetic study of birds (Brumfield et al., 2008): an inverse gamma distribution with a = 3 and b = 0.003 for the population size parameter, and a uniform distribution bounded by 0.5 and 1.5 for the mutation rate. Three runs of two MCMC chains were performed for 200,000,000 generations, sampling every 20,000 generations; the first 25% of trees were discarded as burn-in. Chain stationarity was assessed using Tracer v.1.4 based on plots of number of generations vs. likelihood and estimates of effective sample sizes for parameters, and convergence across runs was examined using AWTY as described above. We did not observe changes in posterior probabilities after 20 million generations, and results of independent runs were remarkably similar to each other, indicating convergence.
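To give a sense of what these priors imply, the sketch below summarizes them numerically. It assumes the usual shape/scale parameterization of the inverse gamma distribution (shape a = 3, scale b = 0.003), under which the mean is b/(a − 1) and the mode b/(a + 1); whether BEST parameterizes it identically is an assumption here. Random draws are obtained as reciprocals of gamma variates, and the function names are hypothetical.

```python
import random
import statistics


def inverse_gamma_summary(a: float, b: float) -> dict:
    """Mean (a > 1), variance (a > 2) and mode of an inverse gamma with shape a, scale b."""
    return {
        "mean": b / (a - 1),
        "variance": b ** 2 / ((a - 1) ** 2 * (a - 2)),
        "mode": b / (a + 1),
    }


def sample_inverse_gamma(a: float, b: float, n: int, seed: int = 0) -> list:
    """If X ~ Gamma(shape a, rate b) then 1/X ~ InvGamma(shape a, scale b)."""
    rng = random.Random(seed)
    # random.gammavariate(alpha, beta) uses beta as a *scale*, so rate b -> scale 1/b.
    return [1.0 / rng.gammavariate(a, 1.0 / b) for _ in range(n)]


theta = inverse_gamma_summary(3.0, 0.003)  # population-size prior
draws = sample_inverse_gamma(3.0, 0.003, 100_000)
print({k: round(v, 7) for k, v in theta.items()})
print("Monte Carlo mean:", round(statistics.mean(draws), 6))
print("central 95% of prior:", [round(q, 5) for q in statistics.quantiles(draws, n=40)[::38]])
print("Uniform(0.5, 1.5) mutation-rate prior: mean", (0.5 + 1.5) / 2)
```

Under this parameterization the population-size prior is centred near 0.0015 but right-skewed, while the mutation-rate prior is centred on 1.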
3. Results
3.1. Conventional phylogenetic analyses
No recombination events were detected for any of the seven
loci. All six nuclear loci exhibited few informative characters (Table 1), which led to unresolved gene trees in some cases (e.g. β-Fib 5 and MyO; Fig. 1c and d). However, several gene trees recovered clades with good branch support (Bayesian posterior probabilities >0.95 and maximum likelihood bootstrap values >85%). Some of these suggested paraphyly of Buarremon (β-Fib 7 and MUSK; Fig. 1e and f), whereas others supported its monophyly (TGF and ACOI; Fig. 1g and h). The mitochondrial locus (ND2) also supported the monophyly of Buarremon (Fig. 1b), a result contrary to that obtained by Cadena et al. (2007; Fig. 1a) for the same locus with a more extensive sample of individuals, but this result was not strongly supported in our analyses (see below). Some of the loci that supported the monophyly of Buarremon did so in different ways (e.g. Lysurus was found as sister to Buarremon in the ACOI tree, whereas TGF placed Arremon and Buarremon as sister groups) and with varying support (0.87–1.00 posterior probability and 63–99% ML bootstrap). Likewise, gene genealogies that recovered a paraphyletic Buarremon (β-Fib 7 and MUSK) differed topologically. Despite the variation in gene-tree topologies across loci, a partition homogeneity test did not reveal significant inconsistency in phylogenetic signal across loci. The concatenated and partitioned analyses using data from all loci showed strong support for the monophyly of Buarremon, but recovered different relationships of Buarremon with Lysurus and Arremon (Fig. 1i and j).
3.2. Species tree analysis
Buarremon was consistently recovered as a monophyletic group
using the three methodologies for species-tree reconstruction. In
particular, the species tree obtained by minimizing deep coalescences was the topology that placed a clade formed by L. castaneiceps and A. schlegeli as sister to a monophyletic Buarremon (six
deep coalescence events), followed by the topology that placed A.
schlegeli sister to a monophyletic Buarremon (seven deep coalescence events; Table 2). In contrast, the topology showing Buarremon as a paraphyletic group (Cadena et al., 2007) required 11
deep coalescence events. The topology with A. schlegeli sister to a monophyletic Buarremon was the species tree with the highest average probability of the gene genealogies under all RIT settings in COAL (the highest value was obtained with RIT 1:1 and branch lengths of 1.0; Table 3). This topology was also the species tree
reconstructed using the Bayesian analysis in BEST, which supported the monophyly of Buarremon with a posterior probability
of 0.82 (Fig. 2).
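The "highest average probability" criterion used with COAL amounts to a simple calculation on Table 3. The sketch below transcribes the RIT 1:1, branch length 1.0 values from that table (columns in the order printed in its header) and ranks the 15 species trees by their mean gene-tree probability across the five loci; the names are hypothetical.

```python
# Gene-tree probabilities from Table 3 (RIT 1:1, branch lengths 1.0); each key is a
# candidate species tree, and values follow the table header: MyO, bFib7, ACOI, TGF, ND2.
TABLE3 = {
    "(((A sch,(B bru,B tor)),L cas),A lat)": [0.057, 0.001, 0.063, 0.408, 0.063],
    "(((B tor,(B bru,A sch)),L cas),A lat)": [0.057, 0.001, 0.013, 0.057, 0.014],
    "(((B bru,(B tor,A sch)),L cas),A lat)": [0.408, 0.001, 0.013, 0.057, 0.014],
    "(((L cas,(B tor,B bru)),A sch),A lat)": [0.001, 0.001, 0.408, 0.063, 0.063],
    "(((B tor,(L cas,B bru)),A sch),A lat)": [0.001, 0.001, 0.057, 0.013, 0.014],
    "(((B bru,(L cas,B tor)),A sch),A lat)": [0.001, 0.001, 0.057, 0.013, 0.014],
    "(((L cas,(B bru,A sch)),B tor),A lat)": [0.001, 0.057, 0.001, 0.001, 0.014],
    "(((A sch,(B bru,L cas)),B tor),A lat)": [0.001, 0.057, 0.001, 0.001, 0.014],
    "(((B bru,(A sch,L cas)),B tor),A lat)": [0.001, 0.408, 0.001, 0.001, 0.063],
    "(((L cas,(A sch,B tor)),B bru),A lat)": [0.063, 0.013, 0.001, 0.001, 0.014],
    "(((A sch,(L cas,B tor)),B bru),A lat)": [0.013, 0.013, 0.001, 0.001, 0.014],
    "(((B tor,(L cas,A sch)),B bru),A lat)": [0.013, 0.063, 0.001, 0.001, 0.063],
    "(((L cas,A sch),(B tor,B bru)),A lat)": [0.004, 0.054, 0.054, 0.054, 0.410],
    "(((L cas,B tor),(A sch,B bru)),A lat)": [0.004, 0.004, 0.004, 0.004, 0.009],
    "(((L cas,B bru),(A sch,B tor)),A lat)": [0.054, 0.004, 0.004, 0.004, 0.009],
}


def rank_by_average(table):
    """Species trees sorted by mean gene-tree probability across loci (highest first)."""
    averages = {tree: sum(p) / len(p) for tree, p in table.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


for tree, avg in rank_by_average(TABLE3)[:3]:
    print(f"{avg:.4f}  {tree}")
```

The averaging follows the text; because loci are usually treated as independent, multiplying probabilities across loci (equivalently, summing log-probabilities) is another possible summary and need not produce the same ranking.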
4. Discussion
The monophyly of Buarremon as traditionally defined was questioned by Cadena et al. (2007) on the basis of phylogenetic analyses of sequences of four mitochondrial loci. This result was partly inconsistent with one of the nuclear gene trees they reported, but not with the other. Using sequence data from additional loci collected from a small number of individuals, we found that other nuclear genes recovered topologies that differed from those reported by these authors, some of them with good support.
Because there is only one species history, the variation in gene
genealogies demonstrates that in Buarremon and near relatives,
phylogenetic reconstructions based on single-locus data sets can
result in erroneous inferences of evolutionary relationships.
Incongruence between gene-tree topologies is often an indication of incomplete lineage sorting resulting from rapid radiations,
a pattern frequently seen in phylogenies where internal branches
are short in comparison to terminal branches (Degnan and Rosenberg, 2009). However, in the case of Buarremon and allies, the results of COAL show the highest average probability for a topology
lacking short internodes (i.e. RIT of 1:1 and branch lengths of
1.0; Table 3). This suggests that a possible source of incongruence
between gene genealogies in our study system could be the existence of populations with large effective sizes, in which ancestral
polymorphism was retained despite relatively long intervals between speciation events.
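The role of effective population size in this argument can be illustrated by converting an internode measured in generations into coalescent units. The sketch below assumes a diploid locus, so T generations correspond to t = T/(2Ne) coalescent units, and reuses the three-taxon result that a gene tree is discordant with probability (2/3)e^(−t); the generation count and Ne values are hypothetical and are not estimates for Buarremon.

```python
import math


def coalescent_units(generations: float, ne: float, copies_per_individual: int = 2) -> float:
    """Branch length in coalescent units (units of copies_per_individual * Ne generations)."""
    return generations / (copies_per_individual * ne)


def p_discordant(t: float) -> float:
    """Probability that a three-taxon gene tree disagrees with the species tree."""
    return 2.0 / 3.0 * math.exp(-t)


INTERNODE_GENERATIONS = 200_000  # hypothetical duration between speciation events
for ne in (10_000, 100_000, 500_000):  # hypothetical effective population sizes
    t = coalescent_units(INTERNODE_GENERATIONS, ne)
    print(f"Ne = {ne:>7,}: t = {t:5.2f}, P(discordant gene tree) = {p_discordant(t):.3f}")
```

The same internode that is essentially free of incomplete lineage sorting when Ne is small becomes a frequent source of discordance when Ne is large.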
In contrast to the results of the earlier study based on conventional phylogenetic analyses of a few genes (Cadena et al., 2007), three different methods for species-tree reconstruction from multilocus data consistently recovered Buarremon as a monophyletic group in the present study. Therefore, the mitochondrial DNA topology obtained by Cadena et al. (2007) might be an anomalous gene tree or might incorrectly reflect relationships owing to stochastic sorting along short branches deep in the tree (Degnan and Rosenberg, 2006; Degnan et al., 2009), and the true species tree could be one that is consistent with traditional taxonomy based on external phenotypic similarity. However, the relationships among our ingroup taxa varied among approaches and support was highly variable (see below), demonstrating the difficulty of resolving the phylogeny of this group with certainty.

Table 2
Coalescence cost required to reconcile gene-tree topologies with the fifteen possible species trees.
Species tree                            MyO    bFib5  MUSK   bFib7  ACOI   TGF    ND2    Total score
                                        ML BI  ML BI  ML BI  ML BI  ML BI  ML BI  ML BI  ML  BI
(((A sch,(B bru,B tor)),L cas),A lat)   0  0   0  0   3  3   2  2   1  1   0  0   1  1   7   7
(((B tor,(B bru,A sch)),L cas),A lat)   0  0   0  0   3  3   2  2   2  2   1  1   2  2   10  10
(((B bru,(B tor,A sch)),L cas),A lat)   0  0   0  0   3  3   1  1   2  2   1  1   2  2   9   9
(((L cas,(B tor,B bru)),A sch),A lat)   0  0   1  1   3  3   3  3   0  0   1  1   1  1   9   9
(((B tor,(L cas,B bru)),A sch),A lat)   0  0   2  2   3  3   3  3   1  1   2  2   2  2   13  13
(((B bru,(L cas,B tor)),A sch),A lat)   0  0   2  2   3  3   3  3   1  1   2  2   2  2   13  13
(((L cas,(B bru,A sch)),B tor),A lat)   0  0   1  1   1  1   3  3   3  3   3  3   2  1   13  12
(((A sch,(B bru,L cas)),B tor),A lat)   0  0   2  2   1  1   3  3   3  3   3  3   2  1   14  13
(((B bru,(A sch,L cas)),B tor),A lat)   0  0   2  2   0  0   3  3   3  3   3  3   1  0   12  11
(((L cas,(A sch,B tor)),B bru),A lat)   0  0   1  1   2  2   0  0   3  3   3  3   2  1   11  10
(((A sch,(L cas,B tor)),B bru),A lat)   0  0   2  2   2  2   1  1   3  3   3  3   2  1   13  12
(((B tor,(L cas,A sch)),B bru),A lat)   0  0   2  2   1  1   1  1   3  3   3  3   1  0   11  10
(((L cas,A sch),(B tor,B bru)),A lat)   0  0   1  1   1  1   2  2   1  1   1  1   0  0   6   6
(((L cas,B tor),(A sch,B bru)),A lat)   0  0   1  1   2  2   2  2   2  2   2  2   2  2   11  11
(((L cas,B bru),(A sch,B tor)),A lat)   0  0   1  1   2  2   1  1   2  2   2  2   2  2   10  10
Bold numbers indicate the lowest values of coalescence cost for each genealogy.
Total is the sum across genealogies for each species tree.
Abbreviations: A sch (Arremon schlegeli); B tor (Buarremon torquatus); B bru (Buarremon brunneinucha); L cas (Lysurus castaneiceps) and A lat (Atlapetes latinuchus).

Table 3
Probabilities of gene trees given the 15 possible species trees with RIT of 1:1 and branch lengths of 1.0.
Species tree                            MyO    bFib7  ACOI   TGF    ND2    Average
(((A sch,(B bru,B tor)),L cas),A lat)   0.057  0.001  0.063  0.408  0.063  0.118
(((B tor,(B bru,A sch)),L cas),A lat)   0.057  0.001  0.013  0.057  0.014  0.028
(((B bru,(B tor,A sch)),L cas),A lat)   0.408  0.001  0.013  0.057  0.014  0.098
(((L cas,(B tor,B bru)),A sch),A lat)   0.001  0.001  0.408  0.063  0.063  0.107
(((B tor,(L cas,B bru)),A sch),A lat)   0.001  0.001  0.057  0.013  0.014  0.017
(((B bru,(L cas,B tor)),A sch),A lat)   0.001  0.001  0.057  0.013  0.014  0.017
(((L cas,(B bru,A sch)),B tor),A lat)   0.001  0.057  0.001  0.001  0.014  0.014
(((A sch,(B bru,L cas)),B tor),A lat)   0.001  0.057  0.001  0.001  0.014  0.014
(((B bru,(A sch,L cas)),B tor),A lat)   0.001  0.408  0.001  0.001  0.063  0.095
(((L cas,(A sch,B tor)),B bru),A lat)   0.063  0.013  0.001  0.001  0.014  0.018
(((A sch,(L cas,B tor)),B bru),A lat)   0.013  0.013  0.001  0.001  0.014  0.008
(((B tor,(L cas,A sch)),B bru),A lat)   0.013  0.063  0.001  0.001  0.063  0.028
(((L cas,A sch),(B tor,B bru)),A lat)   0.004  0.054  0.054  0.054  0.410  0.115
(((L cas,B tor),(A sch,B bru)),A lat)   0.004  0.004  0.004  0.004  0.009  0.005
(((L cas,B bru),(A sch,B tor)),A lat)   0.054  0.004  0.004  0.004  0.009  0.015
Bold numbers indicate the highest values of probability.
Abbreviations: A sch (Arremon schlegeli); B tor (Buarremon torquatus); B bru (Buarremon brunneinucha); L cas (Lysurus castaneiceps) and A lat (Atlapetes latinuchus).

[Figure 2: BEST species tree ((((B. brunneinucha, B. torquatus), A. schlegeli), L. castaneiceps), A. latinuchus), with node posterior probabilities (0.82 for Buarremon).]
Fig. 2. Species tree reconstructed with BEST. Posterior probability values on nodes were obtained in an analysis of 200,000,000 generations.
An important difference between the design of our study and
that of Cadena et al. (2007), in addition to the number of loci used,
is the number of individuals sequenced per locus. Cadena et al.
(2007) generated sequences of one mitochondrial gene (ND2) for
238 samples, including all the subspecies of B. brunneinucha, 13
of 14 subspecies of B. torquatus, eight individuals of B. virenticeps
and one individual for three species of Arremon and two of Lysurus.
In addition, they sequenced three mitochondrial loci (cyt b, ATPase
6, ATPase 8) for 43 individuals, and two nuclear loci (ACOI and
MUSK) for 22 individuals. In contrast, we analyzed sequences of seven genes for a single individual per species, which we considered justifiable because the earlier study recovered all the lineages these individuals represent as strongly supported monophyletic groups that have evolved in isolation for a considerable period of time (Cadena et al., 2007). In scenarios such
as this one, the number of individuals per taxon included in analyses is not expected to strongly influence phylogenetic reconstructions (Leaché, 2009). In addition, because all the information
required to infer species trees is contained in the pattern of gene
lineage coalescence across multiple loci, adding loci rather than
individuals is more likely to improve phylogenetic accuracy when
reciprocal monophyly has been achieved (Maddison and Knowles,
2006; Knowles, 2010). We note that because several loci had limited variation, some researchers would advise excluding them from analyses and replacing them with more variable loci. However, the influence of data quality on the accuracy of species-tree
methods remains to be fully examined (Huang et al., 2010), and
removing loci with few informative characters from species-tree
analyses is not recommended because this could lead to ascertainment bias in the estimation of effective population sizes (Wakeley
et al., 2001), a parameter on which methods based on coalescence
(e.g. BEST) rely to evaluate the probability of alternative species
trees (Knowles, 2010).
Our analyses suggest that sampling density has an important effect on the conventional reconstruction of gene genealogies: one of the most surprising results of our analyses is that the ND2 topology we recovered (which was weakly supported; Fig. 1b) was incongruent with the topology reconstructed by Cadena et al. (2007) for this same gene and for a concatenated set of four mitochondrial loci. However, when we analyzed the five sequences considered in this study together with the ND2 data of Cadena et al. (2007), the results (tree not shown) matched those of the earlier study (i.e. supported the paraphyly of Buarremon) and not those recovered here for the five sequences alone. One explanation for this result could be that parameter estimation for model-based analyses is difficult with a small sample of sequences and can thus lead to erroneous conclusions (Huelsenbeck et al., 1994; Lewis, 1998). In any event, this unexpected result suggests that limited sampling may introduce bias in analyses. Moreover, because ND2 is the most variable locus in our data set, multilocus analyses are likely to be strongly influenced by variation in this gene, and their results should be interpreted cautiously.
Phylogenetics is moving from the individual reconstruction of gene genealogies to a new paradigm where the explicit reconstruction of species trees is a fundamental aim (Edwards et al., 2007;
Knowles, 2009). Amid this transition, empirical analyses of real
data obtained from non-model organisms, such as those presented
in this study, are valuable to highlight the promises of new approaches and to help identify potential pitfalls in their application.
In addition to the issues related to number of individuals sampled
which we described above, our analyses show that the number of
loci suggested as sufficient to resolve species trees with accuracy
under some simulation conditions (e.g. eight loci; Edwards et al., 2007) might be insufficient in cases such as the radiation of Buarremon and allies (see also Brumfield et al., 2008). Our analyses also
demonstrate that the same data can lead to different inferences
of evolutionary relationships when analyzed with different methods for species-tree reconstruction. Although all three methods
for species-tree reconstruction consistently recovered Buarremon
as a monophyletic group (Fig. 2, Tables 2 and 3), relationships
among taxa varied across approaches. Specifically, MDC
recovered a species tree in which Lysurus and Arremon form a clade
that is sister to a monophyletic Buarremon, whereas BEST and COAL
placed Buarremon as sister to Arremon, with Lysurus sister to the
clade formed by these two. We note, however, that these topologies differ only slightly in the number of deep coalescent events they require (6 vs. 7), owing to the small number of individuals in the sample. Other studies have reported
different results using different methods for species-tree reconstruction on the same data set (McCormack et al., 2009), implying
(1) that more work involving simulations and analyses of empirical
data is necessary to better understand the sources of discrepancy
between alternative approaches that use data in different ways,
and (2) that users of methods for species-tree reconstruction
should examine the robustness of their inferences across different
analytical approaches.
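The MDC criterion compared above scores a candidate species tree by the number of extra gene lineages (deep coalescences) that each gene tree requires when fitted into it, summed across loci (cf. Maddison, 1997). The following Python sketch illustrates that count under stated assumptions: trees are written as nested tuples, gene-tree nodes are placed by least-common-ancestor mapping, and the function names and the three gene trees are hypothetical toy inputs, not the data or software used in this study. The two candidate species trees are reduced to genus level following the MDC and BEST/COAL topologies described in the text.

```python
from itertools import chain


def leaves(tree):
    """Leaf labels of a tree written as nested tuples of strings."""
    if isinstance(tree, str):
        return [tree]
    return list(chain.from_iterable(leaves(child) for child in tree))


def clades(tree):
    """Species set below every node of the species tree, root first."""
    found = [frozenset(leaves(tree))]
    if not isinstance(tree, str):
        for child in tree:
            found.extend(clades(child))
    return found


def edges(tree, parent=None):
    """Yield (node, parent) pairs for every node of a gene tree."""
    yield tree, parent
    if not isinstance(tree, str):
        for child in tree:
            yield from edges(child, tree)


def deep_coalescences(species_tree, gene_tree, species_of):
    """Extra lineages for one gene tree fitted into a species tree.

    Each gene-tree node is placed (LCA mapping) on the most recent species
    clade that contains all species sampled below it.
    """
    sp_clades = clades(species_tree)
    root = sp_clades[0]

    def mapped(node):
        species = frozenset(species_of(label) for label in leaves(node))
        # Smallest species clade containing every species under this node.
        return min((c for c in sp_clades if species <= c), key=len)

    placed = [(mapped(n), None if p is None else mapped(p))
              for n, p in edges(gene_tree)]
    total = 0
    for clade in sp_clades:
        if clade == root:
            continue  # lineages above the root always coalesce eventually
        # Gene lineages leaving the top of this branch: child placed at or
        # below the branch, parent placed strictly above it (or absent).
        k = sum(1 for low, high in placed
                if low <= clade and (high is None or not high <= clade))
        total += max(k - 1, 0)
    return total


def species_of(label):
    # Hypothetical labelling scheme: "<Genus>_<copy number>".
    return label.split("_")[0]


# Hypothetical gene trees, one per toy locus (NOT the study's data).
loci = [
    (("Lysurus_1", "Arremon_1"), "Buarremon_1"),
    (("Lysurus_2", "Arremon_2"), "Buarremon_2"),
    (("Buarremon_3", "Arremon_3"), "Lysurus_3"),
]

mdc_topology = (("Lysurus", "Arremon"), "Buarremon")
best_coal_topology = (("Buarremon", "Arremon"), "Lysurus")

for name, sp_tree in [("MDC-style", mdc_topology),
                      ("BEST/COAL-style", best_coal_topology)]:
    score = sum(deep_coalescences(sp_tree, g, species_of) for g in loci)
    print(name, score)  # prints "MDC-style 1", then "BEST/COAL-style 2"
```

With these toy loci the MDC-style topology needs one extra lineage and the BEST/COAL-style topology needs two, so MDC would prefer the former; as with the 6 vs. 7 events reported above, a single deep coalescence can separate competing topologies. A full MDC analysis would also search over candidate species trees rather than scoring two fixed ones.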
In conclusion, although Buarremon was consistently reconstructed as a monophyletic group by our multilocus approach, considerable incongruence was observed across analyses, indicating
the difficulty of achieving phylogenetic resolution in this group.
Adding molecular data in the form of additional sequences from
individual loci (Spinks et al., 2009) or of genome-wide surveys
of SNP variation generated with high-throughput technologies
(cf. Decker et al., 2009) may allow for increased resolution in the
future. Given the present uncertainty, however, we recommend that the traditional genera Buarremon, Arremon and Lysurus
be included in an expanded genus Arremon, as suggested by
Cadena et al. (2007); the monophyly of this expanded genus seems clear.
Acknowledgments
This study was financed by the Facultad de Ciencias at Universidad de los Andes. Tissue samples were provided by the Instituto
Alexander von Humboldt (IAvH) and the Banco de Tejidos of the Museo de Historia Natural, Universidad de los Andes (ANDES-BT). We
thank N. Gutiérrez, C. Salazar, E. Valderrama, F. Velásquez, and
other members of the Laboratorio de Biología Evolutiva de Vertebrados for assistance with software and for discussions at several stages of this project. Comments from J. McCormack and two
anonymous reviewers greatly improved this manuscript. AFR
would like to especially thank D.R. Rodríguez, L. Florez, and K.P.
Florez, who made a special effort to support him.
References
Brito, P., Edwards, S.V., 2009. Multilocus phylogeography and phylogenetics using
sequence-based markers. Genetica 135, 439–455.
Brumfield, R.T., Liu, L., Lum, D.E., Edwards, S.V., 2008. Comparison of species tree
methods for reconstructing the phylogeny of bearded manakins (Aves: Pipridae,
Manacus) from multilocus sequence data. Syst. Biol. 57, 719–731.
Cadena, C.D., Cuervo, A.M., 2010. Molecules, morphology, ecology, and songs in
concert: How many species is ‘‘Arremon torquatus’’ (Aves, Emberizidae)? Biol. J.
Linn. Soc. 99, 152–176.
Cadena, C.D., Klicka, J., Ricklefs, R.E., 2007. Evolutionary differentiation in the
Neotropical montane region: molecular phylogenetics and phylogeography of
Buarremon brush-finches (Aves, Emberizidae). Mol. Phylogenet. Evol. 44, 993–1016.
Carling, M.D., Brumfield, R.T., 2008. Integrating phylogenetic and population genetic
analyses of multiple loci to test species divergence hypotheses in Passerina
buntings. Genetics 178, 363–377.
Carstens, B.C., Knowles, L.L., 2007. Estimating phylogeny from gene tree
probabilities in Melanoplus grasshoppers despite incomplete lineage sorting.
Syst. Biol. 56, 400–411.
Cummings, M.P., 1994. Transmission patterns of eukaryotic transposable elements:
arguments for and against horizontal transfer. Trends Ecol. Evol. 9, 141–145.
De Queiroz, A., 1993. For consensus (sometimes). Syst. Biol. 42, 368–372.
Decker, J.E., Pires, J.C., Conant, G.C., McKay, S.D., Heaton, M.P., Chen, K.C., Cooper, A.,
Vilkki, J., Seabury, C.M., Caetano, A.R., Johnson, G.S., Brenneman, R.A., Hanotte,
O., Eggert, L.S., Wiener, P., Kim, J., Kim, K.S., Sonstegard, T.S., Van Tassell, C.P.,
Neibergs, H.L., McEwan, J.C., Brauning, R., Coutinho, L.L., Babar, M.E., Wilson,
G.A., McClure, M.C., Rolf, M.M., Kim, J.W., Schnabel, R.D., Taylor, J.F., 2009.
Resolving the evolution of extant and extinct ruminants with high-throughput
phylogenomics. Proc. Natl. Acad. Sci. USA 106, 18644–18649.
Degnan, J.H., Rosenberg, N.A., 2006. Discordance of species trees with their most
likely gene trees. PLoS Genet. 2, 762–768.
Degnan, J.H., Rosenberg, N.A., 2009. Gene tree discordance, phylogenetic inference,
and the multispecies coalescent. Trends Ecol. Evol. 24, 332–340.
Degnan, J.H., Salter, L.A., 2005. Gene tree distributions under the coalescent process.
Evolution 59, 24–37.
Degnan, J.H., DeGiorgio, M., Bryant, D., Rosenberg, N.A., 2009. Properties of
consensus methods for inferring species trees from gene trees. Syst. Biol. 58,
35–54.
Drummond, A.J., Ashton, B., Cheung, M., Heled, J., Kearse, M., Moir, R., Stones-Havas,
S., Thierer, T., Wilson, A., 2007. Geneious v3.0. <http://www.geneious.com/>.
Edgar, R.C., 2004. MUSCLE: multiple sequence alignment with high accuracy and
high throughput. Nucleic Acids Res. 32, 1792–1797.
Edwards, S.V., Liu, L., Pearl, D.K., 2007. High-resolution species trees without
concatenation. Proc. Natl. Acad. Sci. USA 104, 5936–5941.
Fitch, W.M., 1970. Distinguishing homologous from analogous proteins. Syst. Zool.
19, 99–113.
Gadagkar, S.R., Rosenberg, M.S., Kumar, S., 2005. Inferring species phylogenies from
multiple genes: concatenated sequence tree versus consensus gene tree. J. Exp.
Zool. 304B, 64–74.
Hird, S., Kubatko, L.S., Carstens, B.C., 2010. Rapid and accurate species tree
estimation for phylogeographic investigations using replicated subsampling.
Mol. Phylogenet. Evol. 57, 888–898.
Huang, H., He, Q., Kubatko, L.S., Knowles, L.L., 2010. Sources of error inherent in
species-tree estimation: impact of mutational and coalescent effects on
accuracy and implications for choosing among different methods. Syst. Biol.
59, 573–583.
Huelsenbeck, J.P., Swofford, D.L., Cunningham, C.W., Bull, J.J., Waddell, P., 1994. Is
character weighting a panacea for the problem of data heterogeneity in
phylogenetic analysis? Syst. Biol. 43, 288–291.
Jennings, W.B., Edwards, S.V., 2005. Speciational history of Australian grass finches
(Poephila) inferred from 30 gene trees. Evolution 59, 2033–2047.
Kimball, R.T., Braun, E.L., Barker, F.K., Bowie, R.C.K., Braun, M.J., Chojnowski, J.L.,
Hackett, S.J., Han, K.L., Harshman, J., Heimer-Torres, V., Holznagel, W.,
Huddleston, C.J., Marks, B.D., Miglia, K.J., Moore, W.S., Reddy, S., Sheldon, F.H.,
Smith, J.V., Witt, C.C., Yuri, T., 2009. A well-tested set of primers to amplify
regions spread across the avian genome. Mol. Phylogenet. Evol. 50, 654–660.
Knowles, L.L., 2009. Estimating species trees: methods of phylogenetic analysis
when there is incongruence across genes. Syst. Biol. 58, 463–467.
Knowles, L.L., 2010. Sampling strategies for species-tree estimation. In: Knowles,
L.L., Kubatko, L.S. (Eds.), Estimating Species Trees: Practical and Theoretical
Aspects. Wiley-Blackwell, pp. 163–173.
Knowles, L.L., Kubatko, L.S. (Eds.), 2010. Estimating Species Trees: Practical and Theoretical
Aspects. Wiley-Blackwell.
Leaché, A.D., 2009. Species tree discordance traces to phylogeographic clade
boundaries in North American fence lizards (Sceloporus). Syst. Biol. 58, 547–559.
Lewis, P.O., 1998. Maximum likelihood as an alternative to parsimony for inferring
phylogeny using nucleotide sequence data. In: Soltis, D.E., Soltis, P.S., Doyle, J.J.
(Eds.), Molecular Systematics of Plants II. Kluwer, Boston, pp. 132–163.
Linnen, C., Farrell, B., 2008. Comparison of methods for species-tree inference in the
sawfly genus Neodiprion (Hymenoptera: Diprionidae). Syst. Biol. 57, 876–890.
Liu, L., Edwards, S.V., 2009. Phylogenetic analysis in the anomaly zone. Syst. Biol. 58,
452–460.
Liu, L., Pearl, D., 2007. Species tree from gene trees: reconstructing Bayesian
posterior distribution of species phylogeny using estimated gene tree
distributions. Syst. Biol. 56, 504–514.
Liu, L., Pearl, D., Brumfield, R., Edwards, S.V., 2008. Estimating species trees using
multiple-allele DNA sequence data. Evolution 62, 2080–2091.
Maddison, W.P., 1997. Gene trees in species trees. Syst. Biol. 46, 523–536.
Maddison, W.P., Knowles, L.L., 2006. Inferring phylogeny despite incomplete lineage
sorting. Syst. Biol. 55, 21–30.
Maddison, W.P., Maddison, D.R., 2004. Mesquite: A Modular System for
Evolutionary Analysis, Version 1.01. <http://mesquiteproject.org>.
McCormack, J.E., Huang, H., Knowles, L.L., 2009. Maximum-likelihood estimates of
species trees: how accuracy of phylogenetic inference depends upon the
divergence history and sampling design. Syst. Biol. 58, 501–508.
Meng, C., Kubatko, L.S., 2009. Detecting hybrid speciation in the presence of
incomplete lineage sorting using gene tree incongruence: a model. Theor. Pop.
Biol. 75, 35–45.
Nichols, R., 2001. Gene trees and species trees are not the same. Trends Ecol. Evol.
16, 358–364.
Nylander, J.A.A., 2004. MrModeltest v2. Program Distributed by the Author.
Evolutionary Biology Centre, Uppsala University.
Pamilo, P., Nei, M., 1988. Relationships between gene trees and species trees. Mol.
Biol. Evol. 5, 568–583.
Philippe, H., Derelle, R., Lopez, P., Pick, K., Borchiellini, C., Boury-Esnault, N., Vacelet,
J., Renard, E., Houliston, E., Queinnec, E., Da Silva, C., Wincker, P., Le Guyader, H.,
Leys, S., Jackson, D.J., Schreiber, F., Erpenbeck, D., Morgenstern, B., Wörheide, G.,
Manuel, M., 2009. Phylogenomics revives traditional views on deep animal
relationships. Curr. Biol. 19, 706–712.
Piganeau, G.V., Gardner, M., Eyre-Walker, A., 2004. A broad survey of recombination
in animal mitochondria. Mol. Biol. Evol. 21, 2319–2325.
Posada, D., Crandall, K.A., 1998. Modeltest: testing the model of DNA substitution.
Bioinformatics 14, 817–818.
Rambaut, A., Drummond, A., 2007. Tracer v1.4: MCMC Trace Analysis Tool.
Remsen Jr., J.V., Cadena, C.D., Jaramillo, A., Nores, M., Pacheco, J.F., Robbins, M.B.,
Schulenberg, T.S., Stiles, F.G., Stotz, D.F., Zimmer, K.J., 2010. A Classification of
the Bird Species of South America. American Ornithologists’ Union. <http://www.museum.lsu.edu/~Remsen/SACCBaseline.html>.
Rokas, A., King, N., Finnerty, J., Carroll, S.B., 2003. Conflicting phylogenetic signals at
the base of the metazoan tree. Evol. Dev. 5, 346–359.
Ronquist, F., Huelsenbeck, J.P., 2003. MRBAYES 3: Bayesian phylogenetic inference
under mixed models. Bioinformatics 19, 1572–1574.
Slade, R.W., Moritz, C., Heideman, A., 1993. Rapid assessment of single-copy nuclear
DNA variation in diverse species. Mol. Ecol. 2, 359–373.
Sorenson, M.D., Ast, J.C., Dimcheff, D.E., Yuri, T., Mindell, D.P., 1999. Primers for a
PCR-based approach to mitochondrial genome sequencing in birds and other
vertebrates. Mol. Phylogenet. Evol. 12, 105–114.
Spinks, P.Q., Thomson, R.C., Lovely, G.A., Shaffer, H.B., 2009. Assessing what is
needed to resolve a molecular phylogeny: simulations and empirical data from
emydid turtles. BMC Evol. Biol. 9, 56.
Swofford, D.L., 2002. PAUP*: Phylogenetic Analysis Using Parsimony (*and Other
Methods), Version 4. Sinauer Associates, Sunderland, MA.
Takahata, N., 1989. Gene genealogy in three related populations: consistency
probability between gene and population trees. Genetics 122, 957–966.
Wakeley, J., Nielsen, R., Liu-Cordero, S.N., Ardlie, K., 2001. The discovery of single-nucleotide polymorphisms and inferences about human demographic history.
Am. J. Hum. Genet. 69, 1332–1347.
Waters, J.M., Rowe, D.L., Burridge, C.P., Wallis, G.P., 2010. Gene trees versus species
trees: reassessing life-history evolution in a freshwater fish radiation. Syst. Biol.
59, 504–517.
Wilgenbusch, J.C., Warren, D.L., Swofford, D.L., 2004. AWTY: A System for Graphical
Exploration of MCMC Convergence in Bayesian Phylogenetic Inference. <http://ceb.csit.fsu.edu/awt>.