Allan Wilson Centre
Facility, Palmerston North, New Zealand
Research output, citation impact, and the most-cited recent papers from Allan Wilson Centre (New Zealand). Aggregated across the NobleBlocks index of 300M+ scholarly works.
Top-cited papers from Allan Wilson Centre
Computational evolutionary biology, statistical phylogenetics and coalescent-based population genetics are becoming increasingly central to the analysis and understanding of molecular sequence data. We present the Bayesian Evolutionary Analysis by Sampling Trees (BEAST) software package version 1.7, which implements a family of Markov chain Monte Carlo (MCMC) algorithms for Bayesian phylogenetic inference, divergence time dating, coalescent analysis, phylogeography and related molecular evolutionary analyses. This package includes an enhanced graphical user interface program called Bayesian Evolutionary Analysis Utility (BEAUti) that enables access to advanced models for molecular sequence and phenotypic trait evolution that were previously available to developers only. The package also provides new tools for visualizing and summarizing multispecies coalescent and phylogeographic analyses. BEAUti and BEAST 1.7 are open source under the GNU lesser general public license and available at http://beast-mcmc.googlecode.com and http://beast.bio.ed.ac.uk.
We present a new open source, extensible and flexible software platform for Bayesian evolutionary analysis called BEAST 2. This software platform is a re-design of the popular BEAST 1 platform to correct structural deficiencies that became evident as the BEAST 1 software evolved. Key among those deficiencies was the lack of post-deployment extensibility. BEAST 2 now has a fully developed package management system that allows third party developers to write additional functionality that can be directly installed to the BEAST 2 analysis platform via a package manager without requiring a new software release of the platform. This package architecture is showcased with a number of recently published new models encompassing birth-death-sampling tree priors, phylodynamics and model averaging for substitution models and site partitioning. A second major improvement is the ability to read/write the entire state of the MCMC chain to/from disk allowing it to be easily shared between multiple instances of the BEAST software. This facilitates checkpointing and better support for multi-processor and high-end computing extensions. Finally, the functionality in new packages can be easily added to the user interface (BEAUti 2) by a simple XML template-based mechanism because BEAST 2 has been re-designed to provide greater integration between the analysis engine and the user interface so that, for example, BEAST and BEAUti use exactly the same XML file format.
Until recently, it has been common practice for a phylogenetic analysis to use a single gene sequence from a single individual organism as a proxy for an entire species. With technological advances, it is now becoming more common to collect data sets containing multiple gene loci and multiple individuals per species. These data sets often reveal the need to directly model intraspecies polymorphism and incomplete lineage sorting in phylogenetic estimation procedures. For a single species, coalescent theory is widely used in contemporary population genetics to model intraspecific gene trees. Here, we present a Bayesian Markov chain Monte Carlo method for the multispecies coalescent. Our method coestimates multiple gene trees embedded in a shared species tree along with the effective population size of both extant and ancestral species. The inference is made possible by multilocus data from multiple individuals per species. Using a multiindividual data set and a series of simulations of rapid species radiations, we demonstrate the efficacy of our new method. These simulations give some insight into the behavior of the method as a function of sampled individuals, sampled loci, and sequence length. Finally, we compare our new method to both an existing method (BEST 2.2) with similar goals and the supermatrix (concatenation) method. We demonstrate that both BEST and our method have much better estimation accuracy for species tree topology than concatenation, and our method outperforms BEST in divergence time and population size estimation.
PHYML Online is a web interface to PHYML, a software that implements a fast and accurate heuristic for estimating maximum likelihood phylogenies from DNA and protein sequences. This tool provides the user with a number of options, e.g. nonparametric bootstrap and estimation of various evolutionary parameters, in order to perform comprehensive phylogenetic analyses on large datasets in reasonable computing time. The server and its documentation are available at http://atgc.lirmm.fr/phyml.
The multispecies coalescent provides an elegant theoretical framework for estimating species trees and species demographics from genetic markers. However, practical applications of the multispecies coalescent model are limited by the need to integrate or sample over all gene trees possible for each genetic marker. Here we describe a polynomial-time algorithm that computes the likelihood of a species tree directly from the markers under a finite-sites model of mutation effectively integrating over all possible gene trees. The method applies to independent (unlinked) biallelic markers such as well-spaced single nucleotide polymorphisms, and we have implemented it in SNAPP, a Markov chain Monte Carlo sampler for inferring species trees, divergence dates, and population sizes. We report results from simulation experiments and from an analysis of 1997 amplified fragment length polymorphism loci in 69 individuals sampled from six species of Ourisia (New Zealand native foxglove).
Mitochondrial DNA (mtDNA) sequences from 686 wild and domestic pig specimens place the origin of wild boar in island Southeast Asia (ISEA), from where they dispersed across Eurasia. Previous morphological and genetic evidence suggested pig domestication took place in a limited number of locations (principally the Near East and Far East). In contrast, new genetic data reveal multiple centers of domestication across Eurasia and that European, rather than Near Eastern, wild boar are the principal source of modern European domestic pigs.
There are two competing hypotheses for the origin of the Indo-European language family. The conventional view places the homeland in the Pontic steppes about 6000 years ago. An alternative hypothesis claims that the languages spread from Anatolia with the expansion of farming 8000 to 9500 years ago. We used Bayesian phylogeographic approaches, together with basic vocabulary data from 103 ancient and contemporary Indo-European languages, to explicitly model the expansion of the family and test these hypotheses. We found decisive support for an Anatolian origin over a steppe origin. Both the inferred timing and root location of the Indo-European language trees fit with an agricultural expansion from Anatolia beginning 8000 to 9500 years ago. These results highlight the critical role that phylogeographic inference can play in resolving debates about human prehistory.
‘Phylogenetics’ is the reconstruction and analysis of phylogenetic (evolutionary) trees and networks based on inherited characteristics. It is a flourishing area of interaction between mathematics, statistics, computer science and biology. The main role of phylogenetic techniques lies in evolutionary biology, where they are used to infer historical relationships between species. However, the methods are also relevant to a diverse range of fields including epidemiology, ecology, medicine, as well as linguistics and cognitive psychology. This graduate-level book, based on the authors' lectures at The University of Canterbury, New Zealand, focuses on the mathematical aspects of phylogenetics. It brings together the central results of the field (providing proofs of the main theorems), outlines their biological significance, and indicates how algorithms may be derived. The presentation is self-contained and relies on discrete mathematics with some probability theory. A set of exercises and at least one specialist topic ends each chapter. This book is intended for biologists interested in the mathematical theory behind phylogenetic methods, and for mathematicians, statisticians, and computer scientists eager to learn about this emerging area of discrete mathematics. 'Phylogenetics' is the 24th volume in the Oxford Lecture Series in Mathematics and its Applications. This series contains short books suitable for graduate students and researchers who want a well-written account of mathematics that is fundamental to current research. The series emphasises future directions of research and focuses on genuine applications of mathematics to finance, engineering and the physical and biological sciences.
Phylogenetic trees can be used to infer the processes that generated them. Here, we introduce a model, the Bayesian birth-death skyline plot, which explicitly estimates the rate of transmission, recovery, and sampling and thus allows inference of the effective reproductive number directly from genetic data. Our method allows these parameters to vary through time in a piecewise fashion and is implemented within the BEAST2 software framework. The method is a powerful alternative to the existing coalescent skyline plot, providing insight into the differing roles of incidence and prevalence in an epidemic. We apply this method to data from the United Kingdom HIV-1 epidemic and Egyptian hepatitis C virus (HCV) epidemic. The analysis reveals temporal changes of the effective reproductive number that highlight the effect of past public health interventions.
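As a rough illustration of how piecewise-constant rates translate into an effective reproductive number, the sketch below computes the ratio of the transmission rate to the total removal rate in each time interval. It assumes that sampled individuals leave the infectious pool, so that removal is the sum of the recovery and sampling rates; the `Interval` class, the function names, and the numbers are hypothetical and are not taken from the BEAST2 implementation.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Interval:
    """One piece of a piecewise-constant rate schedule (hypothetical type)."""
    start: float         # interval start time
    end: float           # interval end time (exclusive)
    transmission: float  # transmission (birth) rate
    recovery: float      # recovery (death) rate
    sampling: float      # sampling rate


def effective_reproductive_number(iv: Interval) -> float:
    """Transmission rate divided by the total removal rate.

    Assumption: sampling removes an individual from the infectious pool,
    so the removal rate is recovery + sampling.
    """
    removal = iv.recovery + iv.sampling
    if removal <= 0:
        raise ValueError("total removal rate must be positive")
    return iv.transmission / removal


def reproductive_number_at(time: float, schedule: Sequence[Interval]) -> float:
    """Look up the interval containing `time` and return its ratio."""
    for iv in schedule:
        if iv.start <= time < iv.end:
            return effective_reproductive_number(iv)
    raise ValueError(f"time {time} is outside the schedule")


# Illustrative schedule: growth in the first interval, decline in the second.
schedule = [
    Interval(0.0, 5.0, transmission=2.0, recovery=0.8, sampling=0.2),
    Interval(5.0, 10.0, transmission=0.5, recovery=0.8, sampling=0.2),
]
```

A value above 1 in an interval corresponds to a growing epidemic and a value below 1 to a declining one; in the Bayesian setting each rate would carry a posterior distribution rather than a single number.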
Two issues long debated among Pacific and American prehistorians are (i) whether there was a pre-Columbian introduction of chicken (Gallus gallus) to the Americas and (ii) whether Polynesian contact with South America might be identified archaeologically, through the recovery of remains of unquestionable Polynesian origin. We present a radiocarbon date and an ancient DNA sequence from a single chicken bone recovered from the archaeological site of El Arenal-1, on the Arauco Peninsula, Chile. These results not only provide firm evidence for the pre-Columbian introduction of chickens to the Americas, but strongly suggest that it was a Polynesian introduction.
Studies of microbial evolutionary dynamics are being transformed by the availability of affordable high-throughput sequencing technologies, which allow whole-genome sequencing of hundreds of related taxa in a single study. Reconstructing a phylogenetic tree of these taxa is generally a crucial step in any evolutionary analysis. Instead of constructing genome assemblies for all taxa, annotating these assemblies, and aligning orthologous genes, many recent studies 1) directly map raw sequencing reads to a single reference sequence, 2) extract single nucleotide polymorphisms (SNPs), and 3) infer the phylogenetic tree using maximum likelihood methods from the aligned SNP positions. However, here we show that, when using such methods to reconstruct phylogenies from sets of simulated sequences, both the exclusion of nonpolymorphic positions and the alignment to a single reference genome introduce systematic biases and errors in phylogeny reconstruction. To address these problems, we developed a new method that combines alignments from mappings to multiple reference sequences and show that this successfully removes biases from the reconstructed phylogenies. We implemented this method as a web server named REALPHY (Reference sequence Alignment-based Phylogeny builder), which fully automates phylogenetic reconstruction from raw sequencing reads.
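To make the "exclusion of nonpolymorphic positions" concrete, the following sketch splits a toy alignment into its variable columns and a count of the invariant columns that a SNP-only pipeline would discard. It is only an illustration of that filtering step, not of REALPHY itself; the function name and the example sequences are hypothetical.

```python
from typing import Dict, Tuple


def split_alignment_columns(alignment: Dict[str, str]) -> Tuple[Dict[str, str], int]:
    """Return (SNP-only alignment, number of invariant columns dropped).

    `alignment` maps a taxon name to its aligned sequence; all sequences
    must have the same length.
    """
    names = list(alignment)
    if not names:
        raise ValueError("alignment is empty")
    length = len(alignment[names[0]])
    if any(len(alignment[n]) != length for n in names):
        raise ValueError("sequences must all have the same length")

    # A column is variable if more than one character state occurs in it.
    variable = [
        i for i in range(length)
        if len({alignment[n][i] for n in names}) > 1
    ]
    snp_alignment = {
        n: "".join(alignment[n][i] for i in variable) for n in names
    }
    return snp_alignment, length - len(variable)


toy = {"taxonA": "ACGT", "taxonB": "ACGA", "taxonC": "ACTA"}
snps, dropped = split_alignment_columns(toy)
```

Here two of the four columns are invariant and are lost from the SNP-only alignment; a likelihood method given only `snps` has no information about how many unchanged sites accompanied the changed ones.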
Early experience has a particularly great effect on most organisms. Normal development may be disrupted by early environmental influences; individuals that survive have to cope with the damaging consequences. Additionally, the responses required to cope with environmental challenges in early life may have long-term effects on the adult organism. A further set of processes, those of developmental plasticity, may induce a phenotype that is adapted to the adult environment predicted by the conditions of early life. A mismatch between prediction and subsequent reality can cause severe health problems in those human societies where economic circumstances and nutrition are rapidly improving. Understanding the underlying mechanisms of plasticity is, therefore, clinically important. However, to conduct research in this area, developmental plasticity must be disentangled from disruption and the adverse long-term effects of coping. The paper reviews these concepts and explores ways in which such distinctions may be made in practice.
BACKGROUND: Relaxed molecular clock models allow divergence time dating and "relaxed phylogenetic" inference, in which a time tree is estimated in the face of unequal rates across lineages. We present a new method for relaxing the assumption of a strict molecular clock using Markov chain Monte Carlo to implement Bayesian model averaging over random local molecular clocks. The new method approaches the problem of rate variation among lineages by proposing a series of local molecular clocks, each extending over a subregion of the full phylogeny. Each branch in a phylogeny (subtending a clade) is a possible location for a change of rate from one local clock to a new one. Thus, including both the global molecular clock and the unconstrained model results, there are a total of 2^(2n-2) possible rate models available for averaging with 1, 2, ..., 2n - 2 different rate categories. RESULTS: We propose an efficient method to sample this model space while simultaneously estimating the phylogeny. The new method conveniently allows a direct test of the strict molecular clock, in which one rate rules them all, against a large array of alternative local molecular clock models. We illustrate the method's utility on three example data sets involving mammal, primate and influenza evolution. Finally, we explore methods to visualize the complex posterior distribution that results from inference under such models. CONCLUSIONS: The examples suggest that large sequence datasets may only require a small number of local molecular clocks to reconcile their branch lengths with a time scale. All of the analyses described here are implemented in the open access software package BEAST 1.5.4 (http://beast-mcmc.googlecode.com/).
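The size of the model space follows from treating each branch as an on/off location for a rate change. A rooted binary tree on n taxa has 2n - 2 branches, so there are 2 raised to the power 2n - 2 indicator configurations. The short sketch below checks this by brute-force enumeration for a small tree; the function names are hypothetical and this is not the sampler used in BEAST.

```python
from itertools import product
from math import comb
from typing import Dict, List, Tuple


def branch_count(n_taxa: int) -> int:
    """Number of branches in a rooted binary tree with n_taxa tips."""
    if n_taxa < 2:
        raise ValueError("need at least two taxa")
    return 2 * n_taxa - 2


def enumerate_indicator_configs(n_taxa: int) -> List[Tuple[int, ...]]:
    """All 0/1 assignments of 'rate changes here' to the branches."""
    return list(product((0, 1), repeat=branch_count(n_taxa)))


def configs_by_change_count(n_taxa: int) -> Dict[int, int]:
    """How many configurations have exactly k rate changes."""
    b = branch_count(n_taxa)
    return {k: comb(b, k) for k in range(b + 1)}


# For three taxa there are 4 branches and therefore 16 configurations.
configs = enumerate_indicator_configs(3)
by_changes = configs_by_change_count(3)
```

The configuration with zero changes is the strict (global) clock, which is why the strict clock can be tested directly against the local clock alternatives within one analysis.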
Despite 250 years of work in systematics, the majority of species remains to be identified. Rising extinction rates and the need for increased biological monitoring lend urgency to this task. DNA sequencing, with key sequences serving as a "barcode", has therefore been proposed as a technology that might expedite species identification. In particular, the mitochondrial cytochrome c oxidase subunit 1 gene has been employed as a possible DNA marker for species and a number of studies in a variety of taxa have accordingly been carried out to examine its efficacy. In general, these studies demonstrate that DNA barcoding resolves most species, although some taxa have proved intractable. In some studies, barcoding provided a means of highlighting potential cryptic, synonymous or extinct species as well as matching adults with immature specimens. Higher taxa, however, have not been resolved as accurately as species. Nonetheless, DNA barcoding appears to offer a means of identifying species and may become a standard tool.
The extent and evolutionary significance of hybridization is difficult to evaluate because of the difficulty in distinguishing hybridization from incomplete lineage sorting. Here we present a novel parametric approach for statistically distinguishing hybridization from incomplete lineage sorting based on minimum genetic distances of a nonrecombining locus. It is based on the idea that the expected minimum genetic distance between sequences from two species is smaller for some hybridization events than for incomplete lineage sorting scenarios. When applied to empirical data sets, distributions can be generated for the minimum interspecies distances expected under incomplete lineage sorting using coalescent simulations. If the observed distance between sequences from two species is smaller than its predicted distribution, incomplete lineage sorting can be rejected and hybridization inferred. We demonstrate the power of the method using simulations and illustrate its application on New Zealand alpine buttercups (Ranunculus). The method is robust and complements existing approaches. Thus it should allow biologists to assess with greater accuracy the importance of hybridization in evolution.
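The decision rule described above can be sketched in a few lines: compute the smallest distance between any sequence of one species and any sequence of the other, then ask how often a distance that small arises under incomplete lineage sorting alone. The sketch assumes that the null distances have already been produced by coalescent simulations (not shown) and uses the simple proportion of differing sites as the distance, which is a simplification; all function names are hypothetical.

```python
from typing import Sequence


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and equally long")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def min_interspecies_distance(species_a: Sequence[str],
                              species_b: Sequence[str]) -> float:
    """Smallest distance over all between-species sequence pairs."""
    return min(p_distance(a, b) for a in species_a for b in species_b)


def lower_tail_p_value(observed: float,
                       null_distances: Sequence[float]) -> float:
    """Share of simulated minimum distances at or below the observed one.

    Uses the (hits + 1) / (N + 1) convention so the estimate is never zero.
    """
    hits = sum(d <= observed for d in null_distances)
    return (hits + 1) / (len(null_distances) + 1)


# Toy data: one sequence is shared between the two species.
species_a = ["ACGT", "ACGA"]
species_b = ["ACGA", "TTTT"]
observed = min_interspecies_distance(species_a, species_b)

# Placeholder null values standing in for coalescent simulation output.
null = [0.1, 0.2, 0.3]
p_value = lower_tail_p_value(observed, null)
```

A small p-value means the observed minimum distance is lower than incomplete lineage sorting would typically produce, which is the situation in which hybridization is inferred.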
BACKGROUND: Pseudomonas fluorescens are common soil bacteria that can improve plant health through nutrient cycling, pathogen antagonism and induction of plant defenses. The genome sequences of strains SBW25 and Pf0-1 were determined and compared to each other and with P. fluorescens Pf-5. A functional genomic in vivo expression technology (IVET) screen provided insight into genes used by P. fluorescens in its natural environment and an improved understanding of the ecological significance of diversity within this species. RESULTS: Comparisons of three P. fluorescens genomes (SBW25, Pf0-1, Pf-5) revealed considerable divergence: 61% of genes are shared, the majority located near the replication origin. Phylogenetic and average amino acid identity analyses showed a low overall relationship. A functional screen of SBW25 defined 125 plant-induced genes including a range of functions specific to the plant environment. Orthologues of 83 of these exist in Pf0-1 and Pf-5, with 73 shared by both strains. The P. fluorescens genomes carry numerous complex repetitive DNA sequences, some resembling Miniature Inverted-repeat Transposable Elements (MITEs). In SBW25, repeat density and distribution revealed 'repeat deserts' lacking repeats, covering approximately 40% of the genome. CONCLUSIONS: P. fluorescens genomes are highly diverse. Strain-specific regions around the replication terminus suggest genome compartmentalization. The genomic heterogeneity among the three strains is reminiscent of a species complex rather than a single species. That 42% of plant-inducible genes were not shared by all strains reinforces this conclusion and shows that ecological success requires specialized and core functions. The diversity also indicates the significant size of genetic information within the Pseudomonas pan genome.
Phylogenetic analyses which include fossils or molecular sequences that are sampled through time require models that allow one sample to be a direct ancestor of another sample. As previously available phylogenetic inference tools assume that all samples are tips, they do not allow for this possibility. We have developed and implemented a Bayesian Markov Chain Monte Carlo (MCMC) algorithm to infer what we call sampled ancestor trees, that is, trees in which sampled individuals can be direct ancestors of other sampled individuals. We use a family of birth-death models where individuals may remain in the tree process after sampling; in particular, we extend the birth-death skyline model [Stadler et al., 2013] to sampled ancestor trees. This method allows the detection of sampled ancestors as well as estimation of the probability that an individual will be removed from the process when it is sampled. We show that even if sampled ancestors are not of specific interest in an analysis, failing to account for them leads to significant bias in parameter estimates. We also show that sampled ancestor birth-death models where every sample comes from a different time point are non-identifiable and thus require one parameter to be known in order to infer other parameters. We apply our phylogenetic inference accounting for sampled ancestors to epidemiological data, where the possibility of sampled ancestors enables us to identify individuals that infected other individuals after being sampled and to infer fundamental epidemiological parameters. We also apply the method to infer divergence times and diversification rates when fossils are included along with extant species samples, so that fossilisation events are modelled as a part of the tree branching process. Such modelling has many advantages as argued in the literature. The sampler is available as an open-source BEAST2 package (https://github.com/CompEvol/sampled-ancestors).
Evolution of chromosome complements can be resolved by genome sequencing, comparative genetic mapping, and comparative chromosome painting. Previously, comparison of genetic maps and gene-based phylogenies suggested that the karyotypes of Arabidopsis thaliana (n = 5) and of related species with six or seven chromosome pairs were derived from an ancestral karyotype with eight chromosome pairs. To test this hypothesis, we applied multicolor chromosome painting using contiguous bacterial artificial chromosome pools of A. thaliana arranged according to the genetic maps of Arabidopsis lyrata and Capsella rubella (both n = 8) to A. thaliana, A. lyrata, Neslia paniculata, Turritis glabra, and Hornungia alpina. This approach allowed us to map the A. lyrata centromeres as a prerequisite to defining a putative ancestral karyotype (n = 8) and to elucidate the evolutionary mechanisms that shaped the karyotype of A. thaliana and its relatives. We conclude that chromosome "fusions" in A. thaliana resulted from (i) generation of acrocentric chromosomes by pericentric inversions, (ii) reciprocal translocation between two chromosomes (one or both acrocentric), and (iii) elimination of a minichromosome that arose in addition to the "fusion chromosome." Comparative chromosome painting applied to N. paniculata (n = 7), T. glabra (n = 6), and H. alpina (n = 6), for which genetic maps are not available, revealed chromosomal colinearity between all species tested and allowed us to reconstruct the evolution of their chromosomes from a putative ancestral karyotype (n = 8). Although involving different ancestral chromosomes, chromosome number reduction followed similar routes as found within the genus Arabidopsis.
Supernatural belief presents an explanatory challenge to evolutionary theorists: it is both costly and prevalent. One influential functional explanation claims that the imagined threat of supernatural punishment can suppress selfishness and enhance cooperation. Specifically, morally concerned supreme deities or 'moralizing high gods' (MHGs) have been argued to reduce free-riding in large social groups, enabling believers to build the kind of complex societies that define modern humanity. Previous cross-cultural studies claiming to support the MHG hypothesis rely on correlational analyses only and do not correct for the statistical non-independence of sampled cultures. Here we use a Bayesian phylogenetic approach with a sample of 96 Austronesian cultures to test the MHG hypothesis as well as an alternative supernatural punishment hypothesis that allows punishment by a broad range of moralizing agents. We find evidence that broad supernatural punishment drives political complexity, whereas MHGs follow political complexity. We suggest that the concept of MHGs diffused as part of a suite of traits arising from cultural exchange between complex societies. Our results show the power of phylogenetic methods to address long-standing debates about the origins and functions of religion in human society.
MMOD is a library for the R programming language that allows the calculation of the population differentiation measures D(est), G″(ST) and φ'(ST). R provides a powerful environment in which to conduct and record population genetic analyses but, at present, no R libraries provide functions for the calculation of these statistics from standard population genetic files. In addition to the calculation of differentiation measures, mmod can produce parametric bootstrap and jackknife samples of data sets for further analysis. By integrating with and complementing the existing libraries adegenet and pegas, mmod extends the power of R as a population genetic platform.
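For readers unfamiliar with the first of these measures, the sketch below computes Jost's D for a single locus from known allele frequencies, using D = ((H_T - H_S) / (1 - H_S)) * (n / (n - 1)) with n populations weighted equally. It is written in Python for illustration and does not reproduce mmod's R interface; D(est) additionally applies sample-size corrections to H_S and H_T, which are omitted here, and the function name is hypothetical.

```python
from typing import Sequence


def jost_d(freqs_by_pop: Sequence[Sequence[float]]) -> float:
    """Jost's D for one locus from per-population allele frequencies.

    Each inner sequence holds the frequencies of the same alleles, in the
    same order, for one population. Populations are weighted equally and
    no sample-size bias correction is applied.
    """
    n = len(freqs_by_pop)
    if n < 2:
        raise ValueError("need at least two populations")
    n_alleles = len(freqs_by_pop[0])
    if any(len(p) != n_alleles for p in freqs_by_pop):
        raise ValueError("every population needs the same allele list")

    # Mean within-population expected heterozygosity (H_S).
    h_s = sum(1.0 - sum(f * f for f in p) for p in freqs_by_pop) / n

    # Expected heterozygosity of the pooled population (H_T).
    mean_freqs = [
        sum(p[i] for p in freqs_by_pop) / n for i in range(n_alleles)
    ]
    h_t = 1.0 - sum(f * f for f in mean_freqs)

    return ((h_t - h_s) / (1.0 - h_s)) * (n / (n - 1))


# Two populations fixed for different alleles: complete differentiation.
fully_differentiated = jost_d([[1.0, 0.0], [0.0, 1.0]])

# Two populations with identical frequencies: no differentiation.
undifferentiated = jost_d([[0.5, 0.5], [0.5, 0.5]])
```

The two toy inputs mark the ends of the scale: D is 1 when populations share no alleles and 0 when their allele frequencies are identical.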