NobleBlocks

University of California, Berkeley

University · Berkeley, California, United States

Research output, citation impact, and the most-cited recent papers from University of California, Berkeley (United States). Aggregated across the NobleBlocks index of 300M+ scholarly works.

Total works: 444.4K
Citations: 55.0M
h-index: 1852
i10-index: 483.8K
Also known as: Cal Berkeley; UC Berkeley; Universidad de California en Berkeley; University of California at Berkeley; University of California, Berkeley; Université de Californie à Berkeley

Top-cited papers from University of California, Berkeley

SciPy 1.0: fundamental algorithms for scientific computing in Python
Pauli Virtanen, Ralf Gommers, Travis E. Oliphant, Matt Haberland +4 more
2020 · Nature Methods · 37.1K citations · doi:10.1038/s41592-019-0686-2

SciPy is an open-source scientific computing library for the Python programming language. Since its initial release in 2001, SciPy has become a de facto standard for leveraging scientific algorithms in Python, with over 600 unique code contributors, thousands of dependent packages, over 100,000 dependent repositories and millions of downloads per year. In this work, we provide an overview of the capabilities and development practices of SciPy 1.0 and highlight some recent technical developments.

Fully convolutional networks for semantic segmentation
Jonathan Long, Evan Shelhamer, Trevor Darrell
2015 · 36.8K citations · doi:10.1109/cvpr.2015.7298965

Convolutional networks are powerful visual models that yield hierarchies of features. We show that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, exceed the state-of-the-art in semantic segmentation. Our key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning. We define and detail the space of fully convolutional networks, explain their application to spatially dense prediction tasks, and draw connections to prior models. We adapt contemporary classification networks (AlexNet [20], the VGG net [31], and GoogLeNet [32]) into fully convolutional networks and transfer their learned representations by fine-tuning [3] to the segmentation task. We then define a skip architecture that combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations. Our fully convolutional network achieves state-of-the-art segmentation of PASCAL VOC (20% relative improvement to 62.2% mean IU on 2012), NYUDv2, and SIFT Flow, while inference takes less than one fifth of a second for a typical image.

Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation
Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik
2014 · 31.6K citations · doi:10.1109/cvpr.2014.81

Object detection performance, as measured on the canonical PASCAL VOC dataset, has plateaued in the last few years. The best-performing methods are complex ensemble systems that typically combine multiple low-level image features with high-level context. In this paper, we propose a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012 -- achieving a mAP of 53.3%. Our approach combines two key insights: (1) one can apply high-capacity convolutional neural networks (CNNs) to bottom-up region proposals in order to localize and segment objects and (2) when labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost. Since we combine region proposals with CNNs, we call our method R-CNN: Regions with CNN features. We also present experiments that provide insight into what the network learns, revealing a rich hierarchy of image features. Source code for the complete system is available at http://www.cs.berkeley.edu/~rbg/rcnn.

MrBayes 3.2: Efficient Bayesian Phylogenetic Inference and Model Choice Across a Large Model Space
Fredrik Ronquist, Maxim Teslenko, Paul van der Mark, Daniel L. Ayres +4 more
2012 · Systematic Biology · 27.7K citations · doi:10.1093/sysbio/sys029

Since its introduction in 2001, MrBayes has grown in popularity as a software package for Bayesian phylogenetic inference using Markov chain Monte Carlo (MCMC) methods. With this note, we announce the release of version 3.2, a major upgrade to the latest official release presented in 2003. The new version provides convergence diagnostics and allows multiple analyses to be run in parallel with convergence progress monitored on the fly. The introduction of new proposals and automatic optimization of tuning parameters has improved convergence for many problems. The new version also sports significantly faster likelihood calculations through streaming single-instruction-multiple-data extensions (SSE) and support of the BEAGLE library, allowing likelihood calculations to be delegated to graphics processing units (GPUs) on compatible hardware. Speedup factors range from around 2 with SSE code to more than 50 with BEAGLE for codon problems. Checkpointing across all models allows long runs to be completed even when an analysis is prematurely terminated. New models include relaxed clocks, dating, model averaging across time-reversible substitution models, and support for hard, negative, and partial (backbone) tree constraints. Inference of species trees from gene trees is supported by full incorporation of the Bayesian estimation of species trees (BEST) algorithms. Marginal model likelihoods for Bayes factor tests can be estimated accurately across the entire model space using the stepping stone method. The new version provides more output options than previously, including samples of ancestral states, site rates, site d(N)/d(S) ratios, branch rates, and node dates. A wide range of statistics on tree parameters can also be output for visualization in FigTree and compatible software.

Latent Dirichlet allocation
David M. Blei, Andrew Y. Ng, Michael I. Jordan
2003 · Journal of Machine Learning Research · 27.0K citations · doi:10.5555/944919.944937

We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. We present efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation. We report results in document modeling, text classification, and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI model.
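
The generative story behind LDA can be sketched in a few lines. The example below is illustrative only and is not the paper's inference procedure: it samples a single document from fixed topics, and the vocabulary, the topic-word probabilities, the prior `ALPHA`, and all function names are hypothetical choices made for this sketch.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(alpha, topic_word_probs, vocab, length, rng):
    """Sample one document: topic proportions first, then a topic and a word per position."""
    theta = sample_dirichlet(alpha, rng)  # per-document mixture over topics
    words = []
    for _ in range(length):
        # Pick a topic for this position according to the document's proportions.
        topic = rng.choices(range(len(alpha)), weights=theta)[0]
        # Pick a word from that topic's distribution over the vocabulary.
        word = rng.choices(vocab, weights=topic_word_probs[topic])[0]
        words.append(word)
    return theta, words


# Hypothetical toy setup: two topics over a six-word vocabulary.
VOCAB = ["gene", "dna", "cell", "pixel", "image", "layer"]
TOPIC_WORD_PROBS = [
    [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],  # a "biology" topic
    [0.0, 0.0, 0.0, 0.4, 0.3, 0.3],  # a "vision" topic
]
ALPHA = [0.5, 0.5]  # assumed symmetric Dirichlet prior

rng = random.Random(0)
theta, doc = generate_document(ALPHA, TOPIC_WORD_PROBS, VOCAB, 8, rng)
print(theta)
print(doc)
```

Fitting the model runs this story in reverse: given only the words, the variational and EM methods mentioned in the abstract estimate the topic proportions and topic-word distributions.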

PHENIX: a comprehensive Python-based system for macromolecular structure solution
Paul D. Adams, Pavel V. Afonine, G. Bunkóczi, Vincent B. Chen +4 more
2010 · Acta Crystallographica Section D Biological Crystallography · 24.4K citations · doi:10.1107/s0907444909052925

Macromolecular X-ray crystallography is routinely applied to understand biological processes at a molecular level. However, significant time and effort are still required to solve and complete many of these structures because of the need for manual interpretation of complex numerical data using many software packages and the repeated use of interactive three-dimensional graphics. PHENIX has been developed to provide a comprehensive system for macromolecular crystallographic structure solution with an emphasis on the automation of all procedures. This has relied on the development of algorithms that minimize or eliminate subjective input, the development of algorithms that automate procedures that are traditionally performed by hand and, finally, the development of a framework that allows a tight integration between the algorithms.

Artificial intelligence: a modern approach
Stuart J. Russell, Peter Norvig
1995 · Choice Reviews Online · 22.2K citations · doi:10.5860/choice.33-1577

The long-anticipated revision of this #1 selling book offers the most comprehensive, state of the art introduction to the theory and practice of artificial intelligence for modern applications. Intelligent Agents. Solving Problems by Searching. Informed Search Methods. Game Playing. Agents that Reason Logically. First-order Logic. Building a Knowledge Base. Inference in First-Order Logic. Logical Reasoning Systems. Practical Planning. Planning and Acting. Uncertainty. Probabilistic Reasoning Systems. Making Simple Decisions. Making Complex Decisions. Learning from Observations. Learning with Neural Networks. Reinforcement Learning. Knowledge in Learning. Agents that Communicate. Practical Communication in English. Perception. Robotics. For computer professionals, linguists, and cognitive scientists interested in artificial intelligence.

Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks
Jun-Yan Zhu, Taesung Park, Phillip Isola, Alexei A. Efros
2017 · 21.8K citations · doi:10.1109/iccv.2017.244

Image-to-image translation is a class of vision and graphics problems where the goal is to learn the mapping between an input image and an output image using a training set of aligned image pairs. However, for many tasks, paired training data will not be available. We present an approach for learning to translate an image from a source domain X to a target domain Y in the absence of paired examples. Our goal is to learn a mapping G : X → Y such that the distribution of images from G(X) is indistinguishable from the distribution Y using an adversarial loss. Because this mapping is highly under-constrained, we couple it with an inverse mapping F : Y → X and introduce a cycle consistency loss to push F(G(X)) ≈ X (and vice versa). Qualitative results are presented on several tasks where paired training data does not exist, including collection style transfer, object transfiguration, season transfer, photo enhancement, etc. Quantitative comparisons against several prior methods demonstrate the superiority of our approach.
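
The cycle consistency idea, F(G(x)) ≈ x and G(F(y)) ≈ y, can be illustrated with a toy example. This is a minimal sketch and not the authors' implementation: the mappings here are hand-written functions on short lists rather than trained networks, the use of a mean absolute (L1) distance is an assumption of this sketch, and the adversarial loss is left out entirely. All names are hypothetical.

```python
def l1_distance(a, b):
    """Mean absolute difference between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(G, F, xs, ys):
    """Penalize F(G(x)) drifting from x and G(F(y)) drifting from y."""
    forward = sum(l1_distance(F(G(x)), x) for x in xs) / len(xs)
    backward = sum(l1_distance(G(F(y)), y) for y in ys) / len(ys)
    return forward + backward


# Toy mappings standing in for the two translators.
def G(x):
    return [2.0 * v for v in x]  # X -> Y


def F(y):
    return [0.5 * v for v in y]  # Y -> X, exact inverse of G


def F_bad(y):
    return [0.5 * v + 1.0 for v in y]  # Y -> X, not an inverse of G


xs = [[1.0, 2.0], [3.0, 4.0]]  # unpaired samples from domain X
ys = [[2.0, 6.0], [8.0, 4.0]]  # unpaired samples from domain Y

print(cycle_consistency_loss(G, F, xs, ys))      # 0.0
print(cycle_consistency_loss(G, F_bad, xs, ys))  # 3.0
```

The loss is zero only when the two mappings undo each other on the sampled data, which is the constraint that narrows the otherwise under-constrained mapping G.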

A new generation of Ca2+ indicators with greatly improved fluorescence properties.
Grzegorz Grynkiewicz, Martin Poenie, Roger Y. Tsien
1985 · Journal of Biological Chemistry · 21.7K citations · doi:10.1016/s0021-9258(19)83641-4

A new family of highly fluorescent indicators has been synthesized for biochemical studies of the physiological role of cytosolic free Ca2+. The compounds combine an 8-coordinate tetracarboxylate chelating site with stilbene chromophores. Incorporation of the ethylenic linkage of the stilbene into a heterocyclic ring enhances the quantum efficiency and photochemical stability of the fluorophore. Compared to their widely used predecessor, "quin2", the new dyes offer up to 30-fold brighter fluorescence, major changes in wavelength not just intensity upon Ca2+ binding, slightly lower affinities for Ca2+, slightly longer wavelengths of excitation, and considerably improved selectivity for Ca2+ over other divalent cations. These properties, particularly the wavelength sensitivity to Ca2+, should make these dyes the preferred fluorescent indicators for many intracellular applications, especially in single cells, adherent cell layers, or bulk tissues.

Array programming with NumPy
Charles R. Harris, K. Jarrod Millman, Stéfan J. van der Walt, Ralf Gommers +4 more
2020 · Nature · 21.7K citations · doi:10.1038/s41586-020-2649-2

Array programming provides a powerful, compact and expressive syntax for accessing, manipulating and operating on data in vectors, matrices and higher-dimensional arrays. NumPy is the primary array programming library for the Python language. It has an essential role in research analysis pipelines in fields as diverse as physics, chemistry, astronomy, geoscience, biology, psychology, materials science, engineering, finance and economics. For example, in astronomy, NumPy was an important part of the software stack used in the discovery of gravitational waves and in the first imaging of a black hole. Here we review how a few fundamental array concepts lead to a simple and powerful programming paradigm for organizing, exploring and analysing scientific data. NumPy is the foundation upon which the scientific Python ecosystem is constructed. It is so pervasive that several projects, targeting audiences with specialized needs, have developed their own NumPy-like interfaces and array objects. Owing to its central position in the ecosystem, NumPy increasingly acts as an interoperability layer between such array computation libraries and, together with its application programming interface (API), provides a flexible framework to support the next decade of scientific and industrial analysis.

Welcome to the Tidyverse
Hadley Wickham, Mara Averick, Jennifer Bryan, Winston Chang +4 more
2019 · The Journal of Open Source Software · 21.1K citations · doi:10.21105/joss.01686

ABSTRACT. Evaluating the effect of a remedial mathematics course in higher education: the case of Basic Mathematics. This study evaluates the effects of taking a mandatory remedial course, offered only once (i.e., it cannot be repeated) to undergraduate students, on the probability of enrolling, performance in university mathematics courses, progress through the degree program, and the probability of graduating. The study uses a regression discontinuity design that exploits the fact that students admitted to the university whose mathematics score on the entrance exam falls below a threshold are required to take the basic mathematics remedial course. It finds that the remedial course has no effect on the probability of enrolling, of leaving the program, or of graduating six years after admission. There is an effect

A global reference for human genetic variation
Adam Auton, Gonçalo R. Abecasis, David M. Altshuler +4 more
2015 · Nature · 19.8K citations · doi:10.1038/nature15393

The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies. Results for the final phase of the 1000 Genomes Project are presented including whole-genome sequencing, targeted exome sequencing, and genotyping on high-density SNP arrays for 2,504 individuals across 26 populations, providing a global reference data set to support biomedical genetics. The 1000 Genomes Project has sought to comprehensively catalogue human genetic variation across populations, providing a valuable public genomic resource. The data obtained so far have found applications ranging from association studies and fine mapping studies to the filtering of likely neutral variants in rare-disease cohorts. The authors now report on the final phase of the project, phase 3, which covers previously uncharacterized areas of human genetic diversity in terms of the populations sampled and categories of characterized variation. The sample now includes more than 2,500 individuals from 26 global populations, with low coverage whole-genome and deep exome sequencing, as well as dense microarray genotyping. 
They find that while most common variants are shared across populations, rarer variants are often restricted to closely related populations. The authors also demonstrate the use of the phase 3 dataset as a reference panel for imputation to improve the resolution in genetic association studies.

Observational Evidence from Supernovae for an Accelerating Universe and a Cosmological Constant
Adam G. Riess, A. V. Filippenko, P. Challis, A. Clocchiatti +4 more
1998 · The Astronomical Journal · 19.4K citations · doi:10.1086/300499

We present spectral and photometric observations of 10 Type Ia supernovae (SNe Ia) in the redshift range 0.16 ≤ z ≤ 0.62. The luminosity distances of these objects are determined by methods that employ relations between SN Ia luminosity and light curve shape. Combined with previous data from our High-z Supernova Search Team and recent results by Riess et al., this expanded set of 16 high-redshift [...] methods. We estimate the dynamical age of the universe to be 14.2 ± 1.7 Gyr including systematic uncertainties in the current Cepheid distance scale. We estimate the likely effect of several sources of systematic error, including progenitor and metallicity evolution, extinction, sample selection bias, local perturbations in the expansion rate, gravitational lensing, and sample contamination. Presently, none of these effects appear to reconcile the data with ΩΛ = 0 and q0 ≥ 0.

Array programming with NumPy
Harris, CR, Millman, KJ, van der Walt, SJ, Gommers, R +4 more
2020 · TUScholarShare (Temple University) · 18.8K citations


Measurements of Ω and Λ from 42 High‐Redshift Supernovae
S. Perlmutter, G. Aldering, G. Goldhaber, R. A. Knop +4 more
1999 · The Astrophysical Journal · 17.9K citations · doi:10.1086/307221

We report measurements of the mass density, ΩM, and cosmological-constant energy density, ΩΛ, of the universe based on the analysis of 42 type Ia supernovae discovered by the Supernova Cosmology Project. The magnitude-redshift data for these supernovae, at redshifts between 0.18 and 0.83, are fitted jointly with a set of supernovae from the Calán/Tololo Supernova Survey, at redshifts below 0.1, to yield values for the cosmological parameters. All supernova peak magnitudes are standardized using a SN Ia light-curve width-luminosity relation. The measurement yields a joint probability distribution of the cosmological parameters that is approximated by the relation 0.8ΩM − 0.6ΩΛ ≈ −0.2 ± 0.1 in the region of interest (ΩM ≲ 1.5). For a flat (ΩM + ΩΛ = 1) cosmology we find ΩM(flat) = 0.28 +0.09/−0.08 (1σ statistical) +0.05/−0.04 (identified systematics). The data are strongly inconsistent with a Λ = 0 flat cosmology, the simplest inflationary universe model. An open, Λ = 0 cosmology also does not fit the data well: the data indicate that the cosmological constant is nonzero and positive, with a confidence of P(Λ > 0) = 99%, including the identified systematic uncertainties. The best-fit age of the universe relative to the Hubble time is t0(flat) = 14.9 +1.4/−1.1 (0.63/h) Gyr for a flat cosmology. The size of our sample allows us to perform a variety of statistical tests to check for possible systematic errors and biases. We find no significant differences in either the host reddening distribution or Malmquist bias between the low-redshift Calán/Tololo sample and our high-redshift sample. Excluding those few supernovae that are outliers in color excess or fit residual does not significantly change the results. The conclusions are also robust whether or not a width-luminosity relation is used to standardize the supernova peak magnitudes. We discuss and constrain, where possible, hypothetical alternatives to a cosmological constant.
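
As a rough consistency check on the figures in this abstract, the approximate linear relation can be combined with the flatness condition. This is only a back-of-the-envelope sketch: it uses the central value of the approximation and does not reproduce the likelihood analysis behind the published result.

```latex
\begin{align}
0.8\,\Omega_M - 0.6\,\Omega_\Lambda &\approx -0.2, \qquad \Omega_M + \Omega_\Lambda = 1 \\
0.8\,\Omega_M - 0.6\,(1 - \Omega_M) &\approx -0.2 \\
1.4\,\Omega_M &\approx 0.4 \\
\Omega_M &\approx 0.29, \qquad \Omega_\Lambda \approx 0.71
\end{align}
```

The result, about 0.29, is close to the flat-cosmology value of 0.28 quoted in the abstract.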

A Programmable Dual-RNA–Guided DNA Endonuclease in Adaptive Bacterial Immunity
Martin Jínek, Krzysztof Chylinski, Ines Fonfara, M. Hauer +2 more
2012 · Science · 17.1K citations · doi:10.1126/science.1225829

Clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated (Cas) systems provide bacteria and archaea with adaptive immunity against viruses and plasmids by using CRISPR RNAs (crRNAs) to guide the silencing of invading nucleic acids. We show here that in a subset of these systems, the mature crRNA that is base-paired to trans-activating crRNA (tracrRNA) forms a two-RNA structure that directs the CRISPR-associated protein Cas9 to introduce double-stranded (ds) breaks in target DNA. At sites complementary to the crRNA-guide sequence, the Cas9 HNH nuclease domain cleaves the complementary strand, whereas the Cas9 RuvC-like domain cleaves the noncomplementary strand. The dual-tracrRNA:crRNA, when engineered as a single RNA chimera, also directs sequence-specific Cas9 dsDNA cleavage. Our study reveals a family of endonucleases that use dual-RNAs for site-specific DNA cleavage and highlights the potential to exploit the system for RNA-programmable genome editing.

The Chemistry and Applications of Metal-Organic Frameworks
Hiroyasu Furukawa, Kyle E. Cordova, M. O’Keeffe, Omar M. Yaghi
2013 · Science · 16.4K citations · doi:10.1126/science.1230444

Background: Metal-organic frameworks (MOFs) are made by linking inorganic and organic units by strong bonds (reticular synthesis). The flexibility with which the constituents’ geometry, size, and functionality can be varied has led to more than 20,000 different MOFs being reported and studied within the past decade. The organic units are ditopic or polytopic organic carboxylates (and other similar negatively charged molecules), which, when linked to metal-containing units, yield architecturally robust crystalline MOF structures with a typical porosity of greater than 50% of the MOF crystal volume. The surface area values of such MOFs typically range from 1000 to 10,000 m²/g, thus exceeding those of traditional porous materials such as zeolites and carbons. To date, MOFs with permanent porosity are more extensive in their variety and multiplicity than any other class of porous materials. These aspects have made MOFs ideal candidates for storage of fuels (hydrogen and methane), capture of carbon dioxide, and catalysis applications, to mention a few.

Advances: The ability to vary the size and nature of MOF structures without changing their underlying topology gave rise to the isoreticular principle and its application in making MOFs with the largest pore aperture (98 Å) and lowest density (0.13 g/cm³). This has allowed for the selective inclusion of large molecules (e.g., vitamin B12) and proteins (e.g., green fluorescent protein) and the exploitation of the pores as reaction vessels. Along these lines, the thermal and chemical stability of many MOFs has made them amenable to postsynthetic covalent organic and metal-complex functionalization. These capabilities enable substantial enhancement of gas storage in MOFs and have led to their extensive study in the catalysis of organic reactions, activation of small molecules (hydrogen, methane, and water), gas separation, biomedical imaging, and proton, electron, and ion conduction. At present, methods are being developed for making nanocrystals and supercrystals of MOFs for their incorporation into devices.

Outlook: The precise control over the assembly of MOFs is expected to propel this field further into new realms of synthetic chemistry in which far more sophisticated materials may be accessed. For example, materials can be envisaged as having (i) compartments linked together to operate separately, yet function synergistically; (ii) dexterity to carry out parallel operations; (iii) ability to count, sort, and code information; and (iv) capability of dynamics with high fidelity. Efforts in this direction are already being undertaken through the introduction of a large number of different functional groups within the pores of MOFs. This yields multivariate frameworks in which the varying arrangement of functionalities gives rise to materials that offer a synergistic combination of properties. Future work will involve the assembly of chemical structures from many different types of building unit, such that the structures’ function is dictated by the heterogeneity of the specific arrangement of their constituents.

ANFIS: adaptive-network-based fuzzy inference system
Jyh‐Shing Roger Jang
1993 · IEEE Transactions on Systems, Man, and Cybernetics · 16.1K citations · doi:10.1109/21.256541

The architecture and learning procedure underlying ANFIS (adaptive-network-based fuzzy inference system), a fuzzy inference system implemented in the framework of adaptive networks, are presented. By using a hybrid learning procedure, the proposed ANFIS can construct an input-output mapping based on both human knowledge (in the form of fuzzy if-then rules) and stipulated input-output data pairs. In the simulation, the ANFIS architecture is employed to model nonlinear functions, identify nonlinear components on-line in a control system, and predict a chaotic time series, all yielding remarkable results. Comparisons with artificial neural networks and earlier work on fuzzy modeling are listed and discussed. Other extensions of the proposed ANFIS and promising applications to automatic control and signal processing are also suggested.

Social Network Sites: Definition, History, and Scholarship
danah boyd, Nicole B. Ellison
2007 · Journal of Computer-Mediated Communication · 16.0K citations · doi:10.1111/j.1083-6101.2007.00393.x

Social network sites (SNSs) are increasingly attracting the attention of academic and industry researchers intrigued by their affordances and reach. This special theme section of the Journal of Computer-Mediated Communication brings together scholarship on these emergent phenomena. In this introductory article, we describe features of SNSs and propose a comprehensive definition. We then present one perspective on the history of such sites, discussing key changes and developments. After briefly summarizing existing scholarship concerning SNSs, we discuss the articles in this special section and conclude with considerations for future research.

FastTree 2 – Approximately Maximum-Likelihood Trees for Large Alignments
Morgan N. Price, Paramvir Dehal, Adam P. Arkin
2010 · PLoS ONE · 15.8K citations · doi:10.1371/journal.pone.0009490

BACKGROUND: We recently described FastTree, a tool for inferring phylogenies for alignments with up to hundreds of thousands of sequences. Here, we describe improvements to FastTree that improve its accuracy without sacrificing scalability. METHODOLOGY/PRINCIPAL FINDINGS: Where FastTree 1 used nearest-neighbor interchanges (NNIs) and the minimum-evolution criterion to improve the tree, FastTree 2 adds minimum-evolution subtree-pruning-regrafting (SPRs) and maximum-likelihood NNIs. FastTree 2 uses heuristics to restrict the search for better trees and estimates a rate of evolution for each site (the "CAT" approximation). Nevertheless, for both simulated and genuine alignments, FastTree 2 is slightly more accurate than a standard implementation of maximum-likelihood NNIs (PhyML 3 with default settings). Although FastTree 2 is not quite as accurate as methods that use maximum-likelihood SPRs, most of the splits that disagree are poorly supported, and for large alignments, FastTree 2 is 100-1,000 times faster. FastTree 2 inferred a topology and likelihood-based local support values for 237,882 distinct 16S ribosomal RNAs on a desktop computer in 22 hours and 5.8 gigabytes of memory. CONCLUSIONS/SIGNIFICANCE: FastTree 2 allows the inference of maximum-likelihood phylogenies for huge alignments. FastTree 2 is freely available at http://www.microbesonline.org/fasttree.