National Energy Research Scientific Computing Center
Facility · Berkeley, United States
Research output, citation impact, and the most-cited recent papers from National Energy Research Scientific Computing Center (United States). Aggregated across the NobleBlocks index of 300M+ scholarly works.
Top-cited papers from National Energy Research Scientific Computing Center
Studies of the human microbiome have revealed that even healthy individuals differ remarkably in the microbes that occupy habitats such as the gut, skin and vagina. Much of this diversity remains unexplained, although diet, environment, host genetics and early microbial exposure have all been implicated. Accordingly, to characterize the ecology of human-associated microbial communities, the Human Microbiome Project has analysed the largest cohort and set of distinct, clinically relevant body habitats so far. We found the diversity and abundance of each habitat’s signature microbes to vary widely even among healthy subjects, with strong niche specialization both within and among individuals. The project encountered an estimated 81–99% of the genera, enzyme families and community configurations occupied by the healthy Western microbiome. Metagenomic carriage of metabolic pathways was stable among individuals despite variation in community structure, and ethnic/racial background proved to be one of the strongest associations of both pathways and microbes with clinical metadata. These results thus delineate the range of structural and functional configurations normal in the microbial communities of a healthy population, enabling future characterization of the epidemiology, ecology and translational applications of the human microbiome. The Human Microbiome Project Consortium reports the first results of their analysis of microbial communities from distinct, clinically relevant body habitats in a human cohort; the insights into the microbial communities of a healthy population lay foundations for future exploration of the epidemiology, ecology and translational applications of the human microbiome. The Human Microbiome Project (HMP), supported by the National Institutes of Health Common Fund, has the goal of characterizing the microbial communities that inhabit and interact with the human body in sickness and in health. 
In two Articles in this issue of Nature, the HMP Consortium presents the first population-scale details of the organismal and functional composition of the microbiota across five areas of the body. An associated News & Views discusses the initial results — which, along with those of a series of co-publications, already constitute the most extensive catalogue of organisms and genes related to the human microbiome yet published — and highlights some of the major questions that the project will tackle in the next few years.
A variety of microbial communities and their genes (the microbiome) exist throughout the human body, with fundamental roles in human health and disease. The National Institutes of Health (NIH)-funded Human Microbiome Project Consortium has established a population-scale framework to develop metagenomic protocols, resulting in a broad range of quality-controlled resources and data including standardized methods for creating, processing and interpreting distinct types of high-throughput metagenomic data available to the scientific community. Here we present resources from a population of 242 healthy adults sampled at 15 or 18 body sites up to three times, which have generated 5,177 microbial taxonomic profiles from 16S ribosomal RNA genes and over 3.5 terabases of metagenomic sequence so far. In parallel, approximately 800 reference strains isolated from the human body have been sequenced. Collectively, these data represent the largest resource describing the abundance and variety of the human microbiome, while providing a framework for current and future studies. The Human Microbiome Project Consortium has established a population-scale framework to study a variety of microbial communities that exist throughout the human body, enabling the generation of a range of quality-controlled data as well as community resources.
The U.S. Department of Energy Systems Biology Knowledgebase (KBase, http://kbase.us) is an open-source software and data platform designed to tackle the grand challenge of systems biology—predicting and designing biological function at scales ranging from the biomolecular to the ecological. KBase is available for anyone to use, and enables researchers to collaboratively generate, test, compare, and share hypotheses about biological functions; perform large analyses on scalable computing infrastructure; and combine experimental evidence and conclusions to model plant and microbial physiology and community dynamics. The KBase platform has extensible analytical capabilities that currently include (meta)genome assembly, annotation, comparative genomics, transcriptomics, and metabolic modeling; a web-based user interface that supports building, sharing, and publishing reproducible and well-annotated analyses with integrated data; and a software development kit that enables the community to add functionality to the system.
Author(s): de Bernardis, P.; Ade, P.A.R.; Bock, J.J.; Bond, J.R.; Borrill, J.; Boscaleri, A.; Coble, K.; Crill, B.P.; De Gasperis, G.; Farese, P.C.; Ferreira, P.G.; Ganga, K.; Giacometti, M.; Hivon, E.; Hristov, V.V.; Iacoangeli, A.; Jaffe, A.H.; Lange, A.E.; Martinis, L.; Masi, S.; Mason, P.; Mauskopf, P.D.; Melchiorri, A.; Miglio, L.; Montroy, T.; Netterfield, C.B.; Pascale, E.; Piacentini, F.; Pogosyan, D.; Prunet, S.; Rao, S.; Romeo, G.; Ruhl, J.E.; Scaramuzzi, F.; Sforna, D.; Vittorio, N.
We present a map and an angular power spectrum of the anisotropy of the cosmic microwave background (CMB) from the first flight of the Millimeter-wave Anisotropy Experiment Imaging Array (MAXIMA). MAXIMA is a balloon-borne experiment with an array of 16 bolometric photometers operated at 100 mK. MAXIMA observed a 124 deg² region of the sky with 10' resolution at frequencies of 150, 240, and 410 GHz. The data were calibrated using in-flight measurements of the CMB dipole anisotropy. A map of the CMB anisotropy was produced from three 150 GHz photometers and one 240 GHz photometer without need for foreground subtractions. Analysis of this CMB map yields a power spectrum for the CMB anisotropy over the range 36 ≤ l ≤ 785. The spectrum shows a peak with an amplitude of 78 ± 6 μK at l ≃ 220 and an amplitude varying between ~40 and ~50 μK for 400 ⪝ l ⪝ 785.
Colloidal quantum rods of cadmium selenide (CdSe) exhibit linearly polarized emission. Empirical pseudopotential calculations predict that slightly elongated CdSe nanocrystals have polarized emission along the long axis, unlike spherical dots, which emit plane-polarized light. Single-molecule luminescence spectroscopy measurements on CdSe quantum rods with an aspect ratio between 1 and 30 confirm a sharp transition from nonpolarized to purely linearly polarized emission at an aspect ratio of 2. Linearly polarized luminescent chromophores are highly desirable in a variety of applications.
SUMMARY: VISTA is a program for visualizing global DNA sequence alignments of arbitrary length. It has a clean output, allowing for easy identification of similarity, and is easily configurable, enabling the visualization of alignments of various lengths at different levels of resolution. It is currently available on the web, thus allowing for easy access by all researchers. AVAILABILITY: VISTA server is available on the web at http://www-gsd.lbl.gov/vista. The source code is available upon request. CONTACT: vista@lbl.gov
MOTIVATION: Protein fold recognition is an important approach to structure discovery without relying on sequence similarity. We study this approach with new multi-class classification methods and examine many issues important for a practical recognition system. RESULTS: Most current discriminative methods for protein fold prediction use the one-against-others method, which has the well-known 'False Positives' problem. We investigated two new methods: the unique one-against-others and the all-against-all methods. Both improve prediction accuracy by 14-110% on a dataset containing 27 SCOP folds. We used the Support Vector Machine (SVM) and the Neural Network (NN) learning methods as base classifiers. SVMs converge fast and lead to high accuracy. When scores of multiple parameter datasets are combined, majority voting reduces noise and increases recognition accuracy. We examined many issues involved with a large number of classes, including dependencies of prediction accuracy on the number of folds and on the number of representatives in a fold. Overall, recognition systems achieve 56% fold prediction accuracy on a protein test dataset, where most of the proteins have below 25% sequence identity with the proteins used in training.
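The all-against-all scheme named in the abstract above can be illustrated with a minimal sketch: one binary decision per pair of classes, combined by majority vote. The function name `all_against_all_vote`, the fold labels, and the one-dimensional data are hypothetical, and the SVM and NN base classifiers of the paper are swapped here for a toy nearest-centroid rule to keep the example short.

```python
from collections import Counter
from itertools import combinations

def all_against_all_vote(x, centroids):
    """Predict a class for x by majority vote over all pairwise contests.

    centroids: dict mapping class label -> 1-D centroid. This is a toy
    stand-in for one trained binary classifier per class pair (an
    assumption made for illustration only).
    """
    votes = Counter()
    for a, b in combinations(sorted(centroids), 2):
        # Each pairwise "classifier" votes for the nearer of its two classes.
        winner = a if abs(x - centroids[a]) <= abs(x - centroids[b]) else b
        votes[winner] += 1
    # The class that wins the most pairwise contests is the prediction;
    # ties are broken by label order.
    return max(sorted(votes), key=lambda label: votes[label])

centroids = {"fold_A": 0.0, "fold_B": 5.0, "fold_C": 10.0}
print(all_against_all_vote(4.2, centroids))
```

With K classes this scheme needs K(K-1)/2 pairwise decisions per prediction, which is the cost traded for avoiding a single one-against-others decision per class.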
An important application of graph partitioning is data clustering using a graph model - the pairwise similarities between all data objects form a weighted graph adjacency matrix that contains all necessary information for clustering. In this paper, we propose a new algorithm for graph partitioning with an objective function that follows the min-max clustering principle. The relaxed version of the optimization of the min-max cut objective function leads to the Fiedler vector in spectral graph partitioning. Theoretical analyses of min-max cut indicate that it leads to balanced partitions, and lower bounds are derived. The min-max cut algorithm is tested on newsgroup data sets and is found to out-perform other current popular partitioning/clustering methods. The linkage-based refinements to the algorithm further improve the quality of clustering substantially. We also demonstrate that a linearized search order based on linkage differential is better than that based on the Fiedler vector, providing another effective partitioning method.
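As a rough sketch of the idea in the abstract above, the code below evaluates a two-way min-max cut value and performs a linear search along a given node ordering. The abstract does not state the objective explicitly; the form used here, cut(A,B)/W(A) + cut(A,B)/W(B) with W(X) the total similarity inside part X, is an assumption, as are the function names and the toy graph. The ordering is supplied by the caller; in the paper it would come from the Fiedler vector or the linkage differential, which are not computed here.

```python
def mcut(W, in_a):
    """Min-max cut value for a two-way partition of a weighted graph.

    W: symmetric similarity matrix (list of lists).
    in_a: list of booleans, True if node i belongs to part A.
    Assumed form: cut(A,B)/W(A) + cut(A,B)/W(B), where W(X) sums the
    similarities inside part X.
    """
    n = len(W)
    cut = within_a = within_b = 0.0
    for i in range(n):
        for j in range(n):
            if in_a[i] and in_a[j]:
                within_a += W[i][j]
            elif not in_a[i] and not in_a[j]:
                within_b += W[i][j]
            elif in_a[i]:  # count each A-B pair once
                cut += W[i][j]
    if within_a == 0 or within_b == 0:
        return float("inf")  # a part with no internal similarity is rejected
    return cut / within_a + cut / within_b

def best_split_along_order(W, order):
    """Linear search: try every prefix of `order` as part A, keep the best."""
    best_value, best_part = float("inf"), None
    for k in range(1, len(order)):
        in_a = [False] * len(W)
        for node in order[:k]:
            in_a[node] = True
        value = mcut(W, in_a)
        if value < best_value:
            best_value, best_part = value, sorted(order[:k])
    return best_value, best_part

# Two tight triangles {0,1,2} and {3,4,5} joined by one weak edge (2-3).
W = [[0, 1, 1, 0, 0, 0],
     [1, 0, 1, 0, 0, 0],
     [1, 1, 0, 0.1, 0, 0],
     [0, 0, 0.1, 0, 1, 1],
     [0, 0, 0, 1, 0, 1],
     [0, 0, 0, 1, 1, 0]]
value, part_a = best_split_along_order(W, order=[0, 1, 2, 3, 4, 5])
print(part_a, round(value, 4))
```

Because the objective divides by the within-part similarity, lopsided splits that isolate a single node are penalized, which is one way to read the abstract's claim that min-max cut favors balanced partitions.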
This paper presents a measurement of the angular power spectrum of the Cosmic Microwave Background from l=75 to l=1025 (~10' to 5 degrees) from a combined analysis of four 150 GHz channels in the BOOMERANG experiment. The spectrum contains multiple peaks and minima, as predicted by standard adiabatic-inflationary models in which the primordial plasma undergoes acoustic oscillations. These results significantly constrain the values of Omega_tot, Omega_b h^2, Omega_c h^2 and n_s.
Catalyst discovery and optimization is key to solving many societal and energy challenges including solar fuel synthesis, long-term energy storage, and renewable fertilizer production. Despite considerable effort by the catalysis community to apply machine learning models to the computational catalyst discovery process, it remains an open challenge to build models that can generalize across both elemental compositions of surfaces and adsorbate identity/configurations, perhaps because datasets have been smaller in catalysis than in related fields. To address this, we developed the OC20 dataset, consisting of 1,281,040 density functional theory (DFT) relaxations (∼264,890,000 single-point evaluations) across a wide swath of materials, surfaces, and adsorbates (nitrogen, carbon, and oxygen chemistries). We supplemented this dataset with randomly perturbed structures, short timescale molecular dynamics, and electronic structure analyses. The dataset comprises three central tasks indicative of day-to-day catalyst modeling and comes with predefined train/validation/test splits to facilitate direct comparisons with future model development efforts. We applied three state-of-the-art graph neural network models (CGCNN, SchNet, and DimeNet++) to each of these tasks as baseline demonstrations for the community to build on. In almost every task, no upper limit on model size was identified, suggesting that even larger models are likely to improve on initial results. The dataset and baseline models are both provided as open resources as well as a public leader board to encourage community contributions to solve these important tasks.
Machine learning (ML) provides novel and powerful ways of accurately and efficiently recognizing complex patterns, emulating nonlinear dynamics, and predicting the spatio-temporal evolution of weather and climate processes. Off-the-shelf ML models, however, do not necessarily obey the fundamental governing laws of physical systems, nor do they generalize well to scenarios on which they have not been trained. We survey systematic approaches to incorporating physics and domain knowledge into ML models and distill these approaches into broad categories. Through 10 case studies, we show how these approaches have been used successfully for emulating, downscaling, and forecasting weather and climate processes. The accomplishments of these studies include greater physical consistency, reduced training time, improved data efficiency, and better generalization. Finally, we synthesize the lessons learned and identify scientific, diagnostic, computational, and resource challenges for developing truly robust and reliable physics-informed ML models for weather and climate processes. This article is part of the theme issue 'Machine learning for weather and climate modelling'.
The popular K-means clustering partitions a data set by minimizing a sum-of-squares cost function. A coordinate descent method is then used to find local minima. In this paper we show that the minimization can be reformulated as a trace maximization problem associated with the Gram matrix of the data vectors. Furthermore, we show that a relaxed version of the trace maximization problem possesses global optimal solutions which can be obtained by computing a partial eigendecomposition of the Gram matrix, and the cluster assignment for each data vector can be found by computing a pivoted QR decomposition of the eigenvector matrix. As a by-product we also derive a lower bound for the minimum of the sum-of-squares cost function.
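The reformulation described above can be checked numerically on a toy example. The sketch below computes the sum-of-squares cost directly and again as trace(G) - trace(Yᵀ G Y), where G is the Gram matrix and Y holds the cluster indicator vectors scaled by 1/sqrt(cluster size). The function names and the four sample points are hypothetical; the eigendecomposition and pivoted QR steps of the relaxed problem are not implemented here.

```python
def gram(points):
    """Gram matrix G[i][j] = <x_i, x_j> of the data vectors."""
    return [[sum(a * b for a, b in zip(p, q)) for q in points] for p in points]

def sse_direct(points, labels):
    """Sum-of-squares cost: squared distances to each cluster centroid."""
    total = 0.0
    for c in set(labels):
        members = [p for p, l in zip(points, labels) if l == c]
        centroid = [sum(col) / len(members) for col in zip(*members)]
        total += sum(sum((x - m) ** 2 for x, m in zip(p, centroid))
                     for p in members)
    return total

def sse_trace_form(points, labels):
    """Same cost written as trace(G) - trace(Y^T G Y).

    Y has one column per cluster: the indicator vector of that cluster
    divided by the square root of the cluster size.
    """
    G = gram(points)
    trace_g = sum(G[i][i] for i in range(len(points)))
    trace_ygy = 0.0
    for c in set(labels):
        idx = [i for i, l in enumerate(labels) if l == c]
        trace_ygy += sum(G[i][j] for i in idx for j in idx) / len(idx)
    return trace_g - trace_ygy

points = [(0.0, 0.0), (0.0, 2.0), (5.0, 0.0), (5.0, 2.0)]
labels = [0, 0, 1, 1]
print(sse_direct(points, labels), sse_trace_form(points, labels))
```

Since trace(G) is fixed by the data, minimizing the cost is the same as maximizing trace(Yᵀ G Y). Relaxing Y to any matrix with orthonormal columns bounds that term by the sum of the k largest eigenvalues of G, which is where the lower bound mentioned in the abstract comes from.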
Selecting a small subset of genes out of the thousands of genes in microarray data is important for accurate classification of phenotypes. Widely used methods typically rank genes according to their differential expression among phenotypes and pick the top-ranked genes. We observe that feature sets so obtained have certain redundancy and study methods to minimize it. Feature sets obtained through the minimum redundancy - maximum relevance framework represent a broader spectrum of characteristics of phenotypes than those obtained through standard ranking methods; they are more robust, generalize well to unseen data, and lead to significantly improved classifications in extensive experiments on 5 gene expression data sets.
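A minimal sketch of greedy minimum redundancy - maximum relevance selection on discretized data follows. It assumes mutual information as both the relevance and the redundancy measure and scores candidates by relevance minus mean redundancy; this particular scoring rule, the function names, and the three toy genes are assumptions for illustration, not details taken from the abstract.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def mrmr_select(features, target, k):
    """Greedy selection: maximize relevance minus mean redundancy.

    features: dict name -> list of discretized values.
    The difference-based score used here is an assumed, illustrative choice.
    """
    relevance = {f: mutual_information(v, target) for f, v in features.items()}
    # Start from the single most relevant feature.
    selected = [max(sorted(features), key=lambda f: relevance[f])]
    while len(selected) < k:
        def score(f):
            redundancy = sum(mutual_information(features[f], features[s])
                             for s in selected) / len(selected)
            return relevance[f] - redundancy
        candidates = [f for f in sorted(features) if f not in selected]
        selected.append(max(candidates, key=score))
    return selected

target = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "gene_a": [0, 0, 0, 1, 1, 1, 1, 1],   # strongly relevant
    "gene_b": [0, 0, 0, 1, 1, 1, 1, 1],   # exact copy of gene_a (redundant)
    "gene_c": [0, 1, 0, 0, 1, 1, 0, 1],   # less relevant, but not a copy
}
print(mrmr_select(features, target, k=2))
```

Plain ranking by relevance would pick `gene_a` and `gene_b`, which carry identical information; the redundancy penalty makes the second pick `gene_c` instead.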
Normal mode analyses of proteins of various sizes, ranging from 46 (crambin) up to 858 residues (dimeric citrate synthase), were performed using standard approaches as well as a recently proposed method that rests on the hypothesis that low-frequency normal modes of proteins can be described as pure rigid-body motions of blocks of consecutive amino-acid residues. Such a hypothesis is strongly supported by our results, because we show that the latter method, named RTB, yields very accurate approximations for the low-frequency normal modes of all proteins considered. Moreover, the quality of the normal modes thus obtained depends very little on the way the polypeptide chain is split into blocks. Notably, with six amino acids per block, the normal modes are almost as accurate as with a single amino acid per block. In this case, for a protein of n residues and N atoms, the RTB method requires the diagonalization of an n x n matrix, whereas standard procedures require the diagonalization of a 3N x 3N matrix. Being a fast method, our approach can be useful for normal mode analyses of large systems, paving the way for further developments and applications in contexts for which the normal modes are needed frequently, as for example during molecular dynamics calculations.
AMReX is a C++ software framework that supports the development of block-structured adaptive mesh refinement (AMR) algorithms for solving systems of partial differential equations (PDEs) with complex boundary conditions on current and emerging architectures.
Understanding the most efficient design and utilization of emerging multicore systems is one of the most challenging questions faced by the mainstream and scientific computing industries in several decades. Our work explores multicore stencil (nearest-neighbor) computations, a class of algorithms at the heart of many structured grid codes, including PDE solvers. We develop a number of effective optimization strategies, and build an auto-tuning environment that searches over our optimizations and their parameters to minimize runtime, while maximizing performance portability. To evaluate the effectiveness of these strategies we explore the broadest set of multicore architectures in the current HPC literature, including the Intel Clovertown, AMD Barcelona, Sun Victoria Falls, IBM QS22 PowerXCell 8i, and NVIDIA GTX280. Overall, our auto-tuning optimization methodology results in the fastest multicore stencil performance to date. Finally, we present several key insights into the architectural tradeoffs of emerging multicore designs and their implications on scientific algorithm development.
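To make the terms in the abstract above concrete, here is a toy sketch of a stencil sweep with one tunable parameter and a search over that parameter. It uses a 2-D 5-point Jacobi stencil traversed in square tiles and times each candidate tile size. The function names, grid, and candidate list are hypothetical, and pure-Python timings say little about real hardware; the paper's auto-tuner covers many more optimizations on compiled code.

```python
import time

def stencil_sweep(grid, block):
    """One 5-point Jacobi sweep over the interior, traversed in square tiles.

    `block` is the tile edge length (a tunable parameter); the result does
    not depend on it, only the traversal order does.
    """
    n, m = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # boundary values are copied unchanged
    for i0 in range(1, n - 1, block):
        for j0 in range(1, m - 1, block):
            for i in range(i0, min(i0 + block, n - 1)):
                for j in range(j0, min(j0 + block, m - 1)):
                    # Each interior point becomes the mean of its 4 neighbors.
                    out[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                        + grid[i][j - 1] + grid[i][j + 1])
    return out

def autotune(grid, candidate_blocks):
    """Time each candidate tile size and return the fastest one."""
    timings = {}
    for block in candidate_blocks:
        start = time.perf_counter()
        stencil_sweep(grid, block)
        timings[block] = time.perf_counter() - start
    return min(timings, key=timings.get)

grid = [[float(i * j % 7) for j in range(64)] for i in range(64)]
best = autotune(grid, candidate_blocks=[4, 8, 16, 32, 64])
print("fastest tile size on this machine:", best)
```

The key property an auto-tuner relies on is visible here: every tile size produces the same numerical result, so the search is free to pick whichever variant runs fastest on a given machine.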
We set new constraints on a seven-dimensional space of cosmological parameters within the class of inflationary adiabatic models. We use the angular power spectrum of the cosmic microwave background measured over a wide range of $\ell$ in the first flight of the MAXIMA balloon-borne experiment (MAXIMA-1) and the low $\ell$ results from COBE/DMR. We find constraints on the total energy density of the universe, $\Omega=1.0^{+0.15}_{-0.30}$, the physical density of baryons, $\Omega_{b}h^2=0.03 \pm 0.01$, the physical density of cold dark matter, $\Omega_{cdm}h^2=0.2^{+0.2}_{-0.1}$, and the spectral index of primordial scalar fluctuations, $n_s=1.08 \pm 0.1$, all at the 95% confidence level. By combining our results with measurements of high-redshift supernovae we constrain the value of the cosmological constant and the fractional amount of pressureless matter in the universe to $0.45<\Omega_\Lambda<0.75$ and $0.25<\Omega_{m}<0.50$, at the 95% confidence level. Our results are consistent with a flat universe and the shape parameter deduced from large scale structure, and in marginal agreement with the baryon density from big bang nucleosynthesis.
The method of Hammett and Perkins [Phys. Rev. Lett. 64, 3019 (1990)] to model Landau damping has been recently applied to the moments of the gyrokinetic equation with curvature drift by Waltz, Dominguez, and Hammett [Phys. Fluids B 4, 3138 (1992)]. The higher moments are truncated in terms of the lower moments (density, parallel velocity, and parallel and perpendicular pressure) by modeling the deviation from a perturbed Maxwellian to fit the kinetic response function at all values of the kinetic parameters: k∥vth/ω, b=(k⊥ρ)²/2, and ωD/ω. Here the resulting gyro-Landau fluid equations are applied to the simulation of ion temperature gradient (ITG) mode turbulence in toroidal geometry using a novel three-dimensional (3-D) nonlinear ballooning mode representation. The representation is a Fourier transform of a field line following basis (ky′,kx′,z′) with periodicity in toroidal and poloidal angles. Particular emphasis is given to the role of nonlinearly generated n=0 (ky′ = 0, kx′ ≠ 0) “radial modes” in stabilizing the transport from the finite-n ITG ballooning modes. Detailing the parametric dependence of toroidal ITG turbulence is a key result.
The emerging technique of serial X-ray diffraction, in which diffraction data are collected from samples flowing across a pulsed X-ray source at repetition rates of 100 Hz or higher, has necessitated the development of new software in order to handle the large data volumes produced. Sorting of data according to different criteria and rapid filtering of events to retain only diffraction patterns of interest result in significant reductions in data volume, thereby simplifying subsequent data analysis and management tasks. Meanwhile, the generation of reduced data in the form of virtual powder patterns, radial stacks, histograms and other metadata creates data set summaries for analysis and overall experiment evaluation. Rapid data reduction early in the analysis pipeline is proving to be an essential first step in serial imaging experiments, prompting the authors to make the tool described in this article available to the general community. Originally developed for experiments at X-ray free-electron lasers, the software is based on a modular facility-independent library to promote portability between different experiments and is available under version 3 or later of the GNU General Public License.
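The filtering and summarizing steps described above can be sketched in a few lines. This is an illustrative toy, not the software's actual interface: it assumes a frame counts as a pattern of interest when at least `min_peaks` pixels exceed `threshold`, and that a virtual powder pattern is the pixel-wise sum of the retained frames. The function names, parameters, and 3x3 frames are all hypothetical.

```python
def count_peaks(frame, threshold):
    """Number of pixels whose intensity exceeds the threshold."""
    return sum(1 for row in frame for value in row if value > threshold)

def reduce_frames(frames, threshold, min_peaks):
    """Keep frames that look like hits and sum them into one summary image.

    Returns (hits, powder); powder is None when no frame passes the filter.
    """
    hits = [f for f in frames if count_peaks(f, threshold) >= min_peaks]
    if not hits:
        return hits, None
    rows, cols = len(hits[0]), len(hits[0][0])
    # Pixel-wise sum over retained frames (assumed definition of the summary).
    powder = [[sum(f[r][c] for f in hits) for c in range(cols)]
              for r in range(rows)]
    return hits, powder

frames = [
    [[0, 9, 0], [0, 0, 8], [0, 0, 0]],   # two bright pixels: kept
    [[1, 0, 1], [0, 1, 0], [0, 0, 0]],   # background only: rejected
    [[0, 7, 0], [6, 0, 0], [0, 0, 0]],   # two bright pixels: kept
]
hits, powder = reduce_frames(frames, threshold=5, min_peaks=2)
print(len(hits), powder)
```

Discarding rejected frames before any further processing is what yields the reduction in data volume, while the summed image gives a single-glance summary of the retained data.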