QB3
Nonprofit, San Francisco, United States
Research output, citation impact, and the most-cited recent papers from QB3 (United States). Aggregated across the NobleBlocks index of 300M+ scholarly works.
Top-cited papers from QB3
Techniques for systematically monitoring protein translation have lagged far behind methods for measuring messenger RNA (mRNA) levels. Here, we present a ribosome-profiling strategy that is based on the deep sequencing of ribosome-protected mRNA fragments and enables genome-wide investigation of translation with subcodon resolution. We used this technique to monitor translation in budding yeast under both rich and starvation conditions. These studies defined the protein sequences being translated and found extensive translational control in both determining absolute protein abundance and responding to environmental stress. We also observed distinct phases during translation that involve a large decrease in ribosome density going from early to late peptide elongation as well as widespread regulated initiation at non-adenine-uracil-guanine (AUG) codons. Ribosome profiling is readily adaptable to other organisms, making high-precision investigation of protein translation experimentally accessible.
Spurred by advances in processing power, memory, storage, and an unprecedented wealth of data, computers are being asked to tackle increasingly complex learning tasks, often with astonishing success. Computers have now mastered a popular variant of poker, learned the laws of physics from experimental data, and become experts in video games - tasks that would have been deemed impossible not too long ago. In parallel, the number of companies centered on applying complex data analysis to varying industries has exploded, and it is thus unsurprising that some analytic companies are turning attention to problems in health care. The purpose of this review is to explore what problems in medicine might benefit from such learning approaches and use examples from the literature to introduce basic concepts in machine learning. It is important to note that seemingly large enough medical data sets and adequate learning algorithms have been available for many decades, and yet, although there are thousands of papers applying machine learning algorithms to medical data, very few have contributed meaningfully to clinical care. This lack of impact stands in stark contrast to the enormous relevance of machine learning to many other industries. Thus, part of my effort will be to identify what obstacles there may be to changing the practice of medicine through statistical learning approaches, and discuss how these might be overcome.
autophagic responses. Here, we critically discuss current methods of assessing autophagy and the information they can, or cannot, provide. Our ultimate goal is to encourage intellectual and technical innovation in the field.
Protein structures in the Protein Data Bank provide a wealth of data about the interactions that determine the native states of proteins. Using probability theory, we derive an atomic distance-dependent statistical potential from a sample of native structures that does not depend on any adjustable parameters (Discrete Optimized Protein Energy, or DOPE). DOPE is based on an improved reference state that corresponds to noninteracting atoms in a homogeneous sphere with the radius dependent on a sample native structure; it thus accounts for the finite and spherical shape of the native structures. The DOPE potential was extracted from a nonredundant set of 1472 crystallographic structures. We tested DOPE and five other scoring functions by the detection of the native state among six multiple target decoy sets, the correlation between the score and model error, and the identification of the most accurate non-native structure in the decoy set. For all decoy sets, DOPE is the best performing function in terms of all criteria, except for a tie in one criterion for one decoy set. To facilitate its use in various applications, such as model assessment, loop modeling, and fitting into cryo-electron microscopy mass density maps combined with comparative protein structure modeling, DOPE was incorporated into the modeling package MODELLER-8.
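The idea of a distance-dependent statistical potential with a finite-sphere reference state can be illustrated with a small sketch. This is not the DOPE implementation from MODELLER. It assumes a generic inverse-Boltzmann form, E(r) = -kT ln(p_obs(r) / p_ref(r)), and uses the standard distance distribution for two points drawn uniformly inside a sphere of radius a as the reference. The function names, the binning, and the choice kT = 1 are hypothetical choices made for illustration.

```python
import math


def sphere_pair_density(r, a):
    """Density of the distance r between two points placed uniformly and
    independently inside a sphere of radius a (zero outside 0 <= r <= 2a)."""
    if r < 0.0 or r > 2.0 * a:
        return 0.0
    x = r / a
    return (3.0 * x * x / a) * (1.0 - 0.75 * x + x ** 3 / 16.0)


def statistical_potential(observed_counts, bin_width, a, kT=1.0):
    """Inverse-Boltzmann energy per distance bin, -kT * ln(p_obs / p_ref).

    observed_counts[i] is the number of atom pairs seen in the bin
    [i * bin_width, (i + 1) * bin_width). Bins with no observations or
    zero reference probability yield None (this toy uses no pseudocounts).
    """
    total = sum(observed_counts)
    energies = []
    for i, n in enumerate(observed_counts):
        r_mid = (i + 0.5) * bin_width
        p_ref = sphere_pair_density(r_mid, a) * bin_width
        if n == 0 or p_ref == 0.0:
            energies.append(None)
            continue
        p_obs = n / total
        energies.append(-kT * math.log(p_obs / p_ref))
    return energies


# Toy usage: made-up counts that are enriched at short distances relative
# to the reference state, so the first bins come out with negative energy.
radius = 10.0
counts = [50, 120, 160, 150, 130, 110, 90, 70, 50, 30]
energies = statistical_potential(counts, bin_width=2.0, a=radius)
```

Distances that occur more often than the non-interacting sphere would predict receive negative (favorable) energies, and rarer distances receive positive ones. Because the reference depends on the sphere radius, structures of different sizes are compared against different baselines, which is the finite-size effect the abstract describes.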
The recent advent of methods for high-throughput single-cell molecular profiling has catalyzed a growing sense in the scientific community that the time is ripe to complete the 150-year-old effort to identify all cell types in the human body. The Human Cell Atlas Project is an international collaborative effort that aims to define all human cell types in terms of distinctive molecular profiles (such as gene expression profiles) and to connect this information with classical cellular descriptions (such as location and morphology). An open comprehensive reference map of the molecular state of cells in healthy human tissues would propel the systematic study of physiological states, developmental trajectories, regulatory circuitry and interactions of cells, and also provide a framework for understanding cellular dysregulation in human disease. Here we describe the idea, its potential utility, early proofs-of-concept, and some design considerations for the Human Cell Atlas, including a commitment to open data, code, and community.
The Trans-Omics for Precision Medicine (TOPMed) programme seeks to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases. The initial phases of the programme focused on whole-genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here we describe the TOPMed goals and design as well as the available resources and early insights obtained from the sequence data. The resources include a variant browser, a genotype imputation server, and genomic and phenotypic data that are available through dbGaP (Database of Genotypes and Phenotypes). In the first 53,831 TOPMed samples, we detected more than 400 million single-nucleotide and insertion or deletion variants after alignment with the reference genome. Additional previously undescribed variants were detected through assembly of unmapped reads and customized analysis in highly variable loci. Among the more than 400 million detected variants, 97% have frequencies of less than 1% and 46% are singletons that are present in only one individual (53% among unrelated individuals). These rare variants provide insights into mutational processes and recent human evolutionary history. The extensive catalogue of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and noncoding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and reach of genome-wide association studies to include variants down to a frequency of approximately 0.01%.
This review presents recommended nomenclature for the biosynthesis of ribosomally synthesized and post-translationally modified peptides (RiPPs), a rapidly growing class of natural products. The current knowledge regarding the biosynthesis of the >20 distinct compound classes is also reviewed, and commonalities are discussed.
Many bacterial clustered regularly interspaced short palindromic repeats (CRISPR)-CRISPR-associated (Cas) systems employ the dual RNA-guided DNA endonuclease Cas9 to defend against invading phages and conjugative plasmids by introducing site-specific double-stranded breaks in target DNA. Target recognition strictly requires the presence of a short protospacer adjacent motif (PAM) flanking the target site, and subsequent R-loop formation and strand scission are driven by complementary base pairing between the guide RNA and target DNA, Cas9-DNA interactions, and associated conformational changes. The use of CRISPR-Cas9 as an RNA-programmable DNA targeting and editing platform is simplified by a synthetic single-guide RNA (sgRNA) mimicking the natural dual trans-activating CRISPR RNA (tracrRNA)-CRISPR RNA (crRNA) structure. This review aims to provide an in-depth mechanistic and structural understanding of Cas9-mediated RNA-guided DNA targeting and cleavage. Molecular insights from biochemical and structural studies provide a framework for rational engineering aimed at altering catalytic function, guide RNA specificity, and PAM requirements and reducing off-target activity for the development of Cas9-based therapies against genetic diseases.
Bacterial and fungal secondary metabolism is a rich source of novel bioactive compounds with potential pharmaceutical applications as antibiotics, anti-tumor drugs or cholesterol-lowering drugs. To find new drug candidates, microbiologists are increasingly relying on sequencing genomes of a wide variety of microbes. However, rapidly and reliably pinpointing all the potential gene clusters for secondary metabolites in dozens of newly sequenced genomes has been extremely challenging, due to their biochemical heterogeneity, the presence of unknown enzymes and the dispersed nature of the necessary specialized bioinformatics tools and resources. Here, we present antiSMASH (antibiotics & Secondary Metabolite Analysis Shell), the first comprehensive pipeline capable of identifying biosynthetic loci covering the whole range of known secondary metabolite compound classes (polyketides, non-ribosomal peptides, terpenes, aminoglycosides, aminocoumarins, indolocarbazoles, lantibiotics, bacteriocins, nucleosides, beta-lactams, butyrolactones, siderophores, melanins and others). It aligns the identified regions at the gene cluster level to their nearest relatives from a database containing all other known gene clusters, and integrates or cross-links all previously available secondary-metabolite specific gene analysis methods in one interactive view. antiSMASH is available at http://antismash.secondarymetabolites.org.
Microbial secondary metabolism constitutes a rich source of antibiotics, chemotherapeutics, insecticides and other high-value chemicals. Genome mining of gene clusters that encode the biosynthetic pathways for these metabolites has become a key methodology for novel compound discovery. In 2011, we introduced antiSMASH, a web server and stand-alone tool for the automatic genomic identification and analysis of biosynthetic gene clusters, available at http://antismash.secondarymetabolites.org. Here, we present version 3.0 of antiSMASH, which has undergone major improvements. A full integration of the recently published ClusterFinder algorithm now allows using this probabilistic algorithm to detect putative gene clusters of unknown types. Also, a new dereplication variant of the ClusterBlast module now identifies similarities of identified clusters to any of 1172 clusters with known end products. At the enzyme level, active sites of key biosynthetic enzymes are now pinpointed through a curated pattern-matching procedure and Enzyme Commission numbers are assigned to functionally classify all enzyme-coding genes. Additionally, chemical structure prediction has been improved by incorporating polyketide reduction states. Finally, in order for users to be able to organize and analyze multiple antiSMASH outputs in a private setting, a new XML output module allows offline editing of antiSMASH annotations within the Geneious software.
Genome sequencing projects have resulted in a rapid increase in the number of known protein sequences. In contrast, only about one-hundredth of these sequences have been characterized using experimental structure determination methods. Computational protein structure modeling techniques have the potential to bridge this sequence-structure gap. This chapter presents an example that illustrates the use of MODELLER to construct a comparative model for a protein with unknown structure. Automation of similar protocols has resulted in models of useful accuracy for domains in more than half of all known protein sequences.
Numerous genetic and environmental insults impede the ability of cells to properly fold and posttranslationally modify secretory and transmembrane proteins in the endoplasmic reticulum (ER), leading to a buildup of misfolded proteins in this organelle--a condition called ER stress. ER-stressed cells must rapidly restore protein-folding capacity to match protein-folding demand if they are to survive. In the presence of high levels of misfolded proteins in the ER, an intracellular signaling pathway called the unfolded protein response (UPR) induces a set of transcriptional and translational events that restore ER homeostasis. However, if ER stress persists chronically at high levels, a terminal UPR program ensures that cells commit to self-destruction. Chronic ER stress and defects in UPR signaling are emerging as key contributors to a growing list of human diseases, including diabetes, neurodegeneration, and cancer. Hence, there is much interest in targeting components of the UPR as a therapeutic strategy to combat these ER stress-associated pathologies.
Comprehensive, high-quality reference genomes are required for functional characterization and taxonomic assignment of the human gut microbiota. We present the Unified Human Gastrointestinal Genome (UHGG) collection, comprising 204,938 nonredundant genomes from 4,644 gut prokaryotes. These genomes encode >170 million protein sequences, which we collated in the Unified Human Gastrointestinal Protein (UHGP) catalog. The UHGP more than doubles the number of gut proteins in comparison to those present in the Integrated Gene Catalog. More than 70% of the UHGG species lack cultured representatives, and 40% of the UHGP lack functional annotations. Intraspecies genomic variation analyses revealed a large reservoir of accessory genes and single-nucleotide variants, many of which are specific to individual human populations. The UHGG and UHGP collections will enable studies linking genotypes to phenotypes in the human gut microbiome.
Markov state models of molecular kinetics (MSMs), in which the long-time statistical dynamics of a molecule is approximated by a Markov chain on a discrete partition of configuration space, have seen widespread use in recent years. This approach has many appealing characteristics compared to straightforward molecular dynamics simulation and analysis, including the potential to mitigate the sampling problem by extracting long-time kinetic information from short trajectories and the ability to straightforwardly calculate expectation values and statistical uncertainties of various stationary and dynamical molecular observables. In this paper, we summarize the current state of the art in generation and validation of MSMs and give some important new results. We describe an upper bound for the approximation error made by modeling molecular dynamics with a MSM and we show that this error can be made arbitrarily small with surprisingly little effort. In contrast to previous practice, it becomes clear that the best MSM is not obtained by the most metastable discretization, but the MSM can be much improved if non-metastable states are introduced near the transition states. Moreover, we show that it is not necessary to resolve all slow processes by the state space partitioning, but individual dynamical processes of interest can be resolved separately. We also present an efficient estimator for reversible transition matrices and a robust test to validate that a MSM reproduces the kinetics of the molecular dynamics data.
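The basic MSM workflow summarized above (discretize, count transitions at a lag time, estimate a transition matrix, read off a relaxation timescale) can be sketched in a few lines. This is a minimal two-state toy and an assumption-laden illustration. It uses the plain row-normalized maximum-likelihood estimator, not the reversible estimator or the validation test the paper presents. The function names and the example trajectory are hypothetical.

```python
import math


def count_transitions(dtraj, n_states, lag):
    """Count transitions i -> j seen at the given lag (sliding window)."""
    counts = [[0] * n_states for _ in range(n_states)]
    for t in range(len(dtraj) - lag):
        counts[dtraj[t]][dtraj[t + lag]] += 1
    return counts


def row_normalize(counts):
    """Row-normalize a count matrix into a row-stochastic transition matrix."""
    matrix = []
    for row in counts:
        total = sum(row)
        if total == 0:
            raise ValueError("a state has no outgoing transitions")
        matrix.append([c / total for c in row])
    return matrix


def two_state_implied_timescale(T, lag):
    """Implied timescale -lag / ln(lambda_2) of a 2x2 transition matrix.

    A 2x2 row-stochastic matrix has eigenvalues 1 and (trace - 1), so the
    slow eigenvalue is available without a linear-algebra library.
    """
    lam2 = T[0][0] + T[1][1] - 1.0
    if not 0.0 < lam2 < 1.0:
        raise ValueError("slow eigenvalue must lie strictly between 0 and 1")
    return -lag / math.log(lam2)


# Toy discrete trajectory: each frame is already assigned to state 0 or 1.
dtraj = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
counts = count_transitions(dtraj, n_states=2, lag=1)
T = row_normalize(counts)
timescale = two_state_implied_timescale(T, lag=1)
```

In practice the implied timescale is recomputed for several lag times, and a lag is accepted once the timescale stops changing, which is one common check that the Markov approximation holds. Models with more than two states need a general eigenvalue solver in place of the closed-form trace expression used here.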
The unfolded protein response (UPR) allows the endoplasmic reticulum (ER) to recover from the accumulation of misfolded proteins, in part by increasing its folding capacity. Inositol-requiring enzyme-1 (IRE1) promotes this remodeling by detecting misfolded ER proteins and activating a transcription factor, X-box-binding protein 1, through endonucleolytic cleavage of its messenger RNA (mRNA). Here, we report that IRE1 independently mediates the rapid degradation of a specific subset of mRNAs, based both on their localization to the ER membrane and on the amino acid sequence they encode. This response is well suited to complement other UPR mechanisms because it could selectively halt production of proteins that challenge the ER and clear the translocation and folding machinery for the subsequent remodeling process.
Introduction: Bacteria and archaea defend themselves against invasive DNA using adaptive immune systems comprising CRISPR (clustered regularly interspaced short palindromic repeats) loci and CRISPR-associated (Cas) genes. In association with Cas proteins, small CRISPR RNAs (crRNAs) guide the detection and cleavage of complementary DNA sequences. Type II CRISPR systems employ the RNA-guided endonuclease Cas9 to recognize and cleave double-stranded DNA (dsDNA) targets using conserved RuvC and HNH nuclease domains. Cas9-mediated cleavage is strictly dependent on the presence of a protospacer adjacent motif (PAM) in the target DNA. Recently, the biochemical properties of Cas9–guide RNA complexes have been harnessed for various genetic engineering applications and RNA-guided transcriptional control. Despite these ongoing successes, the structural basis for guide RNA recognition and DNA targeting by Cas9 is still unknown. Rationale: To compare the architectures and domain organization of diverse Cas9 proteins, the atomic structures of Cas9 from Streptococcus pyogenes (SpyCas9) and Actinomyces naeslundii (AnaCas9) were determined by x-ray crystallography. Crosslinking of target DNA containing 5-bromodeoxyuridines was conducted to identify PAM-interacting regions in SpyCas9. To test functional interactions with nucleic acid ligands, structure-based mutant SpyCas9 proteins were assayed for endonuclease activity with radiolabeled oligonucleotide dsDNA targets, and target DNA binding was monitored by electrophoretic mobility shift assays. To compare conformations of Cas9 in different states of nucleic acid binding, three-dimensional reconstructions of apo-SpyCas9, SpyCas9:RNA, and SpyCas9:RNA:DNA were obtained by negative-stain single-particle electron microscopy. Guide RNA and target DNA positions were determined with streptavidin labeling. Exonuclease protection assays were carried out to determine the extent of Cas9–target DNA interactions. Results: The 2.6 Å–resolution structure of apo-SpyCas9 reveals a bilobed architecture comprising a nuclease domain lobe and an α-helical lobe. Both lobes contain conserved clefts that may function in nucleic acid binding. Photocrosslinking experiments show that the PAM in target DNA is engaged by two tryptophan-containing flexible loops, and mutations of both loops impair target DNA binding and cleavage. The 2.2 Å–resolution crystal structure of AnaCas9 reveals the conserved structural core shared by all Cas9 enzyme subtypes, and both SpyCas9 and AnaCas9 adopt autoinhibited conformations in their apo forms. The electron microscopic (EM) reconstructions of SpyCas9:RNA and SpyCas9:RNA:DNA complexes reveal that guide RNA binding results in a conformational rearrangement and formation of a central channel for target DNA binding. Site-specific labeling of guide RNA and target DNA defines the orientations of nucleic acids in the target-bound complex. Conclusion: The SpyCas9 and AnaCas9 structures define the molecular architecture of the Cas9 enzyme family in which a conserved structural core encompasses the two nuclease domains responsible for DNA cleavage, while structurally divergent regions, including the PAM recognition loops, are likely responsible for distinct guide RNA and PAM specificities. Cas9 enzymes adopt a catalytically inactive conformation in the apo state, necessitating structural activation for DNA recognition and cleavage. Our EM analysis shows that by triggering a conformational rearrangement in Cas9, the guide RNA acts as a critical determinant of target DNA binding.
BACKGROUND: Heart failure with preserved ejection fraction (HFpEF) is a heterogeneous clinical syndrome in need of improved phenotypic classification. We sought to evaluate whether unbiased clustering analysis using dense phenotypic data (phenomapping) could identify phenotypically distinct HFpEF categories. METHODS AND RESULTS: We prospectively studied 397 patients with HFpEF and performed detailed clinical, laboratory, ECG, and echocardiographic phenotyping of the study participants. We used several statistical learning algorithms, including unbiased hierarchical cluster analysis of phenotypic data (67 continuous variables) and penalized model-based clustering, to define and characterize mutually exclusive groups making up a novel classification of HFpEF. All phenomapping analyses were performed by investigators blinded to clinical outcomes, and Cox regression was used to demonstrate the clinical validity of phenomapping. The mean age was 65±12 years; 62% were female; 39% were black; and comorbidities were common. Although all patients met published criteria for the diagnosis of HFpEF, phenomapping analysis classified study participants into 3 distinct groups that differed markedly in clinical characteristics, cardiac structure/function, invasive hemodynamics, and outcomes (eg, phenogroup 3 had an increased risk of HF hospitalization [hazard ratio, 4.2; 95% confidence interval, 2.0-9.1] even after adjustment for traditional risk factors [P<0.001]). The HFpEF phenogroup classification, including its ability to stratify risk, was successfully replicated in a prospective validation cohort (n=107). CONCLUSIONS: Phenomapping results in a novel classification of HFpEF. Statistical learning algorithms applied to dense phenotypic data may allow improved classification of heterogeneous clinical syndromes, with the ultimate goal of defining therapeutically homogeneous patient subclasses.
ModBase (http://salilab.org/modbase) is a database of annotated comparative protein structure models. The models are calculated by ModPipe, an automated modeling pipeline that relies primarily on Modeller for fold assignment, sequence-structure alignment, model building and model assessment (http://salilab.org/modeller/). ModBase currently contains 10,355,444 reliable models for domains in 2,421,920 unique protein sequences. ModBase allows users to update comparative models on demand, and request modeling of additional sequences through an interface to the ModWeb modeling server (http://salilab.org/modweb). ModBase models are available through the ModBase interface as well as the Protein Model Portal (http://www.proteinmodelportal.org/). Recently developed associated resources include the SALIGN server for multiple sequence and structure alignment (http://salilab.org/salign), the ModEval server for predicting the accuracy of protein structure models (http://salilab.org/modeval), the PCSS server for predicting which peptides bind to a given protein (http://salilab.org/pcss) and the FoXS server for calculating and fitting Small Angle X-ray Scattering profiles (http://salilab.org/foxs).
Schistosoma mansoni is responsible for the neglected tropical disease schistosomiasis that affects 210 million people in 76 countries. Here we present analysis of the 363 megabase nuclear genome of the blood fluke. It encodes at least 11,809 genes, with an unusual intron size distribution, and new families of micro-exon genes that undergo frequent alternative splicing. As the first sequenced flatworm, and a representative of the Lophotrochozoa, it offers insights into early events in the evolution of the animals, including the development of a body pattern with bilateral symmetry, and the development of tissues into organs. Our analysis has been informed by the need to find new drug targets. The deficits in lipid metabolism that make schistosomes dependent on the host are revealed, and the identification of membrane receptors, ion channels and more than 300 proteases provides new insights into the biology of the life cycle and new targets. Bioinformatics approaches have identified metabolic chokepoints, and a chemogenomic screen has pinpointed schistosome proteins for which existing drugs may be active. The information generated provides an invaluable resource for the research community to develop much needed new control tools for the treatment and eradication of this important and neglected disease. Two international consortia this week report the whole genome sequences of the blood flukes Schistosoma mansoni and Schistosoma japonicum, two of the three major pathogens that cause schistosomiasis, also called bilharzia. Schistosomiasis is a 'neglected' tropical disease affecting more than 200 million people in 76 countries. Analyses of the new genome sequences provide insights into the molecular architecture and host interactions of these pathogens, as well as avenues for future development of targeted interventions for this disease.
These are the first two flatworm genomes to be sequenced, so they offer new angles on the early events in animal evolution, in particular the determination of body pattern and the development of tissues into organs. Schistosoma mansoni and Schistosoma japonicum are the pathogenic agents that cause the tropical disease schistosomiasis. Here, and in an accompanying paper, the genomes of these two flatworms are sequenced and analysed. The results provide insights into the molecular architecture and host interactions of the flatworms, as well as avenues for future development of targeted interventions for schistosomiasis.
Saccharomyces cerevisiae is an increasingly attractive host for synthetic biology because of its long history in industrial fermentations. However, until recently, most synthetic biology systems have focused on bacteria. While there is a wealth of resources and literature about the biology of yeast, it can be daunting to navigate and extract the tools needed for engineering applications. Here we present a versatile engineering platform for yeast, which contains both a rapid, modular assembly method and a basic set of characterized parts. This platform provides a framework in which to create new designs, as well as data on promoters, terminators, degradation tags, and copy number to inform those designs. Additionally, we describe genome-editing tools for making modifications directly to the yeast chromosomes, which we find preferable to plasmids due to reduced variability in expression. With this toolkit, we strive to simplify the process of engineering yeast by standardizing the physical manipulations and suggesting best practices that together will enable more straightforward translation of materials and data from one group to another. Additionally, relieving researchers of the burden of technical details lets them focus on higher-level aspects of experimental design.