Allen Institute
Nonprofit, Seattle, United States
Research output, citation impact, and the most-cited recent papers from Allen Institute (United States). Aggregated across the NobleBlocks index of 300M+ scholarly works.
Top-cited papers from Allen Institute
We present YOLO, a new approach to object detection. Prior work on object detection repurposes classifiers to perform detection. Instead, we frame object detection as a regression problem to spatially separated bounding boxes and associated class probabilities. A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance. Our unified architecture is extremely fast. Our base YOLO model processes images in real-time at 45 frames per second. A smaller version of the network, Fast YOLO, processes an astounding 155 frames per second while still achieving double the mAP of other real-time detectors. Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background. Finally, YOLO learns very general representations of objects. It outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.
The recent advent of methods for high-throughput single-cell molecular profiling has catalyzed a growing sense in the scientific community that the time is ripe to complete the 150-year-old effort to identify all cell types in the human body. The Human Cell Atlas Project is an international collaborative effort that aims to define all human cell types in terms of distinctive molecular profiles (such as gene expression profiles) and to connect this information with classical cellular descriptions (such as location and morphology). An open comprehensive reference map of the molecular state of cells in healthy human tissues would propel the systematic study of physiological states, developmental trajectories, regulatory circuitry and interactions of cells, and also provide a framework for understanding cellular dysregulation in human disease. Here we describe the idea, its potential utility, early proofs-of-concept, and some design considerations for the Human Cell Atlas, including a commitment to open data, code, and community.
Machine-learning models have demonstrated great success in learning complex patterns that enable them to make predictions about unobserved data. In addition to using models for prediction, the ability to interpret what a model has learned is receiving an increasing amount of attention. However, this increased focus has led to considerable confusion about the notion of interpretability. In particular, it is unclear how the wide array of proposed interpretation methods are related and what common concepts can be used to evaluate them. We aim to address these concerns by defining interpretability in the context of machine learning and introducing the predictive, descriptive, relevant (PDR) framework for discussing interpretations. The PDR framework provides 3 overarching desiderata for evaluation: predictive accuracy, descriptive accuracy, and relevancy, with relevancy judged relative to a human audience. Moreover, to help manage the deluge of interpretation methods, we introduce a categorization of existing techniques into model-based and post hoc categories, with subgroups including sparsity, modularity, and simulatability. To demonstrate how practitioners can use the PDR framework to evaluate and understand interpretations, we provide numerous real-world examples. These examples highlight the often underappreciated role played by human audiences in discussions of interpretability. Finally, based on our framework, we discuss limitations of existing methods and directions for future work. We hope that this work will provide a common vocabulary that will make it easier for both practitioners and researchers to discuss and choose from the full range of interpretation methods.
Large-scale genetic analysis of lethal phenotypes has elucidated the molecular underpinnings of many biological processes. Using the bacterial clustered regularly interspaced short palindromic repeats (CRISPR) system, we constructed a genome-wide single-guide RNA library to screen for genes required for proliferation and survival in a human cancer cell line. Our screen revealed the set of cell-essential genes, which was validated with an orthogonal gene-trap-based screen and comparison with yeast gene knockouts. This set is enriched for genes that encode components of fundamental pathways, are expressed at high levels, and contain few inactivating polymorphisms in the human population. We also uncovered a large group of uncharacterized genes involved in RNA processing, a number of whose products localize to the nucleolus. Last, screens in additional cell lines showed a high degree of overlap in gene essentiality but also revealed differences specific to each cell line and cancer type that reflect the developmental origin, oncogenic drivers, paralogous gene expression pattern, and chromosomal structure of each line. These results demonstrate the power of CRISPR-based screens and suggest a general strategy for identifying liabilities in cancer cells.
The mTOR complex 1 (mTORC1) protein kinase is a master growth regulator that is stimulated by amino acids. Amino acids activate the Rag guanosine triphosphatases (GTPases), which promote the translocation of mTORC1 to the lysosomal surface, the site of mTORC1 activation. We found that the vacuolar H(+)-adenosine triphosphatase (v-ATPase) is necessary for amino acids to activate mTORC1. The v-ATPase engages in extensive amino acid-sensitive interactions with the Ragulator, a scaffolding complex that anchors the Rag GTPases to the lysosome. In a cell-free system, ATP hydrolysis by the v-ATPase was necessary for amino acids to regulate the v-ATPase-Ragulator interaction and promote mTORC1 translocation. Results obtained in vitro and in human cells suggest that amino acid signaling begins within the lysosomal lumen. These results identify the v-ATPase as a component of the mTOR pathway and delineate a lysosome-associated machinery for amino acid sensing.
To facilitate scalable profiling of single cells, we developed split-pool ligation-based transcriptome sequencing (SPLiT-seq), a single-cell RNA-seq (scRNA-seq) method that labels the cellular origin of RNA through combinatorial barcoding. SPLiT-seq is compatible with fixed cells or nuclei, allows efficient sample multiplexing, and requires no customized equipment. We used SPLiT-seq to analyze 156,049 single-nucleus transcriptomes from postnatal day 2 and 11 mouse brains and spinal cords. More than 100 cell types were identified, with gene expression patterns corresponding to cellular function, regional specificity, and stage of differentiation. Pseudotime analysis revealed transcriptional programs driving four developmental lineages, providing a snapshot of early postnatal development in the murine central nervous system. SPLiT-seq provides a path toward comprehensive single-cell transcriptomic analysis of other similarly complex multicellular systems.
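The combinatorial barcoding idea in the SPLiT-seq abstract can be illustrated with a little arithmetic: each split-pool round multiplies the number of possible barcode combinations, and the chance that two cells end up with the same combination falls as that number grows. The sketch below is a minimal illustration under stated assumptions, not the authors' protocol: the round sizes (three rounds of 96 barcodes) and the cell count are hypothetical, and cells are assumed to receive barcodes uniformly at random.

```python
from math import prod


def barcode_combinations(barcodes_per_round):
    """Number of distinct barcode combinations across all split-pool rounds."""
    return prod(barcodes_per_round)


def shared_barcode_probability(n_cells, n_combinations):
    """Probability that a given cell shares its combination with at least one
    other cell, assuming uniform random barcode assignment."""
    if n_cells < 1 or n_combinations < 1:
        raise ValueError("n_cells and n_combinations must be positive")
    return 1.0 - (1.0 - 1.0 / n_combinations) ** (n_cells - 1)


# Hypothetical experiment: three rounds of 96 barcodes, 10,000 cells.
rounds = [96, 96, 96]
total = barcode_combinations(rounds)
p_shared = shared_barcode_probability(10_000, total)
print(f"{total} combinations, P(shared barcode) = {p_shared:.4f}")
```

Adding a round multiplies `total` by that round's barcode count, which is why the number of uniquely labelled cells can scale without specialized equipment.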
Two less addressed issues of deep reinforcement learning are (1) lack of generalization capability to new goals, and (2) data inefficiency, i.e., the model requires several (and often costly) episodes of trial and error to converge, which makes it impractical to be applied to real-world scenarios. In this paper, we address these two issues and apply our model to target-driven visual navigation. To address the first issue, we propose an actor-critic model whose policy is a function of the goal as well as the current state, which allows better generalization. To address the second issue, we propose the AI2-THOR framework, which provides an environment with high-quality 3D scenes and a physics engine. Our framework enables agents to take actions and interact with objects. Hence, we can collect a huge number of training samples efficiently. We show that our proposed method (1) converges faster than the state-of-the-art deep reinforcement learning methods, (2) generalizes across targets and scenes, (3) generalizes to a real robot scenario with a small amount of fine-tuning (although the model is trained in simulation), (4) is end-to-end trainable and does not need feature engineering, feature matching between frames or 3D reconstruction of the environment.
In this paper we provide an extensive evaluation of fixation prediction and salient object segmentation algorithms as well as statistics of major datasets. Our analysis identifies serious design flaws in existing salient object benchmarks, which we call the dataset design bias, caused by overemphasizing stereotypical concepts of saliency. The dataset design bias not only creates a discomforting disconnect between fixations and salient object segmentation, but also misleads algorithm design. Based on our analysis, we propose a new high-quality dataset that offers both fixation and salient object segmentation ground truth. With fixations and salient objects presented simultaneously, we are able to bridge the gap between fixations and salient objects, and propose a novel method for salient object segmentation. Finally, we report significant benchmark progress on 3 existing datasets for salient object segmentation.
From sensing leucine to metabolic control. The mTORC1 protein kinase complex plays central roles in regulating cell growth and metabolism and is implicated in common human diseases such as diabetes and cancer. The level of the amino acid leucine tells an organism a lot about its physiological state, including how much food is available, how much insulin is going to be needed, and whether new muscle mass can be made (see the Perspective by Buel and Blenis). Wolfson et al. identified a biochemical sensor of leucine, Sestrin2, which connects the concentration of leucine to the control of organismal metabolism and growth. When leucine bound to Sestrin2, it was released from a complex with the mTORC1 regulatory factor GATOR2, activating the mTORC1 complex. Saxton et al. describe the crystal structure of Sestrin2 and show how it specifically detects leucine. Aylett et al. determined the structure of human mTORC1 by cryoelectron microscopy and the crystal structure of a regulatory subunit, Raptor. The results reveal the structural basis for the function and intricate regulation of this important enzyme, which is also a strategic drug target. Science, this issue p. 43, p. 48, p. 53; see also p. 25
A major proportion of the mammalian transcriptome comprises long RNAs that have little or no protein-coding capacity (ncRNAs). Only a handful of such transcripts have been examined in detail, and it is unknown whether this class of transcript is generally functional or merely artifact. Using in situ hybridization data from the Allen Brain Atlas, we identified 849 ncRNAs (of 1,328 examined) that are expressed in the adult mouse brain and found that the majority were associated with specific neuroanatomical regions, cell types, or subcellular compartments. Examination of their genomic context revealed that the ncRNAs were expressed from diverse places including intergenic, intronic, and imprinted loci and that many overlap with, or are transcribed antisense to, protein-coding genes of neurological importance. Comparisons between the expression profiles of ncRNAs and their associated protein-coding genes revealed complex relationships that, in combination with the specific expression profiles exhibited at both regional and subcellular levels, are inconsistent with the notion that they are transcriptional noise or artifacts of chromatin remodeling. Our results show that the majority of ncRNAs are expressed in the brain and provide strong evidence that the majority of processed transcripts with no protein-coding capacity function intrinsically as RNAs.
The mammalian target of rapamycin (mTOR) protein kinase is a master growth promoter that nucleates two complexes, mTORC1 and mTORC2. Despite the diverse processes controlled by mTOR, few substrates are known. We defined the mTOR-regulated phosphoproteome by quantitative mass spectrometry and characterized the primary sequence motif specificity of mTOR using positional scanning peptide libraries. We found that the phosphorylation response to insulin is largely mTOR dependent and that mTOR exhibits a unique preference for proline, hydrophobic, and aromatic residues at the +1 position. The adaptor protein Grb10 was identified as an mTORC1 substrate that mediates the inhibition of phosphoinositide 3-kinase typical of cells lacking tuberous sclerosis complex 2 (TSC2), a tumor suppressor and negative regulator of mTORC1. Our work clarifies how mTORC1 inhibits growth factor signaling and opens new areas of investigation in mTOR biology.
Here we report a comprehensive and high-resolution transcriptomic and spatial cell-type atlas for the whole adult mouse brain. The cell-type atlas was created by combining a single-cell RNA-sequencing (scRNA-seq) dataset of around 7 million cells profiled (approximately 4.0 million cells passing quality control), and a spatial transcriptomic dataset of approximately 4.3 million cells using multiplexed error-robust fluorescence in situ hybridization (MERFISH). The atlas is hierarchically organized into 4 nested levels of classification: 34 classes, 338 subclasses, 1,201 supertypes and 5,322 clusters. We present an online platform, Allen Brain Cell Atlas, to visualize the mouse whole-brain cell-type atlas along with the single-cell RNA-sequencing and MERFISH datasets. We systematically analysed the neuronal and non-neuronal cell types across the brain and identified a high degree of correspondence between transcriptomic identity and spatial specificity for each cell type. The results reveal unique features of cell-type organization in different brain regions, in particular a dichotomy between the dorsal and ventral parts of the brain. The dorsal part contains relatively fewer yet highly divergent neuronal types, whereas the ventral part contains more numerous neuronal types that are more closely related to each other. Our study also uncovered extraordinary diversity and heterogeneity in neurotransmitter and neuropeptide expression and co-expression patterns in different cell types. Finally, we found that transcription factors are major determinants of cell-type classification and identified a combinatorial transcription factor code that defines cell types across all parts of the brain. The whole mouse brain transcriptomic and spatial cell-type atlas establishes a benchmark reference atlas and a foundational resource for integrative investigations of cellular and circuit function, development and evolution of the mammalian brain.
Although we can increasingly measure transcription, chromatin, methylation, and other aspects of molecular biology at single-cell resolution, most assays survey only one aspect of cellular biology. Here we describe sci-CAR, a combinatorial indexing-based coassay that jointly profiles chromatin accessibility and mRNA (CAR) in each of thousands of single cells. As a proof of concept, we apply sci-CAR to 4825 cells, including a time series of dexamethasone treatment, as well as to 11,296 cells from the adult mouse kidney. With the resulting data, we compare the pseudotemporal dynamics of chromatin accessibility and gene expression, reconstruct the chromatin accessibility profiles of cell types defined by RNA profiles, and link cis-regulatory sites to their target genes on the basis of the covariance of chromatin accessibility and transcription across large numbers of single cells.
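The last step in the sci-CAR abstract, linking cis-regulatory sites to genes through the covariance of accessibility and transcription across cells, can be sketched in a few lines. This is a simplified illustration and not the authors' pipeline: it assumes plain Pearson correlation as the covariance measure, and the site names, per-cell values and the 0.8 threshold are all hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant signal carries no covariance information
    return cov / sqrt(var_x * var_y)


def link_sites_to_gene(expression, accessibility_by_site, threshold=0.8):
    """Return (site, r) pairs whose accessibility correlates with the gene's
    expression at or above the threshold, strongest first."""
    links = []
    for site, accessibility in accessibility_by_site.items():
        r = pearson(expression, accessibility)
        if r >= threshold:
            links.append((site, r))
    return sorted(links, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: one gene and two candidate sites measured in 5 cells.
expression = [1.0, 2.0, 3.0, 4.0, 5.0]
sites = {
    "site_A": [0.1, 0.2, 0.3, 0.4, 0.5],  # tracks expression closely
    "site_B": [0.5, 0.1, 0.4, 0.2, 0.3],  # no consistent relationship
}
linked = link_sites_to_gene(expression, sites)
print(linked)
```

Real single-cell data are sparse and noisy, so a practical analysis would typically aggregate cells and use a regularized model rather than a raw correlation cutoff.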
Understanding the spatial organization of gene expression with single-nucleotide resolution requires localizing the sequences of expressed RNA transcripts within a cell in situ. Here, we describe fluorescent in situ RNA sequencing (FISSEQ), in which stably cross-linked complementary DNA (cDNA) amplicons are sequenced within a biological sample. Using 30-base reads from 8102 genes in situ, we examined RNA expression and localization in human primary fibroblasts with a simulated wound-healing assay. FISSEQ is compatible with tissue sections and whole-mount embryos and reduces the limitations of optical resolution and noisy signals on single-molecule detection. Our platform enables massively parallel detection of genetic elements, including gene transcripts and molecular barcodes, and can be used to investigate cellular phenotype, gene regulation, and environment in situ.
INTRODUCTION The brain is responsible for cognition, behavior, and much of what makes us uniquely human. The development of the brain is a highly complex process, and this process is reliant on precise regulation of molecular and cellular events grounded in the spatiotemporal regulation of the transcriptome. Disruption of this regulation can lead to neuropsychiatric disorders. RATIONALE The regulatory, epigenomic, and transcriptomic features of the human brain have not been comprehensively compiled across time, regions, or cell types. Understanding the etiology of neuropsychiatric disorders requires knowledge not just of endpoint differences between healthy and diseased brains but also of the developmental and cellular contexts in which these differences arise. Moreover, an emerging body of research indicates that many aspects of the development and physiology of the human brain are not well recapitulated in model organisms, and therefore it is necessary that neuropsychiatric disorders be understood in the broader context of the developing and adult human brain. RESULTS Here we describe the generation and analysis of a variety of genomic data modalities at the tissue and single-cell levels, including transcriptome, DNA methylation, and histone modifications across multiple brain regions ranging in age from embryonic development through adulthood. We observed a widespread transcriptomic transition beginning during late fetal development and consisting of sharply decreased regional differences. This reduction coincided with increases in the transcriptional signatures of mature neurons and the expression of genes associated with dendrite development, synapse development, and neuronal activity, all of which were temporally synchronous across neocortical areas, as well as myelination and oligodendrocytes, which were asynchronous. 
Moreover, genes including MEF2C, SATB2, and TCF4, with genetic associations to multiple brain-related traits and disorders, converged in a small number of modules exhibiting spatial or spatiotemporal specificity. CONCLUSION We generated and applied our dataset to document transcriptomic and epigenetic changes across human development and then related those changes to major neuropsychiatric disorders. These data allowed us to identify genes, cell types, gene coexpression modules, and spatiotemporal loci where disease risk might converge, demonstrating the utility of the dataset and providing new insights into human development and disease.
Figure caption: Spatiotemporal dynamics of human brain development and neuropsychiatric risks. Human brain development begins during embryonic development and continues through adulthood (top). Integrating data modalities (bottom left) revealed age- and cell type–specific properties and global patterns of transcriptional dynamics, including a late fetal transition (bottom middle). We related the variation in gene expression (brown, high; purple, low) to regulatory elements in the fetal and adult brains, cell type–specific signatures, and genetic loci associated with neuropsychiatric disorders (bottom right; gray circles indicate enrichment for corresponding features among module genes). Relationships depicted in this panel do not correspond to specific observations. CBC, cerebellar cortex; STR, striatum; HIP, hippocampus; MD, mediodorsal nucleus of thalamus; AMY, amygdala.
The Allen Brain Atlas (http://www.brain-map.org) provides a unique online public resource integrating extensive gene expression data, connectivity data and neuroanatomical information with powerful search and viewing tools for the adult and developing brain in mouse, human and non-human primate. Here, we review the resources available at the Allen Brain Atlas, describing each product and data type [such as in situ hybridization (ISH) and supporting histology, microarray, RNA sequencing, reference atlases, projection mapping and magnetic resonance imaging]. In addition, standardized and unique features in the web applications are described that enable users to search and mine the various data sets. Features include both simple and sophisticated methods for gene searches, colorimetric and fluorescent ISH image viewers, graphical displays of ISH, microarray and RNA sequencing data, Brain Explorer software for 3D navigation of anatomy and gene expression, and an interactive reference atlas viewer. In addition, cross data set searches enable users to query multiple Allen Brain Atlas data sets simultaneously. All of the Allen Brain Atlas resources can be accessed through the Allen Brain Atlas data portal.
Here, using high-throughput transcriptomic and epigenomic profiling of more than 450,000 single nuclei in humans, marmoset monkeys and mice, we demonstrate a broadly conserved cellular makeup of this region, with similarities that mirror evolutionary distance and are consistent between the transcriptome and epigenome. The core conserved molecular identities of neuronal and non-neuronal cell types allow us to generate a cross-species consensus classification of cell types, and to infer conserved properties of cell types across species. Despite the overall conservation, however, many species-dependent specializations are apparent, including differences in cell-type proportions, gene expression, DNA methylation and chromatin state. Few cell-type marker genes are conserved across species, revealing a short list of candidate genes and regulatory mechanisms that are responsible for conserved features of homologous cell types, such as the GABAergic chandelier cells. This consensus transcriptomic classification allows us to use patch-seq (a combination of whole-cell patch-clamp recordings, RNA sequencing and morphological characterization) to identify corticospinal Betz cells from layer 5 in non-human primates and humans, and to characterize their highly specialized physiology and anatomy. These findings highlight the robust molecular underpinnings of cell-type diversity in M1 across mammals, and point to the genes and regulatory pathways responsible for the functional identity of cell types and their species-specific adaptations.
The gene expression program underlying the specification of human cell types is of fundamental interest. We generated human cell atlases of gene expression and chromatin accessibility in fetal tissues. For gene expression, we applied three-level combinatorial indexing to >110 samples representing 15 organs, ultimately profiling ~4 million single cells. We leveraged the literature and other atlases to identify and annotate hundreds of cell types and subtypes, both within and across tissues. Our analyses focused on organ-specific specializations of broadly distributed cell types (such as blood, endothelial, and epithelial), sites of fetal erythropoiesis (which notably included the adrenal gland), and integration with mouse developmental atlases (such as conserved specification of blood cells). These data represent a rich resource for the exploration of in vivo human gene expression in diverse tissues and cell types.
The mechanistic target of rapamycin complex 1 (mTORC1) protein kinase is a master growth regulator that responds to multiple environmental cues. Amino acids stimulate, in a Rag-, Ragulator-, and vacuolar adenosine triphosphatase-dependent fashion, the translocation of mTORC1 to the lysosomal surface, where it interacts with its activator Rheb. Here, we identify SLC38A9, an uncharacterized protein with sequence similarity to amino acid transporters, as a lysosomal transmembrane protein that interacts with the Rag guanosine triphosphatases (GTPases) and Ragulator in an amino acid-sensitive fashion. SLC38A9 transports arginine with a high Michaelis constant, and loss of SLC38A9 represses mTORC1 activation by amino acids, particularly arginine. Overexpression of SLC38A9 or just its Ragulator-binding domain makes mTORC1 signaling insensitive to amino acid starvation but not to Rag activity. Thus, SLC38A9 functions upstream of the Rag GTPases and is an excellent candidate for being an arginine sensor for the mTORC1 pathway.
Calcium imaging with protein-based indicators is widely used to follow neural activity in intact nervous systems, but current protein sensors report neural activity at timescales much slower than electrical signalling and are limited by trade-offs between sensitivity and kinetics. Here we used large-scale screening and structure-guided mutagenesis to develop and optimize several fast and sensitive GCaMP-type indicators. The resulting ‘jGCaMP8’ sensors, based on the calcium-binding protein calmodulin and a fragment of endothelial nitric oxide synthase, have ultra-fast kinetics (half-rise times of 2 ms) and the highest sensitivity for neural activity reported for a protein-based calcium sensor. jGCaMP8 sensors will allow tracking of large populations of neurons on timescales relevant to neural computation.