
Howard Hughes Medical Institute
Facility: Chevy Chase, Maryland, United States
Research output, citation impact, and the most-cited recent papers from Howard Hughes Medical Institute (United States). Aggregated across the NobleBlocks index of 300M+ scholarly works.
Top-cited papers from Howard Hughes Medical Institute
Objective methods for assessing perceptual image quality traditionally attempted to quantify the visibility of errors (differences) between a distorted image and a reference image using a variety of known properties of the human visual system. Under the assumption that human visual perception is highly adapted for extracting structural information from a scene, we introduce an alternative complementary framework for quality assessment based on the degradation of structural information. As a specific example of this concept, we develop a Structural Similarity Index and demonstrate its promise through a set of intuitive examples, as well as comparison to both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000.
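The abstract above does not spell out the index itself, so the following is only a minimal sketch of the widely used SSIM formula, which combines the means, variances, and covariance of two signals. The function name `ssim_global`, the single-window treatment, and the constants `k1` and `k2` are assumptions made for illustration; practical implementations typically compute the index over local sliding windows and average the results.

```python
def ssim_global(x, y, dynamic_range=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM between two equal-length lists of pixel intensities.

    Illustrative sketch only: statistics are taken over the whole signal
    (population variance), not over local windows.
    """
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(x)
    # Small constants that stabilise the ratio when means or variances are near zero.
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) * (a - mu_x) for a in x) / n
    var_y = sum((b - mu_y) * (b - mu_y) for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x * mu_x + mu_y * mu_y + c1) * (var_x + var_y + c2)
    return numerator / denominator


reference = [10, 50, 90, 130]
distorted = [130, 90, 50, 10]  # same pixel values, structure reversed
score_same = ssim_global(reference, reference)
score_diff = ssim_global(reference, distorted)
```

An identical pair scores 1, while a pair with the same intensities but reversed structure scores much lower, which is the behaviour a structure-based index is meant to capture.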
The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.
The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies. Results for the final phase of the 1000 Genomes Project are presented including whole-genome sequencing, targeted exome sequencing, and genotyping on high-density SNP arrays for 2,504 individuals across 26 populations, providing a global reference data set to support biomedical genetics. The 1000 Genomes Project has sought to comprehensively catalogue human genetic variation across populations, providing a valuable public genomic resource. The data obtained so far have found applications ranging from association studies and fine mapping studies to the filtering of likely neutral variants in rare-disease cohorts. The authors now report on the final phase of the project, phase 3, which covers previously uncharacterized areas of human genetic diversity in terms of the populations sampled and categories of characterized variation. The sample now includes more than 2,500 individuals from 26 global populations, with low coverage whole-genome and deep exome sequencing, as well as dense microarray genotyping. 
They find that while most common variants are shared across populations, rarer variants are often restricted to closely related populations. The authors also demonstrate the use of the phase 3 dataset as a reference panel for imputation to improve the resolution in genetic association studies.
Clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated (Cas) systems provide bacteria and archaea with adaptive immunity against viruses and plasmids by using CRISPR RNAs (crRNAs) to guide the silencing of invading nucleic acids. We show here that in a subset of these systems, the mature crRNA that is base-paired to trans-activating crRNA (tracrRNA) forms a two-RNA structure that directs the CRISPR-associated protein Cas9 to introduce double-stranded (ds) breaks in target DNA. At sites complementary to the crRNA-guide sequence, the Cas9 HNH nuclease domain cleaves the complementary strand, whereas the Cas9 RuvC-like domain cleaves the noncomplementary strand. The dual-tracrRNA:crRNA, when engineered as a single RNA chimera, also directs sequence-specific Cas9 dsDNA cleavage. Our study reveals a family of endonucleases that use dual-RNAs for site-specific DNA cleavage and highlights the potential to exploit the system for RNA-programmable genome editing.
A system of cluster analysis for genome-wide expression data from DNA microarray hybridization is described that uses standard statistical algorithms to arrange genes according to similarity in pattern of gene expression. The output is displayed graphically, conveying the clustering and the underlying expression data simultaneously in a form intuitive for biologists. We have found in the budding yeast Saccharomyces cerevisiae that clustering gene expression data groups together efficiently genes of known similar function, and we find a similar tendency in human data. Thus patterns seen in genome-wide expression experiments can be interpreted as indications of the status of cellular processes. Also, coexpression of genes of known function with poorly characterized or novel genes may provide a simple means of gaining leads to the functions of many genes for which information is not available currently.
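The abstract refers only to "standard statistical algorithms", so the sketch below picks one common choice as an assumption: agglomerative clustering with average linkage, using the Pearson correlation between expression profiles as the similarity measure. The function names and the toy gene profiles are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length profiles (assumes neither is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (norm_a * norm_b)


def average_linkage(profiles):
    """Repeatedly merge the two most similar clusters.

    profiles: dict mapping gene name -> list of expression values.
    Returns the merges in order as (members_a, members_b, similarity).
    """
    clusters = [(name,) for name in profiles]
    merges = []

    def similarity(c1, c2):
        # Average pairwise correlation between members of the two clusters.
        pairs = [(g, h) for g in c1 for h in c2]
        return sum(pearson(profiles[g], profiles[h]) for g, h in pairs) / len(pairs)

    while len(clusters) > 1:
        a, b = max(combinations(clusters, 2), key=lambda pair: similarity(*pair))
        merges.append((a, b, similarity(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Toy data: g1/g2 rise together across conditions, g3/g4 fall together.
toy_profiles = {
    "g1": [1, 2, 3, 4],
    "g2": [2, 4, 6, 8.5],
    "g3": [4, 3, 2, 1],
    "g4": [8, 6, 4, 2.5],
}
toy_merges = average_linkage(toy_profiles)
```

On the toy data the co-rising genes and the co-falling genes pair up first, and the two pairs join last with a negative similarity. The merge order is what a dendrogram display would draw.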
A new software suite, called Crystallography & NMR System (CNS), has been developed for macromolecular structure determination by X-ray crystallography or solution nuclear magnetic resonance (NMR) spectroscopy. In contrast to existing structure-determination programs, the architecture of CNS is highly flexible, allowing for extension to other structure-determination methods, such as electron microscopy and solid-state NMR spectroscopy. CNS has a hierarchical structure: a high-level hypertext markup language (HTML) user interface, task-oriented user input files, module files, a symbolic structure-determination language (CNS language), and low-level source code. Each layer is accessible to the user. The novice user may just use the HTML interface, while the more advanced user may use any of the other layers. The source code will be distributed, thus source-code modification is possible. The CNS language is sufficiently powerful and flexible that many new algorithms can be easily implemented in the CNS language without changes to the source code. The CNS language allows the user to perform operations on data structures, such as structure factors, electron-density maps, and atomic properties. The power of the CNS language has been demonstrated by the implementation of a comprehensive set of crystallographic procedures for phasing, density modification and refinement. User-friendly task-oriented input files are available for nearly all aspects of macromolecular structure determination by X-ray crystallography and solution NMR.
Studies of the human microbiome have revealed that even healthy individuals differ remarkably in the microbes that occupy habitats such as the gut, skin and vagina. Much of this diversity remains unexplained, although diet, environment, host genetics and early microbial exposure have all been implicated. Accordingly, to characterize the ecology of human-associated microbial communities, the Human Microbiome Project has analysed the largest cohort and set of distinct, clinically relevant body habitats so far. We found the diversity and abundance of each habitat’s signature microbes to vary widely even among healthy subjects, with strong niche specialization both within and among individuals. The project encountered an estimated 81–99% of the genera, enzyme families and community configurations occupied by the healthy Western microbiome. Metagenomic carriage of metabolic pathways was stable among individuals despite variation in community structure, and ethnic/racial background proved to be one of the strongest associations of both pathways and microbes with clinical metadata. These results thus delineate the range of structural and functional configurations normal in the microbial communities of a healthy population, enabling future characterization of the epidemiology, ecology and translational applications of the human microbiome. The Human Microbiome Project Consortium reports the first results of their analysis of microbial communities from distinct, clinically relevant body habitats in a human cohort; the insights into the microbial communities of a healthy population lay foundations for future exploration of the epidemiology, ecology and translational applications of the human microbiome. The Human Microbiome Project (HMP), supported by the National Institutes of Health Common Fund, has the goal of characterizing the microbial communities that inhabit and interact with the human body in sickness and in health. 
In two Articles in this issue of Nature, the HMP Consortium presents the first population-scale details of the organismal and functional composition of the microbiota across five areas of the body. An associated News & Views discusses the initial results — which, along with those of a series of co-publications, already constitute the most extensive catalogue of organisms and genes related to the human microbiome yet published — and highlights some of the major questions that the project will tackle in the next few years.
As vertebrate genome sequences near completion and research refocuses to their analysis, the issue of effective genome annotation display becomes critical. A mature web tool for rapid and reliable display of any requested portion of the genome at any scale, together with several dozen aligned annotation tracks, is provided at http://genome.ucsc.edu. This browser displays assembly contigs and gaps, mRNA and expressed sequence tag alignments, multiple gene predictions, cross-species homologies, single nucleotide polymorphisms, sequence-tagged sites, radiation hybrid data, transposon repeats, and more as a stack of coregistered tracks. Text and sequence-based searches provide quick and precise access to any region of specific interest. Secondary links from individual features lead to sequence details and supplementary off-site databases. One-half of the annotation tracks are computed at the University of California, Santa Cruz from publicly available sequence data; collaborators worldwide provide the rest. Users can stably add their own custom tracks to the browser for educational or research purposes. The conceptual and technical framework of the browser, its underlying MYSQL database, and overall use are described. The web site currently serves over 50,000 pages per day to over 3000 different users.
Breast cancer is the most common malignancy in United States women, accounting for >40,000 deaths each year. These breast tumors are comprised of phenotypically diverse populations of breast cancer cells. Using a model in which human breast cancer cells were grown in immunocompromised mice, we found that only a minority of breast cancer cells had the ability to form new tumors. We were able to distinguish the tumorigenic (tumor initiating) from the nontumorigenic cancer cells based on cell surface marker expression. We prospectively identified and isolated the tumorigenic cells as CD44(+)CD24(-/low)Lineage(-) in eight of nine patients. As few as 100 cells with this phenotype were able to form tumors in mice, whereas tens of thousands of cells with alternate phenotypes failed to form tumors. The tumorigenic subpopulation could be serially passaged: each time cells within this population generated new tumors containing additional CD44(+)CD24(-/low)Lineage(-) tumorigenic cells as well as the phenotypically diverse mixed populations of nontumorigenic cells present in the initial tumor. The ability to prospectively identify tumorigenic cancer cells will facilitate the elucidation of pathways that regulate their growth and survival. Furthermore, because these cells drive tumor development, strategies designed to target this population may lead to more effective therapies.
The ongoing revolution in high-throughput sequencing continues to democratize the ability of small groups of investigators to map the microbial component of the biosphere. In particular, the coevolution of new sequencing platforms and new software tools allows data acquisition and analysis on an unprecedented scale. Here we report the next stage in this coevolutionary arms race, using the Illumina GAIIx platform to sequence a diverse array of 25 environmental samples and three known "mock communities" at a depth averaging 3.1 million reads per sample. We demonstrate excellent consistency in taxonomic recovery and recapture diversity patterns that were previously reported on the basis of metaanalysis of many studies from the literature (notably, the saline/nonsaline split in environmental samples and the split between host-associated and free-living communities). We also demonstrate that 2,000 Illumina single-end reads are sufficient to recapture the same relationships among samples that we observe with the full dataset. The results thus open up the possibility of conducting large-scale studies analyzing thousands of samples simultaneously to survey microbial communities at an unprecedented spatial and temporal resolution.
Thirty years of brain imaging research has converged to define the brain's default network, a novel and only recently appreciated brain system that participates in internal modes of cognition. Here we synthesize past observations to provide strong evidence that the default network is a specific, anatomically defined brain system preferentially active when individuals are not focused on the external environment. Analysis of connectional anatomy in the monkey supports the presence of an interconnected brain system. Providing insight into function, the default network is active when individuals are engaged in internally focused tasks including autobiographical memory retrieval, envisioning the future, and conceiving the perspectives of others. Probing the functional anatomy of the network in detail reveals that it is best understood as multiple interacting subsystems. The medial temporal lobe subsystem provides information from prior experiences in the form of memories and associations that are the building blocks of mental simulation. The medial prefrontal subsystem facilitates the flexible use of this information during the construction of self-relevant mental simulations. These two subsystems converge on important nodes of integration including the posterior cingulate cortex. The implications of these functional and anatomical observations are discussed in relation to possible adaptive roles of the default network for using past experiences to plan for the future, navigate social interactions, and maximize the utility of moments when we are not otherwise engaged by the external world. We conclude by discussing the relevance of the default network for understanding mental disorders including autism, schizophrenia, and Alzheimer's disease.
We have designed a system for targeted gene expression that allows the selective activation of any cloned gene in a wide variety of tissue- and cell-specific patterns. The gene encoding the yeast transcriptional activator GAL4 is inserted randomly into the Drosophila genome to drive GAL4 expression from one of a diverse array of genomic enhancers. It is then possible to introduce a gene containing GAL4 binding sites within its promoter, to activate it in those cells where GAL4 is expressed, and to observe the effect of this directed misexpression on development. We have used GAL4-directed transcription to expand the domain of embryonic expression of the homeobox protein even-skipped. We show that even-skipped represses wingless and transforms cells that would normally secrete naked cuticle into denticle secreting cells. The GAL4 system can thus be used to study regulatory interactions during embryonic development. In adults, targeted expression can be used to generate dominant phenotypes for use in genetic screens. We have directed expression of an activated form of the Dras2 protein, resulting in dominant eye and wing defects that can be used in screens to identify other members of the Dras2 signal transduction pathway.
Information processing in the cerebral cortex involves interactions among distributed areas. Anatomical connectivity suggests that certain areas form local hierarchical relations such as within the visual system. Other connectivity patterns, particularly among association areas, suggest the presence of large-scale circuits without clear hierarchical relations. In this study the organization of networks in the human cerebrum was explored using resting-state functional connectivity MRI. Data from 1,000 subjects were registered using surface-based alignment. A clustering approach was employed to identify and replicate networks of functionally coupled regions across the cerebral cortex. The results revealed local networks confined to sensory and motor cortices as well as distributed networks of association regions. Within the sensory and motor cortices, functional connectivity followed topographic representations across adjacent areas. In association cortex, the connectivity patterns often showed abrupt transitions between network boundaries. Focused analyses were performed to better understand properties of network connectivity. A canonical sensory-motor pathway involving primary visual area, putative middle temporal area complex (MT+), lateral intraparietal area, and frontal eye field was analyzed to explore how interactions might arise within and between networks. Results showed that adjacent regions of the MT+ complex demonstrate differential connectivity consistent with a hierarchical pathway that spans networks. The functional connectivity of parietal and prefrontal association cortices was next explored. Distinct connectivity profiles of neighboring regions suggest they participate in distributed networks that, while showing evidence for interactions, are embedded within largely parallel, interdigitated circuits. 
We conclude by discussing the organization of these large-scale cerebral networks in relation to monkey anatomy and their potential evolutionary expansion in humans to support cognition.
BACKGROUND: Somatic mutations have the potential to encode "non-self" immunogenic antigens. We hypothesized that tumors with a large number of somatic mutations due to mismatch-repair defects may be susceptible to immune checkpoint blockade. METHODS: We conducted a phase 2 study to evaluate the clinical activity of pembrolizumab, an anti-programmed death 1 immune checkpoint inhibitor, in 41 patients with progressive metastatic carcinoma with or without mismatch-repair deficiency. Pembrolizumab was administered intravenously at a dose of 10 mg per kilogram of body weight every 14 days in patients with mismatch repair-deficient colorectal cancers, patients with mismatch repair-proficient colorectal cancers, and patients with mismatch repair-deficient cancers that were not colorectal. The coprimary end points were the immune-related objective response rate and the 20-week immune-related progression-free survival rate. RESULTS: The immune-related objective response rate and immune-related progression-free survival rate were 40% (4 of 10 patients) and 78% (7 of 9 patients), respectively, for mismatch repair-deficient colorectal cancers and 0% (0 of 18 patients) and 11% (2 of 18 patients) for mismatch repair-proficient colorectal cancers. The median progression-free survival and overall survival were not reached in the cohort with mismatch repair-deficient colorectal cancer but were 2.2 and 5.0 months, respectively, in the cohort with mismatch repair-proficient colorectal cancer (hazard ratio for disease progression or death, 0.10 [P<0.001], and hazard ratio for death, 0.22 [P=0.05]). Patients with mismatch repair-deficient noncolorectal cancer had responses similar to those of patients with mismatch repair-deficient colorectal cancer (immune-related objective response rate, 71% [5 of 7 patients]; immune-related progression-free survival rate, 67% [4 of 6 patients]). 
Whole-exome sequencing revealed a mean of 1782 somatic mutations per tumor in mismatch repair-deficient tumors, as compared with 73 in mismatch repair-proficient tumors (P=0.007), and high somatic mutation loads were associated with prolonged progression-free survival (P=0.02). CONCLUSIONS: This study showed that mismatch-repair status predicted clinical benefit of immune checkpoint blockade with pembrolizumab. (Funded by Johns Hopkins University and others; ClinicalTrials.gov number, NCT01876511.)
We derive a new self-organizing learning algorithm that maximizes the information transferred in a network of nonlinear units. The algorithm does not assume any knowledge of the input distributions, and is defined here for the zero-noise limit. Under these conditions, information maximization has extra properties not found in the linear case (Linsker 1989). The nonlinearities in the transfer function are able to pick up higher-order moments of the input distributions and perform something akin to true redundancy reduction between units in the output representation. This enables the network to separate statistically independent components in the inputs: a higher-order generalization of principal components analysis. We apply the network to the source separation (or cocktail party) problem, successfully separating unknown mixtures of up to 10 speakers. We also show that a variant on the network architecture is able to perform blind deconvolution (cancellation of unknown echoes and reverberation in a speech signal). Finally, we derive dependencies of information transfer on time delays. We suggest that information maximization provides a unifying framework for problems in "blind" signal processing.
DNA sequencing continues to decrease in cost with the Illumina HiSeq2000 generating up to 600 Gb of paired-end 100 base reads in a ten-day run. Here we present a protocol for community amplicon sequencing on the HiSeq2000 and MiSeq Illumina platforms, and apply that protocol to sequence 24 microbial communities from host-associated and free-living environments. A critical question as more sequencing platforms become available is whether biological conclusions derived on one platform are consistent with what would be derived on a different platform. We show that the protocol developed for these instruments successfully recaptures known biological results, and additionally that biological conclusions are consistent across sequencing platforms (the HiSeq2000 versus the MiSeq) and across the sequenced regions of amplicons.
We introduce a method for optically imaging intracellular proteins at nanometer spatial resolution. Numerous sparse subsets of photoactivatable fluorescent protein molecules were activated, localized (to approximately 2 to 25 nanometers), and then bleached. The aggregate position information from all subsets was then assembled into a superresolution image. We used this method, termed photoactivated localization microscopy, to image specific target proteins in thin sections of lysosomes and mitochondria; in fixed whole cells, we imaged vinculin at focal adhesions, actin within a lamellipodium, and the distribution of the retroviral protein Gag at the plasma membrane.
MicroRNAs (miRNAs) are small endogenous RNAs that pair to sites in mRNAs to direct post-transcriptional repression. Many sites that match the miRNA seed (nucleotides 2-7), particularly those in 3' untranslated regions (3'UTRs), are preferentially conserved. Here, we overhauled our tool for finding preferential conservation of sequence motifs and applied it to the analysis of human 3'UTRs, increasing by nearly threefold the detected number of preferentially conserved miRNA target sites. The new tool more efficiently incorporates new genomes and more completely controls for background conservation by accounting for mutational biases, dinucleotide conservation rates, and the conservation rates of individual UTRs. The improved background model enabled preferential conservation of a new site type, the "offset 6mer," to be detected. In total, >45,000 miRNA target sites within human 3'UTRs are conserved above background levels, and >60% of human protein-coding genes have been under selective pressure to maintain pairing to miRNAs. Mammalian-specific miRNAs have far fewer conserved targets than do the more broadly conserved miRNAs, even when considering only more recently emerged targets. Although pairing to the 3' end of miRNAs can compensate for seed mismatches, this class of sites constitutes less than 2% of all preferentially conserved sites detected. The new tool enables statistically powerful analysis of individual miRNA target sites, with the probability of preferentially conserved targeting (P(CT)) correlating with experimental measurements of repression. Our expanded set of target predictions (including conserved 3'-compensatory sites) is available at the TargetScan website, which displays the P(CT) for each site and each predicted target.
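To make the notion of a seed match concrete, here is a minimal sketch that locates sites in a UTR complementary to miRNA nucleotides 2-7. It covers only the string-matching step; the conservation analysis and background model described above are not attempted. The function names and the toy UTR are hypothetical, and the let-7a sequence is used purely as a familiar example. The 3-8 window used for the offset 6mer reflects how that site type is commonly described and should be treated as an assumption here.

```python
# Watson-Crick complements for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Mature let-7a sequence, written 5' to 3' (example input only).
LET7A = "UGAGGUAGUAGGUUGUAUAGUU"


def site_for(mirna, first=2, last=7):
    """Return the mRNA site (5' to 3') that pairs with miRNA nucleotides first..last.

    Positions are 1-based and inclusive, matching the "nucleotides 2-7" convention.
    """
    window = mirna[first - 1:last]
    # The mRNA site is the reverse complement of the miRNA window.
    return "".join(COMPLEMENT[base] for base in reversed(window))


def find_sites(mirna, utr, first=2, last=7):
    """Return 0-based start indices of every (possibly overlapping) site in the UTR."""
    site = site_for(mirna, first, last)
    hits = []
    index = utr.find(site)
    while index != -1:
        hits.append(index)
        index = utr.find(site, index + 1)
    return hits


toy_utr = "AAACUACCUCAGGG"  # hypothetical 3'UTR fragment
seed_hits = find_sites(LET7A, toy_utr)          # [4]: match to nucleotides 2-7
offset_hits = find_sites(LET7A, toy_utr, 3, 8)  # [3]: match to nucleotides 3-8
```

Exposing the window as parameters keeps the same routine usable for other site definitions without changing the matching logic.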
The innate immune system is a universal and ancient form of host defense against infection. Innate immune recognition relies on a limited number of germline-encoded receptors. These receptors evolved to recognize conserved products of microbial metabolism produced by microbial pathogens, but not by the host. Recognition of these molecular structures allows the immune system to distinguish infectious nonself from noninfectious self. Toll-like receptors play a major role in pathogen recognition and initiation of inflammatory and immune responses. Stimulation of Toll-like receptors by microbial products leads to the activation of signaling pathways that result in the induction of antimicrobial genes and inflammatory cytokines. In addition, stimulation of Toll-like receptors triggers dendritic cell maturation and results in the induction of costimulatory molecules and increased antigen-presenting capacity. Thus, microbial recognition by Toll-like receptors helps to direct adaptive immune responses to antigens derived from microbial pathogens.
By characterizing the geographic and functional spectrum of human genetic variation, the 1000 Genomes Project aims to build a resource to help to understand the genetic contribution to disease. Here we describe the genomes of 1,092 individuals from 14 populations, constructed using a combination of low-coverage whole-genome and exome sequencing. By developing methods to integrate information across several algorithms and diverse data sources, we provide a validated haplotype map of 38 million single nucleotide polymorphisms, 1.4 million short insertions and deletions, and more than 14,000 larger deletions. We show that individuals from different populations carry different profiles of rare and common variants, and that low-frequency variants show substantial geographic differentiation, which is further increased by the action of purifying selection. We show that evolutionary conservation and coding consequence are key determinants of the strength of purifying selection, that rare-variant load varies substantially across biological pathways, and that each individual contains hundreds of rare non-coding variants at conserved sites, such as motif-disrupting changes in transcription-factor-binding sites. This resource, which captures up to 98% of accessible single nucleotide polymorphisms at a frequency of 1% in related populations, enables analysis of common and low-frequency variants in individuals from diverse, including admixed, populations. This report from the 1000 Genomes Project describes the genomes of 1,092 individuals from 14 human populations, providing a resource for common and low-frequency variant analysis in individuals from diverse populations; hundreds of rare non-coding variants at conserved sites, such as motif-disrupting changes in transcription-factor-binding sites, can be found in each individual. 
This report by the 1000 Genomes Project describes the genomes of 1,092 individuals from 14 human populations, providing a resource for common and low-frequency variant analysis in individuals from diverse populations. Integrative analyses reveal profiles of rare and common variants in different populations. The frequencies of rare variants vary across biological pathways, and hundreds of rare, non-coding variants at conserved sites — such as changes disrupting transcription-factor motifs — can be established for each individual.