Massachusetts General Hospital
Hospital / health system · Boston, United States
Research output, citation impact, and the most-cited recent papers from Massachusetts General Hospital (United States). Aggregated across the NobleBlocks index of 300M+ scholarly works.
Top-cited papers from Massachusetts General Hospital
Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS--the 1000 Genome pilot alone includes nearly five terabases--make writing feature-rich, efficient, and robust analysis tools difficult for even computationally sophisticated individuals. Indeed, many professionals are limited in the scope and the ease with which they can answer scientific questions by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools like coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.
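The MapReduce idea mentioned in this abstract can be illustrated with a toy coverage calculator. This is a minimal sketch in plain Python, not the GATK API: the read representation `(start, length)` and the function names `map_read`, `reduce_depth`, and `coverage` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from functools import reduce

# Hypothetical toy reads on one contig: (start position, read length).
reads = [(0, 4), (2, 4), (3, 2)]

def map_read(read):
    """Map step: emit a depth contribution of 1 for each position a read covers."""
    start, length = read
    return Counter(range(start, start + length))

def reduce_depth(total, partial):
    """Reduce step: add one read's contribution into the running depth table."""
    total.update(partial)  # Counter.update adds counts rather than replacing them
    return total

def coverage(reads):
    """Per-position depth, computed as a reduce over mapped reads."""
    return reduce(reduce_depth, map(map_read, reads), Counter())

depth = coverage(reads)
```

Because the map step handles each read independently, the mapped pieces could in principle be computed in parallel and merged afterwards, which is the separation of analysis logic from data management that the abstract describes.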
The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies. Results for the final phase of the 1000 Genomes Project are presented including whole-genome sequencing, targeted exome sequencing, and genotyping on high-density SNP arrays for 2,504 individuals across 26 populations, providing a global reference data set to support biomedical genetics. The 1000 Genomes Project has sought to comprehensively catalogue human genetic variation across populations, providing a valuable public genomic resource. The data obtained so far have found applications ranging from association studies and fine mapping studies to the filtering of likely neutral variants in rare-disease cohorts. The authors now report on the final phase of the project, phase 3, which covers previously uncharacterized areas of human genetic diversity in terms of the populations sampled and categories of characterized variation. The sample now includes more than 2,500 individuals from 26 global populations, with low coverage whole-genome and deep exome sequencing, as well as dense microarray genotyping. 
They find that while most common variants are shared across populations, rarer variants are often restricted to closely related populations. The authors also demonstrate the use of the phase 3 dataset as a reference panel for imputation to improve the resolution in genetic association studies.
We present Model-based Analysis of ChIP-Seq data, MACS, which analyzes data generated by short read sequencers such as Solexa's Genome Analyzer. MACS empirically models the shift size of ChIP-Seq tags, and uses it to improve the spatial resolution of predicted binding sites. MACS also uses a dynamic Poisson distribution to effectively capture local biases in the genome, allowing for more robust predictions. MACS compares favorably to existing ChIP-Seq peak-finding algorithms, and is freely available.
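One way to picture a "dynamic" Poisson background is to let the expected tag count at a candidate site depend on its local neighborhood. The sketch below is an assumption-laden illustration, not the MACS implementation: it takes the local rate to be the largest of a genome-wide rate and several window-based rates (a choice not stated in the abstract), and the function names are hypothetical.

```python
import math

def poisson_sf(k, lam):
    """Upper tail P(X >= k) for X ~ Poisson(lam), via 1 minus the lower tail."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)

def local_lambda(genome_rate, window_rates):
    """Assumed rule: use the most conservative (largest) background estimate."""
    return max([genome_rate] + list(window_rates))

def peak_p_value(tag_count, genome_rate, window_rates):
    """P-value of observing at least tag_count tags under the local background."""
    return poisson_sf(tag_count, local_lambda(genome_rate, window_rates))
```

Under this rule, a site in a region with a locally elevated background receives a larger (less significant) p-value than the same tag count would in a quiet region, which is how local biases get absorbed.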
The National Institute on Aging and the Alzheimer's Association charged a workgroup with the task of revising the 1984 criteria for Alzheimer's disease (AD) dementia. The workgroup sought to ensure that the revised criteria would be flexible enough to be used by both general healthcare providers without access to neuropsychological testing, advanced imaging, and cerebrospinal fluid measures, and specialized investigators involved in research or in clinical trial studies who would have these tools available. We present criteria for all-cause dementia and for AD dementia. We retained the general framework of probable AD dementia from the 1984 criteria. On the basis of the past 27 years of experience, we made several changes in the clinical criteria for the diagnosis. We also retained the term possible AD dementia, but redefined it in a manner more focused than before. Biomarker evidence was also integrated into the diagnostic formulations for probable and possible AD dementia for use in research settings. The core clinical criteria for AD dementia will continue to be the cornerstone of the diagnosis in clinical practice, but biomarker evidence is expected to enhance the pathophysiological specificity of the diagnosis of AD dementia. Much work lies ahead for validating the biomarker diagnosis of AD dementia.
BACKGROUND: Thyroid nodules are a common clinical problem, and differentiated thyroid cancer is becoming increasingly prevalent. Since the American Thyroid Association's (ATA's) guidelines for the management of these disorders were revised in 2009, significant scientific advances have occurred in the field. The aim of these guidelines is to inform clinicians, patients, researchers, and health policy makers on published evidence relating to the diagnosis and management of thyroid nodules and differentiated thyroid cancer. METHODS: The specific clinical questions addressed in these guidelines were based on prior versions of the guidelines, stakeholder input, and input of task force members. Task force panel members were educated on knowledge synthesis methods, including electronic database searching, review and selection of relevant citations, and critical appraisal of selected studies. Published English language articles on adults were eligible for inclusion. The American College of Physicians Guideline Grading System was used for critical appraisal of evidence and grading strength of recommendations for therapeutic interventions. We developed a similarly formatted system to appraise the quality of such studies and resultant recommendations. The guideline panel had complete editorial independence from the ATA. Competing interests of guideline task force members were regularly updated, managed, and communicated to the ATA and task force members. RESULTS: The revised guidelines for the management of thyroid nodules include recommendations regarding initial evaluation, clinical and ultrasound criteria for fine-needle aspiration biopsy, interpretation of fine-needle aspiration biopsy results, use of molecular markers, and management of benign thyroid nodules. 
Recommendations regarding the initial management of thyroid cancer include those relating to screening for thyroid cancer, staging and risk assessment, surgical management, radioiodine remnant ablation and therapy, and thyrotropin suppression therapy using levothyroxine. Recommendations related to long-term management of differentiated thyroid cancer include those related to surveillance for recurrent disease using imaging and serum thyroglobulin, thyroid hormone therapy, management of recurrent and metastatic disease, consideration for clinical trials and targeted therapy, as well as directions for future research. CONCLUSIONS: We have developed evidence-based recommendations to inform clinical decision-making in the management of thyroid nodules and differentiated thyroid cancer. They represent, in our opinion, contemporary optimal care for patients with these disorders.
BACKGROUND: PLINK 1 is a widely used open-source C/C++ toolset for genome-wide association studies (GWAS) and research in population genetics. However, the steady accumulation of data from imputation and whole-genome sequencing studies has exposed a strong need for faster and scalable implementations of key functions, such as logistic regression, linkage disequilibrium estimation, and genomic distance evaluation. In addition, GWAS and population-genetic data now frequently contain genotype likelihoods, phase information, and/or multiallelic variants, none of which can be represented by PLINK 1's primary data format. FINDINGS: To address these issues, we are developing a second-generation codebase for PLINK. The first major release from this codebase, PLINK 1.9, introduces extensive use of bit-level parallelism, O(sqrt(n))-time/constant-space Hardy-Weinberg equilibrium and Fisher's exact tests, and many other algorithmic improvements. In combination, these changes accelerate most operations by 1-4 orders of magnitude, and allow the program to handle datasets too large to fit in RAM. We have also developed an extension to the data format which adds low-overhead support for genotype likelihoods, phase, multiallelic variants, and reference vs. alternate alleles, which is the basis of our planned second release (PLINK 2.0). CONCLUSIONS: The second-generation versions of PLINK will offer dramatic improvements in performance and compatibility. For the first time, users without access to high-end computing resources can perform several essential analyses of the feature-rich and very large genetic datasets coming into use.
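Bit-level parallelism, as mentioned in this abstract, means operating on many packed genotypes with a single word-wide instruction. The sketch below illustrates the idea only and is not PLINK code: the 2-bit encoding (0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternate, 3 = missing) and all function names are hypothetical.

```python
def pack(genotypes):
    """Pack 2-bit genotype codes into one integer, sample i at bits 2i..2i+1."""
    word = 0
    for i, g in enumerate(genotypes):
        word |= (g & 0b11) << (2 * i)
    return word

def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")

def genotype_counts(word, n_samples):
    """Count each genotype class with a few masks instead of a per-sample loop."""
    low_mask = int("01" * n_samples, 2) if n_samples else 0
    lo = word & low_mask          # low bit of every 2-bit code
    hi = (word >> 1) & low_mask   # high bit of every code, shifted into low slot
    het = popcount(lo & ~hi)      # code 01
    hom_alt = popcount(hi & ~lo)  # code 10
    missing = popcount(hi & lo)   # code 11
    hom_ref = n_samples - het - hom_alt - missing
    return {"hom_ref": hom_ref, "het": het, "hom_alt": hom_alt, "missing": missing}

counts = genotype_counts(pack([0, 1, 2, 1, 3, 0, 2, 2]), 8)
alt_alleles = counts["het"] + 2 * counts["hom_alt"]
```

In a compiled language the same masks would be applied to fixed-width machine words with a hardware population-count instruction; Python's arbitrary-width integers are used here only to keep the example short.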
A technique called optical coherence tomography (OCT) has been developed for noninvasive cross-sectional imaging in biological systems. OCT uses low-coherence interferometry to produce a two-dimensional image of optical scattering from internal tissue microstructures in a way that is analogous to ultrasonic pulse-echo imaging. OCT has longitudinal and lateral spatial resolutions of a few micrometers and can detect reflected signals as small as approximately 10^-10 of the incident optical power. Tomographic imaging is demonstrated in vitro in the peripapillary area of the retina and in the coronary artery, two clinically relevant examples that are representative of transparent and turbid media, respectively.
The fifth edition of the WHO Classification of Tumors of the Central Nervous System (CNS), published in 2021, is the sixth version of the international standard for the classification of brain and spinal cord tumors. Building on the 2016 updated fourth edition and the work of the Consortium to Inform Molecular and Practical Approaches to CNS Tumor Taxonomy, the 2021 fifth edition introduces major changes that advance the role of molecular diagnostics in CNS tumor classification. At the same time, it remains wedded to other established approaches to tumor diagnosis such as histology and immunohistochemistry. In doing so, the fifth edition establishes some different approaches to both CNS tumor nomenclature and grading and it emphasizes the importance of integrated diagnoses and layered reports. New tumor types and subtypes are introduced, some based on novel diagnostic technologies such as DNA methylome profiling. The present review summarizes the major general changes in the 2021 fifth edition classification and the specific changes in each taxonomic category. It is hoped that this summary provides an overview to facilitate more in-depth exploration of the entire fifth edition of the WHO Classification of Tumors of the Central Nervous System.
BACKGROUND: Most patients with non-small-cell lung cancer have no response to the tyrosine kinase inhibitor gefitinib, which targets the epidermal growth factor receptor (EGFR). However, about 10 percent of patients have a rapid and often dramatic clinical response. The molecular mechanisms underlying sensitivity to gefitinib are unknown. METHODS: We searched for mutations in the EGFR gene in primary tumors from patients with non-small-cell lung cancer who had a response to gefitinib, those who did not have a response, and those who had not been exposed to gefitinib. The functional consequences of identified mutations were evaluated after the mutant proteins were expressed in cultured cells. RESULTS: Somatic mutations were identified in the tyrosine kinase domain of the EGFR gene in eight of nine patients with gefitinib-responsive lung cancer, as compared with none of the seven patients with no response (P<0.001). Mutations were either small, in-frame deletions or amino acid substitutions clustered around the ATP-binding pocket of the tyrosine kinase domain. Similar mutations were detected in tumors from 2 of 25 patients with primary non-small-cell lung cancer who had not been exposed to gefitinib (8 percent). All mutations were heterozygous, and identical mutations were observed in multiple patients, suggesting an additive specific gain of function. In vitro, EGFR mutants demonstrated enhanced tyrosine kinase activity in response to epidermal growth factor and increased sensitivity to inhibition by gefitinib. CONCLUSIONS: A subgroup of patients with non-small-cell lung cancer have specific mutations in the EGFR gene, which correlate with clinical responsiveness to the tyrosine kinase inhibitor gefitinib. These mutations lead to increased growth factor signaling and confer susceptibility to the inhibitor. Screening for such mutations in lung cancers may identify patients who will have a response to gefitinib.
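The comparison of eight of nine responders against none of seven nonresponders can be checked with a small exact test. The abstract does not say which test produced its P value, so the one-sided Fisher's exact test below is an assumption made for illustration, and the function names are hypothetical.

```python
from math import comb

def hypergeom_pmf(x, row1, row2, col1):
    """P(x of the col1 'mutation' cases fall in row 1) with all margins fixed."""
    return comb(row1, x) * comb(row2, col1 - x) / comb(row1 + row2, col1)

def fisher_one_sided(a, b, c, d):
    """P(X >= a) for the 2x2 table [[a, b], [c, d]] under the null of no association."""
    row1, row2, col1 = a + b, c + d, a + c
    upper = min(row1, col1)
    return sum(hypergeom_pmf(x, row1, row2, col1) for x in range(a, upper + 1))

# Rows: responders, nonresponders. Columns: mutation found, mutation not found.
p_value = fisher_one_sided(8, 1, 0, 7)
```

Under these assumptions the result is 9/12870, roughly 0.0007, which is in line with the reported P<0.001.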
The last decade has seen a sharp increase in the number of scientific publications describing physiological and pathological functions of extracellular vesicles (EVs), a collective term covering various subtypes of cell-released, membranous structures, called exosomes, microvesicles, microparticles, ectosomes, oncosomes, apoptotic bodies, and many other names. However, specific issues arise when working with these entities, whose size and amount often make them difficult to obtain as relatively pure preparations, and to characterize properly. The International Society for Extracellular Vesicles (ISEV) proposed Minimal Information for Studies of Extracellular Vesicles ("MISEV") guidelines for the field in 2014. We now update these "MISEV2014" guidelines based on evolution of the collective knowledge in the last four years. An important point to consider is that ascribing a specific function to EVs in general, or to subtypes of EVs, requires reporting of specific information beyond mere description of function in a crude, potentially contaminated, and heterogeneous preparation. For example, claims that exosomes are endowed with exquisite and specific activities remain difficult to support experimentally, given our still limited knowledge of their specific molecular machineries of biogenesis and release, as compared with other biophysically similar EVs. The MISEV2018 guidelines include tables and outlines of suggested protocols and steps to follow to document specific EV-associated functional activities. Finally, a checklist is provided with summaries of key points.
Cardiovascular diseases (CVDs), principally ischemic heart disease (IHD) and stroke, are the leading cause of global mortality and a major contributor to disability. This paper reviews the magnitude of total CVD burden, including 13 underlying causes of cardiovascular death and 9 related risk factors, using estimates from the Global Burden of Disease (GBD) Study 2019. GBD, an ongoing multinational collaboration to provide comparable and consistent estimates of population health over time, used all available population-level data sources on incidence, prevalence, case fatality, mortality, and health risks to produce estimates for 204 countries and territories from 1990 to 2019. Prevalent cases of total CVD nearly doubled from 271 million (95% uncertainty interval [UI]: 257 to 285 million) in 1990 to 523 million (95% UI: 497 to 550 million) in 2019, and the number of CVD deaths steadily increased from 12.1 million (95% UI: 11.4 to 12.6 million) in 1990, reaching 18.6 million (95% UI: 17.1 to 19.7 million) in 2019. The global trends for disability-adjusted life years (DALYs) and years of life lost also increased significantly, and years lived with disability doubled from 17.7 million (95% UI: 12.9 to 22.5 million) to 34.4 million (95% UI: 24.9 to 43.6 million) over that period. The total number of DALYs due to IHD has risen steadily since 1990, reaching 182 million (95% UI: 170 to 194 million) DALYs, 9.14 million (95% UI: 8.40 to 9.74 million) deaths in the year 2019, and 197 million (95% UI: 178 to 220 million) prevalent cases of IHD in 2019. The total number of DALYs due to stroke has risen steadily since 1990, reaching 143 million (95% UI: 133 to 153 million) DALYs, 6.55 million (95% UI: 6.00 to 7.02 million) deaths in the year 2019, and 101 million (95% UI: 93.2 to 111 million) prevalent cases of stroke in 2019. Cardiovascular diseases remain the leading cause of disease burden in the world. 
CVD burden continues its decades-long rise for almost all countries outside high-income countries, and alarmingly, the age-standardized rate of CVD has begun to rise in some locations where it was previously declining in high-income countries. There is an urgent need to focus on implementing existing cost-effective policies and interventions if the world is to meet the targets for Sustainable Development Goal 3 and achieve a 30% reduction in premature mortality due to noncommunicable diseases.
Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.
The World Health Organization (WHO) classification of tumors of the hematopoietic and lymphoid tissues was last updated in 2008. Since then, there have been numerous advances in the identification of unique biomarkers associated with some myeloid neoplasms and acute leukemias, largely derived from gene expression analysis and next-generation sequencing that can significantly improve the diagnostic criteria as well as the prognostic relevance of entities currently included in the WHO classification and that also suggest new entities that should be added. Therefore, there is a clear need for a revision to the current classification. The revisions to the categories of myeloid neoplasms and acute leukemia will be published in a monograph in 2016 and reflect a consensus of opinion of hematopathologists, hematologists, oncologists, and geneticists. The 2016 edition represents a revision of the prior classification rather than an entirely new classification and attempts to incorporate new clinical, prognostic, morphologic, immunophenotypic, and genetic data that have emerged since the last edition. The major changes in the classification and their rationale are presented here.
Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes that are crucial for the function of an organism will be depleted of such variants in natural populations, whereas non-essential genes will tolerate their accumulation. However, predicted loss-of-function variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes. Here we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence predicted loss-of-function variants in this cohort after filtering for artefacts caused by sequencing and annotation errors. Using an improved model of human mutation rates, we classify human protein-coding genes along a spectrum that represents tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve the power of gene discovery for both common and rare diseases.
Thirty years of brain imaging research has converged to define the brain's default network, a novel and only recently appreciated brain system that participates in internal modes of cognition. Here we synthesize past observations to provide strong evidence that the default network is a specific, anatomically defined brain system preferentially active when individuals are not focused on the external environment. Analysis of connectional anatomy in the monkey supports the presence of an interconnected brain system. Providing insight into function, the default network is active when individuals are engaged in internally focused tasks including autobiographical memory retrieval, envisioning the future, and conceiving the perspectives of others. Probing the functional anatomy of the network in detail reveals that it is best understood as multiple interacting subsystems. The medial temporal lobe subsystem provides information from prior experiences in the form of memories and associations that are the building blocks of mental simulation. The medial prefrontal subsystem facilitates the flexible use of this information during the construction of self-relevant mental simulations. These two subsystems converge on important nodes of integration including the posterior cingulate cortex. The implications of these functional and anatomical observations are discussed in relation to possible adaptive roles of the default network for using past experiences to plan for the future, navigate social interactions, and maximize the utility of moments when we are not otherwise engaged by the external world. We conclude by discussing the relevance of the default network for understanding mental disorders including autism, schizophrenia, and Alzheimer's disease.
Information processing in the cerebral cortex involves interactions among distributed areas. Anatomical connectivity suggests that certain areas form local hierarchical relations such as within the visual system. Other connectivity patterns, particularly among association areas, suggest the presence of large-scale circuits without clear hierarchical relations. In this study the organization of networks in the human cerebrum was explored using resting-state functional connectivity MRI. Data from 1,000 subjects were registered using surface-based alignment. A clustering approach was employed to identify and replicate networks of functionally coupled regions across the cerebral cortex. The results revealed local networks confined to sensory and motor cortices as well as distributed networks of association regions. Within the sensory and motor cortices, functional connectivity followed topographic representations across adjacent areas. In association cortex, the connectivity patterns often showed abrupt transitions between network boundaries. Focused analyses were performed to better understand properties of network connectivity. A canonical sensory-motor pathway involving primary visual area, putative middle temporal area complex (MT+), lateral intraparietal area, and frontal eye field was analyzed to explore how interactions might arise within and between networks. Results showed that adjacent regions of the MT+ complex demonstrate differential connectivity consistent with a hierarchical pathway that spans networks. The functional connectivity of parietal and prefrontal association cortices was next explored. Distinct connectivity profiles of neighboring regions suggest they participate in distributed networks that, while showing evidence for interactions, are embedded within largely parallel, interdigitated circuits. 
We conclude by discussing the organization of these large-scale cerebral networks in relation to monkey anatomy and their potential evolutionary expansion in humans to support cognition.
The apolipoprotein E type 4 allele (APOE-ε4) is genetically associated with the common late onset familial and sporadic forms of Alzheimer's disease (AD). Risk for AD increased from 20% to 90% and mean age at onset decreased from 84 to 68 years with increasing number of APOE-ε4 alleles in 42 families with late onset AD. Thus APOE-ε4 gene dose is a major risk factor for late onset AD and, in these families, homozygosity for APOE-ε4 was virtually sufficient to cause AD by age 80.
BACKGROUND: The use of warfarin reduces the rate of ischemic stroke in patients with atrial fibrillation but requires frequent monitoring and dose adjustment. Rivaroxaban, an oral factor Xa inhibitor, may provide more consistent and predictable anticoagulation than warfarin. METHODS: In a double-blind trial, we randomly assigned 14,264 patients with nonvalvular atrial fibrillation who were at increased risk for stroke to receive either rivaroxaban (at a daily dose of 20 mg) or dose-adjusted warfarin. The per-protocol, as-treated primary analysis was designed to determine whether rivaroxaban was noninferior to warfarin for the primary end point of stroke or systemic embolism. RESULTS: In the primary analysis, the primary end point occurred in 188 patients in the rivaroxaban group (1.7% per year) and in 241 in the warfarin group (2.2% per year) (hazard ratio in the rivaroxaban group, 0.79; 95% confidence interval [CI], 0.66 to 0.96; P<0.001 for noninferiority). In the intention-to-treat analysis, the primary end point occurred in 269 patients in the rivaroxaban group (2.1% per year) and in 306 patients in the warfarin group (2.4% per year) (hazard ratio, 0.88; 95% CI, 0.74 to 1.03; P<0.001 for noninferiority; P=0.12 for superiority). Major and nonmajor clinically relevant bleeding occurred in 1475 patients in the rivaroxaban group (14.9% per year) and in 1449 in the warfarin group (14.5% per year) (hazard ratio, 1.03; 95% CI, 0.96 to 1.11; P=0.44), with significant reductions in intracranial hemorrhage (0.5% vs. 0.7%, P=0.02) and fatal bleeding (0.2% vs. 0.5%, P=0.003) in the rivaroxaban group. CONCLUSIONS: In patients with atrial fibrillation, rivaroxaban was noninferior to warfarin for the prevention of stroke or systemic embolism. There was no significant between-group difference in the risk of major bleeding, although intracranial and fatal bleeding occurred less frequently in the rivaroxaban group. 
(Funded by Johnson & Johnson and Bayer; ROCKET AF ClinicalTrials.gov number, NCT00403767.).
BACKGROUND: Patients with advanced squamous-cell non-small-cell lung cancer (NSCLC) who have disease progression during or after first-line chemotherapy have limited treatment options. This randomized, open-label, international, phase 3 study evaluated the efficacy and safety of nivolumab, a fully human IgG4 programmed death 1 (PD-1) immune-checkpoint-inhibitor antibody, as compared with docetaxel in this patient population. METHODS: We randomly assigned 272 patients to receive nivolumab, at a dose of 3 mg per kilogram of body weight every 2 weeks, or docetaxel, at a dose of 75 mg per square meter of body-surface area every 3 weeks. The primary end point was overall survival. RESULTS: The median overall survival was 9.2 months (95% confidence interval [CI], 7.3 to 13.3) with nivolumab versus 6.0 months (95% CI, 5.1 to 7.3) with docetaxel. The risk of death was 41% lower with nivolumab than with docetaxel (hazard ratio, 0.59; 95% CI, 0.44 to 0.79; P<0.001). At 1 year, the overall survival rate was 42% (95% CI, 34 to 50) with nivolumab versus 24% (95% CI, 17 to 31) with docetaxel. The response rate was 20% with nivolumab versus 9% with docetaxel (P=0.008). The median progression-free survival was 3.5 months with nivolumab versus 2.8 months with docetaxel (hazard ratio for death or disease progression, 0.62; 95% CI, 0.47 to 0.81; P<0.001). The expression of the PD-1 ligand (PD-L1) was neither prognostic nor predictive of benefit. Treatment-related adverse events of grade 3 or 4 were reported in 7% of the patients in the nivolumab group as compared with 55% of those in the docetaxel group. CONCLUSIONS: Among patients with advanced, previously treated squamous-cell NSCLC, overall survival, response rate, and progression-free survival were significantly better with nivolumab than with docetaxel, regardless of PD-L1 expression level. (Funded by Bristol-Myers Squibb; CheckMate 017 ClinicalTrials.gov number, NCT01642004.).
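The "41% lower" figure in this abstract follows directly from the hazard ratio, since the relative reduction is one minus the ratio. The short sketch below reproduces that arithmetic from the numbers quoted in the abstract; the function name is hypothetical.

```python
def relative_reduction_percent(hazard_ratio):
    """Relative reduction in hazard, in percent, implied by a hazard ratio below 1."""
    return round((1.0 - hazard_ratio) * 100)

# Hazard ratios as quoted in the abstract.
death_reduction = relative_reduction_percent(0.59)        # overall survival
progression_reduction = relative_reduction_percent(0.62)  # death or progression

# Absolute difference in 1-year overall survival, in percentage points.
one_year_survival_gain = 42 - 24
```

The relative reduction (41%) and the absolute gain in 1-year survival (18 percentage points) answer different questions, so the two figures are not expected to match.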
By characterizing the geographic and functional spectrum of human genetic variation, the 1000 Genomes Project aims to build a resource to help to understand the genetic contribution to disease. Here we describe the genomes of 1,092 individuals from 14 populations, constructed using a combination of low-coverage whole-genome and exome sequencing. By developing methods to integrate information across several algorithms and diverse data sources, we provide a validated haplotype map of 38 million single nucleotide polymorphisms, 1.4 million short insertions and deletions, and more than 14,000 larger deletions. We show that individuals from different populations carry different profiles of rare and common variants, and that low-frequency variants show substantial geographic differentiation, which is further increased by the action of purifying selection. We show that evolutionary conservation and coding consequence are key determinants of the strength of purifying selection, that rare-variant load varies substantially across biological pathways, and that each individual contains hundreds of rare non-coding variants at conserved sites, such as motif-disrupting changes in transcription-factor-binding sites. This resource, which captures up to 98% of accessible single nucleotide polymorphisms at a frequency of 1% in related populations, enables analysis of common and low-frequency variants in individuals from diverse, including admixed, populations. This report from the 1000 Genomes Project describes the genomes of 1,092 individuals from 14 human populations, providing a resource for common and low-frequency variant analysis in individuals from diverse populations; hundreds of rare non-coding variants at conserved sites, such as motif-disrupting changes in transcription-factor-binding sites, can be found in each individual. 
This report by the 1000 Genomes Project describes the genomes of 1,092 individuals from 14 human populations, providing a resource for common and low-frequency variant analysis in individuals from diverse populations. Integrative analyses reveal profiles of rare and common variants in different populations. The frequencies of rare variants vary across biological pathways, and hundreds of rare, non-coding variants at conserved sites — such as changes disrupting transcription-factor motifs — can be established for each individual.