NobleBlocks

National Human Genome Research Institute

Facility · Bethesda, Maryland, United States

Research output, citation impact, and the most-cited papers from the National Human Genome Research Institute (United States). Aggregated across the NobleBlocks index of 300M+ scholarly works.

Total works: 12.9K
Citations: 5.4M
h-index: 928
i10-index: 23.3K
Also known as
National Center for Human Genome Research; National Human Genome Research Institute
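For readers unfamiliar with the two index metrics above, the following is a minimal sketch of their standard definitions: the h-index is the largest h such that h works each have at least h citations, and the i10-index is the number of works with at least 10 citations. The function names and the sample citation counts are hypothetical, and the page does not state how NobleBlocks itself computes these values.

```python
def h_index(citations):
    """Largest h such that h works have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank      # the top `rank` works all have >= rank citations
        else:
            break         # counts only decrease from here on
    return h


def i10_index(citations):
    """Number of works with at least 10 citations."""
    return sum(1 for count in citations if count >= 10)


# Hypothetical per-work citation counts, for illustration only.
sample = [25, 8, 5, 3, 3, 12, 0]
print(h_index(sample), i10_index(sample))
```

Sorting once and scanning keeps the sketch at O(n log n), which is adequate even for an institution with tens of thousands of works.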

Top-cited papers from National Human Genome Research Institute

Initial sequencing and analysis of the human genome
Eric S. Lander, Lauren Linton, Bruce W. Birren, Chad Nusbaum +4 more
2001 · Nature · 24.5K citations · doi:10.1038/35057062

The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

A global reference for human genetic variation
Adam Auton, Gonçalo R. Abecasis, David M. Altshuler +4 more
2015 · Nature · 19.8K citations · doi:10.1038/nature15393

The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies. Results for the final phase of the 1000 Genomes Project are presented including whole-genome sequencing, targeted exome sequencing, and genotyping on high-density SNP arrays for 2,504 individuals across 26 populations, providing a global reference data set to support biomedical genetics. The 1000 Genomes Project has sought to comprehensively catalogue human genetic variation across populations, providing a valuable public genomic resource. The data obtained so far have found applications ranging from association studies and fine mapping studies to the filtering of likely neutral variants in rare-disease cohorts. The authors now report on the final phase of the project, phase 3, which covers previously uncharacterized areas of human genetic diversity in terms of the populations sampled and categories of characterized variation. The sample now includes more than 2,500 individuals from 26 global populations, with low coverage whole-genome and deep exome sequencing, as well as dense microarray genotyping. 
They find that while most common variants are shared across populations, rarer variants are often restricted to closely related populations. The authors also demonstrate the use of the phase 3 dataset as a reference panel for imputation to improve the resolution in genetic association studies.

Structure, function and diversity of the healthy human microbiome
Curtis Huttenhower, J. Fah Sathirapongsasuti, Nicola Segata +4 more
2012 · Nature · 11.9K citations · doi:10.1038/nature11234

Studies of the human microbiome have revealed that even healthy individuals differ remarkably in the microbes that occupy habitats such as the gut, skin and vagina. Much of this diversity remains unexplained, although diet, environment, host genetics and early microbial exposure have all been implicated. Accordingly, to characterize the ecology of human-associated microbial communities, the Human Microbiome Project has analysed the largest cohort and set of distinct, clinically relevant body habitats so far. We found the diversity and abundance of each habitat’s signature microbes to vary widely even among healthy subjects, with strong niche specialization both within and among individuals. The project encountered an estimated 81–99% of the genera, enzyme families and community configurations occupied by the healthy Western microbiome. Metagenomic carriage of metabolic pathways was stable among individuals despite variation in community structure, and ethnic/racial background proved to be one of the strongest associations of both pathways and microbes with clinical metadata. These results thus delineate the range of structural and functional configurations normal in the microbial communities of a healthy population, enabling future characterization of the epidemiology, ecology and translational applications of the human microbiome. The Human Microbiome Project Consortium reports the first results of their analysis of microbial communities from distinct, clinically relevant body habitats in a human cohort; the insights into the microbial communities of a healthy population lay foundations for future exploration of the epidemiology, ecology and translational applications of the human microbiome. The Human Microbiome Project (HMP), supported by the National Institutes of Health Common Fund, has the goal of characterizing the microbial communities that inhabit and interact with the human body in sickness and in health. 
In two Articles in this issue of Nature, the HMP Consortium presents the first population-scale details of the organismal and functional composition of the microbiota across five areas of the body. An associated News & Views discusses the initial results — which, along with those of a series of co-publications, already constitute the most extensive catalogue of organisms and genes related to the human microbiome yet published — and highlights some of the major questions that the project will tackle in the next few years.

The mutational constraint spectrum quantified from variation in 141,456 humans
Konrad J. Karczewski, Laurent C. Francioli, Grace Tiao, Beryl B. Cummings +4 more
2020 · Nature · 10.0K citations · doi:10.1038/s41586-020-2308-7

Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes that are crucial for the function of an organism will be depleted of such variants in natural populations, whereas non-essential genes will tolerate their accumulation. However, predicted loss-of-function variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes. Here we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence predicted loss-of-function variants in this cohort after filtering for artefacts caused by sequencing and annotation errors. Using an improved model of human mutation rates, we classify human protein-coding genes along a spectrum that represents tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve the power of gene discovery for both common and rare diseases.

The Cancer Genome Atlas Pan-Cancer analysis project
John N. Weinstein, Eric A Collisson, Gordon B. Mills, Kenna Shaw +4 more
2013 · Nature Genetics · 9.4K citations · doi:10.1038/ng.2764

Current clinical practice is organized according to tissue or organ of origin of tumors. Now, The Cancer Genome Atlas (TCGA) Research Network has started to identify genomic and other molecular commonalities among a dozen different types of cancer. Emerging similarities and contrasts will form the basis for targeted therapies of the future and for repurposing existing therapies by molecular rather than histological similarities of the diseases. The Cancer Genome Atlas (TCGA) Research Network has profiled and analyzed large numbers of human tumors to discover molecular aberrations at the DNA, RNA, protein and epigenetic levels. The resulting rich data provide a major opportunity to develop an integrated picture of commonalities, differences and emergent themes across tumor lineages. The Pan-Cancer initiative compares the first 12 tumor types profiled by TCGA. Analysis of the molecular aberrations and their functional roles across tumor types will teach us how to extend therapies effective in one cancer type to others with a similar genomic profile.

Mutation in the α-Synuclein Gene Identified in Families with Parkinson's Disease
Mihael H. Polymeropoulos, Christian Lavedan, Elisabeth Leroy, Susan Ide +4 more
1997 · Science · 8.2K citations · doi:10.1126/science.276.5321.2045

Parkinson's disease (PD) is a common neurodegenerative disorder with a lifetime incidence of approximately 2 percent. A pattern of familial aggregation has been documented for the disorder, and it was recently reported that a PD susceptibility gene in a large Italian kindred is located on the long arm of human chromosome 4. A mutation was identified in the alpha-synuclein gene, which codes for a presynaptic protein thought to be involved in neuronal plasticity, in the Italian kindred and in three unrelated families of Greek origin with autosomal dominant inheritance for the PD phenotype. This finding of a specific molecular alteration associated with PD will facilitate the detailed understanding of the pathophysiology of the disorder.

An integrated map of genetic variation from 1,092 human genomes
Zamin Iqbal, Andy Rimmer, Anjali Gupta-Hinch +4 more
2012 · Nature · 8.2K citations · doi:10.1038/nature11632

By characterizing the geographic and functional spectrum of human genetic variation, the 1000 Genomes Project aims to build a resource to help to understand the genetic contribution to disease. Here we describe the genomes of 1,092 individuals from 14 populations, constructed using a combination of low-coverage whole-genome and exome sequencing. By developing methods to integrate information across several algorithms and diverse data sources, we provide a validated haplotype map of 38 million single nucleotide polymorphisms, 1.4 million short insertions and deletions, and more than 14,000 larger deletions. We show that individuals from different populations carry different profiles of rare and common variants, and that low-frequency variants show substantial geographic differentiation, which is further increased by the action of purifying selection. We show that evolutionary conservation and coding consequence are key determinants of the strength of purifying selection, that rare-variant load varies substantially across biological pathways, and that each individual contains hundreds of rare non-coding variants at conserved sites, such as motif-disrupting changes in transcription-factor-binding sites. This resource, which captures up to 98% of accessible single nucleotide polymorphisms at a frequency of 1% in related populations, enables analysis of common and low-frequency variants in individuals from diverse, including admixed, populations. This report from the 1000 Genomes Project describes the genomes of 1,092 individuals from 14 human populations, providing a resource for common and low-frequency variant analysis in individuals from diverse populations; hundreds of rare non-coding variants at conserved sites, such as motif-disrupting changes in transcription-factor-binding sites, can be found in each individual. 
This report by the 1000 Genomes Project describes the genomes of 1,092 individuals from 14 human populations, providing a resource for common and low-frequency variant analysis in individuals from diverse populations. Integrative analyses reveal profiles of rare and common variants in different populations. The frequencies of rare variants vary across biological pathways, and hundreds of rare, non-coding variants at conserved sites — such as changes disrupting transcription-factor motifs — can be established for each individual.

Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation
Sergey Koren, Brian P. Walenz, Konstantin Berlin, Jason Miller +2 more
2017 · Genome Research · 8.1K citations · doi:10.1101/gr.215087.116

Long-read single-molecule sequencing has revolutionized de novo genome assembly and enabled the automated reconstruction of reference-quality genomes. However, given the relatively high error rates of such technologies, efficient and accurate assembly of large repeats and closely related haplotypes remains challenging. We address these issues with Canu, a successor of Celera Assembler that is specifically designed for noisy single-molecule sequences. Canu introduces support for nanopore sequencing, halves depth-of-coverage requirements, and improves assembly continuity while simultaneously reducing runtime by an order of magnitude on large genomes versus Celera Assembler 8.2. These advances result from new overlapping and assembly algorithms, including an adaptive overlapping strategy based on tf-idf weighted MinHash and a sparse assembly graph construction that avoids collapsing diverged repeats and haplotypes. We demonstrate that Canu can reliably assemble complete microbial genomes and near-complete eukaryotic chromosomes using either Pacific Biosciences (PacBio) or Oxford Nanopore technologies and achieves a contig NG50 of >21 Mbp on both human and Drosophila melanogaster PacBio data sets. For assembly structures that cannot be linearly represented, Canu provides graph-based assembly outputs in graphical fragment assembly (GFA) format for analysis or integration with complementary phasing and scaffolding techniques. The combination of such highly resolved assembly graphs with long-range scaffolding information promises the complete and automated assembly of complex genomes.
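The abstract above mentions an overlapping strategy based on MinHash. As a rough illustration of the underlying idea only, the sketch below estimates the Jaccard similarity of two sequences' k-mer sets with plain, unweighted MinHash. It is not Canu's implementation and omits the tf-idf weighting entirely; the function names, the choice of BLAKE2b as the hash, and the default `k` and `num_hashes` values are all assumptions made for this example.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(kmer, seed):
    """Deterministic 64-bit hash of a k-mer under a given seed (illustrative choice)."""
    data = f"{seed}:{kmer}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(seq, k=5, num_hashes=64):
    """For each seeded hash function, keep the minimum hash over all k-mers."""
    ks = kmers(seq, k)
    if not ks:
        raise ValueError("sequence is shorter than k")
    return [min(_hash(km, seed) for km in ks) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree; approximates Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


read_a = "ACGTACGTGGCATTACGGA"
read_b = "ACGTACGTGGCATTACGGT"   # differs from read_a only in the last base
print(estimate_jaccard(minhash_signature(read_a), minhash_signature(read_b)))
```

Comparing fixed-length signatures instead of full k-mer sets is what makes all-versus-all overlap detection tractable on large read sets; more hash functions reduce the variance of the estimate at the cost of more work per read.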

Integrated genomic analyses of ovarian carcinoma
Debra Bell, Andrew Berchuck, Michael J. Birrer +4 more
2011 · Nature · 8.1K citations · doi:10.1038/nature10166

A catalogue of molecular aberrations that cause ovarian cancer is critical for developing and deploying therapies that will improve patients’ lives. The Cancer Genome Atlas project has analysed messenger RNA expression, microRNA expression, promoter methylation and DNA copy number in 489 high-grade serous ovarian adenocarcinomas and the DNA sequences of exons from coding genes in 316 of these tumours. Here we report that high-grade serous ovarian cancer is characterized by TP53 mutations in almost all tumours (96%); low prevalence but statistically recurrent somatic mutations in nine further genes including NF1, BRCA1, BRCA2, RB1 and CDK12; 113 significant focal DNA copy number aberrations; and promoter methylation events involving 168 genes. Analyses delineated four ovarian cancer transcriptional subtypes, three microRNA subtypes, four promoter methylation subtypes and a transcriptional signature associated with survival duration, and shed new light on the impact that tumours with BRCA1/2 (BRCA1 or BRCA2) and CCNE1 aberrations have on survival. Pathway analyses suggested that homologous recombination is defective in about half of the tumours analysed, and that NOTCH and FOXM1 signalling are involved in serous ovarian cancer pathophysiology. The Cancer Genome Atlas (TCGA) project reports here its analysis of messenger RNA and microRNA expression, promoter methylation, DNA copy number and exome sequences in 489 high-grade serous ovarian adenocarcinomas. The analyses help establish new tumour subtypes. Among other insights is the finding that while the gene encoding p53 tumour suppressor is mutated in almost all tumours, nine other loci including NF1, BRCA1, BRCA2, RB1 and CDK12 carry recurrent albeit low-prevalence mutations. Homologous recombination is defective in about half of the tumours studied, and Notch and FOXM1 signalling are involved in the pathophysiology.

A map of human genome variation from population-scale sequencing
Min Hu, Yuan Chen, James Stalker, Richard M. Durbin +4 more
2010 · Nature · 8.1K citations · doi:10.1038/nature09534

The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother–father–child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10⁻⁸ per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research. This issue of Nature contains the first publication from The 1000 Genomes Project, an international collaboration that will produce an extensive public catalogue of human genetic variation. 
The plan, in fact, is to sequence about 2,000 unidentified individuals from 20 populations around the world. This first paper presents the results from the project's pilot phase, testing three different strategies for genome-wide sequencing with high-throughput platforms: low-coverage whole-genome sequencing of 179 individuals in three population groups, high-coverage sequencing of two mother–father–child trios, and exon-targeted sequencing of 697 individuals from seven populations. The goal of the 1000 Genomes Project is to provide in-depth information on variation in human genome sequences. In the pilot phase reported here, different strategies for genome-wide sequencing, using high-throughput sequencing platforms, were developed and compared. The resulting data set includes more than 95% of the currently accessible variants found in any individual, and can be used to inform association and functional studies.

Comprehensive molecular characterization of gastric adenocarcinoma
Adam J. Bass, Natalie Tasman, Brady Bernard, Vésteinn Thórsson +4 more
2014 · Nature · 6.5K citations · doi:10.1038/nature13480

Gastric cancer is a leading cause of cancer deaths, but analysis of its molecular and clinical characteristics has been complicated by histological and aetiological heterogeneity. Here we describe a comprehensive molecular evaluation of 295 primary gastric adenocarcinomas as part of The Cancer Genome Atlas (TCGA) project. We propose a molecular classification dividing gastric cancer into four subtypes: tumours positive for Epstein–Barr virus, which display recurrent PIK3CA mutations, extreme DNA hypermethylation, and amplification of JAK2, CD274 (also known as PD-L1) and PDCD1LG2 (also known as PD-L2); microsatellite unstable tumours, which show elevated mutation rates, including mutations of genes encoding targetable oncogenic signalling proteins; genomically stable tumours, which are enriched for the diffuse histological variant and mutations of RHOA or fusions involving RHO-family GTPase-activating proteins; and tumours with chromosomal instability, which show marked aneuploidy and focal amplification of receptor tyrosine kinases. Identification of these subtypes provides a roadmap for patient stratification and trials of targeted therapies. The Cancer Genome Atlas reports on molecular evaluation of 295 primary gastric adenocarcinomas and proposes a new classification of gastric cancers into 4 subtypes, which should help with clinical assessment and trials of targeted therapies. This contribution from The Cancer Genome Atlas (TCGA) project describes the molecular evaluation of 295 primary gastric adenocarcinomas. Based on the results, the authors propose a novel classification separating gastric cancers into four subtypes according to: Epstein–Barr virus positive status, microsatellite instability, chromosomal instability or genomic stability. 
Given the histologic and etiologic heterogeneity of gastric cancer, identification of these subtypes using a schema that can readily be applied to patient samples should help with patient stratification and trials of targeted therapies.

Comprehensive molecular profiling of lung adenocarcinoma
Eric A. Collisson, Barry S. Taylor, Levi Garraway, Chip Stewart +4 more
2014 · Nature · 5.8K citations · doi:10.1038/nature13385

Adenocarcinoma of the lung is the leading cause of cancer death worldwide. Here we report molecular profiling of 230 resected lung adenocarcinomas using messenger RNA, microRNA and DNA sequencing integrated with copy number, methylation and proteomic analyses. High rates of somatic mutation were seen (mean 8.9 mutations per megabase). Eighteen genes were statistically significantly mutated, including RIT1 activating mutations and newly described loss-of-function MGA mutations which are mutually exclusive with focal MYC amplification. EGFR mutations were more frequent in female patients, whereas mutations in RBM10 were more common in males. Aberrations in NF1, MET, ERBB2 and RIT1 occurred in 13% of cases and were enriched in samples otherwise lacking an activated oncogene, suggesting a driver role for these events in certain tumours. DNA and mRNA sequence from the same tumour highlighted splicing alterations driven by somatic genomic changes, including exon 14 skipping in MET mRNA in 4% of cases. MAPK and PI(3)K pathway activity, when measured at the protein level, was explained by known mutations in only a fraction of cases, suggesting additional, unexplained mechanisms of pathway activation. These data establish a foundation for classification and further investigations of lung adenocarcinoma molecular pathogenesis. An integrated transcriptome, genome, methylome and proteome analysis of over 200 lung adenocarcinomas reveals high rates of somatic mutations, 18 statistically significantly mutated genes including RIT1 and MGA, splicing changes, and alterations in MAPK and PI(3)K pathway activity. This report from The Cancer Genome Atlas Research Network presents molecular profiling of 230 resected untreated lung adenocarcinomas. 
Integrated analyses of transcriptome, genome, methylome and proteome collectively identify high rates of somatic mutation, significantly mutated genes including RIT1 and MGA, splicing alterations driven by somatic genomic changes, and point to as yet unidentified lesions that alter MAPK and PI(3)K pathway activity. These data establish a foundation for classification and further investigations of the leading cause of cancer death worldwide.

High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries
Chirag Jain, Luis M. Rodriguez‐R, Adam M. Phillippy, Konstantinos T. Konstantinidis +1 more
2018 · Nature Communications · 5.6K citations · doi:10.1038/s41467-018-07641-9

A fundamental question in microbiology is whether there is continuum of genetic diversity among genomes, or clear species boundaries prevail instead. Whole-genome similarity metrics such as Average Nucleotide Identity (ANI) help address this question by facilitating high resolution taxonomic analysis of thousands of genomes from diverse phylogenetic lineages. To scale to available genomes and beyond, we present FastANI, a new method to estimate ANI using alignment-free approximate sequence mapping. FastANI is accurate for both finished and draft genomes, and is up to three orders of magnitude faster compared to alignment-based approaches. We leverage FastANI to compute pairwise ANI values among all prokaryotic genomes available in the NCBI database. Our results reveal clear genetic discontinuity, with 99.8% of the total 8 billion genome pairs analyzed conforming to >95% intra-species and <83% inter-species ANI values. This discontinuity is manifested with or without the most frequently sequenced species, and is robust to historic additions in the genome databases.
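The discontinuity reported above (>95% ANI within species, <83% between species) can be read as a simple decision rule on a pairwise ANI value. The sketch below only illustrates that rule; it does not compute ANI and is not part of FastANI. The function name, the label strings, and the treatment of the 83–95% range as "intermediate" are assumptions made for this example.

```python
def classify_pair(ani_percent, intra_cutoff=95.0, inter_cutoff=83.0):
    """Label a genome pair from its ANI value using the cutoffs quoted in the abstract."""
    if not 0.0 <= ani_percent <= 100.0:
        raise ValueError("ANI must be a percentage between 0 and 100")
    if ani_percent > intra_cutoff:
        return "same species"
    if ani_percent < inter_cutoff:
        return "different species"
    # The abstract reports that very few pairs fall in this range.
    return "intermediate"


# Hypothetical ANI values, for illustration only.
for ani in (98.7, 79.2, 90.0):
    print(ani, classify_pair(ani))
```

Keeping the cutoffs as parameters makes it easy to explore how sensitive a species assignment is to the exact threshold chosen.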

Genetic effects on gene expression across human tissues
Taru Tukiainen, Katherine H. Huang, Kristin G. Ardlie, Daniel G. MacArthur +4 more
2017 · Nature · 4.6K citations · doi:10.1038/nature24277

Characterization of the molecular function of the human genome and its variation across individuals is essential for identifying the cellular mechanisms that underlie human genetic traits and diseases. The Genotype-Tissue Expression (GTEx) project aims to characterize variation in gene expression levels across individuals and diverse tissues of the human body, many of which are not easily accessible. Here we describe genetic effects on gene expression levels across 44 human tissues. We find that local genetic variation affects gene expression levels for the majority of genes, and we further identify inter-chromosomal genetic effects for 93 genes and 112 loci. On the basis of the identified genetic effects, we characterize patterns of tissue specificity, compare local and distal effects, and evaluate the functional properties of the genetic effects. We also demonstrate that multi-tissue, multi-individual data can be used to identify genes and pathways affected by human disease-associated variation, enabling a mechanistic interpretation of gene regulation and the genetic basis of disease.

The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019
Annalisa Buniello, Jacqueline A. L. MacArthur, María Cerezo, Laura W. Harris +4 more
2018 · Nucleic Acids Research · 4.6K citations · doi:10.1093/nar/gky1120

The GWAS Catalog delivers a high-quality curated collection of all published genome-wide association studies enabling investigations to identify causal variants, understand disease mechanisms, and establish targets for novel therapies. The scope of the Catalog has also expanded to targeted and exome arrays with 1000 new associations added for these technologies. As of September 2018, the Catalog contains 5687 GWAS comprising 71673 variant-trait associations from 3567 publications. New content includes 284 full P-value summary statistics datasets for genome-wide and new targeted array studies, representing 6 × 10⁹ individual variant-trait statistics. In the last 12 months, the Catalog's user interface was accessed by ∼90000 unique users who viewed >1 million pages. We have improved data access with the release of a new RESTful API to support high-throughput programmatic access, an improved web interface and a new summary statistics database. Summary statistics provision is supported by a new format proposed as a community standard for summary statistics data representation. This format was derived from our experience in standardizing heterogeneous submissions, mapping formats and in harmonizing content. Availability: https://www.ebi.ac.uk/gwas/.

A Draft Sequence of the Neandertal Genome
Richard E. Green, Johannes Krause, Adrian W. Briggs, Tomislav Maričić +4 more
2010 · Science · 4.5K citations · doi:10.1126/science.1188021

Neandertals, the closest evolutionary relatives of present-day humans, lived in large parts of Europe and western Asia before disappearing 30,000 years ago. We present a draft sequence of the Neandertal genome composed of more than 4 billion nucleotides from three individuals. Comparisons of the Neandertal genome to the genomes of five present-day humans from different parts of the world identify a number of genomic regions that may have been affected by positive selection in ancestral modern humans, including genes involved in metabolism and in cognitive and skeletal development. We show that Neandertals shared more genetic variants with present-day humans in Eurasia than with present-day humans in sub-Saharan Africa, suggesting that gene flow from Neandertals into the ancestors of non-Africans occurred before the divergence of Eurasian groups from each other.

α-Synuclein Locus Triplication Causes Parkinson's Disease
A. B. Singleton, Matthew J. Farrer, Janel Johnson +4 more
2003 · Science · 4.4K citations · doi:10.1126/science.1090278


Potential etiologic and functional implications of genome-wide association loci for human diseases and traits
Lucia A. Hindorff, Praveen Sethupathy, Heather Junkins, Erin M. Ramos +3 more
2009 · Proceedings of the National Academy of Sciences · 4.2K citations · doi:10.1073/pnas.0903103106

We have developed an online catalog of SNP-trait associations from published genome-wide association studies for use in investigating genomic characteristics of trait/disease-associated SNPs (TASs). Reported TASs were common [median risk allele frequency 36%, interquartile range (IQR) 21%-53%] and were associated with modest effect sizes [median odds ratio (OR) 1.33, IQR 1.20-1.61]. Among 20 genomic annotation sets, reported TASs were significantly overrepresented only in nonsynonymous sites [OR = 3.9 (2.2-7.0), p = 3.5 × 10⁻⁷] and 5kb-promoter regions [OR = 2.3 (1.5-3.6), p = 3 × 10⁻⁴] compared to SNPs randomly selected from genotyping arrays. Although 88% of TASs were intronic (45%) or intergenic (43%), TASs were not overrepresented in introns and were significantly depleted in intergenic regions [OR = 0.44 (0.34-0.58), p = 2.0 × 10⁻⁹]. Only slightly more TASs than expected by chance were predicted to be in regions under positive selection [OR = 1.3 (0.8-2.1), p = 0.2]. This new online resource, together with bioinformatic predictions of the underlying functionality at trait/disease-associated loci, is well-suited to guide future investigations of the role of common variants in complex disease etiology.

Comprehensive genomic characterization of squamous cell lung cancers
Gad Getz, Stephen E. Schumacher, Petar Stojanov, Sachet Shukla +4 more
2012 · Nature · 4.0K citations · doi:10.1038/nature11404

Lung squamous cell carcinoma is a common type of lung cancer, causing approximately 400,000 deaths per year worldwide. Genomic alterations in squamous cell lung cancers have not been comprehensively characterized, and no molecularly targeted agents have been specifically developed for its treatment. As part of The Cancer Genome Atlas, here we profile 178 lung squamous cell carcinomas to provide a comprehensive landscape of genomic and epigenomic alterations. We show that the tumour type is characterized by complex genomic alterations, with a mean of 360 exonic mutations, 165 genomic rearrangements, and 323 segments of copy number alteration per tumour. We find statistically recurrent mutations in 11 genes, including mutation of TP53 in nearly all specimens. Previously unreported loss-of-function mutations are seen in the HLA-A class I major histocompatibility gene. Significantly altered pathways included NFE2L2 and KEAP1 in 34%, squamous differentiation genes in 44%, phosphatidylinositol-3-OH kinase pathway genes in 47%, and CDKN2A and RB1 in 72% of tumours. We identified a potential therapeutic target in most tumours, offering new avenues of investigation for the treatment of squamous cell lung cancers. Comprehensive analyses of 178 lung squamous cell carcinomas by The Cancer Genome Atlas project show that the tumour type is characterized by complex genomic alterations, with statistically recurrent mutations in 11 genes, including TP53 in nearly all samples; a potential therapeutic target is identified in most of the samples studied. The Cancer Genome Atlas consortium has analysed 178 lung squamous cell carcinomas, a common type of lung cancer for which comprehensive genomic analyses have not previously been available. The researchers report that this tumour type is characterized by complex genomic alterations, with recurrent mutations in 18 genes, including TP53 in nearly all samples. They also report frequent mutations in squamous differentiation genes. 
Collectively, these analyses identify potential therapeutic targets worthy of further investigation.

The repertoire of mutational signatures in human cancer
Ludmil B. Alexandrov, Jaegil Kim, Nicholas J. Haradhvala, Mi Ni Huang +4 more
2020 · Nature · 3.7K citations · doi:10.1038/s41586-020-1943-3

Somatic mutations in cancer genomes are caused by multiple mutational processes, each of which generates a characteristic mutational signature. Here, as part of the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA), we characterized mutational signatures using 84,729,690 somatic mutations from 4,645 whole-genome and 19,184 exome sequences that encompass most types of cancer. We identified 49 single-base-substitution, 11 doublet-base-substitution, 4 clustered-base-substitution and 17 small insertion-and-deletion signatures. The substantial size of our dataset, compared with previous analyses, enabled the discovery of new signatures, the separation of overlapping signatures and the decomposition of signatures into components that may represent associated but distinct DNA damage, repair and/or replication mechanisms. By estimating the contribution of each signature to the mutational catalogues of individual cancer genomes, we revealed associations of signatures to exogenous or endogenous exposures, as well as to defective DNA-maintenance processes. However, many signatures are of unknown cause. This analysis provides a systematic perspective on the repertoire of mutational processes that contribute to the development of human cancer.