National Institutes of Health
Government · Bethesda, Maryland, United States
Research output, citation impact, and the most-cited recent papers from the National Institutes of Health (United States). Aggregated across the NobleBlocks index of 300M+ scholarly works.
Top-cited papers from the National Institutes of Health
The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and statistical refinements described here permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original. In addition, a method is introduced for automatically combining statistically significant alignments produced by BLAST into a position-specific score matrix, and searching the database using this matrix. The resulting Position-Specific Iterated BLAST (PSI-BLAST) program runs at approximately the same speed per iteration as gapped BLAST, but in many cases is much more sensitive to weak but biologically relevant sequence similarities. PSI-BLAST is used to uncover several new and interesting members of the BRCT superfamily.
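The abstract does not spell out the new triggering criterion. In the gapped BLAST design it is the "two-hit" method: an ungapped extension is attempted only when two non-overlapping word hits lie on the same diagonal within a distance A of each other. Below is a simplified illustration of that rule, not NCBI's implementation. The word length of 3 and window of 40 are assumed defaults, and `two_hit_triggers` is a hypothetical helper.

```python
def two_hit_triggers(hits, word_len=3, window=40):
    """Return the word hits (query_pos, subject_pos) that would trigger an extension.

    Simplified two-hit rule (assumption: hits are scanned in query order):
    a hit triggers when an earlier, non-overlapping hit on the same diagonal
    lies no more than `window` positions before it.
    """
    last_hit = {}  # diagonal (subject_pos - query_pos) -> query_pos of latest kept hit
    triggers = []
    for q, s in sorted(hits):
        diag = s - q
        prev = last_hit.get(diag)
        if prev is not None and q - prev < word_len:
            continue  # overlaps the previous hit on this diagonal; ignore it
        if prev is not None and q - prev <= window:
            triggers.append((q, s))
        last_hit[diag] = q
    return triggers


print(two_hit_triggers([(0, 10), (1, 11), (5, 15), (50, 10), (100, 110)]))
```

Requiring a second hit lets the word-score threshold be lowered, which raises sensitivity, while far fewer single hits are extended.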
The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.
BACKGROUND: Sequence similarity searching is a very important bioinformatics task. While Basic Local Alignment Search Tool (BLAST) outperforms exact methods through its use of heuristics, the speed of the current BLAST software is suboptimal for very long queries or database sequences. There are also some shortcomings in the user-interface of the current command-line applications. RESULTS: We describe features and improvements of rewritten BLAST software and introduce new command-line applications. Long query sequences are broken into chunks for processing, in some cases leading to dramatically shorter run times. For long database sequences, it is possible to retrieve only the relevant parts of the sequence, reducing CPU time and memory usage for searches of short queries against databases of contigs or chromosomes. The program can now retrieve masking information for database sequences from the BLAST databases. A new modular software library can now access subject sequence data from arbitrary data sources. We introduce several new features, including strategy files that allow a user to save and reuse their favorite set of options. The strategy files can be uploaded to and downloaded from the NCBI BLAST web site. CONCLUSION: The new BLAST command-line applications, compared to the current BLAST tools, demonstrate substantial speed improvements for long queries as well as chromosome length database sequences. We have also improved the user interface of the command-line applications.
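The abstract says long queries are split into chunks but gives no chunk size or overlap policy. A minimal sketch, assuming fixed-size chunks with an overlap so that an alignment crossing a boundary still lies wholly inside at least one chunk if it is no longer than the overlap. `chunk_query` and its parameters are hypothetical and not BLAST+ options.

```python
def chunk_query(seq, chunk_size, overlap):
    """Split `seq` into (start_offset, subsequence) chunks that overlap by `overlap`."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while True:
        end = min(start + chunk_size, len(seq))
        chunks.append((start, seq[start:end]))
        if end == len(seq):  # the last chunk reaches the end of the query
            break
        start += step
    return chunks


print(chunk_query("ABCDEFGHIJ", 4, 1))
```

A hit at local position `p` in a chunk that starts at `start` maps back to query position `start + p`. Hits found twice in overlapping regions then have to be merged.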
The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common-disease studies. Results for the final phase of the 1000 Genomes Project are presented, including whole-genome sequencing, targeted exome sequencing, and genotyping on high-density SNP arrays for 2,504 individuals across 26 populations, providing a global reference data set to support biomedical genetics. The 1000 Genomes Project has sought to comprehensively catalogue human genetic variation across populations, providing a valuable public genomic resource. The data obtained so far have found applications ranging from association studies and fine-mapping studies to the filtering of likely neutral variants in rare-disease cohorts. The authors now report on the final phase of the project, phase 3, which covers previously uncharacterized areas of human genetic diversity in terms of the populations sampled and categories of characterized variation. The sample now includes more than 2,500 individuals from 26 global populations, with low-coverage whole-genome and deep exome sequencing, as well as dense microarray genotyping.
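The ">1% frequency" threshold refers to allele frequency in the sample. A minimal sketch of that arithmetic, assuming diploid genotypes coded as alternate-allele counts (0, 1, 2) with `None` for a missing call, and assuming the threshold applies to the minor allele frequency. The helper names are hypothetical.

```python
def allele_frequency(genotypes):
    """Alternate-allele frequency over called diploid genotypes (0, 1, 2); None = missing."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    if any(g not in (0, 1, 2) for g in called):
        raise ValueError("genotypes must be 0, 1, 2 or None")
    return sum(called) / (2 * len(called))  # two allele copies per individual


def is_common(genotypes, threshold=0.01):
    """True if the minor allele frequency exceeds `threshold` (1% by default)."""
    af = allele_frequency(genotypes)
    if af is None:
        return False
    return min(af, 1 - af) > threshold


print(allele_frequency([0, 1, 2, 0]))  # 3 alternate alleles out of 8
```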
They find that while most common variants are shared across populations, rarer variants are often restricted to closely related populations. The authors also demonstrate the use of the phase 3 dataset as a reference panel for imputation to improve the resolution in genetic association studies.
The purification of homogeneous glutathione S-transferases B and C from rat liver is described. Kinetic and physical properties of these enzymes are compared with those of homogeneous transferases A and E. The letter designations for the transferases are based on the reverse order of elution from carboxymethylcellulose, the purification step in which the transferases are separated from each other. Transferase B was purified on the basis of its ability to conjugate iodomethane with glutathione, whereas transferase C was purified on the basis of conjugation with 1,2-dichloro-4-nitrobenzene. Although each of the four enzymes can be identified by its reactivity with specific substrates, all of the enzymes are active to differing degrees in the conjugation of glutathione with p-nitrobenzyl chloride. Assay conditions for a variety of substrates are included. All four glutathione transferases have a molecular weight of 45,000 and are dissociable into subunits of approximately 25,000 daltons. Despite the similar physical properties and overlapping substrate specificities of these enzymes, only transferases A and C are immunologically related.
There is an urgent need to improve the infrastructure supporting the reuse of scholarly data. A diverse set of stakeholders, representing academia, industry, funding agencies, and scholarly publishers, have come together to design and jointly endorse a concise and measurable set of principles that we refer to as the FAIR Data Principles. The intent is that these may act as a guideline for those wishing to enhance the reusability of their data holdings. Distinct from peer initiatives that focus on the human scholar, the FAIR Principles put specific emphasis on enhancing the ability of machines to automatically find and use the data, in addition to supporting its reuse by individuals. This Comment is the first formal publication of the FAIR Principles, and includes the rationale behind them, and some exemplar implementations in the community.
A new software suite, called Crystallography & NMR System (CNS), has been developed for macromolecular structure determination by X-ray crystallography or solution nuclear magnetic resonance (NMR) spectroscopy. In contrast to existing structure-determination programs, the architecture of CNS is highly flexible, allowing for extension to other structure-determination methods, such as electron microscopy and solid-state NMR spectroscopy. CNS has a hierarchical structure: a high-level hypertext markup language (HTML) user interface, task-oriented user input files, module files, a symbolic structure-determination language (CNS language), and low-level source code. Each layer is accessible to the user. The novice user may just use the HTML interface, while the more advanced user may use any of the other layers. The source code will be distributed, so users can modify it. The CNS language is sufficiently powerful and flexible that many new algorithms can be easily implemented in the CNS language without changes to the source code. The CNS language allows the user to perform operations on data structures, such as structure factors, electron-density maps, and atomic properties. The power of the CNS language has been demonstrated by the implementation of a comprehensive set of crystallographic procedures for phasing, density modification and refinement. User-friendly task-oriented input files are available for nearly all aspects of macromolecular structure determination by X-ray crystallography and solution NMR.
CHARMM (Chemistry at HARvard Macromolecular Mechanics) is a highly flexible computer program that uses empirical energy functions to model macromolecular systems. The program can read or model-build structures, energy-minimize them by first- or second-derivative techniques, perform normal-mode or molecular dynamics simulations, and analyze the structural, equilibrium, and dynamic properties determined in these calculations. The operations that CHARMM can perform are described, and some implementation details are given. A set of parameters for the empirical energy function and a sample run are included.
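To illustrate first-derivative energy minimization, here is a minimal steepest-descent sketch on a single harmonic bond term of the form K_b(b - b0)^2, the functional form CHARMM-style energy functions use for bond stretching. It is not CHARMM's minimizer. The force constant, reference length, and step size are hypothetical, unitless values chosen so the iteration converges.

```python
import math


def bond_energy_and_grad(x1, x2, k, r0):
    """Energy k*(r - r0)**2 of one bond and its gradient with respect to each atom."""
    d = [a - b for a, b in zip(x1, x2)]
    r = math.sqrt(sum(c * c for c in d))
    if r == 0.0:
        raise ValueError("atoms coincide; bond direction undefined")
    e = k * (r - r0) ** 2
    coef = 2.0 * k * (r - r0) / r   # (dE/dr) / r
    g1 = [coef * c for c in d]      # dE/dx1 = dE/dr * (x1 - x2) / r
    g2 = [-g for g in g1]           # dE/dx2 is equal and opposite
    return e, g1, g2


def steepest_descent(x1, x2, k=100.0, r0=1.5, step=1e-3, tol=1e-8, max_iter=10000):
    """Move both atoms against the gradient until the gradient norm falls below tol."""
    for _ in range(max_iter):
        e, g1, g2 = bond_energy_and_grad(x1, x2, k, r0)
        if math.sqrt(sum(g * g for g in g1 + g2)) < tol:
            break
        x1 = [a - step * g for a, g in zip(x1, g1)]
        x2 = [b - step * g for b, g in zip(x2, g2)]
    return x1, x2, e


x1, x2, e = steepest_descent([0.0, 0.0, 0.0], [2.0, 0.0, 0.0])
print(math.dist(x1, x2))  # approaches the reference length r0 = 1.5
```

With these values each step shrinks the bond-length error by a constant factor of 0.6. Second-derivative methods such as Newton-Raphson use curvature information to take better-scaled steps.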
BACKGROUND: PLINK 1 is a widely used open-source C/C++ toolset for genome-wide association studies (GWAS) and research in population genetics. However, the steady accumulation of data from imputation and whole-genome sequencing studies has exposed a strong need for faster and more scalable implementations of key functions, such as logistic regression, linkage disequilibrium estimation, and genomic distance evaluation. In addition, GWAS and population-genetic data now frequently contain genotype likelihoods, phase information, and/or multiallelic variants, none of which can be represented by PLINK 1's primary data format. FINDINGS: To address these issues, we are developing a second-generation codebase for PLINK. The first major release from this codebase, PLINK 1.9, introduces extensive use of bit-level parallelism, [Formula: see text]-time/constant-space Hardy-Weinberg equilibrium and Fisher's exact tests, and many other algorithmic improvements. In combination, these changes accelerate most operations by 1-4 orders of magnitude, and allow the program to handle datasets too large to fit in RAM. We have also developed an extension to the data format which adds low-overhead support for genotype likelihoods, phase, multiallelic variants, and reference vs. alternate alleles, which is the basis of our planned second release (PLINK 2.0). CONCLUSIONS: The second-generation versions of PLINK will offer dramatic improvements in performance and compatibility. For the first time, users without access to high-end computing resources can perform several essential analyses of the feature-rich and very large genetic datasets coming into use.
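Bit-level parallelism here means processing many packed genotypes per machine operation. A minimal sketch, loosely following PLINK 1's two-bit .bed codes (00 homozygous for the first allele, 01 missing, 10 heterozygous, 11 homozygous for the second allele). It splits the low and high bits of every sample with one mask and counts genotype classes by popcount. A Python int stands in for PLINK's fixed-width machine words, and `pack`/`genotype_counts` are hypothetical helpers, not PLINK code.

```python
def pack(codes):
    """Pack two-bit genotype codes, sample i occupying bits 2*i and 2*i + 1."""
    word = 0
    for i, c in enumerate(codes):
        if not 0 <= c <= 3:
            raise ValueError("genotype codes must be two-bit values")
        word |= c << (2 * i)
    return word


def genotype_counts(word, n):
    """Count genotype classes among n packed samples using whole-word bit operations."""
    lo_mask = sum(1 << (2 * i) for i in range(n))  # bit 0, 2, 4, ... of each sample
    low = word & lo_mask
    high = (word >> 1) & lo_mask                   # high bits aligned onto low positions
    popcount = lambda x: bin(x).count("1")
    hom_second = popcount(high & low)              # code 11
    het = popcount(high & ~low)                    # code 10
    missing = popcount(low & ~high)                # code 01
    return {"hom_first": n - hom_second - het - missing, "het": het,
            "hom_second": hom_second, "missing": missing}


print(genotype_counts(pack([0b00, 0b10, 0b11, 0b01, 0b11]), 5))
```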
The Gene Expression Omnibus (GEO) project was initiated in response to the growing demand for a public repository for high-throughput gene expression data. GEO provides a flexible and open design that facilitates submission, storage and retrieval of heterogeneous data sets from high-throughput gene expression and genomic hybridization experiments. GEO is not intended to replace in-house gene expression databases that benefit from coherent data sets and are constructed to facilitate a particular analytic method; rather, it complements them by acting as a tertiary, central data distribution hub. The three central data entities of GEO are platforms, samples and series, and were designed with gene expression and genomic hybridization experiments in mind. A platform is, essentially, a list of probes that define what set of molecules may be detected. A sample describes the set of molecules that are being probed and references a single platform used to generate its molecular abundance data. A series organizes samples into the meaningful data sets which make up an experiment. The GEO repository is publicly accessible through the World Wide Web at http://www.ncbi.nlm.nih.gov/geo.
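The platform/sample/series relationships described above can be modelled as three small record types. This is a hypothetical sketch, not GEO's schema or submission format. It uses GEO's accession prefixes (GPL for platforms, GSM for samples, GSE for series), and the accession numbers in the example are made up.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Platform:
    accession: str              # GEO platform accessions start with "GPL"
    probes: tuple[str, ...]     # what set of molecules may be detected

    def __post_init__(self):
        if not self.accession.startswith("GPL"):
            raise ValueError("platform accessions start with GPL")


@dataclass
class Sample:
    accession: str              # GEO sample accessions start with "GSM"
    platform: Platform          # exactly one platform per sample
    abundances: dict[str, float]  # probe ID -> measured abundance

    def __post_init__(self):
        if not self.accession.startswith("GSM"):
            raise ValueError("sample accessions start with GSM")
        unknown = set(self.abundances) - set(self.platform.probes)
        if unknown:
            raise ValueError(f"probes not on {self.platform.accession}: {sorted(unknown)}")


@dataclass
class Series:
    accession: str              # GEO series accessions start with "GSE"
    samples: list[Sample] = field(default_factory=list)

    def platforms(self) -> set[str]:
        """Platform accessions used by the samples in this experiment."""
        return {s.platform.accession for s in self.samples}


gpl = Platform("GPL0001", ("p1", "p2", "p3"))
gse = Series("GSE0001", [Sample("GSM0001", gpl, {"p1": 5.2}), Sample("GSM0002", gpl, {"p3": 1.1})])
print(gse.platforms())
```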
The National High Blood Pressure Education Program presents the complete Seventh Report of the Joint National Committee on Prevention, Detection, Evaluation, and Treatment of High Blood Pressure. Like its predecessors, this report aims to provide an evidence-based approach to the prevention and management of hypertension. The key messages of this report are these: in those older than age 50, systolic blood pressure (BP) of greater than 140 mm Hg is a more important cardiovascular disease (CVD) risk factor than diastolic BP; beginning at 115/75 mm Hg, CVD risk doubles for each increment of 20/10 mm Hg; those who are normotensive at 55 years of age will have a 90% lifetime risk of developing hypertension; prehypertensive individuals (systolic BP 120-139 mm Hg or diastolic BP 80-89 mm Hg) require health-promoting lifestyle modifications to prevent the progressive rise in blood pressure and CVD; for uncomplicated hypertension, thiazide-type diuretics should be used in drug treatment for most patients, either alone or combined with drugs from other classes; this report delineates specific high-risk conditions that are compelling indications for the use of other antihypertensive drug classes (angiotensin-converting enzyme inhibitors, angiotensin-receptor blockers, beta-blockers, calcium channel blockers); most patients will require two or more antihypertensive medications to achieve goal BP (<140/90 mm Hg, or <130/80 mm Hg for patients with diabetes or chronic kidney disease); for patients whose BP is more than 20 mm Hg above the systolic BP goal or more than 10 mm Hg above the diastolic BP goal, initiation of therapy using two agents, one of which usually will be a thiazide-type diuretic, should be considered; regardless of therapy or care, hypertension will be controlled only if patients are motivated to stay on their treatment plan. Positive experiences, trust in the clinician, and empathy improve patient motivation and satisfaction.
This report serves as a guide, and the committee continues to recognize that the responsible physician's judgment remains paramount.
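Two of the key messages are simple arithmetic rules: CVD risk doubling per 20 mm Hg systolic above 115 mm Hg, and the threshold for considering two-agent initial therapy. The sketch below encodes them for illustration only; it is not a clinical tool. It uses the systolic component of the risk rule only, assumes a relative risk of 1.0 at or below 115 mm Hg, and the function names are hypothetical.

```python
def relative_cvd_risk(systolic):
    """CVD risk relative to 115 mm Hg, doubling per 20 mm Hg of systolic BP."""
    if systolic <= 115:
        return 1.0  # assumption: no extrapolation below the 115 mm Hg starting point
    return 2 ** ((systolic - 115) / 20)


def consider_two_agents(systolic, diastolic, goal=(140, 90)):
    """True if BP exceeds the systolic goal by >20 or the diastolic goal by >10 mm Hg."""
    goal_systolic, goal_diastolic = goal
    return systolic - goal_systolic > 20 or diastolic - goal_diastolic > 10


print(relative_cvd_risk(155))                        # two 20 mm Hg increments above 115
print(consider_two_agents(151, 85, goal=(130, 80)))  # stricter goal, e.g. with diabetes
```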
Studies of the human microbiome have revealed that even healthy individuals differ remarkably in the microbes that occupy habitats such as the gut, skin and vagina. Much of this diversity remains unexplained, although diet, environment, host genetics and early microbial exposure have all been implicated. Accordingly, to characterize the ecology of human-associated microbial communities, the Human Microbiome Project has analysed the largest cohort and set of distinct, clinically relevant body habitats so far. We found the diversity and abundance of each habitat’s signature microbes to vary widely even among healthy subjects, with strong niche specialization both within and among individuals. The project encountered an estimated 81–99% of the genera, enzyme families and community configurations occupied by the healthy Western microbiome. Metagenomic carriage of metabolic pathways was stable among individuals despite variation in community structure, and ethnic/racial background proved to be one of the strongest associations of both pathways and microbes with clinical metadata. These results thus delineate the range of structural and functional configurations normal in the microbial communities of a healthy population, enabling future characterization of the epidemiology, ecology and translational applications of the human microbiome. The Human Microbiome Project Consortium reports the first results of their analysis of microbial communities from distinct, clinically relevant body habitats in a human cohort; the insights into the microbial communities of a healthy population lay foundations for future exploration of the epidemiology, ecology and translational applications of the human microbiome. The Human Microbiome Project (HMP), supported by the National Institutes of Health Common Fund, has the goal of characterizing the microbial communities that inhabit and interact with the human body in sickness and in health. 
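Community diversity in microbiome studies is often summarized with an index computed from taxon abundances. The abstract does not state which metrics were used. As one common choice, here is a sketch of the Shannon index H = -Σ p_i ln p_i, with `shannon_diversity` as a hypothetical helper.

```python
import math


def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("need at least one observed read")
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h


print(shannon_diversity([10, 10, 10, 10]))  # four equally abundant taxa: ln 4
```

H is highest when reads are spread evenly across many taxa and is 0 when one taxon dominates completely, so it reflects both richness and evenness.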
In two Articles in this issue of Nature, the HMP Consortium presents the first population-scale details of the organismal and functional composition of the microbiota across five areas of the body. An associated News & Views discusses the initial results — which, along with those of a series of co-publications, already constitute the most extensive catalogue of organisms and genes related to the human microbiome yet published — and highlights some of the major questions that the project will tackle in the next few years.
The last decade has seen a sharp increase in the number of scientific publications describing physiological and pathological functions of extracellular vesicles (EVs), a collective term covering various subtypes of cell-released, membranous structures, called exosomes, microvesicles, microparticles, ectosomes, oncosomes, apoptotic bodies, and many other names. However, specific issues arise when working with these entities, whose size and amount often make them difficult to obtain as relatively pure preparations, and to characterize properly. The International Society for Extracellular Vesicles (ISEV) proposed Minimal Information for Studies of Extracellular Vesicles ("MISEV") guidelines for the field in 2014. We now update these "MISEV2014" guidelines based on evolution of the collective knowledge in the last four years. An important point to consider is that ascribing a specific function to EVs in general, or to subtypes of EVs, requires reporting of specific information beyond mere description of function in a crude, potentially contaminated, and heterogeneous preparation. For example, claims that exosomes are endowed with exquisite and specific activities remain difficult to support experimentally, given our still limited knowledge of their specific molecular machineries of biogenesis and release, as compared with other biophysically similar EVs. The MISEV2018 guidelines include tables and outlines of suggested protocols and steps to follow to document specific EV-associated functional activities. Finally, a checklist is provided with summaries of key points.
In addition to maintaining the GenBank(R) nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides data analysis and retrieval resources for the data in GenBank and other biological data made available through NCBI's website. NCBI resources include Entrez, PubMed, PubMed Central, LocusLink, the NCBI Taxonomy Browser, BLAST, BLAST Link (BLink), Electronic PCR, OrfFinder, Spidey, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosome Aberration Project (CCAP), Entrez Genomes and related tools, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups (COGs) database, Retroviral Genotyping Tools, SARS Coronavirus Resource, SAGEmap, Gene Expression Omnibus (GEO), Online Mendelian Inheritance in Man (OMIM), the Molecular Modeling Database (MMDB), the Conserved Domain Database (CDD) and the Conserved Domain Architecture Retrieval Tool (CDART). Augmenting many of the web applications are custom implementations of the BLAST program optimized to search specialized data sets. All of the resources can be accessed through the NCBI home page at: http://www.ncbi.nlm.nih.gov.
The Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/) is an international public repository for high-throughput microarray and next-generation sequence functional genomic data sets submitted by the research community. The resource supports archiving of raw data, processed data and metadata which are indexed, cross-linked and searchable. All data are freely available for download in a variety of formats. GEO also provides several web-based tools and strategies to assist users to query, analyse and visualize data. This article reports current status and recent database developments, including the release of GEO2R, an R-based web application that helps users analyse GEO data.
Nonalcoholic fatty liver disease (NAFLD) is characterized by hepatic steatosis in the absence of a history of significant alcohol use or other known liver disease. Nonalcoholic steatohepatitis (NASH) is the progressive form of NAFLD. The Pathology Committee of the NASH Clinical Research Network designed and validated a histological feature scoring system that addresses the full spectrum of lesions of NAFLD and proposed a NAFLD activity score (NAS) for use in clinical trials. The scoring system comprised 14 histological features, 4 of which were evaluated semi-quantitatively: steatosis (0-3), lobular inflammation (0-2), hepatocellular ballooning (0-2), and fibrosis (0-4). Another nine features were recorded as present or absent. An anonymized study set of 50 cases (32 from adult hepatology services, 18 from pediatric hepatology services) was assembled, coded, and circulated. For the validation study, agreement on scoring and a diagnostic categorization ("NASH," "borderline," or "not NASH") were evaluated by using weighted kappa statistics. Inter-rater agreement on adult cases was: 0.84 for fibrosis, 0.79 for steatosis, 0.56 for injury, and 0.45 for lobular inflammation. Agreement on diagnostic category was 0.61. Using multiple logistic regression, five features were independently associated with the diagnosis of NASH in adult biopsies: steatosis (P = .009), hepatocellular ballooning (P = .0001), lobular inflammation (P = .0001), fibrosis (P = .0001), and the absence of lipogranulomas (P = .001). The proposed NAS is the unweighted sum of steatosis, lobular inflammation, and hepatocellular ballooning scores. In conclusion, we present a strong scoring system and NAS for NAFLD and NASH with reasonable inter-rater reproducibility that should be useful for studies of both adults and children with any degree of NAFLD. NAS of ≥5 correlated with a diagnosis of NASH, and biopsies with scores of less than 3 were diagnosed as "not NASH."
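Because the NAS is an unweighted sum of three component scores, it reduces to a few lines of code. The sketch below sums the components, leaving fibrosis out as the abstract specifies, and maps the total onto the two cut-offs reported above. The label for scores of 3-4 is a hypothetical placeholder because the abstract does not name that band. Component ranges are defined by the scoring system and only non-negativity is checked here. The score supports, but does not replace, the pathologist's diagnosis.

```python
def nafld_activity_score(steatosis, lobular_inflammation, ballooning):
    """Unweighted sum of the three NAS components (fibrosis is not included)."""
    parts = (steatosis, lobular_inflammation, ballooning)
    if any(not isinstance(p, int) or p < 0 for p in parts):
        raise ValueError("component scores must be non-negative integers")
    return sum(parts)


def nas_band(nas):
    """Map a NAS onto the cut-offs reported in the abstract."""
    if nas >= 5:
        return "correlated with NASH"
    if nas < 3:
        return "not NASH"
    return "intermediate (3-4)"  # hypothetical label: not named in the abstract


print(nas_band(nafld_activity_score(2, 1, 2)))
```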
Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes that are crucial for the function of an organism will be depleted of such variants in natural populations, whereas non-essential genes will tolerate their accumulation. However, predicted loss-of-function variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes [1]. Here we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence predicted loss-of-function variants in this cohort after filtering for artefacts caused by sequencing and annotation errors. Using an improved model of human mutation rates, we classify human protein-coding genes along a spectrum that represents tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve the power of gene discovery for both common and rare diseases.
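The tolerance spectrum compares the number of loss-of-function variants observed in a gene with the number expected under the mutation-rate model. A minimal sketch of that idea using only the point estimate of observed/expected. gnomAD's published metric (LOEUF) instead uses the upper bound of a confidence interval on this ratio, which is not reproduced here. Gene names and counts are hypothetical.

```python
def oe_ratio(observed, expected):
    """Observed / expected loss-of-function variant count for one gene."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return observed / expected


def rank_by_constraint(genes):
    """Order genes from most constrained (lowest o/e) to most tolerant."""
    return sorted(genes, key=lambda name: oe_ratio(*genes[name]))


genes = {"GENE_A": (2, 40.0), "GENE_B": (30, 35.0), "GENE_C": (10, 20.0)}
print(rank_by_constraint(genes))
```

A low ratio means far fewer inactivating variants than expected, which is the depletion signal the abstract associates with genes crucial for function. Using a confidence bound rather than the point estimate avoids over-ranking genes whose expected counts are small.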
The outbreak of a novel coronavirus (2019-nCoV) represents a pandemic threat that has been declared a public health emergency of international concern. The CoV spike (S) glycoprotein is a key target for vaccines, therapeutic antibodies, and diagnostics. To facilitate medical countermeasure development, we determined a 3.5-angstrom-resolution cryo-electron microscopy structure of the 2019-nCoV S trimer in the prefusion conformation. The predominant state of the trimer has one of the three receptor-binding domains (RBDs) rotated up in a receptor-accessible conformation. We also provide biophysical and structural evidence that the 2019-nCoV S protein binds angiotensin-converting enzyme 2 (ACE2) with higher affinity than does severe acute respiratory syndrome (SARS)-CoV S. Additionally, we tested several published SARS-CoV RBD-specific monoclonal antibodies and found that they do not have appreciable binding to 2019-nCoV S, suggesting that antibody cross-reactivity may be limited between the two RBDs. The structure of 2019-nCoV S should enable the rapid development and evaluation of medical countermeasures to address the ongoing public health crisis.
BACKGROUND: A short battery of physical performance tests was used to assess lower extremity function in more than 5,000 persons age 71 years and older in three communities. METHODS: Balance, gait, strength, and endurance were evaluated by examining ability to stand with the feet together in the side-by-side, semi-tandem, and tandem positions, time to walk 8 feet, and time to rise from a chair and return to the seated position 5 times. RESULTS: A wide distribution of performance was observed for each test. Each test and a summary performance scale, created by summing categorical rankings of performance on each test, were strongly associated with self-report of disability. Both self-report items and performance tests were independent predictors of short-term mortality and nursing home admission in multivariate analyses. However, evidence is presented that the performance tests provide information not available from self-report items. Of particular importance is the finding that in those at the high end of the functional spectrum, who reported almost no disability, the performance test scores distinguished a gradient of risk for mortality and nursing home admission. Additionally, within subgroups with identical self-report profiles, there were systematic differences in physical performance related to age and sex. CONCLUSION: This study provides evidence that performance measures can validly characterize older persons across a broad spectrum of lower extremity function. Performance and self-report measures may complement each other in providing useful information about functional status.
Importance: Approximately 80% of US adults and adolescents are insufficiently active. Physical activity fosters normal growth and development and can make people feel, function, and sleep better and reduce risk of many chronic diseases. Objective: To summarize key guidelines in the Physical Activity Guidelines for Americans, 2nd edition (PAG). Process and Evidence Synthesis: The 2018 Physical Activity Guidelines Advisory Committee conducted a systematic review of the science supporting physical activity and health. The committee addressed 38 questions and 104 subquestions and graded the evidence based on consistency and quality of the research. Evidence graded as strong or moderate was the basis of the key guidelines. The Department of Health and Human Services (HHS) based the PAG on the 2018 Physical Activity Guidelines Advisory Committee Scientific Report. Recommendations: The PAG provides information and guidance on the types and amounts of physical activity to improve a variety of health outcomes for multiple population groups. Preschool-aged children (3 through 5 years) should be physically active throughout the day to enhance growth and development. Children and adolescents aged 6 through 17 years should do 60 minutes or more of moderate-to-vigorous physical activity daily. Adults should do at least 150 minutes to 300 minutes a week of moderate-intensity, or 75 minutes to 150 minutes a week of vigorous-intensity aerobic physical activity, or an equivalent combination of moderate- and vigorous-intensity aerobic activity. They should also do muscle-strengthening activities on 2 or more days a week. Older adults should do multicomponent physical activity that includes balance training as well as aerobic and muscle-strengthening activities. Pregnant and postpartum women should do at least 150 minutes of moderate-intensity aerobic activity a week. 
Adults with chronic conditions or disabilities, who are able, should follow the key guidelines for adults and do both aerobic and muscle-strengthening activities. Recommendations emphasize that moving more and sitting less will benefit nearly everyone. Individuals performing the least physical activity benefit most by even modest increases in moderate-to-vigorous physical activity. Additional benefits occur with more physical activity. Both aerobic and muscle-strengthening physical activity are beneficial. Conclusions and Relevance: The Physical Activity Guidelines for Americans, 2nd edition, provides information and guidance on the types and amounts of physical activity that provide substantial health benefits. Health professionals and policy makers should facilitate awareness of the guidelines and promote the health benefits of physical activity and support efforts to implement programs, practices, and policies to facilitate increased physical activity and to improve the health of the US population.
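The adult aerobic target can be checked with simple arithmetic. Since 150 minutes of moderate activity and 75 minutes of vigorous activity are treated as alternatives, the sketch below assumes each vigorous minute counts as two moderate minutes, so an "equivalent combination" is moderate + 2 × vigorous ≥ 150. It also applies the 2-day muscle-strengthening criterion. The function names are hypothetical.

```python
def meets_adult_aerobic_guideline(moderate_min, vigorous_min):
    """Weekly aerobic minimum, counting 1 vigorous minute as 2 moderate minutes."""
    return moderate_min + 2 * vigorous_min >= 150


def meets_adult_guidelines(moderate_min, vigorous_min, strength_days):
    """Aerobic minimum plus muscle-strengthening activity on 2 or more days a week."""
    return meets_adult_aerobic_guideline(moderate_min, vigorous_min) and strength_days >= 2


print(meets_adult_guidelines(100, 25, 2))  # 100 + 2 * 25 = 150 moderate-equivalent minutes
```

This checks only the minimum. The guidelines also note additional benefits up to 300 moderate (150 vigorous) minutes a week and beyond.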