NobleBlocks

Joint Genome Institute

Facility · Berkeley, United States

Research output, citation impact, and the most-cited recent papers from Joint Genome Institute (United States). Aggregated across the NobleBlocks index of 300M+ scholarly works.

Total works: 12.7K
Citations: 3.5M
h-index: 891
i10-index: 14.6K
Also known as
Joint Genome Institute
Lawrence Berkeley National Laboratory Joint Genome Institute
U.S. Department of Energy Joint Genome Institute
U.S. Department of Energy Office of Science Lawrence Berkeley National Laboratory Joint Genome Institute
United States Department of Energy Joint Genome Institute
United States Department of Energy Office of Science Lawrence Berkeley National Laboratory Joint Genome Institute

Top-cited papers from Joint Genome Institute

Initial sequencing and analysis of the human genome
Eric S. Lander, Lauren Linton, Bruce W. Birren, Chad Nusbaum +4 more
2001 · Nature · 24.5K citations · doi:10.1038/35057062

The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

Prodigal: prokaryotic gene recognition and translation initiation site identification
Doug Hyatt, Gwo-Liang Chen, Philip LoCascio, Miriam Land +2 more
2010 · BMC Bioinformatics · 12.9K citations · doi:10.1186/1471-2105-11-119

BACKGROUND: The quality of automated gene prediction in microbial organisms has improved steadily over the past decade, but there is still room for improvement. Increasing the number of correct identifications, both of genes and of the translation initiation sites for each gene, and reducing the overall number of false positives, are all desirable goals.
RESULTS: With our years of experience in manually curating genomes for the Joint Genome Institute, we developed a new gene prediction algorithm called Prodigal (PROkaryotic DYnamic programming Gene-finding ALgorithm). With Prodigal, we focused specifically on the three goals of improved gene structure prediction, improved translation initiation site recognition, and reduced false positives. We compared the results of Prodigal to existing gene-finding methods to demonstrate that it met each of these objectives.
CONCLUSION: We built a fast, lightweight, open source gene prediction program called Prodigal (http://compbio.ornl.gov/prodigal/). Prodigal achieved good results compared to existing methods, and we believe it will be a valuable asset to automated microbial annotation pipelines.

Structure, function and diversity of the healthy human microbiome
Curtis Huttenhower, J. Fah Sathirapongsasuti, Nicola Segata +4 more
2012 · Nature · 11.9K citations · doi:10.1038/nature11234

Studies of the human microbiome have revealed that even healthy individuals differ remarkably in the microbes that occupy habitats such as the gut, skin and vagina. Much of this diversity remains unexplained, although diet, environment, host genetics and early microbial exposure have all been implicated. Accordingly, to characterize the ecology of human-associated microbial communities, the Human Microbiome Project has analysed the largest cohort and set of distinct, clinically relevant body habitats so far. We found the diversity and abundance of each habitat’s signature microbes to vary widely even among healthy subjects, with strong niche specialization both within and among individuals. The project encountered an estimated 81–99% of the genera, enzyme families and community configurations occupied by the healthy Western microbiome. Metagenomic carriage of metabolic pathways was stable among individuals despite variation in community structure, and ethnic/racial background proved to be one of the strongest associations of both pathways and microbes with clinical metadata. These results thus delineate the range of structural and functional configurations normal in the microbial communities of a healthy population, enabling future characterization of the epidemiology, ecology and translational applications of the human microbiome.
The Human Microbiome Project Consortium reports the first results of their analysis of microbial communities from distinct, clinically relevant body habitats in a human cohort; the insights into the microbial communities of a healthy population lay foundations for future exploration of the epidemiology, ecology and translational applications of the human microbiome.
The Human Microbiome Project (HMP), supported by the National Institutes of Health Common Fund, has the goal of characterizing the microbial communities that inhabit and interact with the human body in sickness and in health. In two Articles in this issue of Nature, the HMP Consortium presents the first population-scale details of the organismal and functional composition of the microbiota across five areas of the body. An associated News & Views discusses the initial results (which, along with those of a series of co-publications, already constitute the most extensive catalogue of organisms and genes related to the human microbiome yet published) and highlights some of the major questions that the project will tackle in the next few years.

Greengenes, a Chimera-Checked 16S rRNA Gene Database and Workbench Compatible with ARB
Todd Z. DeSantis, Philip Hugenholtz, N. Larsen, Mark Rojas +4 more
2006 · Applied and Environmental Microbiology · 11.2K citations · doi:10.1128/aem.03006-05

A 16S rRNA gene database (http://greengenes.lbl.gov) addresses limitations of public repositories by providing chimera screening, standard alignment, and taxonomic classification using multiple published taxonomies. It was found that there is incongruent taxonomic nomenclature among curators even at the phylum level. Putative chimeras were identified in 3% of environmental sequences and in 0.2% of records derived from isolates. Environmental sequences were classified into 100 phylum-level lineages in the Archaea and Bacteria.

Phytozome: a comparative platform for green plant genomics
David Goodstein, Shengqiang Shu, Russell W. Howson, Rochak Neupane +4 more
2011 · Nucleic Acids Research · 5.7K citations · doi:10.1093/nar/gkr944

The number of sequenced plant genomes and associated genomic resources is growing rapidly with the advent of both an increased focus on plant genomics from funding agencies, and the application of inexpensive next generation sequencing. To interact with this increasing body of data, we have developed Phytozome (http://www.phytozome.net), a comparative hub for plant genome and gene family data and analysis. Phytozome provides a view of the evolutionary history of every plant gene at the level of sequence, gene structure, gene family and genome organization, while at the same time providing access to the sequences and functional annotations of a growing number (currently 25) of complete plant genomes, including all the land plants and selected algae sequenced at the Joint Genome Institute, as well as selected species sequenced elsewhere. Through a comprehensive plant genome database and web portal, these data and analyses are available to the broader plant science research community, providing powerful comparative genomics tools that help to link model systems with other plants of economic and ecological importance.

The Genome of Black Cottonwood, Populus trichocarpa (Torr. & Gray)
Gerald A. Tuskan, Stephen DiFazio, Stefan Jansson, Jörg Bohlmann +4 more
2006 · Science · 4.4K citations · doi:10.1126/science.1128691

We report the draft genome of the black cottonwood tree, Populus trichocarpa. Integration of shotgun sequence assembly with genetic mapping enabled chromosome-scale reconstruction of the genome. More than 45,000 putative protein-coding genes were identified. Analysis of the assembled genome revealed a whole-genome duplication event; about 8000 pairs of duplicated genes from that event survived in the Populus genome. A second, older duplication event is indistinguishably coincident with the divergence of the Populus and Arabidopsis lineages. Nucleotide substitution, tandem gene duplication, and gross chromosomal rearrangement appear to proceed substantially more slowly in Populus than in Arabidopsis. Populus has more protein-coding genes than Arabidopsis, ranging on average from 1.4 to 1.6 putative Populus homologs for each Arabidopsis gene. However, the relative frequency of protein domains in the two genomes is similar. Overrepresented exceptions in Populus include genes associated with lignocellulosic wall biosynthesis, meristem development, disease resistance, and metabolite transport.

MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies
Dongwan Kang, Feng Li, Edward Kirton, Ashleigh Thomas +3 more
2019 · PeerJ · 3.9K citations · doi:10.7717/peerj.7359

We previously reported on MetaBAT, an automated metagenome binning software tool to reconstruct single genomes from microbial communities for subsequent analyses of uncultivated microbial species. MetaBAT has become one of the most popular binning tools largely due to its computational efficiency and ease of use, especially in binning experiments with a large number of samples and a large assembly. MetaBAT requires users to choose parameters to fine-tune its sensitivity and specificity. If those parameters are not chosen properly, binning accuracy can suffer, especially on assemblies of poor quality. Here, we developed MetaBAT 2 to overcome this problem. MetaBAT 2 uses a new adaptive binning algorithm to eliminate manual parameter tuning. We also performed extensive software engineering optimization to increase both computational and memory efficiency. Comparing MetaBAT 2 to alternative software tools on over 100 real world metagenome assemblies shows superior accuracy and computing speed. Binning a typical metagenome assembly takes only a few minutes on a single commodity workstation. We therefore recommend the community adopts MetaBAT 2 for their metagenome binning experiments. MetaBAT 2 is open source software and available at https://bitbucket.org/berkeleylab/metabat.

The Sorghum bicolor genome and the diversification of grasses
Andrew H. Paterson, John Bowers, Rémy Bruggmann, Inna Dubchak +4 more
2009 · Nature · 3.2K citations · doi:10.1038/nature07723

Sorghum, an African grass related to sugar cane and maize, is grown for food, feed, fibre and fuel. We present an initial analysis of the ∼730-megabase Sorghum bicolor (L.) Moench genome, placing ∼98% of genes in their chromosomal context using whole-genome shotgun sequence validated by genetic, physical and syntenic information. Genetic recombination is largely confined to about one-third of the sorghum genome with gene order and density similar to those of rice. Retrotransposon accumulation in recombinationally recalcitrant heterochromatin explains the ∼75% larger genome size of sorghum compared with rice. Although gene and repetitive DNA distributions have been preserved since palaeopolyploidization ∼70 million years ago, most duplicated gene sets lost one member before the sorghum–rice divergence. Concerted evolution makes one duplicated chromosomal segment appear to be only a few million years old. About 24% of genes are grass-specific and 7% are sorghum-specific. Recent gene and microRNA duplications may contribute to sorghum’s drought tolerance.
The Sorghum bicolor genome sequence is published this week. Sorghum is a cereal grown widely as food, animal feed, fibre and fuel. Tolerant to hot, dry conditions, it is a staple for large populations in the West African Sahel region. Comparisons of the genome with those of maize and rice shed light on the evolution of grasses and of C4 photosynthesis, which is particularly efficient at assimilating carbon at high temperatures. In addition, protein coding genes and miRNAs that could contribute to sorghum's drought tolerance may also be found. Sorghum yield improvement has lagged behind that of other crops and the availability of the genome sequence could provide a vital boost to work on its improvement.
Sorghum is an African grass that is grown for food, animal feed and fuel. The current paper presents an initial analysis of the ∼730 megabase genome of Sorghum bicolor. Genome analysis and its comparison with maize and rice shed light on grass genome evolution and also provide insights into the evolution of C4 photosynthesis, as well as protein coding genes and miRNAs that might contribute to sorghum's drought tolerance.

Automatic annotation of organellar genomes with DOGMA
Stacia K. Wyman, Robert K. Jansen, Jeffrey L. Boore
2004 · Bioinformatics · 3.1K citations · doi:10.1093/bioinformatics/bth352

Summary: The Dual Organellar GenoMe Annotator (DOGMA) automates the annotation of organellar (plant chloroplast and animal mitochondrial) genomes. It is a Web-based package that allows the use of BLAST searches against a custom database, and conservation of basepairing in the secondary structure of animal mitochondrial tRNAs to identify and annotate genes. DOGMA provides a graphical user interface for viewing and editing annotations. Annotations are stored on our password-protected server to enable repeated sessions of working on the same genome. Finished annotations can be extracted for direct submission to GenBank.
Availability: http://phylocluster.biosci.utexas.edu/dogma/
Supplementary information: Detailed documentation and tutorials for annotating both animal mitochondrial and plant chloroplast genomes can be found on the DOGMA home page.

VISTA: computational tools for comparative genomics
Kelly A. Frazer, Lior Pachter, Alexandre Poliakov, Edward M. Rubin +1 more
2004 · Nucleic Acids Research · 2.9K citations · doi:10.1093/nar/gkh458

Comparison of DNA sequences from different species is a fundamental method for identifying functional elements in genomes. Here, we describe the VISTA family of tools created to assist biologists in carrying out this task. Our first VISTA server at http://www-gsd.lbl.gov/vista/ was launched in the summer of 2000 and was designed to align long genomic sequences and visualize these alignments with associated functional annotations. Currently the VISTA site includes multiple comparative genomics tools and provides users with rich capabilities to browse pre-computed whole-genome alignments of large vertebrate genomes and other groups of organisms with VISTA Browser, to submit their own sequences of interest to several VISTA servers for various types of comparative analysis and to obtain detailed comparative analysis results for a set of cardiovascular genes. We illustrate capabilities of the VISTA site by the analysis of a 180 kb interval on human chromosome 5 that encodes for the kinesin family member 3A (KIF3A) protein.

A framework for human microbiome research
Ravi Sanka, Johannes B. Goll, Jason Miller, Leslie Foster +4 more
2012 · Nature · 2.7K citations · doi:10.1038/nature11209

A variety of microbial communities and their genes (the microbiome) exist throughout the human body, with fundamental roles in human health and disease. The National Institutes of Health (NIH)-funded Human Microbiome Project Consortium has established a population-scale framework to develop metagenomic protocols, resulting in a broad range of quality-controlled resources and data including standardized methods for creating, processing and interpreting distinct types of high-throughput metagenomic data available to the scientific community. Here we present resources from a population of 242 healthy adults sampled at 15 or 18 body sites up to three times, which have generated 5,177 microbial taxonomic profiles from 16S ribosomal RNA genes and over 3.5 terabases of metagenomic sequence so far. In parallel, approximately 800 reference strains isolated from the human body have been sequenced. Collectively, these data represent the largest resource describing the abundance and variety of the human microbiome, while providing a framework for current and future studies.
The Human Microbiome Project Consortium has established a population-scale framework to study a variety of microbial communities that exist throughout the human body, enabling the generation of a range of quality-controlled data as well as community resources.

The Chlamydomonas Genome Reveals the Evolution of Key Animal and Plant Functions
Sabeeha Merchant, Simon Prochnik, Olivier Vallon, Elizabeth H. Harris +4 more
2007 · Science · 2.7K citations · doi:10.1126/science.1143609

Chlamydomonas reinhardtii is a unicellular green alga whose lineage diverged from land plants over 1 billion years ago. It is a model system for studying chloroplast-based photosynthesis, as well as the structure, assembly, and function of eukaryotic flagella (cilia), which were inherited from the common ancestor of plants and animals, but lost in land plants. We sequenced the approximately 120-megabase nuclear genome of Chlamydomonas and performed comparative phylogenomic analyses, identifying genes encoding uncharacterized proteins that are likely associated with the function and biogenesis of chloroplasts or eukaryotic flagella. Analyses of the Chlamydomonas genome advance our understanding of the ancestral eukaryotic cell, reveal previously unknown genes associated with photosynthetic and flagellar functions, and establish links between ciliopathy and the composition and function of flagella.

Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea
Robert M. Bowers, Nikos C. Kyrpides, Ramūnas Stepanauskas, Miranda Harmon‐Smith +4 more
2017 · Nature Biotechnology · 2.7K citations · doi:10.1038/nbt.3893

We present two standards developed by the Genomic Standards Consortium (GSC) for reporting bacterial and archaeal genome sequences. Both are extensions of the Minimum Information about Any (x) Sequence (MIxS). The standards are the Minimum Information about a Single Amplified Genome (MISAG) and the Minimum Information about a Metagenome-Assembled Genome (MIMAG), including, but not limited to, assembly quality, and estimates of genome completeness and contamination. These standards can be used in combination with other GSC checklists, including the Minimum Information about a Genome Sequence (MIGS), Minimum Information about a Metagenomic Sequence (MIMS), and Minimum Information about a Marker Gene Sequence (MIMARKS). Community-wide adoption of MISAG and MIMAG will facilitate more robust comparative genomic analyses of bacterial and archaeal diversity.

Expanded encyclopaedias of DNA elements in the human and mouse genomes
Federico Abascal, Reyes Acosta, Nicholas J. Addleman, Jessika Adrian +4 more
2020 · Nature · 2.5K citations · doi:10.1038/s41586-020-2493-4

The human and mouse genomes contain instructions that specify RNAs and proteins and govern the timing, magnitude, and cellular context of their production. To better delineate these elements, phase III of the Encyclopedia of DNA Elements (ENCODE) Project has expanded analysis of the cell and tissue repertoires of RNA transcription, chromatin structure and modification, DNA methylation, chromatin looping, and occupancy by transcription factors and RNA-binding proteins. Here we summarize these efforts, which have produced 5,992 new experimental datasets, including systematic determinations across mouse fetal development. All data are available through the ENCODE data portal (https://www.encodeproject.org), including phase II ENCODE and Roadmap Epigenomics data. We have developed a registry of 926,535 human and 339,815 mouse candidate cis-regulatory elements, covering 7.9 and 3.4% of their respective genomes, by integrating selected datatypes associated with gene regulation, and constructed a web-based server (SCREEN; http://screen.encodeproject.org) to provide flexible, user-defined access to this resource. Collectively, the ENCODE data and registry provide an expansive resource for the scientific community to build a better understanding of the organization and function of the human and mouse genomes.

Insights into the phylogeny and coding potential of microbial dark matter
Christian Rinke, Patrick Schwientek, Alexander Sczyrba, Natalia Ivanova +4 more
2013 · Nature · 2.4K citations · doi:10.1038/nature12352

Genome sequencing enhances our understanding of the biological world by providing blueprints for the evolutionary and functional diversity that shapes the biosphere. However, microbial genomes that are currently available are of limited phylogenetic breadth, owing to our historical inability to cultivate most microorganisms in the laboratory. We apply single-cell genomics to target and sequence 201 uncultivated archaeal and bacterial cells from nine diverse habitats belonging to 29 major mostly uncharted branches of the tree of life, so-called ‘microbial dark matter’. With this additional genomic information, we are able to resolve many intra- and inter-phylum-level relationships and to propose two new superphyla. We uncover unexpected metabolic features that extend our understanding of biology and challenge established boundaries between the three domains of life. These include a novel amino acid use for the opal stop codon, an archaeal-type purine synthesis in Bacteria and complete sigma factors in Archaea similar to those in Bacteria. The single-cell genomes also served to phylogenetically anchor up to 20% of metagenomic reads in some habitats, facilitating organism-level interpretation of ecosystem function. This study greatly expands the genomic representation of the tree of life and provides a systematic step towards a better understanding of biological evolution on our planet.
Uncultivated archaeal and bacterial cells of major uncharted branches of the tree of life are targeted and sequenced using single-cell genomics; this enables resolution of many intra- and inter-phylum-level relationships, uncovers unexpected metabolic features that challenge established boundaries between the three domains of life, and leads to the proposal of two new superphyla.
Currently available genome sequences give us a narrow view of the remarkable diversity of microorganisms because the vast majority of them have never been cultivated in pure culture. Here Tanja Woyke and colleagues use single-cell genomics to target and sequence 201 uncultivated archaeal and bacterial cells from nine diverse habitats. This information reveals numerous intra- and inter-phylum relationships and a number of unexpected metabolic features. On the basis of the new data the authors propose taxonomic revisions to the archaeal and bacterial domains, including a proposal to reorganize the Archaea into three superphyla.

Frontiers, Opportunities, and Challenges in Biochemical and Chemical Catalysis of CO2 Fixation
Aaron M. Appel, John E. Bercaw, Andrew B. Bocarsly, Holger Dobbek +4 more
2013 · Chemical Reviews · 2.1K citations · doi:10.1021/cr300463y

Two major energy-related problems confront the world in the next 50 years. First, increased worldwide competition for gradually depleting fossil fuel reserves (derived from past photosynthesis) will lead to higher costs, both monetarily and politically. Second, atmospheric CO2 levels are at their highest recorded level since records began. Further increases are predicted to produce large and uncontrollable impacts on the world climate. These projected impacts extend beyond climate to ocean acidification, because the ocean is a major sink for atmospheric CO2. Providing a future energy supply that is secure and CO2-neutral will require switching to nonfossil energy sources such as wind, solar, nuclear, and geothermal energy and developing methods for transforming the energy produced by these new sources into forms that can be stored, transported, and used upon demand.

MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities
Dongwan Kang, Jeff Froula, Rob Egan, Zhong Wang
2015 · PeerJ · 2.1K citations · doi:10.7717/peerj.1165

Grouping large genomic fragments assembled from shotgun metagenomic sequences to deconvolute complex microbial communities, or metagenome binning, enables the study of individual organisms and their interactions. Because of the complex nature of these communities, existing metagenome binning methods often miss a large number of microbial species. In addition, most of the tools are not scalable to large datasets. Here we introduce automated software called MetaBAT that integrates empirical probabilistic distances of genome abundance and tetranucleotide frequency for accurate metagenome binning. MetaBAT outperforms alternative methods in accuracy and computational efficiency on both synthetic and real metagenome datasets. It automatically forms hundreds of high quality genome bins on a very large assembly consisting of millions of contigs in a matter of hours on a single node. MetaBAT is open source software and available at https://bitbucket.org/berkeleylab/metabat.
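
As a rough illustration of the tetranucleotide frequency signal mentioned in the abstract above, the following Python sketch counts overlapping 4-mers in a contig and normalizes them into a frequency profile. It is not MetaBAT's code: the function names `tetranucleotide_frequencies` and `euclidean_distance` are hypothetical, the plain Euclidean distance is a simple stand-in for the empirical probabilistic distance the abstract describes, and reverse-complement 4-mers are not merged here, although binning tools commonly do so.

```python
from collections import Counter
from itertools import product
from typing import Dict


def tetranucleotide_frequencies(sequence: str) -> Dict[str, float]:
    """Return the relative frequency of each of the 256 possible 4-mers.

    Windows containing anything other than A, C, G or T (for example N)
    are skipped. Hypothetical helper for illustration only.
    """
    seq = sequence.upper()
    valid = set("ACGT")
    counts: Counter = Counter()
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if set(kmer) <= valid:
            counts[kmer] += 1
    total = sum(counts.values())
    all_kmers = ("".join(p) for p in product("ACGT", repeat=4))
    return {k: (counts[k] / total if total else 0.0) for k in all_kmers}


def euclidean_distance(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Euclidean distance between two 4-mer profiles (a stand-in metric)."""
    return sum((a[k] - b[k]) ** 2 for k in a) ** 0.5


if __name__ == "__main__":
    contig_a = tetranucleotide_frequencies("ACGTACGTTTGACCA")
    contig_b = tetranucleotide_frequencies("GGGCGCGCCGGGTTA")
    # A smaller distance suggests a more similar sequence composition.
    print(euclidean_distance(contig_a, contig_b))
```

Contigs from the same genome tend to have similar profiles, which is why composition can be combined with abundance when grouping contigs into bins.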

AmiGO: online access to ontology and annotation data
Seth Carbon, Amelia Ireland, Chris Mungall, Shengqiang Shu +4 more
2008 · Bioinformatics · 2.1K citations · doi:10.1093/bioinformatics/btn615

AmiGO is a web application that allows users to query, browse and visualize ontologies and related gene product annotation (association) data. AmiGO can be used online at the Gene Ontology (GO) website to access the data provided by the GO Consortium; it can also be downloaded and installed to browse local ontologies and annotations. AmiGO is free open source software developed and maintained by the GO Consortium.
Availability: http://amigo.geneontology.org
Download: http://sourceforge.net/projects/geneontology/
Contact: sjcarbon@berkeleybop.org

The Genome of the Diatom Thalassiosira pseudonana: Ecology, Evolution, and Metabolism
E. Virginia Armbrust, John A. Berges, Chris Bowler, Beverley R. Green +4 more
2004 · Science · 2.0K citations · doi:10.1126/science.1101156

Diatoms are unicellular algae with plastids acquired by secondary endosymbiosis. They are responsible for approximately 20% of global carbon fixation. We report the 34 million-base pair draft nuclear genome of the marine diatom Thalassiosira pseudonana and its 129 thousand-base pair plastid and 44 thousand-base pair mitochondrial genomes. Sequence and optical restriction mapping revealed 24 diploid nuclear chromosomes. We identified novel genes for silicic acid transport and formation of silica-based cell walls, high-affinity iron uptake, biosynthetic enzymes for several types of polyunsaturated fatty acids, use of a range of nitrogenous compounds, and a complete urea cycle, all attributes that allow diatoms to prosper in aquatic environments.

The Physcomitrella Genome Reveals Evolutionary Insights into the Conquest of Land by Plants
Stefan A. Rensing, Daniel Lang, Andreas Zimmer, Astrid Terry +4 more
2007 · Science · 1.9K citations · doi:10.1126/science.1150646

We report the draft genome sequence of the model moss Physcomitrella patens and compare its features with those of flowering plants, from which it is separated by more than 400 million years, and unicellular aquatic algae. This comparison reveals genomic changes concomitant with the evolutionary movement to land, including a general increase in gene family complexity; loss of genes associated with aquatic environments (e.g., flagellar arms); acquisition of genes for tolerating terrestrial stresses (e.g., variation in temperature and water availability); and the development of the auxin and abscisic acid signaling pathways for coordinating multicellular growth and dehydration response. The Physcomitrella genome provides a resource for phylogenetic inferences about gene function and for experimental analysis of plant processes through this plant's unique facility for reverse genetics.