
National Institute of Genetics
Facility, Mishima, Japan
Research output, citation impact, and the most-cited recent papers from National Institute of Genetics (Japan). Aggregated across the NobleBlocks index of 300M+ scholarly works.
Top-cited papers from National Institute of Genetics
Bacillus subtilis is the best-characterized member of the Gram-positive bacteria. Its genome of 4,214,810 base pairs comprises 4,100 protein-coding genes. Of these protein-coding genes, 53% are represented once, while a quarter of the genome corresponds to several gene families that have been greatly expanded by gene duplication, the largest family containing 77 putative ATP-binding transport proteins. In addition, a large proportion of the genetic capacity is devoted to the utilization of a variety of carbon sources, including many plant-derived molecules. The identification of five signal peptidase genes, as well as several genes for components of the secretion apparatus, is important given the capacity of Bacillus strains to secrete large amounts of industrially important enzymes. Many of the genes are involved in the synthesis of secondary metabolites, including antibiotics, that are more typically associated with Streptomyces species. The genome contains at least ten prophages or remnants of prophages, indicating that bacteriophage infection has played an important evolutionary role in horizontal gene transfer, in particular in the propagation of bacterial pathogenesis.
The combination of significantly lower cost and increased speed of sequencing has resulted in an explosive growth of data submitted into the primary next-generation sequence data archive, the Sequence Read Archive (SRA). The preservation of experimental data is an important part of the scientific record, and increasing numbers of journals and funding agencies require that next-generation sequence data are deposited into the SRA. The SRA was established as a public repository for the next-generation sequence data and is operated by the International Nucleotide Sequence Database Collaboration (INSDC). INSDC partners include the National Center for Biotechnology Information (NCBI), the European Bioinformatics Institute (EBI) and the DNA Data Bank of Japan (DDBJ). The SRA is accessible at http://www.ncbi.nlm.nih.gov/Traces/sra from NCBI, at http://www.ebi.ac.uk/ena from EBI and at http://trace.ddbj.nig.ac.jp from DDBJ. In this article, we present the content and structure of the SRA, detail our support for sequencing platforms and provide recommended data submission levels and formats. We also briefly outline our response to the challenge of data growth.
Macroautophagy mediates the bulk degradation of cytoplasmic components. It accounts for the degradation of most long-lived proteins: cytoplasmic constituents, including organelles, are sequestered into autophagosomes, which subsequently fuse with lysosomes, where degradation occurs. Although the possible involvement of autophagy in homeostasis, development, cell death, and pathogenesis has been repeatedly pointed out, systematic in vivo analysis has not been performed in mammals, mainly because of a limitation of monitoring methods. To understand where and when autophagy occurs in vivo, we have generated transgenic mice systemically expressing GFP fused to LC3, which is a mammalian homologue of yeast Atg8 (Aut7/Apg8) and serves as a marker protein for autophagosomes. Fluorescence microscopic analyses revealed that autophagy is differently induced by nutrient starvation in most tissues. In some tissues, autophagy even occurs actively without starvation treatments. Our results suggest that the regulation of autophagy is organ dependent and the role of autophagy is not restricted to the starvation response. This transgenic mouse model is a useful tool to study mammalian autophagy.
We report the draft genome sequence of the model moss Physcomitrella patens and compare its features with those of flowering plants, from which it is separated by more than 400 million years, and unicellular aquatic algae. This comparison reveals genomic changes concomitant with the evolutionary movement to land, including a general increase in gene family complexity; loss of genes associated with aquatic environments (e.g., flagellar arms); acquisition of genes for tolerating terrestrial stresses (e.g., variation in temperature and water availability); and the development of the auxin and abscisic acid signaling pathways for coordinating multicellular growth and dehydration response. The Physcomitrella genome provides a resource for phylogenetic inferences about gene function and for experimental analysis of plant processes through this plant's unique facility for reverse genetics.
Crop domestications are long-term selection experiments that have greatly advanced human civilization. The domestication of cultivated rice (Oryza sativa L.) ranks as one of the most important developments in history. However, its origins and domestication processes are controversial and have long been debated. Here we generate genome sequences from 446 geographically diverse accessions of the wild rice species Oryza rufipogon, the immediate ancestral progenitor of cultivated rice, and from 1,083 cultivated indica and japonica varieties to construct a comprehensive map of rice genome variation. In the search for signatures of selection, we identify 55 selective sweeps that have occurred during domestication. In-depth analyses of the domestication sweeps and genome-wide patterns reveal that Oryza sativa japonica rice was first domesticated from a specific population of O. rufipogon around the middle area of the Pearl River in southern China, and that Oryza sativa indica rice was subsequently developed from crosses between japonica rice and local wild rice as the initial cultivars spread into South East and South Asia. The domestication-associated traits are analysed through high-resolution genetic mapping. This study provides an important resource for rice breeding and an effective genomics approach for crop domestication research. Whole-genome sequences of wild rice and cultivated rice varieties are used to produce a map of rice genome variation, and show that rice was probably first domesticated in southern China. Cultivated rice (Oryza sativa) is thought to have been domesticated from wild rice (Oryza rufipogon) thousands of years ago. This Chinese/Japanese collaboration reports whole-genome sequences from 446 wild rice isolates from across Asia and Oceania, and from more than 1,000 indica and japonica subspecies of cultivated rice. The resulting map of genome variation will be an important resource for rice breeding and for crop-domestication research.
FLOWERING LOCUS T (FT) is a conserved promoter of flowering that acts downstream of various regulatory pathways, including one that mediates photoperiodic induction through CONSTANS (CO), and is expressed in the vasculature of cotyledons and leaves. A bZIP transcription factor, FD, preferentially expressed in the shoot apex is required for FT to promote flowering. FD and FT are interdependent partners through protein interaction and act at the shoot apex to promote floral transition and to initiate floral development through transcriptional activation of a floral meristem identity gene, APETALA1 (AP1). FT may represent a long-distance signal in flowering.
The first chordates appear in the fossil record at the time of the Cambrian explosion, nearly 550 million years ago. The modern ascidian tadpole represents a plausible approximation to these ancestral chordates. To illuminate the origins of chordates and vertebrates, we generated a draft of the protein-coding portion of the genome of the most studied ascidian, Ciona intestinalis. The Ciona genome contains ∼16,000 protein-coding genes, similar to the number in other invertebrates, but only half that found in vertebrates. Vertebrate gene families are typically found in simplified form in Ciona, suggesting that ascidians contain the basic ancestral complement of genes involved in cell signaling and development. The ascidian genome has also acquired a number of lineage-specific innovations, including a group of genes engaged in cellulose metabolism that are related to those in bacteria and fungi.
Only a small proportion of the mouse genome is transcribed into mature messenger RNA transcripts. There is an international collaborative effort to identify all full-length mRNA transcripts from the mouse, and to ensure that each is represented in a physical collection of clones. Here we report the manual annotation of 60,770 full-length mouse complementary DNA sequences. These are clustered into 33,409 'transcriptional units', contributing 90.1% of a newly established mouse transcriptome database. Of these transcriptional units, 4,258 are new protein-coding and 11,665 are new non-coding messages, indicating that non-coding RNA is a major component of the transcriptome. 41% of all transcriptional units showed evidence of alternative splicing. In protein-coding transcripts, 79% of splice variations altered the protein product. Whole-transcriptome analyses resulted in the identification of 2,431 sense-antisense pairs. The present work, completely supported by physical clones, provides the most comprehensive survey of a mammalian transcriptome so far, and is a valuable resource for functional genomics.
Lancelets (‘amphioxus’) are the modern survivors of an ancient chordate lineage, with a fossil record dating back to the Cambrian period. Here we describe the structure and gene content of the highly polymorphic ∼520-megabase genome of the Florida lancelet Branchiostoma floridae, and analyse it in the context of chordate evolution. Whole-genome comparisons illuminate the murky relationships among the three chordate groups (tunicates, lancelets and vertebrates), and allow not only reconstruction of the gene complement of the last common chordate ancestor but also partial reconstruction of its genomic organization, as well as a description of two genome-wide duplications and subsequent reorganizations in the vertebrate lineage. These genome-scale events shaped the vertebrate genome and provided additional genetic variation for exploitation during vertebrate evolution. This issue sees the publication of the draft genome sequence of an animal that has been studied by biologists for many years as a model for a primitive chordate. The amphioxus or lancelet is a small worm-like creature, usually to be found buried in sand on the sea floor. Comparative analysis of the genome of the Florida lancelet, Branchiostoma floridae, reveals 17 ancestral chordate linkage groups conserved in the modern amphioxus and vertebrate genomes despite more than half a billion years of independent evolution. From this it is possible to make a virtual reconstruction of the 17 chromosomes of the last common chordate ancestor. This reconstruction confirms that two rounds of whole genome duplication have occurred during evolution of the jawed vertebrate lineage. And it illuminates the murky relationships between the three chordate groups, the tunicates, lancelets and vertebrates. The cover shows four adult amphioxus collected in Apalachee Bay, Florida, with anterior towards the top and dorsal to the right. Yellow ovals are gonads. (Photo by Nicholas Putnam, DOE Joint Genome Institute.)
Mammalian cells were observed to die under conditions in which nutrients were depleted and, simultaneously, macroautophagy was inhibited either genetically (by a small interfering RNA targeting Atg5, Atg6/Beclin 1-1, Atg10, or Atg12) or pharmacologically (by 3-methyladenine, hydroxychloroquine, bafilomycin A1, or monensin). Cell death occurred through apoptosis (type 1 cell death), since it was reduced by stabilization of mitochondrial membranes (with Bcl-2 or vMIA, a cytomegalovirus-derived gene) or by caspase inhibition. Under conditions in which the fusion between lysosomes and autophagosomes was inhibited, the formation of autophagic vacuoles was enhanced at a preapoptotic stage, as indicated by accumulation of LC3-II protein, ultrastructural studies, and an increase in the acidic vacuolar compartment. Cells exhibiting a morphology reminiscent of (autophagic) type 2 cell death, however, recovered, and only cells with a disrupted mitochondrial transmembrane potential were beyond the point of no return and inexorably died even under optimal culture conditions. All together, these data indicate that autophagy may be cytoprotective, at least under conditions of nutrient depletion, and point to an important cross talk between type 1 and type 2 cell death pathways.
Rat LC3, a homologue of yeast Atg8 (Aut7/Apg8), localizes to autophagosomal membranes after post-translational modifications. The C-terminal fragment of LC3 is cleaved immediately following synthesis to yield a cytosolic form called LC3-I. A subpopulation of LC3-I is further converted to an autophagosome-associating form, LC3-II. Because yeast Atg8 is conjugated with phosphatidylethanolamine (PE) by a ubiquitin-like system, it has been hypothesized that LC3 is modified in a similar manner. Here, we show that [(14)C]-ethanolamine was preferentially incorporated into LC3-II, suggesting that LC3-II is a PE-conjugated form. LC3-II can be a substrate of mammalian Atg4B, a homologue of yeast Atg8-PE deconjugase, supporting the idea that LC3-II is LC3-PE. Moreover, two other mammalian homologues of yeast Atg8, gamma-aminobutyric-acid-type-A-receptor-associated protein (GABARAP) and Golgi-associated ATPase enhancer of 16 kDa (GATE16) also generate form II, which are recovered in membrane fractions. Generation of the form II correlates with autophagosome association of GABARAP and GATE16. These results suggest that all mammalian Atg8 homologues receive a common modification to associate with autophagosomal membrane as the form II.
Summary: CRISPRdirect is a simple and functional web server for selecting rational CRISPR/Cas targets from an input sequence. The CRISPR/Cas system is a promising technique for genome engineering which allows target-specific cleavage of genomic DNA guided by Cas9 nuclease in complex with a guide RNA (gRNA) that complementarily binds to a ∼20 nt targeted sequence. The target sequence requirements are twofold. First, the 5′-NGG protospacer adjacent motif (PAM) sequence must be located adjacent to the target sequence. Second, the target sequence should be specific within the entire genome in order to avoid off-target editing. CRISPRdirect enables users to easily select rational target sequences with minimized off-target sites by performing exhaustive searches against genomic sequences. The server currently incorporates the genomic sequences of human, mouse, rat, marmoset, pig, chicken, frog, zebrafish, Ciona, fruit fly, silkworm, Caenorhabditis elegans, Arabidopsis, rice, Sorghum and budding yeast. Availability: Freely available at http://crispr.dbcls.jp/. Contact: y-naito@dbcls.rois.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online.
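The first of the two target requirements above (a ∼20 nt target immediately followed by an NGG PAM) can be illustrated with a short Python sketch. This is not CRISPRdirect's implementation: the function names are hypothetical, and the second requirement, the genome-wide specificity search, is deliberately left out.

```python
import re

# Translation table used to build the reverse complement for the minus strand.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_candidate_targets(seq, target_len=20):
    """List (strand, start, target, pam) for each target_len-mer followed by NGG.

    For the "-" strand, start is an offset into the reverse-complemented
    sequence, not into the input sequence.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % target_len)
    hits = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for match in pattern.finditer(strand_seq):
            hits.append((strand, match.start(), match.group(1), match.group(2)))
    return hits
```

Each returned candidate would still need to be checked against the whole genome for near-identical sites before it could be called specific.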
Sexual reproduction is an ancient feature of life on earth, and the familiar X and Y chromosomes in humans and other model species have led to the impression that sex determination mechanisms are old and conserved. In fact, males and females are determined by diverse mechanisms that evolve rapidly in many taxa. Yet this diversity in primary sex-determining signals is coupled with conserved molecular pathways that trigger male or female development. Conflicting selection on different parts of the genome and on the two sexes may drive many of these transitions, but few systems with rapid turnover of sex determination mechanisms have been rigorously studied. Here we survey our current understanding of how and why sex determination evolves in animals and plants and identify important gaps in our knowledge that present exciting research opportunities to characterize the evolutionary forces and molecular pathways underlying the evolution of sex determination.
Summary: We developed a prokaryotic genome annotation pipeline, DFAST, that also supports genome submission to public sequence databases. DFAST was originally started as an on-line annotation server, and to date, over 7000 jobs have been processed since its first launch in 2016. Here, we present a newly implemented background annotation engine for DFAST, which is also available as a standalone command-line program. The new engine can annotate a typical-sized bacterial genome within 10 min, with rich information such as pseudogenes, translation exceptions and orthologous gene assignment between given reference genomes. In addition, the modular framework of DFAST allows users to customize the annotation workflow easily and will also facilitate extensions for new functions and incorporation of new tools in the future. Availability and implementation: The software is implemented in Python 3 and runs in both Python 2.7 and 3.4 or later on Macintosh and Linux systems. It is freely available at https://github.com/nigyta/dfast_core/ under the GPLv3 license with external binaries bundled in the software distribution. An on-line version is also available at https://dfast.nig.ac.jp/. Contact: yn@nig.ac.jp. Supplementary information: Supplementary data are available at Bioinformatics online.
The medaka fish (Oryzias latipes) is a popular pet in Japan and more recently a laboratory model organism for developmental genetics and evolutionary biology. Now the medaka's genome has been sequenced and analysed by a large Japanese consortium. Cichlids and stickleback, which are emerging model systems for understanding the genetic basis of vertebrate speciation, are evolutionarily closer to medaka than zebrafish, so the medaka's genome sequence will yield valuable insights into 400 million years of vertebrate genome evolution. The medaka fish (Oryzias latipes) has long been a popular pet in Japan and more recently a laboratory model organism; it now has its genome sequenced and analysed by a Japanese consortium. Teleosts comprise more than half of all vertebrate species and have adapted to a variety of marine and freshwater habitats [1]. Their genome evolution and diversification are important subjects for the understanding of vertebrate evolution. Although draft genome sequences of two pufferfishes have been published [2,3], analysis of more fish genomes is desirable. Here we report a high-quality draft genome sequence of a small egg-laying freshwater teleost, medaka (Oryzias latipes). Medaka is native to East Asia and an excellent model system for a wide range of biology, including ecotoxicology, carcinogenesis, sex determination [4,5,6] and developmental genetics [7]. In the assembled medaka genome (700 megabases), which is less than half of the zebrafish genome, we predicted 20,141 genes, including ∼2,900 new genes, using 5′-end serial analysis of gene expression tag information. We found single nucleotide polymorphisms (SNPs) at an average rate of 3.42% between the two inbred strains derived from two regional populations; this is the highest SNP rate seen in any vertebrate species. Analyses based on the dense SNP information show a strict genetic separation of 4 million years (Myr) between the two populations, and suggest that differential selective pressures acted on specific gene categories. Four-way comparisons with the human, pufferfish (Tetraodon), zebrafish and medaka genomes revealed that eight major interchromosomal rearrangements took place in a remarkably short period of ∼50 Myr after the whole-genome duplication event in the teleost ancestor and afterwards, intriguingly, the medaka genome preserved its ancestral karyotype for more than 300 Myr.
Summary: A new model of mutational production of alleles was proposed which may be appropriate to estimate the number of electrophoretically detectable alleles maintained in a finite population. The model assumes that the entire allelic states are expressed by integers ($\ldots, A_{-1}, A_0, A_1, \ldots$) and that if an allele changes state by mutation the change occurs in such a way that it moves either one step in the positive direction or one step in the negative direction (see also Fig. 1). It was shown that for this model the ‘effective’ number of selectively neutral alleles maintained in a population of the effective size $N_e$ under mutation rate $\upsilon$ per generation is given by $n_e = \sqrt{1 + 8 N_e \upsilon}$. When $4 N_e \upsilon$ is small, this differs little from the conventional formula by Kimura & Crow, i.e. $n_e = 1 + 4 N_e \upsilon$, but it gives a much smaller estimate than this when $4 N_e \upsilon$ is large.
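The comparison in the last sentence can be checked numerically. The Python sketch below assumes the stepwise-model result takes the form n_e = sqrt(1 + 8 N_e υ), the expression usually quoted for this model, and sets it against the Kimura & Crow formula n_e = 1 + 4 N_e υ given in the abstract. The function names and the parameter values are illustrative only.

```python
import math


def effective_alleles_stepwise(ne, mu):
    """Stepwise mutation model (assumed form): n_e = sqrt(1 + 8 * N_e * mu)."""
    return math.sqrt(1.0 + 8.0 * ne * mu)


def effective_alleles_kimura_crow(ne, mu):
    """Kimura & Crow formula quoted in the abstract: n_e = 1 + 4 * N_e * mu."""
    return 1.0 + 4.0 * ne * mu


if __name__ == "__main__":
    ne = 1000  # illustrative effective population size
    for mu in (1e-6, 1e-5, 1e-4, 1e-3, 1e-2):
        stepwise = effective_alleles_stepwise(ne, mu)
        conventional = effective_alleles_kimura_crow(ne, mu)
        print(f"4*Ne*mu={4 * ne * mu:8.3f}  stepwise={stepwise:7.3f}  "
              f"Kimura-Crow={conventional:7.3f}")
```

For small 4 N_e υ the two columns nearly coincide, since sqrt(1 + 2x) is approximately 1 + x for small x. As 4 N_e υ grows, the stepwise value increases only with its square root.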
The SARS-CoV-2 Omicron BA.1 variant emerged in 2021 [1] and has multiple mutations in its spike protein [2]. Here we show that the spike protein of Omicron has a higher affinity for ACE2 compared with Delta, and a marked change in its antigenicity increases Omicron’s evasion of therapeutic monoclonal and vaccine-elicited polyclonal neutralizing antibodies after two doses. mRNA vaccination as a third vaccine dose rescues and broadens neutralization. Importantly, the antiviral drugs remdesivir and molnupiravir retain efficacy against Omicron BA.1. Replication was similar for Omicron and Delta virus isolates in human nasal epithelial cultures. However, in lung cells and gut cells, Omicron demonstrated lower replication. Omicron spike protein was less efficiently cleaved compared with Delta. The differences in replication were mapped to the entry efficiency of the virus on the basis of spike-pseudotyped virus assays. The defect in entry of Omicron pseudotyped virus to specific cell types effectively correlated with higher cellular RNA expression of TMPRSS2, and deletion of TMPRSS2 affected Delta entry to a greater extent than Omicron. Furthermore, drug inhibitors targeting specific entry pathways [3] demonstrated that the Omicron spike inefficiently uses the cellular protease TMPRSS2, which promotes cell entry through plasma membrane fusion, with greater dependency on cell entry through the endocytic pathway. Consistent with suboptimal S1/S2 cleavage and inability to use TMPRSS2, syncytium formation by the Omicron spike was substantially impaired compared with the Delta spike. The less efficient spike cleavage of Omicron at S1/S2 is associated with a shift in cellular tropism away from TMPRSS2-expressing cells, with implications for altered pathogenesis.
Although many de novo genome assembly projects have recently been conducted using high-throughput sequencers, assembling highly heterozygous diploid genomes is a substantial challenge due to the increased complexity of the de Bruijn graph structure predominantly used. To address the increasing demand for sequencing of nonmodel and/or wild-type samples, in most cases inbred lines or fosmid-based hierarchical sequencing methods are used to overcome such problems. However, these methods are costly and time consuming, forfeiting the advantages of massive parallel sequencing. Here, we describe a novel de novo assembler, Platanus, that can effectively manage high-throughput data from heterozygous samples. Platanus assembles DNA fragments (reads) into contigs by constructing de Bruijn graphs with automatically optimized k-mer sizes followed by the scaffolding of contigs based on paired-end information. The complicated graph structures that result from the heterozygosity are simplified during not only the contig assembly step but also the scaffolding step. We evaluated the assembly results on eukaryotic samples with various levels of heterozygosity. Compared with other assemblers, Platanus yields assembly results that have a larger scaffold NG50 length without any accompanying loss of accuracy in both simulated and real data. In addition, Platanus recorded the largest scaffold NG50 values for two of the three low-heterozygosity species used in the de novo assembly contest, Assemblathon 2. Platanus therefore provides a novel and efficient approach for the assembly of gigabase-sized highly heterozygous genomes and is an attractive alternative to the existing assemblers designed for genomes of lower heterozygosity.
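The scaffold NG50 statistic used in the comparison above can be sketched in Python, assuming the usual definition: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the estimated genome size. The function name and the zero return value for undersized assemblies are choices made for this illustration, not part of Platanus.

```python
def ng50(scaffold_lengths, genome_size):
    """Return the scaffold NG50 for an assembly.

    Scaffolds are taken from longest to shortest until their summed length
    reaches half of genome_size; the length of the last scaffold taken is
    the NG50. Returns 0 when the whole assembly covers less than half of
    genome_size.
    """
    half = genome_size / 2
    covered = 0
    for length in sorted(scaffold_lengths, reverse=True):
        covered += length
        if covered >= half:
            return length
    return 0
```

Unlike N50, which uses the total assembly length as the denominator, NG50 uses the genome size, so assemblies of different total size can be compared on the same scale.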
By using two models of evolutionary base substitutions--"three-substitution-type" and "two-frequency-class" models--some formulae are derived which permit a simple estimation of the evolutionary distances (and also the evolutionary rates when the divergence times are known) through comparative studies of DNA (and RNA) sequences. These formulae are applied to estimate the base substitution rates at the first, second, and third positions of codons in genes for presomatotropins, preproinsulins, and alpha- and beta-globins (using comparisons involving mammals). Also, formulae for estimating the synonymous component (at the third codon position) and the standard errors are obtained. It is pointed out that the rates of synonymous base substitutions not only are very high but also are roughly equal to each other between genes even when amino acid-altering substitution rates are quite different and that this is consistent with the neutral mutation-random drift hypothesis of molecular evolution.
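As an illustration of the kind of distance formula described here, the Python sketch below uses the expression commonly associated with the three-substitution-type model, K = -(1/4) ln[(1 - 2P - 2Q)(1 - 2P - 2R)(1 - 2Q - 2R)]. Here P is the fraction of sites showing transition-type differences, and Q and R are the fractions showing the two kinds of transversion-type differences. The formula is stated from general knowledge of the model rather than quoted from this listing, and the function name is hypothetical.

```python
import math


def three_substitution_type_distance(p, q, r):
    """Estimate substitutions per site from three observed difference fractions.

    p: fraction of sites with transition-type differences
    q, r: fractions of sites with the two transversion-type differences
    Assumed formula: K = -(1/4) * ln[(1-2p-2q)(1-2p-2r)(1-2q-2r)].
    """
    terms = (1 - 2 * p - 2 * q, 1 - 2 * p - 2 * r, 1 - 2 * q - 2 * r)
    if min(terms) <= 0:
        # The logarithm is undefined once the sequences are too divergent.
        raise ValueError("observed differences too large for this correction")
    return -0.25 * math.log(terms[0] * terms[1] * terms[2])
```

For small p, q and r the estimate is close to the raw proportion of differing sites, p + q + r. It exceeds that proportion increasingly as divergence grows, which reflects the correction for multiple substitutions at the same site.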
Ancient polyploidization events have shaped diverse eukaryotic genomes [1], including two rounds of whole-genome duplication at the base of the vertebrate radiation [2]. While polyploidy is rare in amniotes, presumably owing to constraints on sex chromosome dosage [...] Polyploidy provides raw material for evolutionary diversification because gene duplicates [...] To explore the origins and consequences of tetraploidy in the African clawed frog, we sequenced the Xenopus laevis genome and compared it to the related diploid X. tropicalis genome. We characterize the allotetraploid origin of X. laevis by partitioning its genome into two homoeologous subgenomes, marked by distinct families of 'fossil' transposable elements. On the basis of the activity of these elements and the age of hundreds of unitary pseudogenes, we estimate that the two diploid progenitor species diverged around 34 million years ago (Ma) and combined to form an allotetraploid around 17-18 Ma. More than 56% of all genes were retained in two homoeologous copies. Protein function, gene expression, and the amount of conserved flanking sequence all correlate with retention rates. The subgenomes have evolved asymmetrically, with one chromosome set more often preserving the ancestral state and the other experiencing more gene loss, deletion, rearrangement, and reduced gene expression.