NobleBlocks

BGI Genomics

Company · Shenzhen, China

Research output, citation impact, and the most-cited recent papers from BGI Genomics. Aggregated across the NobleBlocks index of 300M+ scholarly works.

Total works: 767
Citations: 121.7K
h-index: 153
i10-index: 1.3K
Also known as: BGI Genomics, 华大基因股份有限公司
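
The h-index and i10-index figures above are reported without a definition. As a minimal sketch, assuming the conventional definitions (h-index: the largest h such that h works have at least h citations each; i10-index: the number of works with at least 10 citations), they could be computed from per-work citation counts as follows. The function names and the sample counts are hypothetical and are not taken from the NobleBlocks index.

```python
def h_index(citations):
    """Largest h such that at least h works have >= h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def i10_index(citations):
    """Number of works with at least 10 citations."""
    return sum(1 for count in citations if count >= 10)


# Hypothetical citation counts, for illustration only.
sample_counts = [25, 18, 12, 10, 9, 3, 0]
print("h-index:", h_index(sample_counts))
print("i10-index:", i10_index(sample_counts))
```

How the index itself aggregates citations (for example, which citing works it counts) is not stated on this page, so the sketch only illustrates the arithmetic.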

Top-cited papers from BGI Genomics

PopLDdecay: a fast and effective tool for linkage disequilibrium decay analysis based on variant call format files
Chi Zhang, Shan‐Shan Dong, Junyang Xu, Weiming He +1 more
2018 · Bioinformatics · 2.0K citations · doi:10.1093/bioinformatics/bty875

MOTIVATION: Linkage disequilibrium (LD) decay is of great interest in population genetic studies. However, no tool is currently available to perform LD decay analysis directly from variant call format (VCF) files. In addition, generating pair-wise LD measurements for whole-genome SNPs usually results in large, storage-wasting files. RESULTS: We developed PopLDdecay, an open source software, for LD decay analysis from VCF files. It is fast and is able to handle a large number of variants from sequencing data. It also saves storage by avoiding the export of pair-wise LD measurements. Subgroup analyses are also supported. AVAILABILITY AND IMPLEMENTATION: PopLDdecay is freely available at https://github.com/BGI-shenzhen/PopLDdecay.
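
To make the idea of LD decay concrete, here is a minimal sketch of the underlying arithmetic. It assumes phased, biallelic haplotypes coded as 0/1 and uses the standard r² statistic averaged within distance bins. It is not PopLDdecay's implementation, it does not read VCF files, and all function names and data are hypothetical.

```python
import math
from collections import defaultdict


def ld_r2(hap_a, hap_b):
    """r^2 between two biallelic sites from phased 0/1 haplotypes."""
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("haplotype vectors must be non-empty and equal length")
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return float("nan")  # a monomorphic site has no defined r^2
    return d * d / denom


def ld_decay(positions, haplotypes, bin_size, max_dist):
    """Mean r^2 per distance bin; positions must be sorted ascending."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            dist = positions[j] - positions[i]
            if dist > max_dist:
                break  # later sites are even farther away
            r2 = ld_r2(haplotypes[i], haplotypes[j])
            if math.isnan(r2):
                continue
            bin_id = dist // bin_size
            sums[bin_id] += r2
            counts[bin_id] += 1
    # Only running sums are kept, so no pair-wise table is stored.
    return {b * bin_size: sums[b] / counts[b] for b in sorted(counts)}


# Hypothetical toy data: three sites, four haplotypes.
positions = [100, 150, 1200]
haplotypes = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1]]
print(ld_decay(positions, haplotypes, bin_size=1000, max_dist=5000))
```

Keeping only per-bin sums and counts mirrors the storage-saving idea the abstract describes, although the tool's actual data structures are not specified here.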

Genomic variation in 3,010 diverse accessions of Asian cultivated rice
Wensheng Wang, Ramil Mauleon, Zhiqiang Hu, Dmytro Chebotarov +4 more
2018 · Nature · 1.9K citations · doi:10.1038/s41586-018-0063-9

Here we analyse genetic variation, population structure and diversity among 3,010 diverse Asian cultivated rice (Oryza sativa L.) genomes from the 3,000 Rice Genomes Project. Our results are consistent with the five major groups previously recognized, but also suggest several unreported subpopulations that correlate with geographic location. We identified 29 million single nucleotide polymorphisms, 2.4 million small indels and over 90,000 structural variations that contribute to within- and between-population variation. Using pan-genome analyses, we identified more than 10,000 novel full-length protein-coding genes and a high number of presence-absence variations. The complex patterns of introgression observed in domestication genes are consistent with multiple independent rice domestication events. The public availability of data from the 3,000 Rice Genomes Project provides a resource for rice genomics research and breeding.

Spatiotemporal transcriptomic atlas of mouse organogenesis using DNA nanoball-patterned arrays
Ao Chen, Sha Liao, Mengnan Cheng, Kailong Ma +4 more
2022 · Cell · 1.6K citations · doi:10.1016/j.cell.2022.04.003

Spatially resolved transcriptomic technologies are promising tools to study complex biological processes such as mammalian embryogenesis. However, the imbalance between resolution, gene capture, and field of view of current methodologies precludes their systematic application to analyze relatively large and three-dimensional mid- and late-gestation embryos. Here, we combined DNA nanoball (DNB)-patterned arrays and in situ RNA capture to create spatial enhanced resolution omics-sequencing (Stereo-seq). We applied Stereo-seq to generate the mouse organogenesis spatiotemporal transcriptomic atlas (MOSTA), which maps with single-cell resolution and high sensitivity the kinetics and directionality of transcriptional variation during mouse organogenesis. We used this information to gain insight into the molecular basis of spatial cell heterogeneity and cell fate specification in developing tissues such as the dorsal midbrain. Our panoramic atlas will facilitate in-depth investigation of longstanding questions concerning normal and abnormal mammalian development.

Draft genome sequence of Camellia sinensis var. sinensis provides insights into the evolution of the tea genome and tea quality
Chaoling Wei, Hua Yang, Songbo Wang, Jian Zhao +4 more
2018 · Proceedings of the National Academy of Sciences · 965 citations · doi:10.1073/pnas.1719622115

(CSA), is calculated to ∼0.38 to 1.54 million years ago (Mya). Analysis of genic collinearity reveals that the tea genome is the product of two rounds of whole-genome duplications (WGDs) that occurred ∼30 to 40 and ∼90 to 100 Mya. We provide evidence that these WGD events, and subsequent paralogous duplications, had major impacts on the copy numbers of secondary metabolite genes, particularly genes critical to producing three key quality compounds: catechins, theanine, and caffeine. Analyses of transcriptome and phytochemistry data show that amplification and transcriptional divergence of genes encoding a large acyltransferase family and leucoanthocyanidin reductases are associated with the characteristic young leaf accumulation of monomeric galloylated catechins in tea, while functional divergence of a single member of the glutamine synthetase gene family yielded theanine synthetase. This genome sequence will facilitate understanding of tea genome evolution and tea metabolite pathways, and will promote germplasm utilization for breeding improved tea varieties.

LDBlockShow: a fast and convenient tool for visualizing linkage disequilibrium and haplotype blocks based on variant call format files
Shan‐Shan Dong, Weiming He, Jingjing Ji, Chi Zhang +2 more
2020 · Briefings in Bioinformatics · 570 citations · doi:10.1093/bib/bbaa227

The triangular correlation heatmap, which visualizes the linkage disequilibrium (LD) pattern and haplotype block structure of SNPs, is a ubiquitous component of population-based genetic studies. However, current tools are time- and memory-consuming. Here, we developed LDBlockShow, an open source software, for visualizing LD and haplotype blocks from variant call format files. It saves both time and memory. In a test dataset with 100 SNPs from 60 000 subjects, it was at least 10.60 times faster and used only 0.03-13.33% of physical memory as compared with other tools. In addition, it can generate figures that simultaneously display additional statistical context (e.g. association P-values) and genomic region annotations. It can also compress SVG files with a large number of SNPs and supports subgroup analysis. This fast and convenient tool will facilitate the visualization of LD and haplotype blocks for geneticists.

Genome sequence of Gossypium herbaceum and genome updates of Gossypium arboreum and Gossypium hirsutum provide insights into cotton A-genome evolution
Gai Huang, Zhiguo Wu, Richard G. Percy, Mingzhou Bai +4 more
2020 · Nature Genetics · 456 citations · doi:10.1038/s41588-020-0607-4

Upon assembling the first Gossypium herbaceum (A1) genome and substantially improving the existing Gossypium arboreum (A2) and Gossypium hirsutum ((AD)1) genomes, we showed that all existing A-genomes may have originated from a common ancestor, referred to here as A0, which was more phylogenetically related to A1 than A2. Further, allotetraploid formation was shown to have preceded the speciation of A1 and A2. Both A-genomes evolved independently, with no ancestor–progeny relationship. Gaussian probability density function analysis indicates that several long-terminal-repeat bursts that occurred from 5.7 million years ago to less than 0.61 million years ago contributed compellingly to A-genome size expansion, speciation and evolution. Abundant species-specific structural variations in genic regions changed the expression of many important genes, which may have led to fiber cell improvement in (AD)1. Our findings resolve existing controversial concepts surrounding A-genome origins and provide valuable genomic resources for cotton genetic improvement.

Phylogenomics reveals multiple losses of nitrogen-fixing root nodule symbiosis
Maximilian Griesmann, Yue Chang, Xin Liu, Yue Song +4 more
2018 · Science · 456 citations · doi:10.1126/science.aat1743

Genomic traces of symbiosis loss. A symbiosis between certain bacteria and their plant hosts delivers fixed nitrogen to the plants. Griesmann et al. sequenced several plant genomes to analyze why nitrogen-fixing symbiosis is irregularly scattered through the evolutionary tree (see the Perspective by Nagy). Various genomes carried traces of lost pathways that could have supported nitrogen-fixing symbiosis. It seems that this symbiosis, which relies on multiple pathways and complex interorganismal signaling, is susceptible to selection and prone to being lost over evolutionary time. Science, this issue p. eaat1743; see also p. 125

The asparagus genome sheds light on the origin and evolution of a young Y chromosome
Alex Harkess, Jinsong Zhou, Chunyan Xu, John E. Bowers +4 more
2017 · Nature Communications · 345 citations · doi:10.1038/s41467-017-01064-8

Sex chromosomes evolved from autosomes many times across the eukaryote phylogeny. Several models have been proposed to explain this transition, some involving male and female sterility mutations linked in a region of suppressed recombination between X and Y (or Z/W, U/V) chromosomes. Comparative and experimental analysis of a reference genome assembly for a double haploid YY male garden asparagus (Asparagus officinalis L.) individual implicates separate but linked genes as responsible for sex determination. Dioecy has evolved recently within Asparagus and sex chromosomes are cytogenetically identical with the Y, harboring a megabase segment that is missing from the X. We show that deletion of this entire region results in a male-to-female conversion, whereas loss of a single suppressor of female development drives male-to-hermaphrodite conversion. A single copy anther-specific gene with a male sterile Arabidopsis knockout phenotype is also in the Y-specific region, supporting a two-gene model for sex chromosome evolution.

Genome assembly of a tropical maize inbred line provides insights into structural variation and crop improvement
Ning Yang, Jie Liu, Qiang Gao, Songtao Gui +4 more
2019 · Nature Genetics · 331 citations · doi:10.1038/s41588-019-0427-6

Maize is one of the most important crops globally, and it shows remarkable genetic diversity. Knowledge of this diversity could help in crop improvement; however, gold-standard genomes have been elucidated only for modern temperate varieties. Here, we present a high-quality reference genome (contig N50 of 15.78 megabases) of the maize small-kernel inbred line, which is derived from a tropical landrace. Using haplotype maps derived from B73, Mo17 and SK, we identified 80,614 polymorphic structural variants across 521 diverse lines. Approximately 22% of these variants could not be detected by traditional single-nucleotide-polymorphism-based approaches, and some of them could affect gene expression and trait performance. To illustrate the utility of the diverse SK line, we used it to perform map-based cloning of a major effect quantitative trait locus controlling kernel weight—a key trait selected during maize improvement. The underlying candidate gene ZmBARELY ANY MERISTEM1d provides a target for increasing crop yields. A high-quality reference genome of the maize SK inbred line and analyses between the tropical SK line and two other maize genomes, B73 and Mo17, provide insights into structural variation and crop improvement.

Diversification and independent domestication of Asian and European pears
Jun Wu, Yingtao Wang, Jiabao Xu, Schuyler S. Korban +4 more
2018 · Genome Biology · 309 citations · doi:10.1186/s13059-018-1452-y

BACKGROUND: Pear (Pyrus) is a globally grown fruit, with thousands of cultivars in five domesticated species and dozens of wild species. However, little is known about the evolutionary history of these pear species and what has contributed to the distinct phenotypic traits between Asian pears and European pears. RESULTS: We report the genome resequencing of 113 pear accessions from worldwide collections, representing both cultivated and wild pear species. Based on 18,302,883 identified SNPs, we conduct phylogenetics, population structure, gene flow, and selective sweep analyses. Furthermore, we propose a model for the divergence, dissemination, and independent domestication of Asian and European pears in which pear, after originating in southwest China and then being disseminated throughout central Asia, has eventually spread to western Asia, and then on to Europe. We find evidence for rapid evolution and balancing selection for S-RNase genes that have contributed to the maintenance of self-incompatibility, thus promoting outcrossing and accounting for pear genome diversity across the Eurasian continent. In addition, separate selective sweep signatures between Asian pears and European pears, combined with co-localized QTLs and differentially expressed genes, underline distinct phenotypic fruit traits, including flesh texture, sugar, acidity, aroma, and stone cells. CONCLUSIONS: This study provides further clarification of the evolutionary history of pear along with independent domestication of Asian and European pears. Furthermore, it provides substantive and valuable genomic resources that will significantly advance pear improvement and molecular breeding efforts.

Liriodendron genome sheds light on angiosperm phylogeny and species–pair differentiation
Jinhui Chen, Zhaodong Hao, Xuanmin Guang, Chenxi Zhao +4 more
2018 · Nature Plants · 306 citations · doi:10.1038/s41477-018-0323-6

The genus Liriodendron belongs to the family Magnoliaceae, which resides within the magnoliids, an early diverging lineage of the Mesangiospermae. However, the phylogenetic relationship of magnoliids with eudicots and monocots has not been conclusively resolved and thus remains to be determined [1–6]. Liriodendron is a relict lineage from the Tertiary with two distinct species, one East Asian (L. chinense (Hemsley) Sargent) and one eastern North American (L. tulipifera Linn), identified as a vicariad species pair. However, the genetic divergence and evolutionary trajectories of these species remain to be elucidated at the whole-genome level [7]. Here, we report the first de novo genome assembly of a plant in the Magnoliaceae, L. chinense. Phylogenetic analyses suggest that magnoliids are sister to the clade consisting of eudicots and monocots, with rapid diversification occurring in the common ancestor of these three lineages. Analyses of population genetic structure indicate that L. chinense has diverged into two lineages, the eastern and western groups, in China. While L. tulipifera in North America is genetically positioned between the two L. chinense groups, it is closer to the eastern group. This result is consistent with phenotypic observations that suggest that the eastern and western groups of China may have diverged long ago, possibly before the intercontinental differentiation between L. chinense and L. tulipifera. Genetic diversity analyses show that L. chinense has tenfold higher genetic diversity than L. tulipifera, suggesting that the complicated regions comprising east–west-orientated mountains and the Yangtze river basin (especially near 30° N latitude) in East Asia offered more successful refugia than the south–north-orientated mountain valleys in eastern North America during the Quaternary glacial period.

A reference-grade wild soybean genome
Min Xie, Claire Chung, Man‐Wah Li, Fuk‐Ling Wong +4 more
2019 · Nature Communications · 280 citations · doi:10.1038/s41467-019-09142-9

Efficient crop improvement depends on the application of accurate genetic information contained in diverse germplasm resources. Here we report a reference-grade genome of wild soybean accession W05, with a final assembled genome size of 1013.2 Mb and a contig N50 of 3.3 Mb. The analytical power of the W05 genome is demonstrated by several examples. First, we identify an inversion at the locus determining seed coat color during domestication. Second, a translocation event between chromosomes 11 and 13 of some genotypes is shown to interfere with the assignment of QTLs. Third, we find a region containing copy number variations of the Kunitz trypsin inhibitor (KTI) genes. Such findings illustrate the power of this assembly in the analysis of large structural variations in soybean germplasm collections. The wild soybean genome assembly has wide applications in comparative genomic and evolutionary studies, as well as in crop breeding and improvement programs.

Human Endophthalmitis Caused By Pseudorabies Virus Infection, China, 2017
Jingwen Ai, Shanshan Weng, Qi Cheng, Peng Cui +4 more
2018 · Emerging Infectious Diseases · 267 citations · doi:10.3201/eid2406.171612

We report human endophthalmitis caused by pseudorabies virus infection after exposure to sewage on a hog farm in China. High-throughput sequencing and real-time PCR of vitreous humor showed pseudorabies virus sequences. This case showed that pseudorabies virus might infect humans after direct contact with contaminants.

Chromosome-level reference genome and alternative splicing atlas of moso bamboo (Phyllostachys edulis)
Hansheng Zhao, Zhimin Gao, Le Wang, Jiongliang Wang +4 more
2018 · GigaScience · 257 citations · doi:10.1093/gigascience/giy115

Background: Bamboo is one of the most important nontimber forestry products worldwide. However, a chromosome-level reference genome is lacking, and an evolutionary view of alternative splicing (AS) in bamboo remains unclear despite emerging omics data and improved technologies. Results: Here, we provide a chromosome-level de novo genome assembly of moso bamboo (Phyllostachys edulis) using additional abundance sequencing data and a Hi-C scaffolding strategy. The significantly improved genome is a scaffold N50 of 79.90 Mb, approximately 243 times longer than the previous version. A total of 51,074 high-quality protein-coding loci with intact structures were identified using single-molecule real-time sequencing and manual verification. Moreover, we provide a comprehensive AS profile based on the identification of 266,711 unique AS events in 25,225 AS genes by large-scale transcriptomic sequencing of 26 representative bamboo tissues using both the Illumina and Pacific Biosciences sequencing platforms. Through comparisons with orthologous genes in related plant species, we observed that the AS genes are concentrated among more conserved genes that tend to accumulate higher transcript levels and share less tissue specificity. Furthermore, gene family expansion, abundant AS, and positive selection were identified in crucial genes involved in the lignin biosynthetic pathway of moso bamboo. Conclusions: These fundamental studies provide useful information for future in-depth analyses of comparative genome and AS features. Additionally, our results highlight a global perspective of AS during evolution and diversification in bamboo.

Origin and evolution of qingke barley in Tibet
Xingquan Zeng, Yu Guo, Qijun Xu, Martin Mascher +4 more
2018 · Nature Communications · 253 citations · doi:10.1038/s41467-018-07920-5

Tibetan barley (Hordeum vulgare L., qingke) is the principal cereal cultivated on the Tibetan Plateau for at least 3,500 years, but its origin and domestication remain unclear. Here, based on deep-coverage whole-genome and published exome-capture resequencing data for a total of 437 accessions, we show that contemporary qingke is derived from eastern domesticated barley and it is introduced to southern Tibet most likely via north Pakistan, India, and Nepal between 4,500 and 3,500 years ago. The low genetic diversity of qingke suggests Tibet can be excluded as a center of origin or domestication for barley. The rapid decrease in genetic diversity from eastern domesticated barley to qingke can be explained by a founder effect from 4,500 to 2,000 years ago. The haplotypes of the five key domestication genes of barley support a feral or hybridization origin for Tibetan weedy barley and reject the hypothesis of native Tibetan wild barley.

Metagenomic analysis revealed the potential role of gut microbiome in gout
Yong‐Liang Chu, Silong Sun, Yufen Huang, Qiang Gao +4 more
2021 · npj Biofilms and Microbiomes · 225 citations · doi:10.1038/s41522-021-00235-2

Emerging evidence indicates an association between gut microbiome and arthritis diseases including gout. However, how and which gut bacteria affect host urate degradation and inflammation in gout remains unclear. Here we performed a metagenome analysis on 307 fecal samples from 102 gout patients and 86 healthy controls. Gout metagenomes significantly differed from those of healthy controls. The relative abundances of Prevotella, Fusobacterium, and Bacteroides were increased in gout, whereas those of Enterobacteriaceae and butyrate-producing species were decreased. Functionally, gout patients had greater abundances for genes in fructose, mannose metabolism and lipid A biosynthesis, and lower for genes in urate degradation and short chain fatty acid production. A three-pronged association between metagenomic species, functions and clinical parameters revealed that decreased abundances of species in Enterobacteriaceae were associated with reduced amino acid metabolism and environmental sensing, which together contribute to increased serum uric acid and C-reactive protein levels in gout. A random forest classifier based on three gut microbial genes showed high predictivity for gout in both discovery and validation cohorts (0.91 and 0.80 accuracy), with high specificity in the context of other chronic disorders. Longitudinal analysis showed that uric-acid-lowering and anti-inflammatory drugs partially restored gut microbiota after 24-week treatment. Comparative analysis with obesity, type 2 diabetes, ankylosing spondylitis and rheumatoid arthritis indicated that gout metagenomes were more similar to those of autoimmune than metabolic diseases. Our results suggest that gut dysbiosis was associated with dysregulated host urate degradation and systemic inflammation and may be used as non-invasive diagnostic markers for gout.

Musa balbisiana genome reveals subgenome evolution and functional divergence
Zhuo Wang, Hongxia Miao, Juhua Liu, Biyu Xu +4 more
2019 · Nature Plants · 223 citations · doi:10.1038/s41477-019-0452-6

Banana cultivars (Musa ssp.) are diploid, triploid and tetraploid hybrids derived from Musa acuminata and Musa balbisiana. We presented a high-quality draft genome assembly of M. balbisiana with 430 Mb (87%) assembled into 11 chromosomes. We identified that the recent divergence of M. acuminata (A-genome) and M. balbisiana (B-genome) occurred after lineage-specific whole-genome duplication, and that the B-genome may be more sensitive to the fractionation process compared to the A-genome. Homoeologous exchanges occurred frequently between A- and B-subgenomes in allopolyploids. Genomic variation within progenitors resulted in functional divergence of subgenomes. Global homoeologue expression dominance occurred between subgenomes of the allotriploid. Gene families related to ethylene biosynthesis and starch metabolism exhibited significant expansion at the pathway level and wide homoeologue expression dominance in the B-subgenome of the allotriploid. The independent origin of 1-aminocyclopropane-1-carboxylic acid oxidase (ACO) homoeologue gene pairs and tandem duplication-driven expansion of ACO genes in the B-subgenome contributed to rapid and major ethylene production post-harvest in allotriploid banana fruits. The findings of this study provide greater context for understanding fruit biology, and aid the development of tools for breeding optimal banana cultivars.

A Clostridia-rich microbiota enhances bile acid excretion in diarrhea-predominant irritable bowel syndrome
Ling Zhao, Wei Yang, Yang Chen, Fengjie Huang +4 more
2019 · Journal of Clinical Investigation · 190 citations · doi:10.1172/jci130976

An excess of fecal bile acids (BAs) is thought to be one of the mechanisms for diarrhea-predominant irritable bowel syndrome (IBS-D). However, the factors causing excessive BA excretion remain incompletely studied. Given the importance of gut microbiota in BA metabolism, we hypothesized that gut dysbiosis might contribute to excessive BA excretion in IBS-D. By performing BA-related metabolic and metagenomic analyses in 290 IBS-D patients and 89 healthy volunteers, we found that 24.5% of IBS-D patients exhibited excessive excretion of total BAs and alteration of BA-transforming bacteria in feces. Notably, the increase in Clostridia bacteria (e.g., C. scindens) was positively associated with the levels of fecal BAs and serum 7α-hydroxy-4-cholesten-3-one (C4), but negatively correlated with serum fibroblast growth factor 19 (FGF19) concentration. Furthermore, colonization with Clostridia-rich IBS-D fecal microbiota or C. scindens individually enhanced serum C4 and hepatic conjugated BAs but reduced ileal FGF19 expression in mice. Inhibition of Clostridium species with vancomycin yielded opposite results. Clostridia-derived BAs suppressed the intestinal FGF19 expression in vitro and in vivo. In conclusion, this study demonstrates that the Clostridia-rich microbiota contributes to excessive BA excretion in IBS-D patients, which provides a mechanistic hypothesis with testable clinical implications.

The pomegranate (Punica granatum L.) genome and the genomics of punicalagin biosynthesis
Gaihua Qin, Chunyan Xu, Ray Ming, Haibao Tang +4 more
2017 · The Plant Journal · 179 citations · doi:10.1111/tpj.13625

Pomegranate (Punica granatum L.) is a perennial fruit crop grown since ancient times that has been planted worldwide and is known for its functional metabolites, particularly punicalagins. We have sequenced and assembled the pomegranate genome with 328 Mb anchored into nine pseudo-chromosomes and annotated 29 229 gene models. A Myrtales lineage-specific whole-genome duplication event was detected that occurred in the common ancestor before the divergence of pomegranate and Eucalyptus. Repetitive sequences accounted for 46.1% of the assembled genome. We found that the integument development gene INNER NO OUTER (INO) was under positive selection and potentially contributed to the development of the fleshy outer layer of the seed coat, an edible part of pomegranate fruit. The genes encoding the enzymes for synthesis and degradation of lignin, hemicelluloses and cellulose were also differentially expressed between soft- and hard-seeded varieties, reflecting differences in their accumulation in cultivars differing in seed hardness. Candidate genes for punicalagin biosynthesis were identified and their expression patterns indicated that gallic acid synthesis in tissues could follow different biochemical pathways. The genome sequence of pomegranate provides a valuable resource for the dissection of many biological and biochemical traits and also provides important insights for the acceleration of breeding. Elucidation of the biochemical pathway(s) involved in punicalagin biosynthesis could assist breeding efforts to increase production of this bioactive compound.

High-resolution silkworm pan-genome provides genetic insights into artificial selection and ecological adaptation
Xiaoling Tong, Minjin Han, Kunpeng Lu, Shuaishuai Tai +4 more
2022 · Nature Communications · 168 citations · doi:10.1038/s41467-022-33366-x

The silkworm Bombyx mori is an important economic insect for producing silk, the “queen of fabrics”. The currently available genomes limit the understanding of its genetic diversity and the discovery of valuable alleles for breeding. Here, we deeply re-sequence 1,078 silkworms and assemble long-read genomes for 545 representatives. We construct a high-resolution pan-genome dataset representing almost the entire genomic content in the silkworm. We find that the silkworm population harbors a high density of genomic variants and identify 7,308 new genes, 4,260 (22%) core genes, and 3,432,266 non-redundant structure variations (SVs). We reveal hundreds of genes and SVs that may contribute to the artificial selection (domestication and breeding) of silkworm. Further, we focus on four genes responsible, respectively, for two economic (silk yield and silk fineness) and two ecologically adaptive traits (egg diapause and aposematic coloration). Taken together, our population-scale genomic resources will promote functional genomics studies and breeding improvement for silkworm.