Kunming Institute of Zoology

Facility · Kunming, China

Research output, citation impact, and the most-cited recent papers from Kunming Institute of Zoology (China). Aggregated across the NobleBlocks index of 300M+ scholarly works.

Total works

10.6K

Citations

939.9K

h-index

333

i10-index

15.0K
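For readers unfamiliar with the two index metrics above, the following is a minimal sketch of how they are conventionally defined. It assumes the standard definitions: the h-index is the largest h such that h works each have at least h citations, and the i10-index is the number of works with at least 10 citations. The function names and the sample citation counts are hypothetical and are not taken from the NobleBlocks index, whose exact computation is not described on this page.

```python
def h_index(citations):
    """Largest h such that at least h works have >= h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def i10_index(citations):
    """Number of works with at least 10 citations."""
    return sum(1 for count in citations if count >= 10)


# Hypothetical per-work citation counts, for illustration only.
sample = [48, 33, 12, 10, 9, 4, 0]
print(h_index(sample), i10_index(sample))
```

Sorting once and scanning by rank keeps the h-index computation simple; for very large collections a counting approach would avoid the sort, but the result is the same.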

Also known as

Kunming Institute of Zoology · 中国科学院昆明动物研究所 (Kunming Institute of Zoology, Chinese Academy of Sciences)

Top-cited papers from Kunming Institute of Zoology

Towards complete and error-free genome assemblies of all vertebrate species

Arang Rhie, Shane McCarthy, Olivier Fédrigo, Joana Damas +4 more

2021 · Nature · 3.0K citations · doi:10.1038/s41586-021-03451-0

High-quality and complete reference genome assemblies are fundamental for the application of genomics to biology, disease, and biodiversity conservation. However, such assemblies are available for only a few non-microbial species [1–4]. To address this issue, the international Genome 10K (G10K) consortium [5,6] has worked over a five-year period to evaluate and develop cost-effective methods for assembling highly accurate and nearly complete reference genomes. Here we present lessons learned from generating assemblies for 16 species that represent six major vertebrate lineages. We confirm that long-read sequencing technologies are essential for maximizing genome quality, and that unresolved complex repeats and haplotype heterozygosity are major sources of assembly error when not handled correctly. Our assemblies correct substantial errors, add missing sequence in some of the best historical reference genomes, and reveal biological discoveries. These include the identification of many false gene duplications, increases in gene sizes, chromosome rearrangements that are specific to lineages, a repeated independent chromosome breakpoint in bat genomes, and a canonical GC-rich pattern in protein-coding genes and their regulatory regions. Adopting these lessons, we have embarked on the Vertebrate Genomes Project (VGP), an international effort to generate high-quality, complete reference genomes for all of the roughly 70,000 extant vertebrate species and to help to enable a new era of discovery across the life sciences.

Guidelines for the use and interpretation of assays for monitoring autophagy (4th edition)

Daniel J. Klionsky, Amal Kamal Abdel‐Aziz, Sara Abdelfatah, Mahmoud Abdellatif +4 more

2021 · Autophagy · 2.6K citations · doi:10.1080/15548627.2020.1797280

… autophagic responses. Here, we critically discuss current methods of assessing autophagy and the information they can, or cannot, provide. Our ultimate goal is to encourage intellectual and technical innovation in the field.

The role of m6A modification in the biological functions and diseases

Xiulin Jiang, Baiyang Liu, Zhi Nie, Lincan Duan +4 more

2021 · Signal Transduction and Targeted Therapy · 2.3K citations · doi:10.1038/s41392-020-00450-x

N6-methyladenosine (m6A) is the most prevalent, abundant and conserved internal cotranscriptional modification in eukaryotic RNAs, especially within higher eukaryotic cells. The m6A modification is installed by the m6A methyltransferases, or writers, such as METTL3/14/16, RBM15/15B, ZC3H3, VIRMA, CBLL1, WTAP, and KIAA1429, and removed by the demethylases, or erasers, including FTO and ALKBH5. It is recognized by the m6A-binding proteins YTHDF1/2/3, YTHDC1/2, IGF2BP1/2/3 and HNRNPA2B1, also known as "readers". Recent studies have shown that m6A RNA modification plays an essential role in both physiological and pathological conditions, especially in the initiation and progression of different types of human cancers. In this review, we discuss how m6A RNA methylation influences both the physiological and pathological progressions of the hematopoietic, central nervous and reproductive systems. We will mainly focus on recent progress in identifying the biological functions and the underlying molecular mechanisms of m6A RNA methylation, its regulators and downstream target genes, during cancer progression in the above systems. We propose that the m6A RNA methylation process offers potential targets for cancer therapy in the future.

Analysis of shared heritability in common disorders of the brain

Verneri Anttila, Brendan Bulik‐Sullivan, Hilary K. Finucane, Raymond K. Walters +4 more

2018 · Science · 2.0K citations · doi:10.1126/science.aap8757

Disorders of the brain can exhibit considerable epidemiological comorbidity and often share symptoms, provoking debate about their etiologic overlap. We quantified the genetic sharing of 25 brain disorders from genome-wide association studies of 265,218 patients and 784,643 control participants and assessed their relationship to 17 phenotypes from 1,191,588 individuals. Psychiatric disorders share common variant risk, whereas neurological disorders appear more distinct from one another and from the psychiatric disorders. We also identified significant sharing between disorders and a number of brain phenotypes, including cognitive measures. Further, we conducted simulations to explore how statistical power, diagnostic misclassification, and phenotypic heterogeneity affect genetic correlations. These results highlight the importance of common genetic variation as a risk factor for brain disorders and the value of heritability-based methods in understanding their etiology.

The sequence and de novo assembly of the giant panda genome

Ruiqiang Li, Wei Fan, Geng Tian, Hongmei Zhu +4 more

2009 · Nature · 1.2K citations · doi:10.1038/nature08696

Using next-generation sequencing technology alone, we have successfully generated and assembled a draft sequence of the giant panda genome. The assembled contigs (2.25 gigabases (Gb)) cover approximately 94% of the whole genome, and the remaining gaps (0.05 Gb) seem to contain carnivore-specific repeats and tandem repeats. Comparisons with the dog and human showed that the panda genome has a lower divergence rate. The assessment of panda genes potentially underlying some of its unique traits indicated that its bamboo diet might be more dependent on its gut microbiome than its own genetic composition. We also identified more than 2.7 million heterozygous single nucleotide polymorphisms in the diploid genome. Our data and analyses provide a foundation for promoting mammalian genetic research, and demonstrate the feasibility for using next-generation sequencing technologies for accurate, cost-effective and rapid de novo assembly of large eukaryotic genomes.
The genome of the giant panda (specifically of the female Beijing Olympics mascot Jingjing) has been determined using short-read sequencing technology, a first for such a complex genome. It consists of some 2.4 billion DNA base pairs, compared to 3 billion in humans, and contains around 21,000 protein-encoding genes, similar to the human genome. Genomic diversity reflected in the sequence is high, raising hopes that despite a population of only about 2,500, conservation efforts can keep the species from extinction. Intriguingly, the panda appears to have all the genes needed for a carnivorous digestive system but lacks digestive cellulase genes. It may therefore depend on its gut microbiome to handle its famously limited bamboo diet. Taste may be a diet-limiting factor: loss of function of the T1R1 gene means that pandas may not experience the umami taste associated with high-protein foods. Technical aspects of this work pave the way for the use of next-generation sequencing for rapid de novo assembly of large eukaryotic genomes.
Here, a draft sequence of the giant panda genome is assembled using next-generation sequencing technology alone. Genome analysis reveals a low divergence rate in comparison with dog and human genomes and insights into panda-specific traits; for example, the giant panda's bamboo diet may be more dependent on its gut microbiome than its own genetic composition.

Comparative genomics reveals insights into avian genome evolution and adaptation

Guojie Zhang, Cai Li, Qiye Li, Bo Li +4 more

2014 · Science · 1.2K citations · doi:10.1126/science.1251385

Birds are the most species-rich class of tetrapod vertebrates and have wide relevance across many research fields. We explored bird macroevolution using full genomes from 48 avian species representing all major extant clades. The avian genome is principally characterized by its constrained size, which predominantly arose because of lineage-specific erosion of repetitive elements, large segmental deletions, and gene loss. Avian genomes furthermore show a remarkably high degree of evolutionary stasis at the levels of nucleotide sequence, gene synteny, and chromosomal structure. Despite this pattern of conservation, we detected many non-neutral evolutionary changes in protein-coding genes and noncoding regions. These analyses reveal that pan-avian genomic diversity covaries with adaptations to different lifestyles and convergent evolution of traits.

Earth BioGenome Project: Sequencing life for the future of life

Harris A. Lewin, Gene E. Robinson, W. John Kress, William J. Baker +4 more

2018 · Proceedings of the National Academy of Sciences · 1.1K citations · doi:10.1073/pnas.1720115115

Increasing our understanding of Earth's biodiversity and responsibly stewarding its resources are among the most crucial scientific and social challenges of the new millennium. These challenges require fundamental new knowledge of the organization, evolution, functions, and interactions among millions of the planet's organisms. Herein, we present a perspective on the Earth BioGenome Project (EBP), a moonshot for biology that aims to sequence, catalog, and characterize the genomes of all of Earth's eukaryotic biodiversity over a period of 10 years. The outcomes of the EBP will inform a broad range of major issues facing humanity, such as the impact of climate change on biodiversity, the conservation of endangered species and ecosystems, and the preservation and enhancement of ecosystem services. We describe hurdles that the project faces, including data-sharing policies that ensure a permanent, freely available resource for future scientific discovery while respecting access and benefit sharing guidelines of the Nagoya Protocol. We also describe scientific and organizational challenges in executing such an ambitious project, and the structure proposed to achieve the project's goals. The far-reaching potential benefits of creating an open digital repository of genomic information for life on Earth can be realized only by a coordinated international effort.

Apoptosis, autophagy, necroptosis, and cancer metastasis

Zhenyi Su, Zuozhang Yang, Yongqing Xu, Yongbin Chen +1 more

2015 · Molecular Cancer · 1.1K citations · doi:10.1186/s12943-015-0321-5

Metastasis is a crucial hallmark of cancer progression, which involves numerous factors including the degradation of the extracellular matrix (ECM), the epithelial-to-mesenchymal transition (EMT), tumor angiogenesis, the development of an inflammatory tumor microenvironment, and defects in programmed cell death. Programmed cell death, such as apoptosis, autophagy, and necroptosis, plays crucial roles in metastatic processes. Malignant tumor cells must overcome these various forms of cell death to metastasize. This review summarizes the recent advances in the understanding of the mechanisms by which key regulators of apoptosis, autophagy, and necroptosis participate in cancer metastasis and discusses the crosstalk between apoptosis, autophagy, and necroptosis involved in the regulation of cancer metastasis.

The Genomes of Oryza sativa: A History of Duplications

Jun Yu, Jun Wang, Wei Lin, Songgang Li +4 more

2005 · PLoS Biology · 1.0K citations · doi:10.1371/journal.pbio.0030038

We report improved whole-genome shotgun sequences for the genomes of indica and japonica rice, both with multimegabase contiguity, or almost 1,000-fold improvement over the drafts of 2002. Tested against a nonredundant collection of 19,079 full-length cDNAs, 97.7% of the genes are aligned, without fragmentation, to the mapped super-scaffolds of one or the other genome. We introduce a gene identification procedure for plants that does not rely on similarity to known genes to remove erroneous predictions resulting from transposable elements. Using the available EST data to adjust for residual errors in the predictions, the estimated gene count is at least 38,000-40,000. Only 2%-3% of the genes are unique to any one subspecies, comparable to the amount of sequence that might still be missing. Despite this lack of variation in gene content, there is enormous variation in the intergenic regions. At least a quarter of the two sequences could not be aligned, and where they could be aligned, single nucleotide polymorphism (SNP) rates varied from as little as 3.0 SNP/kb in the coding regions to 27.6 SNP/kb in the transposable elements. A more inclusive new approach for analyzing duplication history is introduced here. It reveals an ancient whole-genome duplication, a recent segmental duplication on Chromosomes 11 and 12, and massive ongoing individual gene duplications. We find 18 distinct pairs of duplicated segments that cover 65.7% of the genome; 17 of these pairs date back to a common time before the divergence of the grasses. More important, ongoing individual gene duplications provide a never-ending source of raw material for gene genesis and are major contributors to the differences between members of the grass family.

The yak genome and adaptation to life at high altitude

Qiang Qiu, Guojie Zhang, Tao Ma, Wubin Qian +4 more

2012 · Nature Genetics · 1.0K citations · doi:10.1038/ng.2343

Domestic yaks (Bos grunniens) provide meat and other necessities for Tibetans living at high altitude on the Qinghai-Tibetan Plateau and in adjacent regions. Comparison between yak and the closely related low-altitude cattle (Bos taurus) is informative in studying animal adaptation to high altitude. Here, we present the draft genome sequence of a female domestic yak generated using Illumina-based technology at 65-fold coverage. Genomic comparisons between yak and cattle identify an expansion in yak of gene families related to sensory perception and energy metabolism, as well as an enrichment of protein domains involved in sensing the extracellular environment and hypoxic stress. Positively selected and rapidly evolving genes in the yak lineage are also found to be significantly enriched in functional categories and pathways related to hypoxia and nutrition metabolism. These findings may have important implications for understanding adaptation to high altitude in other animal species and for hypoxia-related diseases in humans.

Genetic Evidence for an East Asian Origin of Domestic Dogs

Peter Savolainen, Ya‐Ping Zhang, Jing Luo, Joakim Lundeberg +1 more

2002 · Science · 961 citations · doi:10.1126/science.1073906

The origin of the domestic dog from wolves has been established, but the number of founding events, as well as where and when these occurred, is not known. To address these questions, we examined the mitochondrial DNA (mtDNA) sequence variation among 654 domestic dogs representing all major dog populations worldwide. Although our data indicate several maternal origins from wolf, >95% of all sequences belonged to three phylogenetic groups universally represented at similar frequencies, suggesting a common origin from a single gene pool for all dog populations. A larger genetic variation in East Asia than in other regions and the pattern of phylogeographic variation suggest an East Asian origin for the domestic dog, approximately 15,000 years ago.

Whole-genome sequence of a flatfish provides insights into ZW sex chromosome evolution and adaptation to a benthic lifestyle

Songlin Chen, Guojie Zhang, Changwei Shao, Quanfei Huang +4 more

2014 · Nature Genetics · 859 citations · doi:10.1038/ng.2890

Songlin Chen and colleagues sequenced the whole genomes of a male (ZZ) and a female (ZW) Chinese half-smooth tongue sole, Cynoglossus semilaevis. Their analysis provides insights into the structure and evolution of the sex chromosomes and adaptation to the benthic lifestyle of this flatfish.
Genetic sex determination by W and Z chromosomes has developed independently in different groups of organisms. To better understand the evolution of sex chromosomes and the plasticity of sex-determination mechanisms, we sequenced the whole genomes of a male (ZZ) and a female (ZW) half-smooth tongue sole (Cynoglossus semilaevis). In addition to insights into adaptation to a benthic lifestyle, we find that the sex chromosomes of these fish are derived from the same ancestral vertebrate protochromosome as the avian W and Z chromosomes. Notably, the same gene on the Z chromosome, dmrt1, which is the male-determining gene in birds, showed convergent evolution of features that are compatible with a similar function in tongue sole. Comparison of the relatively young tongue sole sex chromosomes with those of mammals and birds identified events that occurred during the early phase of sex-chromosome evolution. Pertinent to the current debate about heterogametic sex-chromosome decay, we find that massive gene loss occurred in the wake of sex-chromosome 'birth'.

PLEK: a tool for predicting long non-coding RNAs and messenger RNAs based on an improved k-mer scheme

Aimin Li, Junying Zhang, Zhongyin Zhou

2014 · BMC Bioinformatics · 853 citations · doi:10.1186/1471-2105-15-311

BACKGROUND: High-throughput transcriptome sequencing (RNA-seq) technology promises to discover novel protein-coding and non-coding transcripts, particularly the identification of long non-coding RNAs (lncRNAs) from de novo sequencing data. This requires tools that are not restricted by prior gene annotations, genomic sequences and high-quality sequencing.
RESULTS: We present an alignment-free tool called PLEK (predictor of long non-coding RNAs and messenger RNAs based on an improved k-mer scheme), which uses a computational pipeline based on an improved k-mer scheme and a support vector machine (SVM) algorithm to distinguish lncRNAs from messenger RNAs (mRNAs), in the absence of genomic sequences or annotations. The performance of PLEK was evaluated on well-annotated mRNA and lncRNA transcripts. 10-fold cross-validation tests on human RefSeq mRNAs and GENCODE lncRNAs indicated that our tool could achieve accuracy of up to 95.6%. We demonstrated the utility of PLEK on transcripts from other vertebrates using the model built from human datasets. PLEK attained >90% accuracy on most of these datasets. PLEK also performed well using a simulated dataset and two real de novo assembled transcriptome datasets (sequenced by PacBio and 454 platforms) with relatively high indel sequencing errors. In addition, PLEK is approximately eightfold faster than a newly developed alignment-free tool, named Coding-Non-Coding Index (CNCI), and 244 times faster than the most popular alignment-based tool, Coding Potential Calculator (CPC), in a single-threading running manner.
CONCLUSIONS: PLEK is an efficient alignment-free computational tool to distinguish lncRNAs from mRNAs in RNA-seq transcriptomes of species lacking reference genomes. PLEK is especially suitable for PacBio or 454 sequencing data and large-scale transcriptome data. Its open-source software can be freely downloaded from https://sourceforge.net/projects/plek/files/.
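As a rough illustration of what alignment-free k-mer features look like, the sketch below computes plain normalized k-mer frequencies for a nucleotide sequence. This is a generic technique and not PLEK's actual "improved k-mer scheme", whose details are not given in the abstract; the function name `kmer_frequencies` and the example sequence are hypothetical. A feature vector of this kind could then be passed to a classifier such as an SVM.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k):
    """Return the relative frequency of every possible DNA k-mer in seq.

    Windows containing characters other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    counts = Counter(w for w in windows if set(w) <= set("ACGT"))
    total = sum(counts.values())
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in all_kmers}


# Hypothetical toy sequence, for illustration only.
features = kmer_frequencies("ACGTAC", 2)
print(features["AC"])
```

Because every possible k-mer gets an entry, sequences of different lengths map to vectors of the same dimension (4**k), which is what makes such features usable without a reference genome or alignment.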

Natural selection on EPAS1 (HIF2α) associated with low hemoglobin concentration in Tibetan highlanders

Cynthia M. Beall, Gianpiero L. Cavalleri, Libin Deng, Robert C. Elston +4 more

2010 · Proceedings of the National Academy of Sciences · 838 citations · doi:10.1073/pnas.1002443107

By impairing both function and survival, the severe reduction in oxygen availability associated with high-altitude environments is likely to act as an agent of natural selection. We used genomic and candidate gene approaches to search for evidence of such genetic selection. First, a genome-wide allelic differentiation scan (GWADS) comparing indigenous highlanders of the Tibetan Plateau (3,200-3,500 m) with closely related lowland Han revealed a genome-wide significant divergence across eight SNPs located near EPAS1. This gene encodes the transcription factor HIF2alpha, which stimulates production of red blood cells and thus increases the concentration of hemoglobin in blood. Second, in a separate cohort of Tibetans residing at 4,200 m, we identified 31 EPAS1 SNPs in high linkage disequilibrium that correlated significantly with hemoglobin concentration. The sex-adjusted hemoglobin concentration was, on average, 0.8 g/dL lower in the major allele homozygotes compared with the heterozygotes. These findings were replicated in a third cohort of Tibetans residing at 4,300 m. The alleles associating with lower hemoglobin concentrations were correlated with the signal from the GWADS study and were observed at greatly elevated frequencies in the Tibetan cohorts compared with the Han. High hemoglobin concentrations are a cardinal feature of chronic mountain sickness offering one plausible mechanism for selection. Alternatively, as EPAS1 is pleiotropic in its effects, selection may have operated on some other aspect of the phenotype. Whichever of these explanations is correct, the evidence for genetic selection at the EPAS1 locus from the GWADS study is supported by the replicated studies associating function with the allelic variants.

A Global Deal For Nature: Guiding principles, milestones, and targets

Eric Dinerstein, Carly Vynne, Enric Sala, Anup R. Joshi +4 more

2019 · Science Advances · 772 citations · doi:10.1126/sciadv.aaw2869

The Global Deal for Nature (GDN) is a time-bound, science-driven plan to save the diversity and abundance of life on Earth. Pairing the GDN and the Paris Climate Agreement would avoid catastrophic climate change, conserve species, and secure essential ecosystem services. New findings give urgency to this union: Less than half of the terrestrial realm is intact, yet conserving all native ecosystems, coupled with energy transition measures, will be required to remain below a 1.5°C rise in average global temperature. The GDN targets 30% of Earth to be formally protected and an additional 20% designated as climate stabilization areas, by 2030, to stay below 1.5°C. We highlight the 67% of terrestrial ecoregions that can meet 30% protection, thereby reducing extinction threats and carbon emissions from natural reservoirs. Freshwater and marine targets included here extend the GDN to all realms and provide a pathway to ensuring a more livable biosphere.

Biodiversity soup: metabarcoding of arthropods for rapid biodiversity assessment and biomonitoring

Douglas W. Yu, Yinqiu Ji, Brent C. Emerson, Xiaoyang Wang +3 more

2012 · Methods in Ecology and Evolution · 771 citations · doi:10.1111/j.2041-210x.2012.00198.x

1. Traditional biodiversity assessment is costly in time, money and taxonomic expertise. Moreover, data are frequently collected in ways (e.g. visual bird lists) that are unsuitable for auditing by neutral parties, which is necessary for dispute resolution.
2. We present protocols for the extraction of ecological, taxonomic and phylogenetic information from bulk samples of arthropods. The protocols combine mass trapping of arthropods, mass-PCR amplification of the COI barcode gene, pyrosequencing and bioinformatic analysis, which together we call ‘metabarcoding’.
3. We construct seven communities of arthropods (mostly insects) and show that it is possible to recover a substantial proportion of the original taxonomic information. We further demonstrate, for the first time, that metabarcoding allows for the precise estimation of pairwise community dissimilarity (beta diversity) and within-community phylogenetic diversity (alpha diversity), despite the inevitable loss of taxonomic information inherent to metabarcoding.
4. Alpha and beta diversity metrics are the raw materials of ecology and the environmental sciences, facilitating assessment of the state of the environment with a broad and efficient measure of biodiversity.

Phylogenetic analysis of global hepatitis E virus sequences: genetic diversity, subtypes and zoonosis

Ling Lü, Chunhua Li, Curt H. Hagedorn

2005 · Reviews in Medical Virology · 736 citations · doi:10.1002/rmv.482

Nucleotide sequences from a total of 421 HEV isolates were retrieved from Genbank and analysed. Phylogenetically, HEV was classified into four major genotypes. Genotype 1 was more conserved and classified into five subtypes. The number of genotype 2 sequences was limited but can be classified into two subtypes. Genotypes 3 and 4 were extremely diverse and can be subdivided into ten and seven subtypes. Geographically, genotype 1 was isolated from tropical and several subtropical countries in Asia and Africa, and genotype 2 was from Mexico, Nigeria, and Chad; whereas genotype 3 was identified almost worldwide including Asia, Europe, Oceania, North and South America. In contrast, genotype 4 was found exclusively in Asia. It is speculated that genotype 3 originated in the western hemisphere and was imported to several Asian countries such as Japan, Korea and Taiwan, while genotype 4 has been indigenous and likely restricted to Asia. Genotypes 3 and 4 were not only identified in swine but also in wild animals such as boar and a deer. Furthermore, in most areas where genotypes 3 and 4 were characterised, sequences from both humans and animals were highly conserved, indicating they originated from the same infectious sources. Based upon nucleotide differences from five phylogenies, it is proposed that five, two, ten and seven subtypes for HEV genotypes 1, 2, 3 and 4 be designated alphabetised subtypes. Accordingly, a total of 24 subtypes (1a, 1b, 1c, 1d, 1e, 2a, 2b, 3a, 3b, 3c, 3d, 3e, 3f, 3g, 3h, 3i, 3j, 4a, 4b, 4c, 4d, 4e, 4f and 4g) were given.

Reliable, verifiable and efficient monitoring of biodiversity via metabarcoding

Yinqiu Ji, Louise A. Ashton, Scott M. Pedley, David P. Edwards +4 more

2013 · Ecology Letters · 683 citations · doi:10.1111/ele.12162

To manage and conserve biodiversity, one must know what is being lost, where, and why, as well as which remedies are likely to be most effective. Metabarcoding technology can characterise the species compositions of mass samples of eukaryotes or of environmental DNA. Here, we validate metabarcoding by testing it against three high-quality standard data sets that were collected in Malaysia (tropical), China (subtropical) and the United Kingdom (temperate) and that comprised 55,813 arthropod and bird specimens identified to species level with the expenditure of 2,505 person-hours of taxonomic expertise. The metabarcode and standard data sets exhibit statistically correlated alpha- and beta-diversities, and the two data sets produce similar policy conclusions for two conservation applications: restoration ecology and systematic conservation planning. Compared with standard biodiversity data sets, metabarcoded samples are taxonomically more comprehensive, many times quicker to produce, less reliant on taxonomic expertise and auditable by third parties, which is essential for dispute resolution.

Deciphering spatial domains from spatially resolved transcriptomics with an adaptive graph attention auto-encoder

Kangning Dong, Shihua Zhang

2022 · Nature Communications · 658 citations · doi:10.1038/s41467-022-29439-6

Recent advances in spatially resolved transcriptomics have enabled comprehensive measurements of gene expression patterns while retaining the spatial context of the tissue microenvironment. Deciphering the spatial context of spots in a tissue needs to use their spatial information carefully. To this end, we develop a graph attention auto-encoder framework STAGATE to accurately identify spatial domains by learning low-dimensional latent embeddings via integrating spatial information and gene expression profiles. To better characterize the spatial similarity at the boundary of spatial domains, STAGATE adopts an attention mechanism to adaptively learn the similarity of neighboring spots, and an optional cell type-aware module through integrating the pre-clustering of gene expressions. We validate STAGATE on diverse spatial transcriptomics datasets generated by different platforms with different spatial resolutions. STAGATE could substantially improve the identification accuracy of spatial domains, and denoise the data while preserving spatial expression patterns. Importantly, STAGATE could be extended to multiple consecutive sections to reduce batch effects between sections and extracting three-dimensional (3D) expression domains from the reconstructed 3D tissue effectively.

Progressive Cactus is a multiple-genome aligner for the thousand-genome era

Joel Armstrong, Glenn Hickey, Mark Diekhans, Ian T. Fiddes +4 more

2020 · Nature · 639 citations · doi:10.1038/s41586-020-2871-y

New genome assemblies have been arriving at a rapidly increasing pace, thanks to decreases in sequencing costs and improvements in third-generation sequencing technologies [1–3]. For example, the number of vertebrate genome assemblies currently in the NCBI (National Center for Biotechnology Information) database [4] increased by more than 50% to 1,485 assemblies in the year from July 2018 to July 2019. In addition to this influx of assemblies from different species, new human de novo assemblies [5] are being produced, which enable the analysis of not only small polymorphisms, but also complex, large-scale structural differences between human individuals and haplotypes. This coming era and its unprecedented amount of data offer the opportunity to uncover many insights into genome evolution but also present challenges in how to adapt current analysis methods to meet the increased scale. Cactus [6], a reference-free multiple genome alignment program, has been shown to be highly accurate, but the existing implementation scales poorly with increasing numbers of genomes, and struggles in regions of highly duplicated sequences. Here we describe progressive extensions to Cactus to create Progressive Cactus, which enables the reference-free alignment of tens to thousands of large vertebrate genomes while maintaining high alignment quality. We describe results from an alignment of more than 600 amniote genomes, which is to our knowledge the largest multiple vertebrate genome alignment created so far.
