Bioinformatics Institute
Facility · Singapore, Singapore
Research output, citation impact, and the most-cited recent papers from Bioinformatics Institute (Singapore). Aggregated across the NobleBlocks index of 300M+ scholarly works.
Top-cited papers from Bioinformatics Institute
Polymerase chain reaction (PCR) is a basic molecular biology technique with a multiplicity of uses, including deoxyribonucleic acid cloning and sequencing, functional analysis of genes, diagnosis of diseases, genotyping and discovery of genetic variants. Reliable primer design is crucial for successful PCR, and for over a decade, the open-source Primer3 software has been widely used for primer design, often in high-throughput genomics applications. It has also been incorporated into numerous publicly available software packages and web services. During this period, we have greatly expanded Primer3's functionality. In this article, we describe Primer3's current capabilities, emphasizing recent improvements. The most notable enhancements incorporate more accurate thermodynamic models in the primer design process, both to improve melting temperature prediction and to reduce the likelihood that primers will form hairpins or dimers. Additional enhancements include more precise control of primer placement, a change motivated partly by opportunities to use whole-genome sequences to improve primer specificity. We also added features to increase ease of use, including the ability to save and re-use parameter settings and the ability to require that individual primers not be used in more than one primer pair. We have made the core code more modular and provided cleaner programming interfaces to further ease integration with other software. These improvements position Primer3 for continued use with genome-scale data in the decade ahead.
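To give a feel for what melting temperature prediction involves, here is a minimal sketch using the Wallace rule (2 °C per A/T base, 4 °C per G/C base). This is a deliberately crude textbook approximation for short oligos, not the thermodynamic model the abstract describes and not Primer3's code. The function name `wallace_tm` and the example sequences are hypothetical.

```python
def wallace_tm(primer: str) -> int:
    """Estimate primer melting temperature (deg C) with the Wallace rule.

    Assumption: a short oligo (roughly 14-20 nt). This is NOT the
    nearest-neighbor thermodynamic approach used by tools like Primer3.
    """
    seq = primer.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("primer must be a non-empty string of A, C, G, T")
    at = seq.count("A") + seq.count("T")  # weakly paired bases
    gc = seq.count("G") + seq.count("C")  # strongly paired bases
    return 2 * at + 4 * gc


if __name__ == "__main__":
    # Hypothetical 16-mer with 8 A/T and 8 G/C bases.
    print(wallace_tm("ACGTACGTACGTACGT"))
```

A rule this simple ignores sequence context and salt concentration, which is one reason more accurate thermodynamic models matter for primer design.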
There are currently few therapeutic options for patients with pancreatic cancer, and new insights into the pathogenesis of this lethal disease are urgently needed. Toward this end, we performed a comprehensive genetic analysis of 24 pancreatic cancers. We first determined the sequences of 23,219 transcripts, representing 20,661 protein-coding genes, in these samples. Then, we searched for homozygous deletions and amplifications in the tumor DNA by using microarrays containing probes for approximately 10^6 single-nucleotide polymorphisms. We found that pancreatic cancers contain an average of 63 genetic alterations, the majority of which are point mutations. These alterations defined a core set of 12 cellular signaling pathways and processes that were each genetically altered in 67 to 100% of the tumors. Analysis of these tumors' transcriptomes with next-generation sequencing-by-synthesis technologies provided independent evidence for the importance of these pathways and processes. Our data indicate that genetically altered core pathways and regulatory processes only become evident once the coding regions of the genome are analyzed in depth. Dysregulation of these core pathways and processes through mutation can explain the major features of pancreatic tumorigenesis.
The assembly of long reads from Pacific Biosciences and Oxford Nanopore Technologies typically requires resource-intensive error-correction and consensus-generation steps to obtain high-quality assemblies. We show that the error-correction step can be omitted and that high-quality consensus sequences can be generated efficiently with a SIMD-accelerated, partial-order alignment-based, stand-alone consensus module called Racon. Based on tests with PacBio and Oxford Nanopore data sets, we show that Racon coupled with miniasm enables consensus genomes with similar or better quality than state-of-the-art methods while being an order of magnitude faster.
Secreted membrane-enclosed vesicles, collectively called extracellular vesicles (EVs), which include exosomes, ectosomes, microvesicles, microparticles, apoptotic bodies and other EV subsets, encompass a very rapidly growing scientific field in biology and medicine. Importantly, it is currently technically challenging to obtain a totally pure EV fraction free from non-vesicular components for functional studies, and therefore there is a need to establish guidelines for analyses of these vesicles and reporting of scientific studies on EV biology. Here, the International Society for Extracellular Vesicles (ISEV) provides researchers with a minimal set of biochemical, biophysical and functional standards that should be used to attribute any specific biological cargo or functions to EVs.
The Sorting Intolerant from Tolerant (SIFT) algorithm predicts the effect of coding variants on protein function. It was first introduced in 2001, with a corresponding website that provides users with predictions on their variants. Since its release, SIFT has become one of the standard tools for characterizing missense variation. We have updated SIFT's genome-wide prediction tool since our last publication in 2009, and added new features to the insertion/deletion (indel) tool. We also show accuracy metrics on independent data sets. The original developers have hosted the SIFT web server at FHCRC and JCVI, and it is currently located at BII. The URL is http://sift-dna.org (24 May 2012, date last accessed).
GISAID is a global data science initiative and the primary source of genomic and associated metadata of all influenza viruses, Respiratory Syncytial Virus (RSV) and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the pandemic coronavirus causing coronavirus disease 2019 (COVID-19). GISAID's publicly accessible data sharing platform enables collaboration of over 42,000 participating researchers from 198 nations and data generators from over 3,500 institutions across the globe. Since the first whole-genome sequences were made available by China CDC through GISAID on January 10, 2020, over 5 million genetic sequences of SARS-CoV-2 from 194 countries and territories have been made publicly available through GISAID's EpiCoV database as of November 9, 2021. This high-quality, curated data enabled the rapid development of diagnostic and prophylactic measures against SARS-CoV-2 including the first diagnostic tests and the first vaccines to combat COVID-19 as well as continuous monitoring of emerging variants in near real-time.
New official nomenclature subdivides human monocytes into 3 subsets: the classical (CD14(++)CD16(-)), intermediate (CD14(++)CD16(+)), and nonclassical (CD14(+)CD16(++)) monocytes. This introduces new challenges, as monocyte heterogeneity is mostly understood based on 2 subsets, the CD16(-) and CD16(+) monocytes. Here, we comprehensively defined the 3 circulating human monocyte subsets using microarray, flow cytometry, and cytokine production analysis. We find that intermediate monocytes expressed a large majority (87%) of genes and surface proteins at levels between classical and nonclassical monocytes. This establishes their intermediary nature at the molecular level. We unveil the close relationship between the intermediate and nonclassic monocytes, along with features that separate them. Intermediate monocytes expressed highest levels of major histocompatibility complex class II, GFRα2 and CLEC10A, whereas nonclassic monocytes were distinguished by cytoskeleton rearrangement genes, inflammatory cytokine production, and CD294 and Siglec10 surface expression. In addition, we identify new features for classic monocytes, including AP-1 transcription factor genes, CLEC4D and IL-13Rα1 surface expression. We also find circumstantial evidence supporting the developmental relationship between the 3 subsets, including gradual changes in maturation genes and surface markers. By comprehensively defining the 3 monocyte subsets during healthy conditions, we facilitate target identification and detailed analyses of aberrations that may occur to monocyte subsets during diseases.
The chromosomal position of human genes is rapidly being established. We integrated these mapping data with genome-wide messenger RNA expression profiles as provided by SAGE (serial analysis of gene expression). Over 2.45 million SAGE transcript tags, including 160,000 tags of neuroblastomas, are presently known for 12 tissue types. We developed algorithms to assign these tags to UniGene clusters and their chromosomal position. The resulting Human Transcriptome Map generates gene expression profiles for any chromosomal region in 12 normal and pathologic tissue types. The map reveals a clustering of highly expressed genes to specific chromosomal regions. It provides a tool to search for genes that are overexpressed or silenced in cancer.
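The idea of combining tag counts with chromosomal positions can be sketched as follows. This is an illustrative toy under stated assumptions, not the authors' algorithm: the function `region_profile`, the dictionary layouts, and all tags, gene names and coordinates are hypothetical.

```python
from collections import defaultdict


def region_profile(tag_counts, tag_to_gene, gene_positions, chrom, start, end):
    """Sum tag counts per gene for genes located in [start, end] on chrom.

    tag_counts:     {tag: count} for one tissue library (hypothetical data)
    tag_to_gene:    {tag: gene}, standing in for a tag-to-cluster assignment
    gene_positions: {gene: (chromosome, position)}
    """
    profile = defaultdict(int)
    for tag, count in tag_counts.items():
        gene = tag_to_gene.get(tag)
        if gene is None:
            continue  # tag could not be assigned to any gene
        position = gene_positions.get(gene)
        if position is None:
            continue  # gene has no mapped chromosomal position
        gene_chrom, gene_pos = position
        if gene_chrom == chrom and start <= gene_pos <= end:
            profile[gene] += count
    return dict(profile)


if __name__ == "__main__":
    counts = {"TAG1": 12, "TAG2": 3, "TAG3": 7, "TAG4": 5}
    tags = {"TAG1": "geneA", "TAG2": "geneA", "TAG3": "geneB"}
    positions = {"geneA": ("chr1", 1500), "geneB": ("chr2", 900)}
    print(region_profile(counts, tags, positions, "chr1", 1000, 2000))
```

Running the same query over several tissue libraries would give one expression profile per tissue for the chosen region.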
Cigarette smoke is the major cause of lung cancer, the leading cause of cancer death, and of chronic obstructive pulmonary disease, the fourth leading cause of death in the United States. Using high-density gene expression arrays, we describe genes that are normally expressed in a subset of human airway epithelial cells obtained at bronchoscopy (the airway transcriptome), define how cigarette smoking alters the transcriptome, and detail the effects of variables, such as cumulative exposure, age, sex, and race, on cigarette smoke-induced changes in gene expression. We also determine which changes in gene expression are and are not reversible when smoking is discontinued. The persistent altered expression of a subset of genes in former smokers may explain the risk these individuals have for developing lung cancer long after they have discontinued smoking. The use of gene expression profiling to explore the normal biology of a specific subset of cells within a complex organ across a broad spectrum of healthy individuals and to define the reversible and irreversible genetic effects of cigarette smoke on human airway epithelial cells has not been previously reported.
Antimicrobial peptides (AMPs) are promising next generation antibiotics that hold great potential for combating bacterial resistance. AMPs can be both bacteriostatic and bactericidal, induce rapid killing and display a lower propensity to develop resistance than do conventional antibiotics. Despite significant progress in the past 30 years, no peptide antibiotic has reached the clinic yet. Poor understanding of the action mechanisms and lack of rational design principles have been the two major obstacles that have slowed progress. Technological developments are now enabling multidisciplinary approaches including molecular dynamics simulations combined with biophysics and microbiology toward providing valuable insights into the interactions of AMPs with membranes at atomic level. This has led to increasingly robust models of the mechanisms of action of AMPs and has begun to contribute meaningfully toward the discovery of new AMPs. This review discusses the detailed action mechanisms that have been put forward, with detailed atomistic insights into how the AMPs interact with bacterial membranes. The review further discusses how this knowledge is exploited toward developing design principles for novel AMPs. Finally, the current status, associated challenges, and future directions for the development of AMP therapeutics are discussed.
The largest comparative study to date of nine state-of-the-art drug target prediction methods finds that deep learning outperforms all other competitors. The results are based on a benchmark of 1300 assays and half a million compounds.
We compared the transcriptomes of marrow-derived mesenchymal stem cells (MSCs) with differentiated adipocytes, osteocytes, and chondrocytes derived from these MSCs. Using global gene-expression profiling arrays to detect RNA transcripts, we have identified markers that are specific for MSCs and their differentiated progeny. Further, we have also identified pathways that MSCs use to differentiate into adipogenic, chondrogenic, and osteogenic lineages. We identified activin-mediated transforming growth factor (TGF)-beta signaling, platelet-derived growth factor (PDGF) signaling and fibroblast growth factor (FGF) signaling as the key pathways involved in MSC differentiation. The differentiation of MSCs into these lineages is affected when these pathways are perturbed by inhibitors of cell surface receptor function. Since growth and differentiation are tightly linked processes, we also examined the importance of these 3 pathways in MSC growth. These 3 pathways were necessary and sufficient for MSC growth. Inhibiting any of these pathways slowed MSC growth, whereas a combination of TGF-beta, PDGF, and beta-FGF was sufficient to grow MSCs in a serum-free medium up to 5 passages. Thus, this study illustrates it is possible to predict signaling pathways active in cellular differentiation and growth using microarray data and experimentally verify these predictions.
Network pharmacology (NP) provides a new methodological perspective for understanding traditional medicine from a holistic perspective, giving rise to frontiers such as traditional Chinese medicine network pharmacology (TCM-NP). With the development of artificial intelligence (AI) technology, it is key for NP to develop network-based AI methods to reveal the treatment mechanism of complex diseases from massive omics data. In this review, focusing on the TCM-NP, we summarize involved AI methods into three categories: network relationship mining, network target positioning and network target navigating, and present the typical application of TCM-NP in uncovering biological basis and clinical value of Cold/Hot syndromes. Collectively, our review provides researchers with an innovative overview of the methodological progress of NP and its application in TCM from the AI perspective.
Hepatocellular carcinoma (HCC) is one of the leading cancers worldwide. Classically, HCC develops in genetically susceptible individuals who are exposed to risk factors, especially in the presence of liver cirrhosis. Significant temporal and geographic variations exist for HCC and its etiologies. Over time, the burden of HCC has shifted from the low–moderate to the high sociodemographic index regions, reflecting the transition from viral to nonviral causes. Geographically, the hepatitis viruses predominate as the causes of HCC in Asia and Africa. Although there are genetic conditions that confer increased risk for HCC, these diagnoses are rarely recognized outside North America and Europe. In this review, we will evaluate the epidemiologic trends and risk factors of HCC, and discuss the genetics of HCC, including monogenic diseases, single-nucleotide polymorphisms, gut microbiome, and somatic mutations.
The protooncogene MYC encodes the c-Myc transcription factor that regulates cell growth, cell proliferation, cell cycle, and apoptosis. Although deregulation of MYC contributes to tumorigenesis, it is still unclear what direct Myc-induced transcriptomes promote cell transformation. Here we provide a snapshot of genome-wide, unbiased characterization of direct Myc binding targets in a model of human B lymphoid tumor using ChIP coupled with pair-end ditag sequencing analysis (ChIP-PET). Myc potentially occupies > 4,000 genomic loci with the majority near proximal promoter regions associated frequently with CpG islands. Using gene expression profiles with ChIP-PET, we identified 668 direct Myc-regulated gene targets, including 48 transcription factors, indicating that Myc is a central transcriptional hub in growth and proliferation control. This first global genomic view of Myc binding sites yields insights of transcriptional circuitries and cis regulatory modules involving Myc and provides a substantial framework for our understanding of mechanisms of Myc-induced tumorigenesis.
IL-19, IL-20, IL-22, IL-24, and IL-26 are members of the IL-10 family of cytokines that have been shown to be up-regulated in psoriatic skin. Contrary to IL-10, these cytokines signal using receptor complex R1 subunits that are preferentially expressed on cells of epithelial origin; thus, we henceforth refer to them as the IL-20 subfamily cytokines. In this study, we show that primary human keratinocytes (KCs) express receptors for these cytokines and that IL-19, IL-20, IL-22, and IL-24 induce acanthosis in reconstituted human epidermis (RHE) in a dose-dependent manner. These cytokines also induce expression of the psoriasis-associated protein S100A7 and keratin 16 in RHE and cause persistent activation of Stat3 with nuclear localization. IL-22 had the most pronounced effects on KC proliferation and on the differentiation of KCs in RHE, inducing a decrease in the granular cell layer (hypogranulosis). Furthermore, gene expression analysis performed on cultured RHE treated with these cytokines showed that IL-19, IL-20, IL-22, and IL-24 regulate many of these same genes to variable degrees, inducing a gene expression profile consistent with inflammatory responses, wound healing re-epithelialization, and altered differentiation. Many of these genes have also been found to be up-regulated in psoriatic skin, including several chemokines, beta-defensins, S100 family proteins, and kallikreins. These results confirm that IL-20 subfamily cytokines are important regulators of epidermal KC biology with potentially pivotal roles in the immunopathology of psoriasis.
Summary: ClustalW is a tool for aligning multiple protein or nucleotide sequences. The alignment is achieved via three steps: pairwise alignment, guide-tree generation and progressive alignment. ClustalW-MPI is a distributed and parallel implementation of ClustalW. All three steps have been parallelized to reduce the execution time. The software uses a message-passing library called MPI (Message Passing Interface) and runs on distributed workstation clusters as well as on traditional parallel computers. Availability: The source code is written in ISO C and is available at http://www.bii.a-star.edu.sg/software/clustalw-mpi/. An open-source implementation of MPI is available at http://www-unix.mcs.anl.gov/mpi/. Contact: kuobin@bii.a-star.edu.sg
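The pairwise alignment step parallelizes naturally because each pair of sequences can be aligned independently. The sketch below shows one possible way to divide the pairs among workers, using a simple round-robin assignment. The abstract does not describe how ClustalW-MPI schedules its work, so this scheme and the function name `assign_pairs` are assumptions for illustration only, and the sketch is in Python rather than C with MPI.

```python
from itertools import combinations


def assign_pairs(n_sequences: int, n_workers: int):
    """Distribute all unordered sequence pairs round-robin over workers.

    Returns one list of (i, j) index pairs per worker. In an MPI program,
    worker k would correspond to the process with rank k (assumption).
    """
    if n_workers < 1:
        raise ValueError("need at least one worker")
    buckets = [[] for _ in range(n_workers)]
    for k, pair in enumerate(combinations(range(n_sequences), 2)):
        buckets[k % n_workers].append(pair)
    return buckets


if __name__ == "__main__":
    # Four sequences give six pairs, split here between two workers.
    for rank, pairs in enumerate(assign_pairs(4, 2)):
        print(rank, pairs)
```

Each worker would align its own pairs and send the resulting distances back, so that a complete distance matrix is available for guide-tree generation.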
Dinoflagellates are important components of marine ecosystems and essential coral symbionts, yet little is known about their genomes. We report here on the analysis of a high-quality assembly from the 1180-megabase genome of Symbiodinium kawagutii. We annotated protein-coding genes and identified Symbiodinium-specific gene families. No whole-genome duplication was observed, but instead we found active (retro)transposition and gene family expansion, especially in processes important for successful symbiosis with corals. We also documented genes potentially governing sexual reproduction and cyst formation, novel promoter elements, and a microRNA system potentially regulating gene expression in both symbiont and coral. We found biochemical complementarity between genomes of S. kawagutii and the anthozoan Acropora, indicative of host-symbiont coevolution, providing a resource for studying the molecular basis and evolution of coral symbiosis.
Realizing the democratic promise of nanopore sequencing requires the development of new bioinformatics approaches to deal with its specific error characteristics. Here we present GraphMap, a mapping algorithm designed to analyse nanopore sequencing reads, which progressively refines candidate alignments to robustly handle potentially high-error rates and a fast graph traversal to align long reads with speed and high precision (>95%). Evaluation on MinION sequencing data sets against short- and long-read mappers indicates that GraphMap increases mapping sensitivity by 10-80% and maps >95% of bases. GraphMap alignments enabled single-nucleotide variant calling on the human genome with increased sensitivity (15%) over the next best mapper, precise detection of structural variants from length 100 bp to 4 kbp, and species and strain-specific identification of pathogens using MinION reads. GraphMap is available open source under the MIT license at https://github.com/isovic/graphmap.
BACKGROUND: The impact of SARS-CoV-2 variants of concern (VOCs) on disease severity is unclear. In this retrospective study, we compared the outcomes of patients infected with B.1.1.7, B.1.351, and B.1.617.2 with wild-type strains from early 2020. METHODS: National surveillance data from January to May 2021 were obtained and outcomes in relation to VOCs were explored. Detailed patient-level data from all patients with VOC infection admitted to our center between December 2020 and May 2021 were analyzed. Clinical outcomes were compared with a cohort of 846 patients admitted from January to April 2020. RESULTS: A total of 829 patients in Singapore in the study period were infected with these 3 VOCs. After adjusting for age and sex, B.1.617.2 was associated with higher odds of oxygen requirement, intensive care unit admission, or death (adjusted odds ratio [aOR], 4.90; 95% confidence interval [CI]: 1.43-30.78). Of these patients, 157 were admitted to our center. After adjusting for age, sex, comorbidities, and vaccination, the aOR for pneumonia with B.1.617.2 was 1.88 (95% CI: .95-3.76) compared with wild-type. These differences were not seen with B.1.1.7 and B.1.351. Vaccination status was associated with decreased severity. B.1.617.2 was associated with significantly lower polymerase chain reaction cycle threshold (Ct) values and longer duration of Ct value ≤30 (median duration 18 days for B.1.617.2, 13 days for wild-type). CONCLUSIONS: B.1.617.2 was associated with increased severity of illness, and with lower Ct values and longer viral shedding. These findings provide impetus for the rapid implementation of vaccination programs.
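For readers unfamiliar with odds ratios, the sketch below computes an unadjusted odds ratio and a Wald 95% confidence interval from a 2×2 table. It is not the study's analysis: the estimates in the abstract are adjusted for covariates such as age and sex, which this calculation does not do. The function name `odds_ratio_ci` and the counts in the example are hypothetical.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts, not taken from the study.
    estimate, lower, upper = odds_ratio_ci(10, 20, 5, 40)
    print(f"OR = {estimate:.2f}, 95% CI: {lower:.2f}-{upper:.2f}")
```

An interval that includes 1, like the one reported for pneumonia in the abstract, means the data are compatible with no difference in odds between the groups.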