Institute of Bioinformatics and Systems Biology
Facility · Oberschleißheim, Germany
Research output, citation impact, and the most-cited recent papers from Institute of Bioinformatics and Systems Biology (Germany). Aggregated across the NobleBlocks index of 300M+ scholarly works.
Top-cited papers from Institute of Bioinformatics and Systems Biology
This paper reports the genome sequence of domesticated tomato, a major crop plant, and a draft sequence for its closest wild relative; comparative genomics reveal very little divergence between the two genomes but some important differences with the potato genome, another important food crop in the genus Solanum. Tomato (Solanum lycopersicum) is a major crop plant and a model system for fruit development. Solanum is one of the largest angiosperm genera [1] and includes annual and perennial plants from diverse habitats. Here we present a high-quality genome sequence of domesticated tomato, a draft sequence of its closest wild relative, Solanum pimpinellifolium [2], and compare them to each other and to the potato genome (Solanum tuberosum). The two tomato genomes show only 0.6% nucleotide divergence and signs of recent admixture, but show more than 8% divergence from potato, with nine large and several smaller inversions. In contrast to Arabidopsis, but similar to soybean, tomato and potato small RNAs map predominantly to gene-rich chromosomal regions, including gene promoters. The Solanum lineage has experienced two consecutive genome triplications: one that is ancient and shared with rosids, and a more recent one. These triplications set the stage for the neofunctionalization of genes controlling fruit characteristics, such as colour and fleshiness.
eggNOG is a public resource that provides Orthologous Groups (OGs) of proteins at different taxonomic levels, each with integrated and summarized functional annotations. Developments since the latest public release include changes to the algorithm for creating OGs across taxonomic levels, making nested groups hierarchically consistent. This allows for a better propagation of functional terms across nested OGs and led to the novel annotation of 95 890 previously uncharacterized OGs, increasing overall annotation coverage from 67% to 72%. The functional annotations of OGs have been expanded to also provide Gene Ontology terms, KEGG pathways and SMART/Pfam domains for each group. Moreover, eggNOG now provides pairwise orthology relationships within OGs based on analysis of phylogenetic trees. We have also incorporated a framework for quickly mapping novel sequences to OGs based on precomputed HMM profiles. Finally, eggNOG version 4.5 incorporates a novel data set spanning 2605 viral OGs, covering 5228 proteins from 352 viral proteomes. All data are accessible for bulk downloading, as a web-service, and through a completely redesigned web interface. The new access points provide faster searches and a number of new browsing and visualization capabilities, facilitating the needs of both experts and less experienced users. eggNOG v4.5 is available at http://eggnog.embl.de.
The field of microbiome research has evolved rapidly over the past few decades and has become a topic of great scientific and public interest. As a result of this rapid growth in interest covering different fields, we are lacking a clear commonly agreed definition of the term "microbiome." Moreover, a consensus on best practices in microbiome research is missing. Recently, a panel of international experts discussed the current gaps in the frame of the European-funded MicrobiomeSupport project. The meeting brought together about 40 leaders from diverse microbiome areas, while more than a hundred experts from all over the world took part in an online survey accompanying the workshop. This article excerpts the outcomes of the workshop and the corresponding online survey embedded in a short historical introduction and future outlook. We propose a definition of microbiome based on the compact, clear, and comprehensive description of the term provided by Whipps et al. in 1988, amended with a set of novel recommendations considering the latest technological developments and research findings. We clearly separate the terms microbiome and microbiota and provide a comprehensive discussion considering the composition of microbiota, the heterogeneity and dynamics of microbiomes in time and space, the stability and resilience of microbial networks, the definition of core microbiomes, and functionally relevant keystone species as well as co-evolutionary principles of microbe-host and inter-species interactions within the microbiome. These broad definitions together with the suggested unifying concepts will help to improve standardization of microbiome studies in the future, and could be the starting point for an integrated assessment of data resulting in a more rapid transfer of knowledge from basic science into practice. 
Furthermore, microbiome standards are important for solving new challenges associated with anthropogenic-driven changes in the field of planetary health, for which the understanding of microbiomes might play a key role.
Radiofrequency electromagnetic fields (EMFs) are used to enable a number of modern devices, including mobile telecommunications infrastructure and phones, Wi-Fi, and Bluetooth. As radiofrequency EMFs at sufficiently high power levels can adversely affect health, ICNIRP published Guidelines in 1998 for human exposure to time-varying EMFs up to 300 GHz, which included the radiofrequency EMF spectrum. Since that time, there has been a considerable body of science further addressing the relation between radiofrequency EMFs and adverse health outcomes, as well as significant developments in the technologies that use radiofrequency EMFs. Accordingly, ICNIRP has updated the radiofrequency EMF part of the 1998 Guidelines. This document presents these revised Guidelines, which provide protection for humans from exposure to EMFs from 100 kHz to 300 GHz.
We report the draft genome sequence of the model moss Physcomitrella patens and compare its features with those of flowering plants, from which it is separated by more than 400 million years, and unicellular aquatic algae. This comparison reveals genomic changes concomitant with the evolutionary movement to land, including a general increase in gene family complexity; loss of genes associated with aquatic environments (e.g., flagellar arms); acquisition of genes for tolerating terrestrial stresses (e.g., variation in temperature and water availability); and the development of the auxin and abscisic acid signaling pathways for coordinating multicellular growth and dehydration response. The Physcomitrella genome provides a resource for phylogenetic inferences about gene function and for experimental analysis of plant processes through this plant's unique facility for reverse genetics.
MicroRNAs (miRNAs), i.e. small non-coding RNA molecules (∼22 nt), can bind to one or more target sites on a gene transcript to negatively regulate protein expression, subsequently controlling many cellular mechanisms. A current and curated collection of miRNA-target interactions (MTIs) with experimental support is essential to thoroughly elucidating miRNA functions under different conditions and in different species. As a database, miRTarBase has accumulated more than 3500 MTIs by manually surveying pertinent literature after data mining of the text systematically to filter research articles related to functional studies of miRNAs. Generally, the collected MTIs are validated experimentally by reporter assays, western blot, or microarray experiments with overexpression or knockdown of miRNAs. miRTarBase curates 3576 experimentally verified MTIs between 657 miRNAs and 2297 target genes among 17 species. miRTarBase contains the largest amount of validated MTIs by comparing with other similar, previously developed databases. The MTIs collected in the miRTarBase can also provide a large amount of positive samples to develop computational methods capable of identifying miRNA-target interactions. miRTarBase is now available on http://miRTarBase.mbc.nctu.edu.tw/, and is updated frequently by continuously surveying research articles.
CORUM is a database that provides a manually curated repository of experimentally characterized protein complexes from mammalian organisms, mainly human (64%), mouse (16%) and rat (12%). Protein complexes are key molecular entities that integrate multiple gene products to perform cellular functions. The new CORUM 2.0 release encompasses 2837 protein complexes offering the largest and most comprehensive publicly available dataset of mammalian protein complexes. The CORUM dataset is built from 3198 different genes, representing approximately 16% of the protein coding genes in humans. Each protein complex is described by a protein complex name, subunit composition, function as well as the literature reference that characterizes the respective protein complex. Recent developments include mapping of functional annotation to Gene Ontology terms as well as cross-references to Entrez Gene identifiers. In addition, a 'Phylogenetic Conservation' analysis tool was implemented that analyses the potential occurrence of orthologous protein complex subunits in mammals and other selected groups of organisms. This allows one to predict the occurrence of protein complexes in different phylogenetic groups. CORUM is freely accessible at (http://mips.helmholtz-muenchen.de/genre/proj/corum/index.html).
Receptor-like kinases (RLKs) belong to the large RLK/Pelle gene family, and it is known that the Arabidopsis thaliana genome contains >600 such members, which play important roles in plant growth, development, and defense responses. Surprisingly, we found that rice (Oryza sativa) has nearly twice as many RLK/Pelle members as Arabidopsis does, and it is not simply a consequence of a larger predicted gene number in rice. From the inferred phylogeny of all Arabidopsis and rice RLK/Pelle members, we estimated that the common ancestor of Arabidopsis and rice had >440 RLK/Pelles and that large-scale expansions of certain RLK/Pelle members and fusions of novel domains have occurred in both the Arabidopsis and rice lineages since their divergence. In addition, the extracellular domains have higher nonsynonymous substitution rates than the intracellular domains, consistent with the role of extracellular domains in sensing diverse signals. The lineage-specific expansions in Arabidopsis can be attributed to both tandem and large-scale duplications, whereas tandem duplication seems to be the major mechanism for recent expansions in rice. Interestingly, although the RLKs that are involved in development seem to have rarely been duplicated after the Arabidopsis-rice split, those that are involved in defense/disease resistance apparently have undergone many duplication events. These findings led us to hypothesize that most of the recent expansions of the RLK/Pelle family have involved defense/resistance-related genes.
Human heat-shock protein (HSP)70 activates innate immune cells and hence requires no additional adjuvants to render bound peptides immunogenic. Here we tested the assumption that endogenous HSP70 activates the Toll/IL-1 receptor signal pathway similar to HSP60 and pathogen-derived molecular patterns. We show that HSP70 induces interleukin-12 (IL-12) and endothelial cell-leukocyte adhesion molecule-1 (ELAM-1) promoters in macrophages and that this is controlled by MyD88 and TRAF6. Furthermore, HSP70 causes MyD88 relocalization and MyD88-deficient dendritic cells do not respond to HSP70 with proinflammatory cytokine production. Using the system of genetic complementation with Toll-like receptors (TLR) we found that TLR2 and TLR4 confer responsiveness to HSP70 in 293T fibroblasts. The expanding list of endogenous ligands able to activate the ancient Toll/IL-1 receptor signal pathway is in line with the "danger hypothesis" proposing that the innate immune system senses danger signals even if they originate from self.
We sequenced and annotated the genome of the filamentous fungus Fusarium graminearum, a major pathogen of cultivated cereals. Very few repetitive sequences were detected, and the process of repeat-induced point mutation, in which duplicated sequences are subject to extensive mutation, may partially account for the reduced repeat content and apparent low number of paralogous (ancestrally duplicated) genes. A second strain of F. graminearum contained more than 10,000 single-nucleotide polymorphisms, which were frequently located near telomeres and within other discrete chromosomal segments. Many highly polymorphic regions contained sets of genes implicated in plant-fungus interactions and were unusually divergent, with higher rates of recombination. These regions of genome innovation may result from selection due to interactions of F. graminearum with its plant hosts.
Genome-wide association studies (GWAS) with intermediate phenotypes, like changes in metabolite and protein levels, provide functional evidence to map disease associations and translate them into clinical applications. However, although hundreds of genetic variants have been associated with complex disorders, the underlying molecular pathways often remain elusive. Associations with intermediate traits are key in establishing functional links between GWAS-identified risk-variants and disease end points. Here we describe a GWAS using a highly multiplexed aptamer-based affinity proteomics platform. We quantify 539 associations between protein levels and gene variants (pQTLs) in a German cohort and replicate over half of them in an Arab and Asian cohort. Fifty-five of the replicated pQTLs are located in trans. Our associations overlap with 57 genetic risk loci for 42 unique disease end points. We integrate this information into a genome-proteome network and provide an interactive web-tool for interrogations. Our results provide a basis for novel approaches to pharmaceutical and diagnostic applications.
MicroRNA-122 (miR-122), which accounts for 70% of the liver's total miRNAs, plays a pivotal role in the liver. However, its intrinsic physiological roles remain largely undetermined. We demonstrated that mice lacking the gene encoding miR-122a (Mir122a) are viable but develop temporally controlled steatohepatitis, fibrosis, and hepatocellular carcinoma (HCC). These mice exhibited a striking disparity in HCC incidence based on sex, with a male-to-female ratio of 3.9:1, which recapitulates the disease incidence in humans. Impaired expression of microsomal triglyceride transfer protein (MTTP) contributed to steatosis, which was reversed by in vivo restoration of Mttp expression. We found that hepatic fibrosis onset can be partially attributed to the action of a miR-122a target, the Klf6 transcript. In addition, Mir122a(-/-) livers exhibited disruptions in a range of pathways, many of which closely resemble the disruptions found in human HCC. Importantly, the reexpression of miR-122a reduced disease manifestation and tumor incidence in Mir122a(-/-) mice. This study demonstrates that mice with a targeted deletion of the Mir122a gene possess several key phenotypes of human liver diseases, which provides a rationale for the development of a unique therapy for the treatment of chronic liver disease and HCC.
The calcium-dependent homophilic cell adhesion molecule and candidate suppressor gene, E (epithelial)-cadherin, plays a major role in the organization and integrity of most epithelial tissues. Diffusely growing gastric carcinomas show markedly reduced homophilic cell-to-cell interactions. We speculated that mutations in the E-cadherin gene may be responsible for the scattered phenotype of this type of carcinoma. For that reason we have examined E-cadherin in 26 diffuse type, 20 intestinal type and 7 mixed gastric carcinomas (Laurén's classification) at the DNA, RNA, and protein levels. Reverse transcription polymerase chain reaction and direct sequencing of amplified E-cadherin complementary DNA fragments revealed inframe skipping of either exon 8 or exon 9 in 10 patients with diffuse tumors and an exon 9 deletion in one patient with a mixed carcinoma; both exons encode putative calcium binding domains. These alterations were not seen in nontumorous gastric tissues. Splice site mutations responsible for the exon deletions were identified in six of these patients, eliminating the possibility of alternative splicing mechanisms. Five of these splice site alterations were confirmed as somatic mutations. Non-splice site mutations were observed in three diffuse type tumors, namely a 69-base pair deletion of exon 10 and two point mutations, one of which destroys a putative calcium binding region. Immunohistochemical evaluation showed E-cadherin immunoreactivity in tumors and lymph node metastases of patients expressing abnormal mRNA. The allelic status of the E-cadherin gene was analyzed in one patient, revealing loss of heterozygosity with retention of a mutated E-cadherin allele. Overall, E-cadherin mutations were identified in 50% (13 of 26) of the diffuse type and in 14% (1 of 7) of the mixed carcinomas. In contrast, two silent E-cadherin mutations (not changing the amino acid sequence) were detected in two tumors of the intestinal type. 
Our study provides strong in vivo evidence that E-cadherin gene mutations may contribute to the development of diffusely growing gastric carcinomas and support a tumor/metastasis suppressor gene hypothesis.
The rapidly evolving field of metabolomics aims at a comprehensive measurement of ideally all endogenous metabolites in a cell or body fluid. It thereby provides a functional readout of the physiological state of the human body. Genetic variants that associate with changes in the homeostasis of key lipids, carbohydrates, or amino acids are not only expected to display much larger effect sizes due to their direct involvement in metabolite conversion modification, but should also provide access to the biochemical context of such variations, in particular when enzyme coding genes are concerned. To test this hypothesis, we conducted what is, to the best of our knowledge, the first GWA study with metabolomics based on the quantitative measurement of 363 metabolites in serum of 284 male participants of the KORA study. We found associations of frequent single nucleotide polymorphisms (SNPs) with considerable differences in the metabolic homeostasis of the human body, explaining up to 12% of the observed variance. Using ratios of certain metabolite concentrations as a proxy for enzymatic activity, up to 28% of the variance can be explained (p-values 10(-16) to 10(-21)). We identified four genetic variants in genes coding for enzymes (FADS1, LIPC, SCAD, MCAD) where the corresponding metabolic phenotype (metabotype) clearly matches the biochemical pathways in which these enzymes are active. Our results suggest that common genetic polymorphisms induce major differentiations in the metabolic make-up of the human population. This may lead to a novel approach to personalized health care based on a combination of genotyping and metabolic characterization. These genetically determined metabotypes may subscribe the risk for a certain medical phenotype, the response to a given drug treatment, or the reaction to a nutritional intervention or environmental challenge.
Sequencing and analysing the diploid genome and transcriptome of Aegilops tauschii provide new insights into the role of this genome in enabling the adaptation of bread wheat and are a step towards understanding the very large and complicated hexaploid genomes of wheat species. The hexaploid genome of bread wheat Triticum aestivum, designated AABBDD, evolved as a result of hybridization between three ancestral grasses. Two papers published in the issue of Nature present genome sequences and analysis of two of these wheat progenitors. First, the genome sequence of the diploid wild wheat T. urartu (ancestor of the A genome), which resembles cultivated wheat more strongly than either Aegilops speltoides (the B ancestor) or Ae. tauschii (the D donor). And second, the Ae. tauschii genome, together with an analysis of its transcriptome. These genomes and their analyses will be powerful tools for the study of complex, polyploid wheat genomes and a valuable resource for genetic improvement of wheat. About 8,000 years ago in the Fertile Crescent, a spontaneous hybridization of the wild diploid grass Aegilops tauschii (2n = 14; DD) with the cultivated tetraploid wheat Triticum turgidum (2n = 4x = 28; AABB) resulted in hexaploid wheat (T. aestivum; 2n = 6x = 42; AABBDD) [1,2]. Wheat has since become a primary staple crop worldwide as a result of its enhanced adaptability to a wide range of climates and improved grain quality for the production of baker’s flour [2]. Here we describe sequencing the Ae. tauschii genome and obtaining a roughly 90-fold depth of short reads from libraries with various insert sizes, to gain a better understanding of this genetically complex plant. The assembled scaffolds represented 83.4% of the genome, of which 65.9% comprised transposable elements. We generated comprehensive RNA-Seq data and used it to identify 43,150 protein-coding genes, of which 30,697 (71.1%) were uniquely anchored to chromosomes with an integrated high-density genetic map.
Whole-genome analysis revealed gene family expansion in Ae. tauschii of agronomically relevant gene families that were associated with disease resistance, abiotic stress tolerance and grain quality. This draft genome sequence provides insight into the environmental adaptation of bread wheat and can aid in defining the large and complicated genomes of wheat species.
Type 2 diabetes (T2D) can be prevented in pre-diabetic individuals with impaired glucose tolerance (IGT). Here, we have used a metabolomics approach to identify candidate biomarkers of pre-diabetes. We quantified 140 metabolites for 4297 fasting serum samples in the population-based Cooperative Health Research in the Region of Augsburg (KORA) cohort. Our study revealed significant metabolic variation in pre-diabetic individuals that are distinct from known diabetes risk indicators, such as glycosylated hemoglobin levels, fasting glucose and insulin. We identified three metabolites (glycine, lysophosphatidylcholine (LPC) (18:2) and acetylcarnitine) that had significantly altered levels in IGT individuals as compared to those with normal glucose tolerance, with P-values ranging from 2.4×10(-4) to 2.1×10(-13). Lower levels of glycine and LPC were found to be predictors not only for IGT but also for T2D, and were independently confirmed in the European Prospective Investigation into Cancer and Nutrition (EPIC)-Potsdam cohort. Using metabolite-protein network analysis, we identified seven T2D-related genes that are associated with these three IGT-specific metabolites by multiple interactions with four enzymes. The expression levels of these enzymes correlate with changes in the metabolite concentrations linked to diabetes. Our results may help developing novel strategies to prevent T2D.
INTRODUCTION: Increasing evidence suggests a role for the gut microbiome in central nervous system disorders and a specific role for the gut-brain axis in neurodegeneration. Bile acids (BAs), products of cholesterol metabolism and clearance, are produced in the liver and are further metabolized by gut bacteria. They have major regulatory and signaling functions and seem dysregulated in Alzheimer's disease (AD). METHODS: Serum levels of 15 primary and secondary BAs and their conjugated forms were measured in 1464 subjects including 370 cognitively normal older adults, 284 with early mild cognitive impairment, 505 with late mild cognitive impairment, and 305 AD cases enrolled in the AD Neuroimaging Initiative. We assessed associations of BA profiles including selected ratios with diagnosis, cognition, and AD-related genetic variants, adjusting for confounders and multiple testing. RESULTS: In AD compared to cognitively normal older adults, we observed significantly lower serum concentrations of a primary BA (cholic acid [CA]) and increased levels of the bacterially produced, secondary BA, deoxycholic acid, and its glycine and taurine conjugated forms. An increased ratio of deoxycholic acid:CA, which reflects 7α-dehydroxylation of CA by gut bacteria, strongly associated with cognitive decline, a finding replicated in serum and brain samples in the Rush Religious Orders and Memory and Aging Project. Several genetic variants in immune response-related genes implicated in AD showed associations with BA profiles. DISCUSSION: We report for the first time an association between altered BA profile, genetic variants implicated in AD, and cognitive changes in disease using a large multicenter study. These findings warrant further investigation of gut dysbiosis and possible role of gut-liver-brain axis in the pathogenesis of AD.
Picoeukaryotes are a taxonomically diverse group of organisms less than 2 micrometers in diameter. Photosynthetic marine picoeukaryotes in the genus Micromonas thrive in ecosystems ranging from tropical to polar and could serve as sentinel organisms for biogeochemical fluxes of modern oceans during climate change. These broadly distributed primary producers belong to an anciently diverged sister clade to land plants. Although Micromonas isolates have high 18S ribosomal RNA gene identity, we found that genomes from two isolates shared only 90% of their predicted genes. Their independent evolutionary paths were emphasized by distinct riboswitch arrangements as well as the discovery of intronic repeat elements in one isolate, and in metagenomic data, but not in other genomes. Divergence appears to have been facilitated by selection and acquisition processes that actively shape the repertoire of genes that are mutually exclusive between the two isolates differently than the core genes. Analyses of the Micromonas genomes offer valuable insights into ecological differentiation and the dynamic nature of early plant evolution.
We produced a reference sequence of the 1-gigabase chromosome 3B of hexaploid bread wheat. By sequencing 8452 bacterial artificial chromosomes in pools, we assembled a sequence of 774 megabases carrying 5326 protein-coding genes, 1938 pseudogenes, and 85% of transposable elements. The distribution of structural and functional features along the chromosome revealed partitioning correlated with meiotic recombination. Comparative analyses indicated high wheat-specific inter- and intrachromosomal gene duplication activities that are potential sources of variability for adaption. In addition to providing a better understanding of the organization, function, and evolution of a large and polyploid genome, the availability of a high-quality sequence anchored to genetic maps will accelerate the identification of genes underlying important agronomic traits.
The human gut microbiome has been associated with many health factors but variability between studies limits exploration of effects between them. Gut microbiota profiles are available for >2700 members of the deeply phenotyped TwinsUK cohort, providing a uniform platform for such comparisons. Here, we present gut microbiota association analyses for 38 common diseases and 51 medications within the cohort. We describe several novel associations, highlight associations common across multiple diseases, and determine which diseases and medications have the greatest association with the gut microbiota. These results provide a reference for future studies of the gut microbiome and its role in human health.