Hasso Plattner Institute
Facility: Potsdam, Germany
Research output, citation impact, and the most-cited recent papers from Hasso Plattner Institute (Germany). Aggregated across the NobleBlocks index of 300M+ scholarly works.
Top-cited papers from Hasso Plattner Institute
Population isolates such as those in Finland benefit genetic research because deleterious alleles are often concentrated on a small number of low-frequency variants (0.1% ≤ minor allele frequency < 5%). These variants survived the founding bottleneck rather than being distributed over a large number of ultrarare variants. Although this effect is well established in Mendelian genetics, its value in common disease genetics is less explored. FinnGen aims to study the genome and national health register data of 500,000 Finnish individuals. Given the relatively high median age of participants (63 years) and the substantial fraction of hospital-based recruitment, FinnGen is enriched for disease end points. Here we analyse data from 224,737 participants from FinnGen and study 15 diseases that have previously been investigated in large genome-wide association studies (GWASs). We also include meta-analyses of biobank data from Estonia and the United Kingdom. We identified 30 new associations, primarily low-frequency variants, enriched in the Finnish population. A GWAS of 1,932 diseases also identified 2,733 genome-wide significant associations (893 phenome-wide significant (PWS), P < 2.6 × 10⁻¹¹) at 2,496 (771 PWS) independent loci with 807 (247 PWS) end points. Among these, fine-mapping implicated 148 (73 PWS) coding variants associated with 83 (42 PWS) end points. Moreover, 91 (47 PWS) had an allele frequency of <5% in non-Finnish European individuals, of which 62 (32 PWS) were enriched by more than twofold in Finland. These findings demonstrate the power of bottlenecked populations to find entry points into the biology of common diseases through low-frequency, high impact variants.
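The phenome-wide threshold quoted in this abstract can be sanity-checked with a little arithmetic. The sketch below assumes, without confirmation from the abstract itself, that the threshold is a Bonferroni-style correction: the conventional genome-wide significance level of 5 × 10⁻⁸ divided by the 1,932 end points tested. The constant names are illustrative only.

```python
# Assumption: the conventional genome-wide significance level (not stated in the abstract).
GENOME_WIDE_ALPHA = 5e-8
# Number of disease end points in the GWAS, as reported in the abstract.
N_ENDPOINTS = 1932

# Bonferroni-style correction across all end points.
phenome_wide_alpha = GENOME_WIDE_ALPHA / N_ENDPOINTS

# Rounded to two significant digits for comparison with the reported threshold.
formatted = f"{phenome_wide_alpha:.1e}"
print(formatted)
```

Under that assumption the result rounds to the reported value of 2.6 × 10⁻¹¹.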
The DBpedia community project extracts structured, multilingual knowledge from Wikipedia and makes it freely available on the Web using Semantic Web and Linked Data technologies. The project extracts knowledge from 111 different language editions of
The development of the Internet in recent years has made it possible and useful to access many different information systems anywhere in the world to obtain information. While there is much research on the integration of heterogeneous information systems, most commercial systems stop short of the actual integration of available data. Data fusion is the process of fusing multiple records representing the same real-world object into a single, consistent, and clean representation. This article places data fusion into the greater context of data integration, precisely defines the goals of data fusion, namely, complete, concise, and consistent data, and highlights the challenges of data fusion, namely, uncertain and conflicting data values. We give an overview and classification of different ways of fusing data and present several techniques based on standard and advanced operators of the relational algebra and SQL. Finally, the article features a comprehensive survey of data integration systems from academia and industry, showing if and how data fusion is performed in each.
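To make the idea of data fusion concrete, here is a minimal sketch of fusing duplicate records into one representation per real-world object. It is not the article's method: the function name `fuse_records`, the sample records, and the conflict-resolution strategy (skip missing values, then take a majority vote) are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict


def fuse_records(records, key):
    """Fuse all records that share the same key value into a single record."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)

    fused = []
    for key_value, group in groups.items():
        merged = {key: key_value}
        # Union of attributes across sources, so the result is complete.
        attributes = sorted({a for rec in group for a in rec if a != key})
        for attr in attributes:
            # Skip missing values so another source can fill the gap.
            values = [rec[attr] for rec in group if rec.get(attr) is not None]
            if not values:
                merged[attr] = None
            else:
                # Resolve conflicts by majority vote; ties go to the value seen first.
                merged[attr] = Counter(values).most_common(1)[0][0]
        fused.append(merged)
    return fused


# Hypothetical input: three sources describe object "1", one describes object "2".
records = [
    {"isbn": "1", "title": "Data Fusion", "year": 2008, "pages": None},
    {"isbn": "1", "title": "Data Fusion", "year": 2009, "pages": 41},
    {"isbn": "1", "title": "Data fusion", "year": 2008},
    {"isbn": "2", "title": "Other", "year": 2001},
]
result = fuse_records(records, "isbn")
```

The output has one record per key (concise), every attribute any source supplied (complete), and a single value per attribute (consistent). Majority voting is only one of many possible resolution strategies.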
Blood cells play essential roles in human health, underpinning physiological processes such as immunity, oxygen transport, and clotting, which when perturbed cause a significant global health burden. Here we integrate data from UK Biobank and a large-scale international collaborative effort, including data for 563,085 European ancestry participants, and discover 5,106 new genetic variants independently associated with 29 blood cell phenotypes covering a range of variation impacting hematopoiesis. We holistically characterize the genetic architecture of hematopoiesis, assess the relevance of the omnigenic model to blood cell phenotypes, delineate relevant hematopoietic cell states influenced by regulatory genetic variants and gene networks, identify novel splice-altering variants mediating the associations, and assess the polygenic prediction potential for blood traits and clinical disorders at the interface of complex and Mendelian genetics. These results show the power of large-scale blood cell trait GWAS to interrogate clinically meaningful variants across a wide allelic spectrum of human variation.
Significance Statement: Early reports have indicated that AKI and other kidney abnormalities are associated with coronavirus disease 2019 (COVID-19). Of 3993 hospitalized patients with COVID-19 in a New York City health system, AKI occurred in 1835 (46%) patients; among patients with AKI, 19% required dialysis, and half of them died in the hospital. Among patients who were discharged, 35% had not recovered to baseline kidney function at the time of discharge. AKI is common among patients with COVID-19 and is associated with higher mortality than in patients without AKI; among those who survive, only about a third are discharged with renal recovery. These findings may help centers with resource planning and preparing for the increased load resulting from survivors of COVID-19–associated AKI who do not experience recovery of kidney function. Background: Early reports indicate that AKI is common among patients with coronavirus disease 2019 (COVID-19) and associated with worse outcomes. However, AKI among hospitalized patients with COVID-19 in the United States is not well described. Methods: This retrospective, observational study involved a review of data from electronic health records of patients aged ≥18 years with laboratory-confirmed COVID-19 admitted to the Mount Sinai Health System from February 27 to May 30, 2020. We describe the frequency of AKI and dialysis requirement, AKI recovery, and adjusted odds ratios (aORs) with mortality. Results: Of 3993 hospitalized patients with COVID-19, AKI occurred in 1835 (46%) patients; 347 (19%) of the patients with AKI required dialysis. The proportions with stages 1, 2, or 3 AKI were 39%, 19%, and 42%, respectively. A total of 976 (24%) patients were admitted to intensive care, and 745 (76%) experienced AKI. Of the 435 patients with AKI and urine studies, 84% had proteinuria, 81% had hematuria, and 60% had leukocyturia. Independent predictors of severe AKI were CKD, male sex, and higher serum potassium at admission. In-hospital mortality was 50% among patients with AKI versus 8% among those without AKI (aOR, 9.2; 95% confidence interval, 7.5 to 11.3). Of survivors with AKI who were discharged, 35% had not recovered to baseline kidney function by the time of discharge. An additional 28 of 77 (36%) patients who had not recovered kidney function at discharge did so on posthospital follow-up. Conclusions: AKI is common among patients hospitalized with COVID-19 and is associated with high mortality. Of all patients with AKI, only 30% survived with recovery of kidney function by the time of discharge.
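The percentages in this abstract can be recomputed from the raw counts it reports. The short check below uses only numbers stated in the abstract; the labels and the `pct` helper are illustrative.

```python
def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to the nearest integer."""
    return round(100 * part / whole)


# (label, numerator, denominator, percentage reported in the abstract)
reported = [
    ("AKI among hospitalized patients", 1835, 3993, 46),
    ("dialysis among patients with AKI", 347, 1835, 19),
    ("intensive care admission among hospitalized patients", 976, 3993, 24),
    ("AKI among intensive care patients", 745, 976, 76),
    ("kidney function recovered on follow-up", 28, 77, 36),
]

# Recompute each percentage and compare it with the reported value.
mismatches = [
    (label, pct(part, whole), expected)
    for label, part, whole, expected in reported
    if pct(part, whole) != expected
]
print(mismatches)
```

An empty list means every reported percentage agrees with its counts after rounding.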
Blockchain technology offers a sizable promise to rethink the way interorganizational business processes are managed because of its potential to realize execution without a central party serving as a single point of trust (and failure). To stimulate research on this promise and the limits thereof, in this article, we outline the challenges and opportunities of blockchain for business process management (BPM). We first reflect how blockchains could be used in the context of the established BPM lifecycle and second how they might become relevant beyond. We conclude our discourse with a summary of seven research directions for investigating the application of blockchain technology in the context of BPM.
With the success of wireless technologies in consumer electronics, standard wireless technologies are envisioned for the deployment in industrial environments as well. Industrial applications involving mobile subsystems or just the desire to save cabling make wireless technologies attractive. Nevertheless, these applications often have stringent requirements on reliability and timing. In wired environments, timing and reliability are well catered for by fieldbus systems (which are a mature technology designed to enable communication between digital controllers and the sensors and actuators interfacing to a physical process). When wireless links are included, reliability and timing requirements are significantly more difficult to meet, due to the adverse properties of the radio channels. In this paper, we thus discuss some key issues coming up in wireless fieldbus and wireless industrial communication systems: 1) fundamental problems like achieving timely and reliable transmission despite channel errors; 2) the usage of existing wireless technologies for this specific field of applications; and 3) the creation of hybrid systems in which wireless stations are incorporated into existing wired systems.
Here we describe the release of exome-sequence data for the first 49,960 study participants, revealing approximately 4 million coding variants (of which around 98.6% have a frequency of less than 1%). The data include 198,269 autosomal predicted loss-of-function (LOF) variants, a more than 14-fold increase compared to the imputed sequence. Nearly all genes (more than 97%) had at least one carrier with a LOF variant, and most genes (more than 69%) had at least ten carriers with a LOF variant. We illustrate the power of characterizing LOF variants in this population through association analyses across 1,730 phenotypes. In addition to replicating established associations, we found novel LOF variants with large effects on disease traits, including PIEZO1 on varicose veins, COL6A1 on corneal resistance, MEPE on bone density, and IQGAP2 and GMPR on blood cell traits. We further demonstrate the value of exome sequencing by surveying the prevalence of pathogenic variants of clinical importance, and show that 2% of this population has a medically actionable variant. Furthermore, we characterize the penetrance of cancer in carriers of pathogenic BRCA1 and BRCA2 variants. Exome sequences from the first 49,960 participants highlight the promise of genome sequencing in large population-based studies and are now accessible to the scientific community.
A worldwide movement in advanced manufacturing countries is seeking to reinvigorate (and revolutionize) the industrial and manufacturing core competencies with the use of the latest advances in information and communications technology. Visual computing plays an important role as the "glue factor" in complete solutions. This article positions visual computing in its intrinsic crucial role for Industrie 4.0 and provides a general, broad overview and points out specific directions and scenarios for future research.
MOTIVATION: Text mining has become an important tool for biomedical research. The most fundamental text-mining task is the recognition of biomedical named entities (NER), such as genes, chemicals and diseases. Current NER methods rely on pre-defined features which try to capture the specific surface properties of entity types, properties of the typical local context, background knowledge, and linguistic information. State-of-the-art tools are entity-specific, as dictionaries and empirically optimal feature sets differ between entity types, which makes their development costly. Furthermore, features are often optimized for a specific gold standard corpus, which makes extrapolation of quality measures difficult. RESULTS: We show that a completely generic method based on deep learning and statistical word embeddings [called long short-term memory network-conditional random field (LSTM-CRF)] outperforms state-of-the-art entity-specific NER tools, and often by a large margin. To this end, we compared the performance of LSTM-CRF on 33 data sets covering five different entity classes with that of best-of-class NER tools and an entity-agnostic CRF implementation. On average, F1-score of LSTM-CRF is 5% above that of the baselines, mostly due to a sharp increase in recall. AVAILABILITY AND IMPLEMENTATION: The source code for LSTM-CRF is available at https://github.com/glample/tagger and the links to the corpora are available at https://corposaurus.github.io/corpora/ . CONTACT: habibima@informatik.hu-berlin.de.
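For readers unfamiliar with how NER tools are compared, the sketch below computes precision, recall, and F1 over predicted entity spans. It assumes exact-match scoring on (start, end, type) triples, which is a common convention but is not confirmed by the abstract; the function name and the sample spans are hypothetical.

```python
def entity_prf(gold, predicted):
    """Return (precision, recall, F1) for exact-match entity spans."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical annotations as (start offset, end offset, entity type).
gold = {(0, 5, "Gene"), (10, 17, "Chemical"), (20, 28, "Disease"), (30, 34, "Gene")}
predicted = {(0, 5, "Gene"), (10, 17, "Chemical"), (21, 28, "Disease")}

precision, recall, f1 = entity_prf(gold, predicted)
```

In this example the tool misses one entity and gets one boundary wrong, so recall (2 of 4) is lower than precision (2 of 3). Because F1 is the harmonic mean of the two, a gain in recall like the one the abstract describes raises F1 directly.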
Context-dependent behavior is becoming increasingly important for a wide range of application domains, from pervasive computing to common business applications. Unfortunately, mainstream programming languages do not provide mechanisms that enable software entities to adapt their behavior dynamically to the current execution context. This leads developers to adopt convoluted designs to achieve the necessary runtime flexibility. We propose a new programming technique called Context-oriented Programming (COP) which addresses this problem. COP treats context explicitly, and provides mechanisms to dynamically adapt behavior in reaction to changes in context, even after system deployment at runtime. In this paper, we lay the foundations of COP, show how dynamic layer activation enables multi-dimensional dispatch, illustrate the application of COP by examples in several language extensions, and demonstrate that COP is largely independent of other commitments to programming style.
Population isolates such as Finland provide benefits in genetic studies because the allelic spectrum of damaging alleles in any gene is often concentrated on a small number of low-frequency variants (0.1% ≤ minor allele frequency < 5%), which survived the founding bottleneck, as opposed to being distributed over a much larger number of ultra-rare variants. While this advantage is well established in Mendelian genetics, its value in common disease genetics has been less explored. FinnGen aims to study the genome and national health register data of 500,000 Finns, already reaching 224,737 genotyped and phenotyped participants. Given the relatively high median age of participants (63 years) and dominance of hospital-based recruitment, FinnGen is enriched for many disease endpoints often underrepresented in population-based studies (e.g., rarer immune-mediated diseases and late onset degenerative and ophthalmologic endpoints). We report here a genome-wide association study (GWAS) of 1,932 clinical endpoints defined from nationwide health registries. We identify genome-wide significant associations at 2,491 independent loci. Among these, fine-mapping implicates 148 putatively causal coding variants associated with 202 endpoints, 104 with low allele frequency (AF<10%) of which 62 were over two-fold enriched in Finland. We studied a benchmark set of 15 diseases that had previously been investigated in large genome-wide association studies. FinnGen discovery analyses were meta-analysed in Estonian and UK biobanks. We identify 30 novel associations, primarily low-frequency variants strongly enriched in, or specific to, the Finnish population and Uralic language family neighbors in Estonia and Russia. These findings demonstrate the power of bottlenecked populations to find unique entry points into the biology of common diseases through low-frequency, high impact variants.
Such high impact variants have a potential to contribute to medical translation including drug discovery.
Evaluating metagenomic software is key for optimizing metagenome interpretation and focus of the Initiative for the Critical Assessment of Metagenome Interpretation (CAMI). The CAMI II challenge engaged the community to assess methods on realistic and complex datasets with long- and short-read sequences, created computationally from around 1,700 new and known genomes, as well as 600 new plasmids and viruses. Here we analyze 5,002 results by 76 program versions. Substantial improvements were seen in assembly, some due to long-read data. Related strains still were challenging for assembly and genome recovery through binning, as was assembly quality for the latter. Profilers markedly matured, with taxon profilers and binners excelling at higher bacterial ranks, but underperforming for viruses and Archaea. Clinical pathogen detection results revealed a need to improve reproducibility. Runtime and memory usage analyses identified efficient programs, including top performers with other metrics. The results identify challenges and guide researchers in selecting methods for analyses.
Planning has been very successful for control tasks with known environment dynamics. To leverage planning in unknown environments, the agent needs to learn the dynamics from interactions with the world. However, learning dynamics models that are accurate enough for planning has been a long-standing challenge, especially in image-based domains. We propose the Deep Planning Network (PlaNet), a purely model-based agent that learns the environment dynamics from images and chooses actions through fast online planning in latent space. To achieve high performance, the dynamics model must accurately predict the rewards ahead for multiple time steps. We approach this using a latent dynamics model with both deterministic and stochastic transition components. Moreover, we propose a multi-step variational inference objective that we name latent overshooting. Using only pixel observations, our agent solves continuous control tasks with contact dynamics, partial observability, and sparse rewards, which exceed the difficulty of tasks that were previously solved by planning with learned models. PlaNet uses substantially fewer episodes and reaches final performance close to and sometimes higher than strong model-free algorithms.
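The general idea of online planning with a dynamics model can be sketched in a few lines. This is a toy illustration, not the PlaNet agent: the hand-written one-dimensional `model_step` stands in for a learned latent dynamics model, and the cross-entropy method is used as the optimizer because the abstract does not name the planner. All function names and parameters are hypothetical.

```python
import random
import statistics


def model_step(state, action):
    """Hypothetical stand-in for a learned dynamics and reward model (1-D state)."""
    next_state = state + 0.1 * action
    reward = -abs(next_state - 1.0)  # reward is highest at the target state 1.0
    return next_state, reward


def rollout_return(state, actions):
    """Sum of predicted rewards when applying `actions` in the model."""
    total = 0.0
    for action in actions:
        state, reward = model_step(state, action)
        total += reward
    return total


def plan(state, horizon=5, candidates=100, elites=10, iterations=5, rng=None):
    """Cross-entropy method: return the first action of the refined plan."""
    rng = rng or random.Random(0)
    means = [0.0] * horizon
    stds = [1.0] * horizon
    for _ in range(iterations):
        # Sample candidate action sequences, clipped to the range [-1, 1].
        population = [
            [max(-1.0, min(1.0, rng.gauss(m, s))) for m, s in zip(means, stds)]
            for _ in range(candidates)
        ]
        # Keep the sequences with the highest predicted return.
        population.sort(key=lambda seq: rollout_return(state, seq), reverse=True)
        best = population[:elites]
        # Refit the sampling distribution to the elite sequences.
        means = [statistics.fmean(seq[t] for seq in best) for t in range(horizon)]
        stds = [statistics.pstdev([seq[t] for seq in best]) + 1e-3 for t in range(horizon)]
    return means[0]


first_action_below_target = plan(0.0)
first_action_above_target = plan(2.0)
```

An agent would execute only the first action, observe the result, and replan from the new state. The quality of the plan depends entirely on how well the model predicts rewards several steps ahead, which is why the abstract stresses multi-step prediction accuracy.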
When SQL and the relational data model were introduced 25 years ago as a general data management concept, enterprise software migrated quickly to this new technology. It is fair to say that SQL and the various implementations of RDBMSs became the backbone of enterprise systems. In those days, we believed that business planning, transaction processing and analytics should reside in one single system. Despite the incredible improvements in computer hardware, high-speed networks, display devices and the associated software, speed and flexibility remained an issue.
The advantages of constructivist learning and criteria for its realization have been well-determined through theoretical findings in pedagogy (Reich, 2008; Dewey, 1916). Educational researchers and the Organization for Economic Cooperation and Development (OECD) promote a process-oriented, so-called CSSC learning (constructed, self-regulated, situated, collaborative) to be effective in supporting 21st century competences (de Corte, 2010). However, the practical implementation itself leaves a lot to be desired (Gardner, 2010; Wagner, 2011). Lessons are not efficiently designed to help teachers execute CSSC learning. Common CSSC learning methods describe abstractly what to do while leaving the teacher uncertain about how to do it. We therefore conclude that there is a missing link between the theoretical findings and demands of pedagogical science and the practical implementation of constructivist learning and teaching. Teachers have negative classroom experience with project methods. They would rather opt for the well-structured, but abstract and instruction-only approach than use an open-structured, but more concrete and holistic mode of collaborative learning in projects. We claim that Design Thinking as a methodology for project-oriented learning offers teachers the needed support towards a CSSC-oriented teaching and learning design. Through a formalized process it may serve as a bridge between demand and reality of learning in the classroom. Thereby, Design Thinking would contribute to educational research. Our case study points out the improvement of the classroom experience for teacher and student alike when using Design Thinking. This leads to a positive attitude towards constructivist learning and an increase of its implementation in education. The ultimate goal of this paper is to prove that Design Thinking empowers teachers to facilitate CSSC learning in order to foster 21st century skills.
In this paper, we explore how to add pointing input capabilities to very small screen devices. At first sight, touchscreens seem to allow for particular compactness, because they integrate input and screen into the same physical space. The opposite is true, however, because the user's fingers occlude contents and prevent precision.
We explore how to add haptics to walls and other heavy objects in virtual reality. When a user tries to push such an object, our system actuates the user's shoulder, arm, and wrist muscles by means of electrical muscle stimulation, creating a counter force that pulls the user's arm backwards. Our device accomplishes this in a wearable form factor.
This paper surveys the field of nonphotorealistic rendering (NPR), focusing on techniques for transforming 2D input (images and video) into artistically stylized renderings. We first present a taxonomy of the 2D NPR algorithms developed over the past two decades, structured according to the design characteristics and behavior of each technique. We then describe a chronology of development from the semiautomatic paint systems of the early nineties, through to the automated painterly rendering systems of the late nineties driven by image gradient analysis. Two complementary trends in the NPR literature are then addressed, with reference to our taxonomy. First, the fusion of higher level computer vision and NPR, illustrating the trends toward scene analysis to drive artistic abstraction and diversity of style. Second, the evolution of local processing approaches toward edge-aware filtering for real-time stylization of images and video. The survey then concludes with a discussion of open challenges for 2D NPR identified in recent NPR symposia, including topics such as user and aesthetic evaluation.
Current touch devices, such as capacitive touchscreens, are based on the implicit assumption that users acquire targets with the center of the contact area between finger and device. Findings from our previous work indicate, however, that such devices are subject to systematic error offsets. This suggests that the underlying assumption is most likely wrong. In this paper, we therefore revisit this assumption. In a series of three user studies, we find evidence that the features that users align with the target are visual features. These features are located on the top of the user's fingers, not at the bottom, as assumed by traditional devices. We present the projected center model, under which error offsets drop to 1.6 mm, compared to 4 mm for the traditional model. This suggests that the new model is indeed a good approximation of how users conceptualize touch input. The primary contribution of this paper is to help understand touch, one of the key input technologies in human-computer interaction. At the same time, our findings inform the design of future touch input technology. They explain the inaccuracy of traditional touch devices as a "parallax" artifact between user control based on the top of the finger and sensing based on the bottom side of the finger. We conclude that certain camera-based sensing technologies can inherently be more accurate than contact area-based sensing.