Czech Academy of Sciences, Institute of Computer Science
Facility: Prague, Prague, Czechia
Research output, citation impact, and the most-cited recent papers from Czech Academy of Sciences, Institute of Computer Science (Czechia). Aggregated across the NobleBlocks index of 300M+ scholarly works.
Top-cited papers from Czech Academy of Sciences, Institute of Computer Science
Modern enterprise applications are currently undergoing a complete paradigm shift away from traditional transactional processing to combined analytical and transactional processing. This challenge of combining two opposing query types in a single database management system results in additional requirements for transaction management as well. In this paper, we discuss our approach to achieve high throughput for transactional query processing while allowing concurrent analytical queries. We present our approach to distributed snapshot isolation and optimized two-phase commit protocols.
g:Profiler (https://biit.cs.ut.ee/gprofiler) is a widely used gene list functional profiling and namespace conversion toolset that has been contributing to reproducible biological data analysis since 2007. Here we introduce the accompanying R package, gprofiler2, developed to facilitate programmatic access to g:Profiler computations and databases via REST API. The gprofiler2 package provides easy-to-use functionality that enables researchers to incorporate functional enrichment analysis into automated analysis pipelines written in R. The package also implements interactive visualisation methods to help interpret the enrichment results and to illustrate them for publications. In addition, gprofiler2 gives access to the versatile gene/protein identifier conversion functionality in g:Profiler, making it possible to map between hundreds of different identifier types or orthologous species. The gprofiler2 package is freely available at the CRAN repository (https://cran.r-project.org/package=gprofiler2).
When an auction of multiple items is performed, it is often desirable to allow bids on combinations of items, as opposed to only on single items. Such an auction is often called "combinatorial", and the exponential number of possible combinations results in computational intractability of many aspects regarding such an auction. This paper considers two of these aspects: the bidding language and the allocation algorithm. First we consider which kinds of bids on combinations are allowed and how, i.e. in what language, they are specified. The basic tradeoff is the expressibility of the language versus its simplicity. We consider and formalize several bidding languages and compare their strengths. We prove exponential separations between the expressive power of different languages, and show that one language, "OR-bids with phantom items", can polynomially simulate the others. We then consider the problem of determining the best allocation, a problem known to be computationally intractable. We suggest an approach based on Linear Programming (LP) and motivate it. We prove that the LP approach finds an optimal allocation if and only if prices can be attached to single items in the auction. We pinpoint several classes of auctions where this is the case, and suggest greedy and branch-and-bound heuristics based on LP for other cases.
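To make the allocation problem in this abstract concrete, the following is a minimal brute-force sketch. It assumes plain OR-bids, where each bid is an atomic (bundle, price) pair and any set of bids with pairwise disjoint bundles may win together. The function name `best_allocation` and the sample bids are hypothetical. The exhaustive search is exponential in the number of bids and is not the LP-based approach the paper proposes; it only illustrates what "best allocation" means.

```python
from itertools import combinations

def best_allocation(bids):
    """Exhaustively find the revenue-maximizing set of disjoint bids.

    bids: list of (frozenset_of_items, price) atomic bids (illustrative format).
    Returns (total_price, list_of_winning_bid_indices).
    """
    best_value, best_set = 0, []
    for k in range(1, len(bids) + 1):
        for chosen in combinations(range(len(bids)), k):
            items_seen = set()
            feasible = True
            for i in chosen:
                bundle = bids[i][0]
                if items_seen & bundle:   # an item would be sold twice
                    feasible = False
                    break
                items_seen |= bundle
            if feasible:
                value = sum(bids[i][1] for i in chosen)
                if value > best_value:
                    best_value, best_set = value, list(chosen)
    return best_value, best_set

# Hypothetical bids: one bundle bid on {a, b} and three single-item bids.
bids = [
    (frozenset({"a", "b"}), 10),
    (frozenset({"a"}), 6),
    (frozenset({"b"}), 5),
    (frozenset({"c"}), 3),
]
value, winners = best_allocation(bids)
print(value, winners)
```

Here the three single-item bids together (6 + 5 + 3) beat the bundle bid plus the bid on c (10 + 3), so the bundle bid loses.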
We consider algorithmic problems in a distributed setting where the participants cannot be assumed to follow the algorithm but rather their own self-interest. As such participants, termed agents, are capable of manipulating the algorithm, the algorithm designer should ensure in advance that the agents' interests are best served by behaving correctly.
A database consisting of 1870 data sets on catalyst compositions and their performances in the oxidative coupling of methane was compiled. For this goal, about 1000 full‐text references from the last 30 years have been analyzed and about 420 of them, which contained all the necessary information, were selected for the data extraction. The accumulated data were subject to statistical analysis: analysis of variance, correlation analysis, and decision tree. On the basis of the results, 18 catalytic key elements were selected from an original 68 elements. All oxides of the selected elements, which positively affect the selectivity to C2 products, show strong basicity. Analysis of binary and ternary interactions between the selected key elements shows that high‐performance catalysts are mainly based on Mg and La oxides. Alkali (Cs, Na) and alkaline‐earth (Sr, Ba) metals used as dopants increase the selectivity of the host oxides, whereas dopants such as Mn, W, and the Cl anion have positive effects on the catalyst activity. The maximal C2 selectivities for the proposed catalyst compositions range from 72 to 82 %, and the respective C2 yields range from 16 to 26 %.
Image segmentation is usually addressed by training a model for a fixed set of object classes. Incorporating additional classes or more complex queries later is expensive as it requires re-training the model on a dataset that encompasses these expressions. Here we propose a system that can generate image segmentations based on arbitrary prompts at test time. A prompt can be either a text or an image. This approach enables us to create a unified model (trained once) for three common segmentation tasks, which come with distinct challenges: referring expression segmentation, zero-shot segmentation and one-shot segmentation. We build upon the CLIP model as a backbone which we extend with a transformer-based decoder that enables dense prediction. After training on an extended version of the PhraseCut dataset, our system generates a binary segmentation map for an image based on a free-text prompt or on an additional image expressing the query. We analyze different variants of the latter image-based prompts in detail. This novel hybrid input allows for dynamic adaptation not only to the three segmentation tasks mentioned above, but to any binary segmentation task where a text or image query can be formulated. Finally, we find our system to adapt well to generalized queries involving affordances or properties. Code is available at https://eckerlab.org/code/CLIPSeg
Computational Variants of the Lanczos Method for the Eigenproblem. C. C. Paige, London University, Institute of Computer Science, 44 Gordon Square, London WC1H OPD (present address: School of Computer Science, McGill University, Montreal 101, Quebec, Canada). IMA Journal of Applied Mathematics, Volume 10, Issue 3, December 1972, Pages 373–381, https://doi.org/10.1093/imamat/10.3.373. Received: 28 October 1971; revision received: 27 January 1972; published: 01 December 1972.
Abstract. In this paper, we describe the PALM model system 6.0. PALM (formerly an abbreviation for Parallelized Large-eddy Simulation Model and now an independent name) is a Fortran-based code and has been applied for studying a variety of atmospheric and oceanic boundary layers for about 20 years. The model is optimized for use on massively parallel computer architectures. This is a follow-up paper to the PALM 4.0 model description in Maronga et al. (2015). In recent years, PALM has been significantly improved and now offers a variety of new components. In particular, much effort was made to enhance the model with components needed for applications in urban environments, like fully interactive land surface and radiation schemes, chemistry, and an indoor model. This paper serves as an overview of the PALM 6.0 model system, and we describe its current model core. The individual components for urban applications, case studies, validation runs, and issues with suitable input data are presented and discussed in a series of companion papers in this special issue.
A method for computing a sparse incomplete factorization of the inverse of a symmetric positive definite matrix A is developed, and the resulting factorized sparse approximate inverse is used as an explicit preconditioner for conjugate gradient calculations. It is proved that in exact arithmetic the preconditioner is well defined if A is an H-matrix. The results of numerical experiments are presented.
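The idea of a factorized sparse approximate inverse used as an explicit preconditioner can be illustrated with a small sketch. It assumes the factors are built by an A-orthogonalization process with small entries dropped, giving Z and D with Z^T A Z ≈ D, so that M = Z D^{-1} Z^T ≈ A^{-1}. The abstract does not spell out the construction, so this choice, the dense list-of-lists storage, the function names, and the drop tolerance are all assumptions made for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

def approx_inverse_factors(A, drop_tol):
    """Return (Z, d) with Z^T A Z ~= diag(d); Z[j] holds column j of Z."""
    n = len(A)
    Z = [[1.0 if r == c else 0.0 for r in range(n)] for c in range(n)]
    d = [0.0] * n
    for i in range(n):
        # p[j - i] = a_i^T z_j, where a_i is row i of A.
        p = [dot(A[i], Z[j]) for j in range(i, n)]
        d[i] = p[0]
        for j in range(i + 1, n):
            factor = p[j - i] / p[0]
            col = [zj - factor * zi for zj, zi in zip(Z[j], Z[i])]
            # Drop small off-diagonal entries so that Z stays sparse.
            Z[j] = [v if (r == j or abs(v) >= drop_tol) else 0.0
                    for r, v in enumerate(col)]
    return Z, d

def apply_preconditioner(Z, d, r):
    """Compute M r = Z D^{-1} Z^T r using only products with the factors."""
    coeffs = [dot(Z[j], r) / d[j] for j in range(len(d))]
    out = [0.0] * len(r)
    for j, c in enumerate(coeffs):
        for k, z in enumerate(Z[j]):
            out[k] += c * z
    return out

def pcg(A, b, apply_M, tol=1e-10, max_iter=200):
    """Preconditioned conjugate gradients; returns (x, iterations used)."""
    x = [0.0] * len(b)
    r = b[:]
    z = apply_M(r)
    p = z[:]
    rz = dot(r, z)
    for it in range(1, max_iter + 1):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if math.sqrt(dot(r, r)) < tol:
            return x, it
        z = apply_M(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

# Test problem (illustrative): 1D Laplacian, a symmetric positive definite M-matrix.
n = 8
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
b = [1.0] * n
Z, d = approx_inverse_factors(A, drop_tol=0.3)
x, iterations = pcg(A, b, lambda r: apply_preconditioner(Z, d, r))
residual = math.sqrt(sum((ax - bi) ** 2 for ax, bi in zip(matvec(A, x), b)))
print(iterations, residual)
```

Because the preconditioner is applied through products with Z and its transpose rather than through triangular solves, each application is explicit in the sense used in the abstract. A production code would store Z in a sparse format instead of dense lists.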
We describe two small and portable TCP/IP implementations fulfilling the subset of RFC1122 requirements needed for full host-to-host interoperability. Our TCP/IP implementations do not sacrifice any of TCP's mechanisms such as urgent data or congestion control. They support IP fragment reassembly and the number of multiple simultaneous connections is limited only by the available RAM. Despite being small and simple, our implementations do not require their peers to have complex, full-size stacks, but can communicate with peers running a similarly light-weight stack. The code size is on the order of 10 kilobytes and RAM usage can be configured to be as low as a few hundred bytes.
We propose a method for semi-supervised semantic segmentation using an adversarial network. While most existing discriminators are trained to classify input images as real or fake on the image level, we design a discriminator in a fully convolutional manner to differentiate the predicted probability maps from the ground truth segmentation distribution with the consideration of the spatial resolution. We show that the proposed discriminator can be used to improve semantic segmentation accuracy by coupling the adversarial loss with the standard cross entropy loss of the proposed model. In addition, the fully convolutional discriminator enables semi-supervised learning through discovering the trustworthy regions in predicted results of unlabeled images, thereby providing additional supervisory signals. In contrast to existing methods that utilize weakly-labeled images, our method leverages unlabeled images to enhance the segmentation model. Experimental results on the PASCAL VOC 2012 and Cityscapes datasets demonstrate the effectiveness of the proposed algorithm.
Wireless sensor networks (WSNs) are attracting great interest in a number of application domains concerned with monitoring and control of physical phenomena, as they enable dense and untethered deployments at low cost and with unprecedented flexibility. However, application development is still one of the main hurdles to a wide adoption of WSN technology. In current real-world WSN deployments, programming is typically carried out very close to the operating system, therefore requiring the programmer to focus on low-level system issues. This not only distracts the programmer from the application logic, but also requires a technical background rarely found among application domain experts. The need for appropriate high-level programming abstractions, capable of simplifying the programming chore without sacrificing efficiency, has long been recognized, and several solutions have hitherto been proposed, which differ along many dimensions. In this article, we survey the state of the art in programming approaches for WSNs. We begin by presenting a taxonomy of WSN applications, to identify the fundamental requirements programming platforms must deal with. Then, we introduce a taxonomy of WSN programming approaches that captures the fundamental differences among existing solutions, and constitutes the core contribution of this article. Our presentation style relies on concrete examples and code snippets taken from programming platforms representative of the taxonomy dimensions being discussed. We use the taxonomy to provide an exhaustive classification of existing approaches. Moreover, we also map existing approaches back to the application requirements, therefore providing not only a complete view of the state of the art, but also useful insights for selecting the programming abstraction most appropriate to the application at hand.
Energy is of primary importance in wireless sensor networks. By being able to estimate the energy consumption of the sensor nodes, applications and routing protocols are able to make informed decisions that increase the lifetime of the sensor network. However, it is in general not possible to measure the energy consumption on popular sensor node platforms. In this paper, we present and evaluate a software-based on-line energy estimation mechanism that estimates the energy consumption of a sensor node. We evaluate the mechanism by comparing the estimated energy consumption with the lifetime of capacitor-powered sensor nodes. By implementing and evaluating the X-MAC protocol, we show how software-based on-line energy estimation can be used to empirically evaluate the energy efficiency of sensor network protocols.
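The bookkeeping behind a software-based estimate can be sketched as follows. This assumes the estimator tracks how long each hardware component spends switched on and multiplies that time by a pre-measured current draw and the supply voltage, i.e. E = V · Σ I_c · t_c. The abstract does not state this formula, and the component names, currents, and voltage below are illustrative placeholders, not values from the paper.

```python
def estimate_energy_mj(voltage_v, current_ma, time_s):
    """Estimate energy in millijoules as V * sum(I_c * t_c).

    voltage_v:  supply voltage in volts
    current_ma: dict component -> current draw in milliamperes (assumed known)
    time_s:     dict component -> accumulated on-time in seconds
    """
    return voltage_v * sum(
        current_ma[c] * time_s.get(c, 0.0) for c in current_ma
    )

# Placeholder numbers for illustration only.
current_ma = {"cpu": 2.0, "radio_rx": 20.0, "radio_tx": 18.0}
time_s = {"cpu": 10.0, "radio_rx": 1.0, "radio_tx": 0.5}
energy = estimate_energy_mj(3.0, current_ma, time_s)
print(energy)
```

With these numbers the radio accounts for more energy than the CPU despite being on for a fraction of the time. This kind of per-component breakdown is what allows protocols to be compared on energy efficiency.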
Human and animal diet reconstruction studies that rely on tissue chemical signatures aim at providing estimates on the relative intake of potential food groups. However, several sources of uncertainty need to be considered when handling data. Bayesian mixing models provide a natural platform to handle diverse sources of uncertainty while allowing the user to contribute with prior expert information. The Bayesian mixing model FRUITS (Food Reconstruction Using Isotopic Transferred Signals) was developed for use in diet reconstruction studies. FRUITS incorporates the capability to account for dietary routing, that is, the contribution of different food fractions (e.g. macronutrients) towards a dietary proxy signal measured in the consumer. FRUITS also provides relatively straightforward means for the introduction of prior information on the relative dietary contributions of food groups or food fractions. This type of prior may originate, for instance, from physiological or metabolic studies. FRUITS performance was tested using simulated data and data from a published controlled animal feeding experiment. The feeding experiment data was selected to exemplify the application of the novel capabilities incorporated into FRUITS but also to illustrate some of the aspects that need to be considered when handling data within diet reconstruction studies. FRUITS accurately predicted dietary intakes, and more precise estimates were obtained for dietary scenarios in which expert prior information was included. FRUITS represents a useful tool to achieve accurate and precise food intake estimates in diet reconstruction studies within different scientific fields (e.g. ecology, forensics, archaeology, and dietary physiology).
With the widespread use of encrypted data transport, network traffic encryption is becoming a standard nowadays. This presents a challenge for traffic measurement, especially for analysis and anomaly detection methods, which are dependent on the type of network traffic. In this paper, we survey existing approaches for classification and analysis of encrypted traffic. First, we describe the most widespread encryption protocols used throughout the Internet. We show that the initiation of an encrypted connection and the protocol structure give away much information for encrypted traffic classification and analysis. Then, we survey payload and feature‐based classification methods for encrypted traffic and categorize them using an established taxonomy. The advantage of some of the described classification methods is the ability to recognize the encrypted application protocol in addition to the encryption protocol. Finally, we make a comprehensive comparison of the surveyed feature‐based classification methods and present their weaknesses and strengths. Copyright © 2015 John Wiley & Sons, Ltd.
Abstractive summarization is the ultimate goal of document summarization research, but it has previously been less investigated due to the immaturity of text generation techniques. Recently, impressive progress has been made in abstractive sentence summarization using neural models. Unfortunately, attempts at abstractive document summarization are still at a primitive stage, and the evaluation results are worse than those of extractive methods on benchmark datasets. In this paper, we review the difficulties of neural abstractive document summarization, and propose a novel graph-based attention mechanism in the sequence-to-sequence framework. The intuition is to address the saliency factor of summarization, which has been overlooked by prior works. Experimental results demonstrate that our model is able to achieve considerable improvement over previous neural abstractive models. The data-driven neural abstractive method is also competitive with state-of-the-art extractive methods.
Abstract. Numerical models that combine weather forecasting and atmospheric chemistry are here referred to as chemical weather forecasting models. Eighteen operational chemical weather forecasting models on regional and continental scales in Europe are described and compared in this article. Topics discussed in this article include how weather forecasting and atmospheric chemistry models are integrated into chemical weather forecasting systems, how physical processes are incorporated into the models through parameterization schemes, how the model architecture affects the predicted variables, and how air chemistry and aerosol processes are formulated. In addition, we discuss sensitivity analysis and evaluation of the models, user operational requirements, such as model availability and documentation, and output availability and dissemination. In this manner, this article allows for the evaluation of the relative strengths and weaknesses of the various modelling systems and modelling approaches. Finally, this article highlights the most prominent gaps of knowledge for chemical weather forecasting models and suggests potential priorities for future research directions, for the following selected focus areas: emission inventories, the integration of numerical weather prediction and atmospheric chemical transport models, boundary conditions and nesting of models, data assimilation of the various chemical species, improved understanding and parameterization of physical processes, better evaluation of models against data and the construction of model ensembles.
Sarcasm is a subtle form of language in which people express the opposite of what is implied. Previous works on sarcasm detection focused on texts. However, more and more social media platforms like Twitter allow users to create multi-modal messages, including texts, images, and videos. It is insufficient to detect sarcasm from multi-modal messages based only on texts. In this paper, we focus on multi-modal sarcasm detection for tweets consisting of texts and images on Twitter. We treat text features, image features and image attributes as three modalities and propose a multi-modal hierarchical fusion model to address this task. Our model first extracts image features and attribute features, and then leverages attribute features and a bidirectional LSTM network to extract text features. Features of the three modalities are then reconstructed and fused into one feature vector for prediction. We create a multi-modal sarcasm detection dataset based on Twitter. Evaluation results on the dataset demonstrate the efficacy of our proposed model and the usefulness of the three modalities.
Programmers frequently face the need to identify the differences between two programs, usually two different versions of a program. Text‐based tools such as the UNIX® utility diff often produce unsatisfactory comparisons because they cannot accurately pinpoint the differences and because they sometimes produce irrelevant differences. Since programs have a rigid syntactic structure as described by the grammar of the programming language in which they are written, we develop a comparison algorithm that exploits knowledge of the grammar. The algorithm, which is based on a dynamic programming scheme, can point out the differences between two programs more accurately than previous text comparison tools. Finally, the two programs are pretty‐printed ‘synchronously’ with the differences highlighted so that the differences are easily identified.
In this paper, we propose a means to enhance an architecture description language with a description of component behavior. A notation used for this purpose should be able to express the "interplay" on the component's interfaces and reflect step-by-step refinement of the component's specification during its design. In addition, the notation should be easy to comprehend and allow for formal reasoning about the correctness of the specification refinement and also about the correctness of an implementation in terms of whether it adheres to the specification. Targeting all these requirements together, the paper proposes employing behavior protocols which are based on a notation similar to regular expressions. As proof of the concept, the behavior protocols are used in the SOFA architecture description language at three levels: interface, frame, and architecture. Key achievements of this paper include the definitions of bounded component behavior and protocol conformance relation. Using these concepts, the designer can verify the adherence of a component's implementation to its specification at runtime, while the correctness of refining the specification can be verified at design time.