Centre de Recherche en Informatique
facilityFontainebleau, Île-de-France, France
Research output, citation impact, and the most-cited recent papers from Centre de Recherche en Informatique (France). Aggregated across the NobleBlocks index of 300M+ scholarly works.
Top-cited papers from Centre de Recherche en Informatique
In order to understand the functioning of organisms on the molecular level, we need to know which genes are expressed, when and where in the organism, and to which extent. The regulation of gene expression is achieved through genetic regulatory systems structured by networks of interactions between DNA, RNA, proteins, and small molecules. As most genetic regulatory networks of interest involve many components connected through interlocking positive and negative feedback loops, an intuitive understanding of their dynamics is hard to obtain. As a consequence, formal methods and computer tools for the modeling and simulation of genetic regulatory networks will be indispensable. This paper reviews formalisms that have been employed in mathematical biology and bioinformatics to describe genetic regulatory systems, in particular directed graphs, Bayesian networks, Boolean networks and their generalizations, ordinary and partial differential equations, qualitative differential equations, stochastic equations, and rule-based formalisms. In addition, the paper discusses how these formalisms have been used in the simulation of the behavior of actual regulatory systems.
Geometric and topological properties of protein structures, including surface pockets, interior cavities and cross channels, are of fundamental importance for proteins to carry out their functions. Computed Atlas of Surface Topography of proteins (CASTp) is a web server that provides online services for locating, delineating and measuring these geometric and topological properties of protein structures. It has been widely used since its inception in 2003. In this article, we present the latest version of the web server, CASTp 3.0. CASTp 3.0 continues to provide reliable and comprehensive identifications and quantifications of protein topography. In addition, it now provides: (i) imprints of the negative volumes of pockets, cavities and channels, (ii) topographic features of biological assemblies in the Protein Data Bank, (iii) improved visualization of protein structures and pockets, and (iv) more intuitive structural and annotated information, including information of secondary structure, functional sites, variant sites and other annotations of protein residues. The CASTp 3.0 web server is freely accessible at http://sts.bioe.uic.edu/castp/.
Analytic combinatorics aims to enable precise quantitative predictions of the properties of large combinatorial structures. The theory has emerged over recent decades as essential both for the analysis of algorithms and for the study of scientific models in many disciplines, including probability theory, statistical physics, computational biology, and information theory. With a careful combination of symbolic enumeration methods and complex analysis, drawing heavily on generating functions, results of sweeping generality emerge that can be applied in particular to fundamental structures such as permutations, sequences, strings, walks, paths, trees, graphs and maps. This account is the definitive treatment of the topic. The authors give full coverage of the underlying mathematics and a thorough treatment of both classical and modern applications of the theory. The text is complemented with exercises, examples, appendices and notes to aid understanding. The book can be used for an advanced undergraduate or a graduate course, or for self-study.
The Messidor database, which contains hundreds of eye fundus images, has been publicly distributed since 2008. It was created by the Messidor project in order to evaluate automatic lesion segmentation and diabetic retinopathy grading methods. Designing, producing and maintaining such a database entails significant costs. By publicly sharing it, one hopes to bring a valuable resource to the public research community. However, the real interest and benefit of the research community is not easy to quantify. We analyse here the feedback on the Messidor database, after more than 6 years of diffusion. This analysis should apply to other similar research databases.
BACKGROUND: Several R packages exist for the detection of differentially expressed genes from RNA-Seq data. The analysis process includes three main steps, namely normalization, dispersion estimation and test for differential expression. Quality control steps along this process are recommended but not mandatory, and failing to check the characteristics of the dataset may lead to spurious results. In addition, normalization methods and statistical models are not exchangeable across the packages without adequate transformations the users are often not aware of. Thus, dedicated analysis pipelines are needed to include systematic quality control steps and prevent errors from misusing the proposed methods. RESULTS: SARTools is an R pipeline for differential analysis of RNA-Seq count data. It can handle designs involving two or more conditions of a single biological factor with or without a blocking factor (such as a batch effect or a sample pairing). It is based on DESeq2 and edgeR and is composed of an R package and two R script templates (for DESeq2 and edgeR respectively). Tuning a small number of parameters and executing one of the R scripts, users have access to the full results of the analysis, including lists of differentially expressed genes and a HTML report that (i) displays diagnostic plots for quality control and model hypotheses checking and (ii) keeps track of the whole analysis process, parameter values and versions of the R packages used. CONCLUSIONS: SARTools provides systematic quality controls of the dataset as well as diagnostic plots that help to tune the model parameters. It gives access to the main parameters of DESeq2 and edgeR and prevents untrained users from misusing some functionalities of both packages. By keeping track of all the parameters of the analysis process it fits the requirements of reproducible research.
The Critical Assessment of Metagenome Interpretation (CAMI) community initiative presents results from its first challenge, a rigorous benchmarking of software for metagenome assembly, binning and taxonomic profiling. Methods for assembly, taxonomic profiling and binning are key to interpreting metagenome data, but a lack of consensus about benchmarking complicates performance assessment. The Critical Assessment of Metagenome Interpretation (CAMI) challenge has engaged the global developer community to benchmark their programs on highly complex and realistic data sets, generated from ∼700 newly sequenced microorganisms and ∼600 novel viruses and plasmids and representing common experimental setups. Assembly and genome binning programs performed well for species represented by individual genomes but were substantially affected by the presence of related strains. Taxonomic profiling and binning programs were proficient at high taxonomic ranks, with a notable performance decrease below family level. Parameter settings markedly affected performance, underscoring their importance for program reproducibility. The CAMI results highlight current challenges but also provide a roadmap for software selection to answer specific research questions.
Today, three different physical (PHY) layers for the IEEE 802.11 WLAN are available (802.11a/b/g); they all provide multi-rate capabilities. To achieve a high performance under varying conditions, these devices need to adapt their transmission rate dynamically. While this rate adaptation algorithm is a critical component of their performance, only very few algorithms such as Auto Rate Fallback (ARF) or Receiver Based Auto Rate (RBAR) have been published and the implementation challenges associated with these mechanisms have never been publicly discussed. In this paper, we first present the important characteristics of the 802.11 systems that must be taken into account when such algorithms are designed. Specifically, we emphasize the contrast between low latency and high latency systems, and we give examples of actual chipsets that fall in either of the different categories. We propose an Adaptive ARF (AARF) algorithm for low latency systems that improves upon ARF to provide both short-term and long-term adaptation. The new algorithm has very low complexity while obtaining a performance similar to RBAR, which requires incompatible changes to the 802.11 MAC and PHY protocol. Finally, we present a new rate adaptation algorithm designed for high latency systems that has been implemented and evaluated on an AR5212-based device. Experimentation results show a clear performance improvement over the algorithm previously implemented in the AR5212 driver we used.
A discrete-event system is a system whose behavior can be described by means of a set of time-consuming activities, performed according to a prescribed ordering. Events correspond to starting or ending some activity. An analogy between linear systems and a class of discrete-event systems is developed. Following this analogy, such discrete-event systems can be viewed as linear, in the sense of an appropriate algebra. The periodical behavior of closed discrete-event systems, i.e., involving a set of repeatedly performed activities, can be totally characterized by solving an eigenvalue and eigenvector equation in this algebra. This problem is numerically solved by an efficient algorithm which basically consists of finding the shortest paths from one node to all other nodes in a graph. The potentiality of this approach for the performance evaluation of flexible manufacturing systems is emphasized; the case of a flowshop-like production process is analyzed in detail.
Physical systems usually exhibit quantum behavior, such as superpositions and entanglement, only when they are sufficiently decoupled from a lossy environment. Paradoxically, a specially engineered interaction with the environment can become a resource for the generation and protection of quantum states. This notion can be generalized to the confinement of a system into a manifold of quantum states, consisting of all coherent superpositions of multiple stable steady states. We have confined the state of a superconducting resonator to the quantum manifold spanned by two coherent states of opposite phases and have observed a Schrödinger cat state spontaneously squeeze out of vacuum before decaying into a classical mixture. This experiment points toward robustly encoding quantum information in multidimensional steady-state manifolds.
A survival analysis was applied to 1,453 patients treated between 1972 and 1978 in 33 French dialysis centers and prospectively followed up in the computerized Diaphane Dialysis Registry. 198 deaths (overall mortality = OM) were registered, of which 87 (43%) were secondary to cardiovascular complications (cardiovascular mortality = CVM). Risk factors for OM and CVM (p values less than 0.05) were age, male sex, nephroangiosclerosis or diabetic nephropathy as the primary renal disease, elevated systolic and diastolic blood pressure and two weekly dialysis rather then three. In contrast with the results observed for the general population, a high body mass index and elevated cholesterol, triglycerides and uric acid were not found to be associated with significantly increased CVM or OM. On the contrary, low body mass index (less than 20 kg/m2), low cholesterol (less than 4.5 mmol/l) and low mean predialysis blood urea (less than 4.6 mmol/l) were associated with increased OM and CVM, and more especially with high stroke mortality. Results for urea but not for cholesterol remain significant after adjustment for age, sex, weekly dialysis schedule and body mass index. They suggest that, in addition to elevated blood pressure, a poor nutritional state and/or low protein intake may be important factors for explaining the high cardiovascular mortality, particularly for strokes, observed in dialyzed patients.
We comprehensively study the least-squares Gaussian approximations of the diffraction-limited 2D-3D paraxial-nonparaxial point-spread functions (PSFs) of the wide field fluorescence microscope (WFFM), the laser scanning confocal microscope (LSCM), and the disk scanning confocal microscope (DSCM). The PSFs are expressed using the Debye integral. Under an L(infinity) constraint imposing peak matching, optimal and near-optimal Gaussian parameters are derived for the PSFs. With an L1 constraint imposing energy conservation, an optimal Gaussian parameter is derived for the 2D paraxial WFFM PSF. We found that (1) the 2D approximations are all very accurate; (2) no accurate Gaussian approximation exists for 3D WFFM PSFs; and (3) with typical pinhole sizes, the 3D approximations are accurate for the DSCM and nearly perfect for the LSCM. All the Gaussian parameters derived in this study are in explicit analytical form, allowing their direct use in practical applications.
This chapter describes some common approaches that can be used for simultaneously visualising the association between multiple categorical variables by focusing on the analysis of only three variables. It confines the application of multiple correspondence analysis to the visual summary of the association between three categorical variables, although it can be applied to a much larger sized table without loss of generality. The chapter also describes three coding methods and how they can be used to perform multiple correspondence analysis. The coding methods covered are crisp coding, Burt matrix, and stacking. Multiple correspondence analysis does not truly describe the underlying multivariate association structure between the variables of a multi-way contingency table. Instead, since it involves some form of bivariate transformation of the original contingency table, they are at best a way of visualising the various bivariate association structures that exist.
Even though goal modeling is an effective approach to requirements engineering, it is known to present a number of difficulties in practice. The paper discusses these difficulties and proposes to couple goal modeling and scenario authoring to overcome them. Whereas existing techniques use scenarios to concretize goals, we use them to discover goals. Our proposal is to define enactable rules which form the basis of a software environment called L'Ecritoire to guide the requirements elicitation process through interleaved goal modeling and scenario authoring. The focus of the paper is on the discovery of goals from scenarios. The discovery process is centered around the notion of a requirement chunk (RC) which is a pair <Goal, Scenario>. The paper presents the notion of RC, the rules to support the discovery of RCs and illustrates the application of the approach within L'Ecritoire using the ATM example. It also evaluates the potential practical benefits expected from the use of the approach.
A new, formally defined database model is introduced that combines fundamental principles of “semantic” database modeling in a coherent fashion. Using a graph-based formalism, the IFO model provides mechanisms for representing structured objects, and functional and ISA relationships between them. A number of fundamental results concerning semantic data modeling are obtained in the context of the IFO model. Notably, the types of object structure that can arise as a result of multiple uses of ISA relationships and object construction are described. Also, a natural, formal definition of update propagation is given, and it is shown that (under certain conditions) a correct update always exists.
Monte-Carlo Tree Search (MCTS) is a new best-first search guided by the results of Monte-Carlo simulations. In this article, we introduce two progressive strategies for MCTS, called progressive bias and progressive unpruning. They enable the use of relatively time-expensive heuristic knowledge without speed reduction. Progressive bias directs the search according to heuristic knowledge. Progressive unpruning first reduces the branching factor, and then increases it gradually again. Experiments assess that the two progressive strategies significantly improve the level of our Go program Mango. Moreover, we see that the combination of both strategies performs even better on larger board sizes.
In recent years, there has been a growing research interest in integrating machine learning techniques into meta-heuristics for solving combinatorial optimization problems. This integration aims to lead meta-heuristics toward an efficient, effective, and robust search and improve their performance in terms of solution quality, convergence rate, and robustness. Since various integration methods with different purposes have been developed, there is a need to review the recent advances in using machine learning techniques to improve meta-heuristics. To the best of our knowledge, the literature is deprived of having a comprehensive yet technical review. To fill this gap, this paper provides such a review on the use of machine learning techniques in the design of different elements of meta-heuristics for different purposes including algorithm selection, fitness evaluation, initialization, evolution, parameter setting, and cooperation. First, we describe the key concepts and preliminaries of each of these ways of integration. Then, the recent advances in each way of integration are reviewed and classified based on a proposed unified taxonomy. Finally, we provide a technical discussion on the advantages, limitations, requirements, and challenges of implementing each of these integration ways, followed by promising future research directions.
We propose Nopol, an approach to automatic repair of buggy conditional statements (i.e., if-then-else statements). This approach takes a buggy program as well as a test suite as input and generates a patch with a conditional expression as output. The test suite is required to contain passing test cases to model the expected behavior of the program and at least one failing test case that reveals the bug to be repaired. The process of Nopol consists of three major phases. First, Nopol employs angelic fix localization to identify expected values of a condition during the test execution. Second, runtime trace collection is used to collect variables and their actual values, including primitive data types and objected-oriented features (e.g., nullness checks), to serve as building blocks for patch generation. Third, Nopol encodes these collected data into an instance of a Satisfiability Modulo Theory (SMT) problem; then a feasible solution to the SMT instance is translated back into a code patch. We evaluate Nopol on 22 real-world bugs (16 bugs with buggy if conditions and six bugs with missing preconditions) on two large open-source projects, namely Apache Commons Math and Apache Commons Lang. Empirical analysis on these bugs shows that our approach can effectively fix bugs with buggy if conditions and missing preconditions. We illustrate the capabilities and limitations of Nopol using case studies of real bug fixes.
Throughout the history of mathematics, concepts of number and space have been tightly intertwined. We tested the hypothesis that cortical circuits for spatial attention contribute to mental arithmetic in humans. We trained a multivariate classifier algorithm to infer the direction of an eye movement, left or right, from the brain activation measured in the posterior parietal cortex. Without further training, the classifier then generalized to an arithmetic task. Its left versus right classification could be used to sort out subtraction versus addition trials, whether performed with symbols or with sets of dots. These findings are consistent with the suggestion that mental arithmetic co-opts parietal circuitry associated with spatial coding.
Human cognitive control is uniquely flexible and has been shown to depend on prefrontal cortex (PFC). But exactly how the biological mechanisms of the PFC support flexible cognitive control remains a profound mystery. Existing theoretical models have posited powerful task-specific PFC representations, but not how these develop. We show how this can occur when a set of PFC-specific neural mechanisms interact with breadth of experience to self organize abstract rule-like PFC representations that support flexible generalization in novel tasks. The same model is shown to apply to benchmark PFC tasks (Stroop and Wisconsin card sorting), accurately simulating the behavior of neurologically intact and frontally damaged people.
In this paper, the problem of robust stability of systems subject to parametric uncertainties is considered. Sufficient conditions for the existence of parameter-dependent Lyapunov functions are given in terms of a criterion which is reminiscent of, but less conservative than, Popov's stability criterion. An equivalent frequency-domain criterion is demonstrated. The relative sharpness of the proposed test and existing stability criteria is then discussed. The use of parameter-dependent Lyapunov functions for robust controller synthesis is then considered. It is shown that the search for robustly stabilizing controllers may be limited to controllers with the same order as the original plant. A possible synthesis procedure and a numerical example are then discussed.