Centre Inria de Saclay
facilityPalaiseau, Île-de-France, France
Research output, citation impact, and the most-cited recent papers from Centre Inria de Saclay (France). Aggregated across the NobleBlocks index of 300M+ scholarly works.
Top-cited papers from Centre Inria de Saclay
Statistical machine learning methods are increasingly used for neuroimaging data analysis. Their main virtue is their ability to model high-dimensional datasets, e.g., multivariate analysis of activation images or resting-state time series. Supervised learning is typically used in decoding or encoding settings to relate brain images to behavioral or clinical observations, while unsupervised learning can uncover hidden structures in sets of images (e.g., resting state functional MRI) or find sub-populations in large cohorts. By considering different functional neuroimaging applications, we illustrate how scikit-learn, a Python machine learning library, can be used to perform some key analysis steps. Scikit-learn contains a very large set of statistical learning algorithms, both supervised and unsupervised, and its application to neuroimaging data provides a versatile tool to study the brain.
The development of magnetic resonance imaging (MRI) techniques has defined modern neuroimaging. Since its inception, tens of thousands of studies using techniques such as functional MRI and diffusion weighted imaging have allowed for the non-invasive study of the brain. Despite the fact that MRI is routinely used to obtain data for neuroscience research, there has been no widely adopted standard for organizing and describing the data collected in an imaging experiment. This renders sharing and reusing data (within or between labs) difficult if not impossible and unnecessarily complicates the application of automatic pipelines and quality assurance protocols. To solve this problem, we have developed the Brain Imaging Data Structure (BIDS), a standard for organizing and describing MRI datasets. The BIDS standard uses file formats compatible with existing software, unifies the majority of practices already common in the field, and captures the metadata necessary for most common data processing operations.
Machine-Learning tasks are becoming pervasive in a broad range of domains, and in a broad range of systems (from embedded systems to data centers). At the same time, a small set of machine-learning algorithms (especially Convolutional and Deep Neural Networks, i.e., CNNs and DNNs) are proving to be state-of-the-art across many applications. As architectures evolve towards heterogeneous multi-cores composed of a mix of cores and accelerators, a machine-learning accelerator can achieve the rare combination of efficiency (due to the small number of target algorithms) and broad application scope.
Interpreting and controlling bioelectromagnetic phenomena require realistic physiological models and accurate numerical solvers. A semi-realistic model often used in practise is the piecewise constant conductivity model, for which only the interfaces have to be meshed. This simplified model makes it possible to use Boundary Element Methods. Unfortunately, most Boundary Element solutions are confronted with accuracy issues when the conductivity ratio between neighboring tissues is high, as for instance the scalp/skull conductivity ratio in electro-encephalography. To overcome this difficulty, we proposed a new method called the symmetric BEM, which is implemented in the OpenMEEG software. The aim of this paper is to present OpenMEEG, both from the theoretical and the practical point of view, and to compare its performances with other competing software packages. We have run a benchmark study in the field of electro- and magneto-encephalography, in order to compare the accuracy of OpenMEEG with other freely distributed forward solvers. We considered spherical models, for which analytical solutions exist, and we designed randomized meshes to assess the variability of the accuracy. Two measures were used to characterize the accuracy. the Relative Difference Measure and the Magnitude ratio. The comparisons were run, either with a constant number of mesh nodes, or a constant number of unknowns across methods. Computing times were also compared. We observed more pronounced differences in accuracy in electroencephalography than in magnetoencephalography. The methods could be classified in three categories: the linear collocation methods, that run very fast but with low accuracy, the linear collocation methods with isolated skull approach for which the accuracy is improved, and OpenMEEG that clearly outperforms the others. As far as speed is concerned, OpenMEEG is on par with the other methods for a constant number of unknowns, and is hence faster for a prescribed accuracy level. This study clearly shows that OpenMEEG represents the state of the art for forward computations. Moreover, our software development strategies have made it handy to use and to integrate with other packages. The bioelectromagnetic research community should therefore be able to benefit from OpenMEEG with a limited development effort.
New challenges in remote sensing impose the necessity of designing pixel classification methods that, once trained on a certain dataset, generalize to other areas of the earth. This may include regions where the appearance of the same type of objects is significantly different. In the literature it is common to use a single image and split it into training and test sets to train a classifier and assess its performance, respectively. However, this does not prove the generalization capabilities to other inputs. In this paper, we propose an aerial image labeling dataset that covers a wide range of urban settlement appearances, from different geographic locations. Moreover, the cities included in the test set are different from those of the training set. We also experiment with convolutional neural networks on our dataset.
Importance: Great interest exists in identifying methods to predict neuropsychiatric disease states and treatment outcomes from high-dimensional data, including neuroimaging and genomics data. The goal of this review is to highlight several potential problems that can arise in studies that aim to establish prediction. Observations: A number of neuroimaging studies have claimed to establish prediction while establishing only correlation, which is an inappropriate use of the statistical meaning of prediction. Statistical associations do not necessarily imply the ability to make predictions in a generalized manner; establishing evidence for prediction thus requires testing of the model on data separate from those used to estimate the model's parameters. This article discusses various measures of predictive performance and the limitations of some commonly used measures, with a focus on the importance of using multiple measures when assessing performance. For classification, the area under the receiver operating characteristic curve is an appropriate measure; for regression analysis, correlation should be avoided, and median absolute error is preferred. Conclusions and Relevance: To ensure accurate estimates of predictive validity, the recommended best practices for predictive modeling include the following: (1) in-sample model fit indices should not be reported as evidence for predictive accuracy, (2) the cross-validation procedure should encompass all operations applied to the data, (3) prediction analyses should not be performed with samples smaller than several hundred observations, (4) multiple measures of prediction accuracy should be examined and reported, (5) the coefficient of determination should be computed using the sums of squares formulation and not the correlation coefficient, and (6) k-fold cross-validation rather than leave-one-out cross-validation should be used.
It is control that turns scientific knowledge into useful technology: in physics and engineering it provides a systematic way for driving a dynamical system from a given initial state into a desired target state with minimized expenditure of energy and resources. As one of the cornerstones for enabling quantum technologies, optimal quantum control keeps evolving and expanding into areas as diverse as quantum-enhanced sensing, manipulation of single spins, photons, or atoms, optical spectroscopy, photochemistry, magnetic resonance (spectroscopy as well as medical imaging), quantum information processing and quantum simulation. In this communication, state-of-the-art quantum control techniques are reviewed and put into perspective by a consortium of experts in optimal control theory and applications to spectroscopy, imaging, as well as quantum dynamics of closed and open systems. We address key challenges and sketch a roadmap for future developments.
Physical representations of data have existed for thousands of years. Yet it is now that advances in digital fabrication, actuated tangible interfaces, and shape-changing displays are spurring an emerging area of research that we call Data Physicalization. It aims to help people explore, understand, and communicate data using computer-supported physical data representations. We call these representations physicalizations, analogously to visualizations -- their purely visual counterpart. In this article, we go beyond the focused research questions addressed so far by delineating the research area, synthesizing its open challenges and laying out a research agenda.
EMPIRE10 (Evaluation of Methods for Pulmonary Image REgistration 2010) is a public platform for fair and meaningful comparison of registration algorithms which are applied to a database of intrapatient thoracic CT image pairs. Evaluation of nonrigid registration techniques is a nontrivial task. This is compounded by the fact that researchers typically test only on their own data, which varies widely. For this reason, reliable assessment and comparison of different registration algorithms has been virtually impossible in the past. In this work we present the results of the launch phase of EMPIRE10, which comprised the comprehensive evaluation and comparison of 20 individual algorithms from leading academic and industrial research groups. All algorithms are applied to the same set of 30 thoracic CT pairs. Algorithm settings and parameters are chosen by researchers expert in the configuration of their own method and the evaluation is independent, using the same criteria for all participants. All results are published on the EMPIRE10 website (http://empire10.isi.uu.nl). The challenge remains ongoing and open to new participants. Full results from 24 algorithms have been published at the time of writing. This paper details the organization of the challenge, the data and evaluation methods and the outcome of the initial launch with 20 algorithms. The gain in knowledge and future work are discussed.
The lack of a formal link between neural network structure and its emergent function has hampered our understanding of how the brain processes information. We have now come closer to describing such a link by taking the direction of synaptic transmission into account, constructing graphs of a network that reflect the direction of information flow, and analyzing these directed graphs using algebraic topology. Applying this approach to a local network of neurons in the neocortex revealed a remarkably intricate and previously unseen topology of synaptic connectivity. The synaptic network contains an abundance of cliques of neurons bound into cavities that guide the emergence of correlated activity. In response to stimuli, correlated activity binds synaptically connected neurons into functional cliques and cavities that evolve in a stereotypical sequence toward peak complexity. We propose that the brain processes stimuli by forming increasingly complex functional cliques and cavities.
Topological persistence has proven to be a key concept for the study of real-valued functions defined over topological spaces. Its validity relies on the fundamental property that the persistence diagrams of nearby functions are close. However, existing stability results are restricted to the case of continuous functions defined over triangulable spaces. In this paper, we present new stability results that do not suffer from the above restrictions. Furthermore, by working at an algebraic level directly, we make it possible to compare the persistence diagrams of functions defined over different spaces, thus enabling a variety of new applications of the concept of persistence. Along the way, we extend the definition of persistence diagram to a larger setting, introduce the notions of discretization of a persistence module and associated pixelization map, define a proximity measure between persistence modules, and show how to interpolate between persistence modules, thereby lending a more analytic character to this otherwise algebraic setting. We believe these new theoretical concepts and tools shed new light on the theory of persistence, in addition to simplifying proofs and enabling new applications.
We present a model for building, visualizing, and interacting with multiscale representations of information visualization techniques using hierarchical aggregation. The motivation for this work is to make visual representations more visually scalable and less cluttered. The model allows for augmenting existing techniques with multiscale functionality, as well as for designing new visualization and interaction techniques that conform to this new class of visual representations. We give some examples of how to use the model for standard information visualization techniques such as scatterplots, parallel coordinates, and node-link diagrams, and discuss existing techniques that are based on hierarchical aggregation. This yields a set of design guidelines for aggregated visualizations. We also present a basic vocabulary of interaction techniques suitable for navigating these multiscale visualizations.
The brain is constituted of multiple networks of functionally correlated brain areas, out of which the default-mode network (DMN) is the largest. Most existing research into the DMN has taken a corticocentric approach. Despite its resemblance with the unitary model of the limbic system, the contribution of subcortical structures to the DMN may be underappreciated. Here, we propose a more comprehensive neuroanatomical model of the DMN including subcortical structures such as the basal forebrain, cholinergic nuclei, anterior and mediodorsal thalamic nuclei. Additionally, tractography of diffusion-weighted imaging was employed to explore the structural connectivity, which revealed that the thalamus and basal forebrain are of central importance for the functioning of the DMN. The contribution of these neurochemically diverse brain nuclei reconciles previous neuroimaging with neuropathological findings in diseased brains and offers the potential for identifying a conserved homologue of the DMN in other mammalian species.
One of the main challenges that the Semantic Web faces is the integration of a growing number of independently designed ontologies. In this work, we present paris, an approach for the automatic alignment of ontologies. paris aligns not only instances, but also relations and classes. Alignments at the instance level cross-fertilize with alignments at the schema level. Thereby, our system provides a truly holistic solution to the problem of ontology alignment. The heart of the approach is probabilistic, i.e., we measure degrees of matchings based on probability estimates. This allows paris to run without any parameter tuning. We demonstrate the efficiency of the algorithm and its precision through extensive experiments. In particular, we obtain a precision of around 90% in experiments with some of the world's largest ontologies.
We present YAGO2, an extension of the YAGO knowledge base with focus on temporal and spatial knowledge. It is automatically built from Wikipedia, GeoNames, and WordNet, and contains nearly 10 million entities and events, as well as 80 million facts representing general world knowledge. An enhanced data representation introduces time and location as first-class citizens. The wealth of spatio-temporal information in YAGO can be explored either graphically or through a special time- and space-aware query language.
Determining the state of consciousness in patients with disorders of consciousness is a challenging practical and theoretical problem. Recent findings suggest that multiple markers of brain activity extracted from the EEG may index the state of consciousness in the human brain. Furthermore, machine learning has been found to optimize their capacity to discriminate different states of consciousness in clinical practice. However, it is unknown how dependable these EEG markers are in the face of signal variability because of different EEG configurations, EEG protocols and subpopulations from different centres encountered in practice. In this study we analysed 327 recordings of patients with disorders of consciousness (148 unresponsive wakefulness syndrome and 179 minimally conscious state) and 66 healthy controls obtained in two independent research centres (Paris Pitié-Salpêtrière and Liège). We first show that a non-parametric classifier based on ensembles of decision trees provides robust out-of-sample performance on unseen data with a predictive area under the curve (AUC) of ~0.77 that was only marginally affected when using alternative EEG configurations (different numbers and positions of sensors, numbers of epochs, average AUC = 0.750 ± 0.014). In a second step, we observed that classifiers based on multiple as well as single EEG features generalize to recordings obtained from different patient cohorts, EEG protocols and different centres. However, the multivariate model always performed best with a predictive AUC of 0.73 for generalization from Paris 1 to Paris 2 datasets, and an AUC of 0.78 from Paris to Liège datasets. Using simulations, we subsequently demonstrate that multivariate pattern classification has a decisive performance advantage over univariate classification as the stability of EEG features decreases, as different EEG configurations are used for feature-extraction or as noise is added. Moreover, we show that the generalization performance from Paris to Liège remains stable even if up to 20% of the diagnostic labels are randomly flipped. Finally, consistent with recent literature, analysis of the learned decision rules of our classifier suggested that markers related to dynamic fluctuations in theta and alpha frequency bands carried independent information and were most influential. Our findings demonstrate that EEG markers of consciousness can be reliably, economically and automatically identified with machine learning in various clinical and acquisition contexts.
We consider a set of views stating possibly conflicting facts. Negative facts in the views may come, e.g., from functional dependencies in the underlying database schema. We want to predict the truth values of the facts. Beyond simple methods such as voting (typically rather accurate), we explore techniques based on "corroboration", i.e., taking into account trust in the views. We introduce three fixpoint algorithms corresponding to different levels of complexity of an underlying probabilistic model. They all estimate both truth values of facts and trust in the views. We present experimental studies on synthetic and real-world data. This analysis illustrates how and in which context these methods improve corroboration results over baseline methods. We believe that corroboration can serve in a wide range of applications such as source selection in the semantic Web, data quality assessment or semantic annotation cleaning in social networks. This work sets the bases for a wide range of techniques for solving these more complex problems.
International audience
Deep learning algorithms trained to predict masked words from large amount of text have recently been shown to generate activations similar to those of the human brain. However, what drives this similarity remains currently unknown. Here, we systematically compare a variety of deep language models to identify the computational principles that lead them to generate brain-like representations of sentences. Specifically, we analyze the brain responses to 400 isolated sentences in a large cohort of 102 subjects, each recorded for two hours with functional magnetic resonance imaging (fMRI) and magnetoencephalography (MEG). We then test where and when each of these algorithms maps onto the brain responses. Finally, we estimate how the architecture, training, and performance of these models independently account for the generation of brain-like representations. Our analyses reveal two main findings. First, the similarity between the algorithms and the brain primarily depends on their ability to predict words from context. Second, this similarity reveals the rise and maintenance of perceptual, lexical, and compositional representations within each cortical region. Overall, this study shows that modern language algorithms partially converge towards brain-like solutions, and thus delineates a promising path to unravel the foundations of natural language processing.
Machine-Learning tasks are becoming pervasive in a broad range of domains, and in a broad range of systems (from embedded systems to data centers). At the same time, a small set of machine-learning algorithms (especially Convolutional and Deep Neural Networks, i.e., CNNs and DNNs) are proving to be state-of-the-art across many applications. As architectures evolve towards heterogeneous multi-cores composed of a mix of cores and accelerators, a machine-learning accelerator can achieve the rare combination of efficiency (due to the small number of target algorithms) and broad application scope. Until now, most machine-learning accelerator designs have focused on efficiently implementing the computational part of the algorithms. However, recent state-of-the-art CNNs and DNNs are characterized by their large size. In this study, we design an accelerator for large-scale CNNs and DNNs, with a special emphasis on the impact of memory on accelerator design, performance and energy. We show that it is possible to design an accelerator with a high throughput, capable of performing 452 GOP/s (key NN operations such as synaptic weight multiplications and neurons outputs additions) in a small footprint of 3.02 mm2 and 485 mW; compared to a 128-bit 2GHz SIMD processor, the accelerator is 117.87x faster, and it can reduce the total energy by 21.08x. The accelerator characteristics are obtained after layout at 65 nm. Such a high throughput in a small footprint can open up the usage of state-of-the-art machine-learning algorithms in a broad set of systems and for a broad set of applications.