Entrepôts, Représentation et Ingénierie des Connaissances

Facility · Saint-Priest, France

Research output, citation impact, and the most-cited recent papers from Entrepôts, Représentation et Ingénierie des Connaissances (France). Aggregated across the NobleBlocks index of 300M+ scholarly works.

Total works

539

Citations

2.3K

h-index

i10-index

Top-cited papers from Entrepôts, Représentation et Ingénierie des Connaissances

The discriminative functional mixture model for a comparative analysis of bike sharing systems

Charles Bouveyron, Étienne Côme, Julien Jacques

2015 · The Annals of Applied Statistics · 155 citations · doi:10.1214/15-aoas861

Bike sharing systems (BSSs) have become a means of sustainable intermodal transport and are now offered in many cities worldwide. Most BSSs also provide open access to their data, particularly to real-time status reports on their bike stations. The analysis of the mass of data generated by such systems is of particular interest to BSS providers for updating system structures and policies. This work was motivated by interest in analyzing and comparing several European BSSs to identify common operating patterns and to propose practical solutions to avoid potential issues. Our approach relies on the identification of common patterns between and within systems. To this end, a model-based clustering method for time series (or, more generally, functional data), called FunFEM, is developed. It is based on a functional mixture model that allows the data to be clustered in a discriminative functional subspace. In this context, the model has the advantage of being parsimonious and of allowing visualization of the clustered systems. Numerical experiments confirm the good behavior of FunFEM, particularly compared to state-of-the-art methods. The application of FunFEM to BSS data from JCDecaux and the Transport for London Initiative allows us to identify 10 general patterns, including pathological ones, and to propose practical improvement strategies based on the system comparison. The visualization of the clustered data within the discriminative subspace turns out to be particularly informative regarding system efficiency. The proposed methodology is implemented in an R package named funFEM, which is available on CRAN. The package also provides a subset of the data analyzed in this work.

Algorithmique et géométrie discrète pour la caractérisation des courbes et des surfaces

David Cœurjolly

2002 · HAL (Le Centre pour la Communication Scientifique Directe) · 33 citations

The context of the work presented in this thesis is digital geometry. This research area is devoted to the automatic analysis of objects in digital images in dimensions 2 and 3. All acquisition devices provide data organized on regular grids, called digital data. The algorithms that are explored and extended preserve the discrete nature of the data, as opposed to techniques based on approximating a continuous model. More precisely, we are interested in the study of digital curves and surfaces. First, we consider basic digital objects such as digital straight lines, planes and circles. We present algorithms that characterize such objects and we propose some extensions of these methods. Then, we study some metrics on digital objects, such as the Euclidean distance transform and the notion of digital geodesic. An approach based on the visibility property in digital domains is presented. In the third part, we define and evaluate estimators of Euclidean measurements such as length, curvature and area. Some results on the convergence of these estimators are presented. Finally, we illustrate some applications in which this research has been used: automatic classification of archaeological objects and micro-structure analysis of snow samples.

Processing and Managing Complex Data for Decision Support

Jérôme Darmont, Omar Boussaïd

2006 · IGI Global eBooks · 31 citations · doi:10.4018/978-1-59140-655-6

"This book provides an overall view of the emerging field of complex data processing, highlighting the similarities between the different data, issues and approaches"--Provided by publisher

A Novel Hybrid Algorithm to Forecast Functional Time Series Based on Pattern Sequence Similarity with Application to Electricity Demand

Francisco Martínez‐Álvarez, Amandine Schmutz, G. Asencio–Cortés, Julien Jacques

2018 · Energies · 25 citations · doi:10.3390/en12010094

The forecasting of future values is a very challenging task. In almost all scientific disciplines, the analysis of time series provides useful information and even economic benefits. In this context, this paper proposes a novel hybrid algorithm to forecast functional time series with arbitrary prediction horizons. It integrates a well-known functional data clustering algorithm into a forecasting strategy based on pattern sequence similarity, which was originally developed for discrete time series. The new approach assumes that some patterns are repeated over time, and it attempts to discover them and evaluate their immediate future. Hence, the algorithm first applies a functional time series clustering algorithm, i.e., it assigns a label to every data unit (which may represent one hour, one day, or any arbitrary length). As a result, the time series is transformed into a sequence of labels. It then retrieves the sequence of labels immediately preceding the sample to be forecast. This sequence is searched for within the historical data, and every time it is found, the sample immediately after it is stored. Once the search is complete, the output is generated by weighting all stored data. The performance of the approach has been tested on real-world datasets related to electricity demand and compared to other existing methods, with very promising results. Finally, a statistical significance test has been carried out to confirm the suitability of the choice of the compared methods. In conclusion, a novel algorithm to forecast functional time series is proposed, with very satisfactory results when assessed in the context of electricity demand.
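The search-and-average step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `psf_forecast`, the equal weighting of matches, and the window-shrinking fallback are assumptions made for illustration, and the clustering step is assumed to have already produced one integer label per data unit.

```python
from typing import List, Optional, Sequence


def psf_forecast(labels: Sequence[int],
                 profiles: Sequence[Sequence[float]],
                 window: int) -> Optional[List[float]]:
    """Forecast the next profile from past matches of the last `window` labels.

    `labels[i]` is the cluster label of `profiles[i]` (assumed precomputed).
    Returns None when no match is found, even with a window of length 1.
    """
    n = len(labels)
    if n != len(profiles):
        raise ValueError("labels and profiles must have the same length")
    # Assumption: shrink the window until the pattern occurs in the history.
    for w in range(min(window, n - 1), 0, -1):
        pattern = list(labels[n - w:])
        # Collect the profile that immediately followed each earlier match.
        followers = [profiles[i + w]
                     for i in range(n - w)
                     if list(labels[i:i + w]) == pattern]
        if followers:
            # Assumption: equal weights, i.e. a pointwise average.
            length = len(followers[0])
            return [sum(p[t] for p in followers) / len(followers)
                    for t in range(length)]
    return None


if __name__ == "__main__":
    day_labels = [0, 1, 2, 0, 1, 2, 0, 1]
    day_profiles = [[float(i), 10.0 * i] for i in range(len(day_labels))]
    print(psf_forecast(day_labels, day_profiles, window=2))
```

In the usage example, the last two labels (0, 1) occur twice earlier in the history, so the forecast is the average of the two profiles that followed those occurrences.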

A Study of Synthetic Oversampling for Twitter Imbalanced Sentiment Analysis

Julien Ah-Pine, Edmundo-Pavel Soriano-Morales

2016 · HAL (Le Centre pour la Communication Scientifique Directe) · 18 citations

International audience

Chapter 6. Gender and age differences in swearing

Michaël Gauthier, Adrien Guille

2017 · Pragmatics &amp; beyond. New series · 17 citations · doi:10.1075/pbns.282.07gau

International audience

Co-clustering of multivariate functional data for the analysis of air pollution in the South of France

Charles Bouveyron, Julien Jacques, Amandine Schmutz, Fanny Simões +1 more

2022 · The Annals of Applied Statistics · 16 citations · doi:10.1214/21-aoas1547

Air pollution is now a major threat to public health, with clear links to many diseases, especially cardiovascular ones. The spatiotemporal study of pollution is of great interest to governments and local authorities when deciding on public alerts or new city policies against rising pollution. The aim of this work is to study spatiotemporal profiles of environmental data collected in the south of France (Région Sud) by the public agency AtmoSud. The idea is to better understand the exposure of inhabitants to pollutants over a large territory with important differences in terms of geography and urbanism. The data consist of daily measurements of five environmental variables, namely three pollutants (PM10, NO2, O3) and two meteorological factors (pressure and temperature), recorded over six years. These data can be seen as multivariate functional data: quantitative entities evolving over time, for which there is a growing need for methods to summarize and understand them. For this purpose, a novel co-clustering model for multivariate functional data is defined. The model is based on a functional latent block model which assumes, for each co-cluster, a probabilistic distribution for multivariate functional principal component scores. A stochastic EM algorithm, embedding a Gibbs sampler, is proposed for model inference, as well as a model selection criterion for choosing the number of co-clusters. The application of the proposed co-clustering algorithm to environmental data from the Région Sud allowed us to divide the region, composed of 357 zones, into six macroareas with common exposure to pollution. We showed that pollution profiles vary according to the seasons, and that the patterns are similar across the six years studied. These results can be used by local authorities to develop specific programs to reduce pollution at the macroarea level and to identify specific periods of the year with high pollution peaks in order to set up specific health prevention programs. Overall, the proposed co-clustering approach is a powerful resource for analysing multivariate functional data in order to identify intrinsic data structure and to summarize variable profiles over long periods of time.

ECSTRA-INSERM @ CLEF eHealth2016-task 2: ICD10 Code Extraction from Death Certificates

Mohamed Dermouche, Vincent Looten, Rémi Flicoteaux, Sylvie Chevret +2 more

2016 · HAL (Le Centre pour la Communication Scientifique Directe) · 14 citations

International audience

Coining goldMEDAL: A New Contribution to Data Lake Generic Metadata Modeling

Étienne Scholly, Pegdwendé N. Sawadogo, Pengfei Liu, Javier A. Espinosa-Oviedo +4 more

2021 · arXiv (Cornell University) · 12 citations · doi:10.48550/arxiv.2103.13155

The rise of big data has revolutionized data exploitation practices and led to the emergence of new concepts. Among them, data lakes have emerged as large heterogeneous data repositories that can be analyzed by various methods. An efficient data lake requires a metadata system that addresses the many problems arising when dealing with big data. In consequence, the study of data lake metadata models is currently an active research topic and many proposals have been made in this regard. However, existing metadata models are either tailored for a specific use case or insufficiently generic to manage different types of data lakes, including our previous model MEDAL. In this paper, we generalize MEDAL's concepts in a new metadata model called goldMEDAL. Moreover, we compare goldMEDAL with the most recent state-of-the-art metadata models aiming at genericity and show that we can reproduce these metadata models with goldMEDAL's concepts. As a proof of concept, we also illustrate that goldMEDAL allows the design of various data lakes by presenting three different use cases.

An Efficient and Effective Generic Agglomerative Hierarchical Clustering Approach

Julien Ah-Pine

2018 · HAL (Le Centre pour la Communication Scientifique Directe) · 12 citations

International audience

The Cluster Description Problem - Complexity Results, Formulations and Approximations

Ian Davidson, Antoine Gourru, S. S. Ravi

2018 · HAL (Le Centre pour la Communication Scientifique Directe) · 11 citations

International audience

Investigating the Image of Entities in Social Media: Dataset Design and First Results

Julien Velcin, Caroline Brun, Jean-Yves Dormagen, Youngmin Kim +4 more

2014 · 10 citations · doi:10.63317/3okwfjyknzkm

International audience

A Bregman-proximal point algorithm for robust non-negative matrix factorization with possible missing values and outliers - application to gene expression analysis

Stéphane Chrétien, Christophe Guyeux, Bastien Conesa, Régis Delage-Mouroux +3 more

2016 · BMC Bioinformatics · 8 citations · doi:10.1186/s12859-016-1120-8

BACKGROUND: Non-negative matrix factorization has become an essential tool for feature extraction in a wide spectrum of applications. In the present work, our objective is to extend the applicability of the method to the case of missing and/or corrupted data due to outliers. RESULTS: An essential property for missing data imputation and detection of outliers is that the uncorrupted data matrix is low rank, i.e. has only a small number of degrees of freedom. We devise a new version of the Bregman proximal idea which preserves nonnegativity and combine it with the Augmented Lagrangian approach for simultaneous reconstruction of the features of interest and detection of the outliers using a sparsity-promoting ℓ1 penalty. CONCLUSIONS: An application to the analysis of gene expression data of patients with bladder cancer is finally proposed.

Sharing-based Privacy and Availability of Cloud Data Warehouses

Varunya Attasena, Nouria Harbi, Jérôme Darmont

2013 · HAL (Le Centre pour la Communication Scientifique Directe) · 8 citations

National audience

A note on the links between different qualitative integrals

Michal Holčapek, Rico Agnes

2020 · 6 citations · doi:10.1109/fuzz48607.2020.9177567

Qualitative or equivalently fuzzy integrals are used as qualitative aggregation functions or as L-fuzzy quantifiers. In both cases they are generalisations of Sugeno integrals. The definitions of these fuzzy integrals are quite similar and coincide in particular cases, but surprisingly there is no deeper analysis of their relationship. The paper attempts to fill this gap and provides unified definitions of fuzzy quantifiers on the basis of which various links between these fuzzy integrals are studied. In order to make these links more visible and to emphasise their logical structure, we present them using the graded square and modern square of opposition.
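For readers unfamiliar with the Sugeno integral that these fuzzy integrals generalise, the standard discrete form can be sketched in Python. This is background illustration only and does not reproduce the paper's definitions: the function name `sugeno_integral` and the cardinality-based capacity in the usage example are assumptions, and the sketch assumes values of `f` in [0, 1] and a monotone capacity with value 1 on the whole set.

```python
from typing import Callable, Dict, FrozenSet, Hashable


def sugeno_integral(f: Dict[Hashable, float],
                    capacity: Callable[[FrozenSet[Hashable]], float]) -> float:
    """Discrete Sugeno integral: max over i of min(f(x_(i)), capacity(A_(i))).

    Elements are sorted by increasing f, and A_(i) is the set of elements
    ranked i or higher in that order.
    """
    ordered = sorted(f, key=f.get)
    best = 0.0
    for i, x in enumerate(ordered):
        # Elements whose value is at least f(x), given the sort order.
        upper_set = frozenset(ordered[i:])
        best = max(best, min(f[x], capacity(upper_set)))
    return best


if __name__ == "__main__":
    scores = {"a": 0.2, "b": 0.6, "c": 0.9}
    # Hypothetical capacity: the fraction of elements contained in the subset.
    uniform = lambda subset: len(subset) / len(scores)
    print(sugeno_integral(scores, uniform))
```

With the cardinality-based capacity, the result is the largest value t such that at least a fraction t of the elements score t or more, a qualitative analogue of a median.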

Selection of Proximity Measures for a Topological Correspondence Analysis

Rafik Abdesselam

2020 · 6 citations · doi:10.1002/9781119721871.ch6

International audience

ordinalClust: An R Package to Analyze Ordinal Data

Margot Selosse, Julien Jacques, Christophe Biernacki

2020 · The R Journal · 6 citations · doi:10.32614/rj-2021-011

International audience

A Topological Discriminant Analysis

Rafik Abdesselam

2019 · 5 citations · doi:10.1002/9781119579465.ch12

International audience

L’analyse quantitative des médias sociaux, une alternative aux enquêtes déclaratives ?

Julien Boyadjian, Julien Velcin

2017 · Questions de communication · 5 citations · doi:10.4000/questionsdecommunication.11078

The article presents the first results of an interdisciplinary research project whose aim is to identify the social logics behind the production of political messages on Twitter. The research specifically seeks to demonstrate the value of an interdisciplinary approach to this subject. On the one hand, it involves developing algorithms to analyze a very large number of political messages, in both supervised and unsupervised ways, in order to identify their polarity and target. On the other hand, it involves comparing this information with opinion poll data in order to better understand the relationships (or lack of relationships) between online and offline opinion dynamics.

Les entrepôts de données pour les nuls... ou pas !

Cécile Favre, Fadila Bentayeb, Omar Boussaïd, Jérôme Darmont +4 more

2013 · HAL (Le Centre pour la Communication Scientifique Directe) · 5 citations

National audience
