Idiap Research Institute

Facility · Martigny-Combe, Switzerland

Research output, citation impact, and the most-cited recent papers from Idiap Research Institute (Switzerland). Aggregated across the NobleBlocks index of 300M+ scholarly works.

Total works

3.9K

Citations

169.7K

h-index

170

i10-index

2.9K
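
The h-index and i10-index above follow from per-work citation counts. The page does not state how the index computes them, so the sketch below assumes the conventional definitions: h is the largest number such that h works have at least h citations each, and i10 is the number of works with at least 10 citations. The sample counts are hypothetical and are not taken from the index.

```python
def h_index(citations):
    """Largest h such that at least h works have h or more citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def i10_index(citations):
    """Number of works with at least 10 citations."""
    return sum(1 for count in citations if count >= 10)


# Hypothetical citation counts, for illustration only.
sample = [25, 12, 10, 8, 3, 0]
print(h_index(sample), i10_index(sample))
```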

Also known as

Idiap Research Institute, Institut d'intelligence artificielle perceptive

Top-cited papers from Idiap Research Institute

Multiple Object Tracking Using K-Shortest Paths Optimization

Jérôme Berclaz, François Fleuret, Engin Türetken, Pascal Fua

2011 · IEEE Transactions on Pattern Analysis and Machine Intelligence · 1.0K citations · doi:10.1109/tpami.2011.21

Multi-object tracking can be achieved by detecting objects in individual frames and then linking detections across frames. Such an approach can be made very robust to the occasional detection failure: If an object is not detected in a frame but is in previous and following ones, a correct trajectory will nevertheless be produced. By contrast, a false-positive detection in a few frames will be ignored. However, when dealing with a multiple target problem, the linking step results in a difficult optimization problem in the space of all possible families of trajectories. This is usually dealt with by sampling or greedy search based on variants of Dynamic Programming which can easily miss the global optimum. In this paper, we show that reformulating that step as a constrained flow optimization results in a convex problem. We take advantage of its particular structure to solve it using the k-shortest paths algorithm, which is very fast. This new approach is far simpler formally and algorithmically than existing techniques and lets us demonstrate excellent performance in two very different contexts.

Electromyography data for non-invasive naturally-controlled robotic hand prostheses

Manfredo Atzori, Arjan Gijsberts, Claudio Castellini, Barbara Caputo +4 more

2014 · Scientific Data · 929 citations · doi:10.1038/sdata.2014.53

Recent advances in rehabilitation robotics suggest that it may be possible for hand-amputated subjects to recover at least a significant part of the lost hand functionality. The control of robotic prosthetic hands using non-invasive techniques is still a challenge in real life: myoelectric prostheses give limited control capabilities, the control is often unnatural and must be learned through long training times. Meanwhile, scientific literature results are promising but they are still far from fulfilling real-life needs. This work aims to close this gap by allowing worldwide research groups to develop and test movement recognition and force control algorithms on a benchmark scientific database. The database is targeted at studying the relationship between surface electromyography, hand kinematics and hand forces, with the final goal of developing non-invasive, naturally controlled, robotic hand prostheses. The validation section verifies that the data are similar to data acquired in real-life conditions, and that recognition of different hand tasks by applying state-of-the-art signal features and machine-learning algorithms is possible.

The BCI competition III: validating alternative approaches to actual BCI problems

Benjamin Blankertz, K. Müller, Dean J. Krusienski, Gerwin Schalk +4 more

2006 · IEEE Transactions on Neural Systems and Rehabilitation Engineering · 917 citations · doi:10.1109/tnsre.2006.875642

A brain-computer interface (BCI) is a system that allows its users to control external devices with brain activity. Although the proof-of-concept was given decades ago, the reliable translation of user intent into device control commands is still a major challenge. Success requires the effective interaction of two adaptive controllers: the user's brain, which produces brain activity that encodes intent, and the BCI system, which translates that activity into device control commands. In order to facilitate this interaction, many laboratories are exploring a variety of signal analysis techniques to improve the adaptation of the BCI system to the user. In the literature, many machine learning and pattern classification algorithms have been reported to give impressive results when applied to BCI data in offline analyses. However, it is more difficult to evaluate their relative value for actual online use. BCI data competitions have been organized to provide objective formal evaluations of alternative methods. Prompted by the great interest in the first two BCI Competitions, we organized the third BCI Competition to address several of the most difficult and important analysis problems in BCI research. The paper describes the data sets that were provided to the competitors and gives an overview of the results.

Noninvasive Brain-Actuated Control of a Mobile Robot by Human EEG

José del R. Millán, F. Renkens, J. Mouriño, Wulfram Gerstner

2004 · IEEE Transactions on Biomedical Engineering · 739 citations · doi:10.1109/tbme.2004.827086

Brain activity recorded noninvasively is sufficient to control a mobile robot if advanced robotics is used in combination with asynchronous electroencephalogram (EEG) analysis and machine learning techniques. Until now brain-actuated control has mainly relied on implanted electrodes, since EEG-based systems have been considered too slow for controlling rapid and complex sequences of movements. We show that two human subjects successfully moved a robot between several rooms by mental control only, using an EEG-based brain-machine interface that recognized three mental states. Mental control was comparable to manual control on the same task with a performance ratio of 0.74.

The INTERSPEECH 2013 computational paralinguistics challenge: social signals, conflict, emotion, autism

Björn W. Schuller, Stefan Steidl, Anton Batliner, Alessandro Vinciarelli +4 more

2013 · 729 citations · doi:10.21437/interspeech.2013-56

The INTERSPEECH 2013 Computational Paralinguistics Challenge provides for the first time a unified test-bed for Social Signals such as laughter in speech. It further introduces conflict in group discussions as a new task and deals with autism and its manifestations in speech. Finally, emotion is revisited as a task, albeit with a broader range of twelve enacted emotional states overall. In this paper, we describe these four Sub-Challenges, their conditions, baselines, and a new feature set by the openSMILE toolkit, provided to the participants.

From image-level to pixel-level labeling with Convolutional Networks

Pedro O. Pinheiro, Ronan Collobert

2015 · 720 citations · doi:10.1109/cvpr.2015.7298780

We are interested in inferring object segmentation by leveraging only object class information, and by considering only minimal priors on the object segmentation task. This problem could be viewed as a kind of weakly supervised segmentation task, and naturally fits the Multiple Instance Learning (MIL) framework: every training image is known to have (or not) at least one pixel corresponding to the image class label, and the segmentation task can be rewritten as inferring the pixels belonging to the class of the object (given one image, and its object class). We propose a Convolutional Neural Network-based model, which is constrained during training to put more weight on pixels which are important for classifying the image. We show that at test time, the model has learned to discriminate the right pixels well enough that it performs very well on an existing segmentation benchmark, by adding only a few smoothing priors. Our system is trained using a subset of the Imagenet dataset and the segmentation experiments are performed on the challenging Pascal VOC dataset (with no fine-tuning of the model on Pascal VOC). Our model beats the state-of-the-art results in the weakly supervised object segmentation task by a large margin. We also compare the performance of our model with state-of-the-art fully-supervised segmentation approaches.

On the effectiveness of local binary patterns in face anti-spoofing

Ivana Chingovska, André Anjos, Sébastien Marcel

2012 · Infoscience (Ecole Polytechnique Fédérale de Lausanne) · 697 citations

Spoofing attacks are one of the security traits that biometric recognition systems are proven to be vulnerable to. When spoofed, a biometric recognition system is bypassed by presenting a copy of the biometric evidence of a valid user. Among all biometric modalities, spoofing a face recognition system is particularly easy to perform: all that is needed is a simple photograph of the user. In this paper, we address the problem of detecting face spoofing attacks. In particular, we inspect the potential of texture features based on Local Binary Patterns (LBP) and their variations on three types of attacks: printed photographs, and photos and videos displayed on electronic screens of different sizes. For this purpose, we introduce REPLAY-ATTACK, a novel publicly available face spoofing database which contains all the mentioned types of attacks. We conclude that LBP, with ∼15% Half Total Error Rate, show moderate discriminability when confronted with a wide set of attack types.
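
The two quantities named in this abstract can be illustrated with a short sketch. It assumes the basic 3x3 LBP operator (each neighbour is thresholded against the centre pixel and the results are packed into 8 bits) and the usual definition of Half Total Error Rate as the mean of the false acceptance and false rejection rates at a fixed threshold. The bit ordering, the function names, and the sample scores are illustrative assumptions and are not taken from the paper.

```python
def lbp_code(patch):
    """8-bit LBP code of the centre pixel of a 3x3 patch (list of 3 rows)."""
    center = patch[1][1]
    # Neighbours visited clockwise from the top-left corner.
    # The starting point and direction are a convention, assumed here.
    neighbours = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
                  patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    code = 0
    for bit, value in enumerate(neighbours):
        if value >= center:
            code |= 1 << bit
    return code


def half_total_error_rate(genuine_scores, attack_scores, threshold):
    """HTER = (FAR + FRR) / 2, accepting scores at or above the threshold."""
    false_accepts = sum(1 for s in attack_scores if s >= threshold)
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    far = false_accepts / len(attack_scores)
    frr = false_rejects / len(genuine_scores)
    return (far + frr) / 2


# Hypothetical classifier scores, for illustration only.
genuine = [0.9, 0.8, 0.4, 0.7]
attacks = [0.1, 0.6, 0.3, 0.2]
print(half_total_error_rate(genuine, attacks, threshold=0.5))
```

In a texture-based detector, the per-pixel codes would typically be pooled into a histogram that serves as the feature vector. That step is omitted here.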

Recurrent Convolutional Neural Networks for Scene Labeling

Pedro O. Pinheiro, Ronan Collobert

2014 · Infoscience (Ecole Polytechnique Fédérale de Lausanne) · 624 citations

Scene parsing is a technique that consists of giving a label to every pixel in an image according to the class it belongs to. To ensure good visual coherence and high class accuracy, it is essential for a scene parser to capture long-range dependencies in the image. In a feed-forward architecture, this can be achieved simply by considering a sufficiently large input context patch around each pixel to be labeled. We propose an approach consisting of a recurrent convolutional neural network which allows us to consider a large input context, while limiting the capacity of the model. Contrary to most standard approaches, our method does not rely on any segmentation methods, nor any task-specific features. The system is trained in an end-to-end manner over raw pixels, and models complex spatial dependencies with low inference cost. As the context size increases with the built-in recurrence, the system identifies and corrects its own errors. Our approach yields state-of-the-art performance on both the Stanford Background Dataset and the SIFT Flow Dataset, while remaining very fast at test time.

Image Quality Assessment for Fake Biometric Detection: Application to Iris, Fingerprint, and Face Recognition

Javier Galbally, Sébastien Marcel, Julián Fiérrez

2014 · IEEE Transactions on Image Processing · 607 citations · doi:10.1109/tip.2013.2292332

To ensure the actual presence of a real legitimate trait in contrast to a fake self-manufactured synthetic or reconstructed sample is a significant problem in biometric authentication, which requires the development of new and efficient protection measures. In this paper, we present a novel software-based fake detection method that can be used in multiple biometric systems to detect different types of fraudulent access attempts. The objective of the proposed system is to enhance the security of biometric recognition frameworks, by adding liveness assessment in a fast, user-friendly, and non-intrusive manner, through the use of image quality assessment. The proposed approach presents a very low degree of complexity, which makes it suitable for real-time applications, using 25 general image quality features extracted from one image (i.e., the same acquired for authentication purposes) to distinguish between legitimate and impostor samples. The experimental results, obtained on publicly available data sets of fingerprint, iris, and 2D face, show that the proposed method is highly competitive compared with other state-of-the-art approaches and that the analysis of the general image quality of real biometric samples reveals highly valuable information that may be very efficiently used to discriminate them from fake traits.

Audio-visual speech modeling for continuous speech recognition

Stéphane Dupont, Juergen Luettin

2000 · IEEE Transactions on Multimedia · 601 citations · doi:10.1109/6046.865479

This paper describes a speech recognition system that uses both acoustic and visual speech information to improve recognition performance in noisy environments. The system consists of three components: a visual module; an acoustic module; and a sensor fusion module. The visual module locates and tracks the lip movements of a given speaker and extracts relevant speech features. This task is performed with an appearance-based lip model that is learned from example images. Visual speech features are represented by contour information of the lips and grey-level information of the mouth area. The acoustic module extracts noise-robust features from the audio signal. Finally the sensor fusion module is responsible for the joint temporal modeling of the acoustic and visual feature streams and is realized using multistream hidden Markov models (HMMs). The multistream method allows the definition of different temporal topologies and levels of stream integration and hence enables the modeling of temporal dependencies more accurately than traditional approaches. We present two different methods to learn the asynchrony between the two modalities and how to incorporate them in the multistream models. The superior performance for the proposed system is demonstrated on a large multispeaker database of continuously spoken digits. On a recognition task at 15 dB acoustic signal-to-noise ratio (SNR), acoustic perceptual linear prediction (PLP) features lead to 56% error rate, noise robust RASTA-PLP (relative spectra) acoustic features to 7.2% error rate and combined noise robust acoustic features and visual features to 2.5% error rate.

A Survey of Personality Computing

Alessandro Vinciarelli, Gelareh Mohammadi

2014 · IEEE Transactions on Affective Computing · 531 citations · doi:10.1109/taffc.2014.2330816

Personality is a psychological construct aimed at explaining the wide variety of human behaviors in terms of a few, stable and measurable individual characteristics. In this respect, any technology involving understanding, prediction and synthesis of human behavior is likely to benefit from Personality Computing approaches, i.e. from technologies capable of dealing with human personality. This paper is a survey of such technologies and it aims at providing not only a solid knowledge base about the state-of-the-art, but also a conceptual model underlying the three main problems addressed in the literature, namely Automatic Personality Recognition (inference of the true personality of an individual from behavioral evidence), Automatic Personality Perception (inference of personality others attribute to an individual based on her observable behavior) and Automatic Personality Synthesis (generation of artificial personalities via embodied agents). Furthermore, the article highlights the issues still open in the field and identifies potential application areas.

Person Authentication Using Brainwaves (EEG) and Maximum A Posteriori Model Adaptation

Sébastien Marcel, José del R. Millán

2007 · IEEE Transactions on Pattern Analysis and Machine Intelligence · 530 citations · doi:10.1109/tpami.2007.1012

In this paper, we investigate the use of brain activity for person authentication. It has been shown in previous studies that the brain-wave pattern of every individual is unique and that the electroencephalogram (EEG) can be used for biometric identification. EEG-based biometry is an emerging research topic and we believe that it may open new research directions and applications in the future. However, very little work has been done in this area, and it has focused mainly on person identification rather than person authentication. Person authentication aims to accept or to reject a person claiming an identity, i.e., comparing biometric data to one template, while the goal of person identification is to match the biometric data against all the records in a database. We propose the use of a statistical framework based on Gaussian Mixture Models and Maximum A Posteriori model adaptation, successfully applied to speaker and face authentication, which can deal with only one training session. We perform intensive experimental simulations using several strict train/test protocols to show the potential of our method. We also show that there are some mental tasks that are more appropriate for person authentication than others.
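
The distinction this abstract draws between authentication (one comparison against the claimed identity's template) and identification (a search over every enrolled record) can be sketched as follows. This is not the paper's GMM and MAP adaptation method. The similarity function, templates, and threshold are hypothetical stand-ins chosen only to show the two decision rules.

```python
def similarity(a, b):
    """Hypothetical similarity: negative squared Euclidean distance."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))


def authenticate(sample, claimed_id, templates, threshold):
    """1:1 decision: accept or reject the claimed identity."""
    return similarity(sample, templates[claimed_id]) >= threshold


def identify(sample, templates):
    """1:N search: return the enrolled identity that matches best."""
    return max(templates, key=lambda name: similarity(sample, templates[name]))


# Hypothetical enrolled feature vectors, for illustration only.
templates = {"alice": [0.0, 1.0], "bob": [1.0, 0.0]}
sample = [0.1, 0.9]

print(authenticate(sample, "alice", templates, threshold=-0.1))
print(authenticate(sample, "bob", templates, threshold=-0.1))
print(identify(sample, templates))
```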

Deep Dynamic Neural Networks for Multimodal Gesture Segmentation and Recognition

Di Wu, Lionel Pigou, Pieter-Jan Kindermans, Nam Le +3 more

2016 · IEEE Transactions on Pattern Analysis and Machine Intelligence · 451 citations · doi:10.1109/tpami.2016.2537340

This paper describes a novel method called Deep Dynamic Neural Networks (DDNN) for multimodal gesture recognition. A semi-supervised hierarchical dynamic framework based on a Hidden Markov Model (HMM) is proposed for simultaneous gesture segmentation and recognition, where skeleton joint information, depth and RGB images are the multimodal input observations. Unlike most traditional approaches that rely on the construction of complex handcrafted features, our approach learns high-level spatio-temporal representations using deep neural networks suited to the input modality: a Gaussian-Bernoulli Deep Belief Network (DBN) to handle skeletal dynamics, and a 3D Convolutional Neural Network (3DCNN) to manage and fuse batches of depth and RGB images. This is achieved through the modeling and learning of the emission probabilities of the HMM required to infer the gesture sequence. This purely data-driven approach achieves a Jaccard index score of 0.81 in the ChaLearn LAP gesture spotting challenge. The performance is on par with a variety of state-of-the-art hand-tuned feature-based approaches and other learning-based methods, therefore opening the door to the use of deep learning techniques in order to further explore multimodal time series data.
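
The Jaccard index quoted in this abstract measures overlap between a predicted and a ground-truth gesture segment. A minimal sketch, assuming each segment is represented as a set of frame indices, is shown below. The challenge's full scoring protocol (how scores are averaged over gestures and sequences) is not reproduced here, and the frame ranges are hypothetical.

```python
def jaccard_index(predicted_frames, true_frames):
    """Intersection over union of two collections of frame indices."""
    predicted, truth = set(predicted_frames), set(true_frames)
    union = predicted | truth
    if not union:
        # Assumption: two empty segments score 0.0 rather than raising.
        return 0.0
    return len(predicted & truth) / len(union)


# Hypothetical segments: predicted frames 10-29, ground truth frames 15-34.
print(jaccard_index(range(10, 30), range(15, 35)))
```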

Biometric Antispoofing Methods: A Survey in Face Recognition

Javier Galbally, Sébastien Marcel, Julián Fiérrez

2014 · IEEE Access · 426 citations · doi:10.1109/access.2014.2381273

In recent decades, we have witnessed the evolution of biometric technology from the first pioneering works in face and voice recognition to the current state of development wherein a wide spectrum of highly accurate systems may be found, ranging from largely deployed modalities, such as fingerprint, face, or iris, to more marginal ones, such as signature or hand. This path of technological evolution has naturally led to a critical issue that has only started to be addressed recently: the resistance of this rapidly emerging technology to external attacks and, in particular, to spoofing. Spoofing, referred to by the term presentation attack in current standards, is a purely biometric vulnerability that is not shared with other IT security solutions. It refers to the ability to fool a biometric system into recognizing an illegitimate user as a genuine one by means of presenting a synthetic forged version of the original biometric trait to the sensor. The entire biometric community, including researchers, developers, standardizing bodies, and vendors, has thrown itself into the challenging task of proposing and developing efficient protection methods against this threat. The goal of this paper is to provide a comprehensive overview on the work that has been carried out over the last decade in the emerging field of antispoofing, with special attention to the mature and largely deployed face modality. The work covers theories, methodologies, state-of-the-art techniques, and evaluation databases and also aims at providing an outlook into the future of this very active field of research.

Torchvision the machine-vision package of torch

Sébastien Marcel, Yann Rodriguez

2010 · 417 citations · doi:10.1145/1873951.1874254

This paper presents Torchvision, an open source machine vision package for Torch. Torch is a machine learning library providing a series of state-of-the-art algorithms such as Neural Networks, Support Vector Machines, Gaussian Mixture Models, Hidden Markov Models and many others. Torchvision provides additional functionality to manipulate and process images with standard image processing algorithms. The resulting images can be used directly with the Torch machine learning algorithms, as Torchvision is fully integrated with Torch. Both Torch and Torchvision are written in C++ and are publicly available under the FreeBSD License.

Aesthetics and Emotions in Images

Dhiraj Joshi, Ritendra Datta, Elena Fedorovskaya, Quang-Tuan Luong +3 more

2011 · IEEE Signal Processing Magazine · 407 citations · doi:10.1109/msp.2011.941851

In this tutorial, we define and discuss key aspects of the problem of computational inference of aesthetics and emotion from images. We begin with a background discussion on philosophy, photography, paintings, visual arts, and psychology. This is followed by introduction of a set of key computational problems that the research community has been striving to solve and the computational framework required for solving them. We also describe data sets available for performing assessment and outline several real-world applications where research in this domain can be employed. A significant number of papers that have attempted to solve problems in aesthetics and emotion inference are surveyed in this tutorial. We also discuss future directions that researchers can pursue and make a strong case for seriously attempting to solve problems in this research domain.

10.1162/15324430152733142

Ronan Collobert, Samy Bengio

2000 · Applied Physics Letters · 389 citations · doi:10.1162/15324430152733142


Bridging the Gap between Social Animal and Unsocial Machine: A Survey of Social Signal Processing

Alessandro Vinciarelli, Maja Pantić, Dirk Heylen, Catherine Pélachaud +3 more

2011 · IEEE Transactions on Affective Computing · 377 citations · doi:10.1109/t-affc.2011.27

Social Signal Processing is the research domain aimed at bridging the social intelligence gap between humans and machines. This paper is the first survey of the domain that jointly considers its three major aspects, namely, modeling, analysis, and synthesis of social behavior. Modeling investigates laws and principles underlying social interaction, analysis explores approaches for automatic understanding of social exchanges recorded with different sensors, and synthesis studies techniques for the generation of social behavior via various forms of embodiment. For each of the above aspects, the paper includes an extensive survey of the literature, points to the most important publicly available resources, and outlines the most fundamental challenges ahead.

Pierre W. Ferrez, José del R. Millán

2008 · IEEE Transactions on Biomedical Engineering · 361 citations · doi:10.1109/tbme.2007.908083

Brain-computer interfaces (BCIs) are prone to errors in the recognition of the subject's intent. An elegant approach to improving the accuracy of BCIs consists of a verification procedure directly based on the presence of error-related potentials (ErrP) in the electroencephalogram (EEG) recorded right after the occurrence of an error. Several studies show the presence of ErrP in typical choice reaction tasks. However, in the context of a BCI, the central question is: "Are ErrP also elicited when the error is made by the interface during the recognition of the subject's intent?" We have thus explored whether ErrP also follow feedback indicating incorrect responses of the simulated BCI interface. Five healthy volunteer subjects participated in a new human-robot interaction experiment, which seems to confirm the previously reported presence of a new kind of ErrP. However, in order to exploit these ErrP, we need to detect them in each single trial using a short window following the feedback associated with the response of the BCI. We have achieved an average recognition rate of correct and erroneous single trials of 83.5% and 79.2%, respectively, using a classifier built with data recorded up to three months earlier.

Automatic analysis of multimodal group actions in meetings

L. McCowan, Daniel Gática-Pérez, Samy Bengio, Guillaume Lathoud +2 more

2005 · IEEE Transactions on Pattern Analysis and Machine Intelligence · 346 citations · doi:10.1109/tpami.2005.49

This paper investigates the recognition of group actions in meetings. A framework is employed in which group actions result from the interactions of the individual participants. The group actions are modeled using different HMM-based approaches, where the observations are provided by a set of audiovisual features monitoring the actions of individuals. Experiments demonstrate the importance of taking interactions into account in modeling the group actions. It is also shown that the visual modality contains useful information, even for predominantly audio-based events, motivating a multimodal approach to meeting analysis.
