Institute of Electrical and Electronics Engineers
UniversityPiscataway, United States
Research output, citation impact, and the most-cited recent papers from Institute of Electrical and Electronics Engineers (United States). Aggregated across the NobleBlocks index of 300M+ scholarly works.
Top-cited papers from Institute of Electrical and Electronics Engineers
After [15], [31], [19], [8], [25], [5], minimum cut/maximum flow algorithms on graphs emerged as an increasingly useful tool for exact or approximate energy minimization in low-level vision. The combinatorial optimization literature provides many min-cut/max-flow algorithms with different polynomial time complexity. Their practical efficiency, however, has to date been studied mainly outside the scope of computer vision. The goal of this paper is to provide an experimental comparison of the efficiency of min-cut/max flow algorithms for applications in vision. We compare the running times of several standard algorithms, as well as a new algorithm that we have recently developed. The algorithms we study include both Goldberg-Tarjan style "push-relabel" methods and algorithms based on Ford-Fulkerson style "augmenting paths." We benchmark these algorithms on a number of typical graphs in the contexts of image restoration, stereo, and segmentation. In many cases, our new algorithm works several times faster than any of the other methods, making near real-time performance possible. An implementation of our max-flow/min-cut algorithm is available upon request for research purposes.
A new approach toward target representation and localization, the central component in visual tracking of nonrigid objects, is proposed. The feature histogram-based target representations are regularized by spatial masking with an isotropic kernel. The masking induces spatially-smooth similarity functions suitable for gradient-based optimization, hence, the target localization problem can be formulated using the basin of attraction of the local maxima. We employ a metric derived from the Bhattacharyya coefficient as similarity measure, and use the mean shift procedure to perform the optimization. In the presented tracking examples, the new method successfully coped with camera motion, partial occlusions, clutter, and target scale variations. Integration with motion filters and data association techniques is also discussed. We describe only a few of the potential applications: exploitation of background information, Kalman tracking using motion models, and face tracking.
We propose an appearance-based face recognition method called the Laplacianface approach. By using Locality Preserving Projections (LPP), the face images are mapped into a face subspace for analysis. Different from Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) which effectively see only the Euclidean structure of face space, LPP finds an embedding that preserves local information, and obtains a face subspace that best detects the essential face manifold structure. The Laplacianfaces are the optimal linear approximations to the eigenfunctions of the Laplace Beltrami operator on the face manifold. In this way, the unwanted variations resulting from changes in lighting, facial expression, and pose may be eliminated or reduced. Theoretical analysis shows that PCA, LDA, and LPP can be obtained from different graph models. We compare the proposed Laplacianface approach with Eigenface and Fisherface methods on three different face data sets. Experimental results suggest that the proposed Laplacianface approach provides a better representation and achieves lower error rates in face recognition.
Learning visual models of object categories notoriously requires hundreds or thousands of training examples. We show that it is possible to learn much information about a category from just one, or a handful, of images. The key insight is that, rather than learning from scratch, one can take advantage of knowledge coming from previously learned categories, no matter how different these categories might be. We explore a Bayesian implementation of this idea. Object categories are represented by probabilistic models. Prior knowledge is represented as a probability density function on the parameters of these models. The posterior model for an object category is obtained by updating the prior in the light of one or more observations. We test a simple implementation of our algorithm on a database of 101 diverse object categories. We compare category models learned by an implementation of our Bayesian approach to models learned from by Maximum Likelihood (ML) and Maximum A Posteriori (MAP) methods. We find that on a database of more than 100 categories, the Bayesian approach produces informative models when the number of training examples is too small for other methods to operate successfully.
Using a differential-geometric treatment of planar shapes, we present tools for: 1) hierarchical clustering of imaged objects according to the shapes of their boundaries, 2) learning of probability models for clusters of shapes, and 3) testing of newly observed shapes under competing probability models. Clustering at any level of hierarchy is performed using a mimimum variance type criterion criterion and a Markov process. Statistical means of clusters provide shapes to be clustered at the next higher level, thus building a hierarchy of shapes. Using finite-dimensional approximations of spaces tangent to the shape space at sample means, we (implicitly) impose probability models on the shape space, and results are illustrated via random sampling and classification (hypothesis testing). Together, hierarchical clustering and hypothesis testing provide an efficient framework for shape retrieval. Examples are presented using shapes and images from ETH, Surrey, and AMCOM databases.
Previous work has demonstrated that the image variation of many objects (human faces in particular) under variable lighting can be effectively modeled by low-dimensional linear spaces, even when there are multiple light sources and shadowing. Basis images spanning this space are usually obtained in one of three ways: A large set of images of the object under different lighting conditions is acquired, and principal component analysis (PCA) is used to estimate a subspace. Alternatively, synthetic images are rendered from a 3D model (perhaps reconstructed from images) under point sources and, again, PCA is used to estimate a subspace. Finally, images rendered from a 3D model under diffuse lighting based on spherical harmonics are directly used as basis images. In this paper, we show how to arrange physical lighting so that the acquired images of each object can be directly used as the basis vectors of a low-dimensional linear space and that this subspace is close to those acquired by the other methods. More specifically, there exist configurations of k point light source directions, with k typically ranging from 5 to 9, such that, by taking k images of an object under these single sources, the resulting subspace is an effective representation for recognition under a wide range of lighting conditions. Since the subspace is generated directly from real images, potentially complex and/or brittle intermediate steps such as 3D reconstruction can be completely avoided; nor is it necessary to acquire large numbers of training images or to physically construct complex diffuse (harmonic) light fields. We validate the use of subspaces constructed in this fashion within the context of face recognition.
We introduce a new general framework for the recognition of complex visual scenes, which is motivated by biology: We describe a hierarchical system that closely follows the organization of visual cortex and builds an increasingly complex and invariant feature representation by alternating between a template matching and a maximum pooling operation. We demonstrate the strength of the approach on a range of recognition tasks: From invariant single object recognition in clutter to multiclass categorization problems and complex scene understanding tasks that rely on the recognition of both shape-based as well as texture-based objects. Given the biological constraints that the system had to satisfy, the approach performs surprisingly well: It has the capability of learning from only a few training examples and competes with state-of-the-art systems. We also discuss the existence of a universal, redundant dictionary of features that could handle the recognition of most object categories. In addition to its relevance for computer vision, the success of this approach suggests a plausibility proof for a class of feedforward models of object recognition in cortex.
On August 14, 2003, a cascading outage of transmission and generation facilities in the North American Eastern Interconnection resulted in a blackout of most of New York state as well as parts of Pennsylvania, Ohio, Michigan, and Ontario, Canada. On September 23, 2003, nearly four million customers lost power in eastern Denmark and southern Sweden following a cascading outage that struck Scandinavia. Days later, a cascading outage between Italy and the rest of central Europe left most of Italy in darkness on September 28. These major blackouts are among the worst power system failures in the last few decades. The Power System Stability and Power System Stability Controls Subcommittees of the IEEE PES Power System Dynamic Performance Committee sponsored an all day panel session with experts from around the world. The experts described their recent work on the investigation of grid blackouts. The session offered a unique forum for discussion of possible root causes and necessary steps to reduce the risk of blackouts. This white paper presents the major conclusions drawn from the presentations and ensuing discussions during the all day session, focusing on the root causes of grid blackouts. This paper presents general conclusions drawn by this Committee together with recommendations based on lessons learned.
We explore the idea of evidence accumulation (EAC) for combining the results of multiple clusterings. First, a clustering ensemble--a set of object partitions, is produced. Given a data set (n objects or patterns in d dimensions), different ways of producing data partitions are: 1) applying different clustering algorithms and 2) applying the same clustering algorithm with different values of parameters or initializations. Further, combinations of different data representations (feature spaces) and clustering algorithms can also provide a multitude of significantly different data partitionings. We propose a simple framework for extracting a consistent clustering, given the various partitions in a clustering ensemble. According to the EAC concept, each partition is viewed as an independent evidence of data organization, individual data partitions being combined, based on a voting mechanism, to generate a new n x n, similarity matrix between the n patterns. The final data partition of the n patterns is obtained by applying a hierarchical agglomerative clustering algorithm on this matrix. We have developed a theoretical framework for the analysis of the proposed clustering combination strategy and its evaluation, based on the concept of mutual information between data partitions. Stability of the results is evaluated using bootstrapping techniques. A detailed discussion of an evidence accumulation-based clustering algorithm, using a split and merge strategy based on the K-means clustering algorithm, is presented. Experimental results of the proposed method on several synthetic and real data sets are compared with other combination strategies, and with individual clustering results produced by well-known clustering algorithms.
Developing on-board automotive driver assistance systems aiming to alert drivers about driving environments, and possible collision with other vehicles has attracted a lot of attention lately. In these systems, robust and reliable vehicle detection is a critical step. This paper presents a review of recent vision-based on-road vehicle detection systems. Our focus is on systems where the camera is mounted on the vehicle rather than being fixed such as in traffic/driveway monitoring systems. First, we discuss the problem of on-road vehicle detection using optical sensors followed by a brief review of intelligent vehicle research worldwide. Then, we discuss active and passive sensors to set the stage for vision-based vehicle detection. Methods aiming to quickly hypothesize the location of vehicles in an image as well as to verify the hypothesized locations are reviewed next. Integrating detection with tracking is also reviewed to illustrate the benefits of exploiting temporal continuity for vehicle detection. Finally, we present a critical overview of the methods discussed, we assess their potential for future deployment, and we present directions for future research.
This paper introduces a texture representation suitable for recognizing images of textured surfaces under a wide range of transformations, including viewpoint changes and nonrigid deformations. At the feature extraction stage, a sparse set of affine Harris and Laplacian regions is found in the image. Each of these regions can be thought of as a texture element having a characteristic elliptic shape and a distinctive appearance pattern. This pattern is captured in an affine-invariant fashion via a process of shape normalization followed by the computation of two novel descriptors, the spin image and the RIFT descriptor. When affine invariance is not required, the original elliptical shape servee as an additional discriminative feature for texture recognition. The proposed approach is evaluated in retrieval and classification tasks using the entire Brodatz database and a publicly available collection of 1,000 photographs of textured surfaces taken from different viewpoints.
While recognition of most facial variations, such as identity, expression and gender, has been extensively studied, automatic age estimation has rarely been explored. In contrast to other facial variations, aging variation presents several unique characteristics which make age estimation a challenging task. This paper proposes an automatic age estimation method named AGES (AGing pattErn Subspace). The basic idea is to model the aging pattern, which is defined as the sequence of a particular individual' s face images sorted in time order, by constructing a representative subspace. The proper aging pattern for a previously unseen face image is determined by the projection in the subspace that can reconstruct the face image with minimum reconstruction error, while the position of the face image in that aging pattern will then indicate its age. In the experiments, AGES and its variants are compared with the limited existing age estimation methods (WAS and AAS) and some well-established classification methods (kNN, BP, C4.5, and SVM). Moreover, a comparison with human perception ability on age is conducted. It is interesting to note that the performance of AGES is not only significantly better than that of all the other algorithms, but also comparable to that of the human observers.
This paper proposes a novel hybrid genetic algorithm for feature selection. Local search operations are devised and embedded in hybrid GAs to fine-tune the search. The operations are parameterized in terms of their fine-tuning power, and their effectiveness and timing requirements are analyzed and compared. The hybridization technique produces two desirable effects: a significant improvement in the final performance and the acquisition of subset-size control. The hybrid GAs showed better convergence properties compared to the classical GAs. A method of performing rigorous timing analysis was developed, in order to compare the timing requirement of the conventional and the proposed algorithms. Experiments performed with various standard data sets revealed that the proposed hybrid GA is superior to both a simple GA and sequential search algorithms.
In response to the 2013 Update of the European Strategy for Particle Physics, the Future Circular Collider (FCC) study was launched, as an international collaboration hosted by CERN. This study covers a highest-luminosity high-energy lepton collider (FCC-ee) and an energy-frontier hadron collider (FCC-hh), which could, successively, be installed in the same 100 km tunnel. The scientific capabilities of the integrated FCC programme would serve the worldwide community throughout the 21st century. The FCC study also investigates an LHC energy upgrade, using FCC-hh technology. This document constitutes the second volume of the FCC Conceptual Design Report, devoted to the electron-positron collider FCC-ee. After summarizing the physics discovery opportunities, it presents the accelerator design, performance reach, a staged operation scenario, the underlying technologies, civil engineering, technical infrastructure, and an implementation plan. FCC-ee can be built with today's technology. Most of the FCC-ee infrastructure could be reused for FCC-hh. Combining concepts from past and present lepton colliders and adding a few novel elements, the FCC-ee design promises outstandingly high luminosity. This will make the FCC-ee a unique precision instrument to study the heaviest known particles (Z, W and H bosons and the top quark), offering great direct and indirect sensitivity to new physics.
Fish-eye lenses are convenient in such applications where a very wide angle of view is needed, but their use for measurement purposes has been limited by the lack of an accurate, generic, and easy-to-use calibration procedure. We hence propose a generic camera model, which is suitable for fish-eye lens cameras as well as for conventional and wide-angle lens cameras, and a calibration method for estimating the parameters of the model. The achieved level of calibration accuracy is comparable to the previously reported state-of-the-art.
The data stream model has recently attracted attention for its applicability to numerous types of data, including telephone records, Web documents, and clickstreams. For analysis of such data, the ability to process the data in a single pass, or a small number of passes, while using little memory, is crucial. We describe such a streaming algorithm that effectively clusters large data streams. We also provide empirical evidence of the algorithm's performance on synthetic and real data streams.
Relevance feedback schemes based on support vector machines (SVM) have been widely used in content-based image retrieval (CBIR). However, the performance of SVM-based relevance feedback is often poor when the number of labeled positive feedback samples is small. This is mainly due to three reasons: 1) an SVM classifier is unstable on a small-sized training set, 2) SVM's optimal hyperplane may be biased when the positive feedback samples are much less than the negative feedback samples, and 3) overfitting happens because the number of feature dimensions is much higher than the size of the training set. In this paper, we develop a mechanism to overcome these problems. To address the first two problems, we propose an asymmetric bagging-based SVM (AB-SVM). For the third problem, we combine the random subspace method and SVM for relevance feedback, which is named random subspace SVM (RS-SVM). Finally, by integrating AB-SVM and RS-SVM, an asymmetric bagging and random subspace SVM (ABRS-SVM) is built to solve these three problems and further improve the relevance feedback performance.
In this paper, we propose a novel forwarding technique based on geographical location of the nodes involved and random selection of the relaying node via contention among receivers. We focus on the multihop performance of such a solution, in terms of the average number of hops to reach a destination as a function of the distance and of the average number of available neighbors. An idealized scheme (in which the best relay node is always chosen) is discussed and its performance is evaluated by means of both simulation and analytical techniques. A practical scheme to select one of the best relays is shown to achieve performance very close to that of the ideal case. Some discussion about design issues for practical implementation is also given.
Biometrics-based authentication systems offer obvious usability advantages over traditional password and token-based authentication schemes. However, biometrics raises several privacy concerns. A biometric is permanently associated with a user and cannot be changed. Hence, if a biometric identifier is compromised, it is lost forever and possibly for every application where the biometric is used. Moreover, if the same biometric is used in multiple applications, a user can potentially be tracked from one application to the next by cross-matching biometric databases. In this paper, we demonstrate several methods to generate multiple cancelable identifiers from fingerprint images to overcome these problems. In essence, a user can be given as many biometric identifiers as needed by issuing a new transformation "key." The identifiers can be cancelled and replaced when compromised. We empirically compare the performance of several algorithms such as Cartesian, polar, and surface folding transformations of the minutiae positions. It is demonstrated through multiple experiments that we can achieve revocability and prevent cross-matching of biometric databases. It is also shown that the transforms are noninvertible by demonstrating that it is computationally as hard to recover the original biometric identifier from a transformed version as by randomly guessing. Based on these empirical results and a theoretical analysis we conclude that feature-level cancelable biometric construction is practicable in large biometric deployments.
A probabilistic formulation for semantic image annotation and retrieval is proposed. Annotation and retrieval are posed as classification problems where each class is defined as the group of database images labeled with a common semantic label. It is shown that, by establishing this one-to-one correspondence between semantic labels and semantic classes, a minimum probability of error annotation and retrieval are feasible with algorithms that are 1) conceptually simple, 2) computationally efficient, and 3) do not require prior semantic segmentation of training images. In particular, images are represented as bags of localized feature vectors, a mixture density estimated for each image, and the mixtures associated with all images annotated with a common semantic label pooled into a density estimate for the corresponding semantic class. This pooling is justified by a multiple instance learning argument and performed efficiently with a hierarchical extension of expectation-maximization. The benefits of the supervised formulation over the more complex, and currently popular, joint modeling of semantic label and visual feature distributions are illustrated through theoretical arguments and extensive experiments. The supervised formulation is shown to achieve higher accuracy than various previously published methods at a fraction of their computational cost. Finally, the proposed method is shown to be fairly robust to parameter tuning.