NobleBlocks

Mitsubishi Electric (United States)

Company · Cypress, California, United States

Research output, citation impact, and the most-cited recent papers from Mitsubishi Electric (United States). Aggregated across the NobleBlocks index of 300M+ scholarly works.

Total works: 7.7K
Citations: 388.1K
h-index: 244
i10-index: 4.7K
Also known as: Mitsubishi Electric (United States); Mitsubishi Electric US Holdings, Inc

Top-cited papers from Mitsubishi Electric (United States)

Rapid object detection using a boosted cascade of simple features
Paul Viola, Michael Jones
2005 · 18.2K citations · doi:10.1109/cvpr.2001.990517

This paper describes a machine learning approach for visual object detection which is capable of processing images extremely rapidly and achieving high detection rates. This work is distinguished by three key contributions. The first is the introduction of a new image representation called the "integral image" which allows the features used by our detector to be computed very quickly. The second is a learning algorithm, based on AdaBoost, which selects a small number of critical visual features from a larger set and yields extremely efficient classifiers. The third contribution is a method for combining increasingly more complex classifiers in a "cascade" which allows background regions of the image to be quickly discarded while spending more computation on promising object-like regions. The cascade can be viewed as an object specific focus-of-attention mechanism which unlike previous approaches provides statistical guarantees that discarded regions are unlikely to contain the object of interest. In the domain of face detection the system yields detection rates comparable to the best previous systems. Used in real-time applications, the detector runs at 15 frames per second without resorting to image differencing or skin color detection.
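
The "integral image" idea in the abstract above can be illustrated with a summed-area table, which lets the sum over any rectangle be read off with four lookups. This is a minimal sketch, not the authors' implementation: the function names are hypothetical, and the two-rectangle feature shown is only one plausible example of the simple features the abstract mentions.

```python
def integral_image(img):
    """Return an (h+1) x (w+1) summed-area table with a zero top row and left column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0  # running sum of the current image row
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii, top, left, height, width):
    """Left half minus right half of a window (hypothetical example feature)."""
    half = width // 2
    left_part = rect_sum(ii, top, left, height, half)
    right_part = rect_sum(ii, top, left + half, height, half)
    return left_part - right_part


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
table = integral_image(image)
print(rect_sum(table, 1, 1, 2, 2))  # 5 + 6 + 8 + 9 = 28
```

Once the table is built, each rectangle sum costs the same regardless of rectangle size, which is what makes evaluating many such features per window cheap.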

A simple Cooperative diversity method based on network path selection
A. Bletsas, A. Khisti, D.P. Reed, A. Lippman
2006 · IEEE Journal on Selected Areas in Communications · 2.5K citations · doi:10.1109/jsac.2005.862417

Cooperative diversity has been recently proposed as a way to form virtual antenna arrays that provide dramatic gains in slow fading wireless environments. However, most of the proposed solutions require distributed space-time coding algorithms, the careful design of which is left for future investigation if there is more than one cooperative relay. We propose a novel scheme that alleviates these problems and provides diversity gains on the order of the number of relays in the network. Our scheme first selects the best relay from a set of M available relays and then uses this "best" relay for cooperation between the source and the destination. We develop and analyze a distributed method to select the best relay that requires no topology information and is based on local measurements of the instantaneous channel conditions. This method also requires no explicit communication among the relays. The success (or failure) to select the best available path depends on the statistics of the wireless channel, and a methodology to evaluate performance for any kind of wireless channel statistics is provided. Information theoretic analysis of outage probability shows that our scheme achieves the same diversity-multiplexing tradeoff as achieved by more complex protocols, where coordination and distributed space-time coding for M relay nodes is required, such as those proposed by Laneman and Wornell (2003). The simplicity of the technique allows for immediate implementation in existing radio hardware and its adoption could provide for improved flexibility, reliability, and efficiency in future 4G wireless systems.
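
One way to picture selection from purely local measurements is a timer race: each relay starts a timer that is shorter when its own two links are better, and the first timer to expire identifies the chosen relay without any relay-to-relay messages. The sketch below is an illustration under stated assumptions. The abstract does not give the selection rule, so the max-min criterion (the weaker of the source-relay and relay-destination gains), the `scale` constant, and all function names are assumptions made here.

```python
def relay_timer(gain_source, gain_dest, scale=1.0):
    """Timer value for one relay, computed only from its own two link gains.

    Assumption: the bottleneck (weaker) link decides the relay's quality.
    """
    bottleneck = min(gain_source, gain_dest)
    if bottleneck <= 0:
        return float("inf")  # a relay with a dead link never wins the race
    return scale / bottleneck


def select_best_relay(gains, scale=1.0):
    """Index of the relay whose timer would expire first.

    `gains` is a list of (source-to-relay, relay-to-destination) power gains.
    """
    timers = [relay_timer(gs, gd, scale) for gs, gd in gains]
    return min(range(len(timers)), key=timers.__getitem__)


measured = [(0.9, 0.1), (0.5, 0.4), (0.3, 0.8)]
print(select_best_relay(measured))  # relay 1: its weaker link (0.4) is the strongest bottleneck
```

In a real network the race happens in time rather than in a central `min`, so two timers expiring almost together can collide; that failure case is the kind of event the abstract's channel-statistics analysis is concerned with.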

Example-based super-resolution
William T. Freeman, Thouis R. Jones, Egon Pasztor
2002 · IEEE Computer Graphics and Applications · 2.5K citations · doi:10.1109/38.988747

We call methods for achieving high-resolution enlargements of pixel-based images super-resolution algorithms. Many applications in graphics or image processing could benefit from such resolution independence, including image-based rendering (IBR), texture mapping, enlarging consumer photographs, and converting NTSC video content to high-definition television. We built on another training-based super-resolution algorithm and developed a faster and simpler algorithm for one-pass super-resolution. Our algorithm requires only a nearest-neighbor search in the training set for a vector derived from each patch of local image data. This one-pass super-resolution algorithm is a step toward achieving resolution independence in image-based representations. We don't expect perfect resolution independence (even the polygon representation doesn't have that), but increasing the resolution independence of pixel-based representations is an important task for IBR.

Image quilting for texture synthesis and transfer
Alexei A. Efros, William T. Freeman
2001 · 2.4K citations · doi:10.1145/383259.383296

We present a simple image-based method of generating novel visual appearance in which a new image is synthesized by stitching together small patches of existing images. We call this process image quilting. First, we use quilting as a fast and very simple texture synthesis algorithm which produces surprisingly good results for a wide range of textures. Second, we extend the algorithm to perform texture transfer — rendering an object with a texture taken from a different object. More generally, we demonstrate how an image can be re-rendered in the style of a different image. The method works directly on the images and does not require 3D information.

Localization via ultra-wideband radios: a look at positioning aspects for future sensor networks
Sinan Gezici, Zhi Tian, Georgios B. Giannakis, Hiroshi Kobayashi +3 more
2005 · IEEE Signal Processing Magazine · 2.1K citations · doi:10.1109/msp.2005.1458289

UWB technology provides an excellent means for wireless positioning due to its high resolution capability in the time domain. Its ability to resolve multipath components makes it possible to obtain accurate location estimates without the need for complex estimation algorithms. In this article, theoretical limits for TOA estimation and TOA-based location estimation for UWB systems have been considered. Due to the complexity of the optimal schemes, suboptimal but practical alternatives have been emphasized. Performance limits for hybrid TOA/SS and TDOA/SS schemes have also been considered. Although the fundamental mechanisms for localization, including AOA-, TOA-, TDOA-, and SS-based methods, apply to all radio air interfaces, some positioning techniques are favored by UWB-based systems using ultrawide bandwidths.

Constructing Free-Energy Approximations and Generalized Belief Propagation Algorithms
Jonathan S. Yedidia, William T. Freeman, Yaakov Weiss
2005 · IEEE Transactions on Information Theory · 1.7K citations · doi:10.1109/tit.2005.850085

Important inference problems in statistical physics, computer vision, error-correcting coding theory, and artificial intelligence can all be reformulated as the computation of marginal probabilities on factor graphs. The belief propagation (BP) algorithm is an efficient way to solve these problems that is exact when the factor graph is a tree, but only approximate when the factor graph has cycles. We show that BP fixed points correspond to the stationary points of the Bethe approximation of the free energy for a factor graph. We explain how to obtain region-based free energy approximations that improve the Bethe approximation, and corresponding generalized belief propagation (GBP) algorithms. We emphasize the conditions a free energy approximation must satisfy in order to be a "valid" or "maxent-normal" approximation. We describe the relationship between four different methods that can be used to generate valid approximations: the "Bethe method", the "junction graph method", the "cluster variation method", and the "region graph method". Finally, we explain how to tell whether a region-based approximation, and its corresponding GBP algorithm, is likely to be accurate, and describe empirical results showing that GBP can significantly outperform BP.

Deep clustering: Discriminative embeddings for segmentation and separation
John R. Hershey, Zhuo Chen, Jonathan Le Roux, Shinji Watanabe
2016 · 1.4K citations · doi:10.1109/icassp.2016.7471631

We address the problem of "cocktail-party" source separation in a deep learning framework called deep clustering. Previous deep network approaches to separation have shown promising performance in scenarios with a fixed number of sources, each belonging to a distinct signal class, such as speech and noise. However, for arbitrary source classes and number, "class-based" methods are not suitable. Instead, we train a deep network to assign contrastive embedding vectors to each time-frequency region of the spectrogram in order to implicitly predict the segmentation labels of the target spectrogram from the input mixtures. This yields a deep network-based analogue to spectral clustering, in that the embeddings form a low-rank pair-wise affinity matrix that approximates the ideal affinity matrix, while enabling much faster performance. At test time, the clustering step "decodes" the segmentation implicit in the embeddings by optimizing K-means with respect to the unknown assignments. Preliminary experiments on single-channel mixtures from multiple speakers show that a speaker-independent model trained on two-speaker mixtures can improve signal quality for mixtures of held-out speakers by an average of 6dB. More dramatically, the same model does surprisingly well with three-speaker mixtures.
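
The statement that the embeddings form an affinity matrix approximating the ideal one can be made concrete with a small calculation. In the sketch below, each row of `embeddings` is the vector assigned to one time-frequency bin, and `labels` gives the source that dominates that bin. The squared Frobenius distance between the two Gram matrices is used as the mismatch measure; the toy data and function names are assumptions for illustration only.

```python
def gram(rows):
    """Return rows @ rows^T as a list of lists (pairwise dot products)."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in rows] for r1 in rows]


def affinity_loss(embeddings, labels):
    """Squared Frobenius distance between the estimated and ideal affinity matrices.

    The ideal affinity is 1 for two bins dominated by the same source, else 0.
    """
    n_sources = max(labels) + 1
    one_hot = [[1.0 if c == k else 0.0 for k in range(n_sources)] for c in labels]
    estimated = gram(embeddings)
    ideal = gram(one_hot)
    return sum((e - i) ** 2
               for est_row, ideal_row in zip(estimated, ideal)
               for e, i in zip(est_row, ideal_row))


labels = [0, 0, 1]  # bins 0 and 1 belong to source 0, bin 2 to source 1
good = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
bad = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # bin 1 is embedded with the wrong group
print(affinity_loss(good, labels), affinity_loss(bad, labels))  # 0.0 4.0
```

Because the loss depends only on pairwise affinities, relabeling the sources leaves it unchanged, which is why the approach is not tied to a fixed set of source classes.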

Fast Human Detection Using a Cascade of Histograms of Oriented Gradients
Qiang Zhu, Mei-Chen Yeh, Kwang‐Ting Cheng, Shai Avidan
2006 · 1.4K citations · doi:10.1109/cvpr.2006.119

We integrate the cascade-of-rejectors approach with the Histograms of Oriented Gradients (HoG) features to achieve a fast and accurate human detection system. The features used in our system are HoGs of variable-size blocks that capture salient features of humans automatically. Using AdaBoost for feature selection, we identify the appropriate set of blocks, from a large set of possible blocks. In our system, we use the integral image representation and a rejection cascade which significantly speed up the computation. For a 320 × 280 image, the system can process 5 to 30 frames per second depending on the density in which we scan the image, while maintaining an accuracy level similar to existing methods.

An improved deep learning architecture for person re-identification
Ejaz Ahmed, Michael Jones, Tim K. Marks
2015 · 1.3K citations · doi:10.1109/cvpr.2015.7299016

In this work, we propose a method for simultaneously learning features and a corresponding similarity metric for person re-identification. We present a deep convolutional architecture with layers specially designed to address the problem of re-identification. Given a pair of images as input, our network outputs a similarity value indicating whether the two input images depict the same person. Novel elements of our architecture include a layer that computes cross-input neighborhood differences, which capture local relationships between the two input images based on mid-level features from each input image. A high-level summary of the outputs of this layer is computed by a layer of patch summary features, which are then spatially integrated in subsequent layers. Our method significantly outperforms the state of the art on both a large data set (CUHK03) and a medium-sized data set (CUHK01), and is resistant to over-fitting. We also demonstrate that by initially training on an unrelated large data set before fine-tuning on a small target data set, our network can achieve results comparable to the state of the art even on a small data set (VIPeR).

ESPnet: End-to-End Speech Processing Toolkit
Shinji Watanabe, Takaaki Hori, Shigeki Karita, Tomoki Hayashi +4 more
2018 · 1.3K citations · doi:10.21437/interspeech.2018-1456

This paper introduces a new open source platform for end-to-end speech processing named ESPnet. ESPnet mainly focuses on end-to-end automatic speech recognition (ASR), and adopts widely-used dynamic neural network toolkits, Chainer and PyTorch, as a main deep learning engine. ESPnet also follows the Kaldi ASR toolkit style for data processing, feature extraction/format, and recipes to provide a complete setup for speech recognition and other speech processing experiments. This paper explains a major architecture of this software platform, several important functionalities, which differentiate ESPnet from other open source ASR toolkits, and experimental results with major ASR benchmarks.

DiamondTouch
Paul Dietz, Darren Leigh
2001 · 1.1K citations · doi:10.1145/502348.502389

A technique for creating a touch-sensitive input device is proposed which allows multiple, simultaneous users to interact in an intuitive fashion. Touch location information is determined independently for each user, allowing each touch on a common surface to be associated with a particular user. The surface generates location dependent, modulated electric fields which are capacitively coupled through the users to receivers installed in the work environment. We describe the design of these systems and their applications. Finally, we present results we have obtained with a small prototype device.

Ensemble Tracking
Shai Avidan
2007 · IEEE Transactions on Pattern Analysis and Machine Intelligence · 1.1K citations · doi:10.1109/tpami.2007.35

We consider tracking as a binary classification problem, where an ensemble of weak classifiers is trained online to distinguish between the object and the background. The ensemble of weak classifiers is combined into a strong classifier using AdaBoost. The strong classifier is then used to label pixels in the next frame as either belonging to the object or the background, giving a confidence map. The peak of the map and, hence, the new position of the object, is found using mean shift. Temporal coherence is maintained by updating the ensemble with new weak classifiers that are trained online during tracking. We show a realization of this method and demonstrate it on several video sequences.

The steerable pyramid: a flexible architecture for multi-scale derivative computation
Eero P. Simoncelli, William T. Freeman
2002 · Proceedings - International Conference on Image Processing · 1.1K citations · doi:10.1109/icip.1995.537667

We describe an architecture for efficient and accurate linear decomposition of an image into scale and orientation subbands. The basis functions of this decomposition are directional derivative operators of any desired order. We describe the construction and implementation of the transform.

Entropy rate superpixel segmentation
Ming-Yu Liu, Oncel Tuzel, Srikumar Ramalingam, Rama Chellappa
2011 · 1.0K citations · doi:10.1109/cvpr.2011.5995323

We propose a new objective function for superpixel segmentation. This objective function consists of two components: entropy rate of a random walk on a graph and a balancing term. The entropy rate favors formation of compact and homogeneous clusters, while the balancing function encourages clusters with similar sizes. We present a novel graph construction for images and show that this construction induces a matroid, a combinatorial structure that generalizes the concept of linear independence in vector spaces. The segmentation is then given by the graph topology that maximizes the objective function under the matroid constraint. By exploiting submodular and monotonic properties of the objective function, we develop an efficient greedy algorithm. Furthermore, we prove an approximation bound of ½ for the optimality of the solution. Extensive experiments on the Berkeley segmentation benchmark show that the proposed algorithm outperforms the state of the art in all the standard evaluation metrics.

Detecting pedestrians using patterns of motion and appearance
Viola, Jones, Snow
2003 · 989 citations · doi:10.1109/iccv.2003.1238422

This paper describes a pedestrian detection system that integrates image intensity information with motion information. We use a detection style algorithm that scans a detector over two consecutive frames of a video sequence. The detector is trained (using AdaBoost) to take advantage of both motion and appearance information to detect a walking person. Past approaches have built detectors based on appearance information, but ours is the first to combine both sources of information in a single detector. The implementation described runs at about 4 frames/second, detects pedestrians at very small scales (as small as 20×15 pixels), and has a very low false positive rate. Our approach builds on the detection work of Viola and Jones. Novel contributions of this paper include: i) development of a representation of image motion which is extremely efficient, and ii) implementation of a state of the art pedestrian detection system which operates on low resolution images under difficult conditions (such as rain and snow).

Pedestrian Detection via Classification on Riemannian Manifolds
Oncel Tuzel, Fatih Porikli, Peter Meer
2008 · IEEE Transactions on Pattern Analysis and Machine Intelligence · 968 citations · doi:10.1109/tpami.2008.75

We present a new algorithm to detect pedestrians in still images utilizing covariance matrices as object descriptors. Since the descriptors do not form a vector space, well known machine learning techniques are not well suited to learn the classifiers. The space of d-dimensional nonsingular covariance matrices can be represented as a connected Riemannian manifold. The main contribution of the paper is a novel approach for classifying points lying on a connected Riemannian manifold using the geometry of the space. The algorithm is tested on INRIA and DaimlerChrysler pedestrian datasets where superior detection rates are observed over the previous approaches.

Robust real-time face detection
Paul Viola, Michael Jones
2005 · 899 citations · doi:10.1109/iccv.2001.937709

We have constructed a frontal face detection system which achieves detection and false positive rates which are equivalent to the best published results [7, 5, 6, 4, 1]. This face detection system is most clearly distinguished from previous approaches in its ability to detect faces extremely rapidly. Operating on 384 by 288 pixel images, faces are detected at 15 frames per second on a conventional 700 MHz Intel Pentium III. In other face detection systems, auxiliary information, such as image differences in video sequences, or pixel color in color images, have been used to achieve high frame rates. Our system achieves high frame rates working only with the information present in a single grey scale image. These alternative sources of information can also be integrated with our system to achieve even higher frame rates.

Separating Style and Content with Bilinear Models
Joshua B. Tenenbaum, William T. Freeman
2000 · Neural Computation · 876 citations · doi:10.1162/089976600300015349

Perceptual systems routinely separate "content" from "style," classifying familiar words spoken in an unfamiliar accent, identifying a font or handwriting style across letters, or recognizing a familiar face or object seen under unfamiliar viewing conditions. Yet a general and tractable computational model of this ability to untangle the underlying factors of perceptual observations remains elusive (Hofstadter, 1985). Existing factor models (Mardia, Kent, & Bibby, 1979; Hinton & Zemel, 1994; Ghahramani, 1995; Bell & Sejnowski, 1995; Hinton, Dayan, Frey, & Neal, 1995; Dayan, Hinton, Neal, & Zemel, 1995; Hinton & Ghahramani, 1997) are either insufficiently rich to capture the complex interactions of perceptually meaningful factors such as phoneme and speaker accent or letter and font, or do not allow efficient learning algorithms. We present a general framework for learning to solve two-factor tasks using bilinear models, which provide sufficiently expressive representations of factor interactions but can nonetheless be fit to data using efficient algorithms based on the singular value decomposition and expectation-maximization. We report promising results on three different tasks in three different perceptual domains: spoken vowel classification with a benchmark multi-speaker database, extrapolation of fonts to unseen letters, and translation of faces to novel illuminants.

Non-negative matrix factorization for polyphonic music transcription
Paris Smaragdis, Judith C. Brown
2004 · 867 citations · doi:10.1109/aspaa.2003.1285860

We present a methodology for analyzing polyphonic musical passages comprised of notes that exhibit a harmonically fixed spectral profile (such as piano notes). Taking advantage of this unique note structure, we can model the audio content of the musical passage by a linear basis transform and use non-negative matrix decomposition methods to estimate the spectral profile and the temporal information of every note. This approach results in a very simple and compact system that is not knowledge-based, but rather learns notes by observation.
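
The linear-basis model in the abstract above amounts to factoring a non-negative magnitude spectrogram V (frequency bins by time frames) into spectral profiles W and temporal activations H, so that V is approximately W times H. The sketch below uses the well-known multiplicative updates for the squared-error objective. That choice is an assumption made for brevity, since the abstract does not say which decomposition method is used, and the toy "spectrogram" and all names are hypothetical.

```python
import random


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared reconstruction error ||V - W H||^2."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor V (n x m) into non-negative W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


# Toy data: two "notes" with fixed spectral profiles, played over four frames.
profiles = [[1.0, 0.0], [0.5, 0.0], [0.0, 1.0], [0.0, 0.5]]
activations = [[1.0, 1.0, 0.0, 0.0], [0.0, 1.0, 1.0, 0.0]]
spectrogram = matmul(profiles, activations)
w, h = nmf(spectrogram, rank=2)
print(frobenius_error(spectrogram, w, h))
```

Each column of `w` plays the role of a learned note spectrum and each row of `h` its activation over time; the factors are recovered only up to scaling and ordering.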

MIMO systems with antenna selection
Andreas F. Molisch, Moe Z. Win
2004 · IEEE Microwave Magazine · 857 citations · doi:10.1109/mmw.2004.1284943

Multiple-input-multiple-output (MIMO) wireless systems are those that have multiple antenna elements at both the transmitter and receiver. They were first investigated by computer simulations in the 1980s. Since that time, interest in MIMO systems has exploded. They are now being used for third-generation cellular systems (W-CDMA) and are discussed for future high-performance modes of the highly successful IEEE 802.11 standard for wireless local area networks. MIMO-related topics also occupy a considerable part of today's academic communications research. The multiple antennas in MIMO systems can be exploited in two different ways. One is the creation of a highly effective antenna diversity system; the other is the use of the multiple antennas for the transmission of several parallel data streams to increase the capacity of the system. This article presents an overview of MIMO systems with antenna selection. The transmitter, the receiver, or both use only the signals from a subset of the available antennas. This allows considerable reductions in the hardware expense.
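
Using only a subset of the available antennas can be sketched with a simple norm-based rule: keep the receive antennas whose channel rows carry the most energy, so that only that many receive chains are needed. This is one simple heuristic chosen for illustration, not necessarily a method the article recommends; the channel values and function name are hypothetical.

```python
def select_receive_antennas(channel, keep):
    """Indices of the `keep` receive antennas with the largest received energy.

    `channel[i][j]` is the complex gain from transmit antenna j to receive antenna i.
    """
    energies = [sum(abs(gain) ** 2 for gain in row) for row in channel]
    ranked = sorted(range(len(channel)), key=lambda i: energies[i], reverse=True)
    return sorted(ranked[:keep])


# Four receive antennas, two transmit antennas, two receive chains available.
channel = [[1 + 1j, 0.5],
           [0.1, 0.2j],
           [2, -1j],
           [0.3, 0.3]]
print(select_receive_antennas(channel, keep=2))  # [0, 2]
```

Ranking by row energy ignores correlation between antennas, so it can be suboptimal for capacity; its appeal is that it needs no search over antenna subsets.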