Mitsubishi Electric (Japan)
companyTokyo, Tokyo, Japan
Research output, citation impact, and the most-cited recent papers from Mitsubishi Electric (Japan) (Japan). Aggregated across the NobleBlocks index of 300M+ scholarly works.
Top-cited papers from Mitsubishi Electric (Japan)
We present a simple image-based method of generating novel visual appearance in which a new image is synthesized by stitching together small patches of existing images. We call this process image quilting. First, we use quilting as a fast and very simple texture synthesis algorithm which produces surprisingly good results for a wide range of textures. Second, we extend the algorithm to perform texture transfer — rendering an object with a texture taken from a different object. More generally, we demonstrate how an image can be re-rendered in the style of a different image. The method works directly on the images and does not require 3D information.
The Cambridge Research Laboratory was founded in 1987 to advance the state of the art in both core computing and human-computer interaction, and to use the knowledge so gained to support the Company’s corporate objectives. We believe this is best accomplished through interconnected pursuits in technology creation, advanced systems engineering, and business development. We are actively investigating scalable computing; mobile computing; vision-based human and scene sensing; speech interaction; computer-animated synthetic persona; intelligent information appliances; and the capture, coding, storage, indexing, retrieval, decoding, and rendering of multimedia data. We recognize and embrace a technology creation model which is characterized by three major phases: Freedom: The life blood of the Laboratory comes from the observations and imaginations of our research staff. It is here that challenging research problems are uncovered (through discussions with customers, through interactions with others in the Corporation, through other professional interactions, through reading, and the like) or that new ideas are born. For any such problem or idea, this phase culminates in the nucleation of a project team around a well articulated central research question and the outlining of a research plan. Focus: Once a team is formed, we aggressively pursue the creation of new technology based on
For real-world concept learning problems, feature selection is important to speed up learning and to improve concept quality. We review and analyze past approaches to feature selection and note their strengths and weaknesses. We then introduce and theoretically examine a new algorithm Rellef which selects relevant features using a statistical method. Relief does not depend on heuristics, is accurate even if features interact, and is noise-tolerant. It requires only linear time in the number of given features and the number of training instances, regardless of the target concept complexity. The algorithm also has certain limitations such as nonoptimal feature set size. Ways to overcome the limitations are suggested. We also report the test results of comparison between Relief and other feature selection algorithms. The empirical results support the theoretical analysis, suggesting a practical approach to feature selection for real-world problems.
Effective resizing of images should not only use geometric constraints, but consider the image content as well. We present a simple image operator called seam carving that supports content-aware image resizing for both reduction and expansion. A seam is an optimal 8-connected path of pixels on a single image from top to bottom, or left to right, where optimality is defined by an image energy function. By repeatedly carving out or inserting seams in one direction we can change the aspect ratio of an image. By applying these operators in both directions we can retarget the image to a new size. The selection and order of seams protect the content of the image, as defined by the energy function. Seam carving can also be used for image content enhancement and object removal. We support various visual saliency measures for defining the energy of an image, and can also include user input to guide the process. By storing the order of seams in an image we create multi-size images, that are able to continuously change in real time to fit a given size.
Recent deep networks that directly handle points in a point set, e.g., PointNet, have been state-of-the-art for supervised learning tasks on point clouds such as classification and segmentation. In this work, a novel end-to-end deep auto-encoder is proposed to address unsupervised learning challenges on point clouds. On the encoder side, a graph-based enhancement is enforced to promote local structures on top of PointNet. Then, a novel folding-based decoder deforms a canonical 2D grid onto the underlying 3D object surface of a point cloud, achieving low reconstruction errors even for objects with delicate structures. The proposed decoder only uses about 7% parameters of a decoder with fully-connected neural networks, yet leads to a more discriminative representation that achieves higher linear SVM classification accuracy than the benchmark. In addition, the proposed decoder structure is shown, in theory, to be a generic architecture that is able to reconstruct an arbitrary point cloud from a 2D grid. Our code is available at http://www.merl.com/research/license#FoldingNet.
The first solid-state field-effect transistor has been fabricated utilizing a film of an organic macromolecule, polythiophene, as a semiconductor. The device characteristics have been optimized by controlling the doping levels of the polymer. The device is a normally off type and the source (drain) current can be modulated by a factor of 102–103 by varying the gate voltage. The carrier mobility and the transconductance have also been determined to be ∼10−5 cm2/V s and 3 nS, respectively, by means of electrical measurements.
A secure communication network with quantum key distribution in a metropolitan area is reported. Six different QKD systems are integrated into a mesh-type network. GHz-clocked QKD links enable us to demonstrate the world-first secure TV conferencing over a distance of 45km. The network includes a commercial QKD product for long-term stable operation, and application interface to secure mobile phones. Detection of an eavesdropper, rerouting into a secure path, and key relay via trusted nodes are demonstrated in this network.
Effective resizing of images should not only use geometric constraints, but consider the image content as well. We present a simple image operator called seam carving that supports content-aware image resizing for both reduction and expansion. A seam is an optimal 8-connected path of pixels on a single image from top to bottom, or left to right, where optimality is defined by an image energy function. By repeatedly carving out or inserting seams in one direction we can change the aspect ratio of an image. By applying these operators in both directions we can retarget the image to a new size. The selection and order of seams protect the content of the image, as defined by the energy function. Seam carving can also be used for image content enhancement and object removal. We support various visual saliency measures for defining the energy of an image, and can also include user input to guide the process. By storing the order of seams in an image we create multi-size images, that are able to continuously change in real time to fit a given size.
We propose a new objective function for superpixel segmentation. This objective function consists of two components: entropy rate of a random walk on a graph and a balancing term. The entropy rate favors formation of compact and homogeneous clusters, while the balancing function encourages clusters with similar sizes. We present a novel graph construction for images and show that this construction induces a matroid - a combinatorial structure that generalizes the concept of linear independence in vector spaces. The segmentation is then given by the graph topology that maximizes the objective function under the matroid constraint. By exploiting submodular and mono-tonic properties of the objective function, we develop an efficient greedy algorithm. Furthermore, we prove an approximation bound of ½ for the optimality of the solution. Extensive experiments on the Berkeley segmentation benchmark show that the proposed algorithm outperforms the state of the art in all the standard evaluation metrics.
Recently, there has been an increasing interest in end-to-end speech recognition that directly transcribes speech to text without any predefined alignments. One approach is the attention-based encoder-decoder framework that learns a mapping between variable-length input and output sequences in one step using a purely data-driven method. The attention model has often been shown to improve the performance over another end-to-end approach, the Connectionist Temporal Classification (CTC), mainly because it explicitly uses the history of the target character without any conditional independence assumptions. However, we observed that the performance of the attention has shown poor results in noisy condition and is hard to learn in the initial training stage with long input sequences. This is because the attention model is too flexible to predict proper alignments in such cases due to the lack of left-to-right constraints as used in CTC. This paper presents a novel method for end-to-end speech recognition to improve robustness and achieve fast convergence by using a joint CTC-attention model within the multi-task learning framework, thereby mitigating the alignment issue. An experiment on the WSJ and CHiME-4 tasks demonstrates its advantages over both the CTC and attention-based encoder-decoder baselines, showing 5.4-14.6% relative improvements in Character Error Rate (CER).
We approach the problem of stylistic motion synthesis by learning motion patterns from a highly varied set of motion capture sequences. Each sequence may have a distinct choreography, performed in a distinct sytle. Learning identifies common choreographic elements across sequences, the different styles in which each element is performed, and a small number of stylistic degrees of freedom which span the many variations in the dataset. The learned model can synthesize novel motion data in any interpolation or extrapolation of styles. For example, it can convert novice ballet motions into the more graceful modern dance of an expert. The model can also be driven by video, by scripts or even by noise to generate new choreography and synthesize virtual motion-capture in many styles.
Temporal action localization is an important yet challenging problem. Given a long, untrimmed video consisting of multiple action instances and complex background contents, we need not only to recognize their action categories, but also to localize the start time and end time of each instance. Many state-of-the-art systems use segment-level classifiers to select and rank proposal segments of pre-determined boundaries. However, a desirable model should move beyond segment-level and make dense predictions at a fine granularity in time to determine precise temporal boundaries. To this end, we design a novel Convolutional-De-Convolutional (CDC) network that places CDC filters on top of 3D ConvNets, which have been shown to be effective for abstracting action semantics but reduce the temporal length of the input data. The proposed CDC filter performs the required temporal upsampling and spatial downsampling operations simultaneously to predict actions at the frame-level granularity. It is unique in jointly modeling action semantics in space-time and fine-grained temporal dynamics. We train the CDC network in an end-to-end manner efficiently. Our model not only achieves superior performance in detecting actions in every frame, but also significantly boosts the precision of localizing temporal boundaries. Finally, the CDC network demonstrates a very high efficiency with the ability to process 500 frames per second on a single GPU server. Source code and trained models are available online at https://bitbucket.org/columbiadvmm/cdc.
Unlike on images, semantic learning on 3D point clouds using a deep network is challenging due to the naturally unordered data structure. Among existing works, PointNet has achieved promising results by directly learning on point sets. However, it does not take full advantage of a point's local neighborhood that contains fine-grained structural information which turns out to be helpful towards better semantic learning. In this regard, we present two new operations to improve PointNet with a more efficient exploitation of local structures. The first one focuses on local 3D geometric structures. In analogy to a convolution kernel for images, we define a point-set kernel as a set of learnable 3D points that jointly respond to a set of neighboring data points according to their geometric affinities measured by kernel correlation, adapted from a similar technique for point cloud registration. The second one exploits local high-dimensional feature structures by recursive feature aggregation on a nearest-neighbor-graph computed from 3D positions. Experiments show that our network can efficiently capture local information and robustly achieve better performances on major datasets. Our code is available at http://www.merl.com/research/license#KCNet.
The Advanced Microwave Scanning Radiometer for the Earth Observing System (AMSR-E) was developed and provided to the National Aeronautics and Space Administration's EOS Aqua satellite by the National Space Development Agency of Japan, as one of the indispensable instruments for Aqua's mission. AMSR-E is a modified version of AMSR that was launched December 2002 aboard the Advanced Earth Observing Satellite-II (ADEOS-II). It is a six-frequency dual-polarized total-power passive microwave radiometer that observes water-related geophysical parameters supporting global change science and monitoring efforts. The hardware improvements over existing spaceborne microwave radiometers for Earth imaging include the largest main reflector of its kind and addition of 6.925-GHz channels. These improvements provide finer spatial resolution and the capability to retrieve sea surface temperature and soil moisture information on a global basis. This paper provides an overview of the instrument characteristics, mission objectives, and data products.
Face Transfer is a method for mapping videorecorded performances of one individual to facial animations of another. It extracts visemes (speech-related mouth articulations), expressions, and three-dimensional (3D) pose from monocular video or film footage. These parameters are then used to generate and drive a detailed 3D textured face mesh for a target identity, which can be seamlessly rendered back into target footage. The underlying face model automatically adjusts for how the target performs facial expressions and visemes. The performance data can be easily edited to change the visemes, expressions, pose, or even the identity of the target---the attributes are separably controllable. This supports a wide variety of video rewrite and puppetry applications.Face Transfer is based on a multilinear model of 3D face meshes that separably parameterizes the space of geometric variations due to different attributes (e.g., identity, expression, and viseme). Separability means that each of these attributes can be independently varied. A multilinear model can be estimated from a Cartesian product of examples (identities × expressions × visemes) with techniques from statistical analysis, but only after careful preprocessing of the geometric data set to secure one-to-one correspondence, to minimize cross-coupling artifacts, and to fill in any missing examples. Face Transfer offers new solutions to these problems and links the estimated model with a face-tracking algorithm to extract pose, expression, and viseme parameters.
Abstract The Hyper Suprime-Cam (HSC) is an 870 megapixel prime focus optical imaging camera for the 8.2 m Subaru telescope. The wide-field corrector delivers sharp images of 0${^{\prime\prime}_{.}}$2 (FWHM) in the HSC-i band over the entire 1${^{\circ}_{.}}$5 diameter field of view. The collimation of the camera with respect to the optical axis of the primary mirror is done with hexapod actuators, the mechanical accuracy of which is a few microns. Analysis of the remaining wavefront error in off-focus stellar images reveals that the collimation of the optical components meets design specifications. While there is a flexure of mechanical components, it also is within the design specification. As a result, the camera achieves its seeing-limited imaging on Maunakea during most of the time; the median seeing over several years of observing is 0${^{\prime\prime}_{.}}$67 (FWHM) in the i band. The sensors use p-channel, fully depleted CCDs of 200 μm thickness (2048 × 4176 15 μm square pixels) and we employ 116 of them to pave the 50 cm diameter focal plane. The minimum interval between exposures is 34 s, including the time to read out arrays, to transfer data to the control computer, and to save them to the hard drive. HSC on Subaru uniquely features a combination of a large aperture, a wide field of view, sharp images and a high sensitivity especially at longer wavelengths, which makes the HSC one of the most powerful observing facilities in the world.
This paper presents an anthropomorphic robot hand, called the Gifu hand II, which has a thumb and four fingers, all the joints of which are driven by servomotors built into the fingers and the palm. The thumb has four joints with four-degrees-of-freedom (DOF), the other fingers have four joints with 3-DOF, and two axes of the joints near the palm cross orthogonally at one point, as is the case in the human hand. The Gifu hand II can be equipped with six-axes force sensor at each fingertip, and a developed distributed tactile sensor with 624 detecting points on its surface. The design concepts and specifications of the Gifu hand II, the basic characteristics of the tactile sensor, and the pressure distributions at the time of object grasping are described and discussed herein. Our results demonstrate that the Gifu hand II has a high potential to perform dexterous object manipulations like the human hand.
This paper presents a new approach to optimal reactive power planning based on a genetic algorithm. Many outstanding methods to this problem have been proposed in the past. However, most these approaches have the common defect of being caught to a local minimum solution. The integer problem which yields integer value solutions for discrete controllers/banks still remain as a difficult one. The genetic algorithm is a kind of search algorithm based on the mechanics of natural selection and genetics. This algorithm can search for a global solution using multiple paths and treat integer problems naturally. The proposed method was applied to practical 51-bus and 224-bus systems to show its feasibility and capabilities. Although this method is not as fast as sophisticated traditional methods, the concept is quite promising and useful in the coming computer age.< <ETX xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">></ETX>
The role of forward error correction has become of critical importance in fiber optic communications, as backbone networks increase in speed to 40 and 100 Gb/s, particularly as poor optical-signal-to-noise environments are encountered. Such environments become more commonplace in higher-speed environments, as more optical amplifiers are deployed in networks. Many generations of FEC have been implemented, including block codes and concatenated codes. Developers now have options to consider hard-decision and soft-decision codes. This article describes the advantages of each type in particular transmission environments.
Sphinx-4 is a flexible, modular and pluggable framework to help foster new innovations in the core research of hidden Markov model (HMM) speech recognition systems. The design of Sphinx-4 is based on patterns that have emerged from the design of past systems as well as new requirements based on areas that researchers currently want to explore. To exercise this framework, and to provide researchers with a “researchready” system, Sphinx-4 also includes several implementations of both simple and state-of-the-art techniques. The framework and the implementations are all freely available via open source.