Panasonic (United States)
Company · Newark, New Jersey, United States
Research output, citation impact, and the most-cited recent papers from Panasonic (United States). Aggregated across the NobleBlocks index of 300M+ scholarly works.
Top-cited papers from Panasonic (United States)
A new technique for the analysis of speech, the perceptual linear predictive (PLP) technique, is presented and examined. This technique uses three concepts from the psychophysics of hearing to derive an estimate of the auditory spectrum: (1) the critical-band spectral resolution, (2) the equal-loudness curve, and (3) the intensity-loudness power law. The auditory spectrum is then approximated by an autoregressive all-pole model. A 5th-order all-pole model is effective in suppressing speaker-dependent details of the auditory spectrum. In comparison with conventional linear predictive (LP) analysis, PLP analysis is more consistent with human hearing. The effective second formant F2' and the 3.5-Bark spectral-peak integration theories of vowel perception are well accounted for. PLP analysis is computationally efficient and yields a low-dimensional representation of speech. These properties are found to be useful in speaker-independent automatic-speech recognition.
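The PLP pipeline above can be sketched in stdlib Python. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes the auditory spectrum is already sampled on an evenly spaced Bark grid, it omits critical-band integration and equal-loudness weighting, and it uses the asinh-style Hz-to-Bark warping commonly quoted for PLP. All function names are hypothetical. The cube root stands for the intensity-loudness power law, and the 5th-order all-pole fit uses Levinson-Durbin on the autocorrelation obtained by an inverse DFT of the compressed spectrum.

```python
import math

def hz_to_bark(f_hz: float) -> float:
    """Hz -> Bark using the asinh-style warping often quoted for PLP (assumed form)."""
    x = f_hz / 600.0
    return 6.0 * math.log(x + math.sqrt(x * x + 1.0))

def intensity_to_loudness(power: list[float]) -> list[float]:
    """Intensity-loudness power law approximated by cube-root compression."""
    return [p ** (1.0 / 3.0) for p in power]

def autocorrelation(spectrum: list[float], order: int) -> list[float]:
    """Inverse DFT of an even power spectrum given on M >= 2 points from 0 to Nyquist."""
    m = len(spectrum)
    n = 2 * (m - 1)  # length of the implied full, symmetric spectrum
    r = []
    for k in range(order + 1):
        acc = spectrum[0] + spectrum[-1] * (-1) ** k
        for i in range(1, m - 1):
            acc += 2.0 * spectrum[i] * math.cos(math.pi * k * i / (m - 1))
        r.append(acc / n)
    return r

def levinson_durbin(r: list[float], order: int) -> tuple[list[float], float]:
    """All-pole fit: returns A(z) = 1 + a1 z^-1 + ... + ap z^-p and residual energy."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err

def plp_all_pole(auditory_power: list[float], order: int = 5) -> tuple[list[float], float]:
    """Cube-root compression, then an all-pole model (5th order, as in the abstract)."""
    loudness = intensity_to_loudness(auditory_power)
    return levinson_durbin(autocorrelation(loudness, order), order)
```

A flat auditory spectrum has no structure to model, so `plp_all_pole([1.0] * 32)` returns predictor coefficients that are all (numerically) zero apart from the leading 1.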
Unsupervised anomaly detection with localization has many practical applications when labeling is infeasible and, moreover, when anomaly examples are completely missing from the training data. While recently proposed models for this data setup achieve high accuracy metrics, their complexity limits real-time processing. In this paper, we propose a real-time model and analytically derive its relationship to prior methods. Our CFLOW-AD model is based on a conditional normalizing flow framework adapted for anomaly detection with localization. In particular, CFLOW-AD consists of a discriminatively pretrained encoder followed by multi-scale generative decoders that explicitly estimate the likelihood of the encoded features. Our approach results in a computationally and memory-efficient model: CFLOW-AD is 10× faster and smaller than the prior state of the art with the same input setting. Our experiments on the MVTec dataset show that CFLOW-AD outperforms previous methods by 0.36% AUROC in the detection task, and by 1.12% AUROC and 2.5% AUPRO in the localization task. We open-source our code with fully reproducible experiments.
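CFLOW-AD scores anomalies through the likelihood its flow decoders assign to encoded features. The change-of-variables rule behind any normalizing flow can be illustrated with a deliberately tiny, hypothetical flow: a per-dimension affine map u = (z − μ)/σ onto a standard normal. The real decoders are learned, conditional and multi-scale; `mu`, `sigma` and the function names below are illustrative assumptions only.

```python
import math

def affine_flow_log_likelihood(z, mu, sigma):
    """log p(z) for u = (z - mu) / sigma with u ~ N(0, I): log N(u) + log|det du/dz|."""
    log_p = 0.0
    for zi, mi, si in zip(z, mu, sigma):
        u = (zi - mi) / si
        log_p += -0.5 * u * u - 0.5 * math.log(2.0 * math.pi)  # standard normal log-density
        log_p += -math.log(si)  # log|du/dz| = -log(sigma) for this dimension
    return log_p

def anomaly_score(z, mu, sigma):
    """Negative log-likelihood: larger values mean the feature vector is less typical."""
    return -affine_flow_log_likelihood(z, mu, sigma)
```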
This paper describes a new model-based speaker adaptation algorithm called the eigenvoice approach. The approach constrains the adapted model to be a linear combination of a small number of basis vectors obtained offline from a set of reference speakers, and thus greatly reduces the number of free parameters to be estimated from adaptation data. These "eigenvoice" basis vectors are orthogonal to each other and guaranteed to represent the most important components of variation between the reference speakers. Experimental results for a small-vocabulary task (letter recognition) given in the paper show that the approach yields major improvements in performance for tiny amounts of adaptation data. For instance, we obtained 16% relative improvement in error rate with one letter of supervised adaptation data, and 26% relative improvement with four letters of supervised adaptation data. After a comparison of the eigenvoice approach with other speaker adaptation algorithms, the paper concludes with a discussion of future work.
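The constraint described above (adapted model = mean + a weighted sum of a few eigenvoices) can be sketched as follows. The paper estimates the weights from the adaptation data, and its estimator is not described in this abstract; as a stand-in, this sketch assumes an orthonormal basis and a hypothetical target supervector, so the least-squares weights are plain projections. Only `len(basis)` weights are free, however long the supervector is.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def eigenvoice_adapt(mean, basis, target):
    """Return (weights, adapted) with adapted = mean + sum_k w_k * e_k.

    Assumes `basis` holds orthonormal eigenvoices, so least-squares weights are
    projections of (target - mean). This is a stand-in, not the paper's estimator.
    """
    residual = [t - m for t, m in zip(target, mean)]
    weights = [dot(e, residual) for e in basis]
    adapted = list(mean)
    for w, e in zip(weights, basis):
        for i, x in enumerate(e):
            adapted[i] += w * x
    return weights, adapted
```

With two basis vectors in a 3-dimensional space, any variation outside their span (the third coordinate here) is simply not representable, which is how the approach trades flexibility for very few free parameters.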
Automatic speech recognition experiments show that, depending on the task performed and how speech variability is modeled, automatic speech recognizers are more or less sensitive to the Lombard reflex. To gain an understanding of the Lombard effect with the prospect of improving the performance of automatic speech recognizers, (1) an analysis was made of the acoustic-phonetic changes occurring in Lombard speech, and (2) the influence of the Lombard effect on speech perception was studied. Both acoustic and perceptual analyses suggest that the influence of the Lombard effect on male and female speakers is different. The analyses also bring to light that, even if some tendencies across speakers can be observed consistently, the Lombard reflex is highly variable from speaker to speaker. Based on the results of the acoustic and perceptual studies, some ways of dealing with Lombard speech variability in automatic speech recognition are also discussed.
Generalized time-frequency representations (GTFRs) that use cone-shaped kernels for nonstationary signal analysis are presented. The cone-shaped kernels are formulated so that the GTFRs achieve good resolution simultaneously in time and frequency. Specifically, for a GTFR with a cone-shaped kernel, finite time support is maintained in the time dimension along with an enhanced spectrum in the frequency dimension, and the cross-terms are smoothed out. Experimental results on simulated data and real speech show the advantages of GTFRs with cone-shaped kernels in comparison with the spectrogram and the pseudo-Wigner distribution.
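For reference, GTFRs of this kind belong to Cohen's class, where a time-lag kernel φ(t, τ) smooths the instantaneous autocorrelation of the signal x. A cone-shaped kernel can be written as below; the lag window g(τ) and this exact notation are assumptions for illustration. Because φ vanishes outside the cone |t| ≤ |τ|/2, a signal confined to an interval yields a representation confined to the same interval, which is the finite time support mentioned above.

$$
C_x(t,f)=\iint \phi(t-u,\tau)\,x\!\left(u+\tfrac{\tau}{2}\right)x^{*}\!\left(u-\tfrac{\tau}{2}\right)e^{-j2\pi f\tau}\,du\,d\tau,
\qquad
\phi(t,\tau)=\begin{cases} g(\tau), & |t|\le |\tau|/2,\\ 0, & \text{otherwise.}\end{cases}
$$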
The worldwide opening of a massive amount of unlicensed spectrum around 60 GHz has triggered great interest in developing affordable 60-GHz radios. This interest has been catalyzed by recent advances in 60-GHz front-end technologies. This paper briefly reports recent work on 60-GHz radio. Aspects addressed include global regulation and standardization, the justification for using the 60-GHz bands, 60-GHz consumer electronics applications, the radio system concept, 60-GHz propagation and antennas, and key issues in system design. Some new simulation results are also given, and potentials and problems are explained in detail.
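One propagation consideration behind the 60-GHz design issues listed above is free-space loss, which for isotropic antennas grows as 20·log10 of frequency. The Friis formula used here is standard textbook physics, not taken from the paper, and it ignores oxygen absorption, antenna gain and indoor multipath.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s

def free_space_path_loss_db(freq_hz: float, distance_m: float) -> float:
    """Friis free-space path loss between isotropic antennas: 20*log10(4*pi*d/lambda)."""
    wavelength = SPEED_OF_LIGHT / freq_hz
    return 20.0 * math.log10(4.0 * math.pi * distance_m / wavelength)

if __name__ == "__main__":
    for f in (2.4e9, 5.0e9, 60e9):
        print(f"{f / 1e9:5.1f} GHz @ 1 m: {free_space_path_loss_db(f, 1.0):.1f} dB")
```

At equal distance, moving from 2.4 GHz to 60 GHz adds 20·log10(25) ≈ 28 dB of free-space loss (about 68 dB at 1 m for 60 GHz), which is one reason high-gain antennas feature in 60-GHz system design.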
The design of the Smart Grid requires solving a complex problem of combined sensing, communications and control; thus, the choice of a networking technology cannot be addressed without also considering the requirements of sensor networking and distributed control. These requirements are still somewhat undefined today, so it is not yet possible to give quantitative guidelines for choosing one communication technology over another. In this paper, we make a first qualitative attempt to better understand the role that Power Line Communications (PLCs) can play in the Smart Grid. Furthermore, we report recent results on the electrical and topological properties of the power distribution network. The topological characterization of the power grid is important not only because it allows us to model the grid as an information source, but also because the grid becomes the actual physical information delivery infrastructure when PLCs are used.
The authors address the problem of automatic word boundary detection in quiet conditions and in the presence of noise. Attention is given to word boundary detection under both additive noise and noise-induced changes in the talker's speech production (the Lombard reflex). After comparing several automatic word boundary detection algorithms in different noisy-Lombard conditions, they propose a new algorithm that is robust in the presence of noise. This new algorithm identifies islands of reliability (essentially the portion of speech between the first and the last vowel) using time- and frequency-based features and then, after a noise classification, applies a noise-adaptive procedure to refine the boundaries. The new algorithm is shown to outperform the commonly used algorithm of Lamel et al. (1981) and several other recently developed methods. The authors evaluated the average recognition error rate due to word boundary detection in an HMM-based recognition system across several signal-to-noise ratios and noise conditions. The recognition error rate decreased to about 20%, compared with an average of approximately 50% obtained with a modified version of the Lamel et al. algorithm.
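For contrast with the island-of-reliability method, here is a generic energy-threshold endpoint detector of the kind such algorithms are compared against. It is not the authors' algorithm: it assumes the leading frames contain only noise, estimates a noise floor from them, and marks the first and last frames exceeding that floor by a fixed margin. The frame length, `noise_frames` and `margin_db` are arbitrary assumptions.

```python
import math

def frame_energies_db(samples, frame_len):
    """Short-time log energy of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        energies.append(10.0 * math.log10(energy + 1e-12))  # small floor avoids log(0)
    return energies

def detect_word_boundaries(energies_db, noise_frames=5, margin_db=10.0):
    """Return (first, last) frame above noise floor + margin, or None if nothing exceeds it."""
    noise_floor = sum(energies_db[:noise_frames]) / noise_frames
    threshold = noise_floor + margin_db
    active = [i for i, e in enumerate(energies_db) if e > threshold]
    if not active:
        return None
    return active[0], active[-1]
```

A fixed-margin rule like this tends to be fragile at low SNR and with Lombard speech, which is the weakness the noise-adaptive refinement described above targets.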
There exist a variety of ways to represent 3D content, including stereo and multiview video, as well as frame-compatible and depth-based video formats. There are also a number of compression architectures and techniques that have been introduced in recent years. This paper provides an overview of relevant 3D representation and compression formats. It also analyzes some of the merits and drawbacks of these formats considering the application requirements and constraints imposed by different storage and transmission systems.
Regional variations and substrates of high-frequency rhythmic activity induced by cholinergic stimulation were studied in hippocampal slices with 64-electrode recording arrays. (1) Carbachol triggered beta waves (17.6 +/- 5.7 Hz) in pyramidal regions of 75% of the slices. (2) The waves had phase shifts across the cell body layers and were substantially larger in the apical dendrites than in cell body layers or basal dendrites. (3) Continuous, two-dimensional current source density analyses indicated apical sinks associated with basal sources, lasting approximately 10 msec, followed by apical sources and basal sinks, lasting approximately 20 msec, in a repeating pattern with a period in the range of 15-25 Hz. (4) Carbachol-induced beta waves in the hippocampus were accompanied by 40 Hz (gamma) oscillations in deep layers of the entorhinal cortex. (5) Cholinergically elicited beta and gamma rhythms were eliminated by antagonists of either AMPA or GABA receptors. Benzodiazepines markedly enhanced beta activity and sometimes introduced a distinct gamma frequency peak. (6) Twenty Hertz activity after orthodromic activation of field CA3 was distributed in the same manner as carbachol-induced beta waves and was generated by a current source in the apical dendrites of CA3. This source was eliminated by high concentrations of GABA(A) receptor blockers. It is concluded that cholinergically driven beta rhythms arise independently in hippocampal subfields from oscillatory circuits involving (1) bursts of pyramidal cell discharges, (2) activation of a subset of feedback interneurons that project apically, and (3) production of a GABA(A)-mediated hyperpolarization in the outer portions of the apical dendrites of pyramidal neurons.
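The two-dimensional current source density analysis in point (3) is conventionally estimated as CSD = −σ∇²φ from the recorded field potentials φ. A minimal five-point-Laplacian sketch follows; the rectangular grid (for example an 8×8 layout of a 64-electrode array), uniform conductivity σ and equal electrode spacing are assumptions, and the study's own filtering and interpolation are not reproduced. With this sign convention, negative values are sinks and positive values are sources.

```python
def csd_2d(phi, spacing, sigma=1.0):
    """Five-point estimate of CSD = -sigma * Laplacian(phi) at interior grid points.

    phi: 2D list of potentials (rows x cols). Edge electrodes are left as None
    because the five-point stencil needs all four neighbours.
    """
    rows, cols = len(phi), len(phi[0])
    csd = [[None] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            laplacian = (
                phi[i + 1][j] + phi[i - 1][j] + phi[i][j + 1] + phi[i][j - 1] - 4.0 * phi[i][j]
            ) / spacing ** 2
            csd[i][j] = -sigma * laplacian  # negative = sink, positive = source
    return csd
```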
The network interfaces of existing multicomputers require a significant amount of software overhead to provide protection and to implement message passing protocols. The authors describe the design of a low-latency, high-bandwidth, virtual memory-mapped network interface for the SHRIMP multicomputer project at Princeton University. Without sacrificing protection, the network interface achieves low latency by using virtual memory mapping and write-latency hiding techniques, and obtains high bandwidth by providing a user-level block data transfer mechanism. The authors have implemented several message passing primitives in an experimental environment, demonstrating that their approach can reduce the message passing overhead to a few user-level instructions.
The increasing availability of large-scale location traces creates unprecedented opportunities to change the paradigm for identifying abnormal moving activities. Indeed, various aspects of abnormality in moving patterns have recently been exploited, such as wrong direction and wandering. However, there is no recognized way of combining different aspects into a unified evolving abnormality score that captures the evolving nature of abnormal moving trajectories. To that end, in this paper, we provide an evolving trajectory outlier detection method, named TOP-EYE, which continuously computes the outlying score for each trajectory in an accumulating way. Specifically, in TOP-EYE, we introduce a decay function to mitigate the influence of past trajectories on the evolving outlying score, which is defined based on the evolving moving direction and density of trajectories. This decay function enables the evolving computation of accumulated outlying scores along the trajectories. An advantage of TOP-EYE is that it identifies evolving outliers at a very early stage with a relatively low false alarm rate. Finally, experimental results on real-world location traces show that TOP-EYE can effectively capture evolving abnormal trajectories.
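The abstract does not specify TOP-EYE's decay function. Assuming an exponential decay, the accumulated score can be updated recursively as S_t = e^(−λ)·S_(t−1) + s_t, where s_t is the per-step outlying score (its direction- and density-based definition is not reproduced here). `decay_rate` and the alarm threshold are hypothetical parameters.

```python
import math

def evolving_scores(instant_scores, decay_rate):
    """Accumulate per-step outlying scores, down-weighting older evidence by exp(-decay_rate) per step."""
    factor = math.exp(-decay_rate)
    total, history = 0.0, []
    for s in instant_scores:
        total = factor * total + s  # equals sum_i exp(-decay_rate * (t - i)) * s_i
        history.append(total)
    return history

def first_alarm(scores, threshold):
    """Index of the first step whose accumulated score exceeds the threshold, else None."""
    for t, value in enumerate(scores):
        if value > threshold:
            return t
    return None
```

A decay rate of 0 reduces to a plain running sum; larger rates forget older behaviour faster, trading early detection against false alarms.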
Speaker-independent recognition of Lombard and noisy speech by a recognizer trained with normal speech is discussed. Speech was represented by static, dynamic (first difference), and acceleration (second difference) features. Strong interaction was found between these temporal features, the frequency differentiation due to cepstral weighting, and the degree of smoothing in the spectral analysis. When combined with the other features, acceleration raised recognition rates for Lombard or noisy input speech. Dynamic and acceleration features were found to perform much better than the static feature for noisy Lombard speech. This suggests that an algorithm which excludes the static feature in high ambient noise is desirable.
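The three feature types above can be sketched with the plain first and second differences the abstract names (many systems instead use regression-based deltas over a window). Cepstral weighting and spectral smoothing are omitted, and the function names are hypothetical. The `include_static` flag mirrors the suggestion to drop the static feature in high ambient noise.

```python
def first_difference(frames):
    """Frame-to-frame difference: out[k] = frames[k + 1] - frames[k]."""
    return [[b - a for a, b in zip(prev, cur)] for prev, cur in zip(frames, frames[1:])]

def temporal_features(cepstra, include_static=True):
    """Stack static, dynamic (first difference) and acceleration (second difference) features.

    Output frame t (t >= 2) uses c[t], c[t] - c[t-1], and c[t] - 2c[t-1] + c[t-2].
    """
    delta = first_difference(cepstra)  # delta[k] belongs to frame k + 1
    accel = first_difference(delta)    # accel[k] belongs to frame k + 2
    stacked = []
    for t in range(2, len(cepstra)):
        vec = list(cepstra[t]) if include_static else []
        vec += delta[t - 1] + accel[t - 2]
        stacked.append(vec)
    return stacked
```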
When making decisions, we need to consider the possible alternatives and then choose the optimal one. Subjective judgment introduces uncertainty into this selection process, and decision making becomes harder still when the available information is incomplete or imprecise. Project selection is one such problem, and it involves several critical factors, including market conditions and the availability of raw materials. The decision mechanism is constrained by the uncertainty inherent in determining the relative importance of each attribute element. In this paper, we develop a system for project selection using fuzzy logic. Fuzzy logic enables us to emulate the human reasoning process and make decisions based on vague or imprecise data. Our approach is based on uncertainty reduction. The optimal alternative is determined by the relative weights of each attribute's elements combined over all the attribute membership functions. We also present a case study on the selection of software packages. Our system could easily be applied to other project selection problems under uncertainty.
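The abstract does not give its aggregation formula. As a hedged illustration of scoring alternatives from vague ratings, this sketch uses triangular membership functions and a weighted average of membership grades (simple additive weighting). Attribute names, weights and membership parameters are hypothetical, and the paper's uncertainty-reduction step is not reproduced.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 at a, rising to 1 at the peak b, falling to 0 at c."""
    if x <= a or x >= c:
        return 1.0 if x == b else 0.0  # handles shoulders where the peak sits on an edge
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

def fuzzy_score(ratings, weights, membership_params):
    """Weighted average of each attribute's membership grade."""
    total_weight = sum(weights.values())
    return sum(
        weights[k] * triangular(ratings[k], *membership_params[k]) for k in weights
    ) / total_weight

def select_best(alternatives, weights, membership_params):
    """Name of the alternative with the highest aggregated score."""
    return max(alternatives, key=lambda name: fuzzy_score(alternatives[name], weights, membership_params))
```

For example, with `cost` rated on a falling shoulder `(0, 0, 10)` and `quality` on a rising one `(0, 10, 10)`, weighted 0.4 and 0.6, a cheap but mediocre package scores 0.68 while a pricier, better one scores 0.70 and is selected.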
To study the Lombard (1911) reflex, more realistic databases representing real-world conditions need to be recorded and analyzed. In this paper we (1) summarize a procedure to record Lombard data which provides a good approximation of realistic conditions, (2) present an analysis per class of sounds for duration and energy of words recorded while subjects are listening to noise through open-ear headphones (a) when speakers are in communication with a recognition device and (b) when reading a list, and (3) report on the influence of speaking style on speaker-dependent and speaker-independent experiments. This paper extends a previous study aimed at analyzing the influence of the communication factor on the Lombard reflex. We also show evidence that it is difficult to separate the speaker from the environmental stressor (in this case the noise) when studying the Lombard reflex. The main conclusion of our pilot study is that the communication factor should not be neglected because it strongly influences the Lombard reflex.
We report for the first time some statistical properties of the indoor power line (PL) channel that exhibit some interesting similarities to the wireless channel, although some fundamental differences are also pointed out. In particular, we argue here that, while multipath propagation in wireless channels gives rise to Rayleigh-distributed fading, multipath in PL channels gives rise to log-normally distributed fading, similar to shadow fading. We also report for the first time that both channel gain and Root-Mean-Square Delay Spread (RMS-DS) of indoor channels are log-normally distributed, leptokurtic and negatively correlated, thus suggesting that channels that introduce severe multipath are also characterized by large attenuation. These results are used to define a simplified PL channel model useful for comparative analysis of communication schemes and, at the same time, to draw some general conclusions on the design of multicarrier schemes.
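Both statistics discussed above can be computed from a measured power delay profile with the standard definitions below. The tap delays and powers are hypothetical inputs; evaluating these over many measured channels and correlating gain (in dB) with RMS-DS is one way to check the negative correlation reported above.

```python
import math

def channel_gain_db(tap_powers):
    """Total channel power gain in dB (sum of linear tap powers)."""
    return 10.0 * math.log10(sum(tap_powers))

def rms_delay_spread(delays, tap_powers):
    """RMS delay spread of a power delay profile: sqrt(E[tau^2] - E[tau]^2), power-weighted."""
    total = sum(tap_powers)
    mean = sum(p * t for p, t in zip(tap_powers, delays)) / total
    second = sum(p * t * t for p, t in zip(tap_powers, delays)) / total
    return math.sqrt(max(second - mean * mean, 0.0))  # clamp tiny negative round-off
```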
We have used femtosecond laser pulses to drill submicron holes in single crystal silicon films in silicon-on-insulator structures. Cross-sectional transmission electron microscopy and energy dispersive x-ray analysis of material adjacent to the ablated holes indicates the formation of a layer of amorphous Si. This demonstrates that even when material is ablated using femtosecond pulses near the single pulse ablation threshold, sufficient heating of the surrounding material occurs to create a molten zone which solidifies so rapidly that crystallization is bypassed.
We study an important problem in multimedia databases, namely the automatic extraction of indexing information from raw data based on video content. The goal of our research project is to develop a prototype system for automatic indexing of sports videos. The novelty of our work is that we propose to integrate speech understanding and image analysis algorithms for extracting information. The main thrust of this work comes from the observation that in news or sports video indexing, speech analysis is usually more efficient than image analysis in detecting events. Therefore, in our system, the audio processing modules are first applied to locate candidates in the whole data. This information is passed to the video processing modules, which further analyze the video. The final products of video analysis are in the form of pointers to the locations of interesting events in a video. Our algorithms have been tested extensively with real TV programs, and results are presented and discussed.
We report ultra-high-voltage AlGaN/GaN heterojunction field-effect transistors (HFETs) on sapphire with thick poly-AlN passivation. An extremely high blocking voltage of 8300 V is achieved while maintaining a relatively low specific on-state resistance (Ron·A) of 186 mΩ·cm². Via-holes through the sapphire at the drain electrodes enable a very efficient layout of the lateral HFET array as well as better heat dissipation.
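A common way to summarize this voltage/resistance trade-off, not stated in the abstract, is the power-device figure of merit V_B²/(Ron·A). Plugging in the two reported values (186 mΩ·cm² = 0.186 Ω·cm²) gives a rough derived number, not a result quoted from the paper:

$$
\mathrm{FOM}=\frac{V_B^{2}}{R_{\mathrm{on}}A}=\frac{(8300\ \mathrm{V})^{2}}{0.186\ \Omega\,\mathrm{cm}^{2}}\approx 3.7\times10^{8}\ \mathrm{W/cm^{2}}\;(\approx 370\ \mathrm{MW/cm^{2}}).
$$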
We present multifrequency VLA observations of the intensity and polarized emission from G359.54+0.18, a system of nonthermal filaments near the Galactic center. The intrinsic magnetic field lines run primarily along the filaments. The rotation measure (RM) varies between -4200 and -370 rad m⁻² on scales of several arcseconds, implying that the ionized, magnetized medium responsible for the Faraday rotation is less than 0.1 pc thick. In turn, this implies that the magnetic field in the Faraday screen is large, suggesting that it is located close to the Galactic center. Further evidence is provided by the anisotropy of the RM fluctuations. The structure of the eastern portion of G359.54+0.18 suggests that the magnetic field in the filaments is highly distorted as a result of an interaction with an adjacent Galactic center molecular cloud.
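For context, the standard Faraday-rotation relations (not restated in the abstract) link the measured RM to the screen: the polarization angle rotates as χ = χ₀ + RM·λ², and RM ≈ 0.81 ∫ n_e B_∥ dl, with n_e in cm⁻³, B_∥ in μG and dl in pc. Treating the screen as uniform, with thickness below 0.1 pc, and taking |RM| = 4200 rad m⁻² where it peaks gives a rough lower bound on the product n_e B_∥. This shows why a thin screen demands a large field-density product:

$$
\chi(\lambda)=\chi_0+\mathrm{RM}\,\lambda^{2},\qquad
\mathrm{RM}\approx 0.81\int n_e B_\parallel\,dl,\qquad
\left|\langle n_e B_\parallel\rangle\right| \gtrsim \frac{4200}{0.81\times 0.1}\approx 5\times10^{4}\ \mu\mathrm{G\,cm^{-3}}.
$$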