Research Center for Information Technology Innovation, Academia Sinica

Facility · Taipei, Taiwan

Research output, citation impact, and the most-cited recent papers from Research Center for Information Technology Innovation, Academia Sinica (Taiwan). Aggregated across the NobleBlocks index of 300M+ scholarly works.

Total works: 1.9K

Citations: 75.7K

h-index: 104

i10-index: 1.3K

Also known as

Research Center for Information Technology Innovation, Academia Sinica; 中央研究院資訊科技創新研究中心

Top-cited papers from Research Center for Information Technology Innovation, Academia Sinica

cytoHubba: identifying hub objects and sub-networks from complex interactome

Chia-Hao Chin, Shu-Hwa Chen, Hsin-Hung Wu, Chin-Wen Ho +2 more

2014 · BMC Systems Biology · 6.8K citations · doi:10.1186/1752-0509-8-s4-s11

BACKGROUND: Networks are a useful way of presenting many types of biological data, including protein-protein interactions, gene regulations, cellular pathways, and signal transductions. We can measure nodes by their network features to infer their importance in the network, which helps identify central elements of biological networks. RESULTS: We introduce a novel Cytoscape plugin, cytoHubba, for ranking nodes in a network by their network features. CytoHubba provides 11 topological analysis methods, including Degree, Edge Percolated Component, Maximum Neighborhood Component, Density of Maximum Neighborhood Component, Maximal Clique Centrality (MCC), and six centralities (Bottleneck, EcCentricity, Closeness, Radiality, Betweenness, and Stress) based on shortest paths. Among the eleven methods, the newly proposed method, MCC, performs better in the precision of predicting essential proteins from the yeast PPI network. CONCLUSIONS: CytoHubba provides a user-friendly interface for exploring important nodes in biological networks. It computes all eleven methods in a one-stop-shopping fashion. In addition, researchers can combine cytoHubba with other plugins into a novel analysis scheme. The network and sub-networks captured by this topological analysis strategy will lead to new insights on essential regulatory networks and protein drug targets for experimental biologists. According to Cytoscape plugin download statistics, cytoHubba has been downloaded around 6,700 times since 2010.
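
As a rough illustration of the simplest method named in the abstract (Degree), the sketch below ranks the nodes of a toy undirected network by their number of neighbours. The edge list and the function name `rank_by_degree` are made up for illustration; this is not cytoHubba's code, and the other ten methods are not shown.

```python
from collections import defaultdict


def rank_by_degree(edges):
    """Return (node, degree) pairs, highest degree first, ties broken by name."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops in this sketch
        neighbours[a].add(b)
        neighbours[b].add(a)
    return sorted(
        ((node, len(adjacent)) for node, adjacent in neighbours.items()),
        key=lambda item: (-item[1], item[0]),
    )


# Hypothetical toy interaction network (not real data).
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
ranking = rank_by_degree(toy_edges)
print(ranking)
```

Node A touches three other nodes and therefore ranks first; a real analysis would load the network from Cytoscape rather than from a hand-written list.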

Computer-aided classification of lung nodules on computed tomography images via deep learning technique

Yu-Jen Chen, Kai‐Lung Hua, Che-Hao Hsu, Wen-Huang Cheng +1 more

2015 · OncoTargets and Therapy · 544 citations · doi:10.2147/ott.s80733

Lung cancer has a poor prognosis when not diagnosed early and unresectable lesions are present. The management of small lung nodules noted on computed tomography scan is controversial due to uncertain tumor characteristics. A conventional computer-aided diagnosis (CAD) scheme requires several image processing and pattern recognition steps to accomplish a quantitative tumor differentiation result. In such an ad hoc image analysis pipeline, every step depends heavily on the performance of the previous step. Accordingly, tuning of classification performance in a conventional CAD scheme is very complicated and arduous. Deep learning techniques, on the other hand, have the intrinsic advantage of an automatic exploitation feature and tuning of performance in a seamless fashion. In this study, we attempted to simplify the image analysis pipeline of conventional CAD with deep learning techniques. Specifically, we introduced models of a deep belief network and a convolutional neural network in the context of nodule classification in computed tomography images. Two baseline methods with feature computing steps were implemented for comparison. The experimental results suggest that deep learning methods could achieve better discriminative results and hold promise in the CAD application domain.

Multiple Kernel Fuzzy Clustering

Hsin-Chien Huang, Yung‐Yu Chuang, Chu‐Song Chen

2011 · IEEE Transactions on Fuzzy Systems · 411 citations · doi:10.1109/tfuzz.2011.2170175

While fuzzy c-means is a popular soft-clustering method, its effectiveness is largely limited to spherical clusters. By applying kernel tricks, the kernel fuzzy c-means algorithm attempts to address this problem by mapping data with nonlinear relationships to appropriate feature spaces. Kernel combination, or selection, is crucial for effective kernel clustering. Unfortunately, for most applications, it is not easy to find the right combination. We propose a multiple kernel fuzzy c-means (MKFC) algorithm that extends the fuzzy c-means algorithm with a multiple kernel-learning setting. By incorporating multiple kernels and automatically adjusting the kernel weights, MKFC is more immune to ineffective kernels and irrelevant features. This makes the choice of kernels less crucial. In addition, we show multiple kernel k-means to be a special case of MKFC. Experiments on both synthetic and real-world data demonstrate the effectiveness of the proposed MKFC algorithm.
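
For orientation, here is a minimal sketch of the membership update in plain fuzzy c-means, the baseline that MKFC extends. It is not MKFC itself: it uses ordinary distances on 1-D data, and the kernel mapping and learned kernel weights described in the abstract are omitted. The function name and the sample values are illustrative assumptions.

```python
def fcm_memberships(point, centers, m=2.0):
    """Standard fuzzy c-means membership of a 1-D point in each cluster.

    m > 1 is the fuzzifier; larger values give softer memberships.
    """
    dists = [abs(point - c) for c in centers]
    # A point sitting exactly on a center belongs fully to that center.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]


# A point at 1.0 between two hypothetical cluster centers at 0.0 and 4.0.
u = fcm_memberships(1.0, centers=[0.0, 4.0])
print(u)
```

The memberships sum to one and favour the nearer center. In a kernel variant, the distances would instead be computed in the feature space induced by the kernel.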

No More Discrimination: Cross City Adaptation of Road Scene Segmenters

Yi‐Hsin Chen, Wei‐Yu Chen, Yu-Ting Chen, Bo-Cheng Tsai +2 more

2017 · 370 citations · doi:10.1109/iccv.2017.220

Despite the recent success of deep-learning based semantic segmentation, deploying a pre-trained road scene segmenter to a city whose images are not presented in the training set would not achieve satisfactory performance due to dataset biases. Instead of collecting a large number of annotated images of each city of interest to train or refine the segmenter, we propose an unsupervised learning approach to adapt road scene segmenters across different cities. By utilizing Google Street View and its time-machine feature, we can collect unannotated images for each road scene at different times, so that the associated static-object priors can be extracted accordingly. By advancing a joint global and class-specific domain adversarial learning framework, adaptation of pre-trained segmenters to that city can be achieved without the need of any user annotation or interaction. We show that our method improves the performance of semantic segmentation in multiple cities across continents, while it performs favorably against state-of-the-art approaches requiring annotated training data.

Ordinal hyperplanes ranker with cost sensitivities for age estimation

Kuang-Yu Chang, Chu‐Song Chen, Yi‐Ping Hung

2011 · 350 citations · doi:10.1109/cvpr.2011.5995437

In this paper, we propose an ordinal hyperplane ranking algorithm called OHRank, which estimates human ages via facial images. The design of the algorithm is based on the relative order information among the age labels in a database. Each ordinal hyperplane separates all the facial images into two groups according to the relative order, and a cost-sensitive property is exploited to find better hyperplanes based on the classification costs. Human ages are inferred by aggregating a set of preferences from the ordinal hyperplanes with their cost sensitivities. Our experimental results demonstrate that the proposed approach outperforms conventional multiclass-based and regression-based approaches as well as recently developed ranking-based age estimation approaches.
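
The aggregation step can be pictured with a simplified sketch: each binary classifier answers "is the label above boundary k?", and the prediction is the lowest label plus the number of positive answers. This is a common ordinal-aggregation rule used here as an assumption; the cost-sensitive training of the hyperplanes is not shown, and the threshold "classifiers" are hypothetical stand-ins for learned hyperplanes.

```python
def aggregate_ordinal_votes(votes, min_label=1):
    """votes[k] is True when the k-th binary classifier says 'label > min_label + k'."""
    return min_label + sum(1 for vote in votes if vote)


def toy_classifiers(feature, thresholds):
    """Hypothetical stand-in for learned hyperplanes: one threshold per boundary."""
    return [feature > t for t in thresholds]


# Four boundaries separate five ordinal labels (1..5) in this toy setup.
thresholds = [10.5, 20.5, 30.5, 40.5]
votes = toy_classifiers(33.0, thresholds)
predicted = aggregate_ordinal_votes(votes)
print(votes, predicted)
```

With a feature value of 33.0, three of the four boundaries are exceeded, so the sketch predicts ordinal label 4.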

Noise Reduction in ECG Signals Using Fully Convolutional Denoising Autoencoders

Hsin-Tien Chiang, Yi-Yen Hsieh, Szu‐Wei Fu, Kuo-Hsuan Hung +2 more

2019 · IEEE Access · 349 citations · doi:10.1109/access.2019.2912036

The electrocardiogram (ECG) is an efficient and noninvasive indicator for arrhythmia detection and prevention. In real-world scenarios, ECG signals are prone to contamination by various kinds of noise, which may lead to wrong interpretation. Therefore, significant attention has been paid to denoising of ECG for accurate diagnosis and analysis. A denoising autoencoder (DAE) can be applied to reconstruct the clean data from its noisy version. In this paper, a DAE using the fully convolutional network (FCN) is proposed for ECG signal denoising. Meanwhile, the proposed FCN-based DAE can perform compression with regard to the DAE architecture. The proposed approach is applied to ECG signals from the MIT-BIH Arrhythmia database and the added noise signals are obtained from the MIT-BIH Noise Stress Test database. The denoising performance is evaluated using the root-mean-square error (RMSE), percentage-root-mean-square difference (PRD), and improvement in signal-to-noise ratio (SNR_imp). The results of the experiments conducted on noisy ECG signals of different levels of input SNR show that the FCN acquires better performance as compared to the deep fully connected neural network- and convolutional neural network-based denoising models. Moreover, the proposed FCN-based DAE reduces the size of the input ECG signals, where the compressed data is 32 times smaller than the original. The results of the study demonstrate the superiority of FCN in denoising, with lower RMSE and PRD, as well as higher SNR_imp. According to the results, we believe that the proposed FCN-based DAE has a good application prospect in clinical practice.
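
The three evaluation metrics named in the abstract can be sketched as follows, using their commonly quoted definitions. The exact formulas used in the paper are an assumption here, and the short sample signals are invented for illustration.

```python
import math


def rmse(clean, estimate):
    """Root-mean-square error between a clean signal and an estimate."""
    n = len(clean)
    return math.sqrt(sum((c - e) ** 2 for c, e in zip(clean, estimate)) / n)


def prd(clean, estimate):
    """Percentage root-mean-square difference, relative to clean-signal energy."""
    num = sum((c - e) ** 2 for c, e in zip(clean, estimate))
    den = sum(c ** 2 for c in clean)
    return 100.0 * math.sqrt(num / den)


def snr_db(clean, other):
    """SNR in dB, treating (other - clean) as the noise."""
    signal = sum(c ** 2 for c in clean)
    noise = sum((o - c) ** 2 for c, o in zip(clean, other))
    return 10.0 * math.log10(signal / noise)


def snr_improvement(clean, noisy, denoised):
    """SNR_imp: output SNR minus input SNR, in dB."""
    return snr_db(clean, denoised) - snr_db(clean, noisy)


# Invented toy signals: the "denoised" version halves the error of the noisy one.
clean = [0.0, 1.0, 0.0, -1.0]
noisy = [0.2, 1.2, -0.2, -0.8]
denoised = [0.1, 1.1, -0.1, -0.9]
print(rmse(clean, denoised), prd(clean, denoised), snr_improvement(clean, noisy, denoised))
```

Lower RMSE and PRD and a higher SNR_imp indicate better denoising, which matches the direction of the comparisons reported in the abstract.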

End-to-End Waveform Utterance Enhancement for Direct Evaluation Metrics Optimization by Fully Convolutional Neural Networks

Szu‐Wei Fu, Tao-Wei Wang, Yu Tsao, Xugang Lu +1 more

2018 · IEEE/ACM Transactions on Audio Speech and Language Processing · 341 citations · doi:10.1109/taslp.2018.2821903

A speech enhancement model is used to map noisy speech to clean speech. In the training stage, an objective function is often adopted to optimize the model parameters. However, in the existing literature, there is an inconsistency between the model optimization criterion and the evaluation criterion for the enhanced speech. For example, in measuring speech intelligibility, most evaluation metrics are based on a short-time objective intelligibility (STOI) measure, while the frame-based mean square error (MSE) between estimated and clean speech is widely used in optimizing the model. Due to the inconsistency, there is no guarantee that the trained model can provide optimal performance in applications. In this study, we propose an end-to-end utterance-based speech enhancement framework using fully convolutional neural networks (FCN) to reduce the gap between the model optimization and the evaluation criterion. Because of the utterance-based optimization, temporal correlation information of long speech segments, or even at the entire utterance level, can be considered to directly optimize perception-based objective functions. As an example, we implemented the proposed FCN enhancement framework to optimize the STOI measure. Experimental results show that the STOI of test speech processed by the proposed approach is better than that of conventional MSE-optimized speech due to the consistency between the training and the evaluation targets. Moreover, by integrating the STOI into model optimization, the intelligibility of the enhanced speech for human subjects and automatic speech recognition systems is also substantially improved compared with speech generated under the minimum MSE criterion.

Supervised Learning of Semantics-Preserving Hash via Deep Convolutional Neural Networks

Huei‐Fang Yang, Kevin Lin, Chu‐Song Chen

2017 · IEEE Transactions on Pattern Analysis and Machine Intelligence · 326 citations · doi:10.1109/tpami.2017.2666812

This paper presents a simple yet effective supervised deep hash approach that constructs binary hash codes from labeled data for large-scale image search. We assume that the semantic labels are governed by several latent attributes with each attribute on or off, and classification relies on these attributes. Based on this assumption, our approach, dubbed supervised semantics-preserving deep hashing (SSDH), constructs hash functions as a latent layer in a deep network and the binary codes are learned by minimizing an objective function defined over classification error and other desirable hash codes properties. With this design, SSDH has a nice characteristic that classification and retrieval are unified in a single learning model. Moreover, SSDH performs joint learning of image representations, hash codes, and classification in a point-wised manner, and thus is scalable to large-scale datasets. SSDH is simple and can be realized by a slight enhancement of an existing deep architecture for classification; yet it is effective and outperforms other hashing approaches on several benchmarks and large datasets. Compared with state-of-the-art approaches, SSDH achieves higher retrieval accuracy, while the classification performance is not sacrificed.

Interference and Outage in Poisson Cognitive Networks

Chia-han Lee, Martin Haenggi

2012 · IEEE Transactions on Wireless Communications · 297 citations · doi:10.1109/twc.2012.021512.110131

Consider a cognitive radio network with two types of users: primary users (PUs) and cognitive users (CUs), whose locations follow two independent Poisson point processes. The cognitive users follow the policy that a cognitive transmitter is active only when it is outside the primary user exclusion regions. We found that under this setup the active cognitive users form a point process called the Poisson hole process. Due to the interaction between the primary users and the cognitive users through exclusion regions, an exact calculation of the interference and the outage probability seems unfeasible. Instead, two different approaches are taken to tackle this problem. First, bounds for the interference (in the form of Laplace transforms) and the outage probability are derived, and second, it is shown how to use a Poisson cluster process to model the interference in this kind of network. Furthermore, the bipolar network model with different exclusion region settings is analyzed.
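
The construction described in the abstract can be simulated in a few lines: draw two independent Poisson point processes, then keep only the cognitive users that fall outside every exclusion disc. This is a sketch under simplifying assumptions: the intensities, window size, and radius are arbitrary, the discs are centered on the primary points, and edge effects are ignored. All function names are illustrative.

```python
import math
import random


def sample_poisson(mean, rng):
    """Knuth's method for a Poisson-distributed count (adequate for small means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def poisson_points(intensity, side, rng):
    """Homogeneous Poisson point process on a side x side square window."""
    n = sample_poisson(intensity * side * side, rng)
    return [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n)]


def active_cognitive_users(primary, cognitive, radius):
    """Keep cognitive users lying outside every primary exclusion disc."""
    return [c for c in cognitive
            if all(math.dist(c, p) >= radius for p in primary)]


rng = random.Random(0)                        # fixed seed for repeatability
primary = poisson_points(0.05, 10.0, rng)     # on average 5 primary users
cognitive = poisson_points(0.5, 10.0, rng)    # on average 50 cognitive users
active = active_cognitive_users(primary, cognitive, radius=2.0)
print(len(primary), len(cognitive), len(active))
```

The retained points form one realization of a Poisson hole process on the window; the interference bounds derived in the paper are not reproduced here.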

Voice conversion from non-parallel corpora using variational auto-encoder

Chin-Cheng Hsu, Hsin-Te Hwang, Yi-Chiao Wu, Yu Tsao +1 more

2016 · 296 citations · doi:10.1109/apsipa.2016.7820786

We propose a flexible framework for spectral conversion (SC) that facilitates training with unaligned corpora. Many SC frameworks require parallel corpora, phonetic alignments, or explicit frame-wise correspondence for learning conversion functions or for synthesizing a target spectrum with the aid of alignments. However, these requirements gravely limit the scope of practical applications of SC due to scarcity or even unavailability of parallel corpora. We propose an SC framework based on variational auto-encoder which enables us to exploit non-parallel corpora. The framework comprises an encoder that learns speaker-independent phonetic representations and a decoder that learns to reconstruct the designated speaker. It removes the requirement of parallel corpora or phonetic alignments to train a spectral conversion system. We report objective and subjective evaluations to validate our proposed method and compare it to SC methods that have access to aligned corpora.

Voice Conversion from Unaligned Corpora Using Variational Autoencoding Wasserstein Generative Adversarial Networks

Chin-Cheng Hsu, Hsin-Te Hwang, Yi-Chiao Wu, Yu Tsao +1 more

2017 · 258 citations · doi:10.21437/interspeech.2017-63

Building a voice conversion (VC) system from non-parallel speech corpora is challenging but highly valuable in real application scenarios. In most situations, the source and the target speakers do not repeat the same texts or they may even speak different languages. In this case, one possible, although indirect, solution is to build a generative model for speech. Generative models focus on explaining the observations with latent variables instead of learning a pairwise transformation function, thereby bypassing the requirement of speech frame alignment. In this paper, we propose a non-parallel VC framework with a variational autoencoding Wasserstein generative adversarial network (VAW-GAN) that explicitly considers a VC objective when building the speech model. Experimental results corroborate the capability of our framework for building a VC system from unaligned data, and demonstrate improved conversion quality.

Audio-Visual Speech Enhancement Using Multimodal Deep Convolutional Neural Networks

Jen-Cheng Hou, Syu-Siang Wang, Ying-Hui Lai, Yu Tsao +2 more

2018 · IEEE Transactions on Emerging Topics in Computational Intelligence · 250 citations · doi:10.1109/tetci.2017.2784878

Speech enhancement (SE) aims to reduce noise in speech signals. Most SE techniques focus only on addressing audio information. In this paper, inspired by multimodal learning, which utilizes data from different modalities, and the recent success of convolutional neural networks (CNNs) in SE, we propose an audio-visual deep CNNs (AVDCNN) SE model, which incorporates audio and visual streams into a unified network model. We also propose a multitask learning framework for reconstructing audio and visual signals at the output layer. Precisely speaking, the proposed AVDCNN model is structured as an audio-visual encoder-decoder network, in which audio and visual data are first processed using individual CNNs, and then fused into a joint network to generate enhanced speech (the primary task) and reconstructed images (the secondary task) at the output layer. The model is trained in an end-to-end manner, and parameters are jointly learned through back propagation. We evaluate enhanced speech using five instrumental criteria. Results show that the AVDCNN model yields a notably superior performance compared with an audio-only CNN-based SE model and two conventional SE approaches, confirming the effectiveness of integrating visual information into the SE process. In addition, the AVDCNN model also outperforms an existing audio-visual SE model, confirming its capability of effectively combining audio and visual information in SE.

Self-Learning Based Image Decomposition With Applications to Single Image Denoising

De-An Huang, Li‐Wei Kang, Yu-Chiang Frank Wang, Chia‐Wen Lin

2013 · IEEE Transactions on Multimedia · 227 citations · doi:10.1109/tmm.2013.2284759

Decomposition of an image into multiple semantic components has been an effective research topic for various image processing applications such as image denoising, enhancement, and inpainting. In this paper, we present a novel self-learning based image decomposition framework. Based on the recent success of sparse representation, the proposed framework first learns an over-complete dictionary from the high spatial frequency parts of the input image for reconstruction purposes. We perform unsupervised clustering on the observed dictionary atoms (and their corresponding reconstructed image versions) via affinity propagation, which allows us to identify image-dependent components with similar context information. While applying the proposed method for the applications of image denoising, we are able to automatically determine the undesirable patterns (e.g., rain streaks or Gaussian noise) from the derived image components directly from the input image, so that the task of single-image denoising can be addressed. Different from prior image processing works with sparse representation, our method does not need to collect training image data in advance, nor do we assume image priors such as the relationship between input and output image dictionaries. We conduct experiments on two denoising problems: single-image denoising with Gaussian noise and rain removal. Our empirical results confirm the effectiveness and robustness of our approach, which is shown to outperform state-of-the-art image denoising algorithms.

Anomaly Detection via Online Oversampling Principal Component Analysis

Yuh‐Jye Lee, Yi-Ren Yeh, Yu-Chiang Frank Wang

2013 · IEEE Transactions on Knowledge and Data Engineering · 227 citations · doi:10.1109/tkde.2012.99

Anomaly detection has been an important research topic in data mining and machine learning. Many real-world applications such as intrusion or credit card fraud detection require an effective and efficient framework to identify deviated data instances. However, most anomaly detection methods are typically implemented in batch mode, and thus cannot be easily extended to large-scale problems without sacrificing computation and memory requirements. In this paper, we propose an online oversampling principal component analysis (osPCA) algorithm to address this problem, and we aim at detecting the presence of outliers from a large amount of data via an online updating technique. Unlike prior principal component analysis (PCA)-based approaches, we do not store the entire data matrix or covariance matrix, and thus our approach is especially of interest in online or large-scale problems. By oversampling the target instance and extracting the principal direction of the data, the proposed osPCA allows us to determine the anomaly of the target instance according to the variation of the resulting dominant eigenvector. Since our osPCA need not perform eigen analysis explicitly, the proposed framework is favored for online applications which have computation or memory limitations. Compared with the well-known power method for PCA and other popular anomaly detection algorithms, our experimental results verify the feasibility of our proposed method in terms of both accuracy and efficiency.
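
The oversampling idea can be illustrated with a naive batch sketch: duplicate the target instance, recompute the principal direction, and score the instance by how far that direction rotates. This is not the paper's online update. It recomputes the covariance from scratch and uses plain power iteration (the method the abstract compares against), and the data, oversampling ratio, and function names are assumptions for illustration.

```python
import math


def principal_direction(points, iterations=200):
    """Dominant eigenvector of the 2-D covariance matrix via power iteration."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    vx, vy = 1.0, 0.5  # arbitrary non-zero start vector
    for _ in range(iterations):
        wx, wy = sxx * vx + sxy * vy, sxy * vx + syy * vy
        norm = math.hypot(wx, wy)
        vx, vy = wx / norm, wy / norm
    return vx, vy


def oversampling_score(points, target, ratio=0.5):
    """1 - |cos| of the angle between principal directions before and after
    adding duplicated copies of the target; larger means more anomalous."""
    copies = max(1, round(ratio * len(points)))
    base = principal_direction(points)
    shifted = principal_direction(points + [target] * copies)
    cosine = abs(base[0] * shifted[0] + base[1] * shifted[1])
    return 1.0 - min(1.0, cosine)


# Invented data: five points spread along the x-axis plus one point far above it.
data = [(-2.0, 0.0), (-1.0, 0.1), (0.0, -0.1), (1.0, 0.1), (2.0, -0.1), (0.0, 3.0)]
outlier_score = oversampling_score(data, (0.0, 3.0))
inlier_score = oversampling_score(data, (1.0, 0.1))
print(outlier_score, inlier_score)
```

Oversampling the isolated point swings the principal direction toward the y-axis, while oversampling a typical point barely moves it, so the isolated point receives the higher score.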

Joint Enhancement and Denoising Method via Sequential Decomposition

Xutong Ren, Mading Li, Wen-Huang Cheng, Jiaying Liu

2018 · 227 citations · doi:10.1109/iscas.2018.8351427

Many low-light enhancement methods ignore intensive noise in original images. As a result, they often simultaneously enhance the noise as well. Furthermore, extra denoising procedures adopted by most methods ruin the details. In this paper, we introduce a joint low-light enhancement and denoising strategy, aimed at obtaining well-enhanced low-light images while getting rid of the inherent noise issue simultaneously. The proposed method performs Retinex model based decomposition in a successive sequence, which sequentially estimates a piece-wise smoothed illumination and a noise-suppressed reflectance. After getting the illumination and reflectance map, we adjust the illumination layer and generate our enhancement result. In this noise-suppressed sequential decomposition process we enforce the spatial smoothness on each component and skillfully make use of weight matrices to suppress the noise and improve the contrast. Results of extensive experiments demonstrate the effectiveness and practicability of our method. It performs well for a wide variety of images, and achieves better or comparable quality compared with the state-of-the-art methods.

Low-rank matrix recovery with structural incoherence for robust face recognition

Chih-Fan Chen, Chia-Po Wei, Yuting Wang

2012 · 223 citations · doi:10.1109/cvpr.2012.6247981

We address the problem of robust face recognition, in which both training and test image data might be corrupted due to occlusion and disguise. From standard face recognition algorithms such as Eigenfaces to recently proposed sparse representation-based classification (SRC) methods, most prior works did not consider possible contamination of data during training, and thus the associated performance might be degraded. Based on the recent success of low-rank matrix recovery, we propose a novel low-rank matrix approximation algorithm with structural incoherence for robust face recognition. Our method not only decomposes raw training data into a set of representative bases with corresponding sparse errors for better modeling of the face images, but also promotes structural incoherence between the bases learned from different classes. These bases are encouraged to be as independent as possible due to the regularization on structural incoherence. We show that this provides additional discriminating ability to the original low-rank models for improved performance. Experimental results on public face databases verify the effectiveness and robustness of our method, which is also shown to outperform state-of-the-art SRC-based approaches.

Raw waveform-based speech enhancement by fully convolutional networks

Szu‐Wei Fu, Yu Tsao, Xugang Lu, Hisashi Kawai

2017 · 213 citations · doi:10.1109/apsipa.2017.8281993

This study proposes a fully convolutional network (FCN) model for raw waveform-based speech enhancement. The proposed system performs speech enhancement in an end-to-end (i.e., waveform-in and waveform-out) manner, which differs from most existing denoising methods that process the magnitude spectrum (e.g., log power spectrum (LPS)) only. Because the fully connected layers, which are involved in deep neural networks (DNN) and convolutional neural networks (CNN), may not accurately characterize the local information of speech signals, particularly with high frequency components, we employed fully convolutional layers to model the waveform. More specifically, FCN consists of only convolutional layers and thus the local temporal structures of speech signals can be efficiently and effectively preserved with relatively few weights. Experimental results show that DNN- and CNN-based models have limited capability to restore high frequency components of waveforms, thus leading to decreased intelligibility of enhanced speech. By contrast, the proposed FCN model can not only effectively recover the waveforms but also outperform the LPS-based DNN baseline in terms of short-time objective intelligibility (STOI) and perceptual evaluation of speech quality (PESQ). In addition, the number of model parameters in FCN is approximately only 0.2% compared with that in both DNN and CNN.

Learning Cross-Domain Landmarks for Heterogeneous Domain Adaptation

Yao-Hung Hubert Tsai, Yi-Ren Yeh, Yu-Chiang Frank Wang

2016 · 201 citations · doi:10.1109/cvpr.2016.549

While domain adaptation (DA) aims to associate the learning tasks across data domains, heterogeneous domain adaptation (HDA) particularly deals with learning from cross-domain data which are of different types of features. In other words, for HDA, data from source and target domains are observed in separate feature spaces and thus exhibit distinct distributions. In this paper, we propose a novel learning algorithm of Cross-Domain Landmark Selection (CDLS) for solving the above task. With the goal of deriving a domain-invariant feature subspace for HDA, our CDLS is able to identify representative cross-domain data, including the unlabeled ones in the target domain, for performing adaptation. In addition, the adaptation capabilities of such cross-domain landmarks can be determined accordingly. This is why our CDLS is able to achieve promising HDA performance when compared to state-of-the-art HDA methods. We conduct classification experiments using data across different features, domains, and modalities, and the results verify the effectiveness of our proposed method.

S1 and S2 Heart Sound Recognition Using Deep Neural Networks

Tien-En Chen, Shih-I Yang, Li‐Ting Ho, Kun-Hsi Tsai +4 more

2016 · IEEE Transactions on Biomedical Engineering · 199 citations · doi:10.1109/tbme.2016.2559800

OBJECTIVE: This study focuses on the first (S1) and second (S2) heart sound recognition based only on acoustic characteristics; the assumptions of the individual durations of S1 and S2 and time intervals of S1-S2 and S2-S1 are not involved in the recognition process. The main objective is to investigate whether reliable S1 and S2 recognition performance can still be attained under situations where the duration and interval information might not be accessible. METHODS: A deep neural network (DNN) method is proposed for recognizing S1 and S2 heart sounds. In the proposed method, heart sound signals are first converted into a sequence of Mel-frequency cepstral coefficients (MFCCs). The K-means algorithm is applied to cluster MFCC features into two groups to refine their representation and discriminative capability. The refined features are then fed to a DNN classifier to perform S1 and S2 recognition. We conducted experiments using actual heart sound signals recorded using an electronic stethoscope. Precision, recall, F-measure, and accuracy are used as the evaluation metrics. RESULTS: The proposed DNN-based method can achieve high precision, recall, and F-measure scores with more than 91% accuracy rate. CONCLUSION: The DNN classifier provides higher evaluation scores compared with other well-known pattern classification methods. SIGNIFICANCE: The proposed DNN-based method can achieve reliable S1 and S2 recognition performance based on acoustic characteristics without using an ECG reference or incorporating the assumptions of the individual durations of S1 and S2 and time intervals of S1-S2 and S2-S1.

TEAM: Trust-Extended Authentication Mechanism for Vehicular Ad Hoc Networks

Ming-Chin Chuang, Jeng-Farn Lee

2013 · IEEE Systems Journal · 193 citations · doi:10.1109/jsyst.2012.2231792

The security of vehicular ad hoc networks (VANETs) has been receiving a significant amount of attention in the field of wireless mobile networking because VANETs are vulnerable to malicious attacks. A number of secure authentication schemes based on asymmetric cryptography have been proposed to prevent such attacks. However, these schemes are not suitable for highly dynamic environments such as VANETs, because they cannot efficiently cope with the authentication procedure. Hence, an efficient authentication scheme for VANETs is still needed. In this paper, we propose a decentralized lightweight authentication scheme called trust-extended authentication mechanism (TEAM) for vehicle-to-vehicle communication networks. TEAM adopts the concept of transitive trust relationships to improve the performance of the authentication procedure and needs only a small amount of storage. Moreover, TEAM satisfies the following security requirements: anonymity, location privacy, mutual authentication, forgery attack resistance, modification attack resistance, replay attack resistance, no clock synchronization problem, no verification table, fast error detection, perfect forward secrecy, man-in-the-middle attack resistance, and session key agreement.
