NobleBlocks

NEC (United States)

Company · Irving, Texas, United States

Research output, citation impact, and the most-cited recent papers from NEC (United States). Aggregated across the NobleBlocks index of 300M+ scholarly works.

Total works: 1.3K
Citations: 113.4K
h-index: 149
i10-index: 1.1K
Also known as: NEC (United States), NEC Corporation of America

Top-cited papers from NEC (United States)

Locality-constrained Linear Coding for image classification
Jinjun Wang, Jianchao Yang, Kai Yu, Fengjun Lv +2 more
2010 · 3.1K citations · doi:10.1109/cvpr.2010.5540018

The traditional SPM approach based on bag-of-features (BoF) requires nonlinear classifiers to achieve good image classification performance. This paper presents a simple but effective coding scheme called Locality-constrained Linear Coding (LLC) in place of the VQ coding in traditional SPM. LLC utilizes the locality constraints to project each descriptor into its local-coordinate system, and the projected coordinates are integrated by max pooling to generate the final representation. With a linear classifier, the proposed approach performs remarkably better than the traditional nonlinear SPM, achieving state-of-the-art performance on several benchmarks. Compared with the sparse coding strategy [22], the objective function used by LLC has an analytical solution. In addition, the paper proposes a fast approximated LLC method by first performing a K-nearest-neighbor search and then solving a constrained least-squares fitting problem, bearing computational complexity of O(M + K^2). Hence even with very large codebooks, our system can still process multiple frames per second. This efficiency significantly adds to the practical value of LLC for real applications.
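The two-step approximation described in this abstract (a K-nearest-neighbor search followed by a constrained least-squares fit) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `solve_linear` and `llc_code`, the sum-to-one constraint handled by normalization, and the small trace-scaled regularizer `reg` are assumptions made here for a self-contained, numerically stable example.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        rest = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - rest) / m[i][i]
    return x


def llc_code(x, codebook, k=2, reg=1e-4):
    """Approximate locality-constrained code for one descriptor x (hypothetical helper)."""
    # Step 1: K-nearest-neighbour search over the codebook entries.
    dists = [sum((xi - bi) ** 2 for xi, bi in zip(x, b)) for b in codebook]
    nearest = sorted(range(len(codebook)), key=lambda i: dists[i])[:k]

    # Step 2: least squares on the K selected bases, subject to sum(code) == 1.
    shifted = [[bi - xi for bi, xi in zip(codebook[i], x)] for i in nearest]
    cov = [[sum(p * q for p, q in zip(u, v)) for v in shifted] for u in shifted]
    trace = sum(cov[i][i] for i in range(k))
    for i in range(k):
        # Assumed regularizer: keeps the K x K system well conditioned.
        cov[i][i] += reg * (trace if trace > 0 else 1.0)
    weights = solve_linear(cov, [1.0] * k)
    total = sum(weights)

    # Entries outside the K neighbours stay exactly zero.
    code = [0.0] * len(codebook)
    for idx, w in zip(nearest, weights):
        code[idx] = w / total
    return code
```

For a descriptor lying between two codebook entries, the code concentrates on those two entries and sums to one; all other entries are zero, which is what makes the resulting representation local.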

Linear spatial pyramid matching using sparse coding for image classification
Jianchao Yang, Kai Yu, Yihong Gong, Thomas S. Huang
2009 · 2009 IEEE Conference on Computer Vision and Pattern Recognition · 2.9K citations · doi:10.1109/cvpr.2009.5206757

Recently SVMs using spatial pyramid matching (SPM) kernel have been highly successful in image classification. Despite their popularity, these nonlinear SVMs have a complexity O(n^2 ∼ n^3) in training and O(n) in testing, where n is the training size, implying that it is nontrivial to scale up the algorithms to handle more than thousands of training images. In this paper we develop an extension of the SPM method, by generalizing vector quantization to sparse coding followed by multi-scale spatial max pooling, and propose a linear SPM kernel based on SIFT sparse codes. This new approach remarkably reduces the complexity of SVMs to O(n) in training and a constant in testing. In a number of image categorization experiments, we find that, in terms of classification accuracy, the suggested linear SPM based on sparse coding of SIFT descriptors always significantly outperforms the linear SPM kernel on histograms, and is even better than the nonlinear SPM kernels, leading to state-of-the-art performance on several benchmarks by using a single type of descriptor.
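To make "multi-scale spatial max pooling" followed by a linear kernel more concrete, here is a small sketch. It assumes the sparse codes have already been computed, that descriptor positions are normalized to [0, 1), and that pooling takes the maximum absolute value per code dimension; the function names and the two-level pyramid are illustrative choices, not details taken from the abstract.

```python
def spatial_pyramid_max_pool(codes, positions, levels=(1, 2)):
    """Max-pool code vectors over a pyramid of grids and concatenate the cells.

    codes: list of code vectors, one per local descriptor.
    positions: list of (x, y) locations normalized to [0, 1).
    levels: grid sizes per pyramid level, e.g. (1, 2) means 1x1 and 2x2.
    """
    dim = len(codes[0])
    pooled = []
    for cells in levels:
        grid = [[0.0] * dim for _ in range(cells * cells)]
        for code, (x, y) in zip(codes, positions):
            col = min(int(x * cells), cells - 1)
            row = min(int(y * cells), cells - 1)
            cell = grid[row * cells + col]
            for j, value in enumerate(code):
                # Keep the strongest response seen in this cell.
                cell[j] = max(cell[j], abs(value))
        for cell in grid:
            pooled.extend(cell)
    return pooled


def linear_kernel(u, v):
    """Plain dot product between two pooled image features."""
    return sum(a * b for a, b in zip(u, v))
```

Because the image-level feature is a fixed-length vector compared with a dot product, a linear SVM can be trained on it directly, which is the source of the complexity reduction the abstract describes.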

A Dual-Stage Attention-Based Recurrent Neural Network for Time Series Prediction
Yao Qin, Dongjin Song, Haifeng Chen, Wei Cheng +2 more
2017 · 1.4K citations · doi:10.24963/ijcai.2017/366

The Nonlinear autoregressive exogenous (NARX) model, which predicts the current value of a time series based upon its previous values as well as the current and past values of multiple driving (exogenous) series, has been studied for decades. Despite the fact that various NARX models have been developed, few of them can capture the long-term temporal dependencies appropriately and select the relevant driving series to make predictions. In this paper, we propose a dual-stage attention-based recurrent neural network (DA-RNN) to address these two issues. In the first stage, we introduce an input attention mechanism to adaptively extract relevant driving series (a.k.a., input features) at each time step by referring to the previous encoder hidden state. In the second stage, we use a temporal attention mechanism to select relevant encoder hidden states across all time steps. With this dual-stage attention scheme, our model can not only make predictions effectively, but can also be easily interpreted. Thorough empirical studies based upon the SML 2010 dataset and the NASDAQ 100 Stock dataset demonstrate that the DA-RNN can outperform state-of-the-art methods for time series prediction.
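The second-stage temporal attention (selecting relevant encoder hidden states across all time steps) can be illustrated with a toy example. This sketch scores each encoder state with a plain dot product against the decoder state, which is a simplifying assumption; the abstract does not specify the scoring function, and the names below are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(decoder_state, encoder_states):
    """Return attention weights over time steps and the resulting context vector."""
    # Assumed scoring rule: dot product between decoder and encoder states.
    scores = [
        sum(d * h for d, h in zip(decoder_state, hidden))
        for hidden in encoder_states
    ]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [
        sum(w * hidden[j] for w, hidden in zip(weights, encoder_states))
        for j in range(dim)
    ]
    return weights, context
```

The weights form a distribution over time steps, so inspecting them shows which past steps contributed most to a prediction; this is the kind of interpretability the abstract refers to.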

ONOS
Pankaj Berde, Matteo Gerola, Jonathan Hart, Yuta Higuchi +4 more
2014 · 1.1K citations · doi:10.1145/2620728.2620744

We present our experiences to date building ONOS (Open Network Operating System), an experimental distributed SDN control platform motivated by the performance, scalability, and availability requirements of large operator networks. We describe and evaluate two ONOS prototypes. The first version implemented core features: a distributed, but logically centralized, global network view; scale-out; and fault tolerance. The second version focused on improving performance. Based on experience with these prototypes, we identify additional steps that will be required for ONOS to support use cases such as core network traffic engineering and scheduling, and to become a usable open source, distributed network OS platform that the SDN community can build upon.

Deep Autoencoding Gaussian Mixture Model for Unsupervised Anomaly Detection
Bo Zong, Song Qi, Martin Renqiang Min, Wei Cheng +3 more
2018 · 1.1K citations

Unsupervised anomaly detection on multi- or high-dimensional data is of great importance in both fundamental machine learning research and industrial applications, for which density estimation lies at the core. Although previous approaches based on dimensionality reduction followed by density estimation have made fruitful progress, they mainly suffer from decoupled model learning with inconsistent optimization goals and incapability of preserving essential information in the low-dimensional space. In this paper, we present a Deep Autoencoding Gaussian Mixture Model (DAGMM) for unsupervised anomaly detection. Our model utilizes a deep autoencoder to generate a low-dimensional representation and reconstruction error for each input data point, which is further fed into a Gaussian Mixture Model (GMM). Instead of using decoupled two-stage training and the standard Expectation-Maximization (EM) algorithm, DAGMM jointly optimizes the parameters of the deep autoencoder and the mixture model simultaneously in an end-to-end fashion, leveraging a separate estimation network to facilitate the parameter learning of the mixture model. The joint optimization, which well balances autoencoding reconstruction, density estimation of latent representation, and regularization, helps the autoencoder escape from less attractive local optima and further reduce reconstruction errors, avoiding the need for pre-training. Experimental results on several public benchmark datasets show that DAGMM significantly outperforms state-of-the-art anomaly detection techniques, and achieves up to 14% improvement based on the standard F1 score.

DESIRE: Distant Future Prediction in Dynamic Scenes with Interacting Agents
Namhoon Lee, Wongun Choi, Paul Vernaza, Christopher Choy +2 more
2017 · 1.1K citations · doi:10.1109/cvpr.2017.233

We introduce a Deep Stochastic IOC RNN Encoder-decoder framework, DESIRE, for the task of future predictions of multiple interacting agents in dynamic scenes. DESIRE effectively predicts future locations of objects in multiple scenes by 1) accounting for the multi-modal nature of the future prediction (i.e., given the same context, the future may vary), 2) foreseeing the potential future outcomes and making a strategic prediction based on that, and 3) reasoning not only from the past motion history, but also from the scene context as well as the interactions among the agents. DESIRE achieves these in a single end-to-end trainable neural network model, while being computationally efficient. The model first obtains a diverse set of hypothetical future prediction samples employing a conditional variational auto-encoder, which are ranked and refined by the following RNN scoring-regression module. Samples are scored by accounting for accumulated future rewards, which enables better long-term strategic decisions similar to IOC frameworks. An RNN scene context fusion module jointly captures past motion histories, the semantic scene context and interactions among multiple agents. A feedback mechanism iterates over the ranking and refinement to further boost the prediction accuracy. We evaluate our model on two publicly available datasets: KITTI and Stanford Drone Dataset. Our experiments show that the proposed model significantly improves the prediction accuracy compared to other baseline methods.

A Deep Neural Network for Unsupervised Anomaly Detection and Diagnosis in Multivariate Time Series Data
Chuxu Zhang, Dongjin Song, Yuncong Chen, Xinyang Feng +4 more
2019 · Proceedings of the AAAI Conference on Artificial Intelligence · 765 citations · doi:10.1609/aaai.v33i01.33011409

Nowadays, multivariate time series data are increasingly collected in various real world systems, e.g., power plants, wearable devices, etc. Anomaly detection and diagnosis in multivariate time series refer to identifying abnormal status in certain time steps and pinpointing the root causes. Building such a system, however, is challenging since it requires not only capturing the temporal dependency in each time series, but also encoding the inter-correlations between different pairs of time series. In addition, the system should be robust to noise and provide operators with different levels of anomaly scores based upon the severity of different incidents. Despite the fact that a number of unsupervised anomaly detection algorithms have been developed, few of them can jointly address these challenges. In this paper, we propose a Multi-Scale Convolutional Recurrent Encoder-Decoder (MSCRED), to perform anomaly detection and diagnosis in multivariate time series data. Specifically, MSCRED first constructs multi-scale (resolution) signature matrices to characterize multiple levels of the system statuses in different time steps. Subsequently, given the signature matrices, a convolutional encoder is employed to encode the inter-sensor (time series) correlations and an attention based Convolutional Long-Short Term Memory (ConvLSTM) network is developed to capture the temporal patterns. Finally, based upon the feature maps which encode the inter-sensor correlations and temporal information, a convolutional decoder is used to reconstruct the input signature matrices and the residual signature matrices are further utilized to detect and diagnose anomalies. Extensive empirical studies based on a synthetic dataset and a real power plant dataset demonstrate that MSCRED can outperform state-of-the-art baseline methods.
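The first step, building multi-scale signature matrices, can be sketched as pairwise inner products between sensor segments over windows of different lengths. The exact definition and normalization are not given in the abstract, so the division by the window length, the window sizes, and the function names below are assumptions made for illustration.

```python
def signature_matrix(series, t, window):
    """Pairwise inner products of the last `window` values of each series at step t.

    series: list of per-sensor value lists, all of the same length.
    Returns an n x n matrix (list of lists), where n is the number of sensors.
    """
    if t - window + 1 < 0:
        raise ValueError("not enough history for this window")
    segments = [s[t - window + 1 : t + 1] for s in series]
    return [
        # Assumed normalization: divide by the window length.
        [sum(a * b for a, b in zip(u, v)) / window for v in segments]
        for u in segments
    ]


def multi_scale_signatures(series, t, windows=(2, 4)):
    """One signature matrix per window length (one per scale)."""
    return [signature_matrix(series, t, w) for w in windows]
```

Short windows react quickly to brief disturbances while long windows capture slower drifts, which is one way to read the "different levels of anomaly scores based upon the severity" mentioned in the abstract.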

Learning efficient object detection models with knowledge distillation
Guobin Chen, Wongun Choi, Yu Xiang, Tony Xiao Han +1 more
2017 · 678 citations

Despite significant accuracy improvement in convolutional neural networks (CNN) based object detectors, they often require prohibitive runtimes to process an image for real-time applications. State-of-the-art models often use very deep networks with a large number of floating point operations. Efforts such as model compression learn compact models with fewer parameters, but with much reduced accuracy. In this work, we propose a new framework to learn compact and fast object detection networks with improved accuracy using knowledge distillation [20] and hint learning [34]. Although knowledge distillation has demonstrated excellent improvements for simpler classification setups, the complexity of detection poses new challenges in the form of regression, region proposals and less voluminous labels. We address this through several innovations such as a weighted cross-entropy loss to address class imbalance, a teacher bounded loss to handle the regression component and adaptation layers to better learn from intermediate teacher distributions. We conduct comprehensive empirical evaluation with different distillation configurations over multiple datasets including PASCAL, KITTI, ILSVRC and MS-COCO. Our results show consistent improvement in accuracy-speed trade-offs for modern multi-class detection models.
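Two of the loss terms named in this abstract can be sketched in a few lines. The forms below are one plausible reading, offered as an assumption rather than the paper's exact definitions: the weighted cross-entropy scales each class's soft-target term by a per-class weight, and the teacher bounded loss penalizes the student's box regression only when it is not better than the teacher's by some margin. All function and parameter names are hypothetical.

```python
import math


def weighted_soft_cross_entropy(student_probs, teacher_probs, class_weights, eps=1e-12):
    """Cross-entropy against teacher soft targets with a weight per class."""
    return -sum(
        w * pt * math.log(max(ps, eps))
        for w, pt, ps in zip(class_weights, teacher_probs, student_probs)
    )


def teacher_bounded_l2(student_box, teacher_box, target_box, margin=0.0):
    """Squared error of the student, applied only while the teacher is still ahead."""
    student_err = sum((s - y) ** 2 for s, y in zip(student_box, target_box))
    teacher_err = sum((t - y) ** 2 for t, y in zip(teacher_box, target_box))
    # Assumed rule: no penalty once the student beats the teacher by `margin`.
    return student_err if student_err + margin > teacher_err else 0.0
```

The bounded form treats the teacher's regression output as an upper bound on acceptable error instead of a target to imitate, which avoids pulling the student toward a teacher prediction that may itself be wrong.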

Exploit All the Layers: Fast and Accurate CNN Object Detector with Scale Dependent Pooling and Cascaded Rejection Classifiers
Fan Yang, Wongun Choi, Yuanqing Lin
2016 · 636 citations · doi:10.1109/cvpr.2016.234

In this paper, we investigate two new strategies to detect objects accurately and efficiently using deep convolutional neural networks: 1) scale-dependent pooling and 2) layerwise cascaded rejection classifiers. The scale-dependent pooling (SDP) improves detection accuracy by exploiting appropriate convolutional features depending on the scale of candidate object proposals. The cascaded rejection classifiers (CRC) effectively utilize convolutional features and eliminate negative object proposals in a cascaded manner, which greatly speeds up the detection while maintaining high accuracy. Combining the two, our method achieves significantly better accuracy compared to other state-of-the-art methods on three challenging datasets, PASCAL object detection challenge, KITTI object detection benchmark and newly collected Inner-city dataset, while being more efficient.

CHEX
Long Lu, Zhichun Li, Zhenyu Wu, Wenke Lee +1 more
2012 · 596 citations · doi:10.1145/2382196.2382223

An enormous number of apps have been developed for Android in recent years, making it one of the most popular mobile operating systems. However, the quality of the booming apps can be a concern [4]. Poorly engineered apps may contain security vulnerabilities that can severely undermine users' security and privacy. In this paper, we study a general category of vulnerabilities found in Android apps, namely the component hijacking vulnerabilities. Several types of previously reported app vulnerabilities, such as permission leakage, unauthorized data access, intent spoofing, etc., belong to this category.

Analysis and characterization of inherent application resilience for approximate computing
Vinay K. Chippa, Srimat Chakradhar, Kaushik Roy, Anand Raghunathan
2013 · 541 citations · doi:10.1145/2463209.2488873

Approximate computing is an emerging design paradigm that enables highly efficient hardware and software implementations by exploiting the inherent resilience of applications to inexactness in their computations. Previous work in this area has demonstrated the potential for significant energy and performance improvements, but largely consists of ad hoc techniques that have been applied to a small number of applications. Taking approximate computing closer to mainstream adoption requires (i) a deeper understanding of inherent application resilience across a broader range of applications, (ii) tools that can quantitatively establish the inherent resilience of an application, and (iii) methods to quickly assess the potential of various approximate computing techniques for a given application. We make two key contributions in this direction. Our primary contribution is the analysis and characterization of inherent application resilience present in a suite of 12 widely used applications from the domains of recognition, data mining, and search. Based on this analysis, we present several new insights into the nature of resilience and its relationship to various key application characteristics. To facilitate our analysis, we propose a systematic framework for Application Resilience Characterization (ARC) that (a) partitions an application into resilient and sensitive parts and (b) characterizes the resilient parts using approximation models that abstract a wide range of approximate computing techniques. We believe that the key insights that we present can help shape further research in the area of approximate computing, while automatic resilience characterization frameworks such as ARC can greatly aid designers in the adoption of approximate computing.

Linear spatial pyramid matching using sparse coding for image classification
Jianchao Yang, Kai Yu, Yihong Gong, T. Huang
2009 · 2009 IEEE Conference on Computer Vision and Pattern Recognition · 499 citations · doi:10.1109/cvprw.2009.5206757

Recently SVMs using spatial pyramid matching (SPM) kernel have been highly successful in image classification. Despite their popularity, these nonlinear SVMs have a complexity O(n^2 ∼ n^3) in training and O(n) in testing, where n is the training size, implying that it is nontrivial to scale up the algorithms to handle more than thousands of training images. In this paper we develop an extension of the SPM method, by generalizing vector quantization to sparse coding followed by multi-scale spatial max pooling, and propose a linear SPM kernel based on SIFT sparse codes. This new approach remarkably reduces the complexity of SVMs to O(n) in training and a constant in testing. In a number of image categorization experiments, we find that, in terms of classification accuracy, the suggested linear SPM based on sparse coding of SIFT descriptors always significantly outperforms the linear SPM kernel on histograms, and is even better than the nonlinear SPM kernels, leading to state-of-the-art performance on several benchmarks by using a single type of descriptor.

SegFlow: Joint Learning for Video Object Segmentation and Optical Flow
Jingchun Cheng, Yi-Hsuan Tsai, Shengjin Wang, Ming-Hsuan Yang
2017 · 466 citations · doi:10.1109/iccv.2017.81

This paper proposes an end-to-end trainable network, SegFlow, for simultaneously predicting pixel-wise object segmentation and optical flow in videos. The proposed SegFlow has two branches where useful information of object segmentation and optical flow is propagated bidirectionally in a unified framework. The segmentation branch is based on a fully convolutional network, which has been proved effective in image segmentation task, and the optical flow branch takes advantage of the FlowNet model. The unified framework is trained iteratively offline to learn a generic notion, and fine-tuned online for specific objects. Extensive experiments on both the video object segmentation and optical flow datasets demonstrate that introducing optical flow improves the performance of segmentation and vice versa, against the state-of-the-art algorithms.

Parallel Support Vector Machines: The Cascade SVM
Hans Peter Graf, Eric Cosatto, Léon Bottou, Igor Dourdanovic +1 more
2004 · 409 citations

We describe an algorithm for support vector machines (SVM) that can be parallelized efficiently and scales to very large problems with hundreds of thousands of training vectors. Instead of analyzing the whole training set in one optimization step, the data are split into subsets and optimized separately with multiple SVMs. The partial results are combined and filtered again in a 'Cascade' of SVMs, until the global optimum is reached. The Cascade SVM can be spread over multiple processors with minimal communication overhead and requires far less memory, since the kernel matrices are much smaller than for a regular SVM. Convergence to the global optimum is guaranteed with multiple passes through the Cascade, but already a single pass provides good generalization. A single pass is 5x to 10x faster than a regular SVM for problems of 100,000 vectors when implemented on a single processor. Parallel implementations on a cluster of 16 processors were tested with over 1 million vectors (2-class problems), converging in a day or two, while a regular SVM never converged in over a week.

Near-Online Multi-target Tracking with Aggregated Local Flow Descriptor
Wongun Choi
2015 · 408 citations · doi:10.1109/iccv.2015.347

In this paper, we tackle two key aspects of multiple target tracking problem: 1) designing an accurate affinity measure to associate detections and 2) implementing an efficient and accurate (near) online multiple target tracking algorithm. As for the first contribution, we introduce a novel Aggregated Local Flow Descriptor (ALFD) that encodes the relative motion pattern between a pair of temporally distant detections using long term interest point trajectories (IPTs). Leveraging on the IPTs, the ALFD provides a robust affinity measure for estimating the likelihood of matching detections regardless of the application scenarios. As for another contribution, we present a Near-Online Multi-target Tracking (NOMT) algorithm. The tracking problem is formulated as a data-association between targets and detections in a temporal window, that is performed repeatedly at every frame. While being efficient, NOMT achieves robustness via integrating multiple cues including ALFD metric, target dynamics, appearance similarity, and long term trajectory regularization into the model. Our ablative analysis verifies the superiority of the ALFD metric over the other conventional affinity metrics. We run a comprehensive experimental evaluation on two challenging tracking datasets, KITTI [16] and MOT [2] datasets. The NOMT method combined with ALFD metric achieves the best accuracy in both datasets with significant margins (about 10% higher MOTA) over the state-of-the-art.

Large-scale image classification: Fast feature extraction and SVM training
Yuanqing Lin, Fengjun Lv, Shenghuo Zhu, Shuicheng Yan +4 more
2011 · 382 citations · doi:10.1109/cvpr.2011.5995477

Most research efforts on image classification so far have been focused on medium-scale datasets, which are often defined as datasets that can fit into the memory of a desktop (typically 4G~48G). There are two main reasons for the limited effort on large-scale image classification. First, until the emergence of the ImageNet dataset, there was almost no publicly available large-scale benchmark data for image classification. This is mostly because class labels are expensive to obtain. Second, large-scale classification is hard because it poses more challenges than its medium-scale counterparts. A key challenge is how to achieve efficiency in both feature extraction and classifier training without compromising performance. This paper shows how we address this challenge using the ImageNet dataset as an example. For feature extraction, we develop a Hadoop scheme that performs feature extraction in parallel using hundreds of mappers. This allows us to extract fairly sophisticated features (with dimensions being hundreds of thousands) on 1.2 million images within one day. For SVM training, we develop a parallel averaging stochastic gradient descent (ASGD) algorithm for training one-against-all 1000-class SVM classifiers. The ASGD algorithm is capable of dealing with terabytes of training data and converges very fast; typically 5 epochs are sufficient. As a result, we achieve state-of-the-art performance on the ImageNet 1000-class classification, i.e., 52.9% in classification accuracy and 71.8% in top 5 hit rate.
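The core idea of averaged SGD for SVM training can be shown on a single binary classifier. The sketch below is serial and tiny, so it does not reproduce the parallel, 1000-class setup the abstract describes; the hinge loss with L2 regularization, the constant learning rate, and all names and default values are assumptions chosen for a runnable illustration.

```python
import random


def train_asgd_svm(samples, labels, epochs=5, lr=0.1, reg=0.01, seed=0):
    """Train a linear SVM with SGD and return the running average of the weights.

    samples: list of feature vectors (append a constant 1.0 for a bias term).
    labels: list of +1 / -1 class labels.
    """
    rng = random.Random(seed)
    dim = len(samples[0])
    w = [0.0] * dim
    w_avg = [0.0] * dim
    steps = 0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = samples[i], labels[i]
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            for j in range(dim):
                # Subgradient of reg/2 * |w|^2 + max(0, 1 - y * w.x).
                grad = reg * w[j] - (y * x[j] if margin < 1 else 0.0)
                w[j] -= lr * grad
            steps += 1
            for j in range(dim):
                # Incremental mean over all iterates seen so far.
                w_avg[j] += (w[j] - w_avg[j]) / steps
    return w_avg


def predict(w, x):
    """Sign of the linear score."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1
```

Returning the averaged weights instead of the last iterate smooths out the noise of individual updates, which is the usual motivation for averaging when only a few passes over the data are affordable.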

Evolutionary spectral clustering by incorporating temporal smoothness
Yün Chi, Xiaodan Song, Dengyong Zhou, Koji Hino +1 more
2007 · 369 citations · doi:10.1145/1281192.1281212

Evolutionary clustering is an emerging research area essential to important applications such as clustering dynamic Web and blog contents and clustering data streams. In evolutionary clustering, a good clustering result should fit the current data well, while simultaneously not deviate too dramatically from the recent history. To fulfill this dual purpose, a measure of temporal smoothness is integrated in the overall measure of clustering quality. In this paper, we propose two frameworks that incorporate temporal smoothness in evolutionary spectral clustering. For both frameworks, we start with intuitions gained from the well-known k-means clustering problem, and then propose and solve corresponding cost functions for the evolutionary spectral clustering problems. Our solutions to the evolutionary spectral clustering problems provide more stable and consistent clustering results that are less sensitive to short-term noises while at the same time are adaptive to long-term cluster drifts. Furthermore, we demonstrate that our methods provide the optimal solutions to the relaxed versions of the corresponding evolutionary k-means clustering problems. Performance experiments over a number of real and synthetic data sets illustrate our evolutionary spectral clustering methods provide more robust clustering results that are not sensitive to noise and can adapt to data drifts.

Supervised translation-invariant sparse coding
Jianchao Yang, Kai Yu, Thomas S. Huang
2010 · 361 citations · doi:10.1109/cvpr.2010.5539958

In this paper, we propose a novel supervised hierarchical sparse coding model based on local image descriptors for classification tasks. The supervised dictionary training is performed via back-projection, by minimizing the training error of classifying the image level features, which are extracted by max pooling over the sparse codes within a spatial pyramid. Such a max pooling procedure across multiple spatial scales offers the model translation invariant properties, similar to the Convolutional Neural Network (CNN). Experiments show that our supervised dictionary improves the performance of the proposed model significantly over the unsupervised dictionary, leading to state-of-the-art performance on diverse image databases. Furthermore, our supervised model targets learning linear features, implying its great potential in handling large scale datasets in real applications.

Development and Validation of a Protein-Based Risk Score for Cardiovascular Outcomes Among Patients With Stable Coronary Heart Disease
Peter Ganz, Bettina Heidecker, Kristian Hveem, Christian Jonasson +4 more
2016 · JAMA · 361 citations · doi:10.1001/jama.2016.5951

IMPORTANCE: Precise stratification of cardiovascular risk in patients with coronary heart disease (CHD) is needed to inform treatment decisions. OBJECTIVE: To derive and validate a score to predict risk of cardiovascular outcomes among patients with CHD, using large-scale analysis of circulating proteins. DESIGN, SETTING, AND PARTICIPANTS: Prospective cohort study of participants with stable CHD. For the derivation cohort (Heart and Soul study), outpatients from San Francisco were enrolled from 2000 through 2002 and followed up through November 2011 (≤11.1 years). For the validation cohort (HUNT3, a Norwegian population-based study), participants were enrolled from 2006 through 2008 and followed up through April 2012 (5.6 years). EXPOSURES: Using modified aptamers, 1130 proteins were measured in plasma samples. MAIN OUTCOMES AND MEASURES: A 9-protein risk score was derived and validated for 4-year probability of myocardial infarction, stroke, heart failure, and all-cause death. Tests, including the C statistic, were used to assess performance of the 9-protein risk score, which was compared with the Framingham secondary event model, refit to the cohorts in this study. Within-person change in the 9-protein risk score was evaluated in the Heart and Soul study from paired samples collected 4.8 years apart. RESULTS: From the derivation cohort, 938 samples were analyzed, participants' median age at enrollment was 67.0 years, and 82% were men. From the validation cohort, 971 samples were analyzed, participants' median age at enrollment was 70.2 years, and 72% were men. In the derivation cohort, C statistics were 0.66 for refit Framingham, 0.74 for 9-protein, and 0.75 for refit Framingham plus 9-protein models. In the validation cohort, C statistics were 0.64 for refit Framingham, 0.70 for 9-protein, and 0.71 for refit Framingham plus 9-protein models. 
Adding the 9-protein risk score to the refit Framingham model increased the C statistic by 0.09 (95% CI, 0.06-0.12) in the derivation cohort, and in the validation cohort, the C statistic was increased by 0.05 (95% CI, 0.02-0.09). Compared with the refit Framingham model, the integrated discrimination index for the 9-protein model was 0.12 (95% CI, 0.08-0.16) in the derivation cohort and 0.08 (95% CI, 0.05-0.10) in the validation cohort. In analysis of paired samples among 139 participants with cardiovascular events after the second sample, absolute within-person annualized risk increased more for the 9-protein model (median, 1.86% [95% CI, 1.15%-2.54%]) than for the refit Framingham model (median, 1.00% [95% CI, 0.87%-1.19%]) (P = .002), while among 375 participants without cardiovascular events, both scores changed less and similarly (P = .30). CONCLUSIONS AND RELEVANCE: Among patients with stable CHD, a risk score based on 9 proteins performed better than the refit Framingham secondary event risk score in predicting cardiovascular events, but still provided only modest discriminative accuracy. Further research is needed to assess whether the score is more accurate in a lower-risk population.
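For readers unfamiliar with the C statistic used throughout this abstract, the following sketch computes a simple binary-outcome version: the fraction of (event, non-event) pairs in which the participant with the event received the higher risk score, counting ties as half. The study itself may use a time-to-event variant, so this is an illustration of the concept only; the function name and the toy data are hypothetical.

```python
def c_statistic(scores, events):
    """Concordance between risk scores and binary outcomes.

    scores: predicted risk per participant (higher means higher risk).
    events: truthy if the participant had the outcome, falsy otherwise.
    """
    concordant = 0.0
    pairs = 0
    for s_event, had_event in zip(scores, events):
        if not had_event:
            continue
        for s_other, other_event in zip(scores, events):
            if other_event:
                continue
            pairs += 1
            if s_event > s_other:
                concordant += 1.0
            elif s_event == s_other:
                concordant += 0.5  # ties count as half
    if pairs == 0:
        raise ValueError("need at least one event and one non-event")
    return concordant / pairs
```

A value of 0.5 corresponds to ranking no better than chance and 1.0 to perfect ranking, which puts the reported values of roughly 0.64 to 0.75 in context as modest discrimination.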

Towards Large-Pose Face Frontalization in the Wild
Xi Yin, Yu Xiang, Kihyuk Sohn, Xiaoming Liu +1 more
2017 · 354 citations · doi:10.1109/iccv.2017.430

Despite recent advances in face recognition using deep learning, severe accuracy drops are observed for large pose variations in unconstrained environments. Learning pose-invariant features is one solution, but needs expensively labeled large-scale data and carefully designed feature learning algorithms. In this work, we focus on frontalizing faces in the wild under various head poses, including extreme profile views. We propose a novel deep 3D Morphable Model (3DMM) conditioned Face Frontalization Generative Adversarial Network (GAN), termed FF-GAN, to generate neutral head pose face images. Our framework differs from both traditional GANs and 3DMM based modeling. Incorporating 3DMM into the GAN structure provides shape and appearance priors for fast convergence with less training data, while also supporting end-to-end training. The 3DMM-conditioned GAN employs not only the discriminator and generator loss but also a new masked symmetry loss to retain visual quality under occlusions, besides an identity loss to recover high frequency information. Experiments on face recognition, landmark localization and 3D reconstruction consistently show the advantage of our frontalization method on faces in the wild datasets.