NEC (China)
Company, Beijing, China
Research output, citation impact, and the most-cited recent papers from NEC (China). Aggregated across the NobleBlocks index of 300M+ scholarly works.
Top-cited papers from NEC (China)
The interpolation, prediction, and feature analysis of fine-grained air quality are three important topics in urban air computing. Solutions to these problems can provide highly useful information for air pollution control and, consequently, have considerable societal and technical impact. Most existing work solves the three problems separately with different models. In this paper, we propose a general and effective approach, called Deep Air Learning (DAL), that solves all three problems in one model. The main idea of DAL is to embed feature selection and semi-supervised learning in different layers of a deep learning network. The proposed approach exploits information in unlabeled spatio-temporal data to improve interpolation and prediction performance, and performs feature selection and association analysis to reveal the features most relevant to variations in air quality. We evaluate our approach with extensive experiments on real data sources obtained in Beijing, China. The experiments show that DAL outperforms peer models from the recent literature on the interpolation, prediction, and feature analysis of fine-grained air quality.
With the rapid growth of social Web applications such as Twitter and online advertisements, understanding short texts is becoming increasingly important. Most traditional text mining techniques are designed to handle long documents. For short text messages, many existing techniques are ineffective because of the sparseness of the text representations. To understand short messages, we observe that it is often possible to find topically related long texts, which can be used as auxiliary data when mining the target short-text data. In this article, we present a novel approach to clustering short text messages via transfer learning from auxiliary long-text data. Although some previous work enhances short-text clustering with related long texts, most of it ignores the semantic and topical inconsistencies between the target and auxiliary data, which hurts clustering performance. To accommodate the possible inconsistency between source and target data, we propose a novel topic model, the Dual Latent Dirichlet Allocation (DLDA) model, which jointly learns two sets of topics on short and long texts and couples the topic parameters to cope with the potential inconsistency between the data sets. Large-scale clustering experiments on both advertisement and Twitter data show that our approach outperforms several state-of-the-art techniques for clustering short text documents.
Pedestrian re-identification is difficult because a person's appearance varies greatly with pose and viewpoint, illumination changes, and occlusions. Spatial alignment is commonly used to address these issues by treating the appearance of different body parts independently. However, a body part can also look different during different phases of an action. In this paper, we consider the temporal alignment problem in addition to the spatial one, and propose a new approach that takes a video of a walking person as input and builds a spatio-temporal appearance representation for pedestrian re-identification. In particular, given a video sequence, we exploit the periodicity of a walking person's motion to generate a spatio-temporal body-action model, which consists of a series of body-action units corresponding to certain action primitives of certain body parts. Fisher vectors are learned and extracted from individual body-action units and concatenated into the final representation of the walking person. Unlike previous spatio-temporal features, which only take local dynamic appearance information into account, our representation aligns the spatio-temporal appearance of a pedestrian globally. Extensive experiments on public datasets show the effectiveness of our approach compared with the state of the art.
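The abstract does not say how the walking period is detected. As a hedged illustration only, assuming a hypothetical per-frame 1-D motion signal (for example, leg-region motion energy), the gait period can be estimated as the lag with the strongest autocorrelation. This is a generic technique, not necessarily the paper's.

```python
import math


def autocorr(signal, lag):
    """Autocorrelation of a 1-D signal at `lag`, normalized by total energy."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    energy = sum(c * c for c in centered)
    if energy == 0:
        return 0.0
    return sum(centered[i] * centered[i + lag] for i in range(n - lag)) / energy


def estimate_period(signal, min_lag=2, max_lag=None):
    """Return the lag (in frames) with the highest autocorrelation.

    The energy normalization shrinks longer lags, so the fundamental
    period usually beats its multiples within the search range.
    """
    if max_lag is None:
        max_lag = len(signal) // 2
    return max(range(min_lag, max_lag + 1), key=lambda lag: autocorr(signal, lag))


# Hypothetical motion signal with a 12-frame gait cycle.
walk = [math.sin(2 * math.pi * t / 12) for t in range(96)]
print(estimate_period(walk))  # 12
```

The estimated period could then be used to cut the sequence into cycles before any body-action units are formed.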
The benefit of device-to-device (D2D) communication hinges on intelligent resource sharing between cellular and D2D users. This letter optimizes resource sharing for D2D communication to better utilize uplink resources in a multi-user cellular system while guaranteeing the quality of normal cellular communications. Despite the nonconvexity of the problem, we provide an analytical characterization of the globally optimal resource sharing strategy, and further propose two suboptimal strategies with lower complexity. Numerical examples demonstrate the superiority of the proposed resource sharing strategies.
Collaborative filtering algorithms attempt to predict a user's interests based on their past feedback. In real-world applications, a user's feedback is often collected continuously over a long period of time, during which the user's interests or an item's popularity commonly change. The underlying recommendation algorithm should therefore adapt to such changes. However, most existing algorithms do not distinguish current from historical data when predicting users' current interests. In this paper, we consider a new problem, online evolutionary collaborative filtering, which tracks user interests over time in order to make timely recommendations. We extended widely used neighborhood-based algorithms by incorporating temporal information and developed an incremental algorithm for updating neighborhood similarities with new data. Experiments on two real-world datasets demonstrated both the improved effectiveness and the efficiency of the proposed approach.
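The incremental-update idea can be sketched as follows, assuming item-item cosine similarity and a hypothetical exponential forgetting factor `decay`. The class name and the decay scheme are illustrative assumptions, not the paper's exact temporal weighting. Each new rating touches only the co-rated items of that user, so similarities never need to be recomputed from scratch.

```python
import math
from collections import defaultdict


class DecayedItemSimilarity:
    """Incrementally maintained item-item cosine similarity with forgetting.

    `decay` is a hypothetical factor applied to all accumulated statistics
    on each call to `advance_time`, so older co-ratings count less.
    Assumes each (user, item) pair is rated at most once.
    """

    def __init__(self, decay=0.9):
        self.decay = decay
        self.dot = defaultdict(float)          # (i, j), i < j -> sum r_ui * r_uj
        self.sq = defaultdict(float)           # i -> sum r_ui ** 2
        self.user_ratings = defaultdict(dict)  # user -> {item: rating}

    def advance_time(self):
        """Age all statistics by one time step."""
        for key in self.dot:
            self.dot[key] *= self.decay
        for key in self.sq:
            self.sq[key] *= self.decay

    def add_rating(self, user, item, rating):
        """Fold one new rating into the sums; cost is O(items rated by user)."""
        for other, r in self.user_ratings[user].items():
            self.dot[tuple(sorted((item, other)))] += rating * r
        self.sq[item] += rating * rating
        self.user_ratings[user][item] = rating

    def similarity(self, i, j):
        denom = math.sqrt(self.sq.get(i, 0.0) * self.sq.get(j, 0.0))
        if denom == 0:
            return 0.0
        return self.dot.get(tuple(sorted((i, j))), 0.0) / denom
```

Because aged statistics shrink, fresh co-ratings dominate the similarity, which is one simple way to favor current over historical data.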
A central goal of collaborative filtering (CF) is to rank items by their utility to individual users in order to make personalized recommendations. Traditionally, this is formulated as a rating prediction problem. However, it is more desirable for CF algorithms to address the ranking problem directly, without an extra rating prediction step. In this paper, we propose the probabilistic latent preference analysis (pLPA) model for ranking prediction, which directly models user preferences over a set of items rather than the rating scores of individual items. From a user's observed ratings, we extract their preferences in the form of pairwise comparisons between items, which are modeled by a mixture distribution based on the Bradley-Terry model. We describe an EM algorithm for fitting the corresponding latent class model as well as a method for predicting the optimal ranking. Experimental results on real-world data sets demonstrated that the proposed method outperforms several existing rating-prediction-based CF algorithms in terms of the ranking measure NDCG.
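The two building blocks named above can be sketched in simplified form: turning ratings into pairwise comparisons, and fitting a Bradley-Terry model, where P(i preferred over j) = s_i / (s_i + s_j). This is a single-class simplification using the standard minorization-maximization (MM) update. pLPA itself uses a mixture over latent classes fitted with EM, which this sketch omits.

```python
from collections import defaultdict
from itertools import combinations


def pairwise_preferences(ratings):
    """Turn one user's {item: rating} into (preferred, other) pairs; ties skipped."""
    pairs = []
    for a, b in combinations(sorted(ratings), 2):
        if ratings[a] > ratings[b]:
            pairs.append((a, b))
        elif ratings[b] > ratings[a]:
            pairs.append((b, a))
    return pairs


def fit_bradley_terry(pairs, iters=200):
    """Single-class Bradley-Terry fit via the standard MM update."""
    items = sorted({x for pair in pairs for x in pair})
    wins = defaultdict(int)
    n_cmp = defaultdict(int)                 # frozenset({i, j}) -> #comparisons
    for winner, loser in pairs:
        wins[winner] += 1
        n_cmp[frozenset((winner, loser))] += 1
    s = {i: 1.0 for i in items}
    for _ in range(iters):
        new = {}
        for i in items:
            denom = 0.0
            for key, n in n_cmp.items():
                if i in key:
                    (j,) = key - {i}
                    denom += n / (s[i] + s[j])
            new[i] = wins[i] / denom
        total = sum(new.values())
        s = {i: v * len(items) / total for i, v in new.items()}  # fix the scale
    return s
```

Ranking items by the fitted scores `s` gives a predicted preference order directly, without an intermediate rating estimate.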
This paper presents the SELC Model (SElf-Supervised, Lexicon-based and Corpus-based Model) for sentiment classification. The SELC Model has two phases. The first phase is a lexicon-based iterative process: some reviews are initially classified using a sentiment dictionary, and then more reviews are classified through an iterative process with negative/positive ratio control. In the second phase, a supervised classifier is trained on some of the reviews classified in the first phase and is then applied to the other reviews to revise the results of the first phase. Experiments show the effectiveness of the proposed model. Overall, SELC achieves an F1-score improvement of 6.63 percentage points over the best previous result on the same data (from 82.72% to 89.35%). The first phase of the SELC Model alone achieves an improvement of 5.90 percentage points (from 82.72% to 88.62%). Moreover, the standard deviation of the F1-scores is reduced, which suggests that the SELC Model may be more suitable for domain-independent sentiment classification.
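The initial step of the first phase can be sketched as follows, assuming a hypothetical lexicon of +1/-1 word polarities and pre-tokenized reviews. Labeling the same number of reviews from each end of the score ranking is a hypothetical stand-in for the paper's negative/positive ratio control. The later iterations and the supervised second phase are omitted.

```python
def lexicon_score(tokens, lexicon):
    """Sum of word polarities (+1 / -1) looked up in a sentiment dictionary."""
    return sum(lexicon.get(t, 0) for t in tokens)


def label_with_ratio_control(reviews, lexicon, per_class):
    """Label the `per_class` most negative and most positive reviews.

    Reviews whose score is 0 (no sentiment evidence) stay unlabeled.
    """
    scored = sorted((lexicon_score(toks, lexicon), idx)
                    for idx, toks in enumerate(reviews))
    labels = {}
    for score, idx in scored[:per_class]:
        if score < 0:
            labels[idx] = "neg"
    for score, idx in scored[max(0, len(scored) - per_class):]:
        if score > 0:
            labels[idx] = "pos"
    return labels


lexicon = {"good": 1, "great": 1, "bad": -1, "awful": -1}   # toy lexicon
reviews = [["good", "great"], ["bad"], ["awful", "bad"], ["okay"], ["good"]]
print(label_with_ratio_control(reviews, lexicon, 1))  # {2: 'neg', 0: 'pos'}
```

Keeping the two classes balanced at each step prevents one polarity from dominating the pseudo-labeled training data passed to the second phase.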
Videos captured under low lighting conditions usually suffer a severe loss of visibility and contrast, which makes them hard to observe and analyze. To solve this problem, this paper presents a real-time night video enhancement approach. Observing that the pixel-wise inversion of a night video looks quite similar to video captured on foggy days, we adapt the idea of haze removal to enhance the perceptual quality of night videos. We present an improved dark channel prior model and integrate it with local smoothing and Gaussian pyramid operators. The experimental results demonstrate that the proposed approach improves the perceptual quality of night videos in real time, both enhancing details and effectively avoiding over-enhancement.
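The invert-dehaze-invert pipeline can be sketched with the basic dark channel prior of He et al. rather than the paper's improved model. The local smoothing and Gaussian pyramid steps are omitted, and `omega`, `t0`, and `radius` are illustrative values. Frames are assumed to be lists of rows of (r, g, b) tuples in [0, 1].

```python
def dark_channel(img, radius=1):
    """Min over RGB channels, then min over a (2*radius+1)^2 window."""
    h, w = len(img), len(img[0])
    per_pixel = [[min(px) for px in row] for row in img]
    return [[min(per_pixel[yy][xx]
                 for yy in range(max(0, y - radius), min(h, y + radius + 1))
                 for xx in range(max(0, x - radius), min(w, x + radius + 1)))
             for x in range(w)]
            for y in range(h)]


def enhance_night_frame(frame, omega=0.8, t0=0.1, radius=1):
    """Invert, dehaze with the basic dark channel prior, invert back."""
    inv = [[tuple(1.0 - c for c in px) for px in row] for row in frame]
    dc = dark_channel(inv, radius)
    h, w = len(dc), len(dc[0])
    # "Atmospheric light": the inverted pixel with the largest dark channel.
    y0, x0 = max(((y, x) for y in range(h) for x in range(w)),
                 key=lambda p: dc[p[0]][p[1]])
    A = [max(c, 1e-6) for c in inv[y0][x0]]
    t = dark_channel([[tuple(c / a for c, a in zip(px, A)) for px in row]
                      for row in inv], radius)
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            trans = max(1.0 - omega * t[y][x], t0)      # transmission estimate
            haze_free = [(c - a) / trans + a for c, a in zip(inv[y][x], A)]
            # Invert back to the night domain and clip to [0, 1].
            row.append(tuple(min(1.0, max(0.0, 1.0 - v)) for v in haze_free))
        out.append(row)
    return out
```

In this toy setting the darkest region (which becomes the "haze" after inversion) stays put while brighter detail is stretched upward. The lower bound `t0` on the transmission keeps the division from amplifying noise without limit.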
We propose a security architecture that provides two fundamental security services for VANETs: (i) non-repudiation and (ii) privacy enhancement. Thanks to a new PKI concept, referred to as PKI+, users can autonomously derive public keys, certificates, and pseudonyms, which minimizes communication with the certificate authority. Security techniques are supported on all layers of the protocol stack. In particular, we show how to link the PKI+ concepts to routing solutions for vehicle-to-vehicle and vehicle-to-infrastructure communication. 1 Vehicular communication. Vehicular Ad Hoc Networks (VANETs) currently seem to be one of the most market-relevant civilian applications of ad hoc networks. Nearly 50 applications based on Dedicated Short Range Communication (DSRC) technology have been submitted by the major car manufacturers BMW, Daimler-Chrysler, Ford, and GM [1]. The applications are roughly classified into public safety and private applications. Public safety applications
Most collaborative filtering algorithms are based on statistical models of user interests built from either explicit feedback (e.g., ratings, votes) or implicit feedback (e.g., clicks, purchases). Explicit feedback is more precise but harder to collect from users, whereas implicit feedback is much easier to collect but less accurate in reflecting user preferences. Because of their heterogeneous representations, the existing literature has developed separate models for each of these two forms of feedback. However, in most real-world recommender systems, both explicit and implicit feedback are abundant and could complement each other. It is desirable to unify these two heterogeneous forms of user feedback in order to generate more accurate recommendations. In this work, we developed matrix factorization models that can be trained on explicit and implicit feedback simultaneously. Experimental results on multiple datasets showed that our algorithm can effectively combine these two forms of user feedback to improve recommendation quality.
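One way to train a single factorization on both feedback types is sketched below, assuming shared user and item factors, a squared loss on explicit ratings, and a logistic loss on binary implicit labels down-weighted by a hypothetical weight `alpha`. This combined objective is an illustrative assumption, not the paper's exact model.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sgd_step(pu, qi, err, lr, reg):
    """Update one user vector and one item vector in place (simultaneously)."""
    for f in range(len(pu)):
        pf, qf = pu[f], qi[f]
        pu[f] += lr * (err * qf - reg * pf)
        qi[f] += lr * (err * pf - reg * qf)


def train_joint_mf(explicit, implicit, n_users, n_items, k=2, lr=0.05,
                   reg=0.01, alpha=0.5, epochs=500, seed=0):
    """Shared user/item factors trained on two losses.

    explicit: (user, item, rating) triples; squared loss on mu + p_u . q_i
    implicit: (user, item, label) triples, label 1 (e.g. click) or 0;
              logistic loss on sigmoid(p_u . q_i), weighted by alpha
    """
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    mu = sum(r for _, _, r in explicit) / len(explicit)   # global rating mean
    for _ in range(epochs):
        for u, i, r in explicit:
            sgd_step(P[u], Q[i], r - (mu + dot(P[u], Q[i])), lr, reg)
        for u, i, label in implicit:
            sgd_step(P[u], Q[i], alpha * (label - sigmoid(dot(P[u], Q[i]))), lr, reg)
    return P, Q, mu
```

Sharing `P` and `Q` is what lets the abundant implicit signal shape the same latent space that produces rating predictions. The two losses only differ in how a dot product is turned into a prediction.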
Denial of Service (DoS) attacks occur frequently on the Internet, paralyzing Internet services and causing millions of dollars in financial losses. This work presents NetFence, a scalable DoS-resistant network architecture. NetFence uses a novel mechanism, secure congestion policing feedback, to enable robust congestion policing inside the network. Bottleneck routers update the feedback in packet headers to signal congestion, and access routers use it to police senders' traffic. Targeted DoS victims can use the secure congestion policing feedback as capability tokens to suppress unwanted traffic. When compromised senders and receivers organize into pairs to congest a network link, NetFence provably guarantees a legitimate sender its fair share of network resources without keeping per-host state at the congested link. We use a Linux implementation, ns-2 simulations, and theoretical analysis to show that NetFence is an effective and scalable DoS solution: it reduces the amount of state maintained by a congested router from per-host to at most per-Autonomous System (per-AS).
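The abstract describes access routers policing senders with feedback that hosts cannot forge. The sketch below combines two generic ingredients, each an explicit assumption. The first is an HMAC tag over the feedback, with a hypothetical message layout and key handling. The second is an additive-increase/multiplicative-decrease (AIMD) rate limiter with hypothetical constants. It is not NetFence's actual header format or policing algorithm.

```python
import hashlib
import hmac


def tag_feedback(key, feedback, src, dst):
    """Hypothetical MAC over (feedback, src, dst) so end hosts cannot forge it."""
    msg = f"{feedback}|{src}|{dst}".encode()
    return hmac.new(key, msg, hashlib.sha256).hexdigest()


class AimdPolicer:
    """Per-sender rate limit (bytes/s) adjusted by AIMD on verified feedback."""

    def __init__(self, key, rate=1_000_000.0, increase=50_000.0,
                 decrease=0.5, min_rate=10_000.0):
        self.key = key
        self.rate = rate
        self.increase = increase    # additive step, hypothetical
        self.decrease = decrease    # multiplicative factor, hypothetical
        self.min_rate = min_rate

    def on_feedback(self, feedback, src, dst, tag):
        expected = tag_feedback(self.key, feedback, src, dst)
        if not hmac.compare_digest(tag, expected):
            return self.rate                      # forged or altered: ignore
        if feedback == "congested":
            self.rate = max(self.min_rate, self.rate * self.decrease)
        else:
            self.rate += self.increase
        return self.rate
```

`hmac.compare_digest` is used instead of `==` to avoid leaking timing information when tags are compared.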
Traditional sentiment analysis mainly considers binary classification of reviews, but in many real-world sentiment classification problems, non-binary review ratings are more useful. This is especially true when consumers wish to compare two products, neither of which is negative. Previous work has addressed this problem by extracting various features from the review text to learn a predictor. Since the same word may have different sentiment effects when used by different reviewers on different products, we argue that it is necessary to model such reviewer- and product-dependent effects in order to predict review ratings more accurately. In this paper, we propose a novel learning framework that incorporates reviewer and product information into a text-based learner for rating prediction. The reviewer, product, and text features are modeled as a three-dimensional tensor, and tensor factorization techniques are then employed to alleviate data sparsity. Extensive experiments demonstrate the effectiveness of our model, which significantly improves on state-of-the-art methods, especially for reviews of unpopular products and by inactive reviewers.
Achieving efficient and fair bandwidth allocation while minimizing packet loss and bottleneck queue length in high bandwidth-delay product networks has long been a daunting challenge. Existing end-to-end congestion control (e.g., TCP) and traditional congestion notification schemes (e.g., TCP+AQM/ECN) have significant limitations in achieving this goal. While the XCP protocol addresses this challenge, it requires multiple bits to encode the congestion-related information exchanged between routers and end-hosts. Unfortunately, there is no space in the IP header for these bits, and solving this problem involves a non-trivial and time-consuming standardization process. In this paper, we design and implement a simple, low-complexity protocol, called the variable-structure congestion control protocol (VCP), that uses only the two existing ECN bits for network congestion feedback and yet achieves performance comparable to XCP: high utilization, negligible packet loss rate, low persistent queue length, and reasonable fairness. On the downside, VCP converges to a fair allocation significantly more slowly than XCP. We evaluate the performance of VCP using extensive ns2 simulations over a wide range of network scenarios and find that it significantly outperforms many recently proposed TCP variants, such as HSTCP, FAST, and CUBIC. To gain insight into the behavior of VCP, we analyze a simplified fluid model and prove its global stability for the case of a single bottleneck shared by synchronous flows with identical round-trip times.
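The core idea of fitting congestion feedback into two ECN bits can be sketched as follows. The sketch assumes that the router quantizes a measured load factor into three regions (low load, high load, overload). It also assumes the sender reacts with multiplicative increase, additive increase, or multiplicative decrease, respectively. The 80% threshold, the simplified load-factor formula, and the constants `xi`, `alpha`, and `beta` are illustrative assumptions. The mapping of regions to ECN codepoints is omitted.

```python
from enum import Enum


class Load(Enum):
    LOW = "low-load"
    HIGH = "high-load"
    OVERLOAD = "overload"


def quantize_load(arrival_bytes, queue_bytes, capacity_bytes, target=0.8):
    """Router side: map a measured load factor to one of three regions.

    rho = (bytes arrived in the interval + standing queue) / bytes the link
    can carry in that interval, a simplification of VCP's load factor.
    """
    rho = (arrival_bytes + queue_bytes) / capacity_bytes
    if rho < target:
        return Load.LOW
    if rho < 1.0:
        return Load.HIGH
    return Load.OVERLOAD


def update_cwnd(cwnd, load, xi=0.0625, alpha=1.0, beta=0.875):
    """Sender side: MI in low load, AI in high load, MD in overload."""
    if load is Load.LOW:
        return cwnd * (1.0 + xi)
    if load is Load.HIGH:
        return cwnd + alpha
    return max(1.0, cwnd * beta)
```

Three regions plus a "not ECN-capable" state fit exactly into the four values two bits can encode. This is why this style of design needs no new IP header space.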
This paper considers device-to-device (D2D) communications underlaying cellular networks with a multi-antenna base station (BS). The BS serves its own cellular users while allowing another remote terminal to transmit signals directly to its nearby receiver via a D2D link. Two transmit strategies at the BS, beamforming (BF) and interference cancellation (IC), are evaluated in terms of achievable channel capacity. Closed-form capacity expressions are derived for two cases: perfect and quantized channel knowledge at the transmitter. Based on these results, an adaptive transmission scheme that switches between BF and IC is proposed. Numerical results verify the accuracy of the derived expressions and delineate the operating regions of the BF and IC strategies.
Most existing collaborative filtering models only consider user feedback (e.g., ratings) and metadata (e.g., content, demographics). However, in most real-world recommender systems, context information such as time and social networks is also an important factor that could be considered in order to produce more accurate recommendations. In this work, we address several challenges for the context-aware movie recommendation tasks in CAMRa 2010: (1) how to combine multiple heterogeneous forms of user feedback; (2) how to cope with dynamic user and item characteristics; and (3) how to capture and use social connections among users. For the first challenge, we propose a novel ranking-based matrix factorization model to aggregate explicit and implicit user feedback. For the second, we extend this model to a sequential matrix factorization model that enables time-aware parametrization. Finally, we introduce a network regularization function that constrains user parameters based on social connections. To the best of our knowledge, this is the first study to investigate the collective modeling of social and temporal dynamics. Experiments on the CAMRa 2010 dataset demonstrated clear improvements over many baselines.
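A common form of network regularization adds a penalty that pulls connected users' factor vectors together, shown here as a hedged sketch: (beta/2) times the sum over friend pairs (i, f) of ||u_i - u_f||^2. The exact form used in this work (for example, any edge weighting or normalization) is not given in the abstract. `beta` and the function names are hypothetical.

```python
def social_penalty(U, edges, beta):
    """beta/2 * sum over friend pairs (i, f) of ||U[i] - U[f]||^2."""
    return 0.5 * beta * sum(
        sum((a - b) ** 2 for a, b in zip(U[i], U[f])) for i, f in edges)


def social_penalty_grad(U, edges, beta):
    """Gradient of social_penalty with respect to every user vector."""
    grad = [[0.0] * len(u) for u in U]
    for i, f in edges:
        for d, (a, b) in enumerate(zip(U[i], U[f])):
            grad[i][d] += beta * (a - b)
            grad[f][d] += beta * (b - a)
    return grad
```

In a full model, this gradient would be added to the rating-loss gradient of each user vector before the SGD update, so users without many ratings borrow strength from their friends.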
Analyzing time series data can reveal the temporal behavior of the underlying mechanism producing the data. Time series motifs, which are similar subsequences or frequently occurring patterns, are highly significant for researchers, especially in the medical domain. With the fast growth of time series data, traditional motif discovery methods are inefficient and not applicable to large-scale data. This work proposes an efficient Motif Discovery method for Large-scale time series (MDLats). By computing standard motifs, MDLats eliminates most of the redundant computation found in related work and reuses existing information as much as possible. All motif types and subsequences are generated for subsequent analysis and classification. Our system is implemented on a Hadoop platform and deployed in a hospital for clinical electrocardiography classification. Experiments on real-world healthcare data show that MDLats outperforms state-of-the-art methods, even on large time series.
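To make "motif" concrete, here is the brute-force baseline that scalable methods aim to avoid. It finds the closest pair of non-overlapping length-`m` subsequences under z-normalized Euclidean distance in O(n^2 m) time. This is the textbook definition of a 1-motif, not MDLats's algorithm.

```python
import math


def znorm(seq):
    """Z-normalize a subsequence; constant subsequences map to all zeros."""
    n = len(seq)
    mean = sum(seq) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / n)
    if std == 0:
        return [0.0] * n
    return [(x - mean) / std for x in seq]


def top_motif(series, m):
    """Return (distance, i, j) for the closest non-overlapping pair."""
    subs = [znorm(series[i:i + m]) for i in range(len(series) - m + 1)]
    best = (math.inf, None, None)
    for i in range(len(subs)):
        for j in range(i + m, len(subs)):       # skip trivial (overlapping) matches
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(subs[i], subs[j])))
            if d < best[0]:
                best = (d, i, j)
    return best


series = [4, 9, 1, 7, 3, 8, 2, 6, 5, 1, 7, 3, 8, 0]
print(top_motif(series, 4))  # (0.0, 2, 9): the repeated pattern [1, 7, 3, 8]
```

The quadratic number of pairwise distance computations is exactly the redundancy that grows unmanageable on long clinical recordings.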
In this paper, we address a resource allocation problem in which a pair of device-to-device (D2D) terminals is integrated into a time division duplex (TDD) cellular network. By introducing an incremental relay transmission scheme for D2D communication, the D2D transmitter, traditionally regarded as a source of interference, is coordinated with the other cellular user equipments (CUEs) in the uplink session. As a result, both the D2D receiver and the central base station (CBS) are able to decode the message sent by the D2D transmitter. In the following downlink session, the CBS may forward this message to the D2D receiver if the direct D2D link is in outage. We formulate and solve the cell throughput maximization problem for three transmission modes: cellular, underlay, and incremental relay. Simulation results show that the proposed incremental relay scheme not only achieves higher spectral efficiency but also provides more reliable D2D transmission than the cellular relay and underlay schemes.
Multicast benefits group communications in data centers by saving network bandwidth and increasing application throughput. However, it is challenging to scale Multicast to tens of thousands of concurrent group communications because of the limited forwarding table memory in switches, particularly the low-end ones commonly used in modern data centers. Bloom Filters are an efficient tool for compressing the Multicast forwarding table, but significant traffic leakage may occur when a group membership test returns a false positive. To reduce Multicast traffic leakage, in this paper we propose a novel multi-class Bloom Filter (MBF), which extends the standard Bloom Filter by embracing element uncertainty. Specifically, MBF sets the number of hash functions at a per-element level, based on the probability that each Multicast group is inserted into the Bloom Filter. We design a simple yet effective algorithm to calculate the number of hash functions for each Multicast group. We have prototyped a software-based MBF forwarding engine on the Linux platform. Simulation and prototype evaluation results demonstrate that MBF can significantly reduce Multicast traffic leakage compared with the standard Bloom Filter, while incurring little system overhead.
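The per-element hash count can be sketched with a Bloom Filter in which the caller supplies `k` for each element, using the same `k` when inserting and when testing that element. The `hash_count` rule below is a hypothetical heuristic, not the paper's algorithm: it gives groups that are unlikely to be inserted more hash functions, which lowers their false-positive rate. Salting SHA-256 with the hash index is likewise an illustrative choice.

```python
import hashlib


class MultiClassBloomFilter:
    """Bloom filter in which each element uses its own number of hashes k."""

    def __init__(self, m_bits):
        self.m = m_bits
        self.bits = bytearray(m_bits)      # one byte per bit, for clarity

    def _positions(self, element, k):
        for seed in range(k):
            digest = hashlib.sha256(f"{seed}:{element}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, element, k):
        for pos in self._positions(element, k):
            self.bits[pos] = 1

    def might_contain(self, element, k):
        return all(self.bits[pos] for pos in self._positions(element, k))


def hash_count(p_insert, k_min=1, k_max=8):
    """Hypothetical rule: a lower insertion probability means more hashes."""
    return max(k_min, min(k_max, round(k_min + (1 - p_insert) * (k_max - k_min))))
```

As with any Bloom Filter, there are no false negatives: a group that was added with its `k` always tests positive with the same `k`. Only the false-positive rate, and hence the traffic leakage, varies per group.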
Optical data center networks (DCNs) are becoming increasingly attractive because of their technological strengths compared with traditional electrical networks. However, prior optical DCNs are hard to scale, vulnerable to single points of failure, or provide limited network bisection bandwidth for many practical DCN workloads. To this end, we present WaveCube, a scalable, fault-tolerant, high-performance optical DCN architecture. To scale, WaveCube removes MEMS, a potential bottleneck, from its design. WaveCube is fault-tolerant because it has no single point of failure and there are multiple node-disjoint parallel paths between any pair of Top-of-Rack (ToR) switches. WaveCube delivers high performance by exploiting multi-pathing and dynamic link bandwidth along each path. Our extensive evaluation results show that WaveCube outperforms previous optical DCNs by up to 400% and delivers network bisection bandwidth that is 70%–85% of an ideal non-blocking network under both realistic and synthetic traffic patterns. WaveCube's performance degrades gracefully under failures: it drops by 20% with 20% of links cut. WaveCube also holds promise in practice: at large scale, its wiring complexity is orders of magnitude lower than that of Fattree, BCube, and c-Through, and its power consumption is 35% of theirs.
In pervasive computing, localizing a user in wireless indoor environments is an important yet challenging task. Among state-of-the-art localization methods, fingerprinting has been shown to be quite successful by statistically learning the relationship between signals and locations. However, a major drawback of fingerprinting is that it usually requires a large amount of labeled data to train an accurate localization model. To build a fingerprinting-based localization model for a building with many floors, sufficient labeled data must be collected on each floor, which can be very burdensome. In this paper, we study how to reduce this calibration effort by collecting labeled data on only one floor while collecting unlabeled data on the other floors. Our idea is inspired by the observation that, although the wireless signals can be quite different, the floor plans in a building are similar. Therefore, if we co-embed the data from different floors in a common low-dimensional manifold, we can align the unlabeled data with the labeled data well enough to propagate the labels to the unlabeled data. We conduct empirical evaluations on real-world multi-floor data sets to validate our proposed method.
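The final "propagate the labels" step can be illustrated with classic iterative label propagation over a Gaussian-weighted graph, with labeled points clamped. This sketch assumes the points already share one low-dimensional space, which is the role the paper's manifold co-embedding plays. It omits the co-embedding itself, and `sigma` and `iters` are illustrative. Class indices stand in for location labels.

```python
import math


def gaussian_weight(x, y, sigma):
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2 * sigma ** 2))


def propagate_labels(points, labels, n_classes, sigma=1.0, iters=100):
    """Iterative label propagation with labeled points clamped.

    labels: a class index for labeled points, None for unlabeled ones.
    """
    n = len(points)
    W = [[0.0 if i == j else gaussian_weight(points[i], points[j], sigma)
          for j in range(n)] for i in range(n)]
    uniform = [1.0 / n_classes] * n_classes
    F = [[float(c == labels[i]) for c in range(n_classes)]
         if labels[i] is not None else uniform for i in range(n)]
    for _ in range(iters):
        new_F = []
        for i in range(n):
            total = sum(W[i])
            if labels[i] is not None or total == 0:
                new_F.append(F[i])                 # clamp labeled / isolated nodes
                continue
            new_F.append([sum(W[i][j] * F[j][c] for j in range(n)) / total
                          for c in range(n_classes)])
        F = new_F
    return [max(range(n_classes), key=lambda c: F[i][c]) for i in range(n)]


points = [[0.0], [0.2], [5.0], [5.2], [0.1], [5.1]]
labels = [0, None, 1, None, None, None]
print(propagate_labels(points, labels, 2))  # [0, 0, 1, 1, 0, 1]
```

Each unlabeled point repeatedly takes the weighted average of its neighbors' class distributions, so labels flow along dense regions of the shared space and rarely jump across gaps.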