Institute of Scientific and Technical Information of China
Government · Beijing, China
Research output, citation impact, and the most-cited recent papers from Institute of Scientific and Technical Information of China (China). Aggregated across the NobleBlocks index of 300M+ scholarly works.
Top-cited papers from Institute of Scientific and Technical Information of China
The increasing pervasiveness of location-acquisition technologies (GPS, GSM networks, etc.) is leading to the collection of large spatio-temporal datasets and to the opportunity of discovering usable knowledge about movement behaviour, which fosters novel applications and services. In this paper, we move towards this direction and develop an extension of the sequential pattern mining paradigm that analyzes the trajectories of moving objects. We introduce trajectory patterns as concise descriptions of frequent behaviours, in terms of both space (i.e., the regions of space visited during movements) and time (i.e., the duration of movements). In this setting, we provide a general formal statement of the novel mining problem and then study several different instantiations of different complexity. The various approaches are then empirically evaluated over real data and synthetic benchmarks, comparing their strengths and weaknesses.
ICVL 2011 Conference paper
The automatic extraction of chemical information from text requires the recognition of chemical entity mentions as one of its key steps. When developing supervised named entity recognition (NER) systems, the availability of a large, manually annotated text corpus is desirable. Furthermore, large corpora permit the robust evaluation and comparison of different approaches that detect chemicals in documents. We present the CHEMDNER corpus, a collection of 10,000 PubMed abstracts that contain a total of 84,355 chemical entity mentions labeled manually by expert chemistry literature curators, following annotation guidelines specifically defined for this task. The abstracts of the CHEMDNER corpus were selected to be representative for all major chemical disciplines. Each of the chemical entity mentions was manually labeled according to its structure-associated chemical entity mention (SACEM) class: abbreviation, family, formula, identifier, multiple, systematic and trivial. The difficulty and consistency of tagging chemicals in text was measured using an agreement study between annotators, obtaining a percentage agreement of 91. For a subset of the CHEMDNER corpus (the test set of 3,000 abstracts) we provide not only the Gold Standard manual annotations, but also mentions automatically detected by the 26 teams that participated in the BioCreative IV CHEMDNER chemical mention recognition task. In addition, we release the CHEMDNER silver standard corpus of automatically extracted mentions from 17,000 randomly selected PubMed abstracts. A version of the CHEMDNER corpus in the BioC format has been generated as well. We propose a standard for required minimum information about entity annotations for the construction of domain specific corpora on chemical and drug entities. The CHEMDNER corpus and annotation guidelines are available at: http://www.biocreative.org/resources/biocreative-iv/chemdner-corpus/.
The rise of sophisticated machine learning models has brought accurate but obscure decision systems, which hide their logic, thus undermining transparency, trust, and the adoption of artificial intelligence (AI) in socially sensitive and safety-critical contexts. We introduce a local rule-based explanation method, providing faithful explanations of the decision made by a black box classifier on a specific instance. The proposed method first learns an interpretable, local classifier on a synthetic neighborhood of the instance under investigation, generated by a genetic algorithm. Then, it derives from the interpretable classifier an explanation consisting of a decision rule, explaining the factual reasons of the decision, and a set of counterfactuals, suggesting the changes in the instance features that would lead to a different outcome. Experimental results show that the proposed method outperforms existing approaches in terms of the quality of the explanations and of the accuracy in mimicking the black box.
Text classification is the task of assigning predefined categories to natural language documents, and it can provide conceptual views of document collections. The Naïve Bayes (NB) classifier is a family of simple probabilistic classifiers based on a common assumption that all features are independent of each other, given the category variable, and it is often used as the baseline in text classification. However, classical NB classifiers with multinomial, Bernoulli and Gaussian event models are not fully Bayesian. This study proposes three Bayesian counterparts, and it turns out that the classical NB classifier with the Bernoulli event model is equivalent to its Bayesian counterpart. Finally, experimental results on the 20 Newsgroups and WebKB data sets show that the performance of the Bayesian NB classifier with the multinomial event model is similar to that of its classical counterpart, while the Bayesian NB classifier with the Gaussian event model is clearly better than its classical counterpart.
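For orientation, the classical baseline that this abstract starts from can be sketched in a few lines. The snippet below is a minimal multinomial NB with add-alpha smoothing, written under our own assumptions: the function names, the toy documents and the smoothing constant are illustrative, and it is not the Bayesian counterpart that the study proposes.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit a classical multinomial NB; docs is a list of token lists."""
    vocab = {tok for doc in docs for tok in doc}
    doc_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc)
    model = {}
    for label, n_docs in doc_counts.items():
        # Denominator of the smoothed per-class word distribution.
        total = sum(word_counts[label].values()) + alpha * len(vocab)
        model[label] = {
            "log_prior": math.log(n_docs / len(docs)),
            "log_lik": {
                w: math.log((word_counts[label][w] + alpha) / total)
                for w in vocab
            },
        }
    return model


def predict(model, doc):
    """Return the label with the highest log posterior (unknown words are skipped)."""
    def score(label):
        params = model[label]
        return params["log_prior"] + sum(
            params["log_lik"][w] for w in doc if w in params["log_lik"]
        )
    return max(model, key=score)


# Toy data, purely illustrative.
docs = [
    ["goal", "match", "team"],
    ["election", "vote", "party"],
    ["team", "win", "goal"],
    ["vote", "law", "party"],
]
labels = ["sport", "politics", "sport", "politics"]
model = train_multinomial_nb(docs, labels)
```

Calling `predict(model, ["goal", "team"])` on this toy model selects the `sport` class, because both tokens occur only in the sport documents.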
Data security has consistently been a major issue in information technology. In the cloud computing environment it becomes particularly serious because the data is located in different places, potentially all over the globe. Data security and privacy protection are the two main user concerns about cloud technology. Although many techniques on these topics have been investigated in both academia and industry, data security and privacy protection are becoming more important for the future development of cloud computing technology in government, industry, and business. Data security and privacy protection issues are relevant to both hardware and software in the cloud architecture. This study reviews different security techniques and challenges from both software and hardware aspects for protecting data in the cloud, and aims at enhancing data security and privacy protection for a trustworthy cloud environment. In this paper, we present a comparative analysis of existing research on the data security and privacy protection techniques used in cloud computing.
A highly flexible porous ionic membrane (PIM) is fabricated from a polyvinyl alcohol/KOH polymer gel electrolyte, showing a well-defined 3D porous structure. The conductance of the PIM changes by more than 70 times as the relative humidity (RH) increases from 10.89% to 81.75%, with fast and reversible response at room temperature. In addition, the PIM-based sensor is insensitive to temperature (0-95 °C) and pressure (0-6.8 kPa) change, which indicates that it can be used as a highly selective flexible humidity sensor. A noncontact switch system containing a PIM-based sensor is assembled, and results show that the switch responds favorably to RH change caused by an approaching finger. Moreover, an attachable smart label using a PIM-based sensor is explored to measure the water content of human skin, which shows a strong linear relationship between the sensitivity of the sensor and the facial water content measured by a commercial reference device.
Keywords are a subset of words or phrases from a document that can describe the meaning of the document. Many text mining applications can take advantage of them. Unfortunately, a large portion of documents still do not have keywords assigned. Moreover, manual assignment of high-quality keywords is expensive, time-consuming, and error-prone. Therefore, many algorithms and systems aimed at helping people perform automatic keyword extraction have been proposed. The Conditional Random Fields (CRF) model is a state-of-the-art sequence labeling method that can use the features of documents more fully and effectively. At the same time, keyword extraction can be treated as a string labeling task. In this paper, keyword extraction based on CRF is proposed and implemented. As far as we know, using the CRF model for keyword extraction has not been investigated previously. Experimental results show that the CRF model outperforms other machine learning methods, such as support vector machines and multiple linear regression models, in the task of keyword extraction.
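To make the "keyword extraction as labeling" idea concrete, the sketch below turns a tokenized sentence and its known keyphrases into per-token labels, which is the kind of training target a sequence labeler such as a CRF consumes. The BIO label scheme, the function name and the example sentence are our assumptions; the abstract does not state which label set or features the authors used, and no CRF is trained here.

```python
def bio_labels(tokens, keyphrases):
    """Tag each token as B-KEY / I-KEY / O given a list of keyphrases.

    Matching is case-insensitive and exact on whole tokens; overlapping
    matches keep the labels assigned first.
    """
    labels = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    for phrase in keyphrases:
        words = phrase.lower().split()
        n = len(words)
        if n == 0:
            continue
        for i in range(len(tokens) - n + 1):
            window_free = all(lab == "O" for lab in labels[i:i + n])
            if lowered[i:i + n] == words and window_free:
                labels[i] = "B-KEY"
                for j in range(i + 1, i + n):
                    labels[j] = "I-KEY"
    return labels


tokens = "Conditional random fields label each token".split()
tags = bio_labels(tokens, ["conditional random fields"])
```

Here `tags` marks the first three tokens as one keyphrase (`B-KEY`, `I-KEY`, `I-KEY`) and the remaining tokens as `O`. A trained labeler would predict such tags for unseen documents, and contiguous `B-KEY`/`I-KEY` runs would be read back as extracted keywords.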
In this paper we study the trade-offs in designing efficient caching systems for Web search engines. We explore the impact of different approaches, such as static vs. dynamic caching, and caching query results vs. caching posting lists. Using a query log spanning a whole year we explore the limitations of caching and we demonstrate that caching posting lists can achieve higher hit rates than caching query answers. We propose a new algorithm for static caching of posting lists, which outperforms previous methods. We also study the problem of finding the optimal way to split the static cache between answers and posting lists. Finally, we measure how the changes in the query log affect the effectiveness of static caching, given our observation that the distribution of the queries changes slowly over time. Our results and observations are applicable to different levels of the data-access hierarchy, for instance, for a memory/disk layer or a broker/remote server layer.
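The abstract does not spell out the static caching algorithm, so the following is only a sketch of one plausible greedy heuristic: rank terms by how often they are queried per unit of posting-list size, then fill the cache in that order. The ranking rule, the function names and the toy statistics are all assumptions made for illustration.

```python
def select_static_cache(terms, capacity):
    """Greedily choose posting lists to cache.

    terms maps term -> (query_frequency, posting_list_size), with sizes > 0.
    Terms are ranked by frequency per unit of size and added while they fit.
    """
    ranked = sorted(terms, key=lambda t: terms[t][0] / terms[t][1], reverse=True)
    cached, used = [], 0
    for term in ranked:
        size = terms[term][1]
        if used + size <= capacity:
            cached.append(term)
            used += size
    return cached


def term_hit_rate(terms, cached):
    """Fraction of query-term occurrences whose posting list is cached."""
    total = sum(freq for freq, _ in terms.values())
    return sum(terms[t][0] for t in cached) / total


# Hypothetical statistics: (times the term was queried, posting list size).
stats = {
    "the": (1000, 900),
    "python": (300, 50),
    "cache": (200, 40),
    "zebra": (5, 10),
}
chosen = select_static_cache(stats, capacity=100)
```

With a capacity of 100 units the very frequent but very long list for `the` is skipped, and the three short lists are cached instead. The design point this illustrates is that a static cache is chosen once from past query statistics, so a benefit-per-byte ranking matters more than raw popularity.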
This paper introduces the concept of the open complex giant system and the methodology for dealing with such systems, with stress on its profound significance in the development of science and technology. The authors conclude that the reductionism underlying the exact sciences is not suitable for open complex giant systems, and that the only feasible alternative is meta-synthetic engineering from the qualitative to the quantitative.
A wireless sensor network (WSN) in the Internet of Things consists of a large number of nodes. Compressive sensing provides a novel way to aggregate data in a WSN. Based on the clustering structure of the WSN, an effective data aggregation method based on compressive sensing is proposed in this paper. The aggregation process is divided into two parts. Within a cluster, the sink node sets the corresponding seed vector based on the distribution of the network and then sends it to each cluster head; each cluster head generates its own random spacing sparse matrix from the received seed vector and collects data through compressive sensing. Among clusters, clusters forward measurement values to the sink node along a multi-hop routing tree. Performance analysis and comparison with related methods show that our method is effective and superior to other methods in total network energy consumption, both within and between clusters.
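The role of the seed can be illustrated with a small sketch: if the measurement matrix is generated from a shared seed, only the seed and the compressed measurements need to travel over the network, because the sink can regenerate the same matrix locally. The code below uses a dense random ±1 matrix as a stand-in for the paper's random spacing sparse matrix, and the function names and readings are hypothetical; signal reconstruction at the sink is not shown.

```python
import random


def measurement_matrix(seed, m, n):
    """Build an m x n matrix of random +/-1 entries from a seed (stand-in matrix)."""
    rng = random.Random(seed)
    return [[rng.choice((-1.0, 1.0)) for _ in range(n)] for _ in range(m)]


def compress(matrix, readings):
    """Project n sensor readings onto m < n measurements (y = Phi x)."""
    return [sum(a * x for a, x in zip(row, readings)) for row in matrix]


# Cluster head side: 8 hypothetical node readings reduced to 3 measurements.
seed = 7
readings = [20.1, 20.3, 19.8, 20.0, 25.4, 20.2, 19.9, 20.1]
phi_head = measurement_matrix(seed, m=3, n=8)
measurements = compress(phi_head, readings)

# Sink side: the same seed reproduces the same matrix without transmitting it.
phi_sink = measurement_matrix(seed, m=3, n=8)
```

In this toy setting the cluster head forwards three values instead of eight, which is where the energy saving on the multi-hop path would come from.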
We describe an efficient technique for out-of-core construction and accurate view-dependent visualization of very large surface models. The method uses a regular conformal hierarchy of tetrahedra to spatially partition the model. Each tetrahedral cell contains a precomputed simplified version of the original model, represented using cache coherent indexed strips for fast rendering. The representation is constructed during a fine-to-coarse simplification of the surface contained in diamonds (sets of tetrahedral cells sharing their longest edge). The construction preprocess operates out-of-core and parallelizes nicely. Appropriate boundary constraints are introduced in the simplification to ensure that all conforming selective subdivisions of the tetrahedron hierarchy lead to correctly matching surface patches. For each frame at runtime, the hierarchy is traversed coarse-to-fine to select diamonds of the appropriate resolution given the view parameters. The resulting system can interactively render high quality views of out-of-core models of hundreds of millions of triangles at over 40Hz (or 70M triangles/s) on current commodity graphics platforms.
As embedded application systems, which are built into devices and expose them over the internet, handle growing amounts of information, that information needs to be stored systematically so that client access and local processing can be served more efficiently. To support mobile applications, a design and implementation of an embedded uninterruptible power supply (UPS) system (EUPSS for short) is put forward for Web-based remote monitoring and control of UPS units. The system is implemented on ATmega161, RTL8019AS and Arm chips, with the TCP/IP protocol suite for communication. Within the embedded UPS system, an embedded file system is designed and implemented that stores data and index information on a serial EEPROM chip in a structured way and communicates with a microcontroller unit over the I2C bus. By embedding the file system into the UPS system or other information appliances, users can access and manipulate local data from the web client side. Embedded file systems on chips will play a major role in the growth of IP networking. Our experimental tests show that mobile users can easily monitor and control UPS units from distant locations. The performance of EUPSS has satisfied the requirements of all kinds of Web-based mobile applications.
Digital technologies are transforming the way cultural heritage researchers, archaeologists, and curators work by providing new ways to collaborate, record excavations, and restore artifacts.
The Journal Citation Reports of the Science Citation Index 2004 were used to delineate a core set of nanotechnology journals and a nanotechnology-relevant set. In comparison with 2003, the core set has grown and the relevant set has decreased. This suggests a higher degree of codification in the field of nanotechnology: the field has become more focused in terms of citation practices. Using the citing patterns among journals at the aggregate level, a core group of ten nanotechnology journals in the vector space can be delineated on the criterion of betweenness centrality. National contributions to this core group of journals are evaluated for the years 2003, 2004, and 2005. Additionally, the specific class of nanotechnology patents in the database of the U.S. Patent and Trademark Office (USPTO) is analyzed to determine if non-patent literature references can be used as a source for the delineation of the knowledge base in terms of scientific journals. The references are primarily to general science journals and letters, and therefore not specific enough for the purpose of delineating a journal set.
A new approach in the time domain is developed for the study of singular linear systems of the form $E\dot{x} = Ax + Bu$, $y = Cx$ with $E$ singular. Central to the approach is the fundamental triple $((\alpha E - A)^{-1}E, (\alpha E - A)^{-1}B, C)$ with $\alpha$ a real number satisfying $\det(\alpha E - A) \neq 0$. Controllability and observability matrices are expressed in terms of the fundamental triple. New tests for impulse controllability and impulse observability are also established. Feedback control problems including pole placement, decoupling, and disturbance localization are studied by use of a modified proportional and derivative feedback of the state in the form of $u = F(\alpha x - \dot{x}) + v$.
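Two short algebraic steps help in reading this abstract: how the fundamental triple arises from the original system, and what the closed loop looks like under the modified feedback. The derivation below is our own worked sketch from the equations quoted in the abstract; the hatted symbols are shorthand introduced here and are not necessarily the paper's notation.

```latex
% Shorthand assumed here (not from the abstract):
%   \hat{E} = (\alpha E - A)^{-1} E,  \hat{B} = (\alpha E - A)^{-1} B.
\begin{align*}
(\alpha E - A)^{-1} A &= \alpha \hat{E} - I
  && \text{since } (\alpha E - A)^{-1}(\alpha E - A) = I, \\
\hat{E}\dot{x} &= (\alpha \hat{E} - I)\,x + \hat{B}u
  && \text{left-multiplying } E\dot{x} = Ax + Bu \text{ by } (\alpha E - A)^{-1}, \\
(E + BF)\dot{x} &= (A + \alpha BF)\,x + Bv
  && \text{substituting } u = F(\alpha x - \dot{x}) + v .
\end{align*}
```

The second line shows that the system dynamics are fully described by the pair of matrices in the fundamental triple, and the third shows that the feedback gain $F$ modifies both the derivative coefficient and the state matrix at once.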
The last decade has witnessed the emergence of massive mobility datasets, such as tracks generated by GPS devices, call detail records, and geo-tagged posts from social media platforms. These datasets have fostered a vast scientific production on various applications of mobility analysis, ranging from computational epidemiology to urban planning and transportation engineering. A strand of literature addresses data cleaning issues related to raw spatiotemporal trajectories, while the second line of research focuses on discovering the statistical
Available accessibility guidelines do not necessarily guarantee usable web sites, in particular when specific groups of users with special needs are considered. We have identified 15 Web design criteria aiming to provide integrated support of accessibility and usability for vision-impaired users. In this paper, we present the results of a study investigating whether the application of such guidelines for vision-impaired users can actually improve their task performance when accessing Web applications. We report on two user tests, both involving vision-impaired users, which aim to provide empirical validation of the design criteria. During each test, users had to access and navigate two versions of a Web site: one version supporting the selected design criteria and one obtained with traditional techniques. Our results indicate that the 15 design criteria improved Web site usability both quantitatively and qualitatively by reducing the navigation time needed to perform the assigned tasks and by making the Web sites easier to navigate for blind and low-vision users.
A novel power allocation method with limited cross-tier interference for cognitive radio networks (CRNs) is proposed in this paper. In this method, an interference-limited power allocation algorithm based on filter bank multi-carrier offset quadrature amplitude modulation (FBMC-OQAM) is put forward. To improve the energy efficiency of the entire network and protect secondary users (SUs) from excessive interference, a cross-tier interference limit is adopted. At the same time, a virtual queue is designed to transform the extra packet delay caused by multi-user contention for the channel into queuing delay. Taking energy efficiency as the objective function, a nonlinear programming approach with nonlinear constraints is proposed under constraints on time delay and transmission power. An iterative algorithm for solving the problem is also put forward. In the new algorithm, the fractional objective function is transformed into polynomial form, and the global optimal solution is obtained by iteration after reducing the computational complexity. In addition, a sub-optimal algorithm is proposed to reduce computational complexity. The experimental results show that the optimal algorithm has higher performance while the sub-optimal algorithm has a lower computational complexity. The presented method is of practical importance for CRNs.
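The abstract does not name its iterative algorithm. As a generic illustration of how a fractional energy-efficiency objective can be handled by iteration, the sketch below applies Dinkelbach's method, a standard technique for fractional programs, to a deliberately simplified single-link model. The rate model, the parameter names and the absence of interference and delay constraints are all our assumptions, so this is not the paper's algorithm.

```python
import math


def dinkelbach_power(gain, circuit_power, p_max, tol=1e-9, max_iter=100):
    """Maximise log2(1 + gain*p) / (p + circuit_power) over 0 <= p <= p_max.

    Assumes gain > 0, circuit_power > 0 and p_max > 0.
    Returns (power, energy_efficiency).
    """
    def rate(p):
        return math.log2(1.0 + gain * p)

    p = p_max
    lam = rate(p) / (p + circuit_power)  # current efficiency estimate
    for _ in range(max_iter):
        # Inner step: maximise rate(p) - lam * (p + circuit_power).
        # The objective is concave, so the stationary point is clipped to the box.
        p_star = 1.0 / (lam * math.log(2)) - 1.0 / gain
        p = min(max(p_star, 0.0), p_max)
        gap = rate(p) - lam * (p + circuit_power)
        lam = rate(p) / (p + circuit_power)
        if gap < tol:  # the parametric problem has (numerically) reached zero
            break
    return p, lam


power, efficiency = dinkelbach_power(gain=2.0, circuit_power=0.5, p_max=10.0)
```

Each iteration replaces the ratio with a difference that is easy to maximise and then updates the ratio estimate; for this toy model the efficiency estimate is non-decreasing across iterations and the loop stops once the difference problem evaluates to approximately zero.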
Desertification is a serious threat to the ecological environment and social economy worldwide, and there is a pressing need to develop a reasonable and reproducible method to assess it at different scales. In this paper, the Ordos Plateau in China was selected as the research region and a quantitative method for desertification assessment at a regional scale was developed using Landsat MSS and TM/ETM+ data. In this method, NDVI, MSDI and land surface albedo were selected as assessment indicators of desertification to represent land surface conditions in terms of vegetation biomass, landscape pattern and micrometeorology. Taking into account the effects of vegetation type and image acquisition time on the assessment indicators, assessment rule sets were built and a decision tree approach was used to assess desertification of the Ordos Plateau in 1980, 1990 and 2000. The average overall accuracy of the three periods was higher than 90%. The results showed that although some local areas of the Ordos Plateau experienced expanding desertification, desertification of the Ordos Plateau decreased overall from 1980 to 2000. By analyzing the causes of desertification processes, it was found that climate change could have benefited the reversion of desertification from 1980 to 1990 at a regional scale and human activities might explain the expansion of desertification in this period; however, human conservation activities were the main driving factor that induced the reversion of desertification from 1990 to 2000.
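As a rough illustration of how indicator values feed a rule-based assessment, the sketch below computes NDVI with its standard definition, (NIR − Red) / (NIR + Red), and applies a two-level decision rule that also uses albedo. The thresholds, class names and function names are hypothetical placeholders, MSDI is omitted for brevity, and the paper's actual rule sets (which depend on vegetation type and acquisition time) are not reproduced here.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    total = nir + red
    return 0.0 if total == 0 else (nir - red) / total


def classify_pixel(nir, red, albedo, ndvi_cut=0.2, albedo_cut=0.3):
    """Toy decision rule; both cut-off values are hypothetical placeholders."""
    if ndvi(nir, red) >= ndvi_cut:
        return "non-desertified"  # enough vegetation signal
    if albedo >= albedo_cut:
        return "severe"           # sparse vegetation and bright, bare surface
    return "moderate"


example = classify_pixel(nir=0.30, red=0.25, albedo=0.35)
```

The example pixel has an NDVI of about 0.09 and a high albedo, so the toy rule labels it `severe`. In practice such cut-offs would be calibrated per scene and vegetation type rather than fixed as they are here.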