State Key Laboratory of Computer Science
Facility, Beijing, China
Research output, citation impact, and the most-cited recent papers from State Key Laboratory of Computer Science. Aggregated across the NobleBlocks index of 300M+ scholarly works.
Top-cited papers from State Key Laboratory of Computer Science
Retrieval-Augmented Generation (RAG) is a promising approach for mitigating the hallucination of large language models (LLMs). However, existing research lacks rigorous evaluation of the impact of retrieval-augmented generation on different large language models, which makes it challenging to identify the potential bottlenecks in the capabilities of RAG for different LLMs. In this paper, we systematically investigate the impact of Retrieval-Augmented Generation on large language models. We analyze the performance of different large language models in 4 fundamental abilities required for RAG: noise robustness, negative rejection, information integration, and counterfactual robustness. To this end, we establish the Retrieval-Augmented Generation Benchmark (RGB), a new corpus for RAG evaluation in both English and Chinese. RGB divides the instances within the benchmark into 4 separate testbeds based on the aforementioned fundamental abilities required to resolve the case. We then evaluate 6 representative LLMs on RGB to diagnose the challenges of current LLMs when applying RAG. Evaluation reveals that while LLMs exhibit a certain degree of noise robustness, they still struggle significantly in terms of negative rejection, information integration, and dealing with false information. These assessment outcomes indicate that there is still a considerable journey ahead to effectively apply RAG to LLMs.
Fine-grained visual categorization (FGVC) is an important but challenging task due to high intra-class variances and low inter-class variances caused by deformation, occlusion, illumination, etc. An attention convolutional binary neural tree architecture is presented to address those problems for weakly supervised FGVC. Specifically, we incorporate convolutional operations along edges of the tree structure, and use the routing functions in each node to determine the root-to-leaf computational paths within the tree. The final decision is computed as the summation of the predictions from leaf nodes. The deep convolutional operations learn to capture the representations of objects, and the tree structure characterizes the coarse-to-fine hierarchical feature learning process. In addition, we use the attention transformer module to force the network to capture discriminative features. The negative log-likelihood loss is used to train the entire network in an end-to-end fashion by SGD with back-propagation. Several experiments on the CUB-200-2011, Stanford Cars and Aircraft datasets demonstrate that the proposed method performs favorably against state-of-the-art methods.
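The leaf summation described above can be illustrated with a small soft binary tree. This is only a sketch under stated assumptions: the abstract does not say how leaf predictions are weighted, so the code assumes each leaf is weighted by the product of routing probabilities along its root-to-leaf path. The function name `soft_tree_predict` and the heap-order node layout are hypothetical.

```python
def soft_tree_predict(route_right, leaf_preds):
    """Combine leaf predictions of a complete binary tree.

    route_right[i] is the probability of taking the right branch at internal
    node i (heap order: root is 0, children of i are 2i+1 and 2i+2).
    leaf_preds is a list of per-leaf class-probability vectors, left to right.
    Assumption: each leaf is weighted by its root-to-leaf path probability.
    """
    num_leaves = len(leaf_preds)
    num_classes = len(leaf_preds[0])
    output = [0.0] * num_classes
    for leaf in range(num_leaves):
        node = leaf + num_leaves - 1  # heap index of this leaf
        path_prob = 1.0
        while node > 0:  # walk up to the root, multiplying branch probabilities
            parent = (node - 1) // 2
            went_right = node == 2 * parent + 2
            p_right = route_right[parent]
            path_prob *= p_right if went_right else 1.0 - p_right
            node = parent
        for c in range(num_classes):
            output[c] += path_prob * leaf_preds[leaf][c]
    return output


# Depth-2 tree: 3 routing nodes, 4 leaves, 2 classes (illustrative numbers).
leaves = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
uniform = soft_tree_predict([0.5, 0.5, 0.5], leaves)
hard = soft_tree_predict([1.0, 0.3, 0.0], leaves)
```

With uniform routing every leaf contributes equally; with near-hard routing the output collapses to a single leaf, which is the coarse-to-fine behavior the abstract alludes to.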
Non-intrusive respiration sensing without any device attached to the target plays a particularly important role in our everyday lives. However, existing solutions either require dedicated hardware or employ special-purpose signals which are not cost-effective, significantly limiting their real-life applications. Also, very few works address the underlying theory or can explain the large performance variations in different scenarios. In this paper, we employ the cheap commodity Wi-Fi hardware already ubiquitously deployed around us for respiration sensing. For the first time, we utilize the Fresnel diffraction model to accurately quantify the relationship between the diffraction gain and a human target's subtle chest displacement, and thus successfully turn the obstruction diffraction in the First Fresnel Zone (FFZ), previously considered "destructive", into a beneficial sensing capability. By considering not just the chest displacement at the front side, as existing solutions do, but also the subtle displacement at the back side, we achieve results that closely match the theoretical plots and become the first to clearly explain the theory behind the performance distinction between lying and sitting for respiration sensing. With two cheap commodity Wi-Fi cards each equipped with just one antenna, we are able to achieve higher than 98% accuracy of respiration rate monitoring at more than 60% of the locations in the FFZ. Furthermore, we are able to present a detailed heatmap of the sensing capability at each location inside the FFZ to guide respiration sensing, so users clearly know where the good positions for respiration monitoring are and, if located at a bad position, how to move just slightly to reach a good position.
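To give a sense of the scale of the First Fresnel Zone, the sketch below uses the standard textbook approximation for the n-th Fresnel zone radius, r_n = sqrt(n * λ * d1 * d2 / (d1 + d2)). The 5.24 GHz carrier and the 2 m link are illustrative assumptions, not values taken from the paper, and the function name is hypothetical.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def fresnel_zone_radius(n, freq_hz, d1_m, d2_m):
    """Approximate radius (m) of the n-th Fresnel zone boundary at a point
    d1_m from the transmitter and d2_m from the receiver.
    Standard approximation, valid when d1_m and d2_m are much larger than the radius."""
    wavelength = SPEED_OF_LIGHT / freq_hz
    return math.sqrt(n * wavelength * d1_m * d2_m / (d1_m + d2_m))


# Assumed setup: 5.24 GHz Wi-Fi channel, transceivers 2 m apart, target at the midpoint.
ffz_radius = fresnel_zone_radius(1, 5.24e9, 1.0, 1.0)
print(f"First Fresnel zone radius at midpoint: {ffz_radius * 100:.1f} cm")
```

Under these assumptions the zone is on the order of tens of centimetres across, which is comparable to a human torso and suggests why millimetre-scale chest displacement inside the FFZ can measurably change the diffraction gain.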
In 1953, Shannon proposed the question of quantification of structural information to analyze communication systems. The question has become one of the longest-standing great challenges in information science and computer science. Here, we propose the first metric for structural information. Given a graph G, we define the K-dimensional structural information of G (or structure entropy of G), denoted by H^K(G), to be the minimum overall number of bits required to determine the K-dimensional code of the node that is accessible from random walk in G. The K-dimensional structural information provides the principle for completely detecting the natural or true structure, which consists of the rules, regulations, and orders of the graphs, for fully distinguishing the order from disorder in structured noisy data, and for analyzing communication systems, solving Shannon's problem and opening up new directions. The K-dimensional structural information is also the first metric of dynamical complexity of networks, measuring the complexity of interactions, communications, operations, and even evolution of networks. The metric satisfies a number of fundamental properties, including additivity, locality, robustness, local and incremental computability, and so on. We establish the fundamental theorems of the one- and two-dimensional structural information of networks, including both lower and upper bounds of the metrics of classic data structures, general graphs, the networks of models, and the networks of natural evolution. We propose algorithms to approximate the K-dimensional structural information of graphs by finding the K-dimensional structure of the graphs that minimizes the K-dimensional structure entropy. We find that the K-dimensional structure entropy minimization is the principle for detecting the natural or true structures in real-world networks. Consequently, our structural information provides the foundation for knowledge discovery from noisy data. We establish a black hole principle by using the two-dimensional structure information of graphs. We propose the natural rank of locally listing algorithms by the structure entropy minimization principle, providing the basis for a next-generation search engine.
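For the simplest case, K = 1, the structure entropy is usually stated as the Shannon entropy of the stationary distribution of a random walk on G, that is, H^1(G) = -Σ_i (d_i / 2m) log2(d_i / 2m), where d_i is the degree of node i and m is the number of edges. The sketch below assumes that form and an undirected, unweighted edge list; it is an illustration, not the paper's algorithm for general K.

```python
import math


def one_dim_structure_entropy(edges):
    """One-dimensional structure entropy (bits) of an undirected, unweighted
    graph given as a list of (u, v) edges. Assumes the degree-based form
    H^1(G) = -sum_i (d_i / 2m) * log2(d_i / 2m)."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    two_m = sum(degree.values())  # sum of degrees equals 2m
    return -sum((d / two_m) * math.log2(d / two_m) for d in degree.values())


# A 4-cycle: every node has degree 2, so the walk is uniform over 4 nodes.
cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
h_cycle = one_dim_structure_entropy(cycle4)

# A star with 3 leaves: the hub is visited half of the time.
star3 = [(0, 1), (0, 2), (0, 3)]
h_star = one_dim_structure_entropy(star3)
```

The 4-cycle gives exactly 2 bits because the stationary distribution is uniform over four nodes; the star gives less, since the hub concentrates probability mass.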
In recent years, wireless sensing has been exploited as a promising research direction for contactless human activity recognition. However, one major issue hindering the real deployment of these systems is that the signal variation patterns induced by human activities with different devices and environmental settings are neither stable nor consistent, resulting in unstable system performance. The existing machine learning based methods usually take the "black box" approach and fail to achieve consistent performance. In this paper, we argue that a deep understanding of radio signal propagation in wireless sensing is needed, and it may be possible to develop a deterministic sensing model to make the signal variation patterns predictable. With this intuition, we investigate: 1) how wireless signals are affected by human activities taking transceiver location and environment settings into consideration; 2) a new deterministic sensing approach to model the received signal variation patterns for different human activities; 3) a proof-of-concept prototype to demonstrate our approach and a case study to detect diverse activities. In particular, we propose a diffraction-based sensing model to quantitatively determine the signal change with respect to a target's motions, which eventually links signal variation patterns with motions, and hence can be used to recognize human activities. Through our case study, we demonstrate that the diffraction-based sensing model is effective and robust in recognizing exercises and daily activities. In addition, we demonstrate that the proposed model improves the recognition accuracy of existing machine learning systems by more than 10%.
The efficiency of parameter estimation of quantum channels is studied in this paper. We introduce the concept of programmable parameters to the theory of estimation. It is found that programmable parameters obey the standard quantum limit strictly; hence, no speedup is possible in their estimation. We also construct a class of nonunitary quantum channels whose parameter can be estimated in a way that the standard quantum limit is broken. The study of estimation of general quantum channels also enables an investigation of the effect of noise on quantum estimation.
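For readers unfamiliar with the term, the standard quantum limit refers to the usual scaling of estimation error with the number N of uses of the channel (or probes), in contrast to the faster Heisenberg scaling. The expressions below are the textbook forms, given as background rather than as results from the paper:

```latex
\delta\theta_{\mathrm{SQL}} \sim \frac{1}{\sqrt{N}},
\qquad
\delta\theta_{\mathrm{Heisenberg}} \sim \frac{1}{N}
```

"Breaking" the standard quantum limit then means achieving an error that decreases faster than 1/\sqrt{N}.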
Enabling pervasive WiFi devices with non-contact sensing capability is an important topic in the field of integrated sensing and communication. The Doppler effect has been widely exploited to estimate targets' velocity from wireless signals. However, the separation of signal sources and receivers complicates the relationship between Doppler frequency shift (DFS) and target velocity in WiFi-based non-contact sensing systems. In contrast to existing works that rely on either approximated relations or coarse-grained information such as whether a target is moving toward or away from WiFi transceivers, this paper investigates rigorously the dependency of velocity estimation accuracy on target locations and headings in WiFi sensing systems. The theoretical insights allow us to derive a closed-form solution and understand the fundamental limitation of velocity estimation. To optimize velocity estimation performance, we devise a receiving device selection scheme that dynamically chooses the optimal set of receivers among multiple available WiFi devices. A prototype real-time target tracking system has been implemented using commodity WiFi devices. Extensive experimental results show that the proposed system outperforms state-of-the-art approaches in velocity estimation and tracking, and is able to achieve median errors of 9.38 cm/s, 13.42°, and 31.08 cm in speed, heading, and location estimation, respectively, across experiments conducted in three indoor environments with three device placements and eight human subjects over 15 trajectories.
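Why the separation of transmitter and receiver complicates the DFS-velocity relationship can be seen from the generic bistatic relation f_D = -(1/λ) d/dt(|p - tx| + |p - rx|). The sketch below implements that relation in 2D; it is not the paper's closed-form solution, and the device positions and the 0.06 m wavelength are illustrative assumptions.

```python
import math


def doppler_shift_hz(target, velocity, tx, rx, wavelength_m):
    """Doppler frequency shift for a single reflection off a moving target in
    a bistatic (separate transmitter/receiver) 2D setup. Positions are (x, y)
    in metres, velocity is (vx, vy) in m/s. Uses the generic relation
    f_D = -(rate of change of reflected path length) / wavelength."""

    def unit(frm, to):
        dx, dy = to[0] - frm[0], to[1] - frm[1]
        norm = math.hypot(dx, dy)
        return dx / norm, dy / norm

    ux_t, uy_t = unit(tx, target)  # direction transmitter -> target
    ux_r, uy_r = unit(rx, target)  # direction receiver -> target
    path_rate = velocity[0] * (ux_t + ux_r) + velocity[1] * (uy_t + uy_r)
    return -path_rate / wavelength_m


# Assumed layout: devices 2 m apart, target 1 m off the midpoint of the link.
tx, rx, target = (-1.0, 0.0), (1.0, 0.0), (0.0, 1.0)
f_away = doppler_shift_hz(target, (0.0, 1.0), tx, rx, 0.06)      # moving away from the link
f_tangent = doppler_shift_hz(target, (1.0, 0.0), tx, rx, 0.06)   # moving parallel to the link
```

At this location, a 1 m/s motion away from the link produces a clear negative shift, while the same speed parallel to the link produces no shift at all, which illustrates why a single receiver cannot resolve velocity for every heading and why selecting among several receivers can help.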
Large-scale visual pretraining has significantly improved the performance of large vision models. However, we observe the low FLOPs pitfall: existing low-FLOPs models cannot benefit from large-scale pretraining. In this paper, we introduce a novel design principle, termed ParameterNet, aimed at augmenting the number of parameters in large-scale visual pretraining models while minimizing the increase in FLOPs. We leverage dynamic convolutions to incorporate additional parameters into the networks with only a marginal rise in FLOPs. The ParameterNet approach allows low-FLOPs networks to take advantage of large-scale visual pretraining. Furthermore, we extend the ParameterNet concept to the language domain to enhance inference results while preserving inference speed. Experiments on the large-scale ImageNet-22K have shown the superiority of our ParameterNet scheme. For example, ParameterNet-600M can achieve higher accuracy than the widely-used Swin Transformer (81.6% vs. 80.9%) and has much lower FLOPs (0.6G vs. 4.5G). The code will be released at https://parameternet.github.io/.
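The idea of adding parameters without proportionally adding compute can be sketched with a dynamic layer that mixes M expert weight matrices using input-dependent coefficients and then applies the mixed weights once. The code below is a simplified stand-in, a dynamic linear layer in pure Python rather than a real dynamic convolution, and the names `dynamic_linear`, `experts`, and `gate` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_linear(x, experts, gate):
    """Apply an input-dependent mixture of expert weight matrices to x.

    x: input vector; experts: list of M weight matrices (out_dim x in_dim);
    gate: M x in_dim matrix producing one mixing logit per expert.
    Parameters grow roughly M-fold, but the mixed weights are applied once.
    """
    logits = [sum(g * xi for g, xi in zip(row, x)) for row in gate]
    alpha = softmax(logits)
    out_dim, in_dim = len(experts[0]), len(experts[0][0])
    mixed = [
        [sum(a * w[o][i] for a, w in zip(alpha, experts)) for i in range(in_dim)]
        for o in range(out_dim)
    ]
    return [sum(w * xi for w, xi in zip(row, x)) for row in mixed]


# Two experts (identity and swap) with a zero gate, so both get weight 0.5.
experts = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
gate = [[0.0, 0.0], [0.0, 0.0]]
y = dynamic_linear([2.0, 4.0], experts, gate)
```

In a convolutional setting the mixing step is done once per input while the mixed kernel is applied at every spatial position, which is the intuition for why the FLOPs increase stays marginal.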
The existing formal techniques are not suitable for elegantly modeling passing value indeterminacy and describing batch processing function in real-time cooperative systems. Moreover, the correct behavior of the systems depends on not only the logical correctness of the results obtained through running workflows but also the time of producing them before critical deadlines. For these purposes, this paper proposes an interorganizational logical workflow net (ILWN) for modeling and analyzing real-time cooperative systems based on time Petri nets, workflow techniques, and temporal logic. Through attaching logical expressions to some actions of an ILWN model, the size of the model is reduced. Thus, ILWNs can efficiently mitigate the state explosion problem to some extent. Also, this paper analyzes the soundness of a subclass of ILWNs: the or-restricted ILWNs. A rigorous analysis approach is given based on their static net structures only. The concepts and techniques proposed in this paper are illustrated with a seller-buyer example in electronic commerce.
PURPOSE: China has the largest cancer population globally. Surgery is the main choice for most solid cancer patients. Intraoperative fluorescence molecular imaging (FMI) has shown great potential in assisting surgeons in achieving precise resection. We summarized the typical applications of intraoperative FMI and several new trends to promote the development of precision surgery. METHODS: The academic database and NIH clinical trial platform were systematically evaluated. We focused on the clinical application of intraoperative FMI in China. Special emphasis was placed on a series of typical studies with new technologies or high-level evidence. The emerging strategy of combining FMI with other modalities was also discussed. RESULTS: The clinical applications of clinically approved indocyanine green (ICG), methylene blue (MB), or fluorescein are on the rise in different surgical departments. Intraoperative FMI has achieved precise lesion detection, sentinel lymph node mapping, and lymphangiography for many cancers. Nerve imaging is also being explored to reduce iatrogenic injuries. Through different administration routes, these fluorescent imaging agents provided encouraging results in surgical navigation. Meanwhile, designing new cancer-specific fluorescent tracers is expected to be a promising trend to further improve the surgical outcome. CONCLUSIONS: Intraoperative FMI is developing rapidly in China. In-depth understanding of cancer-related molecular mechanisms is necessary to achieve precision surgery. Molecular-targeted fluorescent agents and multi-modal imaging techniques might play crucial roles in the era of precision surgery.
In complex software systems, software agents always need to negotiate with other agents within their physical and social contexts when they execute tasks. The capacity of a software agent to execute tasks is determined not only by itself but also by its contextual agents; thus, the number of tasks allocated to an agent should be directly proportional to its self-owned resources as well as its contextual agents' resources. This paper presents a novel task allocation model based on contextual resource negotiation. In the presented model, when a task arrives at the software system, it is first assigned to a principal agent that has a high contextual enrichment factor for the required resources; the principal agent then negotiates with its contextual agents to execute the assigned task. However, when multiple tasks arrive at the software system, load balancing is necessary to avoid overconvergence of tasks at certain agents that are rich in contextual resources. Thus, this paper also presents a novel load balancing method: if an excessive number of tasks are queued for a certain agent, the capacities of both the agent itself and its contextual agents to accept new tasks will be reduced. Therefore, in this paper, task allocation and load balancing are implemented according to the contextual resource distribution of agents, which is well suited to the characteristics of complex software systems; the presented model reduces communication costs between allocated agents more than previous methods based on the self-owned resource distribution of agents.
Incorporating factual knowledge from knowledge graphs (KGs) is regarded as a promising approach for mitigating the hallucination of large language models (LLMs). Existing methods usually only use the user's input to query the knowledge graph, thus failing to address the factual hallucination generated by LLMs during their reasoning process. To address this problem, this paper proposes Knowledge Graph-based Retrofitting (KGR), a new framework that incorporates LLMs with KGs to mitigate factual hallucination during the reasoning process by retrofitting the initial draft responses of LLMs based on the factual knowledge stored in KGs. Specifically, KGR leverages LLMs to extract, select, validate, and retrofit factual statements within the model-generated responses, which enables an autonomous knowledge verifying and refining procedure without any additional manual effort. Experiments show that KGR can significantly improve the performance of LLMs on factual QA benchmarks, especially when involving complex reasoning processes, which demonstrates the necessity and effectiveness of KGR in mitigating hallucination and enhancing the reliability of LLMs.
In cooperative systems (CSs), participants cannot usually ensure the correct behavior of their partners. Obligations and proofs of participants have to be performed together to achieve a common goal in a real cooperation. Without adequate accountability assurances of actions, there is no means of reliably enforcing punitive measures against fraudulent participants. However, the existing formal methods for analyzing CSs cannot properly deal with accountability and obligations. As such, this paper proposes a new class of labeled Petri net (LPN) models. The behavior of each partner is represented by an LPN, while a CS is modeled by the combination of all partners' LPN models. The behavioral properties of an overall modeled system can be well verified only by analyzing each individual LPN. LPNs provide the integration of formal notations with graphical notations and formal proofs with commonly used verification techniques. The obligations are verified based on LPN languages and the nonblocking properties of action sequences, while accountability can be proved by the network conditions and local action sequences on each partner's side. The proposed approaches are illustrated with the modeling and analysis of a purchase transaction using the Internet Open Trading Protocol.
Fully implicit methods are drawing more attention in scientific and engineering applications because they allow large time steps in extreme-scale simulations. When using a fully implicit method to solve two-phase flow problems in porous media, one major challenge is the solution of the resultant nonlinear system at each time step. To solve such nonlinear systems, traditional nonlinear iterative methods, such as the class of Newton methods, often fail to achieve the desired convergence rate due to the high nonlinearity of the system and/or the violation of the boundedness requirement of the saturation. In this paper, we reformulate the two-phase model as a variational inequality that naturally ensures the physical feasibility of the saturation variable. The variational inequality is then solved by an active-set reduced-space method with a nonlinear elimination preconditioner to remove the highly nonlinear components that often cause the nonlinear iteration to fail to converge. To validate the effectiveness of the proposed method, we compare it with the classical implicit pressure-explicit saturation method for two-phase flow problems with strong heterogeneity. The numerical results show that our nonlinear solver overcomes the often severe limits on the time step associated with existing methods, results in superior convergence performance, and achieves a reduction in the total computing time of more than one order of magnitude.
Device-free indoor localization and tracking using commercial millimeter wave radars have attracted much interest lately due to their non-intrusive nature and high spatial resolution. However, it is challenging to achieve high tracking accuracy due to rich multipath reflection and occlusion in indoor environments. Static objects with non-negligible reflectance of mmWave signals interact with moving human subjects and generate time-varying multipath ghosts and shadow ghosts, which can be easily confused as real subjects. To characterize the complex interactions, we first develop a geometric model that estimates the location of multipath ghosts given the locations of humans and static reflectors. Based on this model, the locations of static reflectors that form a reflection map are automatically estimated from received radar signals as a single person traverses the environment along arbitrary trajectories. The reflection map allows for the elimination of multipath and shadow ghost interference as well as the augmentation of weakly reflected human subjects in occluded areas. The proposed environment-aware multi-person tracking system can generate reflection maps with a mean error of 15.5cm and a 90-percentile error of 30.3cm, and achieve multi-person tracking accuracy with a mean error of 8.6cm and a 90-percentile error of 17.5cm, in four representative indoor spaces with diverse subjects using a single mmWave radar.
The next-generation mobile communication network (i.e., 6G) has been envisioned to go beyond classical communication functionality and provide integrated sensing and communication (ISAC) capability to enable more emerging applications, such as smart cities, connected vehicles, AIoT and health care/elder care. Among all the ISAC proposals, the most practical and promising approach is to empower existing wireless networks (e.g., WiFi, 4G/5G) with the augmented ability to sense the surrounding humans and environment, and evolve wireless communication networks into an intelligent communication and sensing network (e.g., 6G). In this paper, based on our experience with CSI-based wireless sensing using WiFi/4G/5G signals, we intend to identify ten major practical and theoretical problems that hinder real deployment of ISAC applications, and provide possible solutions to those critical challenges. Hopefully, this work will inspire further research to evolve existing WiFi/4G/5G networks into the next-generation intelligent wireless network (i.e., 6G).
With the rapid progress made by industry and academia, quantum computers with dozens of qubits or even larger size are being realized. However, the fidelity of existing quantum computers often sharply decreases as the circuit depth increases. Thus, an ideal quantum circuit simulator on classical computers, especially on high-performance computers, is needed for benchmarking and validation. We design a large-scale simulator of universal random quantum circuits, often called “quantum supremacy circuits”, and implement it on Sunway TaihuLight. The simulator can be used to accomplish the following two tasks: 1) Computing a complete output state-vector; 2) Calculating one or a few amplitudes. We target the simulation of 49-qubit circuits. For task 1), we successfully simulate such a circuit of depth 39, and for task 2) we reach the 55-depth level. To the best of our knowledge, both of the simulation results reach the largest depth for 49-qubit quantum supremacy circuits.
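The scale of task 1) is easy to underestimate. The sketch below shows the memory a full 49-qubit state vector would need, together with a minimal pure-Python gate application on a tiny state vector. The 16 bytes per amplitude (double-precision complex) is an assumption, since the abstract does not state the precision used, and the function names are hypothetical; this is not the simulator described above.

```python
import math


def state_vector_bytes(num_qubits, bytes_per_amplitude=16):
    """Memory needed to store all 2**num_qubits complex amplitudes.
    Assumes 16 bytes per amplitude (two 64-bit floats) by default."""
    return (2 ** num_qubits) * bytes_per_amplitude


def apply_single_qubit_gate(state, gate, qubit):
    """Apply a 2x2 gate to one qubit of a state vector (list of complex).
    Qubit 0 is the least significant bit of the basis-state index."""
    stride = 1 << qubit
    new_state = list(state)
    for i in range(len(state)):
        if i & stride == 0:  # visit each amplitude pair exactly once
            a0, a1 = state[i], state[i | stride]
            new_state[i] = gate[0][0] * a0 + gate[0][1] * a1
            new_state[i | stride] = gate[1][0] * a0 + gate[1][1] * a1
    return new_state


pebibytes = state_vector_bytes(49) / 2 ** 50  # full 49-qubit state vector

s = 1 / math.sqrt(2)
hadamard = [[s, s], [s, -s]]
two_qubit_zero = [1 + 0j, 0j, 0j, 0j]  # |00>
after_h = apply_single_qubit_gate(two_qubit_zero, hadamard, 0)
```

Under the stated assumption the full state vector occupies 8 PiB, which is why computing a complete output state requires a machine of this class, while computing only a few amplitudes (task 2) can reach greater depth.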
The challenge of information extraction (IE) lies in the diversity of label schemas and the heterogeneity of structures. Traditional methods require task-specific model design and rely heavily on expensive supervision, making them difficult to generalize to new schemas. In this paper, we decouple IE into two basic abilities, structuring and conceptualizing, which are shared by different tasks and schemas. Based on this paradigm, we propose to universally model various IE tasks with the Unified Semantic Matching (USM) framework, which introduces three unified token linking operations to model the abilities of structuring and conceptualizing. In this way, USM can jointly encode schema and input text, uniformly extract substructures in parallel, and controllably decode target structures on demand. Empirical evaluation on 4 IE tasks shows that the proposed method achieves state-of-the-art performance under the supervised experiments and shows strong generalization ability in zero/few-shot transfer settings.
The problem of finding a minimum vertex cover (MinVC) in a graph is a well-known NP-hard combinatorial optimization problem of great importance in theory and practice. Due to its NP-hardness, there has been much interest in developing heuristic algorithms for finding a small vertex cover in reasonable time. Previously, heuristic algorithms for MinVC have focused on solving graphs of relatively small size, and they are not suitable for solving massive graphs as they usually have high-complexity heuristics. This paper explores techniques for solving MinVC in very large scale real-world graphs, including a construction algorithm, a local search algorithm and a preprocessing algorithm. Both the construction and search algorithms are based on low-complexity heuristics, and we combine them to develop a heuristic algorithm for MinVC called FastVC. Experimental results on a broad range of real-world massive graphs show that our algorithms are very fast and have better performance than previous heuristic algorithms for MinVC. We also develop a preprocessing algorithm to simplify graphs for MinVC algorithms. By applying the preprocessing algorithm to local search algorithms, we obtain two efficient MinVC solvers called NuMVC2+p and FastVC2+p, which show further improvement on the massive graphs.
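A low-complexity construction heuristic of the kind mentioned above can be sketched as follows: scan the edges once, cover each uncovered edge with its higher-degree endpoint, then drop vertices that have become redundant. This is a generic illustration in that spirit, not a transcription of FastVC's construction procedure; the function name is hypothetical and the graph is assumed to have no self-loops.

```python
def greedy_vertex_cover(edges):
    """Build a vertex cover of an undirected graph given as (u, v) edges.
    Step 1: for each uncovered edge, add the endpoint with the higher degree.
    Step 2: remove any vertex whose neighbours are all already in the cover."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.add(u if len(neighbors[u]) >= len(neighbors[v]) else v)

    # A vertex is redundant if every incident edge is covered by its other endpoint.
    for v in sorted(cover):
        if all(n in cover for n in neighbors[v]):
            cover.remove(v)
    return cover


def is_vertex_cover(edges, cover):
    """Check that every edge has at least one endpoint in the cover."""
    return all(u in cover or v in cover for u, v in edges)


star = [(0, 1), (0, 2), (0, 3)]
path = [(1, 2), (2, 3), (3, 4)]
star_cover = greedy_vertex_cover(star)
path_cover = greedy_vertex_cover(path)
```

Both passes touch each edge a constant number of times, which is the property that makes this style of heuristic usable on massive graphs; a local search phase would then try to shrink the cover further.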
We introduce a vector representation called diffusion curve textures for mapping diffusion curve images (DCI) onto arbitrary surfaces. In contrast to the original implicit representation of DCIs [Orzan et al. 2008], where determining a single texture value requires iterative computation of the entire DCI via the Poisson equation, diffusion curve textures provide an explicit representation from which the texture value at any point can be solved directly, while preserving the compactness and resolution independence of diffusion curves. This is achieved through a formulation of the DCI diffusion process in terms of Green's functions. This formulation furthermore allows the texture value of any rectangular region (e.g. pixel area) to be solved in closed form, which facilitates anti-aliasing. We develop a GPU algorithm that renders anti-aliased diffusion curve textures in real time, and demonstrate the effectiveness of this method through high quality renderings with detailed control curves and color variations.