Shandong Institute of Automation
facilityJinan, China
Research output, citation impact, and the most-cited recent papers from Shandong Institute of Automation (China). Aggregated across the NobleBlocks index of 300M+ scholarly works.
Top-cited papers from Shandong Institute of Automation
In this paper, we address the scene segmentation task by capturing rich contextual dependencies based on the self-attention mechanism. Unlike previous works that capture contexts by multi-scale features fusion, we propose a Dual Attention Networks (DANet) to adaptively integrate local features with their global dependencies. Specifically, we append two types of attention modules on top of traditional dilated FCN, which model the semantic interdependencies in spatial and channel dimensions respectively. The position attention module selectively aggregates the features at each position by a weighted sum of the features at all positions. Similar features would be related to each other regardless of their distances. Meanwhile, the channel attention module selectively emphasizes interdependent channel maps by integrating associated features among all channel maps. We sum the outputs of the two attention modules to further improve feature representation which contributes to more precise segmentation results. We achieve new state-of-the-art segmentation performance on three challenging scene segmentation datasets, i.e., Cityscapes, PASCAL Context and COCO Stuff dataset. In particular, a Mean IoU score of 81.5% on Cityscapes test set is achieved without using coarse data.
Accurate and timely traffic flow information is important for the successful deployment of intelligent transportation systems. Over the last few years, traffic data have been exploding, and we have truly entered the era of big data for transportation. Existing traffic flow prediction methods mainly use shallow traffic prediction models and are still unsatisfying for many real-world applications. This situation inspires us to rethink the traffic flow prediction problem based on deep architecture models with big traffic data. In this paper, a novel deep-learning-based traffic flow prediction method is proposed, which considers the spatial and temporal correlations inherently. A stacked autoencoder model is used to learn generic traffic flow features, and it is trained in a greedy layerwise fashion. To the best of our knowledge, this is the first time that a deep architecture model is applied using autoencoders as building blocks to represent traffic flow features for prediction. Moreover, experiments demonstrate that the proposed method for traffic flow prediction has superior performance.
Large language models (LLMs) are gaining increasing popularity in both academia and industry, owing to their unprecedented performance in various applications. As LLMs continue to play a vital role in both research and daily use, their evaluation becomes increasingly critical, not only at the task level, but also at the society level for better understanding of their potential risks. Over the past years, significant efforts have been made to examine LLMs from various perspectives. This paper presents a comprehensive review of these evaluation methods for LLMs, focusing on three key dimensions: what to evaluate , where to evaluate , and how to evaluate . Firstly, we provide an overview from the perspective of evaluation tasks, encompassing general natural language processing tasks, reasoning, medical usage, ethics, education, natural and social sciences, agent applications, and other areas. Secondly, we answer the ‘where’ and ‘how’ questions by diving into the evaluation methods and benchmarks, which serve as crucial components in assessing the performance of LLMs. Then, we summarize the success and failure cases of LLMs in different tasks. Finally, we shed light on several future challenges that lie ahead in LLMs evaluation. Our aim is to offer invaluable insights to researchers in the realm of LLMs evaluation, thereby aiding the development of more proficient LLMs. Our key point is that evaluation should be treated as an essential discipline to better assist the development of LLMs. We consistently maintain the related open-source materials at: https://github.com/MLGroupJLU/LLM-eval-survey
Person re-identification is an important technique towards automatic search of a person's presence in a surveillance video. Two fundamental problems are critical for person re-identification, feature representation and metric learning. An effective feature representation should be robust to illumination and viewpoint changes, and a discriminant metric should be learned to match various person images. In this paper, we propose an effective feature representation called Local Maximal Occurrence (LOMO), and a subspace and metric learning method called Cross-view Quadratic Discriminant Analysis (XQDA). The LOMO feature analyzes the horizontal occurrence of local features, and maximizes the occurrence to make a stable representation against viewpoint changes. Besides, to handle illumination variations, we apply the Retinex transform and a scale invariant texture operator. To learn a discriminant metric, we propose to learn a discriminant low dimensional subspace by cross-view quadratic discriminant analysis, and simultaneously, a QDA metric is learned on the derived subspace. We also present a practical computation method for XQDA, as well as its regularization. Experiments on four challenging person re-identification databases, VIPeR, QMUL GRID, CUHK Campus, and CUHK03, show that the proposed method improves the state-of-the-art rank-1 identification rates by 2.2%, 4.88%, 28.91%, and 31.55% on the four databases, respectively.
In skeleton-based action recognition, graph convolutional networks (GCNs), which model the human body skeletons as spatiotemporal graphs, have achieved remarkable performance. However, in existing GCN-based methods, the topology of the graph is set manually, and it is fixed over all layers and input samples. This may not be optimal for the hierarchical GCN and diverse samples in action recognition tasks. In addition, the second-order information (the lengths and directions of bones) of the skeleton data, which is naturally more informative and discriminative for action recognition, is rarely investigated in existing methods. In this work, we propose a novel two-stream adaptive graph convolutional network (2s-AGCN) for skeleton-based action recognition. The topology of the graph in our model can be either uniformly or individually learned by the BP algorithm in an end-to-end manner. This data-driven method increases the flexibility of the model for graph construction and brings more generality to adapt to various data samples. Moreover, a two-stream framework is proposed to model both the first-order and the second-order information simultaneously, which shows notable improvement for the recognition accuracy. Extensive experiments on the two large-scale datasets, NTU-RGBD and Kinetics-Skeleton, demonstrate that the performance of our model exceeds the state-of-the-art with a significant margin.
Broad Learning System (BLS) that aims to offer an alternative way of learning in deep structure is proposed in this paper. Deep structure and learning suffer from a time-consuming training process because of a large number of connecting parameters in filters and layers. Moreover, it encounters a complete retraining process if the structure is not sufficient to model the system. The BLS is established in the form of a flat network, where the original inputs are transferred and placed as "mapped features" in feature nodes and the structure is expanded in wide sense in the "enhancement nodes." The incremental learning algorithms are developed for fast remodeling in broad expansion without a retraining process if the network deems to be expanded. Two incremental learning algorithms are given for both the increment of the feature nodes (or filters in deep structure) and the increment of the enhancement nodes. The designed model and algorithms are very versatile for selecting a model rapidly. In addition, another incremental learning is developed for a system that has been modeled encounters a new incoming input. Specifically, the system can be remodeled in an incremental way without the entire retraining from the beginning. Satisfactory result for model reduction using singular value decomposition is conducted to simplify the final structure. Compared with existing deep neural networks, experimental results on the Modified National Institute of Standards and Technology database and NYU NORB object recognition dataset benchmark data demonstrate the effectiveness of the proposed BLS.
For the last two decades, intelligent transportation systems (ITS) have emerged as an efficient way of improving the performance of transportation systems, enhancing travel security, and providing more choices to travelers. A significant change in ITS in recent years is that much more data are collected from a variety of sources and can be processed into various forms for different stakeholders. The availability of a large amount of data can potentially lead to a revolution in ITS development, changing an ITS from a conventional technology-driven system into a more powerful multifunctional data-driven intelligent transportation system (D <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">2</sup> ITS) : a system that is vision, multisource, and learning algorithm driven to optimize its performance. Furthermore, D <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">2</sup> ITS is trending to become a privacy-aware people-centric more intelligent system. In this paper, we provide a survey on the development of D <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">2</sup> ITS, discussing the functionality of its key components and some deployment issues associated with D <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">2</sup> ITS Future research directions for the development of D <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">2</sup> ITS is also presented.
We introduce here a large tracking database that offers an unprecedentedly wide coverage of common moving objects in the wild, called GOT-10k. Specifically, GOT-10k is built upon the backbone of WordNet structure [1] and it populates the majority of over 560 classes of moving objects and 87 motion patterns, magnitudes wider than the most recent similar-scale counterparts [19], [20], [23], [26]. By releasing the large high-diversity database, we aim to provide a unified training and evaluation platform for the development of class-agnostic, generic purposed short-term trackers. The features of GOT-10k and the contributions of this article are summarized in the following. (1) GOT-10k offers over 10,000 video segments with more than 1.5 million manually labeled bounding boxes, enabling unified training and stable evaluation of deep trackers. (2) GOT-10k is by far the first video trajectory dataset that uses the semantic hierarchy of WordNet to guide class population, which ensures a comprehensive and relatively unbiased coverage of diverse moving objects. (3) For the first time, GOT-10k introduces the one-shot protocol for tracker evaluation, where the training and test classes are zero-overlapped. The protocol avoids biased evaluation results towards familiar objects and it promotes generalization in tracker development. (4) GOT-10k offers additional labels such as motion classes and object visible ratios, facilitating the development of motion-aware and occlusion-aware trackers. (5) We conduct extensive tracking experiments with 39 typical tracking algorithms and their variants on GOT-10k and analyze their results in this paper. (6) Finally, we develop a comprehensive platform for the tracking community that offers full-featured evaluation toolkits, an online evaluation server, and a responsive leaderboard. The annotations of GOT-10k's test data are kept private to avoid tuning parameters on it.
ChatGPT, an artificial intelligence generated content (AIGC) model developed by OpenAI, has attracted world-wide attention for its capability of dealing with challenging language understanding and generation tasks in the form of conversations. This paper briefly provides an overview on the history, status quo and potential future development of ChatGPT, helping to provide an entry point to think about ChatGPT. Specifically, from the limited open-accessed resources, we conclude the core techniques of ChatGPT, mainly including large-scale language models, in-context learning, reinforcement learning from human feedback and the key technical steps for developing Chat-GPT. We further analyze the pros and cons of ChatGPT and we rethink the duality of ChatGPT in various fields. Although it has been widely acknowledged that ChatGPT brings plenty of opportunities for various fields, mankind should still treat and use ChatGPT properly to avoid the potential threat, e.g., academic integrity and safety challenge. Finally, we discuss several open problems as the potential development of ChatGPT.
Images captured in foggy weather conditions often suffer from bad visibility. In this paper, we propose an efficient regularization method to remove hazes from a single input image. Our method benefits much from an exploration on the inherent boundary constraint on the transmission function. This constraint, combined with a weighted L_1-norm based contextual regularization, is modeled into an optimization problem to estimate the unknown scene transmission. A quite efficient algorithm based on variable splitting is also presented to solve the problem. The proposed method requires only a few general assumptions and can restore a high-quality haze-free image with faithful colors and fine image details. Experimental results on a variety of haze images demonstrate the effectiveness and efficiency of the proposed method.
The human brain has been described as a large, sparse, complex network characterized by efficient small-world properties, which assure that the brain generates and integrates information with high efficiency. Many previous neuroimaging studies have provided consistent evidence of 'dysfunctional connectivity' among the brain regions in schizophrenia; however, little is known about whether or not this dysfunctional connectivity causes disruption of the topological properties of brain functional networks. To this end, we investigated the topological properties of human brain functional networks derived from resting-state functional magnetic resonance imaging (fMRI). Data was obtained from 31 schizophrenia patients and 31 healthy subjects; then functional connectivity between 90 cortical and sub-cortical regions was estimated by partial correlation analysis and thresholded to construct a set of undirected graphs. Our findings demonstrated that the brain functional networks had efficient small-world properties in the healthy subjects; whereas these properties were disrupted in the patients with schizophrenia. Brain functional networks have efficient small-world properties which support efficient parallel information transfer at a relatively low cost. More importantly, in patients with schizophrenia the small-world topological properties are significantly altered in many brain regions in the prefrontal, parietal and temporal lobes. These findings are consistent with a hypothesis of dysfunctional integration of the brain in this illness. Specifically, we found that these altered topological measurements correlate with illness duration in schizophrenia. Detection and estimation of these alterations could prove helpful for understanding the pathophysiological mechanism as well as for evaluation of the severity of schizophrenia.
Digital watermarking has been proposed as a solution to the problem of copyright protection of multimedia documents in networked environments. There are two important issues that watermarking algorithms need to address. First, watermarking schemes are required to provide trustworthy evidence for protecting rightful ownership. Second, good watermarking schemes should satisfy the requirement of robustness and resist distortions due to common image manipulations (such as filtering, compression, etc.). In this paper, we propose a novel watermarking algorithm based on singular value decomposition (SVD). Analysis and experimental results show that the new watermarking method performs well in both security and robustness.
Medical imaging can assess the tumor and its environment in their entirety, which makes it suitable for monitoring the temporal and spatial characteristics of the tumor. Progress in computational methods, especially in artificial intelligence for medical image process and analysis, has converted these images into quantitative and minable data associated with clinical events in oncology management. This concept was first described as radiomics in 2012. Since then, computer scientists, radiologists, and oncologists have gravitated towards this new tool and exploited advanced methodologies to mine the information behind medical images. On the basis of a great quantity of radiographic images and novel computational technologies, researchers developed and validated radiomic models that may improve the accuracy of diagnoses and therapy response assessments. Here, we review the recent methodological developments in radiomics, including data acquisition, tumor segmentation, feature extraction, and modelling, as well as the rapidly developing deep learning technology. Moreover, we outline the main applications of radiomics in diagnosis, treatment planning and evaluations in the field of oncology with the aim of developing quantitative and personalized medicine. Finally, we discuss the challenges in the field of radiomics and the scope and clinical applicability of these methods.
With an increasing emphasis on security, automated personal identification based on biometrics has been receiving extensive attention over the past decade. Iris recognition, as an emerging biometric recognition approach, is becoming a very active topic in both research and practical applications. In general, a typical iris recognition system includes iris imaging, iris liveness detection, and recognition. This paper focuses on the last issue and describes a new scheme for iris recognition from an image sequence. We first assess the quality of each image in the input sequence and select a clear iris image from such a sequence for subsequent recognition. A bank of spatial filters, whose kernels are suitable for iris recognition, is then used to capture local characteristics of the iris so as to produce discriminating texture features. Experimental results show that the proposed method has an encouraging performance. In particular, a comparative study of existing methods for iris recognition is conducted on an iris image database including 2,255 sequences from 213 subjects. Conclusions based on such a comparison using a nonparametric statistical method (the bootstrap) provide useful information for further research.
Various hand-crafted features and metric learning methods prevail in the field of person re-identification. Compared to these methods, this paper proposes a more general way that can learn a similarity metric from image pixels directly. By using a "siamese" deep neural network, the proposed method can jointly learn the color feature, texture feature and metric in a unified framework. The network has a symmetry structure with two sub-networks which are connected by a cosine layer. Each sub network includes two convolutional layers and a full connected layer. To deal with the big variations of person images, binomial deviance is used to evaluate the cost between similarities and labels, which is proved to be robust to outliers. Experiments on VIPeR illustrate the superior performance of our method and a cross database experiment also shows its good generalization.
The skeleton data have been widely used for the action recognition tasks since they can robustly accommodate dynamic circumstances and complex backgrounds. In existing methods, both the joint and bone information in skeleton data have been proved to be of great help for action recognition tasks. However, how to incorporate these two types of data to best take advantage of the relationship between joints and bones remains a problem to be solved. In this work, we represent the skeleton data as a directed acyclic graph based on the kinematic dependency between the joints and bones in the natural human body. A novel directed graph neural network is designed specially to extract the information of joints, bones and their relations and make prediction based on the extracted features. In addition, to better fit the action recognition task, the topological structure of the graph is made adaptive based on the training process, which brings notable improvement. Moreover, the motion information of the skeleton sequence is exploited and combined with the spatial information to further enhance the performance in a two-stream framework. Our final model is tested on two large-scale datasets, NTU-RGBD and Skeleton-Kinetics, and exceeds state-of-the-art performance on both of them.
Parallel control and management have been proposed as a new mechanism for conducting operations of complex systems, especially those that involved complexity issues of both engineering and social dimensions, such as transportation systems. This paper presents an overview of the background, concepts, basic methods, major issues, and current applications of Parallel transportation Management Systems (PtMS). In essence, parallel control and management is a data-driven approach for modeling, analysis, and decision-making that considers both the engineering and social complexity in its processes. The developments and applications described here clearly indicate that PtMS is effective for use in networked complex traffic systems and is closely related to emerging technologies in cloud computing, social computing, and cyberphysical-social systems. A description of PtMS system architectures, processes, and components, including OTSt, Dyna CAS, aDAPTS, iTOP, and TransWorld is presented and discussed. Finally, the experiments and examples of real-world applications are illustrated and analyzed.
Electroencephalogram (EEG) plays an important role in identifying brain activity and behavior. However, the recorded electrical activity always be contaminated with artifacts and then affect the analysis of EEG signal. Hence, it is essential to develop methods to effectively detect and extract the clean EEG data during encephalogram recordings. Several methods have been proposed to remove artifacts, but the research on artifact removal continues to be an open problem. This paper tends to review the current artifact removal of various contaminations. We first discuss the characteristics of EEG data and the types of different artifacts. Then, a general overview of the state-of-the-art methods and their detail analysis are presented. Lastly, a comparative analysis is provided for choosing a suitable methods according to particular application.
Image forensics has now raised the anxiety of justice as increasing cases of abusing tampered images in newspapers and court for evidence are reported recently. With the goal of verifying image content authenticity, passive-blind image tampering detection is called for. More realistic open benchmark databases are also needed to assist the techniques. Recently, we collect a natural color image database with realistic tampering operations. The database is made publicly available for researchers to compare and evaluate their proposed tampering detection techniques. We call this database CASI-A Image Tampering Detection Evaluation Database. We describe the purpose, the design criterion, the organization and self-evaluation of this database in this paper.
This note investigates the finite-time attitude control problems for a single spacecraft and multiple spacecraft. First of all, a finite-time controller is designed to solve finite-time attitude tracking problem for a single spacecraft. Rigorous proof shows that the desired attitude can be tracked in finite time in the absence of disturbances. In the presence of disturbances, the tracking errors can reach a region around the origin in finite time. Then, based on the neighbor rule, a distributed finite-time attitude control law is proposed for a group of spacecraft with a leader-follower architecture. Under the finite-time control law, the attitude synchronization can be achieved in finite time.