RIKEN Center for Advanced Intelligence Project
Facility, Tokyo, Japan
Research output, citation impact, and the most-cited recent papers from RIKEN Center for Advanced Intelligence Project (Japan). Aggregated across the NobleBlocks index of 300M+ scholarly works.
Top-cited papers from RIKEN Center for Advanced Intelligence Project
Classification and identification of the materials lying over or beneath the earth's surface have long been a fundamental but challenging research topic in geoscience and remote sensing (RS), and have garnered growing attention owing to the recent advancements of deep learning techniques. Although deep networks have been successfully applied in single-modality-dominated classification tasks, their performance inevitably hits a bottleneck in complex scenes that need to be finely classified, due to the limitation of information diversity. In this work, we provide a baseline solution to the aforementioned difficulty by developing a general multimodal deep learning (MDL) framework. In particular, we also investigate a special case of multi-modality learning (MML), namely cross-modality learning (CML), which exists widely in RS image classification applications. By focusing on “what,” “where,” and “how” to fuse, we show different fusion strategies as well as how to train deep networks and build the network architecture. Specifically, five fusion architectures are introduced and developed, and are further unified in our MDL framework. More significantly, our framework is not limited to pixel-wise classification tasks but is also applicable to spatial information modeling with convolutional neural networks (CNNs). To validate the effectiveness and superiority of the MDL framework, extensive experiments related to the settings of MML and CML are conducted on two different multimodal RS data sets. Furthermore, the codes and data sets will be available at https://github.com/danfenghong/IEEE_TGRS_MDL-RS, contributing to the RS community.
Hyperspectral imagery collected from airborne or satellite sources inevitably suffers from spectral variability, making it difficult for spectral unmixing to accurately estimate abundance maps. The classical unmixing model, the linear mixing model (LMM), generally fails to handle this sticky issue effectively. To this end, we propose a novel spectral mixture model, called the augmented linear mixing model (ALMM), to address spectral variability by applying a data-driven learning strategy in inverse problems of hyperspectral unmixing. The proposed approach models the main spectral variability (i.e., scaling factors) generated by variations in illumination or topography separately by means of the endmember dictionary. It then models other spectral variabilities caused by environmental conditions (e.g., local temperature and humidity, atmospheric effects) and instrumental configurations (e.g., sensor noise), as well as material nonlinear mixing effects, by introducing a spectral variability dictionary. To effectively run the data-driven learning strategy, we also propose a reasonable prior for the spectral variability dictionary, whose atoms are assumed to be low-coherent with the spectral signatures of endmembers, which leads to a well-known low-coherence dictionary learning problem. Thus, a dictionary learning technique is embedded in the framework of spectral unmixing so that the algorithm can learn the spectral variability dictionary and estimate the abundance maps simultaneously. Extensive experiments on synthetic and real datasets are performed to demonstrate the superiority and effectiveness of the proposed method in comparison with previous state-of-the-art methods.
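As a rough illustration of the two models named in the abstract above, the mixing equations might be written as follows. This is only a sketch: the symbols (pixel spectrum y, endmember dictionary A, abundance vector x, scaling factor s, spectral variability dictionary E with coefficients b, residual r) are notation assumed here, not taken from the abstract, and the last condition is just one simple way to express low coherence between the two dictionaries.

```latex
% Classical linear mixing model (LMM) for a single pixel spectrum y
\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{r},
\qquad \mathbf{x} \ge \mathbf{0}, \quad \mathbf{1}^{\top}\mathbf{x} = 1

% Augmented form: a scaled endmember term plus a spectral variability term
\mathbf{y} = s\,\mathbf{A}\mathbf{x} + \mathbf{E}\mathbf{b} + \mathbf{r},
\qquad \mathbf{A}^{\top}\mathbf{E} \approx \mathbf{0}
```

Under this reading, the scaling factor s absorbs illumination and topography effects, while the term Eb collects the remaining variability.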
The discovery and development of catalysts and catalytic processes are essential components to maintaining an ecological balance in the future. Recent revolutions made in data science could have a great impact on traditional catalysis research in both industry and academia and could accelerate the development of catalysts. Machine learning (ML), a subfield of data science, can play a central role in this paradigm shift away from the use of traditional approaches. In this review, we present a user's guide for ML that we believe will be helpful for scientists performing research in the field of catalysis and summarize recent progress that has been made in utilizing ML to create homogeneous and heterogeneous catalysts. The focus of the review is on the design, synthesis, and characterization of catalytic materials/compounds as well as their applications to catalyzed processes. The ML technique not only enhances ways to discover catalysts but also serves as a powerful tool to establish a deeper understanding of relationships between the properties of materials/compounds and their catalytic activities, selectivities, and stabilities. This knowledge facilitates the establishment of principles employed to design catalysts and to enhance their efficiencies. Despite such advantages of ML, it is noteworthy that the current ML-assisted development of real catalysts remains in its infancy, mainly because of the complexity of catalysis associated with the fact that catalysis is a time-dependent dynamic event. In this review, we discuss how seamless integration of experiment, theory, and data science can be used to accelerate catalyst development and to guide future studies aimed at applications that will impact society's need to produce energy, materials, and chemicals. Moreover, the limitations and difficulties of ML in catalysis research originating from the complex nature of catalysis are discussed in order to make the catalysis community aware of challenges that need to be addressed for effective and practical use of ML in the field.
Entity representations are useful in natural language tasks involving entities. In this paper, we propose new pretrained contextualized representations of words and entities based on the bidirectional transformer. The proposed model treats words and entities in a given text as independent tokens, and outputs contextualized representations of them. Our model is trained using a new pretraining task based on the masked language model of BERT (Devlin et al., 2019). The task involves predicting randomly masked words and entities in a large entity-annotated corpus retrieved from Wikipedia. We also propose an entity-aware self-attention mechanism that is an extension of the self-attention mechanism of the transformer, and considers the types of tokens (words or entities) when computing attention scores. The proposed model achieves impressive empirical performance on a wide range of entity-related tasks. In particular, it obtains state-of-the-art results on five well-known datasets: Open Entity (entity typing), TACRED (relation classification), CoNLL-2003 (named entity recognition), ReCoRD (cloze-style question answering), and SQuAD 1.1 (extractive question answering). Our source code and pretrained representations are available at
Cognitive developmental robotics (CDR) aims to provide new understanding of how humans' higher cognitive functions develop by means of a synthetic approach that developmentally constructs cognitive functions. The core idea of CDR is “physical embodiment” that enables information structuring through interactions with the environment, including other agents. The idea is shaped based on the hypothesized development model of human cognitive functions from body representation to social behavior. Along with the model, studies of CDR and related works are introduced, and the model and future issues are discussed.
Along with the advancement of several emerging computing paradigms and technologies, such as cloud computing, mobile computing, artificial intelligence, and big data, Internet of Things (IoT) technologies have been applied in a variety of fields. In particular, the Internet of Healthcare Things (IoHT) is becoming increasingly important in human activity recognition (HAR) due to the rapid development of wearable and mobile devices. In this article, we focus on the deep-learning-enhanced HAR in IoHT environments. A semisupervised deep learning framework is designed and built for more accurate HAR, which efficiently uses and analyzes the weakly labeled sensor data to train the classifier learning model. To better solve the problem of the inadequately labeled sample, an intelligent autolabeling scheme based on deep Q-network (DQN) is developed with a newly designed distance-based reward rule which can improve the learning efficiency in IoT environments. A multisensor based data fusion mechanism is then developed to seamlessly integrate the on-body sensor data, context sensor data, and personal profile data together, and a long short-term memory (LSTM)-based classification method is proposed to identify fine-grained patterns according to the high-level features contextually extracted from the sequential motion data. Finally, experiments and evaluations are conducted to demonstrate the usefulness and effectiveness of the proposed method using real-world data.
Primary open-angle glaucoma (POAG) is a heritable common cause of blindness worldwide. To identify risk loci, we conduct a large multi-ethnic meta-analysis of genome-wide association studies on a total of 34,179 cases and 349,321 controls, identifying 44 previously unreported risk loci and confirming 83 loci that were previously known. The majority of loci have broadly consistent effects across European, Asian and African ancestries. Cross-ancestry data improve fine-mapping of causal variants for several loci. Integration of multiple lines of genetic evidence supports the functional relevance of the identified POAG risk loci and highlights potential contributions of several genes to POAG pathogenesis, including SVEP1, RERE, VCAM1, ZNF638, CLIC5, SLC2A12, YAP1, MXRA5, and SMAD6. Several drug compounds targeting POAG risk genes may be potential glaucoma therapeutic candidates.
Deep learning (DL) and reinforcement learning (RL) methods seem to be indispensable for achieving human-level or super-human AI systems. On the other hand, both DL and RL have strong connections with our brain functions and with neuroscientific findings. In this review, we summarize talks and discussions in the "Deep Learning and Reinforcement Learning" session of the International Symposium on Artificial Intelligence and Brain Science. In this session, we discussed whether we can achieve a comprehensive understanding of human intelligence based on the recent advances of deep learning and reinforcement learning algorithms. Speakers gave talks about their recent studies that can be key technologies for achieving human-level intelligence.
The rapidly developed Health 2.0 technology has provided people with more opportunities to conduct online medical consultation than ever before. Understanding contexts within different online medical communications and activities becomes a significant issue to facilitate patients' medical decision-making process. As a subcategory of machine learning, neural networks have drawn increasing attention in natural language processing applications. In this article, we focus on modeling and analyzing the patient-physician-generated data based on an integrated CNN-RNN framework, in order to deal with the situation that patients' online inquiries are usually not very long. A so-called DP-CRNN algorithm is developed with a newly designed neural network structure, to extract and highlight the combination of semantic and sequential features of patients' inquiries. An intelligent recommendation method is then proposed to provide patients with automatic clinic guidance and pre-diagnosis suggestions, in which a clustering mechanism is utilized to refine the learning process with more precise diagnosis scope and more representative features. Experiments based on the collected real-world data demonstrate the effectiveness of our proposed model and method for intelligent pre-diagnosis service in online medical environments.
This paper considers the problem of single image depth estimation. The employment of convolutional neural networks (CNNs) has recently brought about significant advancements in the research of this problem. However, most existing methods suffer from loss of spatial resolution in the estimated depth maps; a typical symptom is distorted and blurry reconstruction of object boundaries. In this paper, toward more accurate estimation with a focus on depth maps with higher spatial resolution, we propose two improvements to existing approaches. One is about the strategy of fusing features extracted at different scales, for which we propose an improved network architecture consisting of four modules: an encoder, decoder, multi-scale feature fusion module, and refinement module. The other is about loss functions for measuring inference errors used in training. We show that three loss terms, which measure errors in depth, gradients and surface normals, respectively, contribute to improvement of accuracy in a complementary fashion. Experimental results show that these two improvements make it possible to attain higher accuracy than the current state of the art, which is given by finer resolution reconstruction, for example, with small objects and object boundaries.
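A combined training loss built from the three error terms named in the abstract above might look like the following. This is only a sketch: the per-term definitions (plain L1 penalties on depth and gradient errors, a cosine penalty on surface normals), the weights λ and μ, and the symbols d_i (predicted depth), g_i (ground-truth depth) and n (number of pixels) are all assumptions for illustration, since the abstract does not give the formulas.

```latex
% Total loss: depth term plus weighted gradient and normal terms
L = l_{\text{depth}} + \lambda\, l_{\text{grad}} + \mu\, l_{\text{normal}}

% Depth error and its spatial gradients, with e_i = d_i - g_i
l_{\text{depth}} = \frac{1}{n}\sum_{i=1}^{n} \lvert e_i \rvert,
\qquad
l_{\text{grad}} = \frac{1}{n}\sum_{i=1}^{n}
  \bigl( \lvert \nabla_x e_i \rvert + \lvert \nabla_y e_i \rvert \bigr)

% Surface normals derived from each depth map, compared by cosine similarity
\mathbf{n}_i^{d} = \bigl[-\nabla_x d_i,\; -\nabla_y d_i,\; 1\bigr]^{\top},
\qquad
\mathbf{n}_i^{g} = \bigl[-\nabla_x g_i,\; -\nabla_y g_i,\; 1\bigr]^{\top}

l_{\text{normal}} = \frac{1}{n}\sum_{i=1}^{n}
  \left( 1 - \frac{\langle \mathbf{n}_i^{d}, \mathbf{n}_i^{g} \rangle}
                  {\lVert \mathbf{n}_i^{d} \rVert\, \lVert \mathbf{n}_i^{g} \rVert} \right)
```

The intuition for using all three is that the depth term penalizes overall offsets, the gradient term penalizes blurred edges, and the normal term penalizes small surface-orientation errors that the other two weigh lightly.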
Most existing CNN-based super-resolution (SR) methods are developed based on an assumption that the degradation is fixed and known (e.g., bicubic downsampling). However, these methods suffer a severe performance drop when the real degradation is different from their assumption. To handle various unknown degradations in real-world applications, previous methods rely on degradation estimation to reconstruct the SR image. Nevertheless, degradation estimation methods are usually time-consuming and may lead to SR failure due to large estimation errors. In this paper, we propose an unsupervised degradation representation learning scheme for blind SR without explicit degradation estimation. Specifically, we learn abstract representations to distinguish various degradations in the representation space rather than explicit estimation in the pixel space. Moreover, we introduce a Degradation-Aware SR (DASR) network with flexible adaption to various degradations based on the learned representations. It is demonstrated that our degradation representation learning scheme can extract discriminative representations to obtain accurate degradation information. Experiments on both synthetic and real images show that our network achieves state-of-the-art performance for the blind SR task. Code is available at: https://github.com/LongguangWang/DASR.
With the increasing popularity of Industry 4.0, industrial big data (IBD) has become a hotly discussed topic in the digital and intelligent industry field. The security problem in signal processing on large-scale data streams is still a challenging issue in the industrial internet of things, especially when dealing with high-dimensional anomaly detection for intelligent industrial applications. In this article, to mitigate the inconsistency between dimensionality reduction and feature retention in imbalanced IBD, we propose a variational long short-term memory (VLSTM) learning model for intelligent anomaly detection based on reconstructed feature representation. An encoder-decoder neural network associated with a variational reparameterization scheme is designed to learn the low-dimensional feature representation from high-dimensional raw data. Three loss functions are defined and quantified to constrain the reconstructed hidden variable into a more explicit and meaningful form. A lightweight estimation network is then fed with the refined feature representation to identify anomalies in IBD. Experiments using a public IBD dataset named UNSW-NB15 demonstrate that the proposed VLSTM model can efficiently cope with imbalance and high-dimensional issues, and significantly improve the accuracy and reduce the false alarm rate in anomaly detection for IBD according to F1, area under curve (AUC), and false alarm rate (FAR).
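Two of the evaluation metrics named in the abstract above, F1 and false alarm rate, can be computed from binary predictions as in the minimal sketch below. It assumes label 1 means "anomaly" and uses one common definition of FAR, FP / (FP + TN); the function names and the toy labels are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), treating label 1 as 'anomaly' (an assumption)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def false_alarm_rate(fp: int, tn: int) -> float:
    """FAR as FP / (FP + TN), i.e. the share of normal samples flagged as anomalies."""
    denom = fp + tn
    return fp / denom if denom else 0.0


# Toy, imbalanced example: 3 anomalies among 10 samples (made-up data).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(f"F1  = {f1_score(tp, fp, fn):.3f}")
print(f"FAR = {false_alarm_rate(fp, tn):.3f}")
```

On imbalanced data such as this, plain accuracy can look high even when anomalies are missed, which is one reason F1 and FAR are commonly reported alongside it.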
Modern Internet-of-Things (IoT) applications are heavily data driven and often require reliable data streams to achieve high-quality data mining. The concept of edge computing is introduced to reduce data latency and communication bandwidth between the cloud server and IoT edge devices. However, inefficient routing that may cause transmission failure or unnecessary data (re)transmission is still a key obstacle to obtain good and reliable data mining results. In this article, network coding combined with opportunistic routing is used to improve energy efficiency in wireless IoT infrastructure, considering the existence of link correlation. Studies have shown that packet receptions on wireless links are correlated, which is completely contrary to the assumption of link independence used in existing routing mechanisms. This assumption causes estimation errors in the calculation of expected number of transmissions for forwarders, which further affects the selection of forwarder set, and ultimately affects the performance of the protocol. We propose an intrasession network coding mechanism based on the mining of link correlation. A novel smart routing method is proposed to accurately estimate the number of transmissions required by forwarders, together with an algorithm for selecting a forwarder set with more optimal number of transmissions. Simulation results demonstrate that the proposed mechanism can achieve fewer transmissions and offer more energy-efficient communications for wireless edge IoT applications.
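The effect of the link-independence assumption described in the abstract above can be illustrated with a toy calculation. The sketch below is not the paper's estimator: it assumes per-packet reception traces (1 = received, 0 = lost) for each forwarder, treats each broadcast as an independent trial so that the expected number of transmissions is 1/p, and uses hypothetical function names and made-up traces.

```python
from typing import List


def independent_estimate(traces: List[List[int]]) -> float:
    """P(at least one forwarder receives a packet), assuming independent links."""
    n = len(traces[0])
    p_all_miss = 1.0
    for trace in traces:
        p_recv = sum(trace) / n          # per-link reception ratio
        p_all_miss *= 1.0 - p_recv       # product is valid only under independence
    return 1.0 - p_all_miss


def empirical_estimate(traces: List[List[int]]) -> float:
    """The same probability measured directly from the joint reception trace."""
    n = len(traces[0])
    hits = sum(1 for k in range(n) if any(trace[k] for trace in traces))
    return hits / n


def expected_transmissions(p_success: float) -> float:
    """Expected broadcasts until at least one forwarder receives (geometric model)."""
    return float("inf") if p_success <= 0 else 1.0 / p_success


# Two forwarders whose receptions are perfectly correlated (made-up traces).
traces = [
    [1, 1, 0, 0, 1, 0, 1, 0],
    [1, 1, 0, 0, 1, 0, 1, 0],
]

print("independent:", expected_transmissions(independent_estimate(traces)))
print("empirical:  ", expected_transmissions(empirical_estimate(traces)))
```

With these traces, the independence assumption predicts fewer transmissions than the joint trace supports, which is the kind of estimation error the abstract attributes to ignoring link correlation.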
Graph structured data has wide applicability in various domains such as physics, chemistry, biology, computer vision, and social networks, to name a few. Recently, graph neural networks (GNN) were shown to be successful in effectively representing graph structured data because of their good performance and generalization ability. However, explaining the effectiveness of GNN models is a challenging task because of the complex nonlinear transformations made over the iterations. In this paper, we propose GraphLIME, a local interpretable model explanation for graphs using the Hilbert-Schmidt Independence Criterion (HSIC) Lasso, which is a nonlinear feature selection method. GraphLIME is a generic GNN-model explanation framework that learns a nonlinear interpretable model locally in the subgraph of the node being explained. Through experiments on two real-world datasets, the explanations of GraphLIME are found to be of extraordinary degree and more descriptive in comparison to the existing explanation methods.
With the increasing popularity of Industry 4.0, both AI and smart techniques have been applied and become hotly discussed topics in industrial cyber-physical systems (CPS). Intelligent anomaly detection for identifying cyber-physical attacks to guarantee work efficiency and safety is still a challenging issue, especially when dealing with little labeled data for cyber-physical security protection. In this article, we propose a few-shot learning model with Siamese convolutional neural network (FSL-SCNN), to alleviate the over-fitting issue and enhance the accuracy for intelligent anomaly detection in industrial CPS. A Siamese CNN encoding network is constructed to measure distances of input samples based on their optimized feature representations. A robust cost function design including three specific losses is then proposed to enhance the efficiency of the training process. Finally, an intelligent anomaly detection algorithm is developed. Experimental results based on a fully labeled public dataset and a dataset with few labels demonstrate that our proposed FSL-SCNN can significantly improve false alarm rate (FAR) and F1 scores when detecting intrusion signals for industrial CPS security protection.
Advances in machine learning (ML) and artificial intelligence (AI) present an opportunity to build better tools and solutions to help address some of the world's most pressing challenges, and deliver positive social impact in accordance with the priorities outlined in the United Nations' 17 Sustainable Development Goals (SDGs). The AI for Social Good (AI4SG) movement aims to establish interdisciplinary partnerships centred around AI applications towards SDGs. We provide a set of guidelines for establishing successful long-term collaborations between AI researchers and application-domain experts, relate them to existing AI4SG projects and identify key opportunities for future AI applications targeted towards social good.
This paper presents the scientific outcomes of the 2018 Data Fusion Contest organized by the Image Analysis and Data Fusion Technical Committee of the IEEE Geoscience and Remote Sensing Society. The 2018 Contest addressed the problem of urban observation and monitoring with advanced multi-source optical remote sensing (multispectral LiDAR, hyperspectral imaging, and very high-resolution imagery). The competition was based on urban land use and land cover classification, aiming to distinguish between very diverse and detailed classes of urban objects, materials, and vegetation. Besides data fusion, it also quantified the respective assets of the novel sensors used to collect the data. Participants proposed elaborate approaches rooted in remote-sensing, and also in machine learning and computer vision, to make the most of the available data. Winning approaches combine convolutional neural networks with subtle earth-observation data scientist expertise.
Along with the rapid development of cloud computing, IoT, and AI technologies, cloud video surveillance (CVS) has become a hotly discussed topic, especially when facing the requirement of real-time analysis in smart applications. Object detection usually plays an important role in environment monitoring and activity tracking in surveillance systems. The emerging edge-cloud computing paradigm provides us an opportunity to deal with the continuously generated huge amount of surveillance data in an on-site manner across IoT systems. However, the detection performance is still far from satisfactory due to the complex surveillance environment. In this study, we focus on multitarget detection for real-time surveillance in smart IoT systems. A newly designed deep neural network model called A-YONet, which is constructed by combining the advantages of YOLO and MTCNN, is proposed to be deployed in an end-edge-cloud surveillance system, in order to realize lightweight training and feature learning with limited computing resources. An intelligent detection algorithm is then developed based on a preadjusting scheme of anchor box and a multilevel feature fusion mechanism. Experiments and evaluations using two data sets, including one public data set and one homemade data set obtained in a real surveillance system, demonstrate the effectiveness of our proposed method in enhancing training efficiency and detection precision, especially for multitarget detection in smart IoT application developments.
Existing enhancement methods are empirically expected to help the high-level end computer vision task; however, that is observed to not always be the case in practice. We focus on object or face detection in poor-visibility environments caused by bad weather (haze, rain) and low-light conditions. To provide a more thorough examination and fair comparison, we introduce three benchmark sets collected in real-world hazy, rainy, and low-light conditions, respectively, with annotated objects/faces. We launched the UG2+ challenge Track 2 competition in IEEE CVPR 2019, aiming to evoke a comprehensive discussion and exploration about whether and how low-level vision techniques can benefit high-level automatic visual recognition in various scenarios. To the best of our knowledge, this is the first and currently largest effort of its kind. Baseline results by cascading existing enhancement and detection models are reported, indicating the highly challenging nature of our new data as well as the large room for further technical innovations. Thanks to broad participation from the research community, we are able to analyze representative team solutions, striving to better identify the strengths and limitations of existing mindsets as well as the future directions.
The mangrove ecosystem plays a vital role in the global carbon cycle, by reducing greenhouse gas emissions and mitigating the impacts of climate change. However, mangroves have been lost worldwide, resulting in substantial carbon stock losses. Additionally, some aspects of the mangrove ecosystem remain poorly characterized compared to other forest ecosystems due to practical difficulties in measuring and monitoring mangrove biomass and their carbon stocks. Without a quantitative method for effectively monitoring biophysical parameters and carbon stocks in mangroves, robust policies and actions for sustainably conserving mangroves in the context of climate change mitigation and adaptation are more difficult. In this context, remote sensing provides an important tool for monitoring mangroves and identifying attributes such as species, biomass, and carbon stocks. A wide range of studies is based on optical imagery (aerial photography, multispectral, and hyperspectral) and synthetic aperture radar (SAR) data. Remote sensing approaches have been proven effective for mapping mangrove species, estimating their biomass, and assessing changes in their extent. This review provides an overview of the techniques that are currently being used to map various attributes of mangroves, summarizes the studies that have been undertaken since 2010 on a variety of remote sensing applications for monitoring mangroves, and addresses the limitations of these studies. We see several key future directions for the potential use of remote sensing techniques combined with machine learning techniques for mapping mangrove areas and species, and evaluating their biomass and carbon stocks.