Institute of Information Science, Academia Sinica
Facility · Taipei, Taiwan
Research output, citation impact, and the most-cited recent papers from Institute of Information Science, Academia Sinica (Taiwan). Aggregated across the NobleBlocks index of 300M+ scholarly works.
Top-cited papers from Institute of Information Science, Academia Sinica
Real-time object detection is one of the most important research topics in computer vision. As new approaches to architecture optimization and training optimization are continually being developed, we have found two research topics that have emerged when dealing with these latest state-of-the-art methods. To address the topics, we propose a trainable bag-of-freebies oriented solution. We combine the flexible and efficient training tools with the proposed architecture and the compound scaling method. YOLOv7 surpasses all known object detectors in both speed and accuracy in the range from 5 FPS to 120 FPS and has the highest accuracy, 56.8% AP, among all known real-time object detectors with 30 FPS or higher on GPU V100. Source code is released at https://github.com/WongKinYiu/yolov7.
BACKGROUND: Networks are a useful way of presenting many types of biological data, including protein-protein interactions, gene regulations, cellular pathways, and signal transductions. Measuring nodes by their network features lets us infer their importance in the network and helps identify central elements of biological networks. RESULTS: We introduce a novel Cytoscape plugin, cytoHubba, for ranking nodes in a network by their network features. CytoHubba provides 11 topological analysis methods: Degree, Edge Percolated Component, Maximum Neighborhood Component, Density of Maximum Neighborhood Component, Maximal Clique Centrality (MCC), and six centralities (Bottleneck, EcCentricity, Closeness, Radiality, Betweenness, and Stress) based on shortest paths. Among the eleven methods, the newly proposed method, MCC, performs better in the precision of predicting essential proteins from the yeast PPI network. CONCLUSIONS: CytoHubba provides a user-friendly interface for exploring important nodes in biological networks. It computes all eleven methods in one step. In addition, researchers can combine cytoHubba with other plugins into a novel analysis scheme. The network and sub-networks captured by this topological analysis strategy will lead to new insights on essential regulatory networks and protein drug targets for experimental biologists. According to Cytoscape plugin download statistics, cytoHubba has been downloaded around 6,700 times since 2010.
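As a rough illustration of how a clique-based score can rank nodes differently from plain Degree, the sketch below computes both on a toy graph. The MCC formula used here (the sum of (|C| - 1)! over the maximal cliques C that contain a node) is an assumption taken from the published description of the method, not from the abstract above. The toy graph and all function names are hypothetical, and this is not the plugin's code.

```python
from math import factorial


def build_graph(edges):
    """Build an undirected adjacency map {node: set(neighbours)}."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(clique)
            return
        for v in list(candidates):
            expand(clique | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adj), set())
    return cliques


def degree_scores(adj):
    """Degree: number of direct neighbours."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def mcc_scores(adj):
    """MCC-style score (assumed formula): sum of (|C| - 1)! over maximal cliques."""
    scores = {v: 0 for v in adj}
    for clique in maximal_cliques(adj):
        if len(clique) < 2:
            continue  # skip isolated nodes
        for v in clique:
            scores[v] += factorial(len(clique) - 1)
    return scores


# Toy graph: a 4-clique (a, b, c, d) joined through d to a star centred on e.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d"),
         ("d", "e"), ("e", "f"), ("e", "g"), ("e", "h")]
adj = build_graph(edges)
degree = degree_scores(adj)
mcc = mcc_scores(adj)

for node in sorted(adj, key=lambda v: -mcc[v]):
    print(node, "degree =", degree[node], "mcc =", mcc[node])
```

In this toy graph the star centre `e` has a higher Degree than the clique member `a`, while the clique-based score ranks `a` above `e`, which is the kind of difference a clique-aware ranking is meant to capture.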
Neural networks have enabled state-of-the-art approaches to achieve incredible results on computer vision tasks such as object detection. However, such success greatly relies on costly computation resources, which hinders people with cheap devices from appreciating the advanced technology. In this paper, we propose Cross Stage Partial Network (CSPNet) to mitigate the problem that previous works require heavy inference computations from the network architecture perspective. We attribute the problem to the duplicate gradient information within network optimization. The proposed networks respect the variability of the gradients by integrating feature maps from the beginning and the end of a network stage, which, in our experiments, reduces computations by 20% with equivalent or even superior accuracy on the ImageNet dataset, and significantly outperforms state-of-the-art approaches in terms of AP50 on the MS COCO object detection dataset. The CSPNet is easy to implement and general enough to cope with architectures based on ResNet, ResNeXt, and DenseNet.
MicroRNAs (miRNAs) are small non-coding RNAs of ~22 nucleotides that are involved in negative regulation of mRNA at the post-transcriptional level. Previously, we developed miRTarBase, which provides information about experimentally validated miRNA-target interactions (MTIs). Here, we describe an updated database containing 422 517 curated MTIs from 4076 miRNAs and 23 054 target genes collected from over 8500 articles. The number of MTIs curated by strong evidence has increased ~1.4-fold since the last update in 2016. In this updated version, target sites validated by reporter assay that are available in the literature can be downloaded. The target site sequences can be used to extract new features for analysis via a machine learning approach, which can help to evaluate the performance of miRNA-target prediction tools. Furthermore, additional browsing options make it easier for users to find specific MTIs. With these improvements, miRTarBase serves as a more comprehensively annotated, experimentally validated miRNA-target interaction database in the field of miRNA-related research. miRTarBase is available at http://miRTarBase.mbc.nctu.edu.tw/.
MicroRNAs (miRNAs) are small non-coding RNAs (typically consisting of 18-25 nucleotides) that negatively control expression of target genes at the post-transcriptional level. Owing to the biological significance of miRNAs, miRTarBase was developed to provide comprehensive information on experimentally validated miRNA-target interactions (MTIs). To date, the database has accumulated >13,404 validated MTIs from 11,021 articles from manual curations. In this update, a text-mining system was incorporated to enhance the recognition of MTI-related articles by adopting a scoring system. In addition, a variety of biological databases were integrated to provide information on the regulatory network of miRNAs and its expression in blood. Not only targets of miRNAs but also regulators of miRNAs are provided to users for investigating the up- and downstream regulations of miRNAs. Moreover, the number of MTIs with high-throughput experimental evidence increased remarkably (validated by CLIP-seq technology). In conclusion, these improvements promote the miRTarBase as one of the most comprehensively annotated and experimentally validated miRNA-target interaction databases. The updated version of miRTarBase is now available at http://miRTarBase.cuhk.edu.cn/.
We show that the YOLOv4 object detection neural network based on the CSP approach scales both up and down and is applicable to small and large networks while maintaining optimal speed and accuracy. We propose a network scaling approach that modifies not only the depth, width, and resolution, but also the structure of the network. The YOLOv4-large model achieves state-of-the-art results: 55.5% AP (73.4% AP50) for the MS COCO dataset at a speed of ~16 FPS on Tesla V100, while with test time augmentation, YOLOv4-large achieves 56.0% AP (73.3 AP50). To the best of our knowledge, this is currently the highest accuracy on the COCO dataset among any published work. The YOLOv4-tiny model achieves 22.0% AP (42.0% AP50) at a speed of ~443 FPS on RTX 2080Ti, while by using TensorRT, batch size = 4 and FP16-precision the YOLOv4-tiny achieves 1774 FPS.
Travel time is a fundamental measure in transportation. Accurate travel-time prediction is also crucial to the development of intelligent transportation systems and advanced traveler information systems. We apply support vector regression (SVR) for travel-time prediction and compare its results to other baseline travel-time prediction methods using real highway traffic data. Since support vector machines have greater generalization ability and guarantee global minima for given training data, it is believed that SVR will perform well for time series analysis. Compared to other baseline predictors, our results show that the SVR predictor can significantly reduce both relative mean errors and root-mean-squared errors of predicted travel times. We demonstrate the feasibility of applying SVR in travel-time prediction and show that SVR is applicable and performs well for traffic data analysis.
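The two error measures named in this abstract can be sketched as follows. The definitions used here are the common textbook ones (mean of absolute errors relative to the actual value, and root of the mean squared error) and may differ in detail from those in the paper. The travel times and both prediction lists are made-up numbers for illustration only, not results from the study.

```python
from math import sqrt


def relative_mean_error(actual, predicted):
    """Mean of |actual - predicted| / actual over all observations."""
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the mean squared difference."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical travel times in minutes (not data from the paper).
actual = [10.0, 20.0, 40.0]
predictor_a = [11.0, 18.0, 40.0]   # stands in for a well-fitted predictor
predictor_b = [20.0, 10.0, 40.0]   # stands in for a weaker baseline

for name, pred in (("A", predictor_a), ("B", predictor_b)):
    rme = relative_mean_error(actual, pred)
    rmse = root_mean_squared_error(actual, pred)
    print(f"predictor {name}: RME = {rme:.4f}, RMSE = {rmse:.4f}")
```

Reporting both measures is useful because the relative error weights short trips more heavily, while the squared error penalizes large absolute misses.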
Receptor-like kinases (RLKs) belong to the large RLK/Pelle gene family, and it is known that the Arabidopsis thaliana genome contains >600 such members, which play important roles in plant growth, development, and defense responses. Surprisingly, we found that rice (Oryza sativa) has nearly twice as many RLK/Pelle members as Arabidopsis does, and it is not simply a consequence of a larger predicted gene number in rice. From the inferred phylogeny of all Arabidopsis and rice RLK/Pelle members, we estimated that the common ancestor of Arabidopsis and rice had >440 RLK/Pelles and that large-scale expansions of certain RLK/Pelle members and fusions of novel domains have occurred in both the Arabidopsis and rice lineages since their divergence. In addition, the extracellular domains have higher nonsynonymous substitution rates than the intracellular domains, consistent with the role of extracellular domains in sensing diverse signals. The lineage-specific expansions in Arabidopsis can be attributed to both tandem and large-scale duplications, whereas tandem duplication seems to be the major mechanism for recent expansions in rice. Interestingly, although the RLKs that are involved in development seem to have rarely been duplicated after the Arabidopsis-rice split, those that are involved in defense/disease resistance apparently have undergone many duplication events. These findings led us to hypothesize that most of the recent expansions of the RLK/Pelle family have involved defense/resistance-related genes.
Rain removal from a video is a challenging problem and has been recently investigated extensively. Nevertheless, the problem of rain removal from a single image was rarely studied in the literature, where no temporal information among successive images can be exploited, making the problem very challenging. In this paper, we propose a single-image-based rain removal framework via properly formulating rain removal as an image decomposition problem based on morphological component analysis. Instead of directly applying a conventional image decomposition technique, the proposed method first decomposes an image into the low- and high-frequency (HF) parts using a bilateral filter. The HF part is then decomposed into a “rain component” and a “nonrain component” by performing dictionary learning and sparse coding. As a result, the rain component can be successfully removed from the image while preserving most original image details. Experimental results demonstrate the efficacy of the proposed algorithm.
Recent studies have demonstrated the important role of plant microRNAs (miRNAs) under nutrient deficiencies. In this study, deep sequencing of Arabidopsis (Arabidopsis thaliana) small RNAs was conducted to reveal miRNAs and other small RNAs that were differentially expressed in response to phosphate (Pi) deficiency. About 3.5 million sequence reads corresponding to 0.6 to 1.2 million unique sequence tags from each Pi-sufficient or Pi-deficient root or shoot sample were mapped to the Arabidopsis genome. We showed that upon Pi deprivation, the expression of miR156, miR399, miR778, miR827, and miR2111 was induced, whereas the expression of miR169, miR395, and miR398 was repressed. We found cross talk coordinated by these miRNAs under different nutrient deficiencies. In addition to miRNAs, we identified one Pi starvation-induced DICER-LIKE1-dependent small RNA derived from the long terminal repeat of a retrotransposon and a group of 19-nucleotide small RNAs corresponding to the 5' end of tRNA and expressed at a high level in Pi-starved roots. Importantly, we observed an increased abundance of TAS4-derived trans-acting small interfering RNAs (ta-siRNAs) in Pi-deficient shoots and uncovered an autoregulatory mechanism of PAP1/MYB75 via miR828 and TAS4-siR81(-) that regulates the biosynthesis of anthocyanin. This finding sheds light on the regulatory network between miRNA/ta-siRNA and its target gene. Of note, a substantial amount of miR399* accumulated under Pi deficiency. Like miR399, miR399* can move across the graft junction, implying a potential biological role for miR399*. This study represents a comprehensive expression profiling of Pi-responsive small RNAs and advances our understanding of the regulation of Pi homeostasis mediated by small RNAs.
Recently, the threat of Android malware has been spreading rapidly, especially repackaged Android malware. Although dynamic analysis can provide a comprehensive view of Android malware, it still incurs a high cost in environment deployment and manual investigation effort. In this study, we propose a static feature-based mechanism that provides a static analysis paradigm for detecting Android malware. The mechanism considers static information, including permissions, deployment of components, Intent message passing, and API calls, to characterize the behavior of Android applications. In order to recognize different intentions of Android malware, different kinds of clustering algorithms can be applied to enhance the malware modeling capability. We leverage the proposed mechanism to develop a system called DroidMat. First, DroidMat extracts the information (e.g., requested permissions and Intent message passing) from each application's manifest file, and regards components (Activity, Service, Receiver) as entry points for tracing API calls related to permissions. Next, it applies the K-means algorithm to enhance the malware modeling capability. The number of clusters is decided by the Singular Value Decomposition (SVD) method on the low-rank approximation. Finally, it uses the kNN algorithm to classify the application as benign or malicious. The experimental results show that the recall rate of our approach is better than that of a well-known tool, Androguard, published at Black Hat 2011, which focuses on Android malware analysis. In addition, DroidMat is efficient, since it takes only half the time Androguard needs to classify 1738 apps as benign or malicious.
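The final classification step described above can be sketched as a plain kNN vote over sets of static features. This is a minimal illustration only: it omits the K-means and SVD stages, the Jaccard distance is an assumed choice that the abstract does not specify, and the tiny labelled set is invented for the example.

```python
from collections import Counter


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| for two feature sets (0.0 for two empty sets)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def knn_classify(sample, labelled, k=3):
    """Return the majority label among the k nearest labelled feature sets."""
    nearest = sorted(labelled, key=lambda item: jaccard_distance(sample, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training data: requested permissions per app, with made-up labels.
training = [
    ({"INTERNET", "SEND_SMS", "READ_CONTACTS", "RECEIVE_BOOT_COMPLETED"}, "malicious"),
    ({"INTERNET", "SEND_SMS", "READ_SMS"}, "malicious"),
    ({"SEND_SMS", "READ_CONTACTS", "READ_SMS"}, "malicious"),
    ({"INTERNET", "CAMERA"}, "benign"),
    ({"INTERNET", "ACCESS_FINE_LOCATION"}, "benign"),
    ({"CAMERA", "ACCESS_FINE_LOCATION"}, "benign"),
]

sms_app = {"INTERNET", "SEND_SMS", "READ_CONTACTS"}
camera_app = {"INTERNET", "CAMERA", "ACCESS_FINE_LOCATION"}

print(knn_classify(sms_app, training))
print(knn_classify(camera_app, training))
```

A set-based distance is convenient here because static features such as permissions are naturally present-or-absent, so no numeric encoding is needed for a first experiment.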
We present a new approach, called local discriminant embedding (LDE), to manifold learning and pattern classification. In our framework, the neighbor and class relations of data are used to construct the embedding for classification problems. The proposed algorithm learns the embedding for the submanifold of each class by solving an optimization problem. After being embedded into a low-dimensional subspace, data points of the same class maintain their intrinsic neighbor relations, whereas neighboring points of different classes no longer stick to one another. Via embedding, new test data are thus more reliably classified by the nearest neighbor rule, owing to the locally discriminating nature. We also describe two useful variants: two-dimensional LDE and kernel LDE. Comprehensive comparisons and extensive experiments on face recognition are included to demonstrate the effectiveness of our method.
This paper shows that a $390 mass-market quad-core 2.4GHz Intel Westmere (Xeon E5620) CPU can create 109000 signatures per second and verify 71000 signatures per second on an elliptic curve at a 2^128 security level. Public keys are 32 bytes, and signatures are 64 bytes. These performance figures include strong defenses against software side-channel attacks: there is no data flow from secret keys to array indices, and there is no data flow from secret keys to branch conditions.
This paper explores the possibility of using a large-scale array of microprocessors as a computational facility for the execution of massive numerical computations with a high degree of parallelism. By microprocessor we mean a processor realized on one or a few semiconductor chips that include arithmetic and logical facilities and some memory. The current state of LSI technology makes this approach a feasible and attractive candidate for use in a macrocomputer facility.
Nineteen teams presented results for the Gene Mention Task at the BioCreative II Workshop. In this task participants designed systems to identify substrings in sentences corresponding to gene name mentions. A variety of different methods were used and the results varied with a highest achieved F1 score of 0.8721. Here we present brief descriptions of all the methods used and a statistical analysis of the results. We also demonstrate that, by combining the results from all submissions, an F score of 0.9066 is feasible, and furthermore that the best result makes use of the lowest scoring submissions.
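For readers unfamiliar with the metric, the sketch below shows how an F1 score is computed for span identification. It assumes exact-match scoring of (start, end) character spans, which is a simplification and not necessarily the workshop's official scoring rule. The spans are invented for the example.

```python
def precision_recall_f1(gold, predicted):
    """Score predicted spans against gold spans using exact matches."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Hypothetical (start, end) character offsets of gene mentions in one sentence.
gold_spans = [(0, 4), (10, 15), (20, 27), (30, 33)]
predicted_spans = [(0, 4), (10, 15), (21, 27)]   # last span has a boundary error

precision, recall, f1 = precision_recall_f1(gold_spans, predicted_spans)
print(f"precision = {precision:.4f}, recall = {recall:.4f}, F1 = {f1:.4f}")
```

Because F1 is a harmonic mean, a system cannot reach a high score by trading almost all of its recall for precision, or the reverse.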
Because of physical constraints, the optimum control of industrial robots is a difficult problem. An alternative approach is to divide the problem into two parts: optimum path planning for off-line processing followed by on-line path tracking. The path tracking can be achieved by adopting the existing approach. The path planning is done at the joint level. Cubic spline functions are used for constructing joint trajectories for industrial robots. The motion of the robot is specified by a sequence of Cartesian knots, i.e., positions and orientations of the hand. For an N-joint robot, these Cartesian knots are transformed into N sets of joint displacements, with one set for each joint. Piecewise cubic polynomials are used to fit the sequence of joint displacements for each of the N joints. Because of the use of the cubic spline function idea, there are only n - 2 equations to be solved for each joint, where n is the number of selected knots. The problem is proved to be uniquely solvable. An algorithm is developed to schedule the time intervals between each pair of adjacent knots such that the total traveling time is minimized subject to the physical constraints on joint velocities, accelerations, and jerks. Fortran programs have been written to implement: 1) the procedure for constructing the cubic polynomial joint trajectories; and 2) the algorithm for minimizing the traveling time. Results are illustrated by means of a numerical example.
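The sketch below shows why a cubic spline through n knots leads to n - 2 equations per joint. It uses a natural cubic spline (zero acceleration at both end knots), which is an assumed simplification and not the boundary treatment of the paper. The knot times, joint angles, and function names are hypothetical, and the time-interval optimization is not attempted.

```python
from bisect import bisect_right


def knot_accelerations(times, positions):
    """Solve the (n - 2) x (n - 2) tridiagonal system for interior accelerations."""
    n = len(times)
    if n < 3:
        raise ValueError("need at least three knots")
    h = [times[i + 1] - times[i] for i in range(n - 1)]
    m = n - 2  # one equation per interior knot
    sub = [h[k] for k in range(m)]
    diag = [2.0 * (h[k] + h[k + 1]) for k in range(m)]
    sup = [h[k + 1] for k in range(m)]
    rhs = [6.0 * ((positions[k + 2] - positions[k + 1]) / h[k + 1]
                  - (positions[k + 1] - positions[k]) / h[k]) for k in range(m)]
    # Thomas algorithm: forward elimination, then back substitution.
    for k in range(1, m):
        w = sub[k] / diag[k - 1]
        diag[k] -= w * sup[k - 1]
        rhs[k] -= w * rhs[k - 1]
    inner = [0.0] * m
    inner[-1] = rhs[-1] / diag[-1]
    for k in range(m - 2, -1, -1):
        inner[k] = (rhs[k] - sup[k] * inner[k + 1]) / diag[k]
    return [0.0] + inner + [0.0]  # natural end conditions (assumption)


def joint_position(times, positions, accels, t):
    """Evaluate the piecewise cubic on the segment that contains t."""
    i = min(max(bisect_right(times, t) - 1, 0), len(times) - 2)
    h = times[i + 1] - times[i]
    left, right = times[i + 1] - t, t - times[i]
    return ((accels[i] * left ** 3 + accels[i + 1] * right ** 3) / (6.0 * h)
            + (positions[i] / h - accels[i] * h / 6.0) * left
            + (positions[i + 1] / h - accels[i + 1] * h / 6.0) * right)


# Hypothetical knots for one joint: times in seconds, displacements in degrees.
knot_times = [0.0, 1.0, 2.5, 4.0]
knot_angles = [0.0, 30.0, 45.0, 10.0]
accels = knot_accelerations(knot_times, knot_angles)

for t in (0.0, 0.5, 1.0, 2.0, 2.5, 4.0):
    print(t, round(joint_position(knot_times, knot_angles, accels, t), 3))
```

For an N-joint robot the same routine would be run once per joint on that joint's displacement sequence, since the joints share knot times but are otherwise fitted independently.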
BACKGROUND: A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging. RESULTS: We conducted the second critical assessment of functional annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. We evaluated 126 methods from 56 research groups for their ability to predict biological functions using Gene Ontology and gene-disease associations using Human Phenotype Ontology on a set of 3681 proteins from 18 species. CAFA2 featured expanded analysis compared with CAFA1, with regards to data set size, variety, and assessment metrics. To review progress in the field, the analysis compared the best methods from CAFA1 to those of CAFA2. CONCLUSIONS: The top-performing methods in CAFA2 outperformed those from CAFA1. This increased accuracy can be attributed to a combination of the growing number of experimental annotations and improved methods for function prediction. The assessment also revealed that the definition of top-performing algorithms is ontology specific, that different performance metrics can be used to probe the nature of accurate predictions, and the relative diversity of predictions in the biological process and human phenotype ontologies. While there was methodological improvement between CAFA1 and CAFA2, the interpretation of results and usefulness of individual methods remain context-dependent.
While fuzzy c-means is a popular soft-clustering method, its effectiveness is largely limited to spherical clusters. By applying kernel tricks, the kernel fuzzy c-means algorithm attempts to address this problem by mapping data with nonlinear relationships to appropriate feature spaces. Kernel combination, or selection, is crucial for effective kernel clustering. Unfortunately, for most applications, it is not easy to find the right combination. We propose a multiple kernel fuzzy c-means (MKFC) algorithm that extends the fuzzy c-means algorithm with a multiple kernel-learning setting. By incorporating multiple kernels and automatically adjusting the kernel weights, MKFC is more immune to ineffective kernels and irrelevant features. This makes the choice of kernels less crucial. In addition, we show multiple kernel k-means to be a special case of MKFC. Experiments on both synthetic and real-world data demonstrate the effectiveness of the proposed MKFC algorithm.
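As background for the abstract above, here is a minimal sketch of the standard (non-kernel) fuzzy c-means updates that MKFC extends. It works on one-dimensional data with a fixed number of iterations. The data, starting centers, and function names are hypothetical, and neither the kernel mapping nor the kernel-weight learning of MKFC is shown.

```python
def memberships_for(points, centers, m=2.0):
    """Soft assignment: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1))."""
    power = 2.0 / (m - 1.0)
    rows = []
    for x in points:
        dists = [abs(x - c) for c in centers]
        if 0.0 in dists:
            # Point coincides with a center: share full membership among those.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            rows.append([v / sum(hits) for v in hits])
        else:
            rows.append([1.0 / sum((d_j / d_k) ** power for d_k in dists)
                         for d_j in dists])
    return rows


def fuzzy_c_means(points, centers, m=2.0, iterations=50):
    """Alternate membership and center updates for a fixed number of rounds."""
    centers = list(centers)
    for _ in range(iterations):
        u = memberships_for(points, centers, m)
        centers = [
            sum(row[j] ** m * x for row, x in zip(u, points))
            / sum(row[j] ** m for row in u)
            for j in range(len(centers))
        ]
    return centers, memberships_for(points, centers, m)


# Hypothetical 1-D data with two well-separated groups.
points = [0.0, 0.5, 1.0, 9.0, 9.5, 10.0]
centers, memberships = fuzzy_c_means(points, centers=[2.0, 7.0])

print([round(c, 3) for c in centers])
for x, row in zip(points, memberships):
    print(x, [round(v, 3) for v in row])
```

The fuzzifier `m` controls how soft the assignments are: values close to 1 approach hard k-means assignments, while larger values spread membership more evenly across clusters.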
MOTIVATION: With the increasing availability of large protein-protein interaction networks, the question of protein network alignment is becoming central to systems biology. Network alignment is further delineated into two sub-problems: local alignment, to find small conserved motifs across networks, and global alignment, which attempts to find a best mapping between all nodes of the two networks. In this article, our aim is to improve upon existing global alignment results. Better network alignment will enable, among other things, more accurate identification of functional orthologs across species. RESULTS: We introduce IsoRankN (IsoRank-Nibble), a global multiple-network alignment tool based on spectral clustering on the induced graph of pairwise alignment scores. IsoRankN outperforms existing algorithms for global network alignment in coverage and consistency on multiple alignments of the five available eukaryotic networks. Being based on spectral methods, IsoRankN is both error tolerant and computationally efficient. AVAILABILITY: Our software is available freely for non-commercial purposes on request from: http://isorank.csail.mit.edu/.