Korea Post
Government · Seoul, South Korea
Research output, citation impact, and the most-cited recent papers from Korea Post (South Korea). Aggregated across the NobleBlocks index of 300M+ scholarly works.
Top-cited papers from Korea Post
We propose a novel semantic segmentation algorithm by learning a deep deconvolution network. We learn the network on top of the convolutional layers adopted from the VGG 16-layer net. The deconvolution network is composed of deconvolution and unpooling layers, which identify pixelwise class labels and predict segmentation masks. We apply the trained network to each proposal in an input image, and construct the final semantic segmentation map by combining the results from all proposals in a simple manner. The proposed algorithm mitigates the limitations of the existing methods based on fully convolutional networks by integrating a deep deconvolution network and proposal-wise prediction; our segmentation method typically identifies detailed structures and handles objects in multiple scales naturally. Our network demonstrates outstanding performance on the PASCAL VOC 2012 dataset, and we achieve the best accuracy (72.5%) among the methods trained without using the Microsoft COCO dataset through ensemble with the fully convolutional network.
We propose a novel visual tracking algorithm based on the representations from a discriminatively trained Convolutional Neural Network (CNN). Our algorithm pretrains a CNN using a large set of videos with tracking ground-truths to obtain a generic target representation. Our network is composed of shared layers and multiple branches of domain-specific layers, where domains correspond to individual training sequences and each branch is responsible for binary classification to identify the target in each domain. We train each domain in the network iteratively to obtain generic target representations in the shared layers. When tracking a target in a new sequence, we construct a new network by combining the shared layers in the pretrained CNN with a new binary classification layer, which is updated online. Online tracking is performed by evaluating the candidate windows randomly sampled around the previous target state. The proposed algorithm demonstrates outstanding performance on existing tracking benchmarks.
The FactSage computer package consists of a series of information, calculation and manipulation modules that enable one to access and manipulate compound and solution databases. With the various modules running under Microsoft Windows® one can perform a wide variety of thermochemical calculations and generate tables, graphs and figures of interest to chemical and physical metallurgists, chemical engineers, corrosion engineers, inorganic chemists, geochemists, ceramists, electrochemists, environmentalists, etc. This paper presents a summary of the developments in the FactSage thermochemical software and databases during the last six years. Particular emphasis is placed on the new databases and developments in calculating and manipulating phase diagrams.
Knowledge distillation aims at transferring knowledge acquired in one model (a teacher) to another model (a student) that is typically smaller. Previous approaches can be expressed as a form of training the student to mimic output activations of individual data examples represented by the teacher. We introduce a novel approach, dubbed relational knowledge distillation (RKD), that transfers mutual relations of data examples instead. For concrete realizations of RKD, we propose distance-wise and angle-wise distillation losses that penalize structural differences in relations. Experiments conducted on different tasks show that the proposed method improves educated student models by a significant margin. In particular for metric learning, it allows students to outperform their teachers' performance, achieving the state of the art on standard benchmark datasets.
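As a rough illustration of the distance-wise idea in the abstract above, the sketch below compares pairwise distances among teacher embeddings with those among student embeddings. The mean-distance normalization, the Huber penalty, and all function names are assumptions made for this example; the abstract itself only states that the loss penalizes structural differences in relations.

```python
import math

def pairwise_distances(embeddings):
    """Euclidean distance for every unordered pair (i < j) of embeddings."""
    n = len(embeddings)
    return [math.dist(embeddings[i], embeddings[j])
            for i in range(n) for j in range(i + 1, n)]

def normalized(distances):
    """Divide by the mean distance (assumed non-zero) so scales are comparable."""
    mean = sum(distances) / len(distances)
    return [d / mean for d in distances]

def huber(x, y, delta=1.0):
    """Huber penalty: quadratic for small differences, linear for large ones."""
    diff = abs(x - y)
    return 0.5 * diff * diff if diff <= delta else delta * (diff - 0.5 * delta)

def distance_wise_loss(teacher, student):
    """Average penalty on mismatched relative distances (illustrative only)."""
    t = normalized(pairwise_distances(teacher))
    s = normalized(pairwise_distances(student))
    return sum(huber(a, b) for a, b in zip(t, s)) / len(t)

# Teacher embeddings are 2-D, student embeddings are 1-D: only relations matter.
teacher = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
student_good = [(0.0,), (1.0,), (2.0,)]   # same geometry up to scale
student_bad = [(0.0,), (1.0,), (5.0,)]    # distorted geometry
loss_good = distance_wise_loss(teacher, student_good)
loss_bad = distance_wise_loss(teacher, student_bad)
```

Because only relations are compared, teacher and student embeddings may have different dimensionality; under these assumptions a student that reproduces the teacher's geometry up to scale incurs essentially zero loss, while a distorted one does not.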
This paper presents a fast deblurring method that produces a deblurring result from a single image of moderate size in a few seconds. We accelerate both latent image estimation and kernel estimation in an iterative deblurring process by introducing a novel prediction step and working with image derivatives rather than pixel values. In the prediction step, we use simple image processing techniques to predict strong edges from an estimated latent image, which will be solely used for kernel estimation. With this approach, a computationally efficient Gaussian prior becomes sufficient for deconvolution to estimate the latent image, as small deconvolution artifacts can be suppressed in the prediction. For kernel estimation, we formulate the optimization function using image derivatives, and accelerate the numerical process by reducing the number of Fourier transforms needed for a conjugate gradient method. We also show that the formulation results in a smaller condition number of the numerical system than the use of pixel values, which gives faster convergence. Experimental results demonstrate that our method runs an order of magnitude faster than previous work, while the deblurring quality is comparable. GPU implementation facilitates further speed-up, making our method fast enough for practical use.
The deficiency of segmentation labels is one of the main obstacles to semantic segmentation in the wild. To alleviate this issue, we present a novel framework that generates segmentation labels of images given their image-level class labels. In this weakly supervised setting, trained models have been known to segment local discriminative parts rather than the entire object area. Our solution is to propagate such local responses to nearby areas which belong to the same semantic entity. To this end, we propose a Deep Neural Network (DNN) called AffinityNet that predicts semantic affinity between a pair of adjacent image coordinates. The semantic propagation is then realized by random walk with the affinities predicted by AffinityNet. More importantly, the supervision employed to train AffinityNet is given by the initial discriminative part segmentation, which is incomplete as a segmentation annotation but sufficient for learning semantic affinities within small image areas. Thus the entire framework relies only on image-level class labels and does not require any extra data or annotations. On the PASCAL VOC 2012 dataset, a DNN learned with segmentation labels generated by our method outperforms previous models trained with the same level of supervision, and is even as competitive as those relying on stronger supervision.
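The propagation step described above can be pictured with a toy random walk. This is a minimal sketch, assuming a small dense affinity matrix over three "pixels" and plain row normalization; the function names are hypothetical, and the affinities are hand-written here rather than predicted by a network.

```python
def transition_matrix(affinity):
    """Row-normalize a non-negative affinity matrix into walk probabilities.

    Assumes every row has a positive sum (for example, a non-zero self-affinity).
    """
    return [[a / sum(row) for a in row] for row in affinity]

def propagate(scores, affinity, steps=2):
    """Diffuse per-pixel class scores along the affinities for `steps` iterations."""
    transition = transition_matrix(affinity)
    n = len(scores)
    for _ in range(steps):
        scores = [sum(transition[i][j] * scores[j] for j in range(n))
                  for i in range(n)]
    return scores

# Pixels 0 and 1 belong to the same entity; pixel 2 is unrelated.
affinity = [
    [1.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
# Only pixel 0 has an initial (discriminative-part) response.
spread = propagate([1.0, 0.0, 0.0], affinity, steps=1)  # [0.5, 0.5, 0.0]
```

The response leaks from pixel 0 to its high-affinity neighbor but not to the unrelated pixel, which is the behavior the abstract attributes to semantic propagation.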
We propose an attentive local feature descriptor suitable for large-scale image retrieval, referred to as DELF (DEep Local Feature). The new feature is based on convolutional neural networks, which are trained only with image-level annotations on a landmark image dataset. To identify semantically useful local features for image retrieval, we also propose an attention mechanism for keypoint selection, which shares most network layers with the descriptor. This framework can be used for image retrieval as a drop-in replacement for other keypoint detectors and descriptors, enabling more accurate feature matching and geometric verification. Our system produces reliable confidence scores to reject false positives; in particular, it is robust against queries that have no correct match in the database. To evaluate the proposed descriptor, we introduce a new large-scale dataset, referred to as Google-Landmarks dataset, which involves challenges in both database and query such as background clutter, partial occlusion, multiple landmarks, objects in variable scales, etc. We show that DELF outperforms the state-of-the-art global and local descriptors in the large-scale setting by significant margins.
Extracting geometric features from 3D scans or point clouds is the first step in applications such as registration, reconstruction, and tracking. State-of-the-art methods require computing low-level features as input or extracting patch-based features with limited receptive field. In this work, we present fully-convolutional geometric features, computed in a single pass by a 3D fully-convolutional network. We also present new metric learning losses that dramatically improve performance. Fully-convolutional geometric features are compact, capture broad spatial context, and scale to large scenes. We experimentally validate our approach on both indoor and outdoor datasets. Fully-convolutional geometric features achieve state-of-the-art accuracy without requiring preprocessing, are compact (32 dimensions), and are 290 times faster than the most accurate prior method.
This paper presents a novel approach for learning instance segmentation with image-level class labels as supervision. Our approach generates pseudo instance segmentation labels of training images, which are used to train a fully supervised model. For generating the pseudo labels, we first identify confident seed areas of object classes from attention maps of an image classification model, and propagate them to discover the entire instance areas with accurate boundaries. To this end, we propose IRNet, which estimates rough areas of individual instances and detects boundaries between different object classes. It thus makes it possible to assign instance labels to the seeds and to propagate them within the boundaries so that the entire areas of instances can be estimated accurately. Furthermore, IRNet is trained with inter-pixel relations on the attention maps, thus no extra supervision is required. Our method with IRNet achieves outstanding performance on the PASCAL VOC 2012 dataset, surpassing not only the previous state of the art trained with the same level of supervision, but also some previous models relying on stronger supervision.
We propose a novel unsupervised domain adaptation framework based on domain-specific batch normalization in deep neural networks. We aim to adapt to both domains by specializing batch normalization layers in convolutional neural networks while allowing them to share all other model parameters, which is realized by a two-stage algorithm. In the first stage, we estimate pseudo-labels for the examples in the target domain using an external unsupervised domain adaptation algorithm (for example, MSTN or CPUA) integrating the proposed domain-specific batch normalization. The second stage learns the final models using a multi-task classification loss for the source and target domains. Note that the two domains have separate batch normalization layers in both stages. Our framework can be easily incorporated into the domain adaptation techniques based on deep neural networks with batch normalization layers. We also show that our approach can be extended to the problem with multiple source domains. The proposed algorithm is evaluated on multiple benchmark datasets and achieves the state-of-the-art accuracy in the standard setting and the multi-source domain adaptation scenario.
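To make the idea of separate batch normalization per domain concrete, here is a minimal one-dimensional sketch in plain Python. The class name, the dictionary layout, and the momentum-based running statistics are assumptions for illustration; they are not taken from the paper's implementation, and no pseudo-labeling or training loop is shown.

```python
import math

class DomainSpecificBatchNorm:
    """Keeps one set of normalization parameters and statistics per domain."""

    def __init__(self, domains, eps=1e-5, momentum=0.1):
        self.eps = eps
        self.momentum = momentum
        # Affine parameters and running statistics are NOT shared across domains.
        self.gamma = {d: 1.0 for d in domains}
        self.beta = {d: 0.0 for d in domains}
        self.running_mean = {d: 0.0 for d in domains}
        self.running_var = {d: 1.0 for d in domains}

    def __call__(self, batch, domain):
        """Normalize a batch of scalars with the statistics of its own domain."""
        mean = sum(batch) / len(batch)
        var = sum((x - mean) ** 2 for x in batch) / len(batch)
        m = self.momentum
        self.running_mean[domain] = (1 - m) * self.running_mean[domain] + m * mean
        self.running_var[domain] = (1 - m) * self.running_var[domain] + m * var
        scale = self.gamma[domain] / math.sqrt(var + self.eps)
        return [scale * (x - mean) + self.beta[domain] for x in batch]

bn = DomainSpecificBatchNorm(["source", "target"])
source_out = bn([0.0, 2.0], "source")     # roughly [-1, 1]
target_out = bn([10.0, 14.0], "target")   # roughly [-1, 1] despite the shift
```

Each domain is normalized with its own statistics, so a shifted and rescaled target batch ends up in the same range as the source batch; in a full network, every other layer would be shared between the two domains.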
The aim of this study was to build a mechanically enhanced three-dimensional (3D) bioprinted construct containing two different cell types for osteochondral tissue regeneration. Recently, the production of 3D cell-laden structures using various scaffold-free cell printing technologies has opened up new possibilities. However, ideal 3D complex tissues or organs have not yet been printed because gel-state hydrogels have been used as the principal material and are unable to maintain the desired 3D structure due to their poor mechanical strength. In this study, thermoplastic biomaterial polycaprolactone (PCL), which shows relatively high mechanical properties as compared with hydrogel, was used as a framework for enhancing the mechanical stability of the bioprinted construct. Two different alginate solutions were then infused into the previously prepared framework consisting of PCL to create the 3D construct for osteochondral printing. For this work, a multi-head tissue/organ building system (MtoBS), which was particularly designed to dispense thermoplastic biomaterial and hydrogel having completely different rheology properties, was newly developed and used to bioprint osteochondral tissue. It was confirmed that the line width, position and volume control of PCL and alginate solutions were adjustable in the MtoBS. Most importantly, dual cell-laden 3D constructs consisting of osteoblasts and chondrocytes were successfully fabricated. Further, the separately dispensed osteoblasts and chondrocytes not only retained their initial position and viability, but also proliferated up to 7 days after being dispensed.
The recent success of text-to-image synthesis has taken the world by storm and captured the general public's imagination. From a technical standpoint, it also marked a drastic change in the favored architecture to design generative image models. GANs used to be the de facto choice, with techniques like StyleGAN. With DALL·E 2, autoregressive and diffusion models became the new standard for large-scale generative models overnight. This rapid shift raises a fundamental question: can we scale up GANs to benefit from large datasets like LAION? We find that naïvely increasing the capacity of the StyleGAN architecture quickly becomes unstable. We introduce GigaGAN, a new GAN architecture that far exceeds this limit, demonstrating GANs as a viable option for text-to-image synthesis. GigaGAN offers three major advantages. First, it is orders of magnitude faster at inference time, taking only 0.13 seconds to synthesize a 512px image. Second, it can synthesize high-resolution images, for example, 16-megapixel images in 3.66 seconds. Finally, GigaGAN supports various latent space editing applications such as latent interpolation, style mixing, and vector arithmetic operations.
We tackle the image question answering (ImageQA) problem by learning a convolutional neural network (CNN) with a dynamic parameter layer whose weights are determined adaptively based on questions. For the adaptive parameter prediction, we employ a separate parameter prediction network, which consists of a gated recurrent unit (GRU) taking a question as its input and a fully-connected layer generating a set of candidate weights as its output. However, it is challenging to construct a parameter prediction network for a large number of parameters in the fully-connected dynamic parameter layer of the CNN. We reduce the complexity of this problem by incorporating a hashing technique, where the candidate weights given by the parameter prediction network are selected using a predefined hash function to determine individual weights in the dynamic parameter layer. The proposed network (a joint network with the CNN for ImageQA and the parameter prediction network) is trained end-to-end through back-propagation, where its weights are initialized using a pre-trained CNN and GRU. The proposed algorithm demonstrates state-of-the-art performance on all available public ImageQA benchmarks.
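The hashing step above can be sketched as follows: a small list of candidate weights is expanded into a full weight matrix by looking up each position through a fixed hash. The specific hash formula, the constant, and the function names are hypothetical choices for this example; the abstract only says that a predefined hash function selects among the candidate weights.

```python
def hash_index(row, col, num_candidates, multiplier=2654435761):
    """Hypothetical predefined hash: maps a weight position to a candidate slot."""
    return ((row * 31 + col) * multiplier) % num_candidates

def build_dynamic_weights(candidates, rows, cols):
    """Fill a rows x cols weight matrix by sharing the few candidate weights."""
    k = len(candidates)
    return [[candidates[hash_index(r, c, k)] for c in range(cols)]
            for r in range(rows)]

# Three predicted candidates stand in for the output of a prediction network.
candidates = [0.1, -0.2, 0.3]
weights = build_dynamic_weights(candidates, rows=4, cols=5)
```

The prediction network then only has to output `len(candidates)` values instead of `rows * cols`, and because the hash is fixed, the same position always maps to the same candidate.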
Even with the advent of more sophisticated, data-hungry methods, boosted decision trees remain extraordinarily successful for fast rigid object detection, achieving top accuracy on numerous datasets. While effective, most boosted detectors use decision trees with orthogonal (single feature) splits, and the topology of the resulting decision boundary may not be well matched to the natural topology of the data. Given highly correlated data, decision trees with oblique (multiple feature) splits can be effective. Use of oblique splits, however, comes at considerable computational expense. Inspired by recent work on discriminative decorrelation of HOG features, we instead propose an efficient feature transform that removes correlations in local neighborhoods. The result is an overcomplete but locally decorrelated representation ideally suited for use with orthogonal decision trees. In fact, orthogonal trees with our locally decorrelated features outperform oblique trees trained over the original features at a fraction of the computational cost. The overall improvement in accuracy is dramatic: on the Caltech Pedestrian Dataset, we reduce false positives nearly tenfold over the previous state-of-the-art.
We have generated 47,932 T-DNA tag lines in japonica rice using activation-tagging vectors that contain tetramerized 35S enhancer sequences. To facilitate use of those lines, we isolated the genomic sequences flanking the inserted T-DNA via inverse polymerase chain reaction. For most of the lines, we performed four sets of amplifications using two different restriction enzymes toward both directions. In analyzing 41,234 lines, we obtained 27,621 flanking sequence tags (FSTs), among which 12,505 were integrated into genic regions and 15,116 into intergenic regions. Mapping of the FSTs on chromosomes revealed that T-DNA integration frequency was generally proportional to chromosome size. However, T-DNA insertions were non-uniformly distributed on each chromosome: higher at the distal ends and lower in regions close to the centromeres. In addition, several regions showed extreme peaks and valleys of insertion frequency, suggesting hot and cold spots for T-DNA integration. The density of insertion events was somewhat correlated with expressed, rather than predicted, gene density along each chromosome. Analyses of expression patterns near the inserted enhancer showed that at least half the test lines displayed greater expression of the tagged genes. Whereas in most of the increased lines expression patterns after activation were similar to those in the wild type, thereby maintaining the endogenous patterns, the remaining lines showed changes in expression in the activation tagged lines. In this case, ectopic expression was most frequently observed in mature leaves. Currently, the database can be searched with the gene locus number or location on the chromosome at http://www.postech.ac.kr/life/pfg/risd. On request, seeds of the T(1) or T(2) plants will be provided to the scientific community.
In multi-class indoor semantic segmentation using RGB-D data, it has been shown that incorporating depth features into RGB features helps to improve segmentation accuracy. However, previous studies have not fully exploited the potential of multi-modal feature fusion, e.g., simply concatenating RGB and depth features or averaging RGB and depth score maps. To learn the optimal fusion of multi-modal features, this paper presents a novel network that extends the core idea of residual learning to RGB-D semantic segmentation. Our network effectively captures multi-level RGB-D CNN features by including multi-modal feature fusion blocks and multi-level feature refinement blocks. Feature fusion blocks learn residual RGB and depth features and their combinations to fully exploit the complementary characteristics of RGB and depth data. Feature refinement blocks learn the combination of fused features from multiple levels to enable high-resolution prediction. Our network can efficiently train discriminative multi-level features from each modality end-to-end by taking full advantage of skip-connections. Our comprehensive experiments demonstrate that the proposed architecture achieves the state-of-the-art accuracy on two challenging RGB-D indoor datasets, NYUDv2 and SUN RGB-D.
This paper presents a non-photorealistic rendering technique that automatically generates a line drawing from a photograph. We aim at extracting a set of coherent, smooth, and stylistic lines that effectively capture and convey important shapes in the image. We first develop a novel method for constructing a smooth direction field that preserves the flow of the salient image features. We then introduce the notion of flow-guided anisotropic filtering for detecting highly coherent lines while suppressing noise. Our method is simple and easy to implement. A variety of experimental results are presented to show the effectiveness of our method in producing self-contained, high-quality line illustrations.
This paper addresses the problem of text-to-video temporal grounding, which aims to identify the time interval in a video semantically relevant to a text query. We tackle this problem using a novel regression-based model that learns to extract a collection of mid-level features for semantic phrases in a text query, which correspond to important semantic entities described in the query (e.g., actors, objects, and actions), and reflect bi-modal interactions between the linguistic features of the query and the visual features of the video in multiple levels. The proposed method effectively predicts the target time interval by exploiting contextual information from local to global during bi-modal interactions. Through in-depth ablation studies, we find that incorporating both local and global context in video and text interactions is crucial to accurate grounding. Our experiments show that the proposed method outperforms the state of the art on the Charades-STA and ActivityNet Captions datasets by large margins, 7.44 and 4.61 percentage points on the Recall@tIoU=0.5 metric, respectively.
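For readers unfamiliar with the Recall@tIoU=0.5 metric mentioned above, the following sketch shows one common way to compute it. It assumes a single predicted interval per query, intervals given as (start, end) pairs in seconds, and a hit counted when the temporal IoU reaches the threshold; the function names are illustrative.

```python
def temporal_iou(pred, gt):
    """Intersection over union of two (start, end) time intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

def recall_at_tiou(predictions, ground_truths, threshold=0.5):
    """Fraction of queries whose predicted interval reaches the tIoU threshold."""
    hits = sum(temporal_iou(p, g) >= threshold
               for p, g in zip(predictions, ground_truths))
    return hits / len(ground_truths)

# Query 1 overlaps its ground truth by 4 s out of a union of 8 s (tIoU = 0.5).
# Query 2 misses its ground truth entirely (tIoU = 0.0).
predictions = [(2.0, 8.0), (0.0, 1.0)]
ground_truths = [(4.0, 10.0), (5.0, 6.0)]
recall = recall_at_tiou(predictions, ground_truths)  # 0.5
```

An improvement of several percentage points on this metric therefore means that a correspondingly larger share of queries had their predicted interval overlap the annotated one by at least half.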
Synthesis, optical properties, and applications of various types of benzocoumarin compounds are overviewed.