The Sense Innovation and Research Center

Facility · Sion, Switzerland

Research output, citation impact, and the most-cited recent papers from The Sense Innovation and Research Center. Aggregated across the NobleBlocks index of 300M+ scholarly works.

Total works

942

Citations

30.2K

h-index

i10-index

439
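
The two index figures above are standard bibliometric measures. The h-index is the largest h such that h works each have at least h citations, and the i10-index counts works with at least 10 citations. The page does not say how NobleBlocks computes them, so the following is only a minimal sketch under those textbook definitions. The function names and the sample citation counts are hypothetical and are not taken from this profile.

```python
from typing import List, Sequence


def h_index(citations: Sequence[int]) -> int:
    """Largest h such that at least h works have h or more citations each."""
    ranked: List[int] = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def i10_index(citations: Sequence[int]) -> int:
    """Number of works with at least 10 citations."""
    return sum(1 for count in citations if count >= 10)


# Hypothetical per-work citation counts, for illustration only.
sample = [25, 12, 10, 9, 3, 0]
print(h_index(sample), i10_index(sample))
```

Both functions need per-work citation counts, which this page lists only for the top-cited papers, so the profile's own figures cannot be recomputed from the listing alone.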

Also known as

Le Centre d’innovation et de recherche The Sense · Sense Innovation and Research Center · The Sense · The Sense Innovation and Research Center · The Sense Research and Innovation Center

Top-cited papers from The Sense Innovation and Research Center

GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose

Zhichao Yin, Jianping Shi

2018 · 1.3K citations · doi:10.1109/cvpr.2018.00212

We propose GeoNet, a jointly unsupervised learning framework for monocular depth, optical flow and egomotion estimation from videos. The three components are coupled by the nature of 3D scene geometry, jointly learned by our framework in an end-to-end manner. Specifically, geometric relationships are extracted over the predictions of individual modules and then combined as an image reconstruction loss, reasoning about static and dynamic scene parts separately. Furthermore, we propose an adaptive geometric consistency loss to increase robustness towards outliers and non-Lambertian regions, which resolves occlusions and texture ambiguities effectively. Experimentation on the KITTI driving dataset reveals that our scheme achieves state-of-the-art results in all of the three tasks, performing better than previously unsupervised methods and comparably with supervised ones.

InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

Wenhai Wang, Jifeng Dai, Zhe Chen, Zhenhang Huang +4 more

2023 · 880 citations · doi:10.1109/cvpr52729.2023.01385

Compared to the great progress of large-scale vision transformers (ViTs) in recent years, large-scale models based on convolutional neural networks (CNNs) are still in an early state. This work presents a new large-scale CNN-based foundation model, termed InternImage, which can obtain the gain from increasing parameters and training data like ViTs. Different from the recent CNNs that focus on large dense kernels, InternImage takes deformable convolution as the core operator, so that our model not only has the large effective receptive field required for downstream tasks such as detection and segmentation, but also has the adaptive spatial aggregation conditioned by input and task information. As a result, the proposed InternImage reduces the strict inductive bias of traditional CNNs and makes it possible to learn stronger and more robust patterns with large-scale parameters from massive data like ViTs. The effectiveness of our model is proven on challenging benchmarks including ImageNet, COCO, and ADE20K. It is worth mentioning that InternImage-H achieved a new record 65.4 mAP on COCO test-dev and 62.9 mIoU on ADE20K, outperforming current leading CNNs and ViTs.

Look at Boundary: A Boundary-Aware Face Alignment Algorithm

Wenyan Wu, Chen Qian, Shuo Yang, Quan Wang +2 more

2018 · 516 citations · doi:10.1109/cvpr.2018.00227

We present a novel boundary-aware face alignment algorithm by utilising boundary lines as the geometric structure of a human face to help facial landmark localisation. Unlike the conventional heatmap based method and regression based method, our approach derives face landmarks from boundary lines which remove the ambiguities in the landmark definition. Three questions are explored and answered by this work: 1. Why using boundary? 2. How to use boundary? 3. What is the relationship between boundary estimation and landmarks localisation? Our boundary-aware face alignment algorithm achieves 3.49% mean error on 300-W Fullset, which outperforms state-of-the-art methods by a large margin. Our method can also easily integrate information from other datasets. By utilising boundary information of 300-W dataset, our method achieves 3.92% mean error with 0.39% failure rate on COFW dataset, and 1.25% mean error on AFLW-Full dataset. Moreover, we propose a new dataset WFLW to unify training and testing across different factors, including poses, expressions, illuminations, makeups, occlusions, and blurriness. Dataset and model are publicly available at https://wywu.github.io/projects/LAB/LAB.html

Equalization Loss for Long-Tailed Object Recognition

Jingru Tan, Changbao Wang, Buyu Li, Quanquan Li +3 more

2020 · 476 citations · doi:10.1109/cvpr42600.2020.01168

Object recognition techniques using convolutional neural networks (CNN) have achieved great success. However, state-of-the-art object detection methods still perform poorly on large vocabulary and long-tailed datasets, e.g. LVIS. In this work, we analyze this problem from a novel perspective: each positive sample of one category can be seen as a negative sample for other categories, making the tail categories receive more discouraging gradients. Based on it, we propose a simple but effective loss, named equalization loss, to tackle the problem of long-tailed rare categories by simply ignoring those gradients for rare categories. The equalization loss protects the learning of rare categories from being at a disadvantage during the network parameter updating. Thus the model is capable of learning better discriminative features for objects of rare classes. Without any bells and whistles, our method achieves AP gains of 4.1% and 4.8% for the rare and common categories on the challenging LVIS benchmark, compared to the Mask R-CNN baseline. With the utilization of the effective equalization loss, we finally won the 1st place in the LVIS Challenge 2019. Code has been made available at: https://github.com/tztztztztz/eql.detectron2.
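
The core idea in the abstract above, dropping the discouraging gradients that rare categories receive when they act as negatives, can be sketched as a per-category sigmoid cross-entropy with some negative terms masked out. This is a simplified illustration and not the authors' implementation. The function name and the `is_rare` mask are hypothetical, and details of the published loss, such as how rarity is decided and how background proposals are handled, are not reproduced here.

```python
import math
from typing import Sequence


def equalized_bce(
    logits: Sequence[float],
    target: int,
    is_rare: Sequence[bool],
) -> float:
    """Sigmoid cross-entropy that skips negative terms for rare categories.

    logits  : one raw score per category for a single sample
    target  : index of the sample's ground-truth category
    is_rare : assumed mask marking which categories count as rare
    """
    loss = 0.0
    for j, z in enumerate(logits):
        p = 1.0 / (1.0 + math.exp(-z))
        if j == target:
            # The positive term is always kept.
            loss -= math.log(p)
        elif not is_rare[j]:
            # Negative term kept for non-rare categories only.
            loss -= math.log(1.0 - p)
        # A rare category that is not the target contributes nothing,
        # so it receives no discouraging gradient from this sample.
    return loss
```

Because the masked terms are omitted from the sum, the loss for a sample does not depend on the logits of rare non-target categories, which is the property the abstract describes.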

3D Human Pose Estimation in the Wild by Adversarial Learning

Wei Yang, Wanli Ouyang, Xiaolong Wang, Jimmy Ren +2 more

2018 · 415 citations · doi:10.1109/cvpr.2018.00551

Recently, remarkable advances have been achieved in 3D human pose estimation from monocular images because of the powerful Deep Convolutional Neural Networks (DCNNs). Despite their success on large-scale datasets collected in the constrained lab environment, it is difficult to obtain the 3D pose annotations for in-the-wild images. Therefore, 3D human pose estimation in the wild is still a challenge. In this paper, we propose an adversarial learning framework, which distills the 3D human pose structures learned from the fully annotated dataset to in-the-wild images with only 2D pose annotations. Instead of defining hard-coded rules to constrain the pose estimation results, we design a novel multi-source discriminator to distinguish the predicted 3D poses from the ground-truth, which helps to enforce the pose estimator to generate anthropometrically valid poses even with images in the wild. We also observe that a carefully designed information source for the discriminator is essential to boost the performance. Thus, we design a geometric descriptor, which computes the pairwise relative locations and distances between body joints, as a new information source for the discriminator. The efficacy of our adversarial learning framework with the new geometric descriptor has been demonstrated through extensive experiments on widely used public benchmarks. Our approach significantly improves the performance compared with previous state-of-the-art approaches.
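
The geometric descriptor mentioned in the abstract above is described only as pairwise relative locations and distances between body joints. One plausible reading is sketched below. The function name, the output layout and the use of Euclidean distance are assumptions, since the abstract does not specify them.

```python
import math
from typing import List, Sequence, Tuple

Joint = Tuple[float, float, float]
PairFeature = Tuple[float, float, float, float]


def pairwise_descriptor(joints: Sequence[Joint]) -> List[PairFeature]:
    """For each joint pair (i < j), return (dx, dy, dz, distance).

    dx, dy, dz is the location of joint j relative to joint i, and
    distance is the Euclidean length of that offset.
    """
    features: List[PairFeature] = []
    for i in range(len(joints)):
        for j in range(i + 1, len(joints)):
            dx, dy, dz = (b - a for a, b in zip(joints[i], joints[j]))
            dist = math.sqrt(dx * dx + dy * dy + dz * dz)
            features.append((dx, dy, dz, dist))
    return features
```

A pose with n joints yields n(n-1)/2 such tuples, which could be flattened into a single vector before being passed to a discriminator.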

Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition

Shuyang Sun, Zhanghui Kuang, Lu Sheng, Wanli Ouyang +1 more

2018 · 345 citations · doi:10.1109/cvpr.2018.00151

Motion representation plays a vital role in human action recognition in videos. In this study, we introduce a novel compact motion representation for video action recognition, named Optical Flow guided Feature (OFF), which enables the network to distill temporal information through a fast and robust approach. The OFF is derived from the definition of optical flow and is orthogonal to the optical flow. The derivation also provides theoretical support for using the difference between two frames. By directly calculating pixel-wise spatio-temporal gradients of the deep feature maps, the OFF could be embedded in any existing CNN based video action recognition framework with only a slight additional cost. It enables the CNN to extract spatiotemporal information, especially the temporal information between frames simultaneously. This simple but powerful idea is validated by experimental results. The network with OFF fed only by RGB inputs achieves a competitive accuracy of 93.3% on UCF-101, which is comparable with the result obtained by two streams (RGB and optical flow), but is 15 times faster in speed. Experimental results also show that OFF is complementary to other motion modalities such as optical flow. When the proposed method is plugged into the state-of-the-art video action recognition framework, it has 96.0% and 74.2% accuracy on UCF-101 and HMDB-51 respectively. The code for this project is available at: https://github.com/kevin-ssy/Optical-Flow-Guided-Feature

BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision

Chenyu Yang, Yuntao Chen, Hao Tian, Chenxin Tao +4 more

2023 · 283 citations · doi:10.1109/cvpr52729.2023.01710

We present a novel bird's-eye-view (BEV) detector with perspective supervision, which converges faster and better suits modern image backbones. Existing state-of-the-art BEV detectors are often tied to certain depth pretrained backbones like VoVNet, hindering the synergy between booming image backbones and BEV detectors. To address this limitation, we prioritize easing the optimization of BEV detectors by introducing perspective view supervision. To this end, we propose a two-stage BEV detector, where proposals from the perspective head are fed into the bird's-eye-view head for final predictions. To evaluate the effectiveness of our model, we conduct extensive ablation studies focusing on the form of supervision and the generality of the proposed detector. The proposed method is verified with a wide spectrum of traditional and modern image backbones and achieves new SoTA results on the large-scale nuScenes dataset. The code shall be released soon.

Monocyte chemoattractant protein 1 mediates retinal detachment-induced photoreceptor apoptosis

Toru Nakazawa, Toshio Hisatomi, Chifuyu Nakazawa, Kosuke Noda +4 more

2007 · Proceedings of the National Academy of Sciences · 269 citations · doi:10.1073/pnas.0608167104

Photoreceptor apoptosis is a major cause of visual loss in retinal detachment (RD) and several other visual disorders, but the underlying mechanisms remain elusive. Recently, increased expression of monocyte chemoattractant protein 1 (MCP-1) was reported in vitreous humor samples of patients with RD and diabetic retinopathy as well as in the brain tissues of patients with neurodegenerative diseases, including Alzheimer's disease and multiple sclerosis. Here we report that MCP-1 plays a critical role in mediating photoreceptor apoptosis in an experimental model of RD. RD led to increased MCP-1 expression in the Müller glia and increased CD11b+ macrophage/microglia in the detached retina. An MCP-1 blocking antibody greatly reduced macrophage/microglia infiltration and RD-induced photoreceptor apoptosis. Confirming these results, MCP-1 gene-deficient mice showed significantly reduced macrophage/microglia infiltration after RD and very little photoreceptor apoptosis. In primary retinal mixed cultures, MCP-1 was cytotoxic for recoverin+ photoreceptors, and this toxicity was eliminated through immunodepleting macrophage/microglia from the culture. In vivo, deletion of the gene encoding CD11b/CD18 nearly eliminated macrophage/microglia infiltration to the retina after RD and the loss of photoreceptors. Thus, MCP-1 expression and subsequent macrophage/microglia infiltration and activation are critical for RD-induced photoreceptor apoptosis. This pathway may be an important therapeutic target for preventing photoreceptor apoptosis in RD and other CNS diseases that share a common etiology.

Evaluating Fast Algorithms for Convolutional Neural Networks on FPGAs

Liqiang Lu, Yun Liang, Qingcheng Xiao, Shengen Yan

2017 · 243 citations · doi:10.1109/fccm.2017.64

In recent years, Convolutional Neural Networks (CNNs) have become widely adopted for computer vision tasks. FPGAs have been adequately explored as a promising hardware accelerator for CNNs due to its high performance, energy efficiency, and reconfigurability. However, prior FPGA solutions based on the conventional convolutional algorithm is often bounded by the computational capability of FPGAs (e.g., the number of DSPs). In this paper, we demonstrate that fast Winograd algorithm can dramatically reduce the arithmetic complexity, and improve the performance of CNNs on FPGAs. We first propose a novel architecture for implementing Winograd algorithm on FPGAs. Our design employs line buffer structure to effectively reuse the feature map data among different tiles. We also effectively pipeline the Winograd PE engine and initiate multiple PEs through parallelization. Meanwhile, there exists a complex design space to explore. We propose an analytical model to predict the resource usage and reason about the performance. Then, we use the model to guide a fast design space exploration. Experiments using the state-of-the-art CNNs demonstrate the best performance and energy efficiency on FPGAs. We achieve an average 1006.4 GOP/s for the convolutional layers and 854.6 GOP/s for the overall AlexNet and an average 3044.7 GOP/s for the convolutional layers and 2940.7 GOP/s for the overall VGG16 on Xilinx ZCU102 platform.
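
The arithmetic saving that the abstract above attributes to the Winograd algorithm can be seen in the smallest textbook case, F(2,3), which produces two outputs of a 3-tap filter with 4 data multiplications instead of 6. The sketch below is one-dimensional and illustrative only. It is not the paper's FPGA architecture, the abstract does not state which tile sizes the design uses, and the function names are hypothetical.

```python
from typing import List, Sequence


def direct_f23(d: Sequence[float], g: Sequence[float]) -> List[float]:
    """Two outputs of a 3-tap filter over a 4-sample tile: 6 multiplications."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f23(d: Sequence[float], g: Sequence[float]) -> List[float]:
    """The same two outputs using 4 multiplications on the data."""
    # Filter transform: depends only on g, so it can be computed once
    # per filter and reused for every input tile.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]
    # Four element-wise products on the transformed input tile.
    m1 = (d[0] - d[2]) * u0
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * u3
    # Output transform: additions and subtractions only.
    return [m1 + m2 + m3, m2 - m3 - m4]


tile = [1.0, 2.0, 3.0, 4.0]
taps = [1.0, 0.0, -1.0]
print(direct_f23(tile, taps), winograd_f23(tile, taps))
```

The trade is fewer multiplications for more additions plus a one-off filter transform, which is attractive when multipliers are the scarce resource, as the abstract notes for the number of DSPs.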

SimMatch: Semi-supervised Learning with Similarity Matching

Mingkai Zheng, Shan You, Lang Huang, Fei Wang +2 more

2022 · 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) · 231 citations · doi:10.1109/cvpr52688.2022.01407

Learning with few labeled data has been a longstanding problem in the computer vision and machine learning research community. In this paper, we introduced a new semi-supervised learning framework, SimMatch, which simultaneously considers semantic similarity and instance similarity. In SimMatch, the consistency regularization will be applied on both semantic-level and instance-level. The different augmented views of the same instance are encouraged to have the same class prediction and similar similarity relationship respected to other instances. Next, we instantiated a labeled memory buffer to fully leverage the ground truth labels on instance-level and bridge the gaps between the semantic and instance similarities. Finally, we proposed the unfolding and aggregation operation which allows these two similarities be isomorphically transformed with each other. In this way, the semantic and instance pseudo-labels can be mutually propagated to generate more high-quality and reliable matching targets. Extensive experimental results demonstrate that SimMatch improves the performance of semi-supervised learning tasks across different benchmark datasets and different settings. Notably, with 400 epochs of training, SimMatch achieves 67.2%, and 74.4% Top-1 Accuracy with 1% and 10% labeled examples on ImageNet, which significantly outperforms the baseline methods and is better than previous semi-supervised learning frameworks.

Learning Dual Convolutional Neural Networks for Low-Level Vision

Jinshan Pan, Sifei Liu, Deqing Sun, Jiawei Zhang +4 more

2018 · 205 citations · doi:10.1109/cvpr.2018.00324

In this paper, we propose a general dual convolutional neural network (DualCNN) for low-level vision problems, e.g., super-resolution, edge-preserving filtering, deraining and dehazing. These problems usually involve the estimation of two components of the target signals: structures and details. Motivated by this, our proposed DualCNN consists of two parallel branches, which respectively recovers the structures and details in an end-to-end manner. The recovered structures and details can generate the target signals according to the formation model for each particular application. The DualCNN is a flexible framework for low-level vision tasks and can be easily incorporated into existing CNNs. Experimental results show that the DualCNN can be effectively applied to numerous low-level vision tasks with favorable performance against the state-of-the-art methods.

Deep Comprehensive Correlation Mining for Image Clustering

Jianlong Wu, Keyu Long, Fei Wang, Chen Qian +3 more

2019 · 195 citations · doi:10.1109/iccv.2019.00824

Recent developed deep unsupervised methods allow us to jointly learn representation and cluster unlabelled data. These deep clustering methods mainly focus on the correlation among samples, e.g., selecting high precision pairs to gradually tune the feature representation, which neglects other useful correlations. In this paper, we propose a novel clustering framework, named deep comprehensive correlation mining (DCCM), for exploring and taking full advantage of various kinds of correlations behind the unlabeled data from three aspects: 1) Instead of only using pair-wise information, pseudo-label supervision is proposed to investigate category information and learn discriminative features. 2) The features' robustness to image transformation of input space is fully explored, which benefits the network learning and significantly improves the performance. 3) The triplet mutual information among features is presented for clustering problem to lift the recently discovered instance-level deep mutual information to a triplet-level formation, which further helps to learn more discriminative features. Extensive experiments on several challenging datasets show that our method achieves good performance, e.g., attaining 62.3% clustering accuracy on CIFAR-10, which is 10.1% higher than the state-of-the-art results.

Sharing of the IL-2 receptor γ chain with the functional IL-9 receptor complex

Yutaka Kimura, Toshikazu Takeshita, Motonari Kondo, Naoto Ishii +3 more

1995 · International Immunology · 191 citations · doi:10.1093/intimm/7.1.115

The third subunit, the so-called common gamma (gamma c) chain, of the IL-2 receptor is shared among the receptors for IL-2, IL-4, IL-7 and IL-15, and dysfunction of the gamma c chain is thought to cause X-linked severe combined immunodeficiency (XSCID) ascribed to impairment of early T cell development. However, cytokines linked to XSCID are as yet unidentified. A mAb specific for the gamma c chain, TUGm2, profoundly inhibited cell proliferation in response to IL-9. Another mAb, TUGm3, immunoprecipitated [125I]IL-9 cross-linked with either the IL-9 receptor or the gamma c chain. These results demonstrate that the gamma c chain is included in the functional receptor complex for IL-9, which was initially characterized as a T cell growth factor and is essential for IL-9-dependent growth signal transduction.

Time Course of Angiogenesis and Lymphangiogenesis After Brief Corneal Inflammation

Claus Cursiefen, Kazuichi Maruyama, David G. Jackson, J. Wayne Streilein +1 more

2006 · Cornea · 190 citations · doi:10.1097/01.ico.0000183485.85636.ff

PURPOSE: To study the time course of angiogenesis and lymphangiogenesis in the cornea after a short inflammatory insult. This might be helpful for the timing of corneal transplantation in high-risk eyes. METHODS: The mouse model of suture-induced inflammatory corneal neovascularization was used. After placement of 3 interrupted 11-0 sutures into the corneal stroma of BALB/c mice (left in place for 14 days), corneas were excised 2, 3, 5, 7, 14, and 21 days as well as 1, 2, 3, 6, and 8 months after surgery. Hem- and lymphangiogenesis were evaluated using double immunohistochemistry of corneas with CD31/PECAM1 as panendothelial and LYVE-1 as lymphatic endothelial marker. RESULTS: Both blood and lymphatic vessels grew into the cornea as early as day 2 after suture placement. The outgrowth was initially parallel. Hem- and lymphangiogenesis peaked around day 14. Thereafter, both vessel types started to regress. Regression of lymphatic vessels started earlier and was more pronounced than that of blood vessels. Whereas at 6 and 8 months (partly) perfused CD31+++/LYVE-1(-) blood vessels and (nonperfused) ghost vessels could still be observed, there were no CD31+/LYVE-1+++ lymphatic vessels detectable beyond 6 months after this short inflammation. CONCLUSIONS: After a temporary inflammatory insult to the cornea, there is initially parallel outgrowth of both blood and lymphatic vessels. But thereafter, lymphatic vessels regress earlier than blood vessels and are completely regressed by 6 months. Earlier regression of pathologic corneal lymph versus blood vessels suggests that corneal graft survival in high-risk eyes might best be delayed for a prolonged interval following an inflammatory insult.

Unconstrained Face Alignment via Cascaded Compositional Learning

Shizhan Zhu, Cheng Li, Chen Change Loy, Xiaoou Tang

2016 · 190 citations · doi:10.1109/cvpr.2016.371

We present a practical approach to address the problem of unconstrained face alignment for a single image. In our unconstrained problem, we need to deal with large shape and appearance variations under extreme head poses and rich shape deformation. To equip cascaded regressors with the capability to handle global shape variation and irregular appearance-shape relation in the unconstrained scenario, we partition the optimisation space into multiple domains of homogeneous descent, and predict a shape as a composition of estimations from multiple domain-specific regressors. With a specially formulated learning objective and a novel tree splitting function, our approach is capable of estimating a robust and meaningful composition. In addition to achieving state-of-the-art accuracy over existing approaches, our framework is also an efficient solution (350 FPS), thanks to the on-the-fly domain exclusion mechanism and the capability of leveraging the fast pixel feature.

Exploring Disentangled Feature Representation Beyond Face Identification

Yu Liu, Fangyin Wei, Jing Shao, Lu Sheng +2 more

2018 · 174 citations · doi:10.1109/cvpr.2018.00222

This paper proposes learning disentangled but complementary face features with a minimal supervision by face identification. Specifically, we construct an identity Distilling and Dispelling Autoencoder (D2AE) framework that adversarially learns the identity-distilled features for identity verification and the identity-dispelled features to fool the verification system. Thanks to the design of two-stream cues, the learned disentangled features represent not only the identity or attribute but the complete input image. Comprehensive evaluations further demonstrate that the proposed features not only preserve state-of-the-art identity verification performance on LFW, but also acquire comparable discriminative power for face attribute recognition on CelebA and LFWA. Moreover, the proposed system is ready to semantically control the face generation/editing based on various identities and attributes in an unsupervised manner.

GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting

Yiwen Chen, Zilong Chen, Chi Zhang, Feng Wang +4 more

2024 · 163 citations · doi:10.1109/cvpr52733.2024.02029

3D editing plays a crucial role in many areas such as gaming and virtual reality. Traditional 3D editing methods, which rely on representations like meshes and point clouds, often fall short in realistically depicting complex scenes. On the other hand, methods based on implicit 3D representations, like Neural Radiance Field (NeRF), render complex scenes effectively but suffer from slow processing speeds and limited control over specific scene areas. In response to these challenges, our paper presents GaussianEditor, the first 3D editing algorithm based on Gaussian Splatting (GS), a novel 3D representation. GaussianEditor enhances precision and control in editing through our proposed Gaussian semantic tracing, which traces the editing target throughout the training process. Additionally, we propose Hierarchical Gaussian splatting (HGS) to achieve stabilized and fine results under stochastic generative guidance from 2D diffusion models. We also develop editing strategies for efficient object removal and integration, a challenging task for existing methods. Our comprehensive experiments demonstrate GaussianEditor's superior control, effective, and efficient performance, marking a significant advancement in 3D editing.

Sparsifying Neural Network Connections for Face Recognition

Yi Sun, Xiaogang Wang, Xiaoou Tang

2016 · 157 citations · doi:10.1109/cvpr.2016.525

This paper proposes to learn high-performance deep ConvNets with sparse neural connections, referred to as sparse ConvNets, for face recognition. The sparse ConvNets are learned in an iterative way, each time one additional layer is sparsified and the entire model is re-trained given the initial weights learned in previous iterations. One important finding is that directly training the sparse ConvNet from scratch failed to find good solutions for face recognition, while using a previously learned denser model to properly initialize a sparser model is critical to continue learning effective features for face recognition. This paper also proposes a new neural correlation-based weight selection criterion and empirically verifies its effectiveness in selecting informative connections from previously learned models in each iteration. When taking a moderately sparse structure (26%-76% of weights in the dense model), the proposed sparse ConvNet model significantly improves the face recognition performance of the previous state-of-the-art DeepID2+ models given the same training data, while it keeps the performance of the baseline model with only 12% of the original parameters.

Deep Group-Shuffling Random Walk for Person Re-identification

Yantao Shen, Hongsheng Li, Tong Xiao, Shuai Yi +2 more

2018 · 149 citations · doi:10.1109/cvpr.2018.00241

Person re-identification aims at finding a person of interest in an image gallery by comparing the probe image of this person with all the gallery images. It is generally treated as a retrieval problem, where the affinities between the probe image and gallery images (P2G affinities) are used to rank the retrieved gallery images. However, most existing methods only consider P2G affinities but ignore the affinities between all the gallery images (G2G affinity). Some frameworks incorporated G2G affinities into the testing process, which is not end-to-end trainable for deep neural networks. In this paper, we propose a novel group-shuffling random walk network for fully utilizing the affinity information between gallery images in both the training and testing processes. The proposed approach aims at end-to-end refining the P2G affinities based on G2G affinity information with a simple yet effective matrix operation, which can be integrated into deep neural networks. Feature grouping and group shuffle are also proposed to apply rich supervisions for learning better person features. The proposed approach outperforms state-of-the-art methods on the Market-1501, CUHK03, and DukeMTMC datasets by large margins, which demonstrate the effectiveness of our approach.

LSTM Pose Machines

Yue Luo, Jimmy Ren, Zhouxia Wang, Wenxiu Sun +4 more

2018 · 145 citations · doi:10.1109/cvpr.2018.00546

We observed that recent state-of-the-art results on single image human pose estimation were achieved by multistage Convolution Neural Networks (CNN). Notwithstanding the superior performance on static images, the application of these models on videos is not only computationally intensive, it also suffers from performance degeneration and flicking. Such suboptimal results are mainly attributed to the inability of imposing sequential geometric consistency, handling severe image quality degradation (e.g. motion blur and occlusion) as well as the inability of capturing the temporal correlation among video frames. In this paper, we proposed a novel recurrent network to tackle these problems. We showed that if we were to impose the weight sharing scheme to the multi-stage CNN, it could be re-written as a Recurrent Neural Network (RNN). This property decouples the relationship among multiple network stages and results in significantly faster speed in invoking the network for videos. It also enables the adoption of Long Short-Term Memory (LSTM) units between video frames. We found such memory augmented RNN is very effective in imposing geometric consistency among frames. It also well handles input quality degradation in videos while successfully stabilizes the sequential outputs. The experiments showed that our approach significantly outperformed current state-of-the-art methods on two large-scale video pose estimation benchmarks. We also explored the memory cells inside the LSTM and provided insights on why such mechanism would benefit the prediction for video-based pose estimations.
