Max Planck Center for Visual Computing and Communication

Facility · Saarbrücken, Germany

Research output, citation impact, and the most-cited recent papers from Max Planck Center for Visual Computing and Communication (Germany). Aggregated across the NobleBlocks index of 300M+ scholarly works.

Total works

Citations

2.2K

h-index

i10-index

Top-cited papers from Max Planck Center for Visual Computing and Communication

Non‐Rigid Registration Under Isometric Deformations

Qixing Huang, Bart Adams, Martin Wicke, Leonidas Guibas

2008 · Computer Graphics Forum · 277 citations · doi:10.1111/j.1467-8659.2008.01285.x

We present a robust and efficient algorithm for the pairwise non‐rigid registration of partially overlapping 3D surfaces. Our approach treats non‐rigid registration as an optimization problem and solves it by alternating between correspondence and deformation optimization. Assuming approximately isometric deformations, robust correspondences are generated using a pruning mechanism based on geodesic consistency. We iteratively learn an appropriate deformation discretization from the current set of correspondences and use it to update the correspondences in the next iteration. Our algorithm is able to register partially similar point clouds that undergo large deformations, in just a few seconds. We demonstrate the potential of our algorithm in various applications such as example based articulated segmentation, and shape interpolation.

Selecting good views of high‐dimensional data using class consistency

Mike Sips, Boris Neubert, John Lewis, Pat Hanrahan

2009 · Computer Graphics Forum · 206 citations · doi:10.1111/j.1467-8659.2009.01467.x

Many visualization techniques involve mapping high‐dimensional data spaces to lower‐dimensional views. Unfortunately, mapping a high‐dimensional data space into a scatterplot involves a loss of information; or, even worse, it can give a misleading picture of valuable structure in higher dimensions. In this paper, we propose class consistency as a measure of the quality of the mapping. Class consistency enforces the constraint that classes of n–D data are shown clearly in 2–D scatterplots. We propose two quantitative measures of class consistency, one based on the distance to the class's center of gravity, and another based on the entropies of the spatial distributions of classes. We performed an experiment where users choose good views, and show that class consistency has good precision and recall. We also evaluate both consistency measures over a range of data sets and show that these measures are efficient and robust.
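As a rough illustration of the first measure mentioned in the abstract above, the sketch below scores a 2-D view by the fraction of points that lie closer to their own class's center of gravity than to any other class's. The abstract does not give the exact formula, so this definition, the function names (`class_centroids`, `centroid_consistency`), and the toy data are all assumptions made for illustration, not the authors' implementation.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def class_centroids(points, labels):
    """Return the center of gravity of each class in a 2-D view."""
    sums = defaultdict(lambda: (0.0, 0.0))
    counts = defaultdict(int)
    for (x, y), label in zip(points, labels):
        sx, sy = sums[label]
        sums[label] = (sx + x, sy + y)
        counts[label] += 1
    return {
        label: (sx / counts[label], sy / counts[label])
        for label, (sx, sy) in sums.items()
    }


def centroid_consistency(points, labels):
    """Fraction of points whose nearest class centroid is their own class.

    Hypothetical scoring rule: 1.0 means every point sits closest to its
    own class's center of gravity; lower values mean classes overlap.
    """
    if not points:
        raise ValueError("need at least one point")
    centroids = class_centroids(points, labels)
    consistent = 0
    for point, label in zip(points, labels):
        nearest = min(centroids, key=lambda c: dist(point, centroids[c]))
        if nearest == label:
            consistent += 1
    return consistent / len(points)


# Toy data (made up): the same four labeled items in two candidate views.
labels = ["a", "a", "b", "b"]
separated_view = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
mixed_view = [(0.0, 0.0), (4.0, 0.0), (1.0, 0.0), (5.0, 0.0)]

print(centroid_consistency(separated_view, labels))
print(centroid_consistency(mixed_view, labels))
```

Under this assumed definition, ranking candidate scatterplots by score and keeping the highest-scoring ones would be one simple way to select views in which the classes are clearly separated.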

Automatic Conversion of Mesh Animations into Skeleton‐based Animations

Edilson de Aguiar, Christian Theobalt, Sebastian Thrun, Hans‐Peter Seidel

2008 · Computer Graphics Forum · 114 citations · doi:10.1111/j.1467-8659.2008.01136.x

Recently, it has become increasingly popular to represent animations not by means of a classical skeleton‐based model, but in the form of deforming mesh sequences. The reason for this new trend is that novel mesh deformation methods as well as new surface based scene capture techniques offer a great level of flexibility during animation creation. Unfortunately, the resulting scene representation is less compact than skeletal ones and there is not yet a rich toolbox available which enables easy post‐processing and modification of mesh animations. To bridge this gap between the mesh‐based and the skeletal paradigm, we propose a new method that automatically extracts a plausible kinematic skeleton, skeletal motion parameters, as well as surface skinning weights from arbitrary mesh animations. By this means, deforming mesh sequences can be fully‐automatically transformed into fully rigged virtual subjects. The original input can then be quickly rendered based on the new compact bone and skin representation, and it can be easily modified using the full repertoire of already existing animation tools.

Multiview Video Compression

Markus Flierl, Bernd Girod

2007 · IEEE Signal Processing Magazine · 111 citations · doi:10.1109/msp.2007.905699

Due to the vast raw bit rate of multiview video, efficient compression techniques are essential for 3D scene communication. As the video data originate from the same scene, the inherent similarities of the multiview imagery are exploited for efficient compression. These similarities can be classified into two types, inter-view similarity between adjacent camera views and temporal similarity between temporally successive images of each video.

Shape Decomposition using Modal Analysis

Qixing Huang, Martin Wicke, Bart Adams, Leonidas Guibas

2009 · Computer Graphics Forum · 92 citations · doi:10.1111/j.1467-8659.2009.01380.x

We introduce a novel algorithm that decomposes a deformable shape into meaningful parts requiring only a single input pose. Using modal analysis, we are able to identify parts of the shape that tend to move rigidly. We define a deformation energy on the shape, enabling modal analysis to find the typical deformations of the shape. We then find a decomposition of the shape such that the typical deformations can be well approximated with deformation fields that are rigid in each part of the decomposition. We optimize for the best decomposition, which captures how the shape deforms. A hierarchical refinement scheme makes it possible to compute more detailed decompositions for some parts of the shape. Although our algorithm does not require user intervention, it is possible to control the process by directly changing the deformation energy, or interactively refining the decomposition as necessary. Due to the construction of the energy function and the properties of modal analysis, the computed decompositions are robust to changes in pose as well as meshing, noise, and even imperfections such as small holes in the surface.

Wasserstein Propagation for Semi-Supervised Learning

Justin Solomon, Raif M. Rustamov, Leonidas Guibas, Adrian Butscher

2014 · 84 citations

Probability distributions and histograms are natural representations for product ratings, traffic measurements, and other data considered in many machine learning applications. Thus, this paper introduces a technique for graph-based semi-supervised learning of histograms, derived from the theory of optimal transportation. Our method has several properties making it suitable for this application; in particular, its behavior can be characterized by the moments and shapes of the histograms at the labeled nodes. In addition, it can be used for histograms on non-standard domains like circles, revealing a strategy for manifold-valued semi-supervised learning. We also extend this technique to related problems such as smoothing distributions on graph nodes.

Meshless Modeling of Deformable Shapes and their Motion

Bart Adams, Maks Ovsjanikov, Michael Wand, Hans‐Peter Seidel +1 more

2008 · PubMed · 40 citations · doi:10.2312/sca/sca08/077-086

We present a new framework for interactive shape deformation modeling and key frame interpolation based on a meshless finite element formulation. Starting from a coarse nodal sampling of an object's volume, we formulate rigidity and volume preservation constraints that are enforced to yield realistic shape deformations at interactive frame rates. Additionally, by specifying key frame poses of the deforming shape and optimizing the nodal displacements while targeting smooth interpolated motion, our algorithm extends to a motion planning framework for deformable objects. This allows reconstructing smooth and plausible deformable shape trajectories in the presence of possibly moving obstacles. The presented results illustrate that our framework can handle complex shapes at interactive rates and hence is a valuable tool for animators to realistically and efficiently model and interpolate deforming 3D shapes.

Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold

Xingang Pan, Ayush Tewari, Thomas Leimkühler, Lingjie Liu +2 more

2023 · arXiv (Cornell University) · 11 citations · doi:10.48550/arxiv.2305.10973

Synthesizing visual content that meets users' needs often requires flexible and precise controllability of the pose, shape, expression, and layout of the generated objects. Existing approaches gain controllability of generative adversarial networks (GANs) via manually annotated training data or a prior 3D model, which often lack flexibility, precision, and generality. In this work, we study a powerful yet much less explored way of controlling GANs, that is, to "drag" any points of the image to precisely reach target points in a user-interactive manner, as shown in Fig.1. To achieve this, we propose DragGAN, which consists of two main components: 1) a feature-based motion supervision that drives the handle point to move towards the target position, and 2) a new point tracking approach that leverages the discriminative generator features to keep localizing the position of the handle points. Through DragGAN, anyone can deform an image with precise control over where pixels go, thus manipulating the pose, shape, expression, and layout of diverse categories such as animals, cars, humans, landscapes, etc. As these manipulations are performed on the learned generative image manifold of a GAN, they tend to produce realistic outputs even for challenging scenarios such as hallucinating occluded content and deforming shapes that consistently follow the object's rigidity. Both qualitative and quantitative comparisons demonstrate the advantage of DragGAN over prior approaches in the tasks of image manipulation and point tracking. We also showcase the manipulation of real images through GAN inversion.

Illumination-Invariant Robust Multiview 3D Human Motion Capture

Nadia Robertini, Florian Bernard, Weipeng Xu, Christian Theobalt

2018 · 6 citations · doi:10.1109/wacv.2018.00185

In this work we address the problem of capturing human body motion under changing lighting conditions in a multiview setup. In order to account for changing lighting conditions we propose to use an intermediate image representation that is invariant to the scene lighting. In our approach this is achieved by solving time-varying segmentation problems that use frame- and view-dependent appearance costs that are able to adjust to the present conditions. Moreover, we use an adaptive combination of our lighting-invariant segmentation with CNN-based joint detectors in order to increase the robustness to segmentation errors. In our experimental validation we demonstrate that our method is able to handle difficult conditions better than existing works.

EventEgo3D++: 3D Human Motion Capture from A Head-Mounted Event Camera

Christen Millerdurai, Hiroyasu Akada, Jian Wang, Diogo Luvizon +4 more

2025 · International Journal of Computer Vision · 1 citation · doi:10.1007/s11263-025-02489-1

Monocular egocentric 3D human motion capture remains a significant challenge, particularly under conditions of low lighting and fast movements, which are common in head-mounted device applications. Existing methods that rely on RGB cameras often fail under these conditions. To address these limitations, we introduce EventEgo3D++, the first approach that leverages a monocular event camera with a fisheye lens for 3D human motion capture. Event cameras excel in high-speed scenarios and varying illumination due to their high temporal resolution, providing reliable cues for accurate 3D human motion capture. EventEgo3D++ leverages the LNES representation of event streams to enable precise 3D reconstructions. We have also developed a mobile head-mounted device (HMD) prototype equipped with an event camera, capturing a comprehensive dataset that includes real event observations from both controlled studio environments and in-the-wild settings, in addition to a synthetic dataset. Additionally, to provide a more holistic dataset, we include allocentric RGB streams that offer different perspectives of the HMD wearer, along with their corresponding SMPL body model. Our experiments demonstrate that EventEgo3D++ achieves superior 3D accuracy and robustness compared to existing solutions, even in challenging conditions. Moreover, our method supports real-time 3D pose updates at a rate of 140Hz. This work is an extension of the EventEgo3D approach (CVPR 2024) and further advances the state of the art in egocentric 3D human motion capture. For more details, visit the project page at https://eventego3d.mpi-inf.mpg.de.

Predicting human face discrimination performance and FFA activation using a computational model of face neurons

Xiong Jiang, Thomas A. Zeffiro, John W. VanMeter, Volker Blanz +1 more

2010 · Journal of Vision · doi:10.1167/5.8.631

The nature of the quantitative relationship between stimuli, neural activation and behavior is crucial to understanding how the brain performs complex cognitive tasks such as face perception, yet it is still poorly understood. We have recently presented a computational model of face processing in cortex (Rosen & Riesenhuber, VSS, 2004). The model shows that a population of highly selective “face neurons” (FN) can explain human face discrimination performance and effects such as the “Face Inversion Effect”. This predicts a direct link between FN tuning specificity and discrimination performance: If face discrimination is based on the comparisons of FN activation patterns, performance should increase with dissimilarity between target (T) and distractor (D) faces, as the corresponding activation patterns get increasingly dissimilar. Crucially, due to the tight tuning of FNs, for some T-D dissimilarity, both will activate disjoint subpopulations of FNs, and performance should asymptote, as further increasing the T-D dissimilarity will not increase the dissimilarity of FN activation patterns. Likewise, in an fMRI rapid adaptation paradigm (fMRI-RA), adaptation of FFA FN stimulated with pairs of faces of increasing dissimilarity should decrease, and asymptote when the faces activate different subpopulations of FN. We used the model of FN to quantitatively predict the T-D dissimilarity (using morphed faces of parametrically varied similarity (Blanz & Vetter, 1999)) for which BOLD adaptation and behavior are expected to asymptote. We then conducted psychophysical (2AFC, 9 subjects) and fMRI-RA experiments (6 subjects, 3T Siemens Trio magnet) to test these predictions. We find that i) behavioral performance asymptotes as predicted, ii) BOLD adaptation decreases and asymptotes with increasing T-D dissimilarity, as predicted, and iii) the asymptotes in behavior and fMRI are in good agreement with model predictions. This supports the predicted quantitative link of FN tuning, FFA response, and behavior.

Virtual 3D bladder reconstruction for augmented medical records from white light cystoscopy (Conference Presentation)

Kristen L. Lurie, Dimitar V. Zlatev, Roland Angst, Joseph C. Liao +1 more

2016 · doi:10.1117/12.2213050

Bladder cancer has a high recurrence rate that necessitates lifelong surveillance to detect mucosal lesions. Examination with white light cystoscopy (WLC), the standard of care, is inherently subjective and data storage limited to clinical notes, diagrams, and still images. A visual history of the bladder wall can enhance clinical and surgical management. To address this clinical need, we developed a tool to transform in vivo WLC videos into virtual 3-dimensional (3D) bladder models using advanced computer vision techniques. WLC videos from rigid cystoscopies (1280 x 720 pixels) were recorded at 30 Hz followed by immediate camera calibration to control for image distortions. Video data were fed into an automated structure-from-motion algorithm that generated a 3D point cloud followed by a 3D mesh to approximate the bladder surface. The highest quality cystoscopic images were projected onto the approximated bladder surface to generate a virtual 3D bladder reconstruction. In intraoperative WLC videos from 36 patients undergoing transurethral resection of suspected bladder tumors, optimal reconstruction was achieved from frames depicting well-focused vasculature, when the bladder was maintained at constant volume with minimal debris, and when regions of the bladder wall were imaged multiple times. A significant innovation of this work is the ability to perform the reconstruction using video from a clinical procedure collected with standard equipment, thereby facilitating rapid clinical translation, application to other forms of endoscopy and new opportunities for longitudinal studies of cancer recurrence.

A novel representation for digital scenes

Thorsten Herfet, Christopher Haccius, V.I. Matvienko, Santi Fort

2013 · doi:10.1145/2503385.2503415

SCENE is an ongoing research project dedicated to creating and delivering richer media experiences [Hilton and Fuenmayor 2013]. A consortium of international research and industry partners aims to enhance the whole chain of multidimensional media production: new capturing devices, advanced processing tools and a dedicated renderer. Central is a novel scene representation which enhances and facilitates production processes of video content.

3DPR: Single Image 3D Portrait Relighting with Generative Priors

Pramod Rao, Abhimitra Meka, Xilong Zhou, Gereon Fox +4 more

2025 · doi:10.1145/3757377.3763962

Rendering novel, relit views of a human head, given a monocular portrait image as input, is an inherently underconstrained problem. The traditional graphics solution is to explicitly decompose the input image into geometry, material and lighting via differentiable rendering; but this is constrained by the multiple assumptions and approximations of the underlying models and parameterizations of these scene components. We propose 3DPR, an image-based relighting model that leverages generative priors learnt from multi-view One-Light-at-A-Time (OLAT) images captured in a light stage. We introduce a new diverse and large-scale multi-view 4K OLAT dataset of 139 subjects to learn a high-quality prior over the distribution of high-frequency face reflectance. We leverage the latent space of a pre-trained generative head model that provides a rich prior over face geometry learnt from in-the-wild image datasets. The input portrait is first embedded in the latent manifold of such a model through an encoder-based inversion process. Then a novel triplane-based reflectance network trained on our lightstage data is used to synthesize high-fidelity OLAT images to enable image-based relighting. Our reflectance network operates in the latent space of the generative head model, crucially enabling a relatively small number of lightstage images to train the reflectance model. Combining the generated OLATs according to a given HDRI environment map yields physically accurate environmental relighting results. Through quantitative and qualitative evaluations, we demonstrate that 3DPR outperforms previous methods, particularly in preserving identity and in capturing lighting effects such as specularities, self-shadows, and subsurface scattering.

Deep intelligence: a four-stage deep network for accurate brain tumor segmentation

Nirmala Paramanandham, Kishore Rajendiran, L Pavithra, Mahendra Singh Niranjan +3 more

2025 · Scientific Reports · doi:10.1038/s41598-025-18879-x

Image segmentation is an essential research field in image processing that has developed from traditional processing techniques to modern deep learning methods. In medical image processing, the primary goal of the segmentation process is to segment organs, lesions or tumors. Segmentation of tumors in the brain is a difficult task due to the vast variations in the intensity and size of gliomas. Clinical segmentation typically requires a high-quality image with relevant features and domain experts for the best results. Due to this, automatic segmentation is a necessity in modern society since gliomas are considered highly malignant. Encoder-decoder-based structures, as popular as they are, have some areas where the research is still in progress, like reducing the number of false positives and false negatives. Sometimes these models also struggled to capture the finest boundaries, producing jagged or inaccurate boundaries after segmentation. This research article introduces a novel and efficient method for segmenting out the tumorous region in brain images to overcome the research gap of the recent state-of-the-art deep learning-based segmentation approaches. The proposed 4-staged 2D-VNET++ is an efficient deep learning tumor segmentation network that introduces a context-boosting framework and a custom loss function to accomplish the task. The results show that the proposed model gives a Dice score of 99.287, Jaccard similarity index of 99.642 and a Tversky index of 99.743, all of which outperform the recent state-of-the-art techniques like 2D-VNet, Attention ResUNet with Guided Decoder (ARU-GD), MultiResUNet, 2D UNet, Link Net, TransUNet and 3D-UNet.