
Fraunhofer Institute for Telecommunications, Heinrich Hertz Institute
facilityBerlin, State of Berlin, Germany
Research output, citation impact, and the most-cited recent papers from Fraunhofer Institute for Telecommunications, Heinrich Hertz Institute (Germany). Aggregated across the NobleBlocks index of 300M+ scholarly works.
Top-cited papers from Fraunhofer Institute for Telecommunications, Heinrich Hertz Institute
H.264/AVC is newest video coding standard of the ITU-T Video Coding Experts Group and the ISO/IEC Moving Picture Experts Group. The main goals of the H.264/AVC standardization effort have been enhanced compression performance and provision of a "network-friendly" video representation addressing "conversational" (video telephony) and "nonconversational" (storage, broadcast, or streaming) applications. H.264/AVC has achieved a significant improvement in rate-distortion efficiency relative to existing standards. This article provides an overview of the technical features of H.264/AVC, describes profiles and applications for the standard, and outlines the history of the standardization process.
High Efficiency Video Coding (HEVC) is currently being prepared as the newest video coding standard of the ITU-T Video Coding Experts Group and the ISO/IEC Moving Picture Experts Group. The main goal of the HEVC standardization effort is to enable significantly improved compression performance relative to existing standards-in the range of 50% bit-rate reduction for equal perceptual video quality. This paper provides an overview of the technical features and characteristics of the HEVC standard.
Understanding and interpreting classification decisions of automated image classification systems is of high value in many applications, as it allows to verify the reasoning of the system and provides additional information to the human expert. Although machine learning methods are solving very successfully a plethora of tasks, they have in most cases the disadvantage of acting as a black box, not providing any information about what made them arrive at a particular decision. This work proposes a general solution to the problem of understanding classification decisions by pixel-wise decomposition of nonlinear classifiers. We introduce a methodology that allows to visualize the contributions of single pixels to predictions for kernel-based classifiers over Bag of Words features and for multilayered neural networks. These pixel contributions can be visualized as heatmaps and are provided to a human expert who can intuitively not only verify the validity of the classification decision, but also focus further analysis on regions of potential interest. We evaluate our method for classifiers trained on PASCAL VOC 2009 images, synthetic image data containing geometric shapes, the MNIST handwritten digits data set and for the pre-trained ImageNet model available as part of the Caffe open source package.
With the introduction of the H.264/AVC video coding standard, significant improvements have recently been demonstrated in video compression capability. The Joint Video Team of the ITU-T VCEG and the ISO/IEC MPEG has now also standardized a Scalable Video Coding (SVC) extension of the H.264/AVC standard. SVC enables the transmission and decoding of partial bit streams to provide video services with lower temporal or spatial resolutions or reduced fidelity while retaining a reconstruction quality that is high relative to the rate of the partial bit streams. Hence, SVC provides functionalities such as graceful degradation in lossy transmission environments as well as bit rate, format, and power adaptation. These functionalities provide enhancements to transmission and storage applications. SVC has achieved significant improvements in coding efficiency with an increased degree of supported scalability relative to the scalable profiles of prior video coding standards. This paper provides an overview of the basic concepts for extending H.264/AVC towards SVC. Moreover, the basic tools for providing temporal, spatial, and quality scalability are described in detail and experimentally analyzed regarding their efficiency and complexity.
A unified approach to the coder control of video coding standards such as MPEG-2, H.263, MPEG-4, and the draft video coding standard H.264/AVC (advanced video coding) is presented. The performance of the various standards is compared by means of PSNR and subjective testing results. The results indicate that H.264/AVC compliant encoders typically achieve essentially the same reproduction quality as encoders that are compliant with the previous standards while typically requiring 60% or less of the bit rate.
This paper provides an entry point to the problem of interpreting a deep neural network model and explaining its predictions. It is based on a tutorial given at ICASSP 2017. As a tutorial paper, the set of methods covered here is not exhaustive, but sufficiently representative to discuss a number of questions in interpretability, technical challenges, and possible applications. The second part of the tutorial focuses on the recently proposed layer-wise relevance propagation (LRP) technique, for which we provide theory, recommendations, and tricks, to make most efficient use of it on real data.
The shapes and sizes of platinum nanoparticles were controlled by changes in the ratio of the concentration of the capping polymer material to the concentration of the platinum cations used in the reductive synthesis of colloidal particles in solution at room temperature. Tetrahedral, cubic, irregular-prismatic, icosahedral, and cubo-octahedral particle shapes were observed, whose distribution was dependent on the concentration ratio of the capping polymer material to the platinum cation. Controlling the shape of platinum nanoparticles is potentially important in the field of catalysis.
There is considerable pressure to define the key requirements of 5G, develop 5G standards, and perform technology trials as quickly as possible. Normally, these activities are best done in series but there is a desire to complete these tasks in parallel so that commercial deployments of 5G can begin by 2020. 5G will not be an incremental improvement over its predecessors; it aims to be a revolutionary leap forward in terms of data rates, latency, massive connectivity, network reliability, and energy efficiency. These capabilities are targeted at realizing high-speed connectivity, the Internet of Things, augmented virtual reality, the tactile internet, and so on. The requirements of 5G are expected to be met by new spectrum in the microwave bands (3.3-4.2 GHz), and utilizing large bandwidths available in mm-wave bands, increasing spatial degrees of freedom via large antenna arrays and 3-D MIMO, network densification, and new waveforms that provide scalability and flexibility to meet the varying demands of 5G services. Unlike the one size fits all 4G core networks, the 5G core network must be flexible and adaptable and is expected to simultaneously provide optimized support for the diverse 5G use case categories. In this paper, we provide an overview of 5G research, standardization trials, and deployment challenges. Due to the enormous scope of 5G systems, it is necessary to provide some direction in a tutorial article, and in this overview, the focus is largely user centric, rather than device centric. In addition to surveying the state of play in the area, we identify leading technologies, evaluating their strengths and weaknesses, and outline the key challenges ahead, with research test beds delivering promising performance but pre-commercial trials lagging behind the desired 5G targets.
Nanoparticles of CdTe were found to spontaneously reorganize into crystalline nanowires upon controlled removal of the protective shell of organic stabilizer. The intermediate step in the nanowire formation was found to be pearl-necklace aggregates. Strong dipole-dipole interaction is believed to be the driving force of nanoparticle self-organization. The linear aggregates subsequently recrystallized into nanowires whose diameter was determined by the diameter of the nanoparticles. The produced nanowires have high aspect ratio, uniformity, and optical activity. These findings demonstrate the collective behavior of nanoparticles as well as a convenient, simple technique for production of one-dimensional semiconductor colloids suitable for subsequent processing into quantum-confined superstructures, materials, and devices.
Versatile Video Coding (VVC) was finalized in July 2020 as the most recent international video coding standard. It was developed by the Joint Video Experts Team (JVET) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) to serve an ever-growing need for improved video compression as well as to support a wider variety of today’s media content and emerging applications. This paper provides an overview of the novel technical features for new applications and the core compression technologies for achieving significant bit rate reductions in the neighborhood of 50% over its predecessor for equal video quality, the High Efficiency Video Coding (HEVC) standard, and 75% over the currently most-used format, the Advanced Video Coding (AVC) standard. It is explained how these new features in VVC provide greater versatility for applications. Highlighted applications include video with resolutions beyond standard- and high-definition, video with high dynamic range and wide color gamut, adaptive streaming with resolution changes, computer-generated and screen-captured video, ultralow-delay streaming, 360° immersive video, and multilayer coding e.g., for scalability. Furthermore, early implementations are presented to show that the new VVC standard is implementable and ready for real-world deployment.
Nonlinear methods such as Deep Neural Networks (DNNs) are the gold standard for various challenging machine learning problems such as image recognition. Although these methods perform impressively well, they have a significant disadvantage, the lack of transparency, limiting the interpretability of the solution and thus the scope of application in practice. Especially DNNs act as black boxes due to their multilayer nonlinear structure. In this paper we introduce a novel methodology for interpreting generic multilayer neural networks by decomposing the network classification decision into contributions of its input elements. Although our focus is on image classification, the method is applicable to a broad set of input data, learning tasks and network architectures. Our method called deep Taylor decomposition efficiently utilizes the structure of the network by backpropagating the explanations from the output to the input layer. We evaluate the proposed method empirically on the MNIST and ILSVRC data sets.
With the broader and highly successful usage of machine learning (ML) in industry and the sciences, there has been a growing demand for explainable artificial intelligence (XAI). Interpretability and explanation methods for gaining a better understanding of the problem-solving abilities and strategies of nonlinear ML, in particular, deep neural networks, are, therefore, receiving increased attention. In this work, we aim to: 1) provide a timely overview of this active emerging field, with a focus on “post hoc” explanations, and explain its theoretical foundations; 2) put interpretability algorithms to a test both from a theory and comparative evaluation perspective using extensive simulations; 3) outline best practice aspects, i.e., how to best include interpretation methods into the standard usage of ML; and 4) demonstrate successful usage of XAI in a representative selection of application scenarios. Finally, we discuss challenges and possible future directions of this exciting foundational field of ML.
The compression capability of several generations of video coding standards is compared by means of peak signal-to-noise ratio (PSNR) and subjective testing results. A unified approach is applied to the analysis of designs, including H.262/MPEG-2 Video, H.263, MPEG-4 Visual, H.264/MPEG-4 Advanced Video Coding (AVC), and High Efficiency Video Coding (HEVC). The results of subjective tests for WVGA and HD sequences indicate that HEVC encoders can achieve equivalent subjective reproduction quality as encoders that conform to H.264/MPEG-4 AVC when using approximately 50% less bit rate on average. The HEVC design is shown to be especially effective for low bit rates, high-resolution video content, and low-delay communication applications. The measured subjective improvement somewhat exceeds the improvement measured by the PSNR metric.
We address the problem of joint downlink beamforming in a power-controlled network, where independent data streams are to be transmitted from a multiantenna base station to several decentralized single-antenna terminals. The total transmit power is limited and channel information (possibly statistical) is available at the transmitter. The design goal: jointly adjust the beamformers and transmission powers according to individual SINR requirements. In this context, there are two closely related optimization problems. P1: maximize the jointly achievable SINR margin under a total power constraint. P2: minimize the total transmission power while satisfying a set of SINR constraints. In this paper, both problems are solved within a unified analytical framework. Problem P1 is solved by minimizing the maximal eigenvalue of an extended crosstalk matrix. The solution provides a necessary and sufficient condition for the feasibility of the SINR requirements. Problem P2 is a variation of problem P1. An important step in our analysis is to show that the global optimum of the downlink beamforming problem is equivalently obtained from solving a dual uplink problem, which has an easier-to-handle analytical structure. Then, we make use of the special structure of the extended crosstalk matrix to develop a rapidly converging iterative algorithm. The optimality and global convergence of the algorithm is proven and stopping criteria are given.
Context-based adaptive binary arithmetic coding (CABAC) as a normative part of the new ITU-T/ISO/IEC standard H.264/AVC for video compression is presented. By combining an adaptive binary arithmetic coding technique with context modeling, a high degree of adaptation and redundancy reduction is achieved. The CABAC framework also includes a novel low-complexity method for binary arithmetic coding and probability estimation that is well suited for efficient hardware and software implementations. CABAC significantly outperforms the baseline entropy coding method of H.264/AVC for the typical area of envisaged target applications. For a set of test sequences representing typical material used in broadcast applications and for a range of acceptable video quality of about 30 to 38 dB, average bit-rate savings of 9%-14% are achieved.
The term "artificial intelligence" (AI) refers to the idea of machines being capable of performing human tasks. A subdomain of AI is machine learning (ML), which "learns" intrinsic statistical patterns in data to eventually cast predictions on unseen data. Deep learning is a ML technique using multi-layer mathematical operations for learning and inferring on complex data like imagery. This succinct narrative review describes the application, limitations and possible future of AI-based dental diagnostics, treatment planning, and conduct, for example, image analysis, prediction making, record keeping, as well as dental research and discovery. AI-based applications will streamline care, relieving the dental workforce from laborious routine tasks, increasing health at lower costs for a broader population, and eventually facilitate personalized, predictive, preventive, and participatory dentistry. However, AI solutions have not by large entered routine dental practice, mainly due to 1) limited data availability, accessibility, structure, and comprehensiveness, 2) lacking methodological rigor and standards in their development, 3) and practical questions around the value and usefulness of these solutions, but also ethics and responsibility. Any AI application in dentistry should demonstrate tangible value by, for example, improving access to and quality of care, increasing efficiency and safety of services, empowering and enabling patients, supporting medical research, or increasing sustainability. Individual privacy, rights, and autonomy need to be put front and center; a shift from centralized to distributed/federated learning may address this while improving scalability and robustness. Lastly, trustworthiness into, and generalizability of, dental AI solutions need to be guaranteed; the implementation of continuous human oversight and standards grounded in evidence-based dentistry should be expected. Methods to visualize, interpret, and explain the logic behind AI solutions will contribute ("explainable AI"). Dental education will need to accompany the introduction of clinical AI solutions by fostering digital literacy in the future dental workforce.
Deep neural networks (DNNs) have demonstrated impressive performance in complex machine learning tasks such as image classification or speech recognition. However, due to their multilayer nonlinear structure, they are not transparent, i.e., it is hard to grasp what makes them arrive at a particular classification or recognition decision, given a new unseen data sample. Recently, several approaches have been proposed enabling one to understand and interpret the reasoning embodied in a DNN for a single test image. These methods quantify the “importance” of individual pixels with respect to the classification decision and allow a visualization in terms of a heatmap in pixel/input space. While the usefulness of heatmaps can be judged subjectively by a human, an objective quality measure is missing. In this paper, we present a general methodology based on region perturbation for evaluating ordered collections of pixels such as heatmaps. We compare heatmaps computed by three different methods on the SUN397, ILSVRC2012, and MIT Places data sets. Our main result is that the recently proposed layer-wise relevance propagation algorithm qualitatively and quantitatively provides a better explanation of what made a DNN arrive at a particular classification decision than the sensitivity-based approach or the deconvolution method. We provide theoretical arguments to explain this result and discuss its practical implications. Finally, we investigate the use of heatmaps for unsupervised assessment of the neural network performance.
Electrocardiography (ECG) is a key non-invasive diagnostic tool for cardiovascular diseases which is increasingly supported by algorithms based on machine learning. Major obstacles for the development of automatic ECG interpretation algorithms are both the lack of public datasets and well-defined benchmarking procedures to allow comparison s of different algorithms. To address these issues, we put forward PTB-XL, the to-date largest freely accessible clinical 12-lead ECG-waveform dataset comprising 21837 records from 18885 patients of 10 seconds length. The ECG-waveform data was annotated by up to two cardiologists as a multi-label dataset, where diagnostic labels were further aggregated into super and subclasses. The dataset covers a broad range of diagnostic classes including, in particular, a large fraction of healthy records. The combination with additional metadata on demographics, additional diagnostic statements, diagnosis likelihoods, manually annotated signal properties as well as suggested folds for splitting training and test sets turns the dataset into a rich resource for the development and the evaluation of automatic ECG interpretation algorithms.
We present a deep neural network-based approach to image quality assessment (IQA). The network is trained end-to-end and comprises ten convolutional layers and five pooling layers for feature extraction, and two fully connected layers for regression, which makes it significantly deeper than related IQA models. Unique features of the proposed architecture are that: 1) with slight adaptations it can be used in a no-reference (NR) as well as in a full-reference (FR) IQA setting and 2) it allows for joint learning of local quality and local weights, i.e., relative importance of local quality to the global quality estimate, in an unified framework. Our approach is purely data-driven and does not rely on hand-crafted features or other types of prior domain knowledge about the human visual system or image statistics. We evaluate the proposed approach on the LIVE, CISQ, and TID2013 databases as well as the LIVE In the wild image quality challenge database and show superior performance to state-of-the-art NR and FR IQA methods. Finally, cross-database evaluation shows a high ability to generalize between different databases, indicating a high robustness of the learned features.
Coordinated multipoint or cooperative MIMO is one of the promising concepts to improve cell edge user data rate and spectral efficiency beyond what is possible with MIMOOFDM in the first versions of LTE or WiMAX. Interference can be exploited or mitigated by cooperation between sectors or different sites. Significant gains can be shown for both the uplink and downlink. A range of technical challenges were identified and partially addressed, such as backhaul traffic, synchronization and feedback design. This article also shows the principal feasibility of COMP in two field testbeds with multiple sites and different backhaul solutions between the sites. These activities have been carried out by a powerful consortium consisting of universities, chip manufacturers, equipment vendors, and network operators.