NobleBlocks

Intel (Poland)

Company · Gdańsk, Poland

Research output, citation impact, and the most-cited recent papers from Intel (Poland). Aggregated across the NobleBlocks index of 300M+ scholarly works.

Total works
132
Citations
1.8K
h-index
19
i10-index
41
Also known as
Intel (Poland), Intel Corporation
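
The h-index and i10-index above follow standard definitions: the h-index is the largest h such that h works have at least h citations each, and the i10-index is the number of works with at least 10 citations. A minimal sketch of both, assuming only those definitions (how NobleBlocks computes its figures is not stated on this page; the function names are illustrative):

```python
def h_index(citations):
    """Largest h such that h works have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def i10_index(citations):
    """Number of works with at least 10 citations."""
    return sum(1 for count in citations if count >= 10)


# Citation counts of the 20 papers listed on this page only.
listed = [278, 83, 74, 63, 59, 48, 46, 31, 28, 21,
          21, 19, 18, 16, 15, 13, 13, 13, 13, 11]
print(h_index(listed), i10_index(listed))
```

Run on the 20 listed papers alone, this gives smaller values than the profile figures, which are aggregated over all 132 works.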

Top-cited papers from Intel (Poland)

Deep learning-based waste detection in natural and urban environments
Sylwia Majchrowska, Agnieszka Mikołajczyk, Maria Ferlin, Zuzanna Klawikowska +3 more
2021 · Waste Management · 278 citations · doi:10.1016/j.wasman.2021.12.001

Waste pollution is one of the most significant environmental issues in the modern world. The importance of recycling is well known, both for economic and ecological reasons, and the industry demands high efficiency. Current studies towards automatic waste detection are hardly comparable due to the lack of benchmarks and widely accepted standards regarding the used metrics and data. Those problems are addressed in this article by providing a critical analysis of over ten existing waste datasets and a brief but constructive review of the existing Deep Learning-based waste detection approaches. This article collects and summarizes previous studies and provides the results of authors’ experiments on the presented datasets, all intended to create a first replicable baseline for litter detection. Moreover, new benchmark datasets detect-waste and classify-waste are proposed that are merged collections from the above-mentioned open-source datasets with unified annotations covering all possible waste categories: bio, glass, metal and plastic, non-recyclable, other, paper, and unknown. Finally, a two-stage detector for litter localization and classification is presented. EfficientDet-D2 is used to localize litter, and EfficientNet-B2 to classify the detected waste into seven categories. The classifier is trained in a semi-supervised fashion, making use of unlabeled images. The proposed approach achieves up to 70% average precision in waste detection and around 75% classification accuracy on the test dataset. The code and annotations used in the studies are publicly available online.

A Customizable Matrix Multiplication Framework for the Intel HARPv2 Xeon+FPGA Platform
Duncan J. M. Moss, Srivatsan Krishnan, Eriko Nurvitadhi, Piotr Ratuszniak +4 more
2018 · 83 citations · doi:10.1145/3174243.3174258

General Matrix to Matrix multiplication (GEMM) is the cornerstone for a wide gamut of applications in high performance computing (HPC), scientific computing (SC) and more recently, deep learning. In this work, we present a customizable matrix multiplication framework for the Intel HARPv2 CPU+FPGA platform that includes support for both traditional single precision floating point and reduced precision workloads. Our framework supports arbitrary size GEMMs and consists of two parts: (1) a simple application programming interface (API) for easy configuration and integration into existing software and (2) a highly customizable hardware template. The API provides both compile and runtime options for controlling key aspects of the hardware template including dynamic precision switching; interleaving and block size control; and fused deep learning specific operations. The framework currently supports single precision floating point (FP32), 16, 8, 4 and 2 bit Integer and Fixed Point (INT16, INT8, INT4, INT2) and more exotic data types for deep learning workloads: INT16xTernary, INT8xTernary, BinaryxBinary.
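
The block size control mentioned in this abstract can be illustrated in software. The following is a minimal sketch of blocked (loop-tiled) GEMM in plain Python. It is not the HARPv2 framework's API or hardware template; the function name `blocked_gemm` and its `block` parameter are hypothetical.

```python
def blocked_gemm(a, b, block=2):
    """Compute C = A x B by iterating over block x block tiles.

    a is an m x k matrix and b is a k x n matrix, both given as lists
    of rows. Sizes do not need to be multiples of the block size.
    """
    m, k = len(a), len(a[0])
    if len(b) != k:
        raise ValueError("inner dimensions do not match")
    n = len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, block):          # tile rows of C
        for j0 in range(0, n, block):      # tile columns of C
            for k0 in range(0, k, block):  # tiles along the inner dimension
                for i in range(i0, min(i0 + block, m)):
                    for j in range(j0, min(j0 + block, n)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + block, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
print(blocked_gemm(a, b, block=2))
```

Tiling changes only the order in which partial products are accumulated, so for these small integer inputs the result is the same for any block size.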

Proceedings of the Detection and Classification of Acoustic Scenes and Events 2019 Workshop (DCASE2019)
Michael Mandel, Justin Salamon, Daniel P.W. Ellis
2019 · 74 citations · doi:10.33682/1syg-dy60

In this paper, we describe our method for DCASE2019 task 3: Sound Event Localization and Detection (SELD). We use four CRNN SELDnet-like single output models which run in a consecutive manner to recover all possible information of occurring events. We decompose the SELD task into estimating the number of active sources, estimating the direction of arrival of a single source, estimating the direction of arrival of the second source where the direction of the first one is known, and a multi-label classification task. We use a custom consecutive ensemble to predict events' onset, offset, direction of arrival and class. The proposed approach is evaluated on the TAU Spatial Sound Events 2019 - Ambisonic dataset and compared with other participants' submissions.

Evaluation of Docker Containers for Scientific Workloads in the Cloud
Pankaj Saha, Angel Beltre, Piotr W. Umiński, Madhusudhan Govindaraju
2018 · Proceedings of the Practice and Experience on Advanced Research Computing · 63 citations · doi:10.1145/3219104.3229280

The HPC community is actively researching and evaluating tools to support execution of scientific applications in cloud-based environments. Among the various technologies, containers have recently gained importance as they have significantly better performance compared to full-scale virtualization, support for microservices and DevOps, and work seamlessly with workflow and orchestration tools. Docker is currently the leader in containerization technology because it offers low overhead, flexibility, portability of applications, and reproducibility. Singularity is another container solution that is of interest as it is designed specifically for scientific applications. It is important to conduct performance and feature analysis of the container technologies to understand their applicability for each application and target execution environment. This paper presents (1) a performance evaluation of Docker and Singularity on bare metal nodes in the Chameleon cloud, (2) a mechanism by which Docker containers can be mapped with InfiniBand hardware with RDMA communication, and (3) an analysis of mapping elements of parallel workloads to the containers for optimal resource management with container-ready orchestration tools. Our experiments are targeted toward application developers so that they can make informed decisions on choosing the container technologies and approaches that are suitable for their HPC workloads on cloud infrastructure. Our performance analysis shows that scientific workloads for both Docker and Singularity based containers can achieve near-native performance. Singularity is designed specifically for HPC workloads. However, Docker still has advantages over Singularity for use in clouds as it provides overlay networking and an intuitive way to run MPI applications with one container per rank for fine-grained resource allocation. Both Docker and Singularity make it possible to directly use the underlying network fabric from the containers for coarse-grained resource allocation.

Simulation and Experiment of a Compact Wideband 90° Differential Phase Shifter
Michal Sorn, Rafał Lech, J. Mazur
2011 · IEEE Transactions on Microwave Theory and Techniques · 59 citations · doi:10.1109/tmtt.2011.2175244

A compact wideband 90° differential phase shifter is developed by modifying ports termination in the Abbosh phase-shifter configuration. This novel phase-shifter arrangement consists of a 3-dB directional coupler with the coupled and transmission ports terminated with reactive loads. The proper reactance is found at the input of the coupled line section in which the remaining ports are open circuited. Both coupled sections utilize a multilayer broadside coupling microstrip-slot-microstrip tight coupler. A theoretical model is presented to explain the performance of the proposed phase shifter and design procedure. Further, the phase shifter was designed and manufactured. Results of calculation and measurement show that the developed circuit provides a 90° differential phase shift with deviation less than ±4° across the 3-7-GHz frequency band.

Multi step structural health monitoring approaches in debonding assessment in a sandwich honeycomb composite structure using ultrasonic guided waves
Kaleeswaran Balasubramaniam, Shirsendu Sikdar, Rohan Soman, Paweł Malinowski
2022 · Measurement · 48 citations · doi:10.1016/j.measurement.2022.111057

This paper aims to investigate the use of ultrasonic guided wave (GW) propagation mechanism and the assessment of debonding in a sandwich composite structure (SCS) using a multi-step approach. Towards this, a series of GW propagation-based laboratory experiments and numerical simulations have been carried out on the SCS sample. The debonding regions of variable size and locations were assessed using a pre-defined network of piezoelectric lead zirconate transducers (PZT). Besides, several artificial masses were also placed in the SCS to validate the multi-step structural health monitoring (SHM) strategy. The SHM approach uses a proposed quick damage identification matrix maps and an improved elliptical wave processing (EWP) strategy of the registered GW signals to detect the locations of debonding and other damages in the SCS. The benefit of the proposed damage identification map is to locate the damaged area (sectors) quickly. This identification step is followed by applying the damage localization step using the improved EWP only on the previously identified damage sector region. The proposed EWP has shown the potential to effectively locate the hidden multiple debonding regions and damages in the SCS with a reduced number of calculations using a step-wise approach that uses only a selected number of grid points. The paper shows the effectiveness of the proposed approach based on data gathered from numerical simulations and experimental studies. Thus, using the above-mentioned SHM strategy debondings and damages present within and outside the sensor network are localized. The results were cross verified with nondestructive testing (NDT) methods such as infrared thermography and laser Doppler vibrometry.

Intel Nervana Neural Network Processor-T (NNP-T) Fused Floating Point Many-Term Dot Product
Brian Hickmann, Jieasheng Chen, Michael Rotzin, Andrew Yang +2 more
2020 · 46 citations · doi:10.1109/arith48897.2020.00029

Intel’s Nervana Neural Network Processor for Training (NNP-T) contains at its core an advanced floating point dot product design to accelerate the matrix multiplication operations found in many AI applications. Each Matrix Processing Unit (MPU) on the Intel NNP-T can process a 32x32 BFloat16 matrix multiplication every 32 cycles, accumulating the result in single precision (FP32). To reduce hardware costs, the MPU uses a fused many-term floating point dot product design with block alignment of the many input terms during addition, resulting in a unique datapath with several interesting design trade-offs. In this paper, we describe the details of the MPU pipeline, discuss the trade-offs made in the design, and present information on the accuracy of the computation as compared to traditional FMA implementations.
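
To make the numeric formats in this abstract concrete, the sketch below emulates a dot product with BFloat16 inputs and single-precision (FP32) accumulation in plain Python. It is a sequential multiply-and-add emulation for illustration only, not the fused, block-aligned many-term datapath the paper describes. The helper names are hypothetical, and finite inputs are assumed.

```python
import struct


def to_float32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bfloat16(x):
    """Round a finite value to bfloat16 using round-to-nearest-even.

    bfloat16 keeps the top 16 bits of a binary32 value: 1 sign bit,
    8 exponent bits and 7 stored mantissa bits.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add just under half of the dropped range, plus one more if the kept
    # mantissa is odd, so that ties round to the even neighbour.
    bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def bf16_dot_fp32(a, b):
    """Dot product of bfloat16-rounded inputs with FP32 accumulation."""
    acc = 0.0
    for x, y in zip(a, b):
        prod = to_bfloat16(x) * to_bfloat16(y)
        # Emulate an FP32 accumulator: round every partial sum to binary32.
        acc = to_float32(acc + prod)
    return acc


print(to_bfloat16(3.14159), bf16_dot_fp32([1.0, 2.0], [3.0, 4.0]))
```

Because every partial sum is rounded here, the result can differ in the last bits from a fused design that aligns and adds many terms before a single rounding.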

The Intel neuromorphic DNS challenge
Jonathan Timcheck, Sumit Bam Shrestha, Daniel Ben Dayan Rubin, Adam Kupryjanow +4 more
2023 · Neuromorphic Computing and Engineering · 31 citations · doi:10.1088/2634-4386/ace737

A critical enabler for progress in neuromorphic computing research is the ability to transparently evaluate different neuromorphic solutions on important tasks and to compare them to state-of-the-art conventional solutions. The Intel Neuromorphic Deep Noise Suppression Challenge (Intel N-DNS Challenge), inspired by the Microsoft DNS Challenge, tackles a ubiquitous and commercially relevant task: real-time audio denoising. Audio denoising is likely to reap the benefits of neuromorphic computing due to its low-bandwidth, temporal nature and its relevance for low-power devices. The Intel N-DNS Challenge consists of two tracks: a simulation-based algorithmic track to encourage algorithmic innovation, and a neuromorphic hardware (Loihi 2) track to rigorously evaluate solutions. For both tracks, we specify an evaluation methodology based on energy, latency, and resource consumption in addition to output audio quality. We make the Intel N-DNS Challenge dataset scripts and evaluation code freely accessible, encourage community participation with monetary prizes, and release a neuromorphic baseline solution which shows promising audio quality, high power efficiency, and low resource consumption when compared to Microsoft NsNet2 and a proprietary Intel denoising model used in production. We hope the Intel N-DNS Challenge will hasten innovation in neuromorphic algorithms research, especially in the area of training tools and methods for real-time signal processing. We expect the winners of the challenge will demonstrate that for problems like audio denoising, significant gains in power and resources can be realized on neuromorphic devices available today compared to conventional state-of-the-art solutions.

A global-local damage localization and quantification approach in composite structures using ultrasonic guided waves and active infrared thermography
Kaleeswaran Balasubramaniam, Shirsendu Sikdar, Dominika Ziaja, Michał Jurek +2 more
2023 · Smart Materials and Structures · 28 citations · doi:10.1088/1361-665x/acb578

The paper emphasizes an effective quantification of hidden damage in composite structures using ultrasonic guided wave (GW) propagation-based structural health monitoring (SHM) and an artificial neural network (ANN) based active infrared thermography (IRT) analysis. In recent years, there has been increased interest in using a global-local approach for damage localization purposes. The global approach is mainly used to identify the damage, while the local approach quantifies it. This paper presents a proof-of-study to use such a global-local approach in damage localization and quantification. The main novelties in this paper are the implementation of an improved SHM GW algorithm to localize the damages, a new pixel-based confusion matrix to quantify the size of the damage threshold, and a newly developed IRT-ANN algorithm to validate the damage quantification. From the SHM methodology, it is realized that only three sensors are sufficient to localize the damage, and that an ANN-IRT imaging algorithm with only five hidden neurons is sufficient to quantify it. The robust SHM methods effectively identified, localized, and quantified the different damage dimensions against the non-destructive testing-IRT method in different composite structures.

Intelligent Visual Quality Control System Based on Convolutional Neural Networks for Holonic Shop Floor Control of Industry 4.0 Manufacturing Systems
Przemysław Oborski, Przemysław Wysocki
2022 · Advances in Science and Technology – Research Journal · 21 citations · doi:10.12913/22998624/145503

The article presents research on an industrial quality control system based on AI deep learning methods. It is part of a larger project focusing on the development of a Holonic Shop Floor Control System for integrating machines, machine operators, and manufacturing process monitoring with the information flow in the whole production process according to Industry 4.0 requirements. A system connecting machine operators, machine control, and process and machine monitoring with companywide IT systems is being developed. It is an answer to the requirements of manufacturing in the airplane industry. The main aim of the system is full automation of the information flow between the management level and the manufacturing process level. An intelligent, flexible quality control system allowing for active manufacturing optimization on the basis of achieved results, as well as historical data collection for further Big Data analysis, is the main aim of the current research. During the research, a number of selected AI algorithms were tested to assess their suitability for performing tasks identified in a real manufacturing environment. As a result of the conducted analyses, Convolutional Neural Networks were selected for further study. A number of Convolutional Neural Network models were built and tested using sets of data and photos from the production line. A further step of the research will focus on testing the system in a real manufacturing process in order to construct a fully functional quality control system based on Convolutional Neural Networks.

Deep Learning Optimization for Edge Devices: Analysis of Training Quantization Parameters
Alicja Kwaśniewska, Maciej Szankin, Mateusz Ozga, Jason Wolfe +4 more
2019 · 21 citations · doi:10.1109/iecon.2019.8927153

This paper focuses on the convolutional neural network quantization problem. Quantization has a distinct stage of data conversion from floating-point into integer numbers. In general, the process of quantization is associated with the reduction of the matrix dimension via limited precision of the numbers. However, the training and inference stages of deep learning neural networks are limited by the memory space and a variety of factors including programming complexity and even reliability of the system. On the whole, quantization is becoming more and more popular due to its significant impact on performance and minimal accuracy loss. Various techniques for network quantization have already been proposed, including quantization aware training and integer arithmetic-only inference. However, a detailed comparison of various quantization configurations, combining all proposed methods, has not yet been presented. This comparison is important to understand the selection of quantization hyperparameters during training to optimize networks for inference while preserving their robustness. In this work, we perform an in-depth analysis of parameters in quantization aware training, the process of simulating precision loss in the forward pass by quantizing and dequantizing tensors. Specifically, we modify rounding modes, input preprocessing, output data signedness, bitwidth of the quantization and locations of precision loss simulation to evaluate how they affect the accuracy of deep neural networks aimed at performing efficient calculations on resource-constrained devices.
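
The quantize-then-dequantize step that this abstract describes can be sketched for a single value as follows. This is a generic illustration of uniform "fake quantization", not the paper's implementation. The function name, the fixed `scale` argument and the two rounding options are assumptions chosen to mirror the parameters the abstract lists (rounding mode, signedness, bitwidth).

```python
import math


def fake_quantize(x, scale, bits=8, signed=True, rounding="nearest"):
    """Quantize x to an integer grid, then map it back to a float.

    The returned value carries the precision loss that integer
    inference would introduce, while staying in floating point.
    """
    if signed:
        qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    else:
        qmin, qmax = 0, 2 ** bits - 1
    scaled = x / scale
    if rounding == "nearest":
        q = round(scaled)        # Python's round() sends ties to even
    elif rounding == "floor":
        q = math.floor(scaled)
    else:
        raise ValueError("unsupported rounding mode: " + rounding)
    q = max(qmin, min(qmax, q))  # saturate to the representable range
    return q * scale


for mode in ("nearest", "floor"):
    print(mode, fake_quantize(0.26, scale=0.1, rounding=mode))
```

Changing `bits`, `signed` or `rounding` changes which grid point a value lands on, which is the kind of effect on accuracy the paper sets out to compare.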

Multi-agent large-scale parallel crowd simulation
Artur Malinowski, Paweł Czarnul, Krzysztof Czuryło, Maciej Maciejewski +1 more
2017 · Procedia Computer Science · 19 citations · doi:10.1016/j.procs.2017.05.036

This paper presents the design, implementation and performance results of a new modular, parallel, agent-based and large scale crowd simulation environment. A parallel application, implemented with C and MPI, was run in this parallel environment for simulation and visualization of an evacuation scenario at Gdansk University of Technology, Poland and further in the area of districts of Gdansk. The application uses parallel MPI I/O run on two different clusters and a two or three node Parallel File System (PFS) to store a current state in a file. In order to make this implementation efficient, we used our previously developed and tuned Byte-addressable Non-volatile RAM Distributed Cache, a solution that allows efficient access to small data chunks from spread locations within a file. We have presented application execution times versus the number of agents (up to 100000), versus the number of simulation iterations (up to 25000), versus map size (up to 6 km²) and versus the number of processes (up to more than 650), showing high speed-ups.

A Survey on Moving Target Defense for Networks: A Practical View
Łukasz Jalowski, Marek Zmuda, Mariusz Rawski
2022 · Electronics · 18 citations · doi:10.3390/electronics11182886

The static nature of many of currently used network systems has multiple practical benefits, including cost optimization and ease of deployment, but it makes them vulnerable to attackers who can observe from the shadows to gain insight before launching a devastating attack against the infrastructure. Moving target defense (MTD) is one of the emerging areas that promises to protect against this kind of attack by continuously shifting system parameters and changing the attack surface of protected systems. The emergence of network functions virtualization (NFV) and software-defined networking (SDN) technology allows for the implementation of very sophisticated MTD techniques. Furthermore, the introduction of such solutions as field-programmable gate array (FPGA) programmable acceleration cards makes it possible to take the MTD concept to the next level. Applying hardware acceleration to existing concepts or developing new, dedicated methods will offer more robust, efficient, and secure solutions. However, to the best of the authors’ knowledge, there are still no major implementations of MTD schemes inside large-scale networks. This survey aims to understand why, by analyzing research made in the field of MTD to show current pitfalls and possible improvements that need to be addressed in future proposals to make MTD a viable solution to address current cybersecurity threats in real-life scenarios.

Scaling scrum with a customized nexus framework: A report from a joint industry‐academia research project
Andrzej Joskowski, Adam Przybyłek, Bartosz Marcinkowski
2023 · Software Practice and Experience · 16 citations · doi:10.1002/spe.3201

Despite a wide range of scaling frameworks available, large‐scale agile transformations are not straightforward undertakings. Few organizations have structures in place that fit the predefined workflows – while once one applies an off‐the‐shelf framework outside of its prescribed process, guidance quickly runs out. In this paper, we demonstrate how to instantiate a method configuration process using a lightweight experimental approach embedded in Action Research cycles. The proposed approach was developed to assist practitioners working on a multiple‐team project at Intel Technology Poland to find the right practices to continue their Nexus‐based transformation and integrate their in‐house method into the already established company structures, processes, and routines. In particular, it enabled identifying a series of challenges with scaled practices and coping with those. The challenges ranged from logistical problems, through poor availability of the Product Owner, to lackluster knowledge transfer and a wide array of communication/coordination issues at meetings. The study broadens the current body of knowledge within technology management and the scaled agile method‐tailoring domain. It indicates potential corrective actions that may be taken advantage of by entities that are not inclined, due to organizational constraints, to directly implement an off‐the‐shelf framework. Furthermore, our study demonstrates that a gradual transition to large‐scale agile at the project level (1) is possible with the preservation of traditional command‐and‐control management practices; (2) requires neither middle management involvement nor upfront investment; and (3) does not need to disrupt the continuous delivery of the product.

Damage detection and localization based on different types of actuators using the electromechanical impedance method in 3D-printed material
Shishir Kumar Singh, Mohammad Ali Fakih, Paweł Malinowski
2023 · Smart Materials and Structures · 15 citations · doi:10.1088/1361-665x/acfa7e

Electromechanical impedance (EMI) measurement using piezoelectric transducers (PZTs) in the high-frequency range is a potential method for assessing the health of lightweight structures. The major objective of this work is to comprehend how different actuators react to damage in additively manufactured (AM) polymer structures. A novel frequency-range selection technique was suggested based on the maxima of the standard deviation of the impedance frequency spectra gathered for the referential and damage cases. A 3D-printed acrylonitrile butadiene styrene (ABS) plate was used for the investigation, where two PZT and one macro fiber composite (MFC) actuator were glued to the surface. Small magnets were used to simulate damage and were positioned at increasing distances from each transducer as EMI measurements were made using the MFC and 1 PZT. This served both in studying the transducers’ sensitivity to damage and selecting the proper frequency range for damage detection utilizing the standard-deviation approach. The EMI-acquired data from the MFC actuator displays damage-sensitive peaks in a low-frequency band (0–58 kHz), while the PZT shows a good sensitivity in a higher frequency range (94–304 kHz). In order to evaluate the PZT and MFC actuators’ sensitivity to damage in the 3D-printed ABS plate, impact damage is also generated in the plate’s center. The impedance-based damage indices obtained from different types of PZTs (2 PZTs and 1 MFC) were projected to the same base level and then fused, for the first time, for impact-damage localization and further added magnetic mass damage localization. The obtained damage index values of impedance are encouraging for the evaluation of AM polymer structures with a 4.48 mm positional error from a real location by fusing data in the different frequency ranges for PZTs and MFC. The damage localization error increases significantly to the new location beyond the damage sensitivity range of the PZT2 and MFC for the added magnetic mass on the 3D printed structure.

Detection of the First Component of the Received LTE Signal in the OTDoA Method
Paweł Gadka, Jarosław Sadowski, Jacek Stefański
2019 · Wireless Communications and Mobile Computing · 13 citations · doi:10.1155/2019/2708684

In the modern world there is a growing demand for localization services of various kinds. Position estimation can be realized via cellular networks, especially in the currently widely deployed LTE (Long Term Evolution) networks. However, it is not an easy task in harsh propagation conditions, which often occur in dense urban environments. Recently, time-based methods of terminal localization within the network have been the focus of attention, the OTDoA (Observed Time Difference of Arrival) method in particular. One of the main factors influencing the accuracy of location estimation in the OTDoA method is the nature of the propagation channel, which affects the ease of isolating the signal component travelling from the transmitter to the receiver through the shortest path. To obtain the smallest possible localization error, it is necessary to detect the first received component of the useful signal. This aim could be achieved by using a proper algorithm within the receiver. This paper proposes a new algorithm for effectively detecting the first component of the LTE downlink signal in a multipath environment. In the mobile terminal location estimation process, CSRS (Cell Specific Reference Signal) signals were used instead of dedicated PRS (Positioning Reference Signal) signals. The new solution was verified during a measurement campaign in a real LTE network.
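
The abstract does not describe the proposed algorithm, so the sketch below shows only a generic baseline for the same problem: pick the earliest sample of a correlation magnitude profile that reaches a fixed fraction of the strongest peak. The function name, the `ratio` parameter and the sample values are hypothetical.

```python
def first_component_index(profile, ratio=0.3):
    """Index of the earliest sample at or above ratio * strongest peak.

    profile holds non-negative correlation magnitudes ordered by delay.
    In a multipath channel the strongest peak may be a reflection, so
    the first arrival is searched for before it.
    """
    if not profile:
        raise ValueError("profile must not be empty")
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    threshold = ratio * max(profile)
    for index, magnitude in enumerate(profile):
        if magnitude >= threshold:
            return index


# Hypothetical profile: a weak direct path at delay 2, a strong echo at 4.
profile = [0.01, 0.02, 0.4, 0.1, 1.0, 0.3]
print(first_component_index(profile), profile.index(max(profile)))
```

Taking the strongest peak (delay 4) instead of the first component (delay 2) would overestimate the propagation time, which is the kind of error that first-component detection is meant to avoid.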

Implementation of an Agent-Based Parallel Tissue Modelling Framework for the Intel MIC Architecture
Maciej Cytowski, Zuzanna Szymańska, Piotr W. Umiński, Grzegorz Andrejczuk +1 more
2017 · Scientific Programming · 13 citations · doi:10.1155/2017/8721612

Timothy is a novel large scale modelling framework that allows simulating biological processes involving different cellular colonies growing and interacting with a variable environment. Timothy was designed for execution on massively parallel High Performance Computing (HPC) systems. The high parallel scalability of the implementation allows for simulations of up to 10⁹ individual cells (i.e., simulations at tissue spatial scales of up to 1 cm³ in size). With the recent advancements of the Timothy model, it has become critical to ensure appropriate performance level on emerging HPC architectures. For instance, the introduction of blood vessels supplying nutrients to the tissue is a very important step towards realistic simulations of complex biological processes, but it greatly increased the computational complexity of the model. In this paper, we describe the process of modernization of the application in order to achieve high computational performance on HPC hybrid systems based on modern Intel® MIC architecture. Experimental results on the Intel Xeon Phi™ coprocessor x100 and the Intel Xeon Phi processor x200 are presented.

MobiBits: Multimodal Mobile Biometric Database
Ewelina Bartuzi, Katarzyna Roszczewska, Mateusz Trokielewicz, Radosław Białobrzeski
2018 · 13 citations · doi:10.23919/biosig.2018.8553108

This paper presents a novel database comprising representations of five different biometric characteristics, collected in a mobile, unconstrained or semi-constrained setting with three different mobile devices, including characteristics previously unavailable in existing datasets, namely hand images, thermal hand images, and thermal face images, all acquired with a mobile, off-the-shelf device. In addition to this collection of data we perform an extensive set of experiments providing insight on benchmark recognition performance that can be achieved with these data, carried out with existing commercial and academic biometric solutions. To our knowledge, this is the first mobile biometric database to introduce samples of biometric traits such as thermal hand images and thermal face images. We hope that this contribution will make a valuable addition to the already existing databases and enable new experiments and studies in the field of mobile authentication. The MobiBits database is made publicly available to the research community at no cost for non-commercial purposes.

Checkpointing of Parallel MPI Applications Using MPI One-sided API with Support for Byte-addressable Non-volatile RAM
Piotr Dorożyński, Paweł Czarnul, Artur Malinowski, Krzysztof Czuryło +3 more
2016 · Procedia Computer Science · 13 citations · doi:10.1016/j.procs.2016.05.295

The increasing size of computational clusters results in an increasing probability of failures, which in turn requires application checkpointing in order to survive those failures. Traditional checkpointing requires data to be copied from application memory into a persistent storage medium, which increases application execution time as it is usually done in a separate step. In this paper we propose to use emerging byte-addressable non-volatile RAM (NVRAM) as a persistent storage medium and we analyze various methods of making consistent checkpoints with the support of the MPI one-sided API in order to minimize checkpointing overhead. We test our solution on two applications: the HPCCG benchmark and the PageRank algorithm. Our experiments showed that NVRAM-based checkpointing performs much better than the traditional disk-based approach. We also simulated different possible latencies and bandwidths of future NVRAM, and our experiments showed that only bandwidth had a visible impact on application execution time.

PLOC++
Carsten Benthin, Radoslaw Drabinski, L Tessari, Addis Dittebrandt
2022 · Proceedings of the ACM on Computer Graphics and Interactive Techniques · 11 citations · doi:10.1145/3543867

We propose a novel version of the GPU-oriented massively parallel locally-ordered clustering (PLOC) algorithm for constructing bounding volume hierarchies (BVHs). Our method focuses on removing the weaknesses of the original approach by simplifying and fusing different phases, while replacing most performance-critical parts with novel and more efficient algorithms. This combination allows for outperforming the original approach by a factor of 1.9 to 2.3.