NobleBlocks

International Centre for Theoretical Physics Asia-Pacific

Facility · Beijing, China

Research output, citation impact, and the most-cited recent papers from International Centre for Theoretical Physics Asia-Pacific (China). Aggregated across the NobleBlocks index of 300M+ scholarly works.

Total works: 732
Citations: 28.1K
h-index: 67
i10-index: 399
Also known as: International Centre for Theoretical Physics Asia-Pacific; 国际理论物理中心-亚太地区

Top-cited papers from International Centre for Theoretical Physics Asia-Pacific

DianNao
Tianshi Chen, Zidong Du, Ninghui Sun, Jia Wang +3 more
2014 · 1.3K citations · doi:10.1145/2541940.2541967

Machine-Learning tasks are becoming pervasive in a broad range of domains, and in a broad range of systems (from embedded systems to data centers). At the same time, a small set of machine-learning algorithms (especially Convolutional and Deep Neural Networks, i.e., CNNs and DNNs) are proving to be state-of-the-art across many applications. As architectures evolve towards heterogeneous multi-cores composed of a mix of cores and accelerators, a machine-learning accelerator can achieve the rare combination of efficiency (due to the small number of target algorithms) and broad application scope.

DianNao
Tianshi Chen, Zidong Du, Ninghui Sun, Jia Wang +3 more
2014 · ACM SIGARCH Computer Architecture News · 313 citations · doi:10.1145/2654822.2541967

Machine-Learning tasks are becoming pervasive in a broad range of domains, and in a broad range of systems (from embedded systems to data centers). At the same time, a small set of machine-learning algorithms (especially Convolutional and Deep Neural Networks, i.e., CNNs and DNNs) are proving to be state-of-the-art across many applications. As architectures evolve towards heterogeneous multi-cores composed of a mix of cores and accelerators, a machine-learning accelerator can achieve the rare combination of efficiency (due to the small number of target algorithms) and broad application scope. Until now, most machine-learning accelerator designs have focused on efficiently implementing the computational part of the algorithms. However, recent state-of-the-art CNNs and DNNs are characterized by their large size. In this study, we design an accelerator for large-scale CNNs and DNNs, with a special emphasis on the impact of memory on accelerator design, performance and energy. We show that it is possible to design an accelerator with a high throughput, capable of performing 452 GOP/s (key NN operations such as synaptic weight multiplications and neurons outputs additions) in a small footprint of 3.02 mm^2 and 485 mW; compared to a 128-bit 2 GHz SIMD processor, the accelerator is 117.87x faster, and it can reduce the total energy by 21.08x. The accelerator characteristics are obtained after layout at 65 nm. Such a high throughput in a small footprint can open up the usage of state-of-the-art machine-learning algorithms in a broad set of systems and for a broad set of applications.
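
As a quick sanity check on the figures quoted in this abstract, the sketch below derives the energy and area efficiency implied by 452 GOP/s at 485 mW and 3.02 mm^2. The function and variable names are illustrative, and the derived values are back-of-envelope numbers, not results reported by the authors.

```python
# Figures quoted in the abstract above.
THROUGHPUT_GOPS = 452.0   # GOP/s
POWER_W = 0.485           # 485 mW expressed in watts
AREA_MM2 = 3.02           # mm^2


def gops_per_watt(gops: float, watts: float) -> float:
    """Energy efficiency: operations per second delivered per watt."""
    return gops / watts


def gops_per_mm2(gops: float, area_mm2: float) -> float:
    """Area efficiency: operations per second delivered per mm^2 of silicon."""
    return gops / area_mm2


efficiency = gops_per_watt(THROUGHPUT_GOPS, POWER_W)
density = gops_per_mm2(THROUGHPUT_GOPS, AREA_MM2)
print(f"{efficiency:.1f} GOP/s/W, {density:.1f} GOP/s/mm^2")
```

These ratios follow only from the three quoted numbers; they say nothing about the workloads or the memory traffic behind them.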

DianNao
Tianshi Chen, Zidong Du, Ninghui Sun, Jia Wang +3 more
2014 · ACM SIGPLAN Notices · 273 citations · doi:10.1145/2644865.2541967

Machine-Learning tasks are becoming pervasive in a broad range of domains, and in a broad range of systems (from embedded systems to data centers). At the same time, a small set of machine-learning algorithms (especially Convolutional and Deep Neural Networks, i.e., CNNs and DNNs) are proving to be state-of-the-art across many applications. As architectures evolve towards heterogeneous multi-cores composed of a mix of cores and accelerators, a machine-learning accelerator can achieve the rare combination of efficiency (due to the small number of target algorithms) and broad application scope. Until now, most machine-learning accelerator designs have focused on efficiently implementing the computational part of the algorithms. However, recent state-of-the-art CNNs and DNNs are characterized by their large size. In this study, we design an accelerator for large-scale CNNs and DNNs, with a special emphasis on the impact of memory on accelerator design, performance and energy. We show that it is possible to design an accelerator with a high throughput, capable of performing 452 GOP/s (key NN operations such as synaptic weight multiplications and neurons outputs additions) in a small footprint of 3.02 mm^2 and 485 mW; compared to a 128-bit 2 GHz SIMD processor, the accelerator is 117.87x faster, and it can reduce the total energy by 21.08x. The accelerator characteristics are obtained after layout at 65 nm. Such a high throughput in a small footprint can open up the usage of state-of-the-art machine-learning algorithms in a broad set of systems and for a broad set of applications.

The Irish potato famine pathogen Phytophthora infestans originated in central Mexico rather than the Andes
Erica M. Goss, Javier F. Tabima, David E. L. Cooke, Silvia Restrepo +4 more
2014 · Proceedings of the National Academy of Sciences · 226 citations · doi:10.1073/pnas.1401884111

Phytophthora infestans is a destructive plant pathogen best known for causing the disease that triggered the Irish potato famine and remains the most costly potato pathogen to manage worldwide. Identification of P. infestans' elusive center of origin is critical to understanding the mechanisms of repeated global emergence of this pathogen. There are two competing theories, placing the origin in either South America or in central Mexico, both of which are centers of diversity of Solanum host plants. To test these competing hypotheses, we conducted detailed phylogeographic and approximate Bayesian computation analyses, which are suitable approaches to unraveling complex demographic histories. Our analyses used microsatellite markers and sequences of four nuclear genes sampled from populations in the Andes, Mexico, and elsewhere. To infer the ancestral state, we included the closest known relatives Phytophthora phaseoli, Phytophthora mirabilis, and Phytophthora ipomoeae, as well as the interspecific hybrid Phytophthora andina. We did not find support for an Andean origin of P. infestans; rather, the sequence data suggest a Mexican origin. Our findings support the hypothesis that populations found in the Andes are descendants of the Mexican populations and reconcile previous findings of ancestral variation in the Andes. Although centers of origin are well documented as centers of evolution and diversity for numerous crop plants, the number of plant pathogens with a known geographic origin is limited. This work has important implications for our understanding of the coevolution of hosts and pathogens, as well as the harnessing of plant disease resistance to manage late blight.

The Taiji program: A concise overview
Ziren Luo, Yan Wang, Yue-Liang Wu, Wenrui Hu +1 more
2020 · Progress of Theoretical and Experimental Physics · 225 citations · doi:10.1093/ptep/ptaa083

Taiji is a Chinese space mission to detect gravitational waves in the frequency band 0.1 mHz to 1.0 Hz, which aims at detecting supermassive (intermediate-mass) black hole mergers and extreme (intermediate) mass-ratio inspirals. A brief introduction to the mission, its scientific objectives, and its payload design is presented. A roadmap is also given in which the launch is set for the 2030s.

The Forward Physics Facility at the High-Luminosity LHC
Jonathan L. Feng, Felix Kling, Mary Hall Reno, Juan Rojo +4 more
2023 · Research Explorer (The University of Manchester) · 220 citations · doi:10.1088/1361-6471/ac865e

High energy collisions at the High-Luminosity Large Hadron Collider (LHC) produce a large number of particles along the beam collision axis, outside of the acceptance of existing LHC experiments. The proposed Forward Physics Facility (FPF), to be located several hundred meters from the ATLAS interaction point and shielded by concrete and rock, will host a suite of experiments to probe standard model (SM) processes and search for physics beyond the standard model (BSM). In this report, we review the status of the civil engineering plans and the experiments to explore the diverse physics signals that can be uniquely probed in the forward region. FPF experiments will be sensitive to a broad range of BSM physics through searches for new particle scattering or decay signatures and deviations from SM expectations in high statistics analyses with TeV neutrinos in this low-background environment. High statistics neutrino detection will also provide valuable data for fundamental topics in perturbative and non-perturbative QCD and in weak interactions. Experiments at the FPF will enable synergies between forward particle production at the LHC and astroparticle physics to be exploited. We report here on these physics topics, on infrastructure, detector, and simulation studies, and on future directions to realize the FPF’s physics potential.

Complete set of dimension-eight operators in the standard model effective field theory
Hao-Lin Li, Zhe Ren, Jing Shu, Ming-Lei Xiao +2 more
2021 · Physical Review D · 218 citations · doi:10.1103/physrevd.104.015026

We present a complete list of the dimension-eight operator basis in the standard model effective field theory using group theoretic techniques in a systematic and automated way. We adopt a new form of operators in terms of the irreducible representations of the Lorentz group and identify the Lorentz structures as states in an $SU(N)$ group. In this way, redundancy from equations of motion is absent and that from integration by parts is treated using the fact that the independent Lorentz basis forms an invariant subspace of the $SU(N)$ group. We also decompose operators into the ones with definite permutation symmetries among flavor indices to deal with subtlety from repeated fields. For the first time to our knowledge, we provide the explicit form of independent flavor-specified operators in a systematic way. Our algorithm can easily be applied to higher-dimensional standard model effective field theory and other effective field theories, making these studies more approachable.

Solitons and Polarons in Conducting Polymers
Yu Lu
1988 · World Scientific eBooks · 212 citations · doi:10.1142/0242

Polyacetylene, (CH)x, is the simplest conjugated polymer. Pristine polyacetylene is a good insulator, whereas its highly doped version exhibits metal-like electrical conductivity. This book gives a detailed introduction to this rapidly developing field, along with a collection of original papers. The main purpose is to help chemists and physicists grasp the main ideas and most important facts; an expert may also find it useful as a reference volume.

DianNao family
Yunji Chen, Tianshi Chen, Zhiwei Xu, Ninghui Sun +1 more
2016 · Communications of the ACM · 205 citations · doi:10.1145/2996864

Machine Learning (ML) tasks are becoming pervasive in a broad range of applications, and in a broad range of systems (from embedded systems to data centers). As computer architectures evolve toward heterogeneous multi-cores composed of a mix of cores and hardware accelerators, designing hardware accelerators for ML techniques can simultaneously achieve high efficiency and broad application scope. While efficient computational primitives are important for a hardware accelerator, inefficient memory transfers can potentially void the throughput, energy, or cost advantages of accelerators, that is, an Amdahl's law effect, and thus, they should become a first-order concern, just like in processors, rather than an element factored in accelerator design on a second step. In this article, we introduce a series of hardware accelerators (i.e., the DianNao family) designed for ML (especially neural networks), with a special emphasis on the impact of memory on accelerator design, performance, and energy. We show that, on a number of representative neural network layers, it is possible to achieve a speedup of 450.65x over a GPU, and reduce the energy by 150.31x on average for a 64-chip DaDianNao system (a member of the DianNao family).

Solving the Sampling Problem of the Sycamore Quantum Circuits
Feng Pan, Keyang Chen, Pan Zhang
2022 · Physical Review Letters · 144 citations · doi:10.1103/physrevlett.129.090502

We study the problem of generating independent samples from the output distribution of Google's Sycamore quantum circuits with a target fidelity, which is believed to be beyond the reach of classical supercomputers and has been used to demonstrate quantum supremacy. We propose a method to classically solve this problem by contracting the corresponding tensor network just once; it is massively more efficient than existing methods in generating a large number of uncorrelated samples with a target fidelity. For the Sycamore quantum supremacy circuit with 53 qubits and 20 cycles, we have generated $1\times10^{6}$ uncorrelated bitstrings $s$ which are sampled from a distribution $\hat{P}(s)=|\hat{\psi}(s)|^{2}$, where the approximate state $\hat{\psi}$ has fidelity $F\approx0.0037$. The whole computation has cost about 15 h on a computational cluster with 512 GPUs. The obtained $1\times10^{6}$ samples, the contraction code and contraction order are made public. If our algorithm could be implemented with high efficiency on a modern supercomputer with ExaFLOPS performance, we estimate that ideally, the simulation would cost a few dozen seconds, which is faster than Google's quantum hardware.

Simulation of Quantum Circuits Using the Big-Batch Tensor Network Method
Feng Pan, Pan Zhang
2022 · Physical Review Letters · 137 citations · doi:10.1103/physrevlett.128.030501

We propose a tensor network approach to compute amplitudes and probabilities for a large number of correlated bitstrings in the final state of a quantum circuit. As an application, we study Google's Sycamore circuits, which are believed to be beyond the reach of classical supercomputers and have been used to demonstrate quantum supremacy. By employing a small computational cluster containing 60 graphics processing units (GPUs), we compute exact amplitudes and probabilities of $2\times10^{6}$ correlated bitstrings with some entries fixed (which span a subspace of the output probability distribution) for the Sycamore circuit with 53 qubits and 20 cycles. The obtained results verify the Porter-Thomas distribution of the large and deep quantum circuits of Google, provide datasets and benchmarks for developing approximate simulation methods, and can be used for spoofing the linear cross entropy benchmark of quantum supremacy. Then we extend the proposed big-batch method to a full-amplitude simulation approach that is more efficient than the existing Schrödinger method on shallow circuits and the Schrödinger-Feynman method in general, enabling us to obtain the state vector of Google's simplifiable circuit with $n=43$ qubits and $m=14$ cycles using only one GPU. We also manage to obtain the state vector for Google's simplifiable circuits with $n=50$ qubits and $m=14$ cycles using a small GPU cluster, breaking the previous record on the number of qubits in full-amplitude simulations. Our method is general in computing bitstring probabilities for a broad class of quantum circuits and can find applications in the verification of quantum computers. We anticipate that our method will pave the way for combining tensor network-based classical computations and near-term quantum computations for solving challenging problems in the real world.
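
To make the idea of computing amplitudes by tensor network contraction concrete, here is a toy sketch on a two-qubit circuit (a Hadamard followed by a CNOT). It is an illustration of the general technique only, not the authors' big-batch algorithm; the circuit, the helper names `basis`, `amplitude`, and `batch_amplitudes`, and the index layout are all assumptions chosen for the example.

```python
import numpy as np

# Hadamard gate as a rank-2 tensor H[out, in].
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)

# CNOT as a rank-4 tensor CNOT[ctrl_out, tgt_out, ctrl_in, tgt_in].
CNOT = np.zeros((2, 2, 2, 2), dtype=complex)
for c in range(2):
    for t in range(2):
        CNOT[c, t ^ c, c, t] = 1.0


def basis(bit: int) -> np.ndarray:
    """Computational basis vector |bit> for one qubit."""
    v = np.zeros(2, dtype=complex)
    v[bit] = 1.0
    return v


def amplitude(bits: tuple[int, int]) -> complex:
    """<bits| CNOT (H on qubit 0) |00>, closing every leg of the network."""
    # a, b: input legs; c: qubit 0 after H; d, e: output legs.
    return np.einsum(
        "a,b,ca,decb,d,e->",
        basis(0), basis(0), H, CNOT, basis(bits[0]), basis(bits[1]),
    )


def batch_amplitudes(fixed_bit0: int) -> np.ndarray:
    """Fix qubit 0 and leave qubit 1 open: one contraction, many bitstrings."""
    return np.einsum(
        "a,b,ca,decb,d->e",
        basis(0), basis(0), H, CNOT, basis(fixed_bit0),
    )


print(abs(amplitude((0, 0))) ** 2)   # probability of bitstring 00
print(batch_amplitudes(1))           # amplitudes of 10 and 11
```

Leaving an output leg open, as in `batch_amplitudes`, returns the amplitudes of every bitstring that shares the fixed entries from a single contraction, which is the sense in which a batch of correlated bitstrings can be cheaper than evaluating each one separately.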

BenchNN: On the broad potential application scope of hardware neural network accelerators
Tianshi Chen, Yunji Chen, Marc Duranton, Qi Guo +4 more
2012 · 109 citations · doi:10.1109/iiswc.2012.6402898

Recent technology trends have indicated that, although device sizes will continue to scale as they have in the past, supply voltage scaling has ended. As a result, future chips can no longer rely on simply increasing the operational core count to improve performance without surpassing a reasonable power budget. Alternatively, allocating die area towards accelerators targeting an application, or an application domain, appears quite promising, and this paper makes an argument for a neural network hardware accelerator. After being hyped in the 1990s, then fading away for almost two decades, there is a surge of interest in hardware neural networks because of their energy and fault-tolerance properties. At the same time, the emergence of high-performance applications like Recognition, Mining, and Synthesis (RMS) suggests that the potential application scope of a hardware neural network accelerator would be broad. In this paper, we want to highlight that a hardware neural network accelerator is indeed compatible with many of the emerging high-performance workloads, currently accepted as benchmarks for high-performance micro-architectures. For that purpose, we develop and evaluate software neural network implementations of 5 (out of 12) RMS applications from the PARSEC Benchmark Suite. Our results show that neural network implementations can achieve competitive results, with respect to application-specific quality metrics, on these 5 RMS applications.

On thermal gravitational contribution to particle production and dark matter
Yong Tang, Yue-Liang Wu
2017 · Physics Letters B · 109 citations · doi:10.1016/j.physletb.2017.10.034

We investigate the particle production from thermal gravitational annihilation in the very early universe, which is an important contribution for particles that might not be in thermal equilibrium and/or might only have gravitational interaction, such as dark matter (DM). For particles with spin 0, 1/2, and 1 we calculate the relevant cross sections through gravitational annihilation and give the analytic formulas with full mass-dependent terms. We find that DM with mass between TeV and $10^{16}$ GeV could have the relic abundance that fits the observation, with small dependence on its spin. We also discuss the effects of gravitational annihilation from inflatons. Interestingly, contributions from inflatons could be dominant and have the same power dependence on the Hubble parameter of inflation as that from vacuum fluctuation. Also, fermion production from inflatons, in comparison to boson production, is suppressed by its mass due to helicity selection.

CutSplit: A Decision-Tree Combining Cutting and Splitting for Scalable Packet Classification
Wenjun Li, Xianfeng Li, Hui Li, Gaogang Xie
2018 · 108 citations · doi:10.1109/infocom.2018.8485947

Efficient algorithmic solutions for multi-field packet classification have been a challenging problem for many years. This problem is becoming even worse in the era of Software Defined Networking (SDN), where flow tables with increasing complexity play a central role in the forwarding plane of SDN. In this paper, we first conduct an unprecedented in-depth analysis of the issues that led to the failure of the major quests for scalable algorithmic solutions. With the insights obtained, we propose a practical framework called CutSplit, which can exploit the benefits of cutting and splitting techniques adaptively. By addressing the central problem caused by uncontrollable rule replications suffered by the major efforts, CutSplit not only pushes the performance of algorithmic packet classification closer to that of hardware-based solutions, but also reduces the memory consumption to a practical level. Moreover, our work achieves low pre-processing time for rule updates, a problem that has long been ignored by previous decision trees, but is becoming more relevant in the context of SDN due to frequent updates of rules. Experimental results on ClassBench show that CutSplit achieves a memory reduction of over 10 times, as well as a 3x improvement in performance in terms of the average number of memory accesses.

A novel refinement approach for text categorization
Songbo Tan, Xueqi Cheng, Moustafa Ghanem, Bin Wang +1 more
2005 · 106 citations · doi:10.1145/1099554.1099687

In this paper we present a novel strategy, DragPushing, for improving the performance of text classifiers. The strategy is generic and takes advantage of training errors to successively refine the classification model of a base classifier. We describe how it is applied to generate two new classification algorithms: a Refined Centroid Classifier and a Refined Naïve Bayes Classifier. We present an extensive experimental evaluation of both algorithms on three English collections and one Chinese corpus. The results indicate that in each case, the refined classifiers achieve significant performance improvement over the base classifiers used. Furthermore, the performance of the Refined Centroid Classifier implemented is comparable to, if not better than, that of a state-of-the-art support vector machine (SVM)-based classifier, but at a much lower computational cost.

Lattice QCD Calculations of Transverse-Momentum-Dependent Soft Function through Large-Momentum Effective Theory
Qi-An Zhang, Jun Hua, Yi-Kai Huo, Xiangdong Ji +4 more
2020 · Physical Review Letters · 102 citations · doi:10.1103/physrevlett.125.192001

The transverse-momentum-dependent (TMD) soft function is a key ingredient in QCD factorization of Drell-Yan and other processes with relatively small transverse momentum. We present a lattice QCD study of this function at moderately large rapidity on a 2+1 flavor CLS dynamic ensemble with $a=0.098$ fm. We extract the rapidity-independent (or intrinsic) part of the soft function through a large-momentum-transfer pseudoscalar meson form factor and its quasi-TMD wave function using leading-order factorization in large-momentum effective theory. We also investigate the rapidity-dependent part of the soft function (the Collins-Soper evolution kernel) based on the large-momentum evolution of the quasi-TMD wave function.

A hybrid renormalization scheme for quasi light-front correlations in large-momentum effective theory
Xiangdong Ji, Yizhuang Liu, Andreas Schäfer, Wei Wang +3 more
2021 · Nuclear Physics B · 98 citations · doi:10.1016/j.nuclphysb.2021.115311

In large-momentum effective theory (LaMET), calculating parton physics starts from calculating coordinate-space-$z$ correlation functions $\tilde{h}(z,a,P_z)$ in a hadron of momentum $P_z$ in lattice QCD. Such correlation functions involve both linear and logarithmic divergences in lattice spacing $a$, and thus need to be properly renormalized. We introduce a hybrid renormalization procedure to match these lattice correlations to those in the continuum $\overline{\mathrm{MS}}$ scheme, without introducing extra non-perturbative effects at large $z$. We analyze the effect of $O(\Lambda_{\mathrm{QCD}})$ ambiguity in the Wilson line self-energy subtraction involved in this hybrid scheme. To obtain the momentum-space distributions, we recommend extrapolating the lattice data to the asymptotic $z$-region using the generic properties of the coordinate space correlations at moderate and large $P_z$, respectively.

Detection of early-universe gravitational-wave signatures and fundamental physics
Robert R. Caldwell, Yanou Cui, Huai-Ke Guo, Vuk Mandic +4 more
2022 · General Relativity and Gravitation · 94 citations · doi:10.1007/s10714-022-03027-x

Detection of a gravitational-wave signal of non-astrophysical origin would be a landmark discovery, potentially providing a significant clue to some of our most basic, big-picture scientific questions about the Universe. In this white paper, we survey the leading early-Universe mechanisms that may produce a detectable signal (including inflation, phase transitions, topological defects, and primordial black holes) and highlight the connections to fundamental physics. We review the complementarity with collider searches for new physics, and multimessenger probes of the large-scale structure of the Universe.

Learning for search result diversification
Yadong Zhu, Yanyan Lan, Jiafeng Guo, Xueqi Cheng +1 more
2014 · 90 citations · doi:10.1145/2600428.2609634

Search result diversification has gained attention as a way to tackle the ambiguous or multi-faceted information needs of users. Most existing methods on this problem utilize a heuristic predefined ranking function, where limited features can be incorporated and extensive tuning is required for different settings. In this paper, we address search result diversification as a learning problem, and introduce a novel relational learning-to-rank approach to formulate the task. However, the definitions of ranking function and loss function for the diversification problem are challenging. In our work, we firstly show that diverse ranking is in general a sequential selection process from both empirical and theoretical aspects. On this basis, we define ranking function as the combination of relevance score and diversity score between the current document and those previously selected, and loss function as the likelihood loss of ground truth based on Plackett-Luce model, which can naturally model the sequential generation of a diverse ranking list. Stochastic gradient descent is then employed to conduct the unconstrained optimization, and the prediction of a diverse ranking list is provided by a sequential selection process based on the learned ranking function. The experimental results on the public TREC datasets demonstrate the effectiveness and robustness of our approach.
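
The sequential selection process and the Plackett-Luce likelihood described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' model: the diversity term is taken to be the negative of the maximum similarity to already-selected documents, the weights `w_rel` and `w_div` are fixed by hand rather than learned by stochastic gradient descent, and the toy `relevance` and `similarity` values are invented for the example.

```python
import math


def score(cand, selected, relevance, similarity, w_rel=1.0, w_div=1.0):
    """Relevance plus a diversity term against already-selected documents.

    Assumed form: diversity = -(max similarity to any selected document),
    which is 0 when nothing has been selected yet.
    """
    diversity = -max((similarity[cand][s] for s in selected), default=0.0)
    return w_rel * relevance[cand] + w_div * diversity


def sequential_select(relevance, similarity, k, **weights):
    """Greedily pick the highest-scoring remaining document, k times."""
    selected, remaining = [], set(range(len(relevance)))
    while remaining and len(selected) < k:
        best = max(
            sorted(remaining),
            key=lambda c: score(c, selected, relevance, similarity, **weights),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


def plackett_luce_nll(ranking, relevance, similarity, **weights):
    """Negative log-likelihood of a ranking under a Plackett-Luce model
    whose per-step scores depend on the documents chosen so far."""
    nll, selected, remaining = 0.0, [], list(ranking)
    for doc in ranking:
        scores = [score(c, selected, relevance, similarity, **weights)
                  for c in remaining]
        top = max(scores)  # subtract the max for a stable log-sum-exp
        log_z = top + math.log(sum(math.exp(s - top) for s in scores))
        nll += log_z - scores[remaining.index(doc)]
        selected.append(doc)
        remaining.remove(doc)
    return nll


# Toy data: documents 0 and 1 are near-duplicates, document 2 is distinct.
relevance = [0.9, 0.85, 0.3]
similarity = [[1.0, 0.95, 0.1], [0.95, 1.0, 0.1], [0.1, 0.1, 1.0]]
print(sequential_select(relevance, similarity, k=3))
print(plackett_luce_nll([0, 2, 1], relevance, similarity))
```

With the diversity weight set to zero the selection falls back to a plain relevance ordering, which makes the effect of the diversity term easy to see on the toy data.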

Boosted Dark Matter Interpretation of the XENON1T Excess
Bartosz Fornal, Pearl Sandick, Jing Shu, Meng Su +1 more
2020 · Physical Review Letters · 89 citations · doi:10.1103/physrevlett.125.161804

We propose boosted dark matter (BDM) as a possible explanation for the excess of keV electron recoil events observed by XENON1T. BDM particles have velocities much larger than those typical of virialized dark matter, and, as such, BDM-electron scattering can naturally produce keV electron recoils. We show that the required BDM-electron scattering cross sections can be easily realized in a simple model with a heavy vector mediator. Though these cross sections are too large for BDM to escape from the Sun, the BDM flux can originate from the Galactic Center or from halo dark matter annihilations. Furthermore, a daily modulation of the BDM signal will be present, which could not only be used to differentiate it from various backgrounds but would also provide important directional information for the BDM flux.