Microsoft (Netherlands)
Company, Amsterdam, Netherlands
Research output, citation impact, and the most-cited recent papers from Microsoft (Netherlands). Aggregated across the NobleBlocks index of 300M+ scholarly works.
Top-cited papers from Microsoft (Netherlands)
Conformer-rotamer sampling tool (CREST) is an open-source program for the efficient and automated exploration of molecular chemical space. Originally developed in Pracht et al. [Phys. Chem. Chem. Phys. 22, 7169 (2020)] as an automated driver for calculations at the extended tight-binding level (xTB), it offers a variety of molecular- and metadynamics simulations, geometry optimization, and molecular structure analysis capabilities. Implemented algorithms include automated procedures for conformational sampling, explicit solvation studies, the calculation of absolute molecular entropy, and the identification of molecular protonation and deprotonation sites. Calculations are set up to run concurrently, providing efficient single-node parallelization. CREST is designed to require minimal user input and comes with an implementation of the GFNn-xTB Hamiltonians and the GFN-FF force-field. Furthermore, interfaces to any quantum chemistry and force-field software can easily be created. In this article, we present recent developments in the CREST code and show a selection of applications for the most important features of the program. An important novelty is the refactored calculation backend, which provides significant speed-up for sampling of small or medium-sized drug molecules and allows for more sophisticated setups, for example, quantum mechanics/molecular mechanics and minimum energy crossing point calculations.
The design of functional materials with desired properties is essential in driving technological advances in areas such as energy storage, catalysis and carbon capture [1–3]. Generative models accelerate materials design by directly generating new materials given desired property constraints, but current methods have a low success rate in proposing stable crystals or can satisfy only a limited set of property constraints [4–11]. Here we present MatterGen, a model that generates stable, diverse inorganic materials across the periodic table and can further be fine-tuned to steer the generation towards a broad range of property constraints. Compared with previous generative models [4,12], structures produced by MatterGen are more than twice as likely to be new and stable, and more than ten times closer to the local energy minimum. After fine-tuning, MatterGen successfully generates stable, new materials with desired chemistry, symmetry and mechanical, electronic and magnetic properties. As a proof of concept, we synthesize one of the generated structures and measure its property value to be within 20% of our target. We believe that the quality of generated materials and the breadth of abilities of MatterGen represent an important advancement towards creating a foundational generative model for materials design.
Over the past six decades, the computing systems field has experienced significant transformations, profoundly impacting society with transformational developments, such as the Internet and the commodification of computing. Underpinned by technological advancements, computer systems, far from being static, have been continuously evolving and adapting to cover multifaceted societal niches. This has led to new paradigms such as cloud, fog, edge computing, and the Internet of Things (IoT), which offer fresh economic and creative opportunities. Nevertheless, this rapid change poses complex research challenges, especially in maximizing potential and enhancing functionality. As such, to maintain an economical level of performance that meets ever-tighter requirements, one must understand the drivers of new model emergence and expansion, and how contemporary challenges differ from past ones. To that end, this article investigates and assesses the factors influencing the evolution of computing systems, covering established systems and architectures as well as newer developments, such as serverless computing, quantum computing, and on-device AI on edge devices. Trends emerge when one traces technological trajectory, which includes the rapid obsolescence of frameworks due to business and technical constraints, a move towards specialized systems and models, and varying approaches to centralized and decentralized control. This comprehensive review of modern computing systems looks ahead to the future of research in the field, highlighting key challenges and emerging trends, and underscoring their importance in cost-effectively driving technological progress.
Structure-based drug design (SBDD) aims to design small-molecule ligands that bind with high affinity and specificity to pre-determined protein targets. Generative SBDD methods leverage structural data of drugs with their protein targets to propose new drug candidates. However, most existing methods focus exclusively on bottom-up de novo design of compounds or tackle other drug development challenges with task-specific models. The latter requires curation of suitable datasets, careful engineering of the models and retraining from scratch for each task. Here we show how a single pretrained diffusion model can be applied to a broader range of problems, such as off-the-shelf property optimization, explicit negative design and partial molecular design with inpainting. We formulate SBDD as a three-dimensional conditional generation problem and present DiffSBDD, an SE(3)-equivariant diffusion model that generates novel ligands conditioned on protein pockets. Furthermore, we show how additional constraints can be used to improve the generated drug candidates according to a variety of computational metrics.
Deep generative models are increasingly powerful tools for the in silico design of novel proteins. Recently, a family of generative models called diffusion models has demonstrated the ability to generate biologically plausible proteins that are dissimilar to any actual proteins seen in nature, enabling unprecedented capability and control in de novo protein design. However, current state-of-the-art diffusion models generate protein structures, which limits the scope of their training data and restricts generations to a small and biased subset of protein design space. Here, we introduce a general-purpose diffusion framework, EvoDiff, that combines evolutionary-scale data with the distinct conditioning capabilities of diffusion models for controllable protein generation in sequence space. EvoDiff generates high-fidelity, diverse, and structurally-plausible proteins that cover natural sequence and functional space. We show experimentally that EvoDiff generations express, fold, and exhibit expected secondary structure elements. Critically, EvoDiff can generate proteins inaccessible to structure-based models, such as those with disordered regions, while maintaining the ability to design scaffolds for functional structural motifs. We validate the universality of our sequence-based formulation by experimentally characterizing intrinsically-disordered mitochondrial targeting signals, metal-binding proteins, and protein binders designed using EvoDiff. We envision that EvoDiff will expand capabilities in protein engineering beyond the structure-function paradigm toward programmable, sequence-first design.
The ability to computationally generate novel yet physically foldable protein structures could lead to new biological discoveries and new treatments targeting yet incurable diseases. Despite recent advances in protein structure prediction, directly generating diverse, novel protein structures from neural networks remains difficult. In this work, we present a diffusion-based generative model that generates protein backbone structures via a procedure inspired by the natural folding process. We describe a protein backbone structure as a sequence of angles capturing the relative orientation of the constituent backbone atoms, and generate structures by denoising from a random, unfolded state towards a stable folded structure. Not only does this mirror how proteins natively twist into energetically favorable conformations, the inherent shift and rotational invariance of this representation crucially alleviates the need for more complex equivariant networks. We train a denoising diffusion probabilistic model with a simple transformer backbone and demonstrate that our resulting model unconditionally generates highly realistic protein structures with complexity and structural patterns akin to those of naturally-occurring proteins. As a useful resource, we release an open-source codebase and trained models for protein structure diffusion.
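The angle-based representation and the noising half of the diffusion process described above can be illustrated with a minimal sketch. It assumes a DDPM-style closed-form forward step applied to angles and wrapped back into [-pi, pi]; the function names, the `alpha_bar` parameter, and the example values are hypothetical and are not taken from the authors' released codebase.

```python
import math
import random


def wrap_angle(theta):
    """Map an angle in radians back into the interval [-pi, pi]."""
    return (theta + math.pi) % (2 * math.pi) - math.pi


def forward_noise(angles, alpha_bar, rng):
    """Noise a list of backbone angles in one step.

    Uses the standard DDPM closed form
        x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps
    with eps drawn from a unit Gaussian, then wraps the result so it
    remains a valid angle. alpha_bar = 1 leaves the input unchanged,
    alpha_bar = 0 replaces it with pure (wrapped) noise.
    """
    signal = math.sqrt(alpha_bar)
    noise = math.sqrt(1.0 - alpha_bar)
    return [wrap_angle(signal * a + noise * rng.gauss(0.0, 1.0)) for a in angles]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical backbone angles (radians) for a three-residue fragment.
    folded = [-1.05, 2.35, 3.10]
    for alpha_bar in (0.99, 0.5, 0.01):
        print(alpha_bar, forward_noise(folded, alpha_bar, rng))
```

Because only relative angles are stored, shifting or rotating the whole molecule leaves the input unchanged, which is the invariance the abstract points to. A trained model would learn to reverse this noising step by step; that learned part is not shown here.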
Fragment-based drug discovery has been an effective paradigm in early-stage drug development. An open challenge in this area is designing linkers between disconnected molecular fragments of interest to obtain chemically relevant candidate drug molecules. In this work, we propose DiffLinker, an E(3)-equivariant three-dimensional conditional diffusion model for molecular linker design. Given a set of disconnected fragments, our model places missing atoms in between and designs a molecule incorporating all the initial fragments. Unlike previous approaches that are only able to connect pairs of molecular fragments, our method can link an arbitrary number of fragments. Additionally, the model automatically determines the number of atoms in the linker and its attachment points to the input fragments. We demonstrate that DiffLinker outperforms other methods on the standard datasets, generating more diverse and synthetically accessible molecules. We experimentally test our method in real-world applications, showing that it can successfully generate valid linkers conditioned on target protein pockets.
Reliable forecasting of the Earth system is essential for mitigating natural disasters and supporting human progress. Traditional numerical models, although powerful, are extremely computationally expensive [1]. Recent advances in artificial intelligence (AI) have shown promise in improving both predictive performance and efficiency [2,3], yet their potential remains underexplored in many Earth system domains. Here we introduce Aurora, a large-scale foundation model trained on more than one million hours of diverse geophysical data. Aurora outperforms operational forecasts in predicting air quality, ocean waves, tropical cyclone tracks and high-resolution weather, all at orders of magnitude lower computational cost. With the ability to be fine-tuned for diverse applications at modest expense, Aurora represents a notable step towards democratizing accurate and efficient Earth system predictions. These results highlight the transformative potential of AI in environmental forecasting and pave the way for broader accessibility to high-quality climate and weather information.
Coarse-grained (CG) molecular dynamics enables the study of biological processes at temporal and spatial scales that would be intractable at an atomistic resolution. However, accurately learning a CG force field remains a challenge. In this work, we leverage connections between score-based generative models, force fields, and molecular dynamics to learn a CG force field without requiring any force inputs during training. Specifically, we train a diffusion generative model on protein structures from molecular dynamics simulations, and we show that its score function approximates a force field that can directly be used to simulate CG molecular dynamics. While having a vastly simplified training setup compared to previous work, we demonstrate that our approach leads to improved performance across several protein simulations for systems up to 56 amino acids, reproducing the CG equilibrium distribution and preserving the dynamics of all-atom simulations such as protein folding events.
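The link between a score function and a force field that this abstract relies on follows from the Boltzmann distribution. The relation below is the standard textbook form, given as a sketch; the symbols U (CG potential), Z (normalization constant), k_B (Boltzmann constant) and T (temperature) are notation assumed here and do not appear in the abstract.

```latex
% Equilibrium (Boltzmann) density over CG coordinates x
p(x) = \frac{1}{Z} \exp\!\left(-\frac{U(x)}{k_B T}\right)

% Score function: gradient of the log-density (Z drops out)
s(x) = \nabla_x \log p(x) = -\frac{1}{k_B T}\, \nabla_x U(x)

% Force field recovered from the score
F(x) = -\nabla_x U(x) = k_B T \, s(x)
```

Under these assumptions, a diffusion model whose learned score approximates s(x) at a small noise level yields forces up to the factor k_B T, which is why no force labels are needed during training.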
The realization of hybrid superconductor-semiconductor quantum devices, in particular a topological qubit, calls for advanced techniques to readily and reproducibly engineer induced superconductivity in semiconductor nanowires. Here, we introduce an on-chip fabrication paradigm based on shadow walls that offers substantial advances in device quality and reproducibility. It allows for the implementation of hybrid quantum devices and ultimately topological qubits while eliminating fabrication steps such as lithography and etching. This is critical to preserve the integrity and homogeneity of the fragile hybrid interfaces. The approach simplifies the reproducible fabrication of devices with a hard induced superconducting gap and ballistic normal-/superconductor junctions. Large gate-tunable supercurrents and high-order multiple Andreev reflections manifest the exceptional coherence of the resulting nanowire Josephson junctions. Our approach enables the realization of 3-terminal devices, where zero-bias conductance peaks emerge in a magnetic field concurrently at both boundaries of the one-dimensional hybrids.
Benchmarks belong to the very standard repertory of tools deployed in database development. Assessing the capabilities of a system, analyzing actual and potential bottlenecks, and, naturally, comparing the pros and cons of different systems architectures have become indispensable tasks as database management systems grow in complexity and capacity. In the course of the development of XML databases the need for a benchmark framework has become more and more evident: a great many different ways to store XML data have been suggested in the past, each with its genuine advantages, disadvantages and consequences that propagate through the layers of a complex database system and need to be carefully considered. The different storage schemes render the query characteristics of the data variably different. However, no conclusive methodology for assessing these differences is available to date. In this paper, we outline desiderata for a benchmark for XML databases drawing from our own experience of developing an XML repository, involvement in the definition of the standard query language, and experience with standard benchmarks for relational databases.
In 2007, Shacham published a seminal paper on Return-Oriented Programming (ROP), the first systematic formulation of code reuse. The paper has been highly influential, profoundly shaping the way we still think about code reuse today: an attacker analyzes the "geometry" of victim binary code to locate gadgets and chains these to craft an exploit. This model has spurred much research, with a rapid progression of increasingly sophisticated code reuse attacks and defenses over time. After ten years, the common perception is that state-of-the-art code reuse defenses are effective in significantly raising the bar and making attacks exceedingly hard.
Selective area growth is a promising technique to realize semiconductor–superconductor hybrid nanowire networks, potentially hosting topologically protected Majorana-based qubits. In some cases, however, such as the molecular beam epitaxy of InSb on InP or GaAs substrates, nucleation and selective growth conditions do not necessarily overlap. To overcome this challenge, we propose a metal-sown selective area growth (MS SAG) technique, which allows decoupling selective deposition and nucleation growth conditions by temporarily isolating these stages. It consists of three steps: (i) selective deposition of In droplets only inside the mask openings at relatively high temperatures favoring selectivity, (ii) nucleation of InSb under Sb flux from In droplets, which act as a reservoir of group III adatoms, done at relatively low temperatures, favoring nucleation of InSb, and (iii) homoepitaxy of InSb on top of the formed nucleation layer under a simultaneous supply of In and Sb fluxes at conditions favoring selectivity and high crystal quality. We demonstrate that complex InSb nanowire networks of high crystal and electrical quality can be achieved this way. We extract mobility values of 10 000–25 000 cm² V⁻¹ s⁻¹ consistently from field-effect and Hall mobility measurements across single nanowire segments as well as wires with junctions. Moreover, we demonstrate ballistic transport in a 440 nm long channel in a single nanowire under a magnetic field below 1 T. We also extract a phase-coherent length of ∼8 μm at 50 mK in mesoscopic rings.
Hybrid superconducting circuits have been used to investigate mesoscopic superconductivity, but mostly just at low magnetic fields, as typical Al-based circuits are incompatible with magnetic fields, and superconducting circuits are sensitive to external magnetic flux noise. To overcome these challenges, the authors build a hybrid fluxonium system composed of (Nb,Ti)N, with a gradiometric design. They observe the spectrum of the hybrid fluxonium in fields of up to 1 T, and probe the $\phi_0$ Josephson effect. These results enable future exploration of topological superconductivity, as well as readout of long-lifetime spin-polarized qubits using superconducting circuitry.
Tight-binding approaches, especially the Density Functional Tight-Binding (DFTB) and the extended tight-binding schemes, allow for efficient quantum mechanical simulations of large systems and long-time scales. They are derived from ab initio density functional theory using pragmatic approximations and some empirical terms, ensuring a fine balance between speed and accuracy. Their accuracy can be improved by tuning the empirical parameters using machine learning techniques, especially when information about the local environment of the atoms is incorporated. As the significant quantum mechanical contributions are still provided by the tight-binding models, and only short-ranged corrections are fitted, the learning procedure is typically shorter and more transferable than it would be when predicting the quantum mechanical properties directly with machine learning without an underlying physically motivated model. As a further advantage, derived quantum mechanical quantities can be calculated based on the tight-binding model without the need for additional learning. We have developed the open-source framework Tight-Binding Machine Learning Toolkit, which allows the easy implementation of such combined approaches. The toolkit currently contains layers for the DFTB method and an interface to the GFN1-xTB Hamiltonian, but due to its modular structure and its well-defined interfaces, additional atom-based schemes can be implemented easily. We discuss the general structure of the framework, some essential implementation details, and several proof-of-concept applications demonstrating the perspectives of the combined methods and the functionality of the toolkit.
Citus is an open source distributed database engine for PostgreSQL that is implemented as an extension. Citus gives users the ability to distribute data, queries, and transactions in PostgreSQL across a cluster of PostgreSQL servers to handle the needs of data-intensive applications. The development of Citus has largely been driven by conversations with companies looking to scale PostgreSQL beyond a single server and their workload requirements. This paper describes the requirements of four common workload patterns and how Citus addresses those requirements. It also shares benchmark results demonstrating the performance and scalability of Citus in each of the workload patterns and describes how Microsoft uses Citus to address one of its most challenging data problems.
We have developed a new method to accurately account for solvation effects in semiempirical quantum mechanics based on a polarizable continuum model (PCM). The extended conductor-like polarizable continuum model (CPCM-X) incorporates a computationally efficient domain decomposition conductor-like screening model (ddCOSMO) for extended tight binding (xTB) methods and uses a post-processing approach based on established solvation models, like the conductor-like screening model for real solvents (COSMO-RS) and the universal solvent model based on solute electron density (SMD). According to various benchmarks, the approach performs well across a broad range of systems and applications, including hydration free energies, non-aqueous solvation free energies, and large supramolecular association reactions of neutral and charged species. Our method for computing solvation free energies is much more accurate than the current methods in the xtb program package. It improves the accuracy of solvation free energies by up to 40% for larger supramolecular association reactions to match even the accuracy of higher-level DFT-based solvation models like COSMO-RS and SMD while being computationally more than 2 orders of magnitude faster. The proposed method and the underlying ddCOSMO model are readily available for a wide variety of solvents and are accessible in xtb for use in various computational applications.
For ground- and excited-state studies of large molecules, it is the state of the art to combine (time-dependent) DFT with dispersion-corrected range-separated hybrid functionals (RSHs), which ensures an asymptotically correct description of exchange effects and London dispersion. Specifically for studying excited states, it is common practice to tune the range-separation parameter ω (optimal tuning), which can further improve the accuracy. However, since optimal tuning essentially changes the functional, it is unclear if and how much the parameters used for the dispersion correction depend on the chosen ω value. To answer this question, we explore this interdependency by refitting the DFT-D4 dispersion model for six established RSHs over a wide range of ω values (0.05–0.45 a₀⁻¹) using a set of noncovalently bound molecular complexes. The results reveal some surprising differences among the investigated functionals: While PBE-based RSHs and ωB97M-D4 generally exhibit a weak interdependency and robust performance over a wide range of ω values, B88-based RSHs, specifically LC-BLYP, are strongly affected. For these, even a minor reduction of ω from the default value manifests in strong systematic overbinding and poor performance in the typical range of optimally tuned ω values. Finally, we discuss strategies to mitigate these issues and reflect the results in the context of the employed D4 parameter optimization algorithm and fit set, outlining strategies for future improvements.
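For readers unfamiliar with the range-separation parameter ω, it usually enters through the error-function split of the Coulomb operator. The form below is the common textbook partition, given as a sketch; the abstract does not state which partition each of the six functionals uses, and the symbol r_{12} (interelectronic distance) is notation assumed here.

```latex
% Split of the Coulomb operator into short- and long-range parts
\frac{1}{r_{12}}
  = \underbrace{\frac{\operatorname{erfc}(\omega r_{12})}{r_{12}}}_{\text{short range}}
  + \underbrace{\frac{\operatorname{erf}(\omega r_{12})}{r_{12}}}_{\text{long range}}
```

Since ω multiplies a distance, it carries units of inverse length, consistent with the a₀⁻¹ range quoted in the abstract. A smaller ω pushes the switch to the long-range part out to larger distances.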
The authors attribute the mixed h/e- and h/2e-periodic superconducting interference patterns to crossed Andreev reflection (CAR) through the accumulation edge modes. This is shown by comparing Josephson junctions made of InSb flakes with different edge crystal orientations, and further emphasized by the h/e-periodic SQUID oscillation in the depleted regime, indicating the CAR amplitude exceeds the normal Andreev reflection.
-xTB and -FF methods are applied in the ONIOM framework to elucidate reaction energies, geometry optimizations, and explicit solvation effects for metal-organic systems with up to several hundreds of atoms. It is shown that an ONIOM-based combination of density functional theory, semi-empirical, and force-field methods can be used to drastically reduce the computational costs and thus enable the investigation of huge systems at almost no significant loss in accuracy.