University of Washington

University · Seattle, Washington, United States

Research output, citation impact, and the most-cited recent papers from University of Washington (United States). Aggregated across the NobleBlocks index of 300M+ scholarly works.

Total works

520.5K

Citations

94.5M

h-index

3585

i10-index

600.8K

Also known as

Universidad de Washington · University of Washington · Université de Washington

Top-cited papers from University of Washington

Scikit-learn: Machine Learning in Python

Fabián Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel +4 more

2012 · arXiv (Cornell University) · 63.8K citations · doi:10.48550/arxiv.1201.0490

Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings. Source code, binaries, and documentation can be downloaded from http://scikit-learn.org.

XGBoost

Tianqi Chen, Carlos Guestrin

2016 · 49.2K citations · doi:10.1145/2939672.2939785

Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.

Three Approaches to Qualitative Content Analysis

Hsiu-Fang Hsieh, Sarah E. Shannon

2005 · Qualitative Health Research · 44.2K citations · doi:10.1177/1049732305276687

Content analysis is a widely used qualitative research technique. Rather than being a single method, current applications of content analysis show three distinct approaches: conventional, directed, or summative. All three approaches are used to interpret meaning from the content of text data and, hence, adhere to the naturalistic paradigm. The major differences among the approaches are coding schemes, origins of codes, and threats to trustworthiness. In conventional content analysis, coding categories are derived directly from the text data. With a directed approach, analysis starts with a theory or relevant research findings as guidance for initial codes. A summative content analysis involves counting and comparisons, usually of keywords or content, followed by the interpretation of the underlying context. The authors delineate analytic procedures specific to each approach and techniques addressing trustworthiness with hypothetical examples drawn from the area of end-of-life care.

CONFIDENCE LIMITS ON PHYLOGENIES: AN APPROACH USING THE BOOTSTRAP

Joseph Felsenstein

1985 · Evolution · 41.5K citations · doi:10.1111/j.1558-5646.1985.tb00420.x

The recently-developed statistical method known as the "bootstrap" can be used to place confidence intervals on phylogenies. It involves resampling points from one's own data, with replacement, to create a series of bootstrap samples of the same size as the original data. Each of these is analyzed, and the variation among the resulting estimates taken to indicate the size of the error involved in making estimates from the original data. In the case of phylogenies, it is argued that the proper method of resampling is to keep all of the original species while sampling characters with replacement, under the assumption that the characters have been independently drawn by the systematist and have evolved independently. Majority-rule consensus trees can be used to construct a phylogeny showing all of the inferred monophyletic groups that occurred in a majority of the bootstrap samples. If a group shows up 95% of the time or more, the evidence for it is taken to be statistically significant. Existing computer programs can be used to analyze different bootstrap samples by using weights on the characters, the weight of a character being how many times it was drawn in bootstrap sampling. When all characters are perfectly compatible, as envisioned by Hennig, bootstrap sampling becomes unnecessary; the bootstrap method would show significant evidence for a group if it is defined by three or more characters.
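The resampling step described in this abstract can be sketched as follows. This is a minimal illustration rather than Felsenstein's own program: the toy character matrix and the function names are hypothetical, and tree inference and the majority-rule consensus step are left out. It keeps every species, draws characters with replacement, and records each character's weight as the number of times it was drawn.

```python
import random
from collections import Counter


def bootstrap_character_weights(n_characters, rng):
    """Draw n_characters column indices with replacement.

    Returns one weight per character: how many times it was drawn.
    """
    draws = [rng.randrange(n_characters) for _ in range(n_characters)]
    counts = Counter(draws)
    return [counts.get(i, 0) for i in range(n_characters)]


def resample_matrix(matrix, weights):
    """Build one bootstrap data set.

    Every species (row) is kept; each character (column) is repeated
    as many times as its weight, so the replicate has the original width.
    """
    return [
        [state for state, w in zip(row, weights) for _ in range(w)]
        for row in matrix
    ]


# Hypothetical toy data: 4 species (rows) x 6 binary characters (columns).
toy_matrix = [
    [1, 1, 0, 0, 1, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 1, 0, 1],
    [0, 0, 1, 0, 0, 1],
]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
weights = bootstrap_character_weights(len(toy_matrix[0]), rng)
replicate = resample_matrix(toy_matrix, weights)
```

In a full analysis, each replicate (or, equivalently, the original matrix with these character weights) would be passed to a tree-building program, and the share of replicates containing a given group would be compared against the 95% threshold mentioned in the abstract.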

SciPy 1.0: fundamental algorithms for scientific computing in Python

Pauli Virtanen, Ralf Gommers, Travis E. Oliphant, Matt Haberland +4 more

2020 · Nature Methods · 38.3K citations · doi:10.1038/s41592-019-0686-2

SciPy is an open-source scientific computing library for the Python programming language. Since its initial release in 2001, SciPy has become a de facto standard for leveraging scientific algorithms in Python, with over 600 unique code contributors, thousands of dependent packages, over 100,000 dependent repositories and millions of downloads per year. In this work, we provide an overview of the capabilities and development practices of SciPy 1.0 and highlight some recent technical developments.

Initial sequencing and analysis of the human genome

Eric S. Lander, Lauren Linton, Bruce W. Birren, Chad Nusbaum +4 more

2001 · Nature · 24.6K citations · doi:10.1038/35057062

The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.

Atherosclerosis — An Inflammatory Disease

Russell Ross

1999 · New England Journal of Medicine · 21.7K citations · doi:10.1056/nejm199901143400207

Atherosclerosis is an inflammatory disease. Because high plasma concentrations of cholesterol, in particular those of low-density lipoprotein (LDL) cholesterol, are one of the principal risk factors for atherosclerosis [1], the process of atherogenesis has been considered by many to consist largely of the accumulation of lipids within the artery wall; however, it is much more than that. Despite changes in lifestyle and the use of new pharmacologic approaches to lower plasma cholesterol concentrations [2,3], cardiovascular disease continues to be the principal cause of death in the United States, Europe, and much of Asia [4,5]. In fact, the lesions of atherosclerosis represent . . .

A climbing image nudged elastic band method for finding saddle points and minimum energy paths

Graeme Henkelman, Blas P. Uberuaga, Hannes Jónsson

2000 · The Journal of Chemical Physics · 21.3K citations · doi:10.1063/1.1329672

A modification of the nudged elastic band method for finding minimum energy paths is presented. One of the images is made to climb up along the elastic band to converge rigorously on the highest saddle point. Also, variable spring constants are used to increase the density of images near the top of the energy barrier to get an improved estimate of the reaction coordinate near the saddle point. Applications to CH4 dissociative adsorption on Ir(111) and H2 on Si(100) using plane wave based density functional theory are presented.

A global reference for human genetic variation

Adam Auton, Gonçalo R. Abecasis, David M. Altshuler +4 more

2015 · Nature · 20.0K citations · doi:10.1038/nature15393

The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies. Results for the final phase of the 1000 Genomes Project are presented including whole-genome sequencing, targeted exome sequencing, and genotyping on high-density SNP arrays for 2,504 individuals across 26 populations, providing a global reference data set to support biomedical genetics. The 1000 Genomes Project has sought to comprehensively catalogue human genetic variation across populations, providing a valuable public genomic resource. The data obtained so far have found applications ranging from association studies and fine mapping studies to the filtering of likely neutral variants in rare-disease cohorts. The authors now report on the final phase of the project, phase 3, which covers previously uncharacterized areas of human genetic diversity in terms of the populations sampled and categories of characterized variation. The sample now includes more than 2,500 individuals from 26 global populations, with low coverage whole-genome and deep exome sequencing, as well as dense microarray genotyping. 
They find that while most common variants are shared across populations, rarer variants are often restricted to closely related populations. The authors also demonstrate the use of the phase 3 dataset as a reference panel for imputation to improve the resolution in genetic association studies.

Observational Evidence from Supernovae for an Accelerating Universe and a Cosmological Constant

Adam G. Riess, A. V. Filippenko, P. Challis, A. Clocchiatti +4 more

1998 · The Astronomical Journal · 19.5K citations · doi:10.1086/300499

We present spectral and photometric observations of 10 Type Ia supernovae (SNe Ia) in the redshift range 0.16 ≤ z ≤ 0.62. The luminosity distances of these objects are determined by methods that employ relations between SN Ia luminosity and light curve shape. Combined with previous data from our High-z Supernova Search Team and recent results by Riess et al., this expanded set of 16 high-redshift supernovae [...] We estimate the dynamical age of the universe to be 14.2 ± 1.7 Gyr including systematic uncertainties in the current Cepheid distance scale. We estimate the likely effect of several sources of systematic error, including progenitor and metallicity evolution, extinction, sample selection bias, local perturbations in the expansion rate, gravitational lensing, and sample contamination. Presently, none of these effects appear to reconcile the data with Ω_Λ = 0 and q_0 ≥ 0.

YOLO9000: Better, Faster, Stronger

Joseph Redmon, Ali Farhadi

2017 · 19.0K citations · doi:10.1109/cvpr.2017.690

We introduce YOLO9000, a state-of-the-art, real-time object detection system that can detect over 9000 object categories. First we propose various improvements to the YOLO detection method, both novel and drawn from prior work. The improved model, YOLOv2, is state-of-the-art on standard detection tasks like PASCAL VOC and COCO. Using a novel, multi-scale training method the same YOLOv2 model can run at varying sizes, offering an easy tradeoff between speed and accuracy. At 67 FPS, YOLOv2 gets 76.8 mAP on VOC 2007. At 40 FPS, YOLOv2 gets 78.6 mAP, outperforming state-of-the-art methods like Faster RCNN with ResNet and SSD while still running significantly faster. Finally we propose a method to jointly train on object detection and classification. Using this method we train YOLO9000 simultaneously on the COCO detection dataset and the ImageNet classification dataset. Our joint training allows YOLO9000 to predict detections for object classes that don't have labelled detection data. We validate our approach on the ImageNet detection task. YOLO9000 gets 19.7 mAP on the ImageNet detection validation set despite only having detection data for 44 of the 200 classes. On the 156 classes not in COCO, YOLO9000 gets 16.0 mAP. YOLO9000 predicts detections for more than 9000 different object categories, all in real-time.

Antibiotic Susceptibility Testing by a Standardized Single Disk Method

Alfred W. Bauer, William Kirby, J. C. Sherris, Marvin Turck

1966 · American Journal of Clinical Pathology · 18.5K citations · doi:10.1093/ajcp/45.4_ts.493

"Why Should I Trust You?"

Marco Túlio Ribeiro, Sameer Singh, Carlos Guestrin

2016 · 15.4K citations · doi:10.1145/2939672.2939778

Despite widespread adoption, machine learning models remain mostly black boxes. Understanding the reasons behind predictions is, however, quite important in assessing trust, which is fundamental if one plans to take action based on a prediction, or when choosing whether to deploy a new model. Such understanding also provides insights into the model, which can be used to transform an untrustworthy model or prediction into a trustworthy one.

The 1982 revised criteria for the classification of systemic lupus erythematosus

Eng M. Tan, Alan S. Cohen, James F. Fries, Alfonse T. Masi +4 more

1982 · Arthritis & Rheumatism · 14.6K citations · doi:10.1002/art.1780251101

The 1971 preliminary criteria for the classification of systemic lupus erythematosus (SLE) were revised and updated to incorporate new immunologic knowledge and improve disease classification. The 1982 revised criteria include fluorescence antinuclear antibody and antibody to native DNA and Sm antigen. Some criteria involving the same organ systems were aggregated into single criteria. Raynaud's phenomenon and alopecia were not included in the 1982 revised criteria because of low sensitivity and specificity. The new criteria were 96% sensitive and 96% specific when tested with SLE and control patient data gathered from 18 participating clinics. When compared with the 1971 criteria, the 1982 revised criteria showed gains in sensitivity and specificity.

Observation of Gravitational Waves from a Binary Black Hole Merger

B. P. Abbott, R. Abbott, T. D. Abbott, M. R. Abernathy +4 more

2016 · Physical Review Letters · 14.3K citations · doi:10.1103/physrevlett.116.061102

On September 14, 2015 at 09:50:45 UTC the two detectors of the Laser Interferometer Gravitational-Wave Observatory simultaneously observed a transient gravitational-wave signal. The signal sweeps upwards in frequency from 35 to 250 Hz with a peak gravitational-wave strain of $1.0 \times 10^{-21}$. It matches the waveform predicted by general relativity for the inspiral and merger of a pair of black holes and the ringdown of the resulting single black hole. The signal was observed with a matched-filter signal-to-noise ratio of 24 and a false alarm rate estimated to be less than 1 event per 203,000 years, equivalent to a significance greater than 5.1σ. The source lies at a luminosity distance of $410^{+160}_{-180}$ Mpc corresponding to a redshift $z = 0.09^{+0.03}_{-0.04}$. In the source frame, the initial black hole masses are $36^{+5}_{-4}\,M_\odot$ and $29^{+4}_{-4}\,M_\odot$, and the final black hole mass is $62^{+4}_{-4}\,M_\odot$, with $3.0^{+0.5}_{-0.5}\,M_\odot c^2$ radiated in gravitational waves. All uncertainties define 90% credible intervals. These observations demonstrate the existence of binary stellar-mass black hole systems. This is the first direct detection of gravitational waves and the first observation of a binary black hole merger.

The Properties of Gases and Liquids

Robert C. Reid, T. K. Sherwood, Robert E. Street

1959 · Physics Today · 14.2K citations · doi:10.1063/1.3060771

Robert C. Reid, Thomas K. Sherwood, Robert E. Street; The Properties of Gases and Liquids. Physics Today 1 April 1959; 12 (4): 38–40. https://doi.org/10.1063/1.3060771

Global, regional, and national incidence, prevalence, and years lived with disability for 354 diseases and injuries for 195 countries and territories, 1990–2017: a systematic analysis for the Global Burden of Disease Study 2017

Spencer L James, Degu Abate, Kalkidan Hassen Abate, Solomón Mequanente Abay +4 more

2018 · The Lancet · 14.1K citations · doi:10.1016/s0140-6736(18)32279-7

BACKGROUND: The Global Burden of Diseases, Injuries, and Risk Factors Study 2017 (GBD 2017) includes a comprehensive assessment of incidence, prevalence, and years lived with disability (YLDs) for 354 causes in 195 countries and territories from 1990 to 2017. Previous GBD studies have shown how the decline of mortality rates from 1990 to 2016 has led to an increase in life expectancy, an ageing global population, and an expansion of the non-fatal burden of disease and injury. These studies have also shown how a substantial portion of the world's population experiences non-fatal health loss with considerable heterogeneity among different causes, locations, ages, and sexes. Ongoing objectives of the GBD study include increasing the level of estimation detail, improving analytical strategies, and increasing the amount of high-quality data. METHODS: We estimated incidence and prevalence for 354 diseases and injuries and 3484 sequelae. We used an updated and extensive body of literature studies, survey data, surveillance data, inpatient admission records, outpatient visit records, and health insurance claims, and additionally used results from cause of death models to inform estimates using a total of 68 781 data sources. Newly available clinical data from India, Iran, Japan, Jordan, Nepal, China, Brazil, Norway, and Italy were incorporated, as well as updated claims data from the USA and new claims data from Taiwan (province of China) and Singapore. We used DisMod-MR 2.1, a Bayesian meta-regression tool, as the main method of estimation, ensuring consistency between rates of incidence, prevalence, remission, and cause of death for each condition. YLDs were estimated as the product of a prevalence estimate and a disability weight for health states of each mutually exclusive sequela, adjusted for comorbidity. We updated the Socio-demographic Index (SDI), a summary development indicator of income per capita, years of schooling, and total fertility rate. 
Additionally, we calculated differences between male and female YLDs to identify divergent trends across sexes. GBD 2017 complies with the Guidelines for Accurate and Transparent Health Estimates Reporting. FINDINGS: Globally, for females, the causes with the greatest age-standardised prevalence were oral disorders, headache disorders, and haemoglobinopathies and haemolytic anaemias in both 1990 and 2017. For males, the causes with the greatest age-standardised prevalence were oral disorders, headache disorders, and tuberculosis including latent tuberculosis infection in both 1990 and 2017. In terms of YLDs, low back pain, headache disorders, and dietary iron deficiency were the leading Level 3 causes of YLD counts in 1990, whereas low back pain, headache disorders, and depressive disorders were the leading causes in 2017 for both sexes combined. All-cause age-standardised YLD rates decreased by 3·9% (95% uncertainty interval [UI] 3·1-4·6) from 1990 to 2017; however, the all-age YLD rate increased by 7·2% (6·0-8·4) while the total sum of global YLDs increased from 562 million (421-723) to 853 million (642-1100). The increases for males and females were similar, with increases in all-age YLD rates of 7·9% (6·6-9·2) for males and 6·5% (5·4-7·7) for females. We found significant differences between males and females in terms of age-standardised prevalence estimates for multiple causes. The causes with the greatest relative differences between sexes in 2017 included substance use disorders (3018 cases [95% UI 2782-3252] per 100 000 in males vs 1400 [1279-1524] per 100 000 in females), transport injuries (3322 [3082-3583] vs 2336 [2154-2535]), and self-harm and interpersonal violence (3265 [2943-3630] vs 5643 [5057-6302]). INTERPRETATION: Global all-cause age-standardised YLD rates have improved only slightly over a period spanning nearly three decades. 
However, the magnitude of the non-fatal disease burden has expanded globally, with increasing numbers of people who have a wide spectrum of conditions. A subset of conditions has remained globally pervasive since 1990, whereas other conditions have displayed more dynamic trends, with different ages, sexes, and geographies across the globe experiencing varying burdens and trends of health loss. This study emphasises how global improvements in premature mortality for select conditions have led to older populations with complex and potentially expensive diseases, yet also highlights global achievements in certain domains of disease and injury. FUNDING: Bill & Melinda Gates Foundation.
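The YLD calculation stated in the methods (a prevalence estimate multiplied by a disability weight, per sequela) can be sketched as below. This is a simplified illustration only: the sequela names and all numbers are hypothetical, the function names are not from GBD tooling, and the comorbidity adjustment and uncertainty intervals described in the abstract are omitted.

```python
def years_lived_with_disability(prevalent_cases, disability_weight):
    """YLD for one sequela: prevalence times disability weight.

    The disability weight is assumed to lie between 0 (full health) and 1.
    """
    if not 0.0 <= disability_weight <= 1.0:
        raise ValueError("disability weight must be between 0 and 1")
    if prevalent_cases < 0:
        raise ValueError("prevalent cases must be non-negative")
    return prevalent_cases * disability_weight


def total_yld(sequelae):
    """Sum YLDs over mutually exclusive sequelae (no comorbidity adjustment)."""
    return sum(
        years_lived_with_disability(cases, weight)
        for cases, weight in sequelae.values()
    )


# Hypothetical inputs: sequela -> (prevalent cases, disability weight).
example_sequelae = {
    "mild form": (1000, 0.1),
    "severe form": (200, 0.5),
}

example_total = total_yld(example_sequelae)
```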

On a Method to Measure Supervised Multiclass Model’s Interpretability: Application to Degradation Diagnosis (Short Paper)

Charles-Maxime Gauriat, Yannick Pencolé, Pauline Ribot, Gregory Brouillet

2024 · Dagstuhl Research Online Publication Server · 13.4K citations · doi:10.4230/oasics.dx.2024.27

In an industrial maintenance context, degradation diagnosis is the problem of determining the current level of degradation of operating machines based on measurements. With the emergence of Machine Learning techniques, such a problem can now be solved by training a degradation model offline and by using it online. While such models are increasingly accurate and performant, they are often black boxes, and their decisions are therefore not interpretable for human maintenance operators. By contrast, interpretable ML models are able to provide explanations for the model’s decisions and consequently improve the confidence of the human operator in the maintenance decision based on these models. This paper proposes a new method to quantitatively measure the interpretability of such models that is agnostic (no assumption about the class of models) and that is applied to degradation models. The proposed method requires that the decision maker set up some high-level parameters in order to measure the interpretability of the models; the decision maker can then decide whether the obtained models are satisfactory or not. The method is formally defined and is fully illustrated on a decision tree degradation model and a model trained with a recent neural network architecture called Multiclass Neural Additive Model.

Bioconductor: open software development for computational biology and bioinformatics

Robert Gentleman, Vincent J. Carey, Douglas M. Bates, Ben Bolstad +4 more

2004 · Genome Biology · 12.5K citations · doi:10.1186/gb-2004-5-10-r80

The Bioconductor project is an initiative for the collaborative creation of extensible software for computational biology and bioinformatics. The goals of the project include: fostering collaborative development and widespread use of innovative software, reducing barriers to entry into interdisciplinary scientific research, and promoting the achievement of remote reproducibility of research results. We describe details of our aims and methods, identify current challenges, compare Bioconductor to other open bioinformatics projects, and provide working examples.

Bayes Factors

Robert E. Kass, Adrian E. Raftery

1995 · Journal of the American Statistical Association · 12.3K citations · doi:10.1080/01621459.1995.10476572

In a 1935 paper and in his book Theory of Probability, Jeffreys developed a methodology for quantifying the evidence in favor of a scientific theory. The centerpiece was a number, now called the Bayes factor, which is the posterior odds of the null hypothesis when the prior probability on the null is one-half. Although there has been much discussion of Bayesian hypothesis testing in the context of criticism of P-values, less attention has been given to the Bayes factor as a practical tool of applied statistics. In this article we review and discuss the uses of Bayes factors in the context of five scientific applications in genetics, sports, ecology, sociology, and psychology. We emphasize the following points:
• From Jeffreys' Bayesian viewpoint, the purpose of hypothesis testing is to evaluate the evidence in favor of a scientific theory.
• Bayes factors offer a way of evaluating evidence in favor of a null hypothesis.
• Bayes factors provide a way of incorporating external information into the evaluation of evidence about a hypothesis.
• Bayes factors are very general and do not require alternative models to be nested.
• Several techniques are available for computing Bayes factors, including asymptotic approximations that are easy to compute using the output from standard packages that maximize likelihoods.
• In “nonstandard” statistical models that do not satisfy common regularity conditions, it can be technically simpler to calculate Bayes factors than to derive non-Bayesian significance tests.
• The Schwarz criterion (or BIC) gives a rough approximation to the logarithm of the Bayes factor, which is easy to use and does not require evaluation of prior distributions.
• When one is interested in estimation or prediction, Bayes factors may be converted to weights to be attached to various models so that a composite estimate or prediction may be obtained that takes account of structural or model uncertainty.
• Algorithms have been proposed that allow model uncertainty to be taken into account when the class of models initially considered is very large.
• Bayes factors are useful for guiding an evolutionary model-building process.
• It is important, and feasible, to assess the sensitivity of conclusions to the prior distributions used.
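Two of the points above, the Schwarz (BIC) approximation to the log Bayes factor and the conversion of Bayes factors into model weights, can be sketched as follows. This is a rough illustration under stated assumptions: the function names and the numeric inputs are hypothetical, the maximized log-likelihoods are taken as given, and the weights assume equal prior probabilities for all models.

```python
import math


def schwarz_log_bayes_factor(loglik_1, loglik_0, n_params_1, n_params_0, n_obs):
    """Schwarz approximation to log B10 (model 1 versus model 0).

    Uses the maximized log-likelihoods, the number of free parameters
    in each model, and the sample size; no prior distributions are needed.
    """
    penalty = 0.5 * (n_params_1 - n_params_0) * math.log(n_obs)
    return (loglik_1 - loglik_0) - penalty


def model_weights(log_bayes_factors):
    """Turn log Bayes factors against a common reference model into weights.

    Assumes equal prior model probabilities, so the weights are the
    normalized Bayes factors. The maximum is subtracted for numerical stability.
    """
    largest = max(log_bayes_factors)
    unnormalized = [math.exp(value - largest) for value in log_bayes_factors]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Hypothetical fits: model 1 has one extra parameter and a higher likelihood.
log_b10 = schwarz_log_bayes_factor(
    loglik_1=-115.0, loglik_0=-120.0, n_params_1=3, n_params_0=2, n_obs=100
)

# The reference model has log Bayes factor 0 against itself.
weights = model_weights([0.0, log_b10])
```

The weights could then be used to average estimates or predictions across the two models, which is the composite estimate the abstract refers to.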
