University of California, Los Angeles
UniversityLos Angeles, California, United States
Research output, citation impact, and the most-cited recent papers from University of California, Los Angeles (United States). Aggregated across the NobleBlocks index of 300M+ scholarly works.
Top-cited papers from University of California, Los Angeles
This article examines the adequacy of the “rules of thumb” conventional cutoff criteria and several new alternatives for various fit indexes used to evaluate model fit in practice. Using a 2‐index presentation strategy, which includes using the maximum likelihood (ML)‐based standardized root mean squared residual (SRMR) and supplementing it with either Tucker‐Lewis Index (TLI), Bollen's (1989) Fit Index (BL89), Relative Noncentrality Index (RNI), Comparative Fit Index (CFI), Gamma Hat, McDonald's Centrality Index (Mc), or root mean squared error of approximation (RMSEA), various combinations of cutoff values from selected ranges of cutoff criteria for the ML‐based SRMR and a given supplemental fit index were used to calculate rejection rates for various types of true‐population and misspecified models; that is, models with misspecified factor covariance(s) and models with misspecified factor loading(s). The results suggest that, for the ML method, a cutoff value close to .95 for TLI, BL89, CFI, RNI, and Gamma Hat; a cutoff value close to .90 for Mc; a cutoff value close to .08 for SRMR; and a cutoff value close to .06 for RMSEA are needed before we can conclude that there is a relatively good fit between the hypothesized model and the observed data. Furthermore, the 2‐index presentation strategy is required to reject reasonable proportions of various types of true‐population and misspecified models. Finally, using the proposed cutoff criteria, the ML‐based TLI, Mc, and RMSEA tend to overreject true‐population models at small sample size and thus are less preferable when sample size is small.
SUMMARY: The Sequence Alignment/Map (SAM) format is a generic alignment format for storing read alignments against reference sequences, supporting short and long reads (up to 128 Mbp) produced by different sequencing platforms. It is flexible in style, compact in size, efficient in random access and is the format in which alignments from the 1000 Genomes Project are released. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant caller and alignment viewer, and thus provides universal tools for processing read alignments. AVAILABILITY: http://samtools.sourceforge.net.
SciPy is an open-source scientific computing library for the Python programming language. Since its initial release in 2001, SciPy has become a de facto standard for leveraging scientific algorithms in Python, with over 600 unique code contributors, thousands of dependent packages, over 100,000 dependent repositories and millions of downloads per year. In this work, we provide an overview of the capabilities and development practices of SciPy 1.0 and highlight some recent technical developments.
Convex optimization problems arise frequently in many different fields. This book provides a comprehensive introduction to the subject, and shows in detail how such problems can be solved numerically with great efficiency. The book begins with the basic elements of convex sets and functions, and then describes various classes of convex optimization problems. Duality and approximation techniques are then covered, as are statistical estimation techniques. Various geometrical problems are then presented, and there is detailed discussion of unconstrained and constrained minimization problems, and interior-point methods. The focus of the book is on recognizing convex optimization problems and then finding the most appropriate technique for solving them. It contains many worked examples and homework exercises and will appeal to students, researchers and practitioners in fields such as engineering, computer science, mathematics, statistics, finance and economics.
BACKGROUND: Correlation networks are increasingly being used in bioinformatics applications. For example, weighted gene co-expression network analysis is a systems biology method for describing the correlation patterns among genes across microarray samples. Weighted correlation network analysis (WGCNA) can be used for finding clusters (modules) of highly correlated genes, for summarizing such clusters using the module eigengene or an intramodular hub gene, for relating modules to one another and to external sample traits (using eigengene network methodology), and for calculating module membership measures. Correlation networks facilitate network based gene screening methods that can be used to identify candidate biomarkers or therapeutic targets. These methods have been successfully applied in various biological contexts, e.g. cancer, mouse genetics, yeast genetics, and analysis of brain imaging data. While parts of the correlation network methodology have been described in separate publications, there is a need to provide a user-friendly, comprehensive, and consistent software implementation and an accompanying tutorial. RESULTS: The WGCNA R software package is a comprehensive collection of R functions for performing various aspects of weighted correlation network analysis. The package includes functions for network construction, module detection, gene selection, calculations of topological properties, data simulation, visualization, and interfacing with external software. Along with the R package we also present R software tutorials. While the methods development was motivated by gene expression data, the underlying data mining approach can be applied to a variety of different settings. CONCLUSION: The WGCNA package provides R functions for weighted correlation network analysis, e.g. co-expression network analysis of gene expression data. The R package along with its source code and additional material are freely available at http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/Rpackages/WGCNA.
Since its introduction in 2001, MrBayes has grown in popularity as a software package for Bayesian phylogenetic inference using Markov chain Monte Carlo (MCMC) methods. With this note, we announce the release of version 3.2, a major upgrade to the latest official release presented in 2003. The new version provides convergence diagnostics and allows multiple analyses to be run in parallel with convergence progress monitored on the fly. The introduction of new proposals and automatic optimization of tuning parameters has improved convergence for many problems. The new version also sports significantly faster likelihood calculations through streaming single-instruction-multiple-data extensions (SSE) and support of the BEAGLE library, allowing likelihood calculations to be delegated to graphics processing units (GPUs) on compatible hardware. Speedup factors range from around 2 with SSE code to more than 50 with BEAGLE for codon problems. Checkpointing across all models allows long runs to be completed even when an analysis is prematurely terminated. New models include relaxed clocks, dating, model averaging across time-reversible substitution models, and support for hard, negative, and partial (backbone) tree constraints. Inference of species trees from gene trees is supported by full incorporation of the Bayesian estimation of species trees (BEST) algorithms. Marginal model likelihoods for Bayes factor tests can be estimated accurately across the entire model space using the stepping stone method. The new version provides more output options than previously, including samples of ancestral states, site rates, site d(N)/d(S) rations, branch rates, and node dates. A wide range of statistics on tree parameters can also be output for visualization in FigTree and compatible software.
Normed and nonnormed fit indexes are frequently used as adjuncts to chi-square statistics for evaluating the fit of a structural model. A drawback of existing indexes is that they estimate no known population parameters. A new coefficient is proposed to summarize the relative reduction in the noncentrality parameters of two nested models. Two estimators of the coefficient yield new normed (CFI) and nonnormed (FI) fit indexes. CFI avoids the underestimation of fit often noted in small samples for Bentler and Bonett's (1980) normed fit index (NFI). FI is a linear function of Bentler and Bonett's non-normed fit index (NNFI) that avoids the extreme underestimation and overestimation often found in NNFI. Asymptotically, CFI, FI, NFI, and a new index developed by Bollen are equivalent measures of comparative fit, whereas NNFI measures relative fit by comparing noncentrality per degree of freedom. All of the indexes are generalized to permit use of Wald and Lagrange multiplier statistics. An example illustrates the behavior of these indexes under conditions of correct specification and misspecification. The new fit indexes perform very well at all sample sizes.
The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies. Results for the final phase of the 1000 Genomes Project are presented including whole-genome sequencing, targeted exome sequencing, and genotyping on high-density SNP arrays for 2,504 individuals across 26 populations, providing a global reference data set to support biomedical genetics. The 1000 Genomes Project has sought to comprehensively catalogue human genetic variation across populations, providing a valuable public genomic resource. The data obtained so far have found applications ranging from association studies and fine mapping studies to the filtering of likely neutral variants in rare-disease cohorts. The authors now report on the final phase of the project, phase 3, which covers previously uncharacterized areas of human genetic diversity in terms of the populations sampled and categories of characterized variation. The sample now includes more than 2,500 individuals from 26 global populations, with low coverage whole-genome and deep exome sequencing, as well as dense microarray genotyping. They find that while most common variants are shared across populations, rarer variants are often restricted to closely related populations. The authors also demonstrate the use of the phase 3 dataset as a reference panel for imputation to improve the resolution in genetic association studies.
Probabilistic Reasoning in Intelligent Systems is a complete and accessible account of the theoretical foundations and computational methods that underlie plausible reasoning under uncertainty. The author provides a coherent explication of probability as a language for reasoning with partial belief and offers a unifying perspective on other AI approaches to uncertainty, such as the Dempster-Shafer formalism, truth maintenance systems, and nonmonotonic logic. The author distinguishes syntactic and semantic approaches to uncertainty--and offers techniques, based on belief networks, that provid
This paper considers the model problem of reconstructing an object from incomplete frequency samples. Consider a discrete-time signal f/spl isin/C/sup N/ and a randomly chosen set of frequencies /spl Omega/. Is it possible to reconstruct f from the partial knowledge of its Fourier coefficients on the set /spl Omega/? A typical result of this paper is as follows. Suppose that f is a superposition of |T| spikes f(t)=/spl sigma//sub /spl tau//spl isin/T/f(/spl tau/)/spl delta/(t-/spl tau/) obeying |T|/spl les/C/sub M//spl middot/(log N)/sup -1/ /spl middot/ |/spl Omega/| for some constant C/sub M/>0. We do not know the locations of the spikes nor their amplitudes. Then with probability at least 1-O(N/sup -M/), f can be reconstructed exactly as the solution to the /spl lscr//sub 1/ minimization problem. In short, exact recovery may be obtained by solving a convex optimization problem. We give numerical values for C/sub M/ which depend on the desired probability of success. Our result may be interpreted as a novel kind of nonlinear sampling theorem. In effect, it says that any signal made out of |T| spikes may be recovered by convex programming from almost every set of frequencies of size O(|T|/spl middot/logN). Moreover, this is nearly optimal in the sense that any method succeeding with probability 1-O(N/sup -M/) would in general require a number of frequency samples at least proportional to |T|/spl middot/logN. The methodology extends to a variety of other situations and higher dimensions. For example, we show how one can reconstruct a piecewise constant (one- or two-dimensional) object from incomplete frequency samples - provided that the number of jumps (discontinuities) obeys the condition above - by minimizing other convex functionals such as the total variation of f.
The increasing interest in energy storage for the grid can be attributed to multiple factors, including the capital costs of managing peak demands, the investments needed for grid reliability, and the integration of renewable energy sources. Although existing energy storage is dominated by pumped hydroelectric, there is the recognition that battery systems can offer a number of high-value opportunities, provided that lower costs can be obtained. The battery systems reviewed here include sodium-sulfur batteries that are commercially available for grid applications, redox-flow batteries that offer low cost, and lithium-ion batteries whose development for commercial electronics and electric vehicles is being applied to grid storage.
Studies of the human microbiome have revealed that even healthy individuals differ remarkably in the microbes that occupy habitats such as the gut, skin and vagina. Much of this diversity remains unexplained, although diet, environment, host genetics and early microbial exposure have all been implicated. Accordingly, to characterize the ecology of human-associated microbial communities, the Human Microbiome Project has analysed the largest cohort and set of distinct, clinically relevant body habitats so far. We found the diversity and abundance of each habitat’s signature microbes to vary widely even among healthy subjects, with strong niche specialization both within and among individuals. The project encountered an estimated 81–99% of the genera, enzyme families and community configurations occupied by the healthy Western microbiome. Metagenomic carriage of metabolic pathways was stable among individuals despite variation in community structure, and ethnic/racial background proved to be one of the strongest associations of both pathways and microbes with clinical metadata. These results thus delineate the range of structural and functional configurations normal in the microbial communities of a healthy population, enabling future characterization of the epidemiology, ecology and translational applications of the human microbiome. The Human Microbiome Project Consortium reports the first results of their analysis of microbial communities from distinct, clinically relevant body habitats in a human cohort; the insights into the microbial communities of a healthy population lay foundations for future exploration of the epidemiology, ecology and translational applications of the human microbiome. The Human Microbiome Project (HMP), supported by the National Institutes of Health Common Fund, has the goal of characterizing the microbial communities that inhabit and interact with the human body in sickness and in health. In two Articles in this issue of Nature, the HMP Consortium presents the first population-scale details of the organismal and functional composition of the microbiota across five areas of the body. An associated News & Views discusses the initial results — which, along with those of a series of co-publications, already constitute the most extensive catalogue of organisms and genes related to the human microbiome yet published — and highlights some of the major questions that the project will tackle in the next few years.
The HER-2/neu oncogene is a member of the erbB-like oncogene family, and is related to, but distinct from, the epidermal growth factor receptor. This gene has been shown to be amplified in human breast cancer cell lines. In the current study, alterations of the gene in 189 primary human breast cancers were investigated. HER-2/neu was found to be amplified from 2- to greater than 20-fold in 30% of the tumors. Correlation of gene amplification with several disease parameters was evaluated. Amplification of the HER-2/neu gene was a significant predictor of both overall survival and time to relapse in patients with breast cancer. It retained its significance even when adjustments were made for other known prognostic factors. Moreover, HER-2/neu amplification had greater prognostic value than most currently used prognostic factors, including hormonal-receptor status, in lymph node-positive disease. These data indicate that this gene may play a role in the biologic behavior and/or pathogenesis of human breast cancer.
Abstract: SciPy is an open-source scientific computing library for the Python programming language. Since its initial release in 2001, SciPy has become a de facto standard for leveraging scientific algorithms in Python, with over 600 unique code contributors, thousands of dependent packages, over 100,000 dependent repositories and millions of downloads per year. In this work, we provide an overview of the capabilities and development practices of SciPy 1.0 and highlight some recent technical developments.
BACKGROUND: The HER2 gene, which encodes the growth factor receptor HER2, is amplified and HER2 is overexpressed in 25 to 30 percent of breast cancers, increasing the aggressiveness of the tumor. METHODS: We evaluated the efficacy and safety of trastuzumab, a recombinant monoclonal antibody against HER2, in women with metastatic breast cancer that overexpressed HER2. We randomly assigned 234 patients to receive standard chemotherapy alone and 235 patients to receive standard chemotherapy plus trastuzumab. Patients who had not previously received adjuvant (postoperative) therapy with an anthracycline were treated with doxorubicin (or epirubicin in the case of 36 women) and cyclophosphamide alone (138 women) or with trastuzumab (143 women). Patients who had previously received adjuvant anthracycline were treated with paclitaxel alone (96 women) or paclitaxel with trastuzumab (92 women). RESULTS: The addition of trastuzumab to chemotherapy was associated with a longer time to disease progression (median, 7.4 vs. 4.6 months; P<0.001), a higher rate of objective response (50 percent vs. 32 percent, P<0.001), a longer duration of response (median, 9.1 vs. 6.1 months; P<0.001), a lower rate of death at 1 year (22 percent vs. 33 percent, P=0.008), longer survival (median survival, 25.1 vs. 20.3 months; P=0.01), and a 20 percent reduction in the risk of death. The most important adverse event was cardiac dysfunction of New York Heart Association class III or IV, which occurred in 27 percent of the group given an anthracycline, cyclophosphamide, and trastuzumab; 8 percent of the group given an anthracycline and cyclophosphamide alone; 13 percent of the group given paclitaxel and trastuzumab; and 1 percent of the group given paclitaxel alone. Although the cardiotoxicity was potentially severe and, in some cases, life-threatening, the symptoms generally improved with standard medical management. CONCLUSIONS: Trastuzumab increases the clinical benefit of first-line chemotherapy in metastatic breast cancer that overexpresses HER2.
Bayesian inference of phylogeny using Markov chain Monte Carlo (MCMC) plays a central role in understanding evolutionary history from molecular sequence data. Visualizing and analyzing the MCMC-generated samples from the posterior distribution is a key step in any non-trivial Bayesian inference. We present the software package Tracer (version 1.7) for visualizing and analyzing the MCMC trace files generated through Bayesian phylogenetic inference. Tracer provides kernel density estimation, multivariate visualization, demographic trajectory reconstruction, conditional posterior distribution summary, and more. Tracer is open-source and available at http://beast.community/tracer.
The last decade has seen a sharp increase in the number of scientific publications describing physiological and pathological functions of extracellular vesicles (EVs), a collective term covering various subtypes of cell-released, membranous structures, called exosomes, microvesicles, microparticles, ectosomes, oncosomes, apoptotic bodies, and many other names. However, specific issues arise when working with these entities, whose size and amount often make them difficult to obtain as relatively pure preparations, and to characterize properly. The International Society for Extracellular Vesicles (ISEV) proposed Minimal Information for Studies of Extracellular Vesicles ("MISEV") guidelines for the field in 2014. We now update these "MISEV2014" guidelines based on evolution of the collective knowledge in the last four years. An important point to consider is that ascribing a specific function to EVs in general, or to subtypes of EVs, requires reporting of specific information beyond mere description of function in a crude, potentially contaminated, and heterogeneous preparation. For example, claims that exosomes are endowed with exquisite and specific activities remain difficult to support experimentally, given our still limited knowledge of their specific molecular machineries of biogenesis and release, as compared with other biophysically similar EVs. The MISEV2018 guidelines include tables and outlines of suggested protocols and steps to follow to document specific EV-associated functional activities. Finally, a checklist is provided with summaries of key points.
Written by one of the preeminent researchers in the field, this book provides a comprehensive exposition of modern analysis of causation. It shows how causality has grown from a nebulous concept into a mathematical theory with significant applications in the fields of statistics, artificial intelligence, economics, philosophy, cognitive science, and the health and social sciences. Judea Pearl presents and unifies the probabilistic, manipulative, counterfactual, and structural approaches to causation and devises simple mathematical tools for studying the relationships between causal connections and statistical associations. Cited in more than 2,100 scientific publications, it continues to liberate scientists from the traditional molds of statistical thinking. In this revised edition, Judea Pearl elucidates thorny issues, answers readers' questions, and offers a panoramic view of recent advances in this field of research. Causality will be of interest to students and professionals in a wide variety of fields. Dr Judea Pearl has received the 2011 Rumelhart Prize for his leading research in Artificial Intelligence (AI) and systems from The Cognitive Science Society.
BACKGROUND: Bevacizumab, a monoclonal antibody against vascular endothelial growth factor, has shown promising preclinical and clinical activity against metastatic colorectal cancer, particularly in combination with chemotherapy. METHODS: Of 813 patients with previously untreated metastatic colorectal cancer, we randomly assigned 402 to receive irinotecan, bolus fluorouracil, and leucovorin (IFL) plus bevacizumab (5 mg per kilogram of body weight every two weeks) and 411 to receive IFL plus placebo. The primary end point was overall survival. Secondary end points were progression-free survival, the response rate, the duration of the response, safety, and the quality of life. RESULTS: The median duration of survival was 20.3 months in the group given IFL plus bevacizumab, as compared with 15.6 months in the group given IFL plus placebo, corresponding to a hazard ratio for death of 0.66 (P<0.001). The median duration of progression-free survival was 10.6 months in the group given IFL plus bevacizumab, as compared with 6.2 months in the group given IFL plus placebo (hazard ratio for disease progression, 0.54; P<0.001); the corresponding rates of response were 44.8 percent and 34.8 percent (P=0.004). The median duration of the response was 10.4 months in the group given IFL plus bevacizumab, as compared with 7.1 months in the group given IFL plus placebo (hazard ratio for progression, 0.62; P=0.001). Grade 3 hypertension was more common during treatment with IFL plus bevacizumab than with IFL plus placebo (11.0 percent vs. 2.3 percent) but was easily managed. CONCLUSIONS: The addition of bevacizumab to fluorouracil-based combination chemotherapy results in statistically significant and clinically meaningful improvement in survival among patients with metastatic colorectal cancer.
Abstract Mixture modeling is a widely applied data analysis technique used to identify unobserved heterogeneity in a population. Despite mixture models' usefulness in practice, one unresolved issue in the application of mixture models is that there is not one commonly accepted statistical indicator for deciding on the number of classes in a study population. This article presents the results of a simulation study that examines the performance of likelihood-based tests and the traditionally used Information Criterion (ICs) used for determining the number of classes in mixture modeling. We look at the performance of these tests and indexes for 3 types of mixture models: latent class analysis (LCA), a factor mixture model (FMA), and a growth mixture models (GMM). We evaluate the ability of the tests and indexes to correctly identify the number of classes at three different sample sizes (n = 200, 500, 1,000). Whereas the Bayesian Information Criterion performed the best of the ICs, the bootstrap likelihood ratio test proved to be a very consistent indicator of classes across all of the models considered. ACKNOWLEDGMENTS Karen L. Nylund's research was supported by Grant R01 DA11796 from the National Institute on Drug Abuse (NIDA) and Bengt O. Muthén's research was supported by Grant K02 AA 00230 from the National Institute on Alcohol Abuse and Alcoholism (NIAAA). We thank Mplus for software support, Jacob Cheadle for programming expertise, and Katherine Masyn for helpful comments. Notes 1In general, the within-class covariance structure can be freed to allow within-class item covariance. a Item probabilities for categorical LCA models are specified by the probability in each cell, and the class means for the continuous LCA are specified by the value in parentheses. 2The number random starts for LCA models with categorical outcomes was specified to be "starts = 70 7;" in Mplus. The models with continuous outcomes had differing numbers of random starts. 3It is important to note that when coverage is studied, the random starts option of Mplus should not be used. If it is used, label switching may occur, in that a class for one replication might be represented by another class for another replication, therefore distorting the estimate. 4The models that presented convergence problems were those that were badly misspecified. For example, for the GMM (true k = 3 class model) for n = 500, the convergence rates for the three-, four-, and five-class models were 100%, 87%, and 68%, respectively.