University of California, Santa Cruz
University, Santa Cruz, United States
Research output, citation impact, and the most-cited recent papers from University of California, Santa Cruz (United States). Aggregated across the NobleBlocks index of 300M+ scholarly works.
Top-cited papers from University of California, Santa Cruz
This article examines the adequacy of the “rules of thumb” conventional cutoff criteria and several new alternatives for various fit indexes used to evaluate model fit in practice. Using a 2‐index presentation strategy, which includes using the maximum likelihood (ML)‐based standardized root mean squared residual (SRMR) and supplementing it with either Tucker‐Lewis Index (TLI), Bollen's (1989) Fit Index (BL89), Relative Noncentrality Index (RNI), Comparative Fit Index (CFI), Gamma Hat, McDonald's Centrality Index (Mc), or root mean squared error of approximation (RMSEA), various combinations of cutoff values from selected ranges of cutoff criteria for the ML‐based SRMR and a given supplemental fit index were used to calculate rejection rates for various types of true‐population and misspecified models; that is, models with misspecified factor covariance(s) and models with misspecified factor loading(s). The results suggest that, for the ML method, a cutoff value close to .95 for TLI, BL89, CFI, RNI, and Gamma Hat; a cutoff value close to .90 for Mc; a cutoff value close to .08 for SRMR; and a cutoff value close to .06 for RMSEA are needed before we can conclude that there is a relatively good fit between the hypothesized model and the observed data. Furthermore, the 2‐index presentation strategy is required to reject reasonable proportions of various types of true‐population and misspecified models. Finally, using the proposed cutoff criteria, the ML‐based TLI, Mc, and RMSEA tend to overreject true‐population models at small sample size and thus are less preferable when sample size is small.
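The two-index strategy described in the abstract above can be sketched as a simple decision rule. This is only an illustration under stated assumptions: the cutoffs are the approximate "close to" values quoted in the abstract, treated here as hard thresholds, and the names `CUTOFFS`, `SRMR_CUTOFF`, `meets_cutoff`, and `two_index_fit` are hypothetical. The rule that both indexes must pass is an assumption about how the combination is applied, not the authors' published procedure.

```python
# Approximate cutoffs quoted in the abstract ("close to" values, treated
# here as hard thresholds purely for illustration).
# The boolean is True when higher values indicate better fit.
CUTOFFS = {
    "TLI": (0.95, True),
    "BL89": (0.95, True),
    "CFI": (0.95, True),
    "RNI": (0.95, True),
    "GammaHat": (0.95, True),
    "Mc": (0.90, True),
    "RMSEA": (0.06, False),
}
SRMR_CUTOFF = 0.08  # lower SRMR indicates better fit


def meets_cutoff(index_name: str, value: float) -> bool:
    """Check one supplemental fit index against its approximate cutoff."""
    cutoff, higher_is_better = CUTOFFS[index_name]
    return value >= cutoff if higher_is_better else value <= cutoff


def two_index_fit(srmr: float, index_name: str, index_value: float) -> bool:
    """Assumed rule: retain the model only if SRMR and the supplemental
    index both satisfy their cutoffs."""
    return srmr <= SRMR_CUTOFF and meets_cutoff(index_name, index_value)


if __name__ == "__main__":
    print(two_index_fit(0.05, "CFI", 0.97))    # both indexes pass
    print(two_index_fit(0.10, "CFI", 0.97))    # SRMR too high
    print(two_index_fit(0.05, "RMSEA", 0.08))  # RMSEA too high
```

Because the abstract notes that TLI, Mc, and RMSEA tend to overreject true-population models at small sample sizes, a rule like this would need extra care when the sample is small.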
Coot is a molecular-graphics application for model building and validation of biological macromolecules. The program displays electron-density maps and atomic models and allows model manipulations such as idealization, real-space refinement, manual rotation/translation, rigid-body fitting, ligand search, solvation, mutations, rotamers and Ramachandran idealization. Furthermore, tools are provided for model validation as well as interfaces to external programs for refinement, validation and graphics. The software is designed to be easy to learn for novice users, which is achieved by ensuring that tools for common tasks are 'discoverable' through familiar user-interface elements (menus and toolbars) or by intuitive behaviour (mouse controls). Recent developments have focused on providing tools for expert users, with customisable key bindings, extensions and an extensive scripting interface. The software is under rapid development, but has already achieved very widespread use within the crystallographic community. The current state of the software is presented, with a description of the facilities available and of some of the underlying methods employed.
The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence.
The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies. Results for the final phase of the 1000 Genomes Project are presented including whole-genome sequencing, targeted exome sequencing, and genotyping on high-density SNP arrays for 2,504 individuals across 26 populations, providing a global reference data set to support biomedical genetics. The 1000 Genomes Project has sought to comprehensively catalogue human genetic variation across populations, providing a valuable public genomic resource. The data obtained so far have found applications ranging from association studies and fine mapping studies to the filtering of likely neutral variants in rare-disease cohorts. The authors now report on the final phase of the project, phase 3, which covers previously uncharacterized areas of human genetic diversity in terms of the populations sampled and categories of characterized variation. The sample now includes more than 2,500 individuals from 26 global populations, with low coverage whole-genome and deep exome sequencing, as well as dense microarray genotyping. 
They find that while most common variants are shared across populations, rarer variants are often restricted to closely related populations. The authors also demonstrate the use of the phase 3 dataset as a reference panel for imputation to improve the resolution in genetic association studies.
Multiple sequence alignments are fundamental to many sequence analysis methods. Most alignments are computed using the progressive alignment heuristic. These methods are starting to become a bottleneck in some analysis pipelines when faced with data sets of the size of many thousands of sequences. Some methods allow computation of larger data sets while sacrificing quality, and others produce high-quality alignments, but scale badly with the number of sequences. In this paper, we describe a new program called Clustal Omega, which can align virtually any number of protein sequences quickly and that delivers accurate alignments. The accuracy of the package on smaller test cases is similar to that of the high-quality aligners. On larger data sets, Clustal Omega outperforms other packages in terms of execution time and quality. Clustal Omega also has powerful features for adding sequences to and exploiting information in existing alignments, making use of the vast amount of precomputed information in public databases like Pfam.
As vertebrate genome sequences near completion and research refocuses to their analysis, the issue of effective genome annotation display becomes critical. A mature web tool for rapid and reliable display of any requested portion of the genome at any scale, together with several dozen aligned annotation tracks, is provided at http://genome.ucsc.edu. This browser displays assembly contigs and gaps, mRNA and expressed sequence tag alignments, multiple gene predictions, cross-species homologies, single nucleotide polymorphisms, sequence-tagged sites, radiation hybrid data, transposon repeats, and more as a stack of coregistered tracks. Text and sequence-based searches provide quick and precise access to any region of specific interest. Secondary links from individual features lead to sequence details and supplementary off-site databases. One-half of the annotation tracks are computed at the University of California, Santa Cruz from publicly available sequence data; collaborators worldwide provide the rest. Users can stably add their own custom tracks to the browser for educational or research purposes. The conceptual and technical framework of the browser, its underlying MYSQL database, and overall use are described. The web site currently serves over 50,000 pages per day to over 3000 different users.
A tremendous amount of RNA sequencing data has been produced by large consortium projects such as TCGA and GTEx, creating new opportunities for data mining and deeper understanding of gene functions. While certain existing web servers are valuable and widely used, many expression analysis functions needed by experimental biologists are still not adequately addressed by these tools. We introduce GEPIA (Gene Expression Profiling Interactive Analysis), a web-based tool to deliver fast and customizable functionalities based on TCGA and GTEx data. GEPIA provides key interactive and customizable functions including differential expression analysis, profiling plotting, correlation analysis, patient survival analysis, similar gene detection and dimensionality reduction analysis. The comprehensive expression analyses with simple clicking through GEPIA greatly facilitate data mining in wide research areas, scientific discussion and the therapeutic discovery process. GEPIA fills in the gap between cancer genomics big data and the delivery of integrated information to end users, thus helping unleash the value of the current data resources. GEPIA is available at http://gepia.cancer-pku.cn/.
Since 65 million years ago (Ma), Earth's climate has undergone a significant and complex evolution, the finer details of which are now coming to light through investigations of deep-sea sediment cores. This evolution includes gradual trends of warming and cooling driven by tectonic processes on time scales of 10^5 to 10^7 years, rhythmic or periodic cycles driven by orbital processes with 10^4- to 10^6-year cyclicity, and rare rapid aberrant shifts and extreme climate transients with durations of 10^3 to 10^5 years. Here, recent progress in defining the evolution of global climate over the Cenozoic Era is reviewed. We focus primarily on the periodic and anomalous components of variability over the early portion of this era, as constrained by the latest generation of deep-sea isotope records. We also consider how this improved perspective has led to the recognition of previously unforeseen mechanisms for altering climate.
The IntCal09 and Marine09 radiocarbon calibration curves have been revised utilizing newly available and updated data sets from 14C measurements on tree rings, plant macrofossils, speleothems, corals, and foraminifera. The calibration curves were derived from the data using the random walk model (RWM) used to generate IntCal09 and Marine09, which has been revised to account for additional uncertainties and error structures. The new curves were ratified at the 21st International Radiocarbon Conference in July 2012 and are available as Supplemental Material at www.radiocarbon.org. The database can be accessed at http://intcal.qub.ac.uk/intcal13/.
Current clinical practice is organized according to tissue or organ of origin of tumors. Now, The Cancer Genome Atlas (TCGA) Research Network has started to identify genomic and other molecular commonalities among a dozen different types of cancer. Emerging similarities and contrasts will form the basis for targeted therapies of the future and for repurposing existing therapies by molecular rather than histological similarities of the diseases. The Cancer Genome Atlas (TCGA) Research Network has profiled and analyzed large numbers of human tumors to discover molecular aberrations at the DNA, RNA, protein and epigenetic levels. The resulting rich data provide a major opportunity to develop an integrated picture of commonalities, differences and emergent themes across tumor lineages. The Pan-Cancer initiative compares the first 12 tumor types profiled by TCGA. Analysis of the molecular aberrations and their functional roles across tumor types will teach us how to extend therapies effective in one cancer type to others with a similar genomic profile.
The present article presents a meta-analytic test of intergroup contact theory. With 713 independent samples from 515 studies, the meta-analysis finds that intergroup contact typically reduces intergroup prejudice. Multiple tests indicate that this finding appears not to result from either participant selection or publication biases, and the more rigorous studies yield larger mean effects. These contact effects typically generalize to the entire outgroup, and they emerge across a broad range of outgroup targets and contact settings. Similar patterns also emerge for samples with racial or ethnic targets and samples with other targets. This result suggests that contact theory, devised originally for racial and ethnic encounters, can be extended to other groups. A global indicator of Allport's optimal contact conditions demonstrates that contact under these conditions typically leads to even greater reduction in prejudice. Closer examination demonstrates that these conditions are best conceptualized as an interrelated bundle rather than as independent factors. Further, the meta-analytic findings indicate that these conditions are not essential for prejudice reduction. Hence, future work should focus on negative factors that prevent intergroup contact from diminishing prejudice as well as the development of a more comprehensive theory of intergroup contact.
The purpose of this article is to advance a new understanding of gender as a routine accomplishment embedded in everyday interaction. To do so entails a critical assessment of existing perspectives on sex and gender and the introduction of important distinctions among sex, sex category, and gender. We argue that recognition of the analytical independence of these concepts is essential for understanding the interactional work involved in being a gendered person in society. The thrust of our remarks is toward theoretical reconceptualization, but we consider fruitful directions for empirical research that are indicated by our formulation.
Analyzing vertebrate genomes requires rapid mRNA/DNA and cross-species protein alignments. A new tool, BLAT, is more accurate and 500 times faster than popular existing tools for mRNA/DNA alignments and 50 times faster for protein alignments at sensitivity settings typically used when comparing vertebrate sequences. BLAT's speed stems from an index of all nonoverlapping K-mers in the genome. This index fits inside the RAM of inexpensive computers, and need only be computed once for each genome assembly. BLAT has several major stages. It uses the index to find regions in the genome likely to be homologous to the query sequence. It performs an alignment between homologous regions. It stitches together these aligned regions (often exons) into larger alignments (typically genes). Finally, BLAT revisits small internal exons possibly missed at the first stage and adjusts large gap boundaries that have canonical splice sites where feasible. This paper describes how BLAT was optimized. Effects on speed and sensitivity are explored for various K-mer sizes, mismatch schemes, and number of required index matches. BLAT is compared with other alignment programs on various test sets and then used in several genome-wide applications. http://genome.ucsc.edu hosts a web-based BLAT server for the human genome.
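The indexing idea in the BLAT abstract above (index the genome's non-overlapping K-mers once, then look up the query's K-mers to find candidate regions) can be sketched as follows. This is a toy illustration, not BLAT's implementation: it assumes exact K-mer matches only and omits the alignment, stitching, and splice-site stages. The function names `build_index` and `find_hits` are hypothetical.

```python
from collections import defaultdict


def build_index(genome: str, k: int) -> dict:
    """Index the NON-overlapping k-mers of the genome (step size k).

    Stepping by k keeps the index roughly k times smaller than an index
    of every overlapping k-mer."""
    index = defaultdict(list)
    for pos in range(0, len(genome) - k + 1, k):
        index[genome[pos:pos + k]].append(pos)
    return index


def find_hits(index: dict, query: str, k: int) -> list:
    """Look up every OVERLAPPING k-mer of the query in the index.

    Returns (query_pos, genome_pos) pairs. Hits sharing the same
    genome_pos - query_pos diagonal suggest a candidate homologous region."""
    hits = []
    for qpos in range(len(query) - k + 1):
        for gpos in index.get(query[qpos:qpos + k], ()):
            hits.append((qpos, gpos))
    return hits


if __name__ == "__main__":
    genome = "ACGTACGGTTCA"
    index = build_index(genome, k=4)   # k-mers at genome positions 0, 4, 8
    print(find_hits(index, "GACGGT", k=4))
```

Scanning the query with overlapping K-mers is what lets a non-overlapping genome index still find matches at any offset. The index is also built once per genome and reused for every query.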
By characterizing the geographic and functional spectrum of human genetic variation, the 1000 Genomes Project aims to build a resource to help to understand the genetic contribution to disease. Here we describe the genomes of 1,092 individuals from 14 populations, constructed using a combination of low-coverage whole-genome and exome sequencing. By developing methods to integrate information across several algorithms and diverse data sources, we provide a validated haplotype map of 38 million single nucleotide polymorphisms, 1.4 million short insertions and deletions, and more than 14,000 larger deletions. We show that individuals from different populations carry different profiles of rare and common variants, and that low-frequency variants show substantial geographic differentiation, which is further increased by the action of purifying selection. We show that evolutionary conservation and coding consequence are key determinants of the strength of purifying selection, that rare-variant load varies substantially across biological pathways, and that each individual contains hundreds of rare non-coding variants at conserved sites, such as motif-disrupting changes in transcription-factor-binding sites. This resource, which captures up to 98% of accessible single nucleotide polymorphisms at a frequency of 1% in related populations, enables analysis of common and low-frequency variants in individuals from diverse, including admixed, populations. This report from the 1000 Genomes Project describes the genomes of 1,092 individuals from 14 human populations, providing a resource for common and low-frequency variant analysis in individuals from diverse populations; hundreds of rare non-coding variants at conserved sites, such as motif-disrupting changes in transcription-factor-binding sites, can be found in each individual. 
This report by the 1000 Genomes Project describes the genomes of 1,092 individuals from 14 human populations, providing a resource for common and low-frequency variant analysis in individuals from diverse populations. Integrative analyses reveal profiles of rare and common variants in different populations. The frequencies of rare variants vary across biological pathways, and hundreds of rare, non-coding variants at conserved sites — such as changes disrupting transcription-factor motifs — can be established for each individual.
A catalogue of molecular aberrations that cause ovarian cancer is critical for developing and deploying therapies that will improve patients’ lives. The Cancer Genome Atlas project has analysed messenger RNA expression, microRNA expression, promoter methylation and DNA copy number in 489 high-grade serous ovarian adenocarcinomas and the DNA sequences of exons from coding genes in 316 of these tumours. Here we report that high-grade serous ovarian cancer is characterized by TP53 mutations in almost all tumours (96%); low prevalence but statistically recurrent somatic mutations in nine further genes including NF1, BRCA1, BRCA2, RB1 and CDK12; 113 significant focal DNA copy number aberrations; and promoter methylation events involving 168 genes. Analyses delineated four ovarian cancer transcriptional subtypes, three microRNA subtypes, four promoter methylation subtypes and a transcriptional signature associated with survival duration, and shed new light on the impact that tumours with BRCA1/2 (BRCA1 or BRCA2) and CCNE1 aberrations have on survival. Pathway analyses suggested that homologous recombination is defective in about half of the tumours analysed, and that NOTCH and FOXM1 signalling are involved in serous ovarian cancer pathophysiology. The Cancer Genome Atlas (TCGA) project reports here its analysis of messenger RNA and microRNA expression, promoter methylation, DNA copy number and exome sequences in 489 high-grade serous ovarian adenocarcinomas. The analyses help establish new tumour subtypes. Among other insights is the finding that while the gene encoding p53 tumour suppressor is mutated in almost all tumours, nine other loci including NF1, BRCA1, BRCA2, RB1 and CDK12 carry recurrent albeit low-prevalence mutations. Homologous recombination is defective in about half of the tumours studied, and Notch and FOXM1 signalling are involved in the pathophysiology.
The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother–father–child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10^-8 per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research. This issue of Nature contains the first publication from The 1000 Genomes Project, an international collaboration that will produce an extensive public catalogue of human genetic variation.
The plan, in fact, is to sequence about 2,000 unidentified individuals from 20 populations around the world. This first paper presents the results from the project's pilot phase, testing three different strategies for genome-wide sequencing with high-throughput platforms: low-coverage whole-genome sequencing of 179 individuals in three population groups, high-coverage sequencing of two mother–father–child trios, and exon-targeted sequencing of 697 individuals from seven populations. The goal of the 1000 Genomes Project is to provide in-depth information on variation in human genome sequences. In the pilot phase reported here, different strategies for genome-wide sequencing, using high-throughput sequencing platforms, were developed and compared. The resulting data set includes more than 95% of the currently accessible variants found in any individual, and can be used to inform association and functional studies.
Radiocarbon (14C) ages cannot provide absolutely dated chronologies for archaeological or paleoenvironmental studies directly but must be converted to calendar age equivalents using a calibration curve compensating for fluctuations in atmospheric 14C concentration. Although calibration curves are constructed from independently dated archives, they invariably require revision as new data become available and our understanding of the Earth system improves. In this volume the international 14C calibration curves for both the Northern and Southern Hemispheres, as well as for the ocean surface layer, have been updated to include a wealth of new data and extended to 55,000 cal BP. Based on tree rings, IntCal20 now extends as a fully atmospheric record to ca. 13,900 cal BP. For the older part of the timescale, IntCal20 comprises statistically integrated evidence from floating tree-ring chronologies, lacustrine and marine sediments, speleothems, and corals. We utilized improved evaluation of the timescales and location variable 14C offsets from the atmosphere (reservoir age, dead carbon fraction) for each dataset. New statistical methods have refined the structure of the calibration curves while maintaining a robust treatment of uncertainties in the 14C ages, the calendar ages and other corrections. The inclusion of modeled marine reservoir ages derived from a three-dimensional ocean circulation model has allowed us to apply more appropriate reservoir corrections to the marine 14C data rather than the previous use of constant regional offsets from the atmosphere. Here we provide an overview of the new and revised datasets and the associated methods used for the construction of the IntCal20 curve and explore potential regional offsets for tree-ring data.
We discuss the main differences with respect to the previous calibration curve, IntCal13, and some of the implications for archaeology and geosciences ranging from the recent past to the time of the extinction of the Neanderthals.
The Astropy Project supports and fosters the development of open-source and openly developed Python packages that provide commonly needed functionality to the astronomical community. A key element of the Astropy Project is the core package astropy, which serves as the foundation for more specialized projects and packages. In this article, we provide an overview of the organization of the Astropy project and summarize key features in the core package, as of the recent major release, version 2.0. We then describe the project infrastructure designed to facilitate and support development for a broader ecosystem of interoperable packages. We conclude with a future outlook of planned new features and directions for the broader Astropy Project.
The Review summarizes much of particle physics and cosmology. Using data from previous editions, plus 2,873 new measurements from 758 papers, we list, evaluate, and average measured properties of gauge bosons and the recently discovered Higgs boson, leptons, quarks, mesons, and baryons. We summarize searches for hypothetical particles such as supersymmetric particles, heavy bosons, axions, dark photons, etc. Particle properties and search limits are listed in Summary Tables. We give numerous tables, figures, formulae, and reviews of topics such as Higgs Boson Physics, Supersymmetry, Grand Unified Theories, Neutrino Mixing, Dark Energy, Dark Matter, Cosmology, Particle Detectors, Colliders, Probability and Statistics. Among the 118 reviews are many that are new or heavily revised, including a new review on Neutrinos in Cosmology. Starting with this edition, the Review is divided into two volumes. Volume 1 includes the Summary Tables and all review articles. Volume 2 consists of the Particle Listings. Review articles that were previously part of the Listings are now included in volume 1. The complete Review (both volumes) is published online on the website of the Particle Data Group (http://pdg.lbl.gov) and in a journal. Volume 1 is available in print as the PDG Book. A Particle Physics Booklet with the Summary Tables and essential tables, figures, and equations from selected review articles is also available. The 2018 edition of the Review of Particle Physics should be cited as: M. Tanabashi et al. (Particle Data Group), Phys. Rev. D 98, 030001 (2018).
The reference human genome sequence set the stage for studies of genetic variation and its association with human disease, but epigenomic studies lack a similar reference. To address this need, the NIH Roadmap Epigenomics Consortium generated the largest collection so far of human epigenomes for primary cells and tissues. Here we describe the integrative analysis of 111 reference human epigenomes generated as part of the programme, profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression. We establish global maps of regulatory elements, define regulatory modules of coordinated activity, and their likely activators and repressors. We show that disease- and trait-associated genetic variants are enriched in tissue-specific epigenomic marks, revealing biologically relevant cell types for diverse human traits, and providing a resource for interpreting the molecular basis of human disease. Our results demonstrate the central role of epigenomic information for understanding gene regulation, cellular differentiation and human disease.