Protein Information Resource
Nonprofit, Washington, United States
Research output, citation impact, and the most-cited recent papers from Protein Information Resource. Aggregated across the NobleBlocks index of 300M+ scholarly works.
Top-cited papers from Protein Information Resource
The PIRSF protein classification system (http://pir.georgetown.edu/pirsf/) reflects evolutionary relationships of full-length proteins and domains. The primary PIRSF classification unit is the homeomorphic family, whose members are both homologous (evolved from a common ancestor) and homeomorphic (sharing full-length sequence similarity and a common domain architecture). PIRSF families are curated systematically based on literature review and integrative sequence and functional analysis, including sequence and structure similarity, domain architecture, functional association, genome context, and phyletic pattern. The results of classification and expert annotation are summarized in PIRSF family reports with graphical viewers for taxonomic distribution, domain architecture, family hierarchy, and multiple alignment and phylogenetic tree. The PIRSF system provides a comprehensive resource for bioinformatics analysis and comparative studies of protein function and evolution. Domain or fold-based searches allow identification of evolutionarily related protein families sharing domains or structural folds. Functional convergence and functional divergence are revealed by the relationships between protein classification and curated family functions. The taxonomic distribution allows the identification of lineage-specific or broadly conserved protein families and can reveal horizontal gene transfer. Here we demonstrate, with illustrative examples, how to use the web-based PIRSF system as a tool for functional and evolutionary studies of protein families.
Understanding the association of genetic variation with its functional consequences in proteins is essential for the interpretation of genomic data and identifying causal variants in diseases. Integration of protein function knowledge with genome annotation can assist in rapidly comprehending genetic variation within complex biological processes. Here, we describe mapping UniProtKB human sequences and positional annotations, such as active sites, binding sites, and variants to the human genome (GRCh38) and the release of a public genome track hub for genome browsers. To demonstrate the power of combining protein annotations with genome annotations for functional interpretation of variants, we present specific biological examples in disease-related genes and proteins. Computational comparisons of UniProtKB annotations and protein variants with ClinVar clinically annotated single nucleotide polymorphism (SNP) data show that 32% of UniProtKB variants colocate with 8% of ClinVar SNPs. The majority of colocated UniProtKB disease-associated variants (86%) map to 'pathogenic' ClinVar SNPs. UniProt and ClinVar are collaborating to provide a unified clinical variant annotation for genomic, protein, and clinical researchers. The genome track hubs, and related UniProtKB files, are downloadable from the UniProt FTP site and discoverable as public track hubs at the UCSC and Ensembl genome browsers.
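The core coordinate arithmetic behind mapping a protein position to the genome can be sketched as follows. This is a minimal illustration only, not the UniProt pipeline: the function name `codon_genomic_positions`, the exon representation (1-based inclusive coding segments) and the example coordinates are all assumptions made for the sketch.

```python
def codon_genomic_positions(cds_exons, strand, aa_pos):
    """Return the three genomic coordinates that encode residue aa_pos.

    cds_exons: list of (start, end) coding segments, 1-based inclusive
               genomic coordinates (hypothetical input format).
    strand:    "+" or "-".
    aa_pos:    1-based residue position in the protein.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    if aa_pos < 1:
        raise ValueError("aa_pos is 1-based")

    # Walk the coding segments in transcription order.
    ordered = sorted(cds_exons, reverse=(strand == "-"))

    # 0-based offsets of the codon's three bases within the spliced CDS.
    first = (aa_pos - 1) * 3
    wanted = (first, first + 1, first + 2)

    positions = []
    consumed = 0  # CDS bases covered by the segments already visited
    for start, end in ordered:
        length = end - start + 1
        for offset in wanted:
            if consumed <= offset < consumed + length:
                within = offset - consumed
                # On the minus strand the CDS runs from high to low coordinates.
                positions.append(start + within if strand == "+" else end - within)
        consumed += length

    if len(positions) != 3:
        raise ValueError("residue lies outside the coding sequence")
    return positions


# Made-up example: two coding segments; residue 2 spans the splice junction.
example_exons = [(100, 104), (200, 210)]
junction_codon = codon_genomic_positions(example_exons, "+", 2)
```

A codon that spans a splice junction yields non-contiguous coordinates, which is why a positional annotation on the protein may map to more than one genomic block.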
The RESID Database is a comprehensive collection of annotations and structures for protein post-translational modifications including N-terminal, C-terminal and peptide chain cross-link modifications. The RESID Database includes systematic and frequently observed alternate names, Chemical Abstracts Service registry numbers, atomic formulas and weights, enzyme activities, taxonomic range, keywords, literature citations with database cross-references, structural diagrams and molecular models. The NRL-3D Sequence-Structure Database is derived from the three-dimensional structure of proteins deposited with the Research Collaboratory for Structural Bioinformatics Protein Data Bank. The NRL-3D Database includes standardized and frequently observed alternate names, sources, keywords, literature citations, experimental conditions and searchable sequences from model coordinates. These databases are freely accessible through the National Cancer Institute-Frederick Advanced Biomedical Computing Center at these web sites: http://www.ncifcrf.gov/RESID, http://www.ncifcrf.gov/NRL-3D; or at these National Biomedical Research Foundation Protein Information Resource web sites: http://pir.georgetown.edu/pirwww/dbinfo/resid.html, http://pir.georgetown.edu/pirwww/dbinfo/nrl3d.html
The RESID Database contains supplemental information on post-translational modifications for the standardized annotations appearing in the PIR-International Protein Sequence Database. The RESID Database includes: systematic and frequently observed alternate names, Chemical Abstracts Service registry numbers, atomic formulas and weights, enzyme activities, indicators for N-terminal, C-terminal or peptide chain cross-link modifications, keywords, literature citations with database cross-references, structural diagrams and molecular models. Since 1995 updates of the RESID Database have appeared as often as weekly, and full releases appear quarterly. The database is freely accessible through the PIR Web site http://pir.georgetown.edu/pirwww/dbinfo/resid.html and by FTP.
Because the number of post-translational modifications requiring standardized annotation in the PIR-International Protein Sequence Database was large and steadily increasing, a database of protein structure modifications was constructed in 1993 to assist in producing appropriate feature annotations for covalent binding sites, modified sites and cross-links. In 1995 RESID was publicly released as a PIR-International text database distributed on CD-ROM and accessible through the ATLAS program. In 1998 it was made available on the PIR Web site at http://www-nbrf.georgetown.edu/pir/searchdb++ +.html . The RESID Database includes such information as: systematic and frequently observed alternate names; Chemical Abstracts Service registry numbers; atomic formulas and weights; enzyme activities; indicators for N-terminal, C-terminal or peptide chain cross-link modifications; keywords; and literature citations with database cross-references. The RESID Database can be used to predict atomic masses for peptides, and is being enhanced to provide molecular structures for graphical presentation on the PIR Web site using widely available molecular viewing programs.
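As a rough illustration of predicting a peptide mass once modification weights are known, the sketch below sums standard monoisotopic residue masses, adds one water for the free termini, and applies a mass shift per modified position. It does not read RESID entries: the function name `peptide_mass`, the `MOD_DELTAS` table and the input format are assumptions made for the sketch, and the numeric values are the commonly tabulated monoisotopic figures rather than values taken from RESID.

```python
# Standard monoisotopic residue masses in daltons (amino acid minus water).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once for the free N- and C-termini

# Hypothetical lookup of mass shifts for two common modifications.
MOD_DELTAS = {
    "phospho": 79.96633,  # addition of HPO3
    "acetyl": 42.01057,   # addition of C2H2O
}


def peptide_mass(sequence, modifications=None):
    """Monoisotopic mass of a peptide with optional modifications.

    sequence:      one-letter amino acid string.
    modifications: dict mapping 1-based residue position to a key of
                   MOD_DELTAS (assumed format for this sketch).
    """
    modifications = modifications or {}
    try:
        total = sum(RESIDUE_MASS[residue] for residue in sequence) + WATER
    except KeyError as err:
        raise ValueError(f"unknown residue: {err.args[0]}") from None
    for position, name in modifications.items():
        if not 1 <= position <= len(sequence):
            raise ValueError(f"position {position} is outside the peptide")
        total += MOD_DELTAS[name]
    return total


unmodified = peptide_mass("GS")
phosphorylated = peptide_mass("GS", {2: "phospho"})
```

The difference between `phosphorylated` and `unmodified` is exactly the phospho shift, which is the kind of per-modification weight a modification database supplies.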
The ever-increasing scientific literature and the exponential growth of large-scale molecular data have prompted active research in biological text mining to facilitate literature-based curation of molecular databases. Meanwhile, systems biology and bio-ontologies are emerging as critical tools in biological research where complex data in disparate resources are generated, integrated and analyzed. Both rely on literature for data annotation and analysis. The challenges facing us are to develop broadly utilized text mining tools and systems, and to bring together developer and user communities for system development and evaluation. We describe a framework for linking text mining tools with ontology and systems biology, extending from a previously developed text mining resource, iProLINK. We focus on molecular and ontological resources, including genes/proteins, protein-protein interaction (PPI), and Protein Ontology. The framework consists of two major components: a user interface for text mining of PPI from an integrated tool server and software modules to allow text mining outputs to be created, ranked, and used by the community. Use cases are presented for assessing the gaps and making recommendations for future development.
The Protein Ontology (PRO) is designed as a formal and well-principled Open Biomedical Ontologies (OBO) Foundry ontology for proteins. The components of PRO extend from the classification of proteins, on the basis of evolutionary relationships at the homeomorphic level, to the representation of the multiple protein forms of a gene, such as those resulting from alternative splicing, cleavage and/or post-translational modifications. As an ontology, PRO differs from a database in that it provides description about the protein types and their relationships. In this way PRO can be integrated with or cross-referenced by other ontologies and/or databases. The representation of specific protein entities in PRO allows precise definition of objects in pathways, complexes, or in disease modeling. This is useful for proteomics studies where isoforms and modified forms must be differentiated and for biological pathway/network representation where the cascade of events often depends on a specific protein modification. The PRO framework is designed to allow the community to curate any protein entities of interest and will provide a stable unique identifier to any protein type. PRO is manually curated starting with content derived from various data sources coupling with scientific literature. Only annotation with experimental evidence is included, and is in the form of relationship to other ontologies (such as Gene Ontology, Sequence Ontology, and PSI-MOD). We have developed a web-based curation editor for PRO community annotation. In the tutorial, we will first give a brief introduction to the ontology and its relevance to the research communities - OBO ontologies, MOD, pathway and other databases, and any resources that need references/links to protein types. We will show the components of the PRO entry report, and how to search the ontology. Then, we will walk through an example where we will teach the basic curation steps: accessing the web editor, entering the protein to be defined with source attribution, and adding functional annotation. We will provide the necessary tools and documentation so that the user will be able to start curating the protein types of interest. PRO URL: http://pir.georgetown.edu/pro/