NobleBlocks

Protein Information Resource

Nonprofit · Washington, United States

Research output, citation impact, and the most-cited recent papers from Protein Information Resource. Aggregated across the NobleBlocks index of 300M+ scholarly works.

Total works: 12
Citations: 508
h-index: 10
i10-index: 10
Also known as: Protein Information Resource

Top-cited papers from Protein Information Resource

PIRSF family classification system for protein functional and evolutionary analysis.
A. N. NIKOL'SKAYA, Cecilia N. Arighi, Hongzhan Huang, Winona C. Barker +1 more
2007 · PubMed · 73 citations

The PIRSF protein classification system (http://pir.georgetown.edu/pirsf/) reflects evolutionary relationships of full-length proteins and domains. The primary PIRSF classification unit is the homeomorphic family, whose members are both homologous (evolved from a common ancestor) and homeomorphic (sharing full-length sequence similarity and a common domain architecture). PIRSF families are curated systematically based on literature review and integrative sequence and functional analysis, including sequence and structure similarity, domain architecture, functional association, genome context, and phyletic pattern. The results of classification and expert annotation are summarized in PIRSF family reports with graphical viewers for taxonomic distribution, domain architecture, family hierarchy, and multiple alignment and phylogenetic tree. The PIRSF system provides a comprehensive resource for bioinformatics analysis and comparative studies of protein function and evolution. Domain or fold-based searches allow identification of evolutionarily related protein families sharing domains or structural folds. Functional convergence and functional divergence are revealed by the relationships between protein classification and curated family functions. The taxonomic distribution allows the identification of lineage-specific or broadly conserved protein families and can reveal horizontal gene transfer. Here we demonstrate, with illustrative examples, how to use the web-based PIRSF system as a tool for functional and evolutionary studies of protein families.
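The homeomorphic-family criterion (shared full-length similarity plus a common domain architecture) can be illustrated with a deliberately crude sketch. It assumes each protein is reduced to a hypothetical accession, a sequence length, and an ordered tuple of hypothetical domain IDs, and it uses a length ratio as a stand-in for full-length similarity. Actual PIRSF curation also relies on homology evidence, literature review, and expert judgment, none of which is modeled here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Protein:
    accession: str   # hypothetical identifier
    length: int      # sequence length in residues
    domains: tuple   # ordered domain IDs, N- to C-terminus (hypothetical)


def group_by_architecture(proteins, max_length_ratio=1.2):
    """Group proteins that share the same ordered domain architecture and have
    comparable overall lengths (a rough proxy for full-length similarity)."""
    by_arch = defaultdict(list)
    for p in proteins:
        by_arch[p.domains].append(p)

    groups = []
    for arch, members in by_arch.items():
        members.sort(key=lambda p: p.length)
        current = [members[0]]
        for p in members[1:]:
            # Start a new group when the length diverges too far from the
            # shortest member of the current group.
            if p.length / current[0].length > max_length_ratio:
                groups.append((arch, current))
                current = [p]
            else:
                current.append(p)
        groups.append((arch, current))
    return groups


proteins = [
    Protein("A", 300, ("D1", "D2")),
    Protein("B", 320, ("D1", "D2")),
    Protein("C", 600, ("D1", "D2")),  # same domains, very different length
    Protein("D", 310, ("D2", "D1")),  # same domains, different order
]
for arch, members in group_by_architecture(proteins):
    print(arch, [p.accession for p in members])
```

Domain order is part of the key, so proteins with the same domains arranged differently fall into separate groups, mirroring the "common domain architecture" requirement.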

PIRSF Family Classification System for Protein Functional and Evolutionary Analysis
A. N. NIKOL'SKAYA, Cecilia N. Arighi, Hongzhan Huang, Winona C. Barker +1 more
2006 · Evolutionary Bioinformatics · 65 citations · doi:10.1177/117693430600200033

The PIRSF protein classification system (http://pir.georgetown.edu/pirsf/) reflects evolutionary relationships of full-length proteins and domains. The primary PIRSF classification unit is the homeomorphic family, whose members are both homologous (evolved from a common ancestor) and homeomorphic (sharing full-length sequence similarity and a common domain architecture). PIRSF families are curated systematically based on literature review and integrative sequence and functional analysis, including sequence and structure similarity, domain architecture, functional association, genome context, and phyletic pattern. The results of classification and expert annotation are summarized in PIRSF family reports with graphical viewers for taxonomic distribution, domain architecture, family hierarchy, and multiple alignment and phylogenetic tree. The PIRSF system provides a comprehensive resource for bioinformatics analysis and comparative studies of protein function and evolution. Domain or fold-based searches allow identification of evolutionarily related protein families sharing domains or structural folds. Functional convergence and functional divergence are revealed by the relationships between protein classification and curated family functions. The taxonomic distribution allows the identification of lineage-specific or broadly conserved protein families and can reveal horizontal gene transfer. Here we demonstrate, with illustrative examples, how to use the web-based PIRSF system as a tool for functional and evolutionary studies of protein families.

UniProt genomic mapping for deciphering functional effects of missense variants
Peter B. McGarvey, Andrew Nightingale, Jie Luo, Hongzhan Huang +2 more
2019 · Human Mutation · 56 citations · doi:10.1002/humu.23738

Understanding the association of genetic variation with its functional consequences in proteins is essential for the interpretation of genomic data and identifying causal variants in diseases. Integration of protein function knowledge with genome annotation can assist in rapidly comprehending genetic variation within complex biological processes. Here, we describe mapping UniProtKB human sequences and positional annotations, such as active sites, binding sites, and variants to the human genome (GRCh38) and the release of a public genome track hub for genome browsers. To demonstrate the power of combining protein annotations with genome annotations for functional interpretation of variants, we present specific biological examples in disease-related genes and proteins. Computational comparisons of UniProtKB annotations and protein variants with ClinVar clinically annotated single nucleotide polymorphism (SNP) data show that 32% of UniProtKB variants colocate with 8% of ClinVar SNPs. The majority of colocated UniProtKB disease-associated variants (86%) map to 'pathogenic' ClinVar SNPs. UniProt and ClinVar are collaborating to provide a unified clinical variant annotation for genomic, protein, and clinical researchers. The genome track hubs, and related UniProtKB files, are downloadable from the UniProt FTP site and discoverable as public track hubs at the UCSC and Ensembl genome browsers.
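The two colocation figures (32% of UniProtKB variants, 8% of ClinVar SNPs) are the same overlap measured against two different denominators. A minimal sketch, assuming each variant is reduced to a (chromosome, position) pair on GRCh38 and that "colocate" means an exact position match; the published comparison may also account for alleles and other details not modeled here. The input positions below are made up.

```python
def colocation_stats(uniprot_variants, clinvar_snps):
    """Return (% of UniProtKB variants colocated, % of ClinVar SNPs colocated).

    Each input is an iterable of (chromosome, position) tuples on the same
    assembly; duplicates are collapsed before counting.
    """
    u = set(uniprot_variants)
    c = set(clinvar_snps)
    shared = u & c
    # Same shared set, different denominators.
    return 100 * len(shared) / len(u), 100 * len(shared) / len(c)


uniprot = [("chr1", 100), ("chr1", 200), ("chr2", 50), ("chr3", 10)]
clinvar = [("chr1", 100), ("chr2", 50)] + [("chr4", i) for i in range(6)]
print(colocation_stats(uniprot, clinvar))  # (50.0, 25.0)
```

Under this simplification, a larger ClinVar set yields a smaller ClinVar-side percentage even though the number of shared positions is identical on both sides.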

The RESID Database of protein structure modifications and the NRL-3D Sequence-Structure Database
John S. Garavelli
2001 · Nucleic Acids Research · 24 citations · doi:10.1093/nar/29.1.199

The RESID Database is a comprehensive collection of annotations and structures for protein post-translational modifications including N-terminal, C-terminal and peptide chain cross-link modifications. The RESID Database includes systematic and frequently observed alternate names, Chemical Abstracts Service registry numbers, atomic formulas and weights, enzyme activities, taxonomic range, keywords, literature citations with database cross-references, structural diagrams and molecular models. The NRL-3D Sequence-Structure Database is derived from the three-dimensional structure of proteins deposited with the Research Collaboratory for Structural Bioinformatics Protein Data Bank. The NRL-3D Database includes standardized and frequently observed alternate names, sources, keywords, literature citations, experimental conditions and searchable sequences from model coordinates. These databases are freely accessible through the National Cancer Institute-Frederick Advanced Biomedical Computing Center at these web sites: http://www.ncifcrf.gov/RESID, http://www.ncifcrf.gov/NRL-3D; or at these National Biomedical Research Foundation Protein Information Resource web sites: http://pir.georgetown.edu/pirwww/dbinfo/resid.html, http://pir.georgetown.edu/pirwww/dbinfo/nrl3d.html
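Atomic formulas and weights are related by simple arithmetic. The sketch below computes a monoisotopic mass from a compact formula string. It assumes a Hill-style notation such as "C2H3NO" (not necessarily the format RESID itself uses) and covers only the elements listed in the hypothetical `MONOISOTOPIC` table; average (isotope-weighted) masses would need a different table.

```python
import re

# Monoisotopic masses of the most abundant isotope of each element (Da).
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
    "P": 30.97376163,
    "S": 31.97207100,
}


def formula_mass(formula):
    """Sum element masses for a formula such as 'C2H3NO' (missing count = 1)."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


# A glycine residue (C2H3NO) comes out at about 57.02146 Da.
print(round(formula_mass("C2H3NO"), 5))
```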

The RESID Database of protein structure modifications: 2000 update
John S. Garavelli
2000 · Nucleic Acids Research · 13 citations · doi:10.1093/nar/28.1.209

The RESID Database contains supplemental information on post-translational modifications for the standardized annotations appearing in the PIR-International Protein Sequence Database. The RESID Database includes: systematic and frequently observed alternate names, Chemical Abstracts Service registry numbers, atomic formulas and weights, enzyme activities, indicators for N-terminal, C-terminal or peptide chain cross-link modifications, keywords, literature citations with database cross-references, structural diagrams and molecular models. Since 1995, updates of the RESID Database have appeared as often as weekly, and full releases appear quarterly. The database is freely accessible through the PIR Web site http://pir.georgetown.edu/pirwww/dbinfo/resid.html and by FTP.

The RESID Database of protein structure modifications
John S. Garavelli
1999 · Nucleic Acids Research · 11 citations · doi:10.1093/nar/27.1.198

Because the number of post-translational modifications requiring standardized annotation in the PIR-International Protein Sequence Database was large and steadily increasing, a database of protein structure modifications was constructed in 1993 to assist in producing appropriate feature annotations for covalent binding sites, modified sites and cross-links. In 1995 RESID was publicly released as a PIR-International text database distributed on CD-ROM and accessible through the ATLAS program. In 1998 it was made available on the PIR Web site at http://www-nbrf.georgetown.edu/pir/searchdb++ +.html . The RESID Database includes such information as: systematic and frequently observed alternate names; Chemical Abstracts Service registry numbers; atomic formulas and weights; enzyme activities; indicators for N-terminal, C-terminal or peptide chain cross-link modifications; keywords; and literature citations with database cross-references. The RESID Database can be used to predict atomic masses for peptides, and is being enhanced to provide molecular structures for graphical presentation on the PIR Web site using widely available molecular viewing programs.
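Predicting a peptide mass from modification data amounts to summing residue masses, adding one water for the free termini, and adding each modification's mass difference. A minimal sketch using standard monoisotopic residue masses; the modification deltas are passed in by the caller as plain numbers (for example, about +79.96633 Da for a phosphate group), which is an assumption of this sketch rather than a RESID interface.

```python
# Standard monoisotopic residue masses (Da), i.e. amino acid minus H2O.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # H2O added once for the N- and C-termini
PHOSPHO_DELTA = 79.96633  # HPO3, example modification delta


def peptide_mass(sequence, modification_deltas=()):
    """Monoisotopic mass of an unmodified chain plus any modification deltas."""
    mass = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    return mass + sum(modification_deltas)


print(round(peptide_mass("GA"), 5))                  # Gly-Ala dipeptide
print(round(peptide_mass("S", [PHOSPHO_DELTA]), 5))  # phosphoserine
```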

iProLINK: A Framework for Linking Text Mining with Ontology and Systems Biology
Zhang-Zhi Hu, Kevin Bretonnel Cohen, Lynette Hirschman, Alfonso Valencia +3 more
2008 · 3 citations · doi:10.1109/bibm.2008.73

The ever-increasing scientific literature and the exponential growth of large-scale molecular data have prompted active research in biological text mining to facilitate literature-based curation of molecular databases. Meanwhile, systems biology and bio-ontologies are emerging as critical tools in biological research where complex data in disparate resources are generated, integrated and analyzed. Both rely on literature for data annotation and analysis. The challenges facing us are to develop broadly utilized text mining tools and systems, and to bring together developer and user communities for system development and evaluation. We describe a framework for linking text mining tools with ontology and systems biology, extending from a previously developed text mining resource, iProLINK. We focus on molecular and ontological resources, including genes/proteins, protein-protein interaction (PPI), and Protein Ontology. The framework consists of two major components: a user interface for text mining of PPI from an integrated tool server and software modules to allow text mining outputs to be created, ranked, and used by the community. Use cases are presented for assessing the gaps and making recommendations for future development.
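The abstract does not describe how iProLINK extracts protein-protein interactions. As a point of reference only, a common naive baseline is sentence-level co-occurrence of protein names; the sketch below implements that baseline, not iProLINK's method. The name dictionary and input text are hypothetical.

```python
import itertools
import re


def cooccurring_pairs(text, protein_names):
    """Count protein-name pairs that co-occur within a sentence.

    A naive baseline: co-occurrence suggests, but does not establish, an
    interaction. Sentences are split on terminal punctuation.
    """
    counts = {}
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        found = sorted(
            name for name in protein_names
            if re.search(r"\b" + re.escape(name) + r"\b", sentence)
        )
        for pair in itertools.combinations(found, 2):
            counts[pair] = counts.get(pair, 0) + 1
    return counts


text = "ProtA binds ProtB. ProtB and ProtC form a complex. ProtD is mentioned alone."
names = {"ProtA", "ProtB", "ProtC", "ProtD"}
print(cooccurring_pairs(text, names))
```

Outputs from a baseline like this are typically ranked and then reviewed by curators, which is the kind of community use the framework targets.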

Protein Ontology and Community Curation
Cecilia N. Arighi
2009 · Nature Precedings · doi:10.1038/npre.2009.3169.1

The Protein Ontology (PRO) is designed as a formal and well-principled Open Biomedical Ontologies (OBO) Foundry ontology for proteins. The components of PRO extend from the classification of proteins, on the basis of evolutionary relationships at the homeomorphic level, to the representation of the multiple protein forms of a gene, such as those resulting from alternative splicing, cleavage and/or post-translational modifications. As an ontology, PRO differs from a database in that it provides descriptions of the protein types and their relationships. In this way PRO can be integrated with or cross-referenced by other ontologies and/or databases. The representation of specific protein entities in PRO allows precise definition of objects in pathways, complexes, or in disease modeling. This is useful for proteomics studies where isoforms and modified forms must be differentiated and for biological pathway/network representation where the cascade of events often depends on a specific protein modification. The PRO framework is designed to allow the community to curate any protein entities of interest and will provide a stable unique identifier to any protein type. PRO is manually curated starting with content derived from various data sources coupled with scientific literature. Only annotation with experimental evidence is included, and is in the form of relationship to other ontologies (such as Gene Ontology, Sequence Ontology, and PSI-MOD). We have developed a web-based curation editor for PRO community annotation. In the tutorial, we will first give a brief introduction to the ontology and its relevance to the research communities: OBO ontologies, MOD, pathway and other databases, and any resources that need references/links to protein types. We will show the components of the PRO entry report, and how to search the ontology.
Then, we will walk through an example where we will teach the basic curation steps: accessing the web editor, entering the protein to be defined with source attribution, and adding functional annotation. We will provide the necessary tools and documentation so that the user will be able to start curating the protein types of interest. PRO URL: http://pir.georgetown.edu/pro/
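Representing protein forms as ontology terms linked by is_a relationships means that a specific modified form inherits its place under its isoform, gene-level, and family-level terms. A minimal sketch of that inheritance as an ancestor query over a small graph; all term names are hypothetical placeholders rather than real PRO identifiers, and a term may have several parents, so the traversal is breadth-first over a DAG.

```python
from collections import deque

# Hypothetical is_a edges: term -> list of parent terms.
PARENTS = {
    "modified:EXAMPLE-2 phosphorylated": ["isoform:EXAMPLE-2"],
    "isoform:EXAMPLE-2": ["gene:EXAMPLE"],
    "gene:EXAMPLE": ["family:EXAMPLE family"],
}


def ancestors(term, parents=PARENTS):
    """Return all ancestors of a term, nearest first, each listed once."""
    result, visited = [], set()
    queue = deque(parents.get(term, []))
    while queue:
        t = queue.popleft()
        if t in visited:
            continue
        visited.add(t)
        result.append(t)
        queue.extend(parents.get(t, []))
    return result


print(ancestors("modified:EXAMPLE-2 phosphorylated"))
```

A query like this is what lets an annotation attached to a general protein type be related to the specific modified form that participates in a pathway step.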

Protein Ontology and Community Curation
Cecilia N. Arighi
2009 · Nature Precedings · doi:10.1038/npre.2009.3169

UniProt Genomic Mapping for Deciphering Functional Effects of Missense Variants
Peter B. McGarvey, Andrew Nightingale, Jie Luo, Hongzhan Huang +2 more
2017 · bioRxiv (Cold Spring Harbor Laboratory) · doi:10.1101/192914

Understanding the association of genetic variation with its functional consequences in proteins is essential for the interpretation of genomic data and identifying causal variants in diseases. Integration of protein function knowledge with genome annotation can assist in rapidly comprehending genetic variation within complex biological processes. Here, we describe mapping UniProtKB human sequences and positional annotations such as active sites, binding sites, and variants to the human genome (GRCh38) and the release of a public genome track hub for genome browsers. To demonstrate the power of combining protein annotations with genome annotations for functional interpretation of variants, we present specific biological examples in disease-related genes and proteins. Computational comparisons of UniProtKB annotations and protein variants with ClinVar clinically annotated SNP data show that 32% of UniProtKB variants co-locate with 8% of ClinVar SNPs. The majority of co-located UniProtKB disease-associated variants (86%) map to ‘pathogenic’ ClinVar SNPs. UniProt and ClinVar are collaborating to provide a unified clinical variant annotation for genomic, protein and clinical researchers. The genome track hubs, and related UniProtKB files, are downloadable from the UniProt FTP site and discoverable as public track hubs at the UCSC and Ensembl genome browsers.