Janelia Research Campus

Facility · Ashburn, United States

Research output, citation impact, and the most-cited recent papers from Janelia Research Campus (United States). Aggregated across the NobleBlocks index of 300M+ scholarly works.

Total works: 8.5K

Citations: 1.9M

h-index: 621

i10-index: 9.2K
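The h-index and i10-index have standard definitions: the h-index is the largest h such that h works each have at least h citations, and the i10-index is the number of works with at least 10 citations. How NobleBlocks computes them is not stated here, so the following is only a minimal sketch of those definitions, assuming a plain list of per-work citation counts. The example counts are made up and are not Janelia data.

```python
def h_index(citations):
    """Largest h such that at least h works have >= h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def i10_index(citations):
    """Number of works with at least 10 citations."""
    return sum(1 for count in citations if count >= 10)


# Hypothetical citation counts, for illustration only.
example_counts = [25, 12, 10, 9, 3, 0]
print(h_index(example_counts), i10_index(example_counts))
```

Sorting once and scanning by rank keeps the h-index computation at O(n log n), which is sufficient for a per-institution list of works.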

Top-cited papers from Janelia Research Campus

Imaging Intracellular Fluorescent Proteins at Nanometer Resolution

Eric Betzig, George H. Patterson, Rachid Sougrat, O. Wolf Lindwasser +4 more

2006 · Science · 8.9K citations · doi:10.1126/science.1127344

We introduce a method for optically imaging intracellular proteins at nanometer spatial resolution. Numerous sparse subsets of photoactivatable fluorescent protein molecules were activated, localized (to approximately 2 to 25 nanometers), and then bleached. The aggregate position information from all subsets was then assembled into a superresolution image. We used this method, termed photoactivated localization microscopy, to image specific target proteins in thin sections of lysosomes and mitochondria; in fixed whole cells, we imaged vinculin at focal adhesions, actin within a lamellipodium, and the distribution of the retroviral protein Gag at the plasma membrane.
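The activate, localize, and assemble loop described in this abstract can be illustrated with a toy sketch. It assumes each activated molecule appears as an isolated spot in a small pixel patch, and it uses an intensity-weighted centroid as a stand-in localizer chosen for brevity; the abstract does not specify the paper's fitting procedure, and the function names here are hypothetical.

```python
def localize_centroid(patch):
    """Intensity-weighted centroid (row, col) of a small 2D patch, in pixel units."""
    total = sum(sum(row) for row in patch)
    if total == 0:
        raise ValueError("patch has no signal")
    r = sum(i * v for i, row in enumerate(patch) for v in row) / total
    c = sum(j * v for row in patch for j, v in enumerate(row)) / total
    return r, c


def assemble_image(localizations, shape, upsample):
    """Bin sub-pixel localizations into a grid `upsample` times finer than the camera grid."""
    rows, cols = shape[0] * upsample, shape[1] * upsample
    image = [[0] * cols for _ in range(rows)]
    for r, c in localizations:
        # Pixel centers sit at integer coordinates, so shift by 0.5 before scaling.
        i = int((r + 0.5) * upsample)
        j = int((c + 0.5) * upsample)
        if 0 <= i < rows and 0 <= j < cols:
            image[i][j] += 1
    return image


# Two hypothetical frames, each containing one sparse emitter.
frame_a = [[0, 1, 0], [1, 4, 1], [0, 1, 0]]
frame_b = [[0, 0, 0], [0, 2, 2], [0, 0, 0]]
points = [localize_centroid(frame_a), localize_centroid(frame_b)]
fine_image = assemble_image(points, shape=(3, 3), upsample=4)
```

The two emitters differ by half a camera pixel, yet they land in separate bins of the finer grid, which is the sense in which aggregating many sparse localizations exceeds the resolution of any single frame.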

Accelerated Profile HMM Searches

Sean R. Eddy

2011 · PLoS Computational Biology · 7.4K citations · doi:10.1371/journal.pcbi.1002195

Profile hidden Markov models (profile HMMs) and probabilistic inference methods have made important contributions to the theory of sequence database homology search. However, practical use of profile HMM methods has been hindered by the computational expense of existing software implementations. Here I describe an acceleration heuristic for profile HMMs, the "multiple segment Viterbi" (MSV) algorithm. The MSV algorithm computes an optimal sum of multiple ungapped local alignment segments using a striped vector-parallel approach previously described for fast Smith/Waterman alignment. MSV scores follow the same statistical distribution as gapped optimal local alignment scores, allowing rapid evaluation of significance of an MSV score and thus facilitating its use as a heuristic filter. I also describe a 20-fold acceleration of the standard profile HMM Forward/Backward algorithms using a method I call "sparse rescaling". These methods are assembled in a pipeline in which high-scoring MSV hits are passed on for reanalysis with the full HMM Forward/Backward algorithm. This accelerated pipeline is implemented in the freely available HMMER3 software package. Performance benchmarks show that the use of the heuristic MSV filter sacrifices negligible sensitivity compared to unaccelerated profile HMM searches. HMMER3 is substantially more sensitive and 100- to 1000-fold faster than HMMER2. HMMER3 is now about as fast as BLAST for protein searches.

Pfam: the protein families database

Robert Finn, Alex Bateman, Jody Clements, Penelope Coggill +4 more

2013 · Nucleic Acids Research · 6.5K citations · doi:10.1093/nar/gkt1223

Pfam, available via servers in the UK (http://pfam.sanger.ac.uk/) and the USA (http://pfam.janelia.org/), is a widely used database of protein families, containing 14 831 manually curated entries in the current release, version 27.0. Since the last update article 2 years ago, we have generated 1182 new families and maintained sequence coverage of the UniProt Knowledgebase (UniProtKB) at nearly 80%, despite a 50% increase in the size of the underlying sequence database. Since our 2012 article describing Pfam, we have also undertaken a comprehensive review of the features that are provided by Pfam over and above the basic family data. For each feature, we determined the relevance, computational burden, usage statistics and the functionality of the feature in a website context. As a consequence of this review, we have removed some features, enhanced others and developed new ones to meet the changing demands of computational biology. Here, we describe the changes to Pfam content. Notably, we now provide family alignments based on four different representative proteome sequence data sets and a new interactive DNA search interface. We also discuss the mapping between Pfam and known 3D structures.

HMMER web server: interactive sequence similarity searching

Robert Finn, Jody Clements, Sean R. Eddy

2011 · Nucleic Acids Research · 6.4K citations · doi:10.1093/nar/gkr367

HMMER is a software suite for protein sequence similarity searches using probabilistic methods. Previously, HMMER has mainly been available only as a computationally intensive UNIX command-line tool, restricting its use. Recent advances in the software, HMMER3, have resulted in a 100-fold speed gain relative to previous versions. It is now feasible to make efficient profile hidden Markov model (profile HMM) searches via the web. A HMMER web server (http://hmmer.janelia.org) has been designed and implemented such that most protein database searches return within a few seconds. Methods are available for searching either a single protein sequence, multiple protein sequence alignment or profile HMM against a target sequence database, and for searching a protein sequence against Pfam. The web server is designed to cater to a range of different user expertise and accepts batch uploading of multiple queries at once. All search methods are also available as RESTful web services, thereby allowing them to be readily integrated as remotely executed tasks in locally scripted workflows. We have focused on minimizing search times and the ability to rapidly display tabular results, regardless of the number of matches found, developing graphical summaries of the search results to provide quick, intuitive appraisement of them.

An improved Greengenes taxonomy with explicit ranks for ecological and evolutionary analyses of bacteria and archaea

Daniel McDonald, Morgan N. Price, Julia K. Goodrich, Eric P. Nawrocki +4 more

2011 · The ISME Journal · 5.3K citations · doi:10.1038/ismej.2011.139

Reference phylogenies are crucial for providing a taxonomic framework for interpretation of marker gene and metagenomic surveys, which continue to reveal novel species at a remarkable rate. Greengenes is a dedicated full-length 16S rRNA gene database that provides users with a curated taxonomy based on de novo tree inference. We developed a 'taxonomy to tree' approach for transferring group names from an existing taxonomy to a tree topology, and used it to apply the Greengenes, National Center for Biotechnology Information (NCBI) and cyanoDB (Cyanobacteria only) taxonomies to a de novo tree comprising 408,315 sequences. We also incorporated explicit rank information provided by the NCBI taxonomy to group names (by prefixing rank designations) for better user orientation and classification consistency. The resulting merged taxonomy improved the classification of 75% of the sequences by one or more ranks relative to the original NCBI taxonomy with the most pronounced improvements occurring in under-classified environmental sequences. We also assessed candidate phyla (divisions) currently defined by NCBI and present recommendations for consolidation of 34 redundantly named groups. All intermediate results from the pipeline, which includes tree inference, jackknifing and transfer of a donor taxonomy to a recipient tree (tax2tree) are available for download. The improved Greengenes taxonomy should provide important infrastructure for a wide range of megasequencing projects studying ecosystems on scales ranging from our own bodies (the Human Microbiome Project) to the entire planet (the Earth Microbiome Project). The implementation of the software can be obtained from http://sourceforge.net/projects/tax2tree/.

Infernal 1.1: 100-fold faster RNA homology searches

Eric P. Nawrocki, Sean R. Eddy

2013 · Bioinformatics · 3.9K citations · doi:10.1093/bioinformatics/btt509

SUMMARY: Infernal builds probabilistic profiles of the sequence and secondary structure of an RNA family called covariance models (CMs) from structurally annotated multiple sequence alignments given as input. Infernal uses CMs to search for new family members in sequence databases and to create potentially large multiple sequence alignments. Version 1.1 of Infernal introduces a new filter pipeline for RNA homology search based on accelerated profile hidden Markov model (HMM) methods and HMM-banded CM alignment methods. This enables ∼100-fold acceleration over the previous version and ∼10 000-fold acceleration over exhaustive non-filtered CM searches. AVAILABILITY: Source code, documentation and the benchmark are downloadable from http://infernal.janelia.org. Infernal is freely licensed under the GNU GPLv3 and should be portable to any POSIX-compliant operating system, including Linux and Mac OS/X. Documentation includes a user's guide with a tutorial, a discussion of file formats and user options and additional details on methods implemented in the software. CONTACT: nawrockie@janelia.hhmi.org

Predicting the Functional Effect of Amino Acid Substitutions and Indels

Yongwook Choi, Gregory E. Sims, Sean V. Murphy, Jason Miller +1 more

2012 · PLoS ONE · 3.0K citations · doi:10.1371/journal.pone.0046688

As next-generation sequencing projects generate massive genome-wide sequence variation data, bioinformatics tools are being developed to provide computational predictions on the functional effects of sequence variations and narrow down the search of causal variants for disease phenotypes. Different classes of sequence variations at the nucleotide level are involved in human diseases, including substitutions, insertions, deletions, frameshifts, and non-sense mutations. Frameshifts and non-sense mutations are likely to cause a negative effect on protein function. Existing prediction tools primarily focus on studying the deleterious effects of single amino acid substitutions through examining amino acid conservation at the position of interest among related sequences, an approach that is not directly applicable to insertions or deletions. Here, we introduce a versatile alignment-based score as a new metric to predict the damaging effects of variations not limited to single amino acid substitutions but also in-frame insertions, deletions, and multiple amino acid substitutions. This alignment-based score measures the change in sequence similarity of a query sequence to a protein sequence homolog before and after the introduction of an amino acid variation to the query sequence. Our results showed that the scoring scheme performs well in separating disease-associated variants (n = 21,662) from common polymorphisms (n = 37,022) for UniProt human protein variations, and also in separating deleterious variants (n = 15,179) from neutral variants (n = 17,891) for UniProt non-human protein variations. In our approach, the area under the receiver operating characteristic curve (AUC) for the human and non-human protein variation datasets is ∼0.85. We also observed that the alignment-based score correlates with the deleteriousness of a sequence variation.
In summary, we have developed a new algorithm, PROVEAN (Protein Variation Effect Analyzer), which provides a generalized approach to predict the functional effects of protein sequence variations including single or multiple amino acid substitutions, and in-frame insertions and deletions. The PROVEAN tool is available online at http://provean.jcvi.org.
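The alignment-based score described in this abstract, the change in similarity to a homolog before and after a variation is introduced, can be sketched as a difference of two alignment scores. This is a toy illustration, not the PROVEAN implementation: the match, mismatch, and gap values and the use of a single homolog with global alignment are assumptions made here for brevity, and the sequences are invented.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty (toy scoring)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def delta_score(query, variant, homolog):
    """Similarity to the homolog after the variation minus similarity before it."""
    return global_alignment_score(variant, homolog) - global_alignment_score(query, homolog)


# Hypothetical sequences: one substitution (Y to W) and one in-frame deletion (Y removed).
query = "MKTAYIAK"
homolog = "MKTAYIAK"
substituted = "MKTAWIAK"
deleted = "MKTAIAK"
print(delta_score(query, substituted, homolog), delta_score(query, deleted, homolog))
```

Because the score is defined on whole-sequence alignments rather than on a single column, the same function handles substitutions and in-frame indels, which is the property the abstract emphasizes. A negative delta indicates reduced similarity to the homolog.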

The Pfam protein families database

Robert Finn, Jaina Mistry, John Tate, Penny Coggill +4 more

2009 · Nucleic Acids Research · 2.7K citations · doi:10.1093/nar/gkp985

Pfam is a widely used database of protein families and domains. This article describes a set of major updates that we have implemented in the latest release (version 24.0). The most important change is that we now use HMMER3, the latest version of the popular profile hidden Markov model package. This software is approximately 100 times faster than HMMER2 and is more sensitive due to the routine use of the forward algorithm. The move to HMMER3 has necessitated numerous changes to Pfam that are described in detail. Pfam release 24.0 contains 11,912 families, of which a large number have been significantly updated during the past two years. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/).

Trainable Weka Segmentation: a machine learning tool for microscopy pixel classification

Ignacio Arganda‐Carreras, Verena Kaynig, Curtis Rueden, Kevin W. Eliceiri +3 more

2017 · Bioinformatics · 2.5K citations · doi:10.1093/bioinformatics/btx180

SUMMARY: State-of-the-art light and electron microscopes are capable of acquiring large image datasets, but quantitatively evaluating the data often involves manually annotating structures of interest. This process is time-consuming and often a major bottleneck in the evaluation pipeline. To overcome this problem, we have introduced the Trainable Weka Segmentation (TWS), a machine learning tool that leverages a limited number of manual annotations in order to train a classifier and segment the remaining data automatically. In addition, TWS can provide unsupervised segmentation learning schemes (clustering) and can be customized to employ user-designed image features or classifiers. AVAILABILITY AND IMPLEMENTATION: TWS is distributed as open-source software as part of the Fiji image processing distribution of ImageJ at http://imagej.net/Trainable_Weka_Segmentation . CONTACT: ignacio.arganda@ehu.eus. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Opportunities and obstacles for deep learning in biology and medicine

Travers Ching, Daniel Himmelstein, Brett K. Beaulieu‐Jones, Alexandr A. Kalinin +4 more

2018 · Journal of The Royal Society Interface · 2.2K citations · doi:10.1098/rsif.2017.0387

Deep learning describes a class of machine learning algorithms that are capable of combining raw inputs into layers of intermediate features. These algorithms have recently shown impressive results across a variety of domains. Biology and medicine are data-rich disciplines, but the data are complex and often ill-understood. Hence, deep learning techniques may be particularly well suited to solve problems of these fields. We examine applications of deep learning to a variety of biomedical problems (patient classification, fundamental biological processes and treatment of patients) and discuss whether deep learning will be able to transform these tasks or if the biomedical sphere poses unique challenges. Following from an extensive literature review, we find that deep learning has yet to revolutionize biomedicine or definitively resolve any of the most pressing challenges in the field, but promising advances have been made on the prior state of the art. Even though improvements over previous baselines have been modest in general, the recent progress indicates that deep learning methods will provide valuable means for speeding up or aiding human investigation. Though progress has been made linking a specific neural network's prediction to input features, understanding how users should interpret these models to make testable hypotheses about the system under study remains an open challenge. Furthermore, the limited amount of labelled data for training presents problems in some domains, as do legal and privacy constraints on work with sensitive health records. Nonetheless, we foresee deep learning enabling changes at both bench and bedside with the potential to transform several areas of biology and medicine.

Lattice light-sheet microscopy: Imaging molecules to embryos at high spatiotemporal resolution

Bi‐Chang Chen, Wesley R. Legant, Kai Wang, Lin Shao +4 more

2014 · Science · 2.0K citations · doi:10.1126/science.1257998

Although fluorescence microscopy provides a crucial window into the physiology of living specimens, many biological processes are too fragile, are too small, or occur too rapidly to see clearly with existing tools. We crafted ultrathin light sheets from two-dimensional optical lattices that allowed us to image three-dimensional (3D) dynamics for hundreds of volumes, often at subsecond intervals, at the diffraction limit and beyond. We applied this to systems spanning four orders of magnitude in space and time, including the diffusion of single transcription factor molecules in stem cell spheroids, the dynamic instability of mitotic microtubules, the immunological synapse, neutrophil motility in a 3D matrix, and embryogenesis in Caenorhabditis elegans and Drosophila melanogaster. The results provide a visceral reminder of the beauty and the complexity of living systems.

Neuromorphic Silicon Neuron Circuits

Giacomo Indiveri, B. Linares-Barranco, Tara Julia Hamilton, André van Schaik +4 more

2011 · Frontiers in Neuroscience · 1.8K citations · doi:10.3389/fnins.2011.00073

Hardware implementations of spiking neurons can be extremely useful for a large variety of applications, ranging from high-speed modeling of large-scale neural systems to real-time behaving systems, to bidirectional brain-machine interfaces. The specific circuit solutions used to implement silicon neurons depend on the application requirements. In this paper we describe the most common building blocks and techniques used to implement these circuits, and present an overview of a wide range of neuromorphic silicon neurons, which implement different computational models, ranging from biophysically realistic and conductance-based Hodgkin-Huxley models to bi-dimensional generalized adaptive integrate and fire models. We compare the different design methodologies used for each silicon neuron design described, and demonstrate their features with experimental results, measured from a wide range of fabricated VLSI chips.

Spontaneous behaviors drive multidimensional, brainwide activity

Carsen Stringer, Marius Pachitariu, Nicholas A. Steinmetz, Charu Bai Reddy +2 more

2019 · Science · 1.7K citations · doi:10.1126/science.aav7893

Neuron activity across the brain. How is it that groups of neurons dispersed through the brain interact to generate complex behaviors? Three papers in this issue present brain-scale studies of neuronal activity and dynamics (see the Perspective by Huk and Hart). Allen et al. found that in thirsty mice, there is widespread neural activity related to stimuli that elicit licking and drinking. Individual neurons encoded task-specific responses, but every brain area contained neurons with different types of response. Optogenetic stimulation of thirst-sensing neurons in one area of the brain reinstated drinking and neuronal activity across the brain that previously signaled thirst. Gründemann et al. investigated the activity of mouse basal amygdala neurons in relation to behavior during different tasks. Two ensembles of neurons showed orthogonal activity during exploratory and nonexploratory behaviors, possibly reflecting different levels of anxiety experienced in these areas. Stringer et al. analyzed spontaneous neuronal firing, finding that neurons in the primary visual cortex encoded both visual information and motor activity related to facial movements. The variability of neuronal responses to visual stimuli in the primary visual area is mainly related to arousal and reflects the encoding of latent behavioral states. Science, this issue p. eaav3932, p. eaav8736, p. eaav7893; see also p. 236

A GAL4-Driver Line Resource for Drosophila Neurobiology

Arnim Jenett, Gerald M. Rubin, Teri-T B Ngo, David Shepherd +4 more

2012 · Cell Reports · 1.6K citations · doi:10.1016/j.celrep.2012.09.011

We established a collection of 7,000 transgenic lines of Drosophila melanogaster. Expression of GAL4 in each line is controlled by a different, defined fragment of genomic DNA that serves as a transcriptional enhancer. We used confocal microscopy of dissected nervous systems to determine the expression patterns driven by each fragment in the adult brain and ventral nerve cord. We present image data on 6,650 lines. Using both manual and machine-assisted annotation, we describe the expression patterns in the most useful lines. We illustrate the utility of these data for identifying novel neuronal cell types, revealing brain asymmetry, and describing the nature and extent of neuronal shape stereotypy. The GAL4 lines allow expression of exogenous genes in distinct, small subsets of the adult nervous system. The set of DNA fragments, each driving a documented expression pattern, will facilitate the generation of additional constructs for manipulating neuronal function.

Hidden Markov model speed heuristic and iterative HMM search procedure

L. Steven Johnson, Sean R. Eddy, Elon Portugaly

2010 · BMC Bioinformatics · 1.5K citations · doi:10.1186/1471-2105-11-431

BACKGROUND: Profile hidden Markov models (profile-HMMs) are sensitive tools for remote protein homology detection, but the main scoring algorithms, Viterbi or Forward, require considerable time to search large sequence databases. RESULTS: We have designed a series of database filtering steps, HMMERHEAD, that are applied prior to the scoring algorithms, as implemented in the HMMER package, in an effort to reduce search time. Using this heuristic, we obtain a 20-fold decrease in Forward and a 6-fold decrease in Viterbi search time with a minimal loss in sensitivity relative to the unfiltered approaches. We then implemented an iterative profile-HMM search method, JackHMMER, which employs the HMMERHEAD heuristic. Due to our search heuristic, we eliminated the subdatabase creation that is common in current iterative profile-HMM approaches. On our benchmark, JackHMMER detects 14% more remote protein homologs than SAM's iterative method T2K. CONCLUSIONS: Our search heuristic, HMMERHEAD, significantly reduces the time needed to score a profile-HMM against large sequence databases. This search heuristic allowed us to implement an iterative profile-HMM search method, JackHMMER, which detects significantly more remote protein homologs than SAM's T2K and NCBI's PSI-BLAST.

Infernal 1.0: inference of RNA alignments

Eric P. Nawrocki, Diana L. Kolbe, Sean R. Eddy

2009 · Bioinformatics · 1.5K citations · doi:10.1093/bioinformatics/btp157

SUMMARY: INFERNAL builds consensus RNA secondary structure profiles called covariance models (CMs), and uses them to search nucleic acid sequence databases for homologous RNAs, or to create new sequence- and structure-based multiple sequence alignments. AVAILABILITY: Source code, documentation and benchmark downloadable from http://infernal.janelia.org. INFERNAL is freely licensed under the GNU GPLv3 and should be portable to any POSIX-compliant operating system, including Linux and Mac OS/X.

Cellpose 2.0: how to train your own model

Marius Pachitariu, Carsen Stringer

2022 · Nature Methods · 1.4K citations · doi:10.1038/s41592-022-01663-4

Pretrained neural network models for biological segmentation can provide good out-of-the-box results for many image types. However, such models do not allow users to adapt the segmentation style to their specific needs and can perform suboptimally for test images that are very different from the training images. Here we introduce Cellpose 2.0, a new package that includes an ensemble of diverse pretrained models as well as a human-in-the-loop pipeline for rapid prototyping of new custom models. We show that models pretrained on the Cellpose dataset can be fine-tuned with only 500-1,000 user-annotated regions of interest (ROI) to perform nearly as well as models trained on entire datasets with up to 200,000 ROI. A human-in-the-loop approach further reduced the required user annotation to 100-200 ROI, while maintaining high-quality segmentations. We provide software tools such as an annotation graphical user interface, a model zoo and a human-in-the-loop pipeline to facilitate the adoption of Cellpose 2.0.

Refinement of Tools for Targeted Gene Expression in Drosophila

Barret D. Pfeiffer, Teri-T B Ngo, Karen L Hibbard, Christine Murphy +3 more

2010 · Genetics · 1.3K citations · doi:10.1534/genetics.110.119917

A wide variety of biological experiments rely on the ability to express an exogenous gene in a transgenic animal at a defined level and in a spatially and temporally controlled pattern. We describe major improvements of the methods available for achieving this objective in Drosophila melanogaster. We have systematically varied core promoters, UTRs, operator sequences, and transcriptional activating domains used to direct gene expression with the GAL4, LexA, and Split GAL4 transcription factors and the GAL80 transcriptional repressor. The use of site-specific integration allowed us to make quantitative comparisons between different constructs inserted at the same genomic location. We also characterized a set of PhiC31 integration sites for their ability to support transgene expression of both drivers and responders in the nervous system. The increased strength and reliability of these optimized reagents overcome many of the previous limitations of these methods and will facilitate genetic manipulations of greater complexity and sophistication.

Optimization of a GCaMP Calcium Indicator for Neural Activity Imaging

Jasper Akerboom, Tsai‐Wen Chen, Trevor J. Wardill, Lin Tian +4 more

2012 · Journal of Neuroscience · 1.3K citations · doi:10.1523/jneurosci.2601-12.2012

Genetically encoded calcium indicators (GECIs) are powerful tools for systems neuroscience. Recent efforts in protein engineering have significantly increased the performance of GECIs. The state-of-the-art single-wavelength GECI, GCaMP3, has been deployed in a number of model organisms and can reliably detect three or more action potentials in short bursts in several systems in vivo. Through protein structure determination, targeted mutagenesis, high-throughput screening, and a battery of in vitro assays, we have increased the dynamic range of GCaMP3 by severalfold, creating a family of "GCaMP5" sensors. We tested GCaMP5s in several systems: cultured neurons and astrocytes, mouse retina, and in vivo in Caenorhabditis chemosensory neurons, Drosophila larval neuromuscular junction and adult antennal lobe, zebrafish retina and tectum, and mouse visual cortex. Signal-to-noise ratio was improved by at least 2- to 3-fold. In the visual cortex, two GCaMP5 variants detected twice as many visual stimulus-responsive cells as GCaMP3. By combining in vivo imaging with electrophysiology we show that GCaMP5 fluorescence provides a more reliable measure of neuronal activity than its predecessor GCaMP3. GCaMP5 allows more sensitive detection of neural activity in vivo and may find widespread applications for cellular imaging in general.

A connectome and analysis of the adult Drosophila central brain

Louis K. Scheffer, C. Shan Xu, Michał Januszewski, Zhiyuan Lu +4 more

2020 · eLife · 1.2K citations · doi:10.7554/elife.57443

The neural circuits responsible for animal behavior remain largely unknown. We summarize new methods and present the circuitry of a large fraction of the brain of the fruit fly Drosophila melanogaster. Improved methods include new procedures to prepare, image, align, segment, find synapses in, and proofread such large data sets. We define cell types, refine computational compartments, and provide an exhaustive atlas of cell examples and types, many of them novel. We provide detailed circuits consisting of neurons and their chemical synapses for most of the central brain. We make the data public and simplify access, reducing the effort needed to answer circuit questions, and provide procedures linking the neurons defined by our analysis with genetic reagents. Biologically, we examine distributions of connection strengths, neural motifs on different scales, electrical consequences of compartmentalization, and evidence that maximizing packing density is an important criterion in the evolution of the fly’s brain.
