NobleBlocks

Génomique Bioinformatique et Applications

Facility · Paris, France

Research output, citation impact, and the most-cited recent papers from Génomique Bioinformatique et Applications (France). Aggregated across the NobleBlocks index of 300M+ scholarly works.

Total works: 836
Citations: 60.2K
h-index: 112
i10-index: 597
Also known as: Génomique Bioinformatique et Applications; Laboratory Genomics, Bioinformatics, and Applications

Top-cited papers from Génomique Bioinformatique et Applications

ABGD, Automatic Barcode Gap Discovery for primary species delimitation
Nicolas Puillandre, Amaury Lambert, Sophie Brouillet, Guillaume Achaz
2011 · Molecular Ecology · 3.3K citations · doi:10.1111/j.1365-294x.2011.05239.x

Within uncharacterized groups, DNA barcodes, short DNA sequences that are present in a wide range of species, can be used to assign organisms into species. We propose an automatic procedure that sorts the sequences into hypothetical species based on the barcode gap, which can be observed whenever the divergence among organisms belonging to the same species is smaller than divergence among organisms from different species. We use a range of prior intraspecific divergence to infer from the data a model-based one-sided confidence limit for intraspecific divergence. The method, called Automatic Barcode Gap Discovery (ABGD), then detects the barcode gap as the first significant gap beyond this limit and uses it to partition the data. Inference of the limit and gap detection are then recursively applied to previously obtained groups to get finer partitions until there is no further partitioning. Using six published data sets of metazoans, we show that ABGD is computationally efficient and performs well for standard prior maximum intraspecific divergences (a few per cent of divergence for the five data sets), except for one data set where fewer than three sequences per species were sampled. We further explore the theoretical limitations of ABGD through simulation of explicit speciation and population genetics scenarios. Our results emphasize in particular the sensitivity of the method to the presence of recent speciation events, via (unrealistically) high rates of speciation or large numbers of species. In conclusion, ABGD is a fast, simple method to split a sequence alignment data set into candidate species that should be complemented with other evidence in an integrative taxonomic approach.

PHYML Online--a web server for fast maximum likelihood-based phylogenetic inference
Stéphane Guindon, Françoise Lethiec, Patrice Duroux, Olivier Gascuel
2005 · Nucleic Acids Research · 1.5K citations · doi:10.1093/nar/gki352

PHYML Online is a web interface to PHYML, a software that implements a fast and accurate heuristic for estimating maximum likelihood phylogenies from DNA and protein sequences. This tool provides the user with a number of options, e.g. nonparametric bootstrap and estimation of various evolutionary parameters, in order to perform comprehensive phylogenetic analyses on large datasets in reasonable computing time. The server and its documentation are available at http://atgc.lirmm.fr/phyml.

A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis
Marie‐Agnès Dillies, Andréa Rau, Julie Aubert, Christelle Hennequet‐Antier +4 more
2012 · Briefings in Bioinformatics · 1.4K citations · doi:10.1093/bib/bbs046

During the last 3 years, a number of approaches for the normalization of RNA sequencing data have emerged in the literature, differing both in the type of bias adjustment and in the statistical strategy adopted. However, as data continue to accumulate, there has been no clear consensus on the appropriate normalization method to be used or the impact of a chosen method on the downstream analysis. In this work, we focus on a comprehensive comparison of seven recently proposed normalization methods for the differential analysis of RNA-seq data, with an emphasis on the use of varied real and simulated datasets involving different species and experimental designs to represent data characteristics commonly observed in practice. Based on this comparison study, we propose practical recommendations on the appropriate normalization method to be used and its impact on the differential analysis of RNA-seq data.

Emergence and clonal expansion of in vitro artemisinin-resistant Plasmodium falciparum kelch13 R561H mutant parasites in Rwanda
Aline Uwimana, Eric Legrand, Barbara H. Stokes, Jean-Louis Mangala Ndikumana +4 more
2020 · Nature Medicine · 881 citations · doi:10.1038/s41591-020-1005-2

Artemisinin resistance (delayed P. falciparum clearance following artemisinin-based combination therapy) is widespread across Southeast Asia but to date has not been reported in Africa [1–4]. Here we genotyped the P. falciparum K13 (Pfkelch13) propeller domain, mutations in which can mediate artemisinin resistance [5,6], in pretreatment samples collected from recent dihydroartemisinin-piperaquine and artemether-lumefantrine efficacy trials in Rwanda [7]. While cure rates were >95% in both treatment arms, the Pfkelch13 R561H mutation was identified in 19 of 257 (7.4%) patients at Masaka. Phylogenetic analysis revealed the expansion of an indigenous R561H lineage. Gene editing confirmed that this mutation can drive artemisinin resistance in vitro. This study provides evidence for the de novo emergence of Pfkelch13-mediated artemisinin resistance in Rwanda, potentially compromising the continued success of antimalarial chemotherapy in Africa.

A General Approach for Haplotype Phasing across the Full Spectrum of Relatedness
Jared O’Connell, Deepti Gurdasani, Olivier Delaneau, Nicola Pirastu +4 more
2014 · PLoS Genetics · 676 citations · doi:10.1371/journal.pgen.1004234

Many existing cohorts contain a range of relatedness between genotyped individuals, either by design or by chance. Haplotype estimation in such cohorts is a central step in many downstream analyses. Using genotypes from six cohorts from isolated populations and two cohorts from non-isolated populations, we have investigated the performance of different phasing methods designed for nominally 'unrelated' individuals. We find that SHAPEIT2 produces much lower switch error rates in all cohorts compared to other methods, including those designed specifically for isolated populations. In particular, when large amounts of IBD sharing are present, SHAPEIT2 infers close to perfect haplotypes. Based on these results we have developed a general strategy for phasing cohorts with any level of implicit or explicit relatedness between individuals. First SHAPEIT2 is run ignoring all explicit family information. We then apply a novel HMM method (duoHMM) to combine the SHAPEIT2 haplotypes with any family information to infer the inheritance pattern of each meiosis at all sites across each chromosome. This allows the correction of switch errors, detection of recombination events and genotyping errors. We show that the method detects numbers of recombination events that align very well with expectations based on genetic maps, and that it infers far fewer spurious recombination events than Merlin. The method can also detect genotyping errors and infer recombination events in otherwise uninformative families, such as trios and duos. The detected recombination events can be used in association scans for recombination phenotypes. The method provides a simple and unified approach to haplotype estimation that will be of interest to researchers in the fields of human, animal and plant genetics.

The Systemic Imprint of Growth and Its Uses in Ecological (Meta)Genomics
Sara Vieira‐Silva, Eduardo P. C. Rocha
2010 · PLoS Genetics · 447 citations · doi:10.1371/journal.pgen.1000808

Microbial minimal generation times range from a few minutes to several weeks. They are evolutionarily determined by variables such as environment stability, nutrient availability, and community diversity. Selection for fast growth adaptively imprints genomes, resulting in gene amplification, adapted chromosomal organization, and biased codon usage. We found that these growth-related traits in 214 species of bacteria and archaea are highly correlated, suggesting they all result from growth optimization. While modeling their association with maximal growth rates in view of synthetic biology applications, we observed that codon usage biases are better correlates of growth rates than any other trait, including rRNA copy number. Systematic deviations from our model reveal two distinct evolutionary processes. First, genome organization shows more evolutionary inertia than growth rates. This results in over-representation of growth-related traits in fast degrading genomes. Second, selection for these traits depends on optimal growth temperature: for similar generation times purifying selection is stronger in psychrophiles, intermediate in mesophiles, and lower in thermophiles. Using this information, we created a predictor of maximal growth rate adapted to small genome fragments. We applied it to three metagenomic environmental samples to show that a transiently rich environment, such as the human gut, selects for fast-growers, that a toxic environment, such as the acid mine biofilm, selects for low growth rates, whereas a diverse environment, like the soil, shows all ranges of growth rates. We also demonstrate that microbial colonizers of babies' guts grow faster than stabilized adult human gut communities. In conclusion, we show that one can predict maximal growth rates from sequence data alone, and we propose that such information can be used to facilitate the manipulation of generation times. Our predictor allows inferring growth rates in the vast majority of uncultivable prokaryotes and paves the way to the understanding of community dynamics from metagenomic data.

An Immunosurveillance Mechanism Controls Cancer Cell Ploidy
Laura Senovilla, Ilio Vitale, Isabelle Martins, Maximilien Tailler +4 more
2012 · Science · 425 citations · doi:10.1126/science.1224922

Cancer cells accommodate multiple genetic and epigenetic alterations that initially activate intrinsic (cell-autonomous) and extrinsic (immune-mediated) oncosuppressive mechanisms. Only once these barriers to oncogenesis have been overcome can malignant growth proceed unrestrained. Tetraploidization can contribute to oncogenesis because hyperploid cells are genomically unstable. We report that hyperploid cancer cells become immunogenic because of a constitutive endoplasmic reticulum stress response resulting in the aberrant cell surface exposure of calreticulin. Hyperploid, calreticulin-exposing cancer cells readily proliferated in immunodeficient mice and conserved their increased DNA content. In contrast, hyperploid cells injected into immunocompetent mice generated tumors only after a delay, and such tumors exhibited reduced DNA content, endoplasmic reticulum stress, and calreticulin exposure. Our results unveil an immunosurveillance system that imposes immunoselection against hyperploidy in carcinogen- and oncogene-induced cancers.

A Path Following Algorithm for the Graph Matching Problem
Mikhail Zaslavskiy, Francis Bach, Jean‐Philippe Vert
2008 · IEEE Transactions on Pattern Analysis and Machine Intelligence · 399 citations · doi:10.1109/tpami.2008.245

We propose a convex-concave programming approach for the labeled weighted graph matching problem. The convex-concave programming formulation is obtained by rewriting the weighted graph matching problem as a least-square problem on the set of permutation matrices and relaxing it to two different optimization problems: a quadratic convex and a quadratic concave optimization problem on the set of doubly stochastic matrices. The concave relaxation has the same global minimum as the initial graph matching problem, but the search for its global minimum is also a hard combinatorial problem. We, therefore, construct an approximation of the concave problem solution by following a solution path of a convex-concave problem obtained by linear interpolation of the convex and concave formulations, starting from the convex relaxation. This method makes it easy to integrate the information on graph label similarities into the optimization problem and, therefore, to perform labeled weighted graph matching. The algorithm is compared with some of the best performing graph matching methods on four data sets: simulated graphs, QAPLib, retina vessel images, and handwritten Chinese characters. In all cases, the results are competitive with the state of the art.

Genomic history of the seventh pandemic of cholera in Africa
François‐Xavier Weill, Daryl Domman, Elisabeth Njamkepo, Cheryl L. Tarr +4 more
2017 · Science · 385 citations · doi:10.1126/science.aad5901

O1 isolates, across 45 African countries and over a 49-year period, to show that past epidemics were attributable to a single expanded lineage. This lineage was introduced at least 11 times since 1970, into two main regions, West Africa and East/Southern Africa, causing epidemics that lasted up to 28 years. The last five introductions into Africa, all from Asia, involved multidrug-resistant sublineages that replaced antibiotic-susceptible sublineages after 2000. This phylogenetic framework describes the periodicity of lineage introduction and the stable routes of cholera spread, which should inform the rational design of control measures for cholera in Africa.

Mathematics of Evolution and Phylogeny
Olivier Gascuel
2005 · 351 citations · doi:10.1093/oso/9780198566106.001.0001

This book considers evolution at different scales: sequences, genes, gene families, organelles, genomes and species. The focus is on the mathematical and computational tools and concepts, which form an essential basis of evolutionary studies, indicate their limitations, and give them orientation. Recent years have witnessed rapid progress in the mathematics of evolution and phylogeny, with models and methods becoming more realistic, powerful, and complex. Aimed at graduates and researchers in phylogenetics, mathematicians, computer scientists and biologists, and including chapters by leading scientists: A. Bergeron, D. Bertrand, D. Bryant, R. Desper, O. Elemento, N. El-Mabrouk, N. Galtier, O. Gascuel, M. Hendy, S. Holmes, K. Huber, A. Meade, J. Mixtacki, B. Moret, E. Mossel, V. Moulton, M. Pagel, M.-A. Poursat, D. Sankoff, M. Steel, J. Stoye, J. Tang, L.-S. Wang, T. Warnow, Z. Yang, this book of contributed chapters explains the basis and covers the recent results in this highly topical area.

Comparative and Evolutionary Analysis of the Bacterial Homologous Recombination Systems
Eduardo P. C. Rocha, E Cornet, Bénédicte Michel
2005 · PLoS Genetics · 341 citations · doi:10.1371/journal.pgen.0010015

Homologous recombination is a housekeeping process involved in the maintenance of chromosome integrity and generation of genetic variability. Although detailed biochemical studies have described the mechanism of action of its components in model organisms, there is no recent extensive assessment of this knowledge, using comparative genomics and taking advantage of available experimental data on recombination. Using comparative genomics, we assessed the diversity of recombination processes among bacteria, and simulations suggest that we missed very few homologs. The work included the identification of orthologs and the analysis of their evolutionary history and genomic context. Some genes, for proteins such as RecA, the resolvases, and RecR, were found to be nearly ubiquitous, suggesting that the large majority of bacterial genomes are capable of homologous recombination. Yet many genomes show incomplete sets of presynaptic systems, with RecFOR being more frequent than RecBCD/AddAB. There is a significant pattern of co-occurrence between these systems and antirecombinant proteins such as the ones of mismatch repair and SbcB, but no significant association with nonhomologous end joining, which seems rare in bacteria. Surprisingly, a large number of genomes in which homologous recombination has been reported lack many of the enzymes involved in the presynaptic systems. The lack of obvious correlation between the presence of characterized presynaptic genes and experimental data on the frequency of recombination suggests the existence of still-unknown presynaptic mechanisms in bacteria. It also indicates that, at the moment, the assessment of the intrinsic stability or recombination isolation of bacteria in most cases cannot be inferred from the identification of known recombination proteins in the genomes.

Identification of novel peptide hormones in the human proteome by hidden Markov model screening
Olivier Mirabeau, Emerald Perlas, Cinzia Severini, Enrica Audero +4 more
2007 · Genome Research · 298 citations · doi:10.1101/gr.5755407

Peptide hormones are small, processed, and secreted peptides that signal via membrane receptors and play critical roles in normal and pathological physiology. The search for novel peptide hormones has been hampered by their small size, low or restricted expression, and lack of sequence similarity. To overcome these difficulties, we developed a bioinformatics search tool based on the hidden Markov model formalism that uses several peptide hormone sequence features to estimate the likelihood that a protein contains a processed and secreted peptide of this class. Application of this tool to an alignment of mammalian proteomes ranked 90% of known peptide hormones among the top 300 proteins. An analysis of the top scoring hypothetical and poorly annotated human proteins identified two novel candidate peptide hormones. Biochemical analysis of the two candidates, which we called spexin and augurin, showed that both were localized to secretory granules in a transfected pancreatic cell line and were recovered from the cell supernatant. Spexin was expressed in the submucosal layer of the mouse esophagus and stomach, and a predicted peptide from the spexin precursor induced muscle contraction in a rat stomach explant assay. Augurin was specifically expressed in mouse endocrine tissues, including pituitary and adrenal gland, choroid plexus, and the atrio-ventricular node of the heart. Our findings demonstrate the utility of a bioinformatics approach to identify novel biologically active peptides. Peptide hormones and their receptors are important diagnostic and therapeutic targets, and our results suggest that spexin and augurin are novel peptide hormones likely to be involved in physiological homeostasis.

Bayesian probabilistic approach for predicting backbone structures in terms of protein blocks
Alexandre G. de Brevern, C. Etchebest, S. Hazout
2000 · Proteins Structure Function and Bioinformatics · 293 citations · doi:10.1002/1097-0134(20001115)41:3<271::aid-prot10>3.0.co;2-z

By using an unsupervised cluster analyzer, we have identified a local structural alphabet composed of 16 folding patterns of five consecutive Cα ("protein blocks"). The dependence that exists between successive blocks is explicitly taken into account. A Bayesian approach based on the relation protein block-amino acid propensity is used for prediction and leads to a success rate close to 35%. Sharing sequence windows associated with certain blocks into "sequence families" improves the prediction accuracy by 6%. This prediction accuracy exceeds 75% when keeping the first four predicted protein blocks at each site of the protein. In addition, two different strategies are proposed: the first one defines the number of protein blocks in each site needed for respecting a user-fixed prediction accuracy, and alternatively, the second one defines the different protein sites to be predicted with a user-fixed number of blocks and a chosen accuracy. This last strategy applied to the ubiquitin conjugating enzyme (alpha/beta protein) shows that 91% of the sites may be predicted with a prediction accuracy larger than 77% considering only three blocks per site. The prediction strategies proposed improve our knowledge about sequence-structure dependence and should be very useful in ab initio protein modelling.

Clustered Multi-Task Learning: A Convex Formulation
Laurent Jacob, Francis Bach, Jean‐Philippe Vert
2008 · arXiv (Cornell University) · 285 citations · doi:10.48550/arxiv.0809.2085

In multi-task learning several related tasks are considered simultaneously, with the hope that by an appropriate sharing of information across tasks, each task may benefit from the others. In the context of learning linear functions for supervised classification or regression, this can be achieved by including a priori information about the weight vectors associated with the tasks, and how they are expected to be related to each other. In this paper, we assume that tasks are clustered into groups, which are unknown beforehand, and that tasks within a group have similar weight vectors. We design a new spectral norm that encodes this a priori assumption, without the prior knowledge of the partition of tasks into groups, resulting in a new convex optimization formulation for multi-task learning. We show in simulations on synthetic examples and on the IEDB MHC-I binding dataset, that our approach outperforms well-known convex methods for multi-task learning, as well as related non convex methods dedicated to the same problem.

Cell Cycle Regulation of the Murine Cyclin E Gene Depends on an E2F Binding Site in the Promoter
Jürgen W. Botz, Karin Zerfaß-Thome, Dimitry Spitkovsky, Hajo Delius +4 more
1996 · Molecular and Cellular Biology · 247 citations · doi:10.1128/mcb.16.7.3401

Cyclin E controls progression through the G1 phase of the cell cycle in mammalian fibroblasts and potentially in many other cell types. Cyclin E is a rate-limiting activator of cdk2 kinase in late G1. The abundance of cyclin E is controlled by phase-specific fluctuations in the mRNA level; in mammalian fibroblasts, mRNA is not detected under conditions of serum starvation and is accumulated upon serum stimulation, with expression starting in mid-G1. Here, we report the cloning of the murine cyclin E promoter. We isolated a 3.8-kb genomic fragment that contains several transcriptional start sites and confers cell cycle regulation on a luciferase reporter gene. This fragment also supports transcriptional activation by adenovirus E1A, a known upstream regulator of cyclin E gene expression. An E2F binding site which is required for G1-specific activation of the cyclin E promoter in synchronized NIH 3T3 cells was identified in this fragment.

Distinct evolution of SARS-CoV-2 Omicron XBB and BA.2.86/JN.1 lineages combining increased fitness and antibody evasion
Delphine Planas, Isabelle Staropoli, Vincent Michel, Frédéric Lemoine +4 more
2024 · Nature Communications · 238 citations · doi:10.1038/s41467-024-46490-7

The unceasing circulation of SARS-CoV-2 leads to the continuous emergence of novel viral sublineages. Here, we isolate and characterize XBB.1, XBB.1.5, XBB.1.9.1, XBB.1.16.1, EG.5.1.1, EG.5.1.3, XBF, BA.2.86.1 and JN.1 variants, representing >80% of circulating variants in January 2024. The XBB subvariants carry few but recurrent mutations in the spike, whereas BA.2.86.1 and JN.1 harbor >30 additional changes. These variants replicate in IGROV-1 but no longer in Vero E6 and are not markedly fusogenic. They potently infect nasal epithelial cells, with EG.5.1.3 exhibiting the highest fitness. Antivirals remain active. Neutralizing antibody (NAb) responses from vaccinees and BA.1/BA.2-infected individuals are markedly lower compared to BA.1, without major differences between variants. An XBB breakthrough infection enhances NAb responses against both XBB and BA.2.86 variants. JN.1 displays lower affinity to ACE2 and higher immune evasion properties compared to BA.2.86.1. Thus, while distinct, the evolutionary trajectory of these variants combines increased fitness and antibody evasion.

Genomewide Association Study of an AIDS‐Nonprogression Cohort Emphasizes the Role Played by HLA Genes (ANRS Genomewide Association Study 02)
Sophie Limou, Sigrid Le Clerc, Cédric Coulonges, Wassila Carpentier +4 more
2008 · The Journal of Infectious Diseases · 236 citations · doi:10.1086/596067

To elucidate the genetic factors predisposing to AIDS progression, we analyzed a unique cohort of 275 human immunodeficiency virus (HIV) type 1-seropositive nonprogressor patients in relation to a control group of 1352 seronegative individuals in a genomewide association study (GWAS). The strongest association was obtained for HCP5 rs2395029 (P = 6.79 × 10⁻¹⁰; odds ratio, 3.47) and was possibly linked to an effect of sex. Interestingly, this single-nucleotide polymorphism (SNP) was in high linkage disequilibrium with HLA-B, MICB, TNF, and several other HLA locus SNPs and haplotypes. A meta-analysis of our genomic data combined with data from the previously conducted Euro-CHAVI (Center for HIV/AIDS Vaccine Immunology) GWAS confirmed the HCP5 signal (P = 3.02 × 10⁻¹⁹) and identified several new associations, all of them involving HLA genes: MICB, TNF, RDBP, BAT1-5, PSORS1C1, and HLA-C. Finally, stratification by HCP5 rs2395029 genotypes emphasized an independent role for ZNRD1, also in the HLA locus, and this finding was confirmed by experimental data. The present study, the first GWAS of HIV-1 nonprogressors, underscores the potential for some HLA genes to control disease progression soon after infection.

A Controlled Trial Comparing Ciprofloxacin With Mesalazine for the Treatment of Active Crohn's Disease
Jean‐Frédéric Colombel, Marc Lémann, M Cassagnou, Yoram Bouhnik +4 more
1999 · The American Journal of Gastroenterology · 235 citations · doi:10.1111/j.1572-0241.1999.935_q.x

OBJECTIVE: The aim of this randomized controlled study was to investigate the efficacy of ciprofloxacin compared with mesalazine in treating active Crohn's disease. METHODS: Patients with a mild to moderate flare-up of Crohn's disease (mean Crohn's Disease Activity Index [CDAI], 217; range, 160-305) were randomized to receive ciprofloxacin 1 g/day or Pentasa 4 g/day for 6 wk. Complete remission was defined at wk 6 as a CDAI ≤ 150 associated with a decrease (Δ) in CDAI > 75. Partial remission was defined as a CDAI ≤ 150 with 50 < ΔCDAI < 75 or a CDAI > 150 with ΔCDAI > 50 at wk 6. Group sequential procedure with triangular continuation regions was used to monitor the trial through the difference in complete remission rates, every 20 patients included. RESULTS: Inclusion of patients was stopped at the second step, i.e., after 40 inclusions, with the conclusion of no difference in complete remission rates between ciprofloxacin- and Pentasa-treated groups. Among the 18 patients taking ciprofloxacin, two decided to stop treatment during the trial and three were considered as treatment failures because of deterioration at wk 3. Among the 22 patients taking mesalazine, one patient was lost to follow-up and eight patients were considered as treatment failures. Complete remission was observed in 10 patients (56%) treated with ciprofloxacin and 12 patients (55%) treated with mesalazine and partial remission was observed in three and one patient, respectively. CONCLUSIONS: This study suggests that ciprofloxacin 1 g/day is as effective as mesalazine 4 g/day in treating mild to moderate flare-up of Crohn's disease.

Sequencing of the smallest Apicomplexan genome from the human pathogen Babesia microti
Emmanuel Cornillot, Kamel Hadj‐Kaddour, Amina Dassouli, Benjamin Noël +4 more
2012 · Nucleic Acids Research · 216 citations · doi:10.1093/nar/gks700

We have sequenced the genome of the emerging human pathogen Babesia microti and compared it with that of other protozoa. B. microti has the smallest nuclear genome among all Apicomplexan parasites sequenced to date with three chromosomes encoding ∼3500 polypeptides, several of which are species specific. Genome-wide phylogenetic analyses indicate that B. microti is significantly distant from all species of Babesidae and Theileridae and defines a new clade in the phylum Apicomplexa. Furthermore, unlike all other Apicomplexa, its mitochondrial genome is circular. Genome-scale reconstruction of functional networks revealed that B. microti has the minimal metabolic requirement for intraerythrocytic protozoan parasitism. B. microti multigene families differ from those of other protozoa in both the copy number and organization. Two lateral transfer events with significant metabolic implications occurred during the evolution of this parasite. The genomic sequencing of B. microti identified several targets suitable for the development of diagnostic assays and novel therapies for human babesiosis.

A coarse‐grained protein force field for folding and structure prediction
Julien Maupetit, Pierre Tufféry, Philippe Derreumaux
2007 · Proteins Structure Function and Bioinformatics · 216 citations · doi:10.1002/prot.21505

We have revisited the protein coarse-grained optimized potential for efficient structure prediction (OPEP). The training and validation sets consist of 13 and 16 protein targets. Because optimization depends on details of how the ensemble of decoys is sampled, trial conformations are generated by molecular dynamics, threading, greedy, and Monte Carlo simulations, or taken from publicly available databases. The OPEP parameters are varied by a genetic algorithm using a scoring function which requires that the native structure has the lowest energy, and the native-like structures have energy higher than the native structure but lower than the remote conformations. Overall, we find that OPEP correctly identifies 24 native or native-like states for 29 targets and has very similar capability to the all-atom discrete optimized protein energy model (DOPE), found recently to outperform five currently used energy models.