

Brunet MA, Brunelle M, Lucier JF, Delcourt V, Levesque M, Grenier F, Samandi S, Leblanc S, Aguilar JD, Dufour P, Jacques JF, Fournier I, Ouangraoua A, Scott MS, Boisvert FM, Roucou X. OpenProt: a more comprehensive guide to explore eukaryotic coding potential and proteomes. Nucleic Acids Res. 2019 Jan 8;47(D1):D403-D410. doi: 10.1093/nar/gky936. PubMed PMID: 30299502; PubMed Central PMCID: PMC6323990.

The Project

With rare exceptions, three assumptions have prevented the annotation of predominantly small proteins in eukaryotes.

  • The minimum size for a functional open reading frame (ORF), also termed protein-coding sequence (CDS), is defined as 100 codons. A gene or an RNA transcript whose ORFs are all shorter than 100 codons is annotated as non-coding.
  • Within a protein-coding gene, only the longest ORF is annotated as the CDS. Thus, by definition, protein-coding genes carry a single canonical or consensus CDS.
  • Pseudogenes do not code for any proteins regardless of the lengths of their ORFs.
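The three conventions above can be sketched as a single decision rule. This is a minimal illustration, not code from any annotation pipeline: the function name `conventional_cds` and its inputs (a list of ORF lengths in codons and a pseudogene flag) are hypothetical.

```python
from typing import List, Optional

# Conventional cutoff described in the text: ORFs under 100 codons are ignored.
MIN_CONVENTIONAL_CODONS = 100


def conventional_cds(orf_lengths: List[int], is_pseudogene: bool) -> Optional[int]:
    """Return the codon length of the one ORF annotated as CDS, or None.

    Hypothetical helper that applies the three conventions in order:
    pseudogenes are never coding, only the longest ORF is considered,
    and that ORF must reach the 100-codon cutoff.
    """
    if is_pseudogene or not orf_lengths:
        return None
    longest = max(orf_lengths)
    return longest if longest >= MIN_CONVENTIONAL_CODONS else None


if __name__ == "__main__":
    # A gene with three ORFs: only the 450-codon ORF would be annotated.
    print(conventional_cds([450, 120, 45], is_pseudogene=False))
    # A transcript whose ORFs are all short would be called non-coding.
    print(conventional_cds([80, 45], is_pseudogene=False))
```

Under this rule, the 120- and 45-codon ORFs in the first example are never annotated, which is the blind spot the rest of this page describes.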

As a result of these conventions, thousands of unannotated proteins have eluded detection: they are absent from protein sequence databases, and no antibodies have been developed against them. They have remained invisible. Recently, advances in ribosome profiling and proteogenomic approaches with customized protein sequence databases have helped identify novel proteins and peptides. However, a freely accessible, searchable web platform that opens this unannotated proteome to the wider community has been missing. OpenProt addresses this issue. Importantly, OpenProt focuses on currently unannotated ORFs and proteins with a minimum cutoff size of 30 codons and amino acids, respectively, but with no restrictions regarding the maximum length.


All ORFs with a minimum length of 30 codons detected in the transcriptome (RefSeq, Ensembl) of several organisms are annotated. These previously unannotated ORFs are provisionally termed alternative ORFs or altORFs to discriminate them from currently annotated CDSs. AltORFs may be localized in untranslated regions of mRNAs (5’ and 3’UTRs), may overlap annotated CDSs in a frameshifted reading frame, or may be present in RNAs annotated as non-coding (e.g. long non-coding RNAs and pseudogene RNAs). AltORF translation products are termed alternative proteins or altProts.
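The annotation rule above can be illustrated with a short sketch. It is not OpenProt's actual pipeline, and it rests on several assumptions: ORFs start at an ATG and end at the first in-frame stop codon, the 30-codon cutoff counts the start codon but not the stop codon, and only the three forward frames of the transcript are scanned. The names `Orf`, `find_orfs` and `classify` are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
MIN_CODONS = 30  # minimum ORF length stated in the text


class Orf(NamedTuple):
    start: int   # 0-based index of the A of the ATG
    end: int     # 0-based index just past the stop codon
    codons: int  # sense codons, stop excluded (an assumption)


def find_orfs(transcript: str, min_codons: int = MIN_CODONS) -> List[Orf]:
    """Scan the three forward frames for ATG...stop ORFs of sufficient length."""
    seq = transcript.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG after the previous stop
            elif start is not None and codon in STOP_CODONS:
                n = (i - start) // 3
                if n >= min_codons:
                    orfs.append(Orf(start, i + 3, n))
                start = None
    return orfs


def classify(orf: Orf, cds: Optional[Tuple[int, int]]) -> str:
    """Locate an ORF relative to the annotated CDS (start, end-exclusive).

    Pass cds=None for a transcript annotated as non-coding.
    """
    if cds is None:
        return "non-coding RNA"
    cds_start, cds_end = cds
    if (orf.start, orf.end) == (cds_start, cds_end):
        return "annotated CDS"
    if orf.end <= cds_start:
        return "5'UTR"
    if orf.start >= cds_end:
        return "3'UTR"
    if (orf.start - cds_start) % 3 != 0:
        return "overlapping CDS (shifted frame)"
    return "in-frame with CDS"


if __name__ == "__main__":
    # Toy transcript: a 30-codon ORF upstream of a 121-codon annotated CDS.
    toy = "ATG" + "GCT" * 29 + "TAA" + "C" + "ATG" + "GCT" * 120 + "TGA"
    for orf in find_orfs(toy):
        print(orf, classify(orf, cds=(94, 460)))
```

Taking the first ATG after each stop codon keeps one (the longest) ORF per stop; a pipeline that also reports nested downstream starts or non-ATG starts would need a different scan.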


OpenProt has been developed in a gene- and transcript-centric manner with an integrated genome browser (for H. sapiens only in the first version). Search functionalities allow queries with reference to altORF annotations, evidence of expression based on the re-processing of large-scale MS-based proteomics and ribosome profiling data, and prediction of protein domains.

Source        Citation
Ensembl       Zerbino et al., 2018
NCBI RefSeq   O’Leary et al., 2016
UniProtKB     Bateman et al., 2017

Funding and people

The OpenProt project was initiated in 2015 with funding from the Canada Research Chairs program and Compute Canada/Québec resource allocations to X. Roucou. It involves the joint efforts of the Roucou lab (Dr Mylène Brunelle, Dr Marie Brunet, Dr Sondos Samandi, Dr Vivian Delcourt and Jean-David Aguilar) and the bioinformatics service of the Université de Sherbrooke (Jean-François Lucier, Maxime Lévesque). Dr Michelle Scott, Dr François-Michel Boisvert and Dr Darel Hunting (Université de Sherbrooke) and Dr Christian Landry (Université Laval) are acknowledged for their contributions.

Contact Us

The best place to ask questions is our Discussion Forum.