With rare exceptions, three assumptions have prevented the annotation of many proteins in eukaryotes, most of them small.
As a result of these conventions, thousands of unannotated proteins have eluded detection: they are absent from protein sequence databases, and no antibodies have been raised against them. They have remained invisible. Recent advances in ribosome profiling, and in proteogenomic approaches that use customized protein sequence databases, have helped identify novel proteins and peptides. However, a freely accessible, searchable web platform that opens this unannotated proteome to the whole community has been lacking. OpenProt addresses this gap. OpenProt focuses on currently unannotated ORFs and proteins, with minimum sizes of 30 codons and 30 amino acids, respectively, and no upper limit on length.
OpenProt annotates all ORFs of at least 30 codons found in the transcriptomes (RefSeq, Ensembl) of several organisms. To distinguish these previously unannotated ORFs from currently annotated CDSs, they are provisionally termed alternative ORFs, or altORFs. AltORFs may lie in the untranslated regions of mRNAs (5′ and 3′ UTRs), may overlap an annotated CDS in a shifted reading frame, or may lie in RNAs annotated as non-coding (e.g. long non-coding RNAs and pseudogene RNAs). The translation products of altORFs are termed alternative proteins, or altProts.
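To make the altORF categories concrete, here is a minimal Python sketch that scans a transcript for ATG-initiated ORFs of at least 30 codons and labels each one relative to an annotated CDS. It is an illustration under stated assumptions, not OpenProt's actual pipeline: the function names (`find_orfs`, `classify`, `annotate_transcript`) are hypothetical, only ATG start codons and the standard stop codons are considered, the first ATG in each frame opens an ORF (downstream in-frame ATGs are not reported separately), and the 30-codon cutoff is assumed to count sense codons (start included, stop excluded) so that every product is at least 30 amino acids long.

```python
from typing import List, Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
# Assumption: the cutoff counts sense codons (ATG included, stop excluded).
MIN_CODONS = 30


def find_orfs(seq: str, min_codons: int = MIN_CODONS) -> List[Tuple[int, int]]:
    """Return 0-based (start, end) pairs of ORFs in the three forward frames.

    `end` is exclusive and includes the stop codon. ORFs that run off the
    end of the transcript without a stop codon are ignored.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG in this frame opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ATG after this stop
    return sorted(orfs)


def classify(orf: Tuple[int, int], cds: Optional[Tuple[int, int]]) -> str:
    """Label an ORF relative to the transcript's annotated CDS, if any."""
    start, end = orf
    if cds is None:
        return "altORF (non-coding RNA)"
    cds_start, cds_end = cds
    if (start, end) == (cds_start, cds_end):
        return "annotated CDS"
    if end <= cds_start:
        return "altORF (5' UTR)"
    if start >= cds_end:
        return "altORF (3' UTR)"
    if (start - cds_start) % 3 != 0:
        return "altORF (frameshifted, overlaps CDS)"
    # Overlapping and in the CDS frame: a CDS variant, not an altORF here.
    return "in-frame CDS variant"


def annotate_transcript(
    seq: str, cds: Optional[Tuple[int, int]] = None, min_codons: int = MIN_CODONS
) -> List[Tuple[int, int, str]]:
    """Find ORFs in `seq` and attach a category to each one."""
    return [(s, e, classify((s, e), cds)) for s, e in find_orfs(seq, min_codons)]
```

For example, `annotate_transcript("ATG" + "GCC" * 29 + "TAA")` treats the sequence as a non-coding RNA (no CDS given) and reports a single 30-codon ORF at positions 0 to 93 labelled as an altORF. Scanning only the forward strand is deliberate: transcript sequences are already oriented 5′ to 3′.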
OpenProt is gene- and transcript-centric and includes an integrated genome browser (H. sapiens only in the first version). Its search functions support queries on altORF annotations, on evidence of expression obtained by re-processing large-scale MS-based proteomics and ribosome profiling data, and on predicted protein domains.
| Resource | URL | Reference |
|---|---|---|
| Ensembl | https://useast.ensembl.org/index.html | Zerbino et al., 2018 |
| NCBI RefSeq | https://www.ncbi.nlm.nih.gov/refseq/ | O’Leary et al., 2016 |
| UniProtKB | https://www.uniprot.org/ | Bateman et al., 2017 |
The OpenProt project was initiated in 2015 with funding from the Canada Research Chairs program and Compute Canada/Québec resource allocations to X. Roucou. It brings together the Roucou lab (Dr Mylène Brunelle, Dr Marie Brunet, Dr Sondos Samandi, Dr Vivian Delcourt and Jean-David Aguilar) and the Bioinformatics service of the Université de Sherbrooke (Jean-François Lucier, Maxime Lévesque). We thank Dr Michelle Scott, Dr François-Michel Boisvert and Dr Darel Hunting (Université de Sherbrooke) and Dr Christian Landry (Université Laval) for their contributions.
The best place to ask questions is our Discussion Forum.