OpenCustomDB Documentation

OpenCustomDB in brief

What is OpenCustomDB?

OpenCustomDB (OpenCustomDatabase) is a proteogenomic database builder based on RNA-seq data and OpenProt annotations. It enables the discovery of non-canonical proteins and protein variants in MS/MS experiments.

Why use OpenCustomDB?

Non-canonical proteins and protein variants are not included in conventional protein databases, so they cannot be detected in a standard database search. OpenCustomDB adds these overlooked proteins to the database. In addition to canonical proteins, this allows you to identify more proteins, confirm variant expression, and discover proteins never considered before.

What type of protein can I expect to identify?

- Proteins currently annotated in databases such as Ensembl and RefSeq.
- Alternative proteins (altProts; accession prefix IP_): proteins predicted by OpenProt and located in non-coding RNAs (e.g. long non-coding RNAs, pseudogene RNAs), in UTRs, or in alternative reading frames overlapping a CDS in mRNAs.
- Novel isoforms (accession prefix II_): proteins predicted by OpenProt that either (1) show close homology with a reference protein from the same gene, or (2) share the same start and/or stop codon as the reference protein. Proteins whose alignment score is above the threshold are considered novel isoforms of the reference protein.
- Protein variants of the three categories above, derived from the sample's RNA-seq information.

Options and input formats

This section describes every available option and how to use the website.

How to use the OpenCustomDB website

What do I need to submit an analysis to OpenCustomDB?

To create a database via the OpenCustomDB website, you will need the following:
- An email address: the address is used to send you a unique link to the analysis results.
- A study name: the study name is used to identify your analysis report.
- The species: select the appropriate species for your analysis.
- The genome assembly: select the genome assembly that was used for variant calling (for Human, choose carefully between hg19, b37 and hg38).
- The desired genome build: select the genome build you would like to use to annotate your variant calling file (VCF).
- A variant calling file (VCF): upload your VCF. Please note that VCFs are not kept on our servers after your result link expires (10 days after your analysis completes). The VCF can also be replaced by the tab-delimited output of a variant annotator (e.g. snpEff, ANNOVAR), formatted as described here.
- A kallisto quant output or another transcript expression file, formatted as described here.
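Before uploading, it can help to check how many variant records your VCF holds (run time depends on VCF size) and which transcripts are expressed in your kallisto output. The sketch below is not part of OpenCustomDB. It assumes the expression file follows kallisto's standard `abundance.tsv` layout (columns `target_id`, `length`, `eff_length`, `est_counts`, `tpm`). The function names and the `min_tpm` cutoff are hypothetical illustrations, not OpenCustomDB parameters.

```python
import csv
import gzip
from typing import Dict, Iterable, TextIO


def open_vcf(path: str) -> TextIO:
    """Open a plain or gzip/bgzip-compressed VCF as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def count_vcf_records(lines: Iterable[str]) -> int:
    """Count variant records, skipping '##' meta lines, the '#CHROM' header and blank lines."""
    return sum(1 for line in lines if line.strip() and not line.startswith("#"))


def load_kallisto_tpm(lines: Iterable[str], min_tpm: float = 0.0) -> Dict[str, float]:
    """Read a kallisto abundance.tsv and keep transcripts whose TPM is above min_tpm.

    min_tpm is a hypothetical cutoff for illustration only.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    expressed: Dict[str, float] = {}
    for row in reader:
        tpm = float(row["tpm"])
        if tpm > min_tpm:
            expressed[row["target_id"]] = tpm
    return expressed
```

Both functions accept any iterable of lines, so you can pass the handle returned by `open_vcf(...)` or by `open(..., newline="")` on your `abundance.tsv`.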

When will I get my results?

This depends on the size of the VCF you submit: expect about 5 minutes for a VCF with 500 variants and about 1 hour for a VCF with 5 million variants.

What is in the output of the analysis?

The zipped folder contains the following:
- A personalised protein database in fasta format.
- A summary.txt file describing the composition of the newly created database.
- The OpenVar output, if requested.
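summary.txt is the authoritative description of the database contents. If you want a quick independent tally from the FASTA itself, a minimal sketch follows. It assumes, as an unverified illustration, that each FASTA header starts with the protein accession (delimited by `|` or whitespace), and it groups entries by the IP_ (altProt) and II_ (novel isoform) prefixes named above. Everything else falls into an "other" bucket (for example, reference proteins). The real header layout may differ.

```python
from collections import Counter
from typing import Iterable

# Accession prefixes named in the documentation; header layout is an assumption.
PREFIXES = {"IP_": "altProt", "II_": "novel isoform"}


def classify_accession(accession: str) -> str:
    """Map an accession to a category using its prefix; unmatched accessions are 'other'."""
    for prefix, label in PREFIXES.items():
        if accession.startswith(prefix):
            return label
    return "other"


def summarize_fasta(lines: Iterable[str]) -> Counter:
    """Count FASTA entries per category, taking the first header token as the accession."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith(">"):
            # Treat '|' like whitespace so ">IP_1|..." and ">IP_1 desc" both yield "IP_1".
            tokens = line[1:].replace("|", " ").split()
            accession = tokens[0] if tokens else ""
            counts[classify_accession(accession)] += 1
    return counts
```

Pass an open handle to the FASTA file from the zipped output, e.g. `summarize_fasta(open(path))`, and compare the counts with summary.txt.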