Project

The first genome-wide maps of the micropeptidome validated through large-scale reprocessing of public proteomics data

Code
3E010920
Duration
01 October 2020 → 13 February 2022
Funding
Research Foundation - Flanders (FWO)
Research disciplines
  • Natural sciences
    • Computational transcriptomics and epigenomics
    • Development of bioinformatics software, tools and databases
    • Structural bioinformatics and computational proteomics
    • Genetics
    • Transcription and translation
Keywords
non-coding RNA, micropeptides, bioinformatics
 
Project description

Complete annotation of the genome is imperative for understanding development, health, and disease. Nevertheless, the annotation of protein-coding genes is far from complete. Micropeptides in particular, small proteins of fewer than 100 amino acids, have historically been underrepresented in gene annotation databases. In this project, I will develop a machine-learning-based algorithm to discover novel micropeptides in long non-coding RNA and circular RNA annotations. I will then apply this algorithm to large human RNA sequencing transcriptomes and to the reference annotations of mouse, Arabidopsis and yeast to generate an in silico predicted micropeptidome. Subsequently, I will validate the existence of large numbers of these micropeptides using massive volumes of public tandem mass spectrometry data. To perform these analyses, I will rely on Ionbot, a state-of-the-art sequence database search algorithm developed in-house that is capable of performing open modification and open mutation searches. In parallel, I will create proteome-wide in silico spectral libraries and use these for spectral library searching on the same data. Finally, I will report all findings in a custom public micropeptide database.
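To make the size criterion above concrete, the following is a minimal sketch of how candidate micropeptide open reading frames could be enumerated in a transcript before any scoring takes place. It is not the project's machine learning algorithm: it is a naive forward-strand scan for ATG-initiated ORFs using the standard genetic code, and the function names `translate` and `find_small_orfs`, the `min_aa` lower bound, and the example sequence are all illustrative assumptions.

```python
from itertools import product

# Standard genetic code, built in TCAG order (an assumption: no alternative
# codon tables or non-ATG start codons are considered in this sketch).
BASES = "TCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
STOP_CODONS = {c for c, a in CODON_TABLE.items() if a == "*"}


def translate(cds):
    """Translate a coding sequence (length divisible by 3) into amino acids.

    Codons containing unknown bases are rendered as 'X'.
    """
    return "".join(
        CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3)
    )


def find_small_orfs(seq, max_aa=99, min_aa=10):
    """Return (start, end, peptide) for ATG..stop ORFs on the forward strand.

    max_aa=99 reflects the "fewer than 100 amino acids" definition in the
    text; min_aa is a hypothetical lower bound chosen for illustration.
    Coordinates are 0-based, end-exclusive, and include the stop codon.
    """
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                peptide = translate(seq[start:i])
                if min_aa <= len(peptide) <= max_aa:
                    orfs.append((start, i + 3, peptide))
                start = None  # resume scanning after the stop codon
    return orfs


if __name__ == "__main__":
    # Toy transcript: 2 nt of 5' context, ATG, three alanine codons, stop.
    toy = "CC" + "ATG" + "GCT" * 3 + "TAA" + "GG"
    print(find_small_orfs(toy, min_aa=3))
```

In a real pipeline such candidates would only be the input to a classifier, and circular RNAs would additionally require scanning across the back-splice junction, which this sketch does not attempt.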