Project

The first genome-wide maps of the micropeptidome validated through large-scale reprocessing of public proteomics data

Code

3E010920

Duration

01 October 2020 → 13 February 2022

Funding

Research Foundation - Flanders (FWO)

Promotor

Lennart Martens

Research disciplines

Natural sciences
- Computational transcriptomics and epigenomics
- Development of bioinformatics software, tools and databases
- Structural bioinformatics and computational proteomics
- Genetics
- Transcription and translation

Keywords

non-coding RNA, micropeptides, bioinformatics

Project description

Complete annotation of the genome is essential for understanding development, health and disease. Nevertheless, the annotation of protein-coding genes is far from complete. Micropeptides in particular, small proteins of fewer than 100 amino acids, have historically been underrepresented in gene annotation databases. In this project, I will develop a machine learning-based algorithm to discover novel micropeptides in long non-coding RNA and circular RNA annotations. I will then apply this algorithm to large human RNA sequencing transcriptomes and to the reference annotations of mouse, Arabidopsis and yeast to generate an in silico predicted micropeptidome. Subsequently, I will validate the existence of large numbers of these micropeptides using massive volumes of public tandem mass spectrometry data. For these analyses, I will rely on Ionbot, a state-of-the-art sequence database search algorithm developed in-house that can perform open modification and open mutation searches. In parallel, I will create proteome-wide in silico spectral libraries and use them for spectral library searching on the same data. Finally, I will report all findings in a dedicated public micropeptide database.
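The description does not detail the machine learning algorithm itself. As a hedged illustration of only the candidate-generation step that micropeptide discovery commonly starts from, the sketch below enumerates small open reading frames (sORFs) in a transcript: an in-frame ATG up to the first stop codon, encoding fewer than 100 amino acids. This is not the project's algorithm. The function name `find_small_orfs`, the `min_aa` cutoff of 10 and the forward-strand-only scan are assumptions chosen for illustration; a real pipeline would score such candidates (for example with a trained classifier) rather than report them all.

```python
# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate a DNA sequence codon by codon; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3)
    )


def find_small_orfs(transcript: str, min_aa: int = 10, max_aa: int = 99):
    """Hypothetical helper: list (start, end, peptide) for candidate sORFs.

    Scans the three forward frames only (assumes a stranded transcript),
    starts an ORF at the most upstream ATG, closes it at the first in-frame
    stop codon and keeps peptides of min_aa..max_aa residues (stop excluded).
    ORFs without a stop codon are not reported.
    """
    seq = transcript.upper().replace("U", "T")  # accept RNA or DNA input
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None:
                if codon == "ATG":
                    start = pos  # open a new ORF at this start codon
            elif CODON_TABLE.get(codon) == "*":
                peptide = translate(seq[start:pos])
                if min_aa <= len(peptide) <= max_aa:
                    orfs.append((start, pos + 3, peptide))  # end includes stop
                start = None  # look for the next ATG downstream
    return sorted(orfs)


if __name__ == "__main__":
    print(find_small_orfs("GGAUGAAACCCUAAGG", min_aa=1))
```

Keeping only the most upstream ATG per stop codon avoids reporting nested ORFs that share the same stop; whether alternative or near-cognate start codons should also be considered is a separate design decision this sketch does not address.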