Project

Search schemes for sequence alignment

Code

3S005121

Duration

01 November 2021 → 31 March 2025

Funding

Research Foundation - Flanders (FWO)

Promotor

Jan Fostier

Research disciplines

Natural sciences
- Parallel programming
- Development of bioinformatics software, tools and databases
Engineering and technology
- Bio-informatics

Keywords

Sequence alignment, Approximate string matching, Graph alignment

Project description

Search schemes combined with a bidirectional full-text index provide a new algorithmic framework for lossless approximate matching, in which all approximate matches of a pattern P in a larger search text T, up to a given number of errors, are found. Nearly all bioinformatics sequence alignment tools use lossy approximate matching, which may miss some valid matches, because lossless approximate matching was historically slower. Search schemes promise to narrow this performance gap and could even make lossless matching faster than lossy approaches. Our research group has already built a software prototype that confirms this gain in performance. We propose to develop algorithms for faster lossless approximate pattern matching based on search schemes and bidirectional full-text indices by taking into account a) the repeat structure of the search text and b) the specific properties of the search pattern. We propose to apply these algorithms to 1) sequence alignment to a linear genome, 2) sequence-to-graph alignment, and 3) alignment of long, error-prone reads.
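How a search scheme drives a bidirectional index can be sketched in Python. This is a minimal illustration under simplifying assumptions, not the project's prototype. It counts only substitutions (Hamming distance) rather than full edit distance. It assumes a DNA alphabet. It emulates the bidirectional index by checking whether the extended string still occurs in T (`t in text`), which is correct but slow. Each search is written as a triple (order of parts, lower bounds, upper bounds) on the cumulative number of mismatches after each part is matched. The two-part scheme for at most one mismatch and all names (`ONE_ERROR_SCHEME`, `split_parts`, `run_search`, `approximate_matches`) are hypothetical choices for this example.

```python
from typing import List, Set, Tuple

ALPHABET = "ACGT"  # assumption: DNA alphabet, substitutions only

# A search is (order of parts, lower bounds, upper bounds); the bounds constrain
# the cumulative number of mismatches once each part in the order is matched.
Search = Tuple[List[int], List[int], List[int]]

# Hypothetical two-part scheme for at most one mismatch.
ONE_ERROR_SCHEME: List[Search] = [
    ([0, 1], [0, 0], [0, 1]),  # part 0 exact, then at most 1 mismatch in part 1
    ([1, 0], [0, 1], [0, 1]),  # part 1 exact, then exactly 1 mismatch in part 0
]


def split_parts(m: int, p: int) -> List[Tuple[int, int]]:
    """Split pattern positions [0, m) into p contiguous parts of near-equal size."""
    cuts = [i * m // p for i in range(p + 1)]
    return [(cuts[i], cuts[i + 1]) for i in range(p)]


def run_search(text: str, pattern: str, search: Search,
               parts: List[Tuple[int, int]], hits: Set[int]) -> None:
    """Backtrack over one search, adding match start positions to hits."""
    order, lower, upper = search

    def extend(j: int, s: str, lo: int, hi: int, errors: int) -> None:
        # s is the text substring currently aligned to pattern[lo:hi].
        start, end = parts[order[j]]
        if lo <= start and hi >= end:  # part order[j] is fully matched
            if not lower[j] <= errors <= upper[j]:
                return
            if j + 1 == len(order):  # whole pattern matched: report occurrences
                hits.update(i for i in range(len(text) - len(s) + 1)
                            if text.startswith(s, i))
            else:
                extend(j + 1, s, lo, hi, errors)
            return
        right = end > hi  # does the current part lie right of pattern[lo:hi]?
        pos = hi if right else lo - 1
        for c in ALPHABET:
            e = errors + (c != pattern[pos])
            if e > upper[j]:
                continue  # prune: error budget for this part exceeded
            t = s + c if right else c + s
            # Emulated bidirectional-index step: continue only if t occurs in text.
            if t in text:
                if right:
                    extend(j, t, lo, hi + 1, e)
                else:
                    extend(j, t, lo - 1, hi, e)

    first = parts[order[0]][0]
    extend(0, "", first, first, 0)


def approximate_matches(text: str, pattern: str,
                        scheme: List[Search]) -> List[int]:
    """All start positions where pattern matches text within the scheme's budget."""
    parts = split_parts(len(pattern), len(scheme[0][0]))
    hits: Set[int] = set()
    for search in scheme:
        run_search(text, pattern, search, parts, hits)
    return sorted(hits)
```

For instance, `approximate_matches("ACGTACGTTACGAACGTGGCT", "ACGT", ONE_ERROR_SCHEME)` returns `[0, 4, 9, 13]`: three exact occurrences plus `ACGA` at position 9. The lower bound of 1 in the second search keeps the two searches disjoint, so no match is enumerated twice. Matching one part exactly first prunes most branches early. With a real bidirectional full-text index, each extension step would be a fast index lookup instead of a scan over T.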