Project

Deep-Learning based Joint Audio, Video Processing for Augmented Listening

Code
01SC2722
Duration
01 October 2022 → 30 September 2026
Funding
Regional and community funding: Special Research Fund
Research disciplines
  • Natural sciences
    • Machine learning and decision making
    • Image processing
  • Engineering and technology
    • Audio and speech processing
    • Computer vision
    • Audio and speech computing
Keywords
Joint audio-video processing, Deep learning, Augmented reality
 
Project description

Augmented listening involves extracting the desired audio signal(s) from a distorted capture. Inspired by human speech perception, in which visual and acoustic cues jointly contribute to understanding, we aim to improve this extraction by augmenting the audio with visual information. A side application is the detection of inconsistencies between the two modalities, which may indicate deepfakes or otherwise compromised streams.
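
The description does not specify a method; one common way to realize audio extraction augmented with visual information is to fuse per-frame visual embeddings with the noisy audio and predict a time-frequency mask for the target signal. The PyTorch sketch below is a minimal illustration under that assumption; the class name AVExtractor, the layer sizes, and the concatenation-based fusion are hypothetical choices for illustration, not the project's actual design.

```python
# Minimal sketch of audio-visual fusion for target-signal extraction.
# All architecture choices (layer sizes, mask-based filtering, fusion by
# concatenation) are illustrative assumptions, not the project's design.
import torch
import torch.nn as nn


class AVExtractor(nn.Module):
    """Predicts a time-frequency mask for the target source from a noisy
    magnitude spectrogram and a synchronized stream of visual embeddings."""

    def __init__(self, n_freq=257, video_dim=512, hidden=256):
        super().__init__()
        self.audio_enc = nn.Sequential(nn.Linear(n_freq, hidden), nn.ReLU())
        self.video_enc = nn.Sequential(nn.Linear(video_dim, hidden), nn.ReLU())
        # Joint temporal model over the concatenated audio/video features.
        self.fusion = nn.LSTM(2 * hidden, hidden, batch_first=True)
        self.mask_head = nn.Sequential(nn.Linear(hidden, n_freq), nn.Sigmoid())

    def forward(self, noisy_mag, video_feats):
        # noisy_mag:   (batch, frames, n_freq)   magnitude spectrogram
        # video_feats: (batch, frames, video_dim) per-frame visual embeddings,
        #              assumed already resampled to the audio frame rate
        a = self.audio_enc(noisy_mag)
        v = self.video_enc(video_feats)
        fused, _ = self.fusion(torch.cat([a, v], dim=-1))
        mask = self.mask_head(fused)           # values in [0, 1]
        return mask * noisy_mag                # masked estimate of the target


# Toy forward pass with random tensors standing in for real features.
model = AVExtractor()
noisy = torch.rand(2, 100, 257)
video = torch.rand(2, 100, 512)
estimate = model(noisy, video)
print(estimate.shape)  # torch.Size([2, 100, 257])
```

In principle, the same fused audio-visual representation could also feed a classifier that flags mismatch between the two streams, which corresponds to the deepfake-detection side application mentioned above.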