Project

NeLF: Next Level Flemish Speech Recognition

Code
3S004923
Duration
01 October 2022 → 30 September 2026
Funding
Research Foundation - Flanders (FWO)
Research disciplines
  • Natural sciences
    • Natural language processing
  • Engineering and technology
    • Audio and speech processing
    • Pattern recognition and neural networks
    • Audio and speech computing
Keywords
Data sciences, Artificial intelligence, Big data, Weakly supervised machine learning, Information & communication technology, Automatic speech recognition
Project description

NeLF aims to provide automatic speech recognition technology that does not require building costly corpora of manually transcribed speech. Instead, we want to exploit low-cost unlabeled or weakly labeled speech data in a self-training and unsupervised training setting; a minimal sketch of such a pseudo-labelling loop is given after this description. Such a setup, which relies on technical expertise and smart algorithms rather than on large and costly annotation programs, is expected to be a good solution for the Flemish market, which is diverse (dialects, non-native speakers), relatively small (6 million people), and has use cases spread across the various industries. By leveraging the technological know-how available in Flanders, combined with a joint effort to make speech data available for the development of speech technology, tailor-made speech solutions can be provided at a reasonable cost that small and medium-sized companies can bear. Our research results will also be validated on other languages and should be readily applicable to other (European) countries with similarly diverse language variation and markets (e.g. Switzerland, France, Italy, Poland). The project outcomes include:
  1. open-source tools and publications describing the underlying technology;
  2. a public repository containing the collected data that can be made publicly available (speech, annotations, and pseudo-annotations, with a focus on the more challenging speech data such as spontaneous speech, dialects, and speech from non-natives);
  3. a private repository containing the data that is only available for research by trusted parties;
  4. a web service that allows citizens and companies to donate additional speech to either the public or the private repository;
  5. models for open-source speech recognition toolkits that are made available to (local) industry;
  6. web services built on top of those models and toolkits to provide easy access to a baseline automatic speech transcription setup.
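The sketch below illustrates the general self-training (pseudo-labelling) idea referred to above: a seed recogniser trained on a small labelled set transcribes the unlabelled pool, and only confident hypotheses are kept as extra training data for the next round. It is a minimal, framework-agnostic sketch; the helpers `train_asr` and `transcribe_with_confidence`, the data layout, and the confidence threshold are hypothetical placeholders, not NeLF components.

```python
# Minimal sketch of one self-training (pseudo-labelling) round.
# `train_asr` and `transcribe_with_confidence` are hypothetical placeholders
# standing in for whatever ASR training and decoding stack is used.

from dataclasses import dataclass


@dataclass
class PseudoLabel:
    audio_path: str
    transcript: str
    confidence: float  # the model's own score for its hypothesis


def self_training_round(labeled, unlabeled, train_asr, transcribe_with_confidence,
                        threshold=0.9):
    """One round of self-training.

    `labeled` is a list of (audio_path, transcript) pairs; `unlabeled` is a list
    of audio paths. Returns the trained seed model and the enlarged training set.
    """
    model = train_asr(labeled)  # seed model trained on the small labelled set

    pseudo = []
    for audio_path in unlabeled:
        transcript, confidence = transcribe_with_confidence(model, audio_path)
        if confidence >= threshold:  # keep only confident pseudo-labels
            pseudo.append(PseudoLabel(audio_path, transcript, confidence))

    # Accepted pseudo-labels are added to the training data for the next round.
    return model, labeled + [(p.audio_path, p.transcript) for p in pseudo]
```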