Project

NeLF: Next Level Flemish Speech Recognition

Code
3S004923
Duration
01 October 2022 → 30 September 2026
Funding
Research Foundation - Flanders (FWO)
Research disciplines
  • Natural sciences
    • Natural language processing
  • Engineering and technology
    • Audio and speech processing
    • Pattern recognition and neural networks
    • Audio and speech computing
Keywords
Data sciences, Artificial intelligence, Big data, Weakly supervised machine learning, Information & communication technology, Automatic speech recognition
 
Project description

NeLF aims to provide automatic speech recognition technology that does not require building costly corpora containing large amounts of manually transcribed speech. Instead, we want to exploit low-cost unlabeled or weakly labeled speech data in a self-training and unsupervised training setting. Such a setup, which relies on technical expertise and smart algorithms instead of on large and costly annotation programs, is expected to be a good solution for the Flemish market, a market which is diverse (dialects, non-native speakers), is relatively small (6 million people), and has a multitude of use cases spread across various industries. By leveraging the technological know-how available in Flanders, combined with a joint effort to make speech data available for the development of speech technology, tailor-made speech solutions can be provided at a reasonable cost that small and medium-sized companies can bear. Our research results will also be validated on other languages and should be readily applicable to other (European) countries with similarly diverse language variation and markets (e.g. Switzerland, France, Italy, Poland, ...).
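The self-training setting mentioned above can be illustrated in miniature: a seed model trained on a small labeled set pseudo-labels unlabeled data, and only confident pseudo-labels are added back to the training pool. This is a generic, hedged sketch of that loop using a toy 1-D nearest-centroid classifier; the model, confidence measure, and threshold are illustrative assumptions, not NeLF's actual recognizer.

```python
# Minimal sketch of confidence-filtered self-training (pseudo-labeling).
# The toy nearest-centroid "model" on 1-D features stands in for a real
# speech recognizer; it is an assumption for illustration only.

def train_centroids(labeled):
    """Fit one centroid per class from (feature, label) pairs."""
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(centroids, x):
    """Return (label, confidence); confidence shrinks with distance."""
    label = min(centroids, key=lambda y: abs(x - centroids[y]))
    return label, 1.0 / (1.0 + abs(x - centroids[label]))

def self_train(labeled, unlabeled, threshold=0.5, rounds=3):
    """Iteratively add confident pseudo-labels to the training pool."""
    data = list(labeled)
    for _ in range(rounds):
        centroids = train_centroids(data)
        remaining = []
        for x in unlabeled:
            y, conf = predict(centroids, x)
            if conf >= threshold:
                data.append((x, y))   # accept the pseudo-label
            else:
                remaining.append(x)   # retry in a later round
        unlabeled = remaining
    return train_centroids(data)

# A little labeled data, more unlabeled data from two separated clusters.
labeled = [(0.0, "a"), (1.0, "a"), (10.0, "b")]
unlabeled = [0.5, 0.8, 9.5, 10.5, 11.0]
model = self_train(labeled, unlabeled)
```

In a real low-resource setting the same principle applies at scale: the confidence filter controls how much pseudo-label noise enters training, which matters most for the challenging speech (dialects, non-natives) the project targets.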

The project outcomes include:

  • open source tools and publications describing the underlying technology;
  • a public repository containing the collected data (speech, annotations, and pseudo-annotations, with a focus on the more challenging speech data such as spontaneous speech, dialects, and speech from non-natives) that can be made publicly available;
  • a private repository containing the data which is only available for research by trusted parties;
  • a web service allowing citizens and companies to donate additional speech to either the public or private repository;
  • models for open source speech recognition toolkits, made available to (local) industry; and
  • web services built on top of those models and toolkits to provide easy access to a baseline automatic speech transcription setup.