Robust speech capture and enhancement using ad hoc distributed microphone arrays by integrating and embedding domain-specific signal models within deep-learning frameworks

01 January 2020 → 31 December 2023
Research Foundation - Flanders (FWO)
Research disciplines
  • Engineering and technology
    • Telecommunication and remote sensing
    • Audio and speech processing
    • Pattern recognition and neural networks
    • Audio and speech computing
    • Signal processing not elsewhere classified
ad hoc microphone arrays
Project description

In the Internet-of-Things (IoT) world we are now entering, consumer devices are equipped with multiple microphones, and it is becoming increasingly common for users to speak to their devices instead of typing their queries on a keyboard. For a machine to accurately interpret the meaning of the speech, the first step is high-quality acquisition of the signal. However, signals acquired through device-mounted microphones are often corrupted by factors such as sensor degradation, interfering audio sources in the background, and room reverberation. The goal of this project is to combine the signals captured by the various audio devices scattered about the room to enhance the desired speech signal and suppress the interference. Since the geometry of the microphones in the room is unknown and can change constantly, this is a challenging problem. We propose a combination of classical statistical signal models and state-of-the-art deep neural networks (DNNs) to solve it. Based on features extracted from the audio signals with the classical method, we can derive an initial estimate of the desired speech. This initial estimate is then fed into an appropriate DNN that can provide a significantly enhanced signal. The process can be carried out over multiple iterations, providing a high-quality speech output at the end.
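
The iterative loop described above (classical estimate, DNN refinement, repeat) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the project's actual method: the microphone signals are taken to be time-aligned and equal in gain, the classical step is a simple inverse-variance weighting of the channels, and the hypothetical refine_placeholder function is a 5-tap moving average standing in for the DNN, which the description does not specify. All function names are invented for this sketch.

```python
import numpy as np


def inverse_variance_combine(mics, reference, eps=1e-12):
    """Classical step (assumed): weight each channel by the inverse of its
    residual power relative to the current speech estimate, then sum.

    mics: array of shape (n_mics, n_samples), assumed time-aligned.
    reference: array of shape (n_samples,), the current speech estimate.
    """
    residual_power = np.mean((mics - reference) ** 2, axis=1)
    weights = 1.0 / (residual_power + eps)
    weights /= weights.sum()
    return weights @ mics


def refine_placeholder(estimate, taps=5):
    """Hypothetical stand-in for the DNN: a moving-average smoother.
    A trained network would replace this function."""
    kernel = np.ones(taps) / taps
    return np.convolve(estimate, kernel, mode="same")


def enhance(mics, n_iterations=3):
    """Alternate the classical estimate and the refinement step."""
    estimate = mics.mean(axis=0)  # crude starting point: plain average
    for _ in range(n_iterations):
        estimate = inverse_variance_combine(mics, estimate)
        estimate = refine_placeholder(estimate)
    return estimate


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(1000) / 1000.0
    clean = np.sin(2 * np.pi * 5 * t)  # toy "speech" signal
    noise_levels = np.array([0.2, 0.5, 1.0])  # one noise level per microphone
    mics = clean + noise_levels[:, None] * rng.standard_normal((3, 1000))
    enhanced = enhance(mics)
    print("MSE of plain average:", np.mean((mics.mean(axis=0) - clean) ** 2))
    print("MSE of iterative estimate:", np.mean((enhanced - clean) ** 2))
```

The sketch only shows the control flow: each pass uses the refined estimate to re-weight the channels, so noisier microphones contribute less in later iterations. Handling the unknown and changing geometry, reverberation, and unsynchronized devices is outside what this toy example covers.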