This work addresses the cocktail party problem, a well-known challenge in signal processing. The goal of the project is to isolate and enhance one specific conversation when several conversations take place in the same space in the presence of background noise. This capability is particularly relevant to hearing aids, video calls in noisy environments, and automatic speech recognition systems.
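We implemented a minimum variance distortionless response (MVDR) beamformer to enhance noisy speech signals captured by a wearable microphone array. The MVDR beamformer minimizes the noise power at its output while passing the signal from the target direction without distortion. This lets it focus on the desired speaker while attenuating background noise and sound arriving from other directions.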
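For a single STFT frequency bin, the MVDR weights have the closed form w = R^-1 d / (d^H R^-1 d), where R is the spatial covariance matrix of the noise and d is the steering vector toward the target. The following is a minimal NumPy sketch that assumes a noise covariance estimate and a steering vector are already available. The function names and the diagonal-loading parameter are illustrative choices, not part of the original implementation.

```python
import numpy as np


def estimate_covariance(X: np.ndarray) -> np.ndarray:
    """Sample spatial covariance for one frequency bin.

    X: (M, T) STFT frames of M microphones. The frames are assumed to be
    noise-dominated; how they are selected is left open here.
    """
    return X @ X.conj().T / X.shape[1]


def mvdr_weights(R: np.ndarray, d: np.ndarray, diag_load: float = 1e-6) -> np.ndarray:
    """MVDR weights for one frequency bin.

    R: (M, M) Hermitian noise spatial covariance.
    d: (M,) steering vector toward the target.
    Returns w: (M,) with w^H d = 1 and minimal w^H R w.
    """
    M = R.shape[0]
    # Diagonal loading proportional to the average microphone power keeps
    # the system well conditioned (the default value is a hypothetical choice).
    R_loaded = R + diag_load * (np.trace(R).real / M) * np.eye(M)
    # Solve R x = d instead of forming R^-1 explicitly.
    Rinv_d = np.linalg.solve(R_loaded, d)
    # Normalizing by d^H R^-1 d enforces the distortionless constraint.
    return Rinv_d / (np.conj(d) @ Rinv_d)


def apply_beamformer(w: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Beamformer output y[t] = w^H x[t] for one bin. X: (M, T)."""
    return np.conj(w) @ X
```

In a full pipeline, these functions would be called once per frequency bin of the multichannel STFT, and the enhanced bins would then be converted back to a waveform with an inverse STFT. Solving the linear system with `np.linalg.solve`, rather than inverting R, is the numerically safer choice.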
To develop and test the beamformer, we used audio recordings of natural conversations in noisy environments in which the target speaker's direction is known in advance. These recordings were taken from the SPEAR Challenge dataset.
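Because the target direction is known, a steering vector for an MVDR beamformer can be derived from the array geometry. The sketch below assumes a far-field, free-field plane-wave model. The microphone positions, the azimuth/elevation convention, and the speed of sound are illustrative assumptions. On a real wearable array, the head and the device itself scatter sound, so measured acoustic transfer functions would generally be more accurate than this idealized model.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at about 20 degrees C


def unit_direction(azimuth_rad: float, elevation_rad: float) -> np.ndarray:
    """Unit vector pointing from the array origin toward the source."""
    return np.array([
        np.cos(elevation_rad) * np.cos(azimuth_rad),
        np.cos(elevation_rad) * np.sin(azimuth_rad),
        np.sin(elevation_rad),
    ])


def far_field_steering_vector(mic_positions: np.ndarray, azimuth_rad: float,
                              elevation_rad: float, freq_hz: float,
                              c: float = SPEED_OF_SOUND) -> np.ndarray:
    """Free-field plane-wave steering vector referenced to the array origin.

    mic_positions: (M, 3) microphone coordinates in metres (hypothetical geometry).
    Returns d: (M,) complex, d[m] = exp(-j 2 pi f tau_m), where tau_m is the
    arrival delay at microphone m relative to the origin.
    """
    u = unit_direction(azimuth_rad, elevation_rad)
    # Microphones farther along u are closer to the source, so the wave
    # reaches them earlier, which gives a negative relative delay.
    tau = -(mic_positions @ u) / c
    return np.exp(-2j * np.pi * freq_hz * tau)
```

To cover every STFT bin, the function can be evaluated at each frequency returned by `np.fft.rfftfreq(n_fft, 1 / fs)`. This yields one steering vector per bin to pair with the corresponding noise covariance matrix.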