The goal of an acoustic fencing algorithm is to separate speakers by their physical location in space. In this project, we examine an algorithm that addresses this problem, define suitable performance criteria, and test the algorithm in varied environments, both simulated and real; the real recordings were acquired by us with suitable acoustic equipment. The algorithm performs speech separation using spectral masking inferred from the speakers' direction: it assumes that a dominant speaker exists in each time-frequency (TF) bin and classifies these bins using a deep convolutional neural network.
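To make the masking step concrete, the following is a minimal sketch of mask-based separation under the dominant-speaker assumption. It is not the project's implementation; in particular, `predict_dominant` is a hypothetical placeholder standing in for the trained convolutional network.

```python
import numpy as np
from scipy.signal import stft, istft

def separate_by_mask(mixture, predict_dominant, target_id, fs=16000):
    # Analysis: complex spectrogram of the microphone mixture.
    _, _, X = stft(mixture, fs=fs, nperseg=512)
    # The classifier labels each TF bin with its dominant speaker;
    # `predict_dominant` stands in for the CNN and is assumed to
    # return one integer speaker label per bin.
    labels = predict_dominant(np.abs(X))          # shape: (freq, time)
    mask = (labels == target_id).astype(float)    # binary TF mask
    # Synthesis: keep only the bins attributed to the target speaker.
    _, estimate = istft(mask * X, fs=fs, nperseg=512)
    return estimate
```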
Traditional evaluation criteria do not independently quantify the distortion of the desired signal and the attenuation of the undesired signal, and often collapse both effects into a single numeric value. In this project, we propose a method for evaluating these phenomena separately by applying the separation mask to the original, unmixed signals. This mask is time-dependent and represents the network's gain, so applying it to the desired signal (for example) reveals the network's effect on that signal alone (a code sketch of this idea follows this paragraph). We tested the algorithm and evaluation criteria on simulated signals with varied room sizes, speaker locations, and reverberation times. Following the success of the simulations, we tested the algorithm on real recordings acquired in the lab using a microphone array and a mouth simulator. To evaluate the generalization of the system, the test set was composed of recordings acquired in rooms that were not present in the training set. In conclusion, this research describes an acoustic fencing algorithm together with evaluation criteria that correlate strongly with human perception, and demonstrates successful performance in a real acoustic environment. Furthermore, the system's low resource consumption and fast response time suggest that it is suitable as a practical algorithm in a real system.
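As a rough illustration of the evaluation idea referenced above, the sketch below applies the network's TF mask separately to the clean desired signal and the clean interference, yielding one distortion score and one attenuation score instead of a single combined value. The dB measures and the name `evaluate_mask` are illustrative assumptions, not the report's exact criteria.

```python
import numpy as np
from scipy.signal import stft, istft

def evaluate_mask(mask, desired, interference, fs=16000, nperseg=512):
    # Pass a clean source through the same TF mask the network produced
    # for the mixture; `mask` must match the STFT grid of the signals.
    def apply_mask(signal):
        _, _, S = stft(signal, fs=fs, nperseg=nperseg)
        _, out = istft(mask * S, fs=fs, nperseg=nperseg)
        n = min(len(signal), len(out))  # align lengths after resynthesis
        return signal[:n], out[:n]

    d_ref, d_out = apply_mask(desired)
    i_ref, i_out = apply_mask(interference)
    # Desired-signal distortion: energy of the error the mask introduces
    # into the signal we want to keep (lower is better).
    distortion_db = 10 * np.log10(
        np.sum((d_ref - d_out) ** 2) / np.sum(d_ref ** 2)
    )
    # Undesired-signal attenuation: suppression of the interferer
    # (higher is better).
    attenuation_db = 10 * np.log10(
        np.sum(i_ref ** 2) / np.sum(i_out ** 2)
    )
    return distortion_db, attenuation_db
```

Scoring the two clean sources independently is what lets the two effects be reported separately, in contrast to combined criteria that return a single value for both.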