Over the last decade, deep learning has spread across many areas of signal processing, including image, audio, and text.
With a diverse set of architectures such as neural networks, convolutional networks and, more recently, transformers, deep learning achieves better results than classical methods in many signal processing tasks, and in audio processing in particular.
In the last few years, convolutional architectures have dominated audio processing, especially in classification, emotion detection, and feature extraction. As in computer vision, the learned audio features can be optimized on a broad range of datasets and labels.