ProjectsProject Details

Emotional Speech Synthesis

Project ID: 6909-1-23
Year: 2024
Student/s: Sagi Eyal, Loren Tzveniashvily
Supervisor/s: Yair Moshe

The goal of this work was to perform emotional speech synthesis. First, we experimented with the emotional voice conversion approach, where the system receives two voice signals and transfers the emotion from one recording to another. Later in the project, we focused on the emotional text-to-speech approach, where the system receives the transcription of the sentence we want to synthesize and the desired emotion and generates a recording of the desired sentence with the given emotion. First, we reproduced the results of the EmoSpeech system, which converts text to emotional speech in a fast and high-quality manner. The system was trained on a dataset containing recordings of ten speakers in five emotions, and therefore is limited to speech synthesis with the voice of one of the ten speakers and in one of the supported emotions. We showed that it is possible to extend the original system to support new speakers, given an appropriate dataset. In addition, we showed that for a new speaker, good results can be achieved with relatively low training time, using a significantly smaller database than the one on which the original system was trained. Finally, we used a generative AI speech enhancement tool to improve the audio and showed that it can significantly improve the quality of results.

Poster for Emotional Speech Synthesis
Collaborators:
Logo of Elbit Systems Collaborator
Elbit Systems