The purpose of this work is to develop an algorithm for converting speech to singing using deep learning methods. The system can help memorize various short texts like phone numbers and lists, as well as for entertainment. There are research papers on the subject that are based on classical signal processing methods as well as works that are based on deep learning, but so far (while working on this project) no results have been achieved that preserve speech content so that it is understandable and humane, along with converting it to desired melody.
The project is based on GAN (Generative Adversarial Network) deep learning network, and its goals are to convert speech to singing while preserving the content of speech and human tone and adjusting the pitch to the desired melody.
The process is coarsely divided into two main parts:
1. Research and technical work in the preliminary stage of database preprocessing
2. Designing and training a deep learning network to solve the problem
With the conclusion of the first part of the project, progress has been made towards the conversion of speech into singing, but more work is required to achieve results that sound natural (human voice) and preserve the content of speech. In the second part of the project, a good pitch conversion was achieved. The sound quality and clarity of speech was similar to state-of-the-art results in this field. Nevertheless, the speech content was not preserved in such a way that the words could be understood.