Much of the recent progress in computer vision is attributed to large datasets and the ability to use them to train deep neural networks. In 2016, Google announced YouTube-8M, a public dataset of about 8 million tagged videos. In this project, we used this dataset to train several deep neural networks that tag videos across a variety of categories. In the first stage, we downloaded 5000 videos for 5 different categories and trained two deep networks with slightly different architectures to assign each video to one of the five categories. One network uses the LSTM architecture and the other uses the BiLSTM architecture. In the second stage, we increased the number of categories to 10 by downloading 5 additional categories, and again trained one LSTM network and one BiLSTM network. Finally, we examined whether the training time for 10 categories could be reduced by transfer learning, starting from the network trained on 5 categories and adapting it to 10 categories.
Both architectures performed well. For 5 categories, both networks correctly labeled more than 96 percent of the videos. For 10 categories, both networks correctly labeled about 89 percent of the videos, and the transfer learning network correctly labeled 88 percent.
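The transfer learning step can be illustrated with a minimal sketch. It is not the implementation used in this project. It assumes that the recurrent weights learned on 5 categories are reused as the starting point, and that only the final classification layer is re-created with 10 outputs. The function names (`make_network`, `transfer_to_more_classes`), the dictionary layout, and the layer sizes are hypothetical, and a single weight matrix stands in for the LSTM or BiLSTM parameters.

```python
import random


def init_matrix(rows, cols, rng):
    """Return a rows x cols matrix of small random weights."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def make_network(feature_dim, hidden_dim, num_classes, seed=0):
    """Build a toy parameter set (hypothetical layout, not the project's code).

    "recurrent" stands in for the LSTM/BiLSTM weights; "classifier" is the
    final layer with one row of weights per category.
    """
    rng = random.Random(seed)
    return {
        "recurrent": init_matrix(hidden_dim, feature_dim + hidden_dim, rng),
        "classifier": init_matrix(num_classes, hidden_dim, rng),
    }


def transfer_to_more_classes(source, num_classes, seed=1):
    """Copy the recurrent weights and attach a freshly initialized classifier.

    Assumption: only the output layer depends on the number of categories,
    so everything before it can be carried over unchanged.
    """
    rng = random.Random(seed)
    hidden_dim = len(source["classifier"][0])
    return {
        # Copy row by row so later training does not alter the source network.
        "recurrent": [row[:] for row in source["recurrent"]],
        "classifier": init_matrix(num_classes, hidden_dim, rng),
    }


# Hypothetical sizes, chosen only to keep the example small.
net5 = make_network(feature_dim=8, hidden_dim=4, num_classes=5)
net10 = transfer_to_more_classes(net5, num_classes=10)
```

After this step, `net10` would be trained on the 10-category data. The idea examined here is that starting from weights that already encode useful video features may shorten that training compared with starting from random weights.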