Abstract:
"Seamless access to online entertainment platforms has been immensely improved over the
past decade with the advancement of information technology. Also, people now have access to online entertainment platforms at any time from everywhere, and they are used to
browsing and listening to music content from these platforms. However, searching for music content, based on their music preferences, is a hot topic nowadays. Listeners and content creators are struggling when using these platforms because of the abundance of music content are added day-by-day, most of the time people are stickler to a particular music genre category and sometimes they would like to hear different genres according to the situation they are at. And entertainment platforms should have efficient algorithms to label music content by analysing their features to tag them, to be searched by the platform users to provide a better experience. So, many researchers have come up with different approaches such as, analysing lyrics with natural language processing techniques, audio analysis using signal processing techniques, and even creating hybrid models using both audio and textual content of the music to classify their genres.
Multi-modal learning has also become a popular topic, because it overcomes the difficulty of training different types of data that cannot be handled by a single deep learning model, and because different data types contribute different features that can improve the performance of deep learning models. This study therefore develops a multi-modal, multi-class classifier that processes the audio content of songs with audio processing techniques and the lyrics with a Deep Neural Network that uses contextual embeddings to represent the text, in order to produce more accurate results. The ultimate aim is a system that can classify music by genre more accurately using both textual and audio features.
"