Abstract:
"In today's digital age, music streaming has become a common form of entertainment. However,
current recommendation systems frequently overlook users' real-time emotional states,
resulting in a less personalized experience. This project solves this issue by creating an
advanced music recommendation system that uses Vision Transformers (ViTs) to evaluate
users' emotions through visuals and interfaces with Spotify's API to generate playlists based
on the user's current mood.
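As a minimal sketch of the playlist-generation step, the snippet below uses the spotipy client library to request tracks whose audio features match a detected mood. The emotion-to-audio-feature mapping (MOOD_TARGETS) and the seed genre are illustrative assumptions, not the mapping used in this project; credentials are read from the standard SPOTIPY_* environment variables.

```python
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

# Hypothetical mapping from detected emotion to Spotify audio-feature targets.
MOOD_TARGETS = {
    "happy":   {"target_valence": 0.9, "target_energy": 0.8},
    "sad":     {"target_valence": 0.2, "target_energy": 0.3},
    "angry":   {"target_valence": 0.3, "target_energy": 0.9},
    "neutral": {"target_valence": 0.5, "target_energy": 0.5},
}

def recommend_tracks(emotion: str, limit: int = 20) -> list[str]:
    """Return track names recommended for the detected emotion."""
    sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials())
    params = MOOD_TARGETS.get(emotion, MOOD_TARGETS["neutral"])
    results = sp.recommendations(seed_genres=["pop"], limit=limit, **params)
    return [track["name"] for track in results["tracks"]]
```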
The technical core of this work is a novel application of ViTs, which are more commonly employed in general image-classification tasks, to emotion recognition. The approach treats the facial expression in a user-provided photo as a sequence of image patches and uses the transformer's self-attention mechanism to model the contextual relationships between those patches. Trained on a large dataset of labeled facial expressions, the ViT produces accurate predictions of a user's emotional state.
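A minimal sketch of this inference step, assuming a Hugging Face transformers ViT checkpoint fine-tuned on a facial-expression dataset; the checkpoint name below is a publicly available placeholder, not the model trained in this project.

```python
from PIL import Image
import torch
from transformers import ViTForImageClassification, ViTImageProcessor

# Placeholder checkpoint; the project's own fine-tuned weights would go here.
CHECKPOINT = "trpakov/vit-face-expression"

processor = ViTImageProcessor.from_pretrained(CHECKPOINT)
model = ViTForImageClassification.from_pretrained(CHECKPOINT)

def predict_emotion(image_path: str) -> str:
    """Split the face image into patches and classify the overall expression."""
    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")  # patchify + normalize
    with torch.no_grad():
        logits = model(**inputs).logits
    return model.config.id2label[logits.argmax(-1).item()]
```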
Following deployment, the emotion-detection model underwent extensive testing to assess its efficacy. Evaluated with standard classification metrics, the system achieved an accuracy of 86%. This result demonstrates the model's ability to discern emotions and highlights the promise of Vision Transformers for tailored music recommendation. The confusion matrix, a key tool in this evaluation, provided further insight into the model's precision and recall across the individual emotion classes, confirming its reliability. This testing phase was critical for demonstrating the system's potential to transform music recommendation services by offering a more dynamic, intuitive, and individualized way of matching music to listeners' emotional states.
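As a hedged illustration of how accuracy, the confusion matrix, and per-class precision/recall might be computed, the sketch below uses scikit-learn; the label set and the true/predicted arrays are illustrative assumptions, not the project's test data.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Illustrative labels and predictions; the project's actual test set is not shown here.
LABELS = ["happy", "sad", "angry", "neutral"]
y_true = ["happy", "sad", "angry", "neutral", "happy", "sad"]
y_pred = ["happy", "sad", "angry", "happy",   "happy", "sad"]

print(f"Accuracy: {accuracy_score(y_true, y_pred):.2%}")
print(confusion_matrix(y_true, y_pred, labels=LABELS))   # rows = true, cols = predicted
print(classification_report(y_true, y_pred, labels=LABELS))  # per-class precision/recall/F1
```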