Abstract:
Digital media consumption has surged, and viewers often encounter films only through brief
video segments, audio snippets, or textual descriptions, which makes accurate film
identification challenging.
Current movie identification methods rely on metadata comparison and genre classification,
but they fail to produce satisfactory results under real-world conditions. This research
presents an AI-driven system that combines multimodal analysis to identify movies from
incomplete user-supplied clips and to generate a comprehensive account of the emotional
themes of the film's storyline. The proposed system has two major components: (1) the
Emotion Matcher, which employs a Multimodal Transformer with Emotion-Specific Tuning
(EmotionBERT or MERT) to analyze emotional transitions and find matching movie scenes, and
(2) the Text Matcher, which uses Sentence-BERT (SBERT) with Ontology Integration to identify
scenes through structured narrative semantics. Unlike conventional methods, the system
analyzes only audio and text data rather than processing video directly, which keeps it
efficient and avoids excessive computational requirements. Users
receive a detailed emotional timeline and an analysis of emotional content, which helps
them judge whether a film is appropriate for a specific target audience. The
system is trained extensively on subtitles from a wide range of movies, along with semantic
ontologies and emotion labels for specific clips, so that it generalizes across different
types of content. This research contributes a novel emotion-aware movie identification
system that enhances user experience, content accessibility, and personalized
recommendations by leveraging current AI techniques. The system's ability
to match fragmented inputs with high accuracy and provide a detailed emotional context paves
the way for more intelligent and user-centric content discovery applications in the
entertainment industry.
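
To make the idea of matching emotional transitions concrete, the following is a minimal
sketch and not the Emotion Matcher itself. It assumes that each scene has already been
reduced to a small emotion vector (here hypothetical joy, sadness, and fear scores), takes
the change between consecutive scenes as the "transition", and slides the clip's transitions
over the movie's timeline using mean cosine similarity. The function names, the three-emotion
layout, and the scoring rule are illustrative assumptions; the actual system relies on a
tuned multimodal transformer for this step.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def transitions(timeline):
    """Differences between consecutive emotion vectors in a timeline."""
    return [
        [later - earlier for earlier, later in zip(prev, curr)]
        for prev, curr in zip(timeline, timeline[1:])
    ]


def best_match(clip, movie):
    """Return (scene_offset, score) of the best-aligned window, or None.

    Both arguments are lists of emotion vectors, one per scene. None is
    returned when the clip has no transitions or is longer than the movie.
    """
    clip_t = transitions(clip)
    movie_t = transitions(movie)
    if not clip_t or len(clip_t) > len(movie_t):
        return None
    best_offset, best_score = 0, float("-inf")
    for offset in range(len(movie_t) - len(clip_t) + 1):
        window = movie_t[offset:offset + len(clip_t)]
        score = sum(cosine(c, m) for c, m in zip(clip_t, window)) / len(clip_t)
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score


# Hypothetical (joy, sadness, fear) scores per scene.
movie = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7], [0.6, 0.2, 0.2]]
clip = movie[1:]  # a fragment covering the last three scenes
print(best_match(clip, movie))
```

Comparing transitions rather than raw emotion scores is one possible design choice: it makes
the match depend on how the mood changes from scene to scene, which is the property the
abstract emphasizes, rather than on the absolute intensity of any single scene.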