Digital Repository

A Machine Learning Approach to Predict IMDb Ratings from Movie Scripts

dc.contributor.author Harinda, Janith
dc.date.accessioned 2026-03-11T03:33:24Z
dc.date.available 2026-03-11T03:33:24Z
dc.date.issued 2025
dc.identifier.citation Harinda, Janith (2025) A Machine Learning Approach to Predict IMDb Ratings from Movie Scripts. MSc Dissertation, Informatics Institute of Technology en_US
dc.identifier.issn 20221844
dc.identifier.uri http://dlib.iit.ac.lk/xmlui/handle/123456789/2910
dc.description.abstract Accurately predicting a movie’s success before its release remains a key challenge in the film industry. Traditional models rely heavily on post-release feedback or superficial metadata, often ignoring the narrative richness embedded in the script. This project addresses that gap by developing a machine learning pipeline that predicts IMDb rating classes from raw movie scripts and associated metadata, enabling data-driven decision-making in the pre-production phase. The system extracts structural, emotional, and linguistic features from scripts using a custom feature engineering pipeline. Semantic understanding is further enhanced using three embedding techniques: TF-IDF, BERT, and Sentence Transformers. Structured metadata such as genre, director, cast, and country is integrated with the engineered features. Machine learning models including Random Forest, XGBoost, and Gradient Boosting were trained on these inputs, with label encoding, SMOTE for class balancing, and hyperparameter tuning. A prototype web interface was also developed using Streamlit and FastAPI, allowing users to upload scripts and receive predictions in real time from a deployed model. The best-performing model combined Sentence Transformer embeddings with Random Forest, achieving an accuracy of 85% and a macro F1-score of 0.85. This result, along with the real-time interface, demonstrates that combining deep semantic script analysis with structured metadata can effectively predict IMDb rating classes prior to release, offering practical value to producers and analysts. en_US
dc.language.iso en en_US
dc.subject Movie Scripts en_US
dc.subject Machine Learning en_US
dc.subject Computing Methodologies en_US
dc.title A Machine Learning Approach to Predict IMDb Ratings from Movie Scripts en_US
dc.type Thesis en_US

