Abstract:
"In the ever-expanding world of online video content, YouTube creators face the challenge of
optimizing their videos for discoverability and engagement. This research project aims to empower content creators with a browser extension that leverages AI and machine learning techniques to enhance their video metadata and maximize their videos' potential reach. The project begins with the collection of a diverse dataset from the YouTube Data API v3. Feature extraction techniques such as TF-IDF are then employed to convert the textual data into numerical representations.
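As a minimal sketch of the TF-IDF step, the following shows how a handful of titles could be turned into numerical vectors. The titles are hypothetical and not drawn from the project's dataset, and the smoothed IDF formula is one common variant; the project's exact tokenization and weighting are not specified here.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocabulary, one TF-IDF vector per document)."""
    # Assumption: simple lowercase whitespace tokenization.
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed inverse document frequency (one common variant).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1
        # Term frequency (relative count) weighted by IDF.
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, vectors


# Hypothetical video titles, used only for illustration.
titles = [
    "python tutorial for beginners",
    "advanced python tricks",
    "cooking pasta for beginners",
]
vocab, vectors = tfidf(titles)
```

Terms that occur in fewer titles receive a higher weight than terms shared across many titles, which is what makes the vectors useful for distinguishing one title from another.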
To address this challenge, the project uses a hybrid model that combines BERT embeddings with a Random Forest classifier. The BERT model generates contextualized embeddings that capture the semantic meaning of words in video titles, while the Random Forest classifier uses these embeddings alongside TF-IDF features for keyword prediction. The BERT embeddings are extracted and concatenated with the TF-IDF vectors to create a hybrid feature representation. This combined feature set is then used to train the Random Forest model, capturing both contextual information from BERT and structured features from TF-IDF.
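The hybrid feature construction can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: `embed_titles` is a hypothetical stand-in that produces deterministic pseudo-embeddings so the example runs without downloading a BERT model, the titles and keyword labels are invented, and keyword prediction is treated as single-label classification.

```python
import hashlib

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

EMBED_DIM = 16  # assumption for the sketch; BERT-base embeddings have 768 dimensions


def embed_titles(titles):
    """Hypothetical stand-in for BERT: one pseudo-embedding row per title.

    A real pipeline would return contextual embeddings from a BERT model here.
    """
    rows = []
    for title in titles:
        tokens = title.lower().split()
        vec = np.zeros(EMBED_DIM)
        for tok in tokens:
            # Derive a stable seed from the token so results are reproducible.
            seed = int.from_bytes(hashlib.sha256(tok.encode()).digest()[:4], "little")
            vec += np.random.default_rng(seed).standard_normal(EMBED_DIM)
        rows.append(vec / max(len(tokens), 1))
    return np.vstack(rows)


# Invented titles and keyword labels, used only for illustration.
titles = [
    "python tutorial for beginners",
    "advanced python tricks",
    "cooking pasta for beginners",
    "easy pasta sauce recipe",
]
keywords = ["programming", "programming", "cooking", "cooking"]

# TF-IDF features from the same titles.
vectorizer = TfidfVectorizer()
tfidf_features = vectorizer.fit_transform(titles).toarray()

# Hybrid representation: embedding columns followed by TF-IDF columns.
hybrid = np.hstack([embed_titles(titles), tfidf_features])

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(hybrid, keywords)


def predict_keyword(title):
    """Build the same hybrid features for a new title and predict its keyword."""
    features = np.hstack(
        [embed_titles([title]), vectorizer.transform([title]).toarray()]
    )
    return clf.predict(features)[0]
```

Concatenating along the column axis keeps one row per title, so the classifier sees the embedding dimensions and the TF-IDF dimensions as a single feature vector.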
The project is evaluated with a 2 vs. 2 test. On the created dataset, the implementation achieves a score of 78% on this test."
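For readers unfamiliar with the 2 vs. 2 test, the sketch below assumes its usual formulation: for every pair of items, the test passes if the correctly matched true and predicted vectors are closer in total than the swapped matching, and the score is the fraction of pairs that pass. The use of cosine distance and the function names are assumptions; the abstract does not state which distance the project uses.

```python
import math
from itertools import combinations


def cosine_distance(a, b):
    """Cosine distance between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def two_vs_two_score(true_vecs, pred_vecs):
    """Fraction of item pairs for which the correct matching beats the swapped one."""
    passed = 0
    total = 0
    for i, j in combinations(range(len(true_vecs)), 2):
        matched = (cosine_distance(true_vecs[i], pred_vecs[i])
                   + cosine_distance(true_vecs[j], pred_vecs[j]))
        swapped = (cosine_distance(true_vecs[i], pred_vecs[j])
                   + cosine_distance(true_vecs[j], pred_vecs[i]))
        if matched < swapped:
            passed += 1
        total += 1
    return passed / total


# Toy vectors, used only for illustration.
true_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pred_vecs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.7]]
score = two_vs_two_score(true_vecs, pred_vecs)
```

Under this formulation, chance level is 50%, because a random predictor is equally likely to favor the matched or the swapped pairing.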