Classification & Detecting Linguistics in Reviews

Mohamed Badurdeen, Mohamed Hikam

dc.contributor.author	Mohamed Badurdeen, Mohamed Hikam
dc.date.accessioned	2024-03-01T06:34:51Z
dc.date.available	2024-03-01T06:34:51Z
dc.date.issued	2023
dc.identifier.citation	Mohamed Badurdeen, Mohamed Hikam (2023) Classification & Detecting Linguistics in Reviews. BSc. Dissertation, Informatics Institute of Technology	en_US
dc.identifier.issn	2018419
dc.identifier.uri	http://dlib.iit.ac.lk/xmlui/handle/123456789/1796
dc.description.abstract	"This project develops a fake review detection system built on the BERT (Bidirectional Encoder Representations from Transformers) model and supporting natural language processing techniques. The system preprocesses each review by tokenizing it, removing stop words, and lemmatizing the remaining tokens. It then uses BERT, a pre-trained transformer-based model, for sequence classification to distinguish real reviews from fake ones, combining BERT's tokenizer and sequence classification model with PyTorch and the Hugging Face Transformers library. The data is split into training and validation sets; the model is trained on the training set and evaluated on the validation set using accuracy, precision, recall, and F1 score, and the best model is saved for future use. The project further extends fake review detection to classify different types of fake reviews: incentive, competitor, and malicious. A separate BERT-based classification model is fine-tuned on labeled data for each type, and each model is evaluated separately using evaluation metrics. To enhance the analysis, the project also incorporates sentiment analysis with TextBlob, which calculates the sentiment polarity of each review, and linguistic analysis with NLTK, which extracts linguistic features such as part-of-speech tags. The system provides a function that analyzes an input text and classifies it as real or fake. It outputs the predicted label, the probability of being real, the sentiment polarity, the reason for the classification, and the classification results for the different types of fake reviews. It also offers visualization features that display linguistic features, entity relations, and confusion matrices.
Overall, the project demonstrates the application of BERT-based models and natural language processing techniques for fake review detection and classification, providing insights into the authenticity of online reviews."	en_US
dc.language.iso	en	en_US
dc.publisher	IIT	en_US
dc.subject	Classification	en_US
dc.subject	Machine Learning	en_US
dc.title	Classification & Detecting Linguistics in Reviews	en_US
dc.type	Thesis	en_US
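
The abstract reports accuracy, precision, recall, and F1 score on the validation set but does not show how they are computed. The following is a minimal sketch of those four metrics for a binary real/fake task, written in plain Python so it runs without extra libraries. The function name binary_metrics, the label encoding (1 = fake, 0 = real), and the sample labels are assumptions made for illustration; they are not taken from the dissertation, and the project itself may well use a library routine instead.

```python
from typing import Dict, Sequence


def binary_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one example is required")

    pairs = list(zip(y_true, y_pred))
    # Count the confusion-matrix cells needed for the four metrics.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Fall back to 0.0 when a denominator is zero (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical validation labels: 1 = fake, 0 = real (this encoding is an assumption).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
scores = binary_metrics(y_true, y_pred)
print(scores)
```

The same function could be applied separately to each of the incentive, competitor, and malicious classifiers by passing that model's validation labels and predictions.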