Digital Repository

Classification & Detecting Linguistics in Reviews

dc.contributor.author Mohamed Badurdeen, Mohamed Hikam
dc.date.accessioned 2024-03-01T06:34:51Z
dc.date.available 2024-03-01T06:34:51Z
dc.date.issued 2023
dc.identifier.citation Mohamed Badurdeen, Mohamed Hikam (2023) Classification & Detecting Linguistics in Reviews. BSc. Dissertation, Informatics Institute of Technology en_US
dc.identifier.issn 2018419
dc.identifier.uri http://dlib.iit.ac.lk/xmlui/handle/123456789/1796
dc.description.abstract "The project aims to develop a fake review detection system using the BERT (Bidirectional Encoder Representations from Transformers) model and various natural language processing techniques. The system preprocesses text data by tokenizing, removing stop words, and lemmatizing the reviews. It then uses BERT, a pre-trained transformer-based model, for sequence classification to distinguish between real and fake reviews. The project incorporates BERT's tokenizer and sequence classification model, along with PyTorch and the Hugging Face Transformers library. The data is split into training and validation sets, and evaluation metrics such as accuracy, precision, recall, and F1 score are computed. The model is trained on the training set and evaluated on the validation set. The best model is saved for future use. Furthermore, the project extends the fake review detection to classify different types of fake reviews, including incentive, competitor, and malicious reviews. Separate classification models are fine-tuned using the BERT model and trained on labeled data. The incentive, competitor, and malicious classification models are evaluated separately and their performance is assessed using evaluation metrics. To enhance the analysis, the project also incorporates sentiment analysis using TextBlob and linguistic analysis using NLTK. The sentiment polarity of reviews is calculated, and linguistic features such as part-of-speech tagging are extracted. The system provides a function to analyze and classify input text as real or fake. It outputs the predicted label, probability of being real, sentiment polarity, reason for classification, and classification results for different types of fake reviews. It also offers visualization features to display linguistic features, entity relations, and confusion matrices.
Overall, the project demonstrates the application of BERT-based models and natural language processing techniques for fake review detection and classification, providing insights into the authenticity of online reviews." en_US
dc.language.iso en en_US
dc.publisher IIT en_US
dc.subject Classification en_US
dc.subject Machine Learning en_US
dc.title Classification & Detecting Linguistics in Reviews en_US
dc.type Thesis en_US
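
Note on the abstract: the evaluation metrics it names (accuracy, precision, recall, and F1 score) can be illustrated for a binary real/fake labelling with a short sketch. This is plain Python written for illustration only and is not taken from the dissertation; the function name binary_metrics, the label convention (1 = fake, 0 = real), and the sample labels are all assumptions.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels.

    Assumption for this sketch: `positive` marks the "fake" class.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical validation labels (1 = fake, 0 = real), for illustration only.
sample_true = [1, 1, 0, 0, 1, 0]
sample_pred = [1, 0, 0, 1, 1, 0]
sample_scores = binary_metrics(sample_true, sample_pred)
```

In practice a library routine would normally be used for this; the hand-written version only makes the definitions of the four metrics explicit.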

