| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Amarasinghe, Charith Thiwanka | |
| dc.date.accessioned | 2022-02-28T05:58:26Z | |
| dc.date.available | 2022-02-28T05:58:26Z | |
| dc.date.issued | 2021 | |
| dc.identifier.citation | Amarasinghe, Charith Thiwanka (2021) Intrinsic Plagiarism Detection in Sinhala language documents. MSc. Dissertation, Informatics Institute of Technology | en_US |
| dc.identifier.issn | 2019574 | |
| dc.identifier.uri | http://dlib.iit.ac.lk/xmlui/handle/123456789/789 | |
| dc.description.abstract | This research study addresses intrinsic plagiarism detection in Sinhala language documents. Comparatively few studies have addressed plagiarism detection and authorship verification for the Sinhala language. This research proposes an anomaly detection-based approach that classifies text portions by their anomalous behavior relative to the neighboring context, using features extracted with a word embedding-based approach. In the study, multiple feature extraction methods, anomaly detection algorithms, and a supervised algorithm were used in a series of experiments to identify the combination that performs best for the Sinhala language. The study uses paragraph-level features to distinguish segments with anomalous behavior. The proposed solution was able to classify plagiarized content with an accuracy of 85% and an F1-score of 0.40. | en_US |
| dc.language.iso | en | en_US |
| dc.subject | Plagiarism detection | en_US |
| dc.subject | Anomaly detection | en_US |
| dc.subject | Text analytics | en_US |
| dc.subject | Data mining | en_US |
| dc.title | Intrinsic Plagiarism Detection in Sinhala language documents | en_US |
| dc.type | Thesis | en_US |