Abstract:
"
This research study addresses intrinsic plagiarism detection in Sinhala-language 
documents. Relatively few studies have examined plagiarism detection and 
authorship verification for the Sinhala language. This research proposes an
anomaly-detection-based approach that classifies text portions by how anomalous they 
are relative to their neighboring context, using features extracted with a word 
embedding-based approach. In the study, multiple feature extraction methods and 
anomaly detection algorithms, together with a supervised algorithm, were used in a 
series of experiments to identify the combination that performs best for the Sinhala 
language. The study uses paragraph-level features to distinguish segments with 
anomalous behavior. The proposed solution classified plagiarized content with an 
accuracy of 85% and an F1-score of 0.40."
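
The idea of scoring a paragraph against its neighboring context can be illustrated with a minimal sketch. The abstract does not name the embedding model, the anomaly detector, or any thresholds, so everything below is an assumption made for illustration: paragraph vectors are taken as given (for example, averaged word embeddings), the anomaly score is the cosine distance to the mean of the surrounding paragraphs, and the names `anomaly_scores`, `flag_anomalies`, `window`, and `threshold` are hypothetical. This is not the method evaluated in the study.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity; 1.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def neighbour_mean(vectors, i, window):
    """Average the vectors within `window` positions of i, excluding i itself."""
    lo = max(0, i - window)
    hi = min(len(vectors), i + window + 1)
    neighbours = [vectors[j] for j in range(lo, hi) if j != i]
    dim = len(vectors[i])
    return [sum(v[k] for v in neighbours) / len(neighbours) for k in range(dim)]


def anomaly_scores(vectors, window=2):
    """Score each paragraph vector by its distance from its neighbouring context."""
    if len(vectors) < 2:
        # With no neighbours there is no context to compare against.
        return [0.0] * len(vectors)
    return [
        cosine_distance(v, neighbour_mean(vectors, i, window))
        for i, v in enumerate(vectors)
    ]


def flag_anomalies(vectors, window=2, threshold=0.5):
    """Return indices of paragraphs whose score exceeds the (assumed) threshold."""
    scores = anomaly_scores(vectors, window)
    return [i for i, s in enumerate(scores) if s > threshold]


if __name__ == "__main__":
    # Toy 2-D "paragraph embeddings"; the fourth vector points in a different direction.
    toy_vectors = [[1.0, 0.1], [1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [1.0, 0.05], [0.95, 0.0]]
    print(flag_anomalies(toy_vectors))
```

In practice the fixed threshold would be replaced by a trained anomaly detector or a supervised classifier, as the abstract describes. Because plagiarized paragraphs are typically a small minority, accuracy and F1-score can diverge considerably on such data.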