Semi-supervised Corpus based POS tagger for Sinhala

Perera, Kumal

dc.contributor.advisor
dc.contributor.author	Perera, Kumal
dc.date.accessioned	2019-03-04T12:47:26Z
dc.date.available	2019-03-04T12:47:26Z
dc.date.issued	2018
dc.identifier.citation	Perera, K. (2018) Semi-supervised Corpus based POS tagger for Sinhala. BSc. Dissertation. Informatics Institute of Technology	en_US
dc.identifier.other	2014159
dc.identifier.uri	http://dlib.iit.ac.lk/xmlui/handle/123456789/173
dc.description.abstract	Many morphologically rich languages are at risk of becoming endangered, both because they lack language resources and because most of the countries where they are spoken are still developing. Building up a sufficient body of resources for a language takes time. Around 22 million people in Sri Lanka are reported to use Sinhala most of the time, yet little priority has been given to building technical libraries for it. The local language should be prioritized, since preserving the local culture is a national obligation, and people find it more comfortable to use their own language. Research to date shows that, although a series of works have addressed part-of-speech (POS) tagging for Sinhala, none has achieved enough accuracy to support the language's growth. The purpose of the project, Psephology, is to assign part-of-speech tags to Sinhala words. Such tags have many uses, for example as a component of other systems and for checking grammar in the local language. In the expectation that this work would reduce the risk of the local language becoming endangered, several approaches were proposed to improve the performance and usability of the product and to minimize discrepancies. All components were implemented in Python, using libraries such as NLTK, scikit-learn and NumPy. The implemented system was tested thoroughly under different conditions, and the Lexicon system was assessed by evaluators from various domains. The test results indicated that the analysis, design, implementation and documentation were carried out effectively and efficiently.	en_US
dc.subject	POS tagging	en_US
dc.subject	Natural Language Processing	en_US
dc.subject	Hidden Markov Model	en_US
dc.subject	Text Classification	en_US
dc.title	Semi-supervised Corpus based POS tagger for Sinhala	en_US
dc.type	Thesis	en_US