Abstract:
"
In the field of text data classification, text classification and topic modeling play
the major roles. Of the two techniques, text classification produces
outputs with a higher level of accuracy, so data analysts tend to favor it.
Text classification, also referred to as text categorization or tagging,
is the task of assigning text to its specified class. Text classifiers can
automatically examine a piece of text and assign it to a predefined category
according to its content, with the help of Natural Language
Processing (NLP) [1]. Because this is a supervised learning task, it requires a large
labeled dataset for the classification to be effective. For languages
in which labeled data is scarce, such as Sinhala, training the
model becomes difficult. Thus, the author proposes a solution for
performing text classification using active learning. The solution uses the
available labeled dataset, learns from the resulting supervised model, and produces
classified text data with a high level of accuracy."