Abstract:
Text classification and topic modeling are the two principal techniques for classifying text data. Of the two, text classification produces more accurate output, which is why data analysts tend to favor it. Text classification, also referred to as text categorization or text tagging, is the task of assigning text to a specified class. With the help of Natural Language Processing (NLP), a text classifier can automatically examine a body of text and assign it to a predefined category according to its content [1]. Because this is a supervised learning task, it requires a large labeled dataset to classify effectively. For languages in which labeled data is scarce, such as Sinhala, the available dataset is too small to train the model adequately. The author therefore proposes a solution that performs text classification using active learning. The solution makes use of the labeled data that is available, learns from the resulting supervised model, and produces classified text data with a high level of accuracy.