
Petrichor: Semi Supervised Fine-Tuning of Textual Entailment Based Boolean Question-Answering for Pre-trained Language Models


dc.contributor.author Jesuthasan, Tony
dc.date.accessioned 2023-01-23T08:26:27Z
dc.date.available 2023-01-23T08:26:27Z
dc.date.issued 2022
dc.identifier.citation Jesuthasan, Tony (2022) Petrichor: Semi Supervised Fine-Tuning of Textual Entailment Based Boolean Question-Answering for Pre-trained Language Models. BSc. Dissertation, Informatics Institute of Technology en_US
dc.identifier.issn 2018596
dc.identifier.uri http://dlib.iit.ac.lk/xmlui/handle/123456789/1523
dc.description.abstract Pre-trained Language Models (PLMs) have taken the Natural Language Processing (NLP) domain by storm since their inception in the latter years of the previous decade. Consisting of two training stages, unsupervised pre-training and supervised fine-tuning, these language models require a large amount of annotated data for the latter stage. Procuring such data is expensive and immensely time-consuming, which hampers the use of these powerful models, especially in domains where annotated corpora are scarce. Boolean Question-Answering is a notoriously difficult NLP task, as it relies on the textual entailment between a question and a passage to infer an answer. Obtaining a labelled dataset of this kind is extremely difficult, since it requires long hours of human expertise to comb through each question and its relevant passage before deducing the correct answer. This hindrance further limits the use of PLMs for the task. These aspects call for a solution that addresses the labelled-data requirements of both the fine-tuning process of a language model and the Boolean Question-Answering task. Semi-supervised learning is a promising approach that has demonstrated the ability to reduce annotated-data requirements across a variety of media and tasks. This dissertation presents Petrichor, a hybrid architecture that pairs a PLM with a Generative Adversarial Network (GAN) for semi-supervised fine-tuning of textual entailment-based Boolean Question-Answering, to address this gap. Experimental results indicate that the proposed architecture produces similar performance (F1-scores between 70% and 80%) when fine-tuned on either 100% or only 10% of a dataset's labelled samples. Benchmarking results show that Petrichor outperforms functionally similar models, which suffer severe performance drops when the quantity of annotated data is radically reduced. en_US
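The abstract describes pairing a PLM encoder with a GAN for semi-supervised fine-tuning. The snippet below is a minimal, hypothetical sketch of such a pairing in the style of GAN-BERT, not the dissertation's actual implementation: the bert-base encoder, the noise and hidden dimensions, the yes/no label set, and all class and variable names are illustrative assumptions.

# Hypothetical GAN-BERT-style sketch (not Petrichor's actual code).
# A generator maps noise to fake sentence-level representations; a
# discriminator classifies real (labelled or unlabelled) and fake
# representations into yes / no / fake, enabling semi-supervised fine-tuning.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

HIDDEN = 768          # hidden size of the assumed bert-base encoder
NOISE = 100           # generator noise dimension (illustrative choice)
NUM_CLASSES = 2       # Boolean QA labels: yes / no

class Generator(nn.Module):
    """Maps random noise to a fake sentence-level representation."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NOISE, HIDDEN), nn.LeakyReLU(0.2), nn.Dropout(0.1),
            nn.Linear(HIDDEN, HIDDEN),
        )

    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    """Classifies a representation into yes / no / fake."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(HIDDEN, HIDDEN), nn.LeakyReLU(0.2), nn.Dropout(0.1),
        )
        self.head = nn.Linear(HIDDEN, NUM_CLASSES + 1)  # extra "fake" class

    def forward(self, rep):
        features = self.body(rep)
        return self.head(features), features

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
generator, discriminator = Generator(), Discriminator()

# One illustrative forward pass on a question/passage pair.
batch = tokenizer(
    ["is the sky blue?"],
    ["The sky appears blue due to Rayleigh scattering."],
    padding=True, truncation=True, return_tensors="pt",
)
with torch.no_grad():
    cls_rep = encoder(**batch).last_hidden_state[:, 0]   # [CLS] vector
    fake_rep = generator(torch.randn(1, NOISE))
    real_logits, _ = discriminator(cls_rep)
    fake_logits, _ = discriminator(fake_rep)
print(real_logits.shape, fake_logits.shape)  # torch.Size([1, 3]) each

In training, unlabelled question/passage pairs would contribute only to the real-vs-fake objective while labelled pairs also contribute to the yes/no objective, which is how the annotated-data requirement is reduced in this kind of setup.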
dc.language.iso en en_US
dc.subject Pre-trained Language Models en_US
dc.subject Generative Adversarial Networks en_US
dc.subject Semi-Supervised Learning en_US
dc.title Petrichor: Semi Supervised Fine-Tuning of Textual Entailment Based Boolean Question-Answering for Pre-trained Language Models en_US
dc.type Thesis en_US

