Digital Repository

A Monitoring System of an Effective Event-driven Big Data ETL Pipeline Architecture for a Shared Multi-Tenant News Aggregator on AWS Cloud

dc.contributor.author Jainulabdeen, Azeem
dc.date.accessioned 2025-06-30T05:32:35Z
dc.date.available 2025-06-30T05:32:35Z
dc.date.issued 2024
dc.identifier.citation Jainulabdeen, Azeem (2024) A Monitoring System of an Effective Event-driven Big Data ETL Pipeline Architecture for a Shared Multi-Tenant News Aggregator on AWS Cloud. MSc Dissertation, Informatics Institute of Technology en_US
dc.identifier.issn 20200776
dc.identifier.uri http://dlib.iit.ac.lk/xmlui/handle/123456789/2766
dc.description.abstract "The challenge of efficiently managing and processing large volumes of data has driven demand for scalable, reliable, and flexible high-performance ETL (Extract, Transform, Load) solutions. Ensuring data integrity is critical in any ETL data pipeline, which makes automatic rollback an indispensable feature. Moreover, rather than relying solely on infrastructure auto-scaling, it is crucial to improve performance programmatically by choosing the right architecture at each stage of the ETL process. To address these challenges, this thesis explores the implementation of a Function-as-a-Service (FaaS)-based ETL pipeline that uses AWS Step Functions (standard edition) as its orchestration tool. The research presents an architecture and implementation of a FaaS-based ETL data pipeline that provides automatic rollback error handling, and it experiments with both asynchronous and synchronous sequential programming approaches to determine their impact on performance. Different parameters, including data sizes and RAM capacities (512 MB and 1024 MB), were tested to evaluate the average performance for each combination. Upgrading from 512 MB to 1024 MB of RAM improved average execution time by 0.90% for the sequential code paradigm and by 2.94% for the asynchronous paradigm. Increased RAM therefore gave only a modest performance boost, with the larger gain seen in asynchronous execution." en_US
dc.language.iso en en_US
dc.subject Monitoring System en_US
dc.subject AWS Step Functions en_US
dc.subject Multi-Tenancy en_US
dc.title A Monitoring System of an Effective Event-driven Big Data ETL Pipeline Architecture for a Shared Multi-Tenant News Aggregator on AWS Cloud en_US
dc.type Thesis en_US

