Implementation of Change Data Capture using Apache Hive to improve ETL performance in a Big Data Warehouse

Wickramaratne, Malith

dc.contributor.author	Wickramaratne, Malith
dc.date.accessioned	2024-06-04T09:09:39Z
dc.date.available	2024-06-04T09:09:39Z
dc.date.issued	2023
dc.identifier.citation	Wickramaratne, Malith (2023) Implementation of Change Data Capture using Apache Hive to improve ETL performance in a Big Data Warehouse. MSc. Dissertation, Informatics Institute of Technology	en_US
dc.identifier.issn	2018349
dc.identifier.uri	http://dlib.iit.ac.lk/xmlui/handle/123456789/2187
dc.description.abstract	"A Data Warehouse acts as a centralized repository for millions or billions of historical records. To provide historical intelligence, the data storage platform and the ETL process play vital roles in the performance of a Data Warehouse. Many organizations use Apache Hadoop as the distributed storage platform for large amounts of data, in other words for ‘Big Data’; however, Hadoop has limitations when it comes to transactional processing such as inserts, updates, or deletes. This study aims to improve the performance of these transactions using Apache Hive, and thereby develop logic to capture only the changed data within the ETL process. The experimental results show that this method improves the execution time of Hive queries and hence the performance of the overall ETL process, which could significantly reduce the lead time for delivering historical intelligence to organizations and their stakeholders."	en_US
dc.language.iso	en	en_US
dc.subject	Change Data Capture	en_US
dc.subject	Apache Hadoop	en_US
dc.subject	Apache Hive	en_US
dc.title	Implementation of Change Data Capture using Apache Hive to improve ETL performance in a Big Data Warehouse	en_US
dc.type	Thesis	en_US