Digital Repository

Implementation of Change Data Capture using Apache Hive to improve ETL performance in a Big Data Warehouse


dc.contributor.author Wickramaratne, Malith
dc.date.accessioned 2024-06-04T09:09:39Z
dc.date.available 2024-06-04T09:09:39Z
dc.date.issued 2023
dc.identifier.citation Wickramaratne, Malith (2023) Implementation of Change Data Capture using Apache Hive to improve ETL performance in a Big Data Warehouse. MSc. Dissertation, Informatics Institute of Technology en_US
dc.identifier.issn 2018349
dc.identifier.uri http://dlib.iit.ac.lk/xmlui/handle/123456789/2187
dc.description.abstract "A Data Warehouse acts as a centralized repository for millions or even billions of historical records. To provide historical intelligence, the data storage platform and the ETL process play vital roles in the performance of a Data Warehouse. Many organizations use Apache Hadoop as the distributed storage platform for large amounts of data, in other words for 'Big Data'; however, Hadoop has its own limitations when it comes to transactional processing such as inserts, updates, or deletes. This study aims to improve the performance of these transactions using Apache Hive, and thereby develop logic to capture only the changed data within the ETL process. The experimental results show that this method improves the execution time of Hive queries, and hence the performance of the overall ETL process, which could result in significant lead-time improvements in delivering historical intelligence to organizations and their stakeholders." en_US
dc.language.iso en en_US
dc.subject Change Data Capture en_US
dc.subject Apache Hadoop en_US
dc.subject Apache Hive en_US
dc.title Implementation of Change Data Capture using Apache Hive to improve ETL performance in a Big Data Warehouse en_US
dc.type Thesis en_US
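
The dissertation itself is not included in this record, but the abstract's idea of applying only changed data in Hive can be illustrated with a small sketch. The snippet below is not the author's implementation; it is a minimal, hypothetical example assuming an ACID (transactional, ORC-backed) Hive target table and a staging table holding the captured changes, issued through the PyHive client. All table names, columns, and connection parameters are placeholders.

```python
# Illustrative CDC sketch (not the dissertation's code): apply a batch of
# changed rows from a staging table to a transactional Hive table with a
# single MERGE statement. Assumes Hive ACID is enabled and the target table
# is stored as ORC with 'transactional'='true'. Names are hypothetical.
from pyhive import hive

MERGE_SQL = """
MERGE INTO dwh.customer AS t                 -- hypothetical target table
USING staging.customer_changes AS s          -- hypothetical CDC staging table
ON t.customer_id = s.customer_id
WHEN MATCHED AND s.op_type = 'D' THEN DELETE
WHEN MATCHED THEN UPDATE SET
    name = s.name,
    email = s.email,
    updated_at = s.changed_at
WHEN NOT MATCHED THEN INSERT
    VALUES (s.customer_id, s.name, s.email, s.changed_at)
"""

def apply_cdc_batch(host: str = "hive-server", port: int = 10000) -> None:
    """Run the MERGE so only changed rows touch the warehouse table."""
    conn = hive.Connection(host=host, port=port, database="default")
    try:
        cursor = conn.cursor()
        # Hive rewrites the MERGE into insert/update/delete deltas on the
        # ACID table, so only the captured changes are processed.
        cursor.execute(MERGE_SQL)
    finally:
        conn.close()

if __name__ == "__main__":
    apply_cdc_batch()
```

The intent, as described in the abstract, is that processing only the captured change set rather than reloading full tables reduces Hive query execution time, and with it the lead time of the overall ETL run.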

