Digital Repository

Enhance apache spark join with an in memory index

dc.contributor.author Chandika, Keerthisinghe Alankarage Janith
dc.date.accessioned 2022-02-25T09:22:13Z
dc.date.available 2022-02-25T09:22:13Z
dc.date.issued 2021
dc.identifier.citation Chandika, Keerthisinghe Alankarage Janith (2021) Enhance apache spark join with an in memory index. MSc. Dissertation Informatics Institute of Technology en_US
dc.identifier.issn 2018535
dc.identifier.uri http://dlib.iit.ac.lk/xmlui/handle/123456789/775
dc.description.abstract Although Apache Spark meets state-of-the-art big data processing needs, performance issues remain in the Spark JOIN operator, one of the most heavily used operators in many applications. Spark itself provides no way to add indexes to dataframes, so most RDDs are shuffled between distributed nodes during processing rather than consulting indexed metadata. This study aims to improve Spark JOIN performance with index-friendly dataframes, which make data scans faster and reduce the amount of data shuffled between Spark nodes by keeping index metadata in a shared volume. An experiment was conducted to test the correlation between data volume, amount of data shuffled and execution time; the results showed a linear relationship between the volume of data and both the amount of data shuffled and the execution time. The execution time of native Spark dataframes was then compared with that of the novel indexed dataframes. The experimental results show that, once the indexed dataframes are initialized, they reduce Spark JOIN execution time by up to 97% compared with native Spark dataframes. In addition, no data shuffling occurs while the data is processed. Hence, index-friendly dataframes can be suggested as a best-fit solution to Spark query slowness. en_US
dc.language.iso en en_US
dc.title Enhance apache spark join with an in memory index en_US
dc.type Thesis en_US
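
The abstract above describes joining against a prebuilt in-memory index instead of rescanning and shuffling data. The following is a minimal sketch of that general idea in plain Python, not the dissertation's implementation and not Spark code. The function names `build_index` and `indexed_join` and the sample rows are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def build_index(rows, key):
    """Map each join-key value to the list of rows that carry it."""
    index = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return index


def indexed_join(left_rows, right_index, key):
    """Inner join: probe the prebuilt index once per left row."""
    joined = []
    for left in left_rows:
        # .get() avoids creating empty entries for keys with no match
        for right in right_index.get(left[key], []):
            joined.append({**right, **left})
    return joined


# Hypothetical sample data, for illustration only
orders = [
    {"customer_id": 1, "item": "disk"},
    {"customer_id": 2, "item": "cable"},
    {"customer_id": 4, "item": "fan"},
]
customers = [
    {"customer_id": 1, "name": "Asha"},
    {"customer_id": 2, "name": "Ben"},
    {"customer_id": 3, "name": "Chen"},
]

# The index is built once (the initialization cost) and reused by every join
customer_index = build_index(customers, "customer_id")
result = indexed_join(orders, customer_index, "customer_id")
```

The index costs one pass over the indexed side up front. After that, each lookup is a hash probe rather than a scan, which is the kind of saving the abstract reports once indexed dataframes are initialized.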

