Volume 18, No. 3

RDPro: Distributed Processing of Big Raster Data

Authors:
Zhuocheng Shang, Samriddhi Singla, Ahmed Eldawy, Elia Scudiero

Abstract

Advancements in remote sensing technology have allowed for collecting vast amounts of satellite and aerial imagery with pixel resolutions as fine as 1 cm, stored in raster format and crucial for various research fields. However, processing this data poses challenges, including resolving data dependencies when location, resolution, and coordinate systems do not align, and managing large datasets within memory constraints. This paper introduces RDPro, a novel Spark-based system that efficiently processes and analyzes large raster datasets. RDPro features a new data model tailored for data dependencies in a distributed, shared-nothing environment, complete with tools for loading and writing raster data. It also optimizes core raster operations within Spark, allowing users to integrate complex data science workflows. Comparative analysis shows RDPro outperforms existing systems by up to two orders of magnitude.

PVLDB is part of the VLDB Endowment Inc.
