Volume 14, No. 12
DeFiHap: Detecting and Fixing HiveQL Anti-Patterns
Abstract
The emergence of Hive greatly facilitates the management of massive data stored in various places. Meanwhile, data scientists face challenges during HiveQL programming: they may not use correct and/or efficient HiveQL statements in their programs, and developers may inadvertently introduce anti-patterns into HiveQL programs, leading to poor performance, low maintainability, and/or program crashes. This paper presents an empirical study on HiveQL programming that reveals 38 HiveQL anti-patterns. We then design and implement DeFiHap, the first tool for automatically detecting and fixing HiveQL anti-patterns. DeFiHap detects HiveQL anti-patterns by analyzing the abstract syntax trees of HiveQL statements together with Hive configurations, and generates fix suggestions using rule-based rewriting and performance tuning techniques. The experimental results show that DeFiHap is effective. In particular, DeFiHap detects 25 anti-patterns and generates fix suggestions for 17 of them.
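To make the idea of AST-based detection and rule-based rewriting concrete, the following is a minimal sketch and not DeFiHap's implementation. It assumes a toy, hand-built syntax tree (the hypothetical `Node` class) rather than a real HiveQL parser, and it uses `SELECT *` purely as an illustration of a commonly cited SQL anti-pattern; whether it is among the anti-patterns studied in the paper is not stated in the abstract. The helper names `detect_select_star`, `fix_select_star`, and `to_sql`, as well as the `logs` table and its schema, are invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical AST node; a real tool would obtain this from a HiveQL parser."""
    kind: str                      # e.g. "SELECT", "STAR", "COLUMN", "TABLE"
    value: str = ""
    children: list = field(default_factory=list)


def walk(node):
    """Yield the node and all of its descendants in pre-order."""
    yield node
    for child in node.children:
        yield from walk(child)


def detect_select_star(ast):
    """Detection rule: flag the statement if any STAR node appears in the tree."""
    return any(n.kind == "STAR" for n in walk(ast))


def fix_select_star(ast, schema):
    """Rewrite rule: return a copy of the tree with '*' expanded to explicit columns.

    `schema` maps table names to column lists (an assumption of this sketch).
    """
    table = next(n.value for n in walk(ast) if n.kind == "TABLE")

    def rewrite(node):
        if node.kind == "STAR":
            return [Node("COLUMN", col) for col in schema[table]]
        new_children = []
        for child in node.children:
            new_children.extend(rewrite(child))
        return [Node(node.kind, node.value, new_children)]

    return rewrite(ast)[0]


def to_sql(ast):
    """Render the toy single-table SELECT tree back to text."""
    cols = [n.value if n.kind == "COLUMN" else "*"
            for n in ast.children if n.kind in ("COLUMN", "STAR")]
    table = next(n.value for n in ast.children if n.kind == "TABLE")
    return f"SELECT {', '.join(cols)} FROM {table}"


# Usage: detect the pattern, then generate a fix suggestion.
query = Node("SELECT", children=[Node("STAR"), Node("TABLE", "logs")])
schema = {"logs": ["ts", "level", "msg"]}

flagged = detect_select_star(query)
suggestion = to_sql(fix_select_star(query, schema))
```

Here `flagged` is `True` for the original tree and `suggestion` holds the rewritten statement with explicit columns. Keeping detection and rewriting as separate functions mirrors the distinction the abstract draws between anti-patterns that can be detected and the smaller set for which a fix can be generated.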
PVLDB is part of the VLDB Endowment Inc.