Volume 14, No. 12

DeFiHap: Detecting and Fixing HiveQL Anti-Patterns

Authors:
Yuetian Mao (Shanghai Jiao Tong University), Shuai Yuan (Shanghai Jiao Tong University), Nan Cui (Shanghai Jiao Tong University), Tianjiao Du (Shanghai Jiao Tong University), Beijun Shen (Shanghai Jiao Tong University), Yuting Chen (Shanghai Jiao Tong University)

Abstract

The emergence of Hive greatly facilitates the management of massive data stored in various places. Meanwhile, data scientists face challenges during HiveQL programming: they may not use correct and/or efficient HiveQL statements in their programs, and developers may inadvertently introduce anti-patterns into HiveQL programs, leading to poor performance, low maintainability, and/or program crashes. This paper presents an empirical study of HiveQL programming, which reveals 38 HiveQL anti-patterns. We then design and implement DeFiHap, the first tool for automatically detecting and fixing HiveQL anti-patterns. DeFiHap detects HiveQL anti-patterns by analyzing the abstract syntax trees of HiveQL statements and Hive configurations, and generates fix suggestions using rule-based rewriting and performance tuning techniques. The experimental results show that DeFiHap is effective. In particular, DeFiHap detects 25 anti-patterns and generates fix suggestions for 17 of them.
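
To illustrate the detect-then-fix idea in the abstract, the following is a minimal sketch of one rule, not DeFiHap's implementation. It assumes a hypothetical anti-pattern (`SELECT *` over a table whose columns are known) and uses a regular expression as a stand-in for the abstract syntax tree analysis the paper describes. The function names and the `schema` mapping are assumptions made for this example.

```python
import re

# Hypothetical rule: flag "SELECT * FROM <table>" so that an explicit
# column list can be suggested. A regex stands in for real AST analysis.
SELECT_STAR = re.compile(r"\bSELECT\s+\*\s+FROM\s+(\w+)", re.IGNORECASE)


def detect_select_star(query):
    """Return the table name if the query uses SELECT * FROM <table>, else None."""
    match = SELECT_STAR.search(query)
    return match.group(1) if match else None


def suggest_fix(query, schema):
    """Rewrite SELECT * into an explicit column list.

    `schema` is an assumed mapping from table name to a list of column names.
    Returns None when the rule does not apply or the columns are unknown.
    """
    table = detect_select_star(query)
    if table is None or table not in schema:
        return None
    columns = ", ".join(schema[table])
    # Replace only the first occurrence, keeping the original table name.
    return SELECT_STAR.sub(
        lambda m: f"SELECT {columns} FROM {m.group(1)}", query, count=1
    )


if __name__ == "__main__":
    schema = {"logs": ["user_id", "dt"]}
    print(suggest_fix("SELECT * FROM logs WHERE dt = '2021-01-01'", schema))
```

A production detector would walk the parsed statement instead of matching text, since a regex cannot reliably handle subqueries, comments, or quoted identifiers.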

PVLDB is part of the VLDB Endowment Inc.
