LLM for Data Management

Authors:

Guoliang Li, Xuanhe Zhou, Xinyang Zhao

Abstract

Machine learning techniques have proven effective in optimizing data management systems and have been widely researched in recent years. However, traditional small-sized ML models often struggle to generalize to new scenarios and have limited context understanding ability (e.g., they accept only discrete features as input). The emergence of LLMs offers a promising solution to these challenges. LLMs have been trained over a vast number of scenarios and tasks and have acquired human-competitive capabilities such as context understanding and summarization, which can be highly beneficial for data management tasks (e.g., natural language based data analytics). In this tutorial, we present how to utilize LLMs to optimize data management systems and review new techniques for addressing the technical challenges involved, including hallucination of LLMs, the high cost of interacting with LLMs, and low accuracy when processing complicated tasks. First, we discuss retrieval augmented generation (RAG) techniques to address the hallucination problem. Second, we present vector database techniques to reduce latency. Third, we present LLM agent techniques for processing complicated tasks by generating multi-round pipelines. We also showcase some real-world data management scenarios that can be well optimized by LLMs, including query rewrite, database diagnosis, and data analytics. Finally, we summarize some open research challenges.

PVLDB is part of the VLDB Endowment Inc.

Volume 17, No. 12
