
Volume 17, No. 12

Navigating Data Repositories: Utilizing Line Charts to Discover Relevant Datasets

Authors:
Daomin Ji, Hui Luo, Zhifeng Bao, Shane Culpepper

Abstract

Line charts are fundamental to data analysis and exploration, offering concise visual representations of trends. However, gaining access to the underlying data used to construct these charts is often challenging. In this paper, we describe DDLC (short for Dataset discovery via line charts), an automatic dataset discovery tool that not only identifies datasets in a dataset repository that are “relevant” to the information depicted in a user-provided line chart, but also lets users refine search results based on specific visual elements extracted from the line chart. Moreover, DDLC offers multiple avenues for users to validate search outcomes: 1) providing explanations of how a similar line chart could be generated from the identified dataset; 2) enabling comparison of line charts generated from different datasets in different ways (e.g., the aggregation vs. non-aggregation operator); 3) facilitating fine-grained examination of the correspondence between the line chart and the identified dataset. By seamlessly combining dataset retrieval with visual refinement and validation mechanisms, DDLC offers a comprehensive solution for data-driven exploration and analysis.

PVLDB is part of the VLDB Endowment Inc.
