Kyrix-J: Visual Discovery of Connected Datasets in a Data Lake

Authors:
Wenbo Tao, Adam Sah, Leilani Battle, Remco Chang, Michael Stonebraker
Abstract

Understanding data in large data lakes is becoming increasingly challenging. While some existing systems help with data discovery in data lakes, they are limited in surfacing connections between datasets and helping users comprehend them, which is crucial for many applications. To this end, we have built a system called Kyrix-J. Kyrix-J uses interactive visualizations to enable rapid discovery of connected datasets in data lakes. We allow a user to “jump” from one visualization to another following connections between the underlying data. Kyrix-J automatically generates these jumps so that the user can start using the system without a manual app-authoring process. We also contribute a novel user interface with Kyrix-J that facilitates a variety of database exploration tasks. Finally, we conduct a user study, which shows that Kyrix-J is easy to use and allows users to effortlessly explore connected datasets in a data lake.