Kyrix-J: Visual Discovery of Connected Datasets in a Data Lake
Abstract
Understanding data in large data lakes is becoming increasingly challenging. While some existing systems help with data discovery in data lakes, they offer limited support for surfacing connections between datasets and helping users comprehend them, a capability that is crucial for many applications. To this end, we have built a system called Kyrix-J, which uses interactive visualizations to enable rapid discovery of connected datasets in data lakes. Kyrix-J allows a user to “jump” from one visualization to another by following connections between the underlying data. It generates these jumps automatically, so the user can start using the system without a manual app-authoring process. We also contribute a novel user interface for Kyrix-J that facilitates a variety of database exploration tasks. Finally, we conduct a user study showing that Kyrix-J is easy to use and allows users to effortlessly explore connected datasets in a data lake.