Volume 14, No. 12
A Demonstration of KGLac: A Data Discovery and Enrichment Platform for Data Science
Abstract
Data science's growing success relies on knowing where a relevant dataset exists, understanding its impact on a specific task, finding ways to enrich a dataset, and leveraging insights derived from it. With the growth of open data initiatives, data scientists need an extensible set of effective discovery operations to find relevant data in enterprise datasets accessible via data discovery systems or in open datasets accessible via data portals. Existing portals and systems offer limited discovery support and do not track how a dataset is used or which insights are derived from it. We will demonstrate KGLac, a system that captures the metadata and semantics of datasets to construct a knowledge graph (GLac) interconnecting data items, e.g., tables and columns. KGLac supports various data discovery operations via SPARQL queries, including table discovery, finding unionable and joinable tables, and annotating datasets with related derived insights. We harness a broad range of Machine Learning (ML) approaches with GLac to enable automatic graph learning for advanced and semantic data discovery. The demo will showcase how KGLac facilitates data discovery and enrichment while developing an ML pipeline to evaluate potential gender salary bias in IT jobs.
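To make the idea of a knowledge graph that interconnects tables and columns more concrete, the following is a minimal sketch in plain Python, not KGLac's implementation. It assumes a toy in-memory set of triples instead of an RDF store queried with SPARQL, and it assumes that two columns are "joinable" when the Jaccard overlap of their values reaches a threshold. The data, the predicate names `hasColumn` and `joinableWith`, and the functions `jaccard`, `build_graph`, and `joinable_tables` are all hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical toy data: table name -> {column name -> values}.
TABLES = {
    "salaries": {"job_id": [1, 2, 3, 4], "salary": [50, 60, 70, 80]},
    "jobs": {"id": [1, 2, 3, 5], "title": ["dev", "ops", "qa", "pm"]},
    "offices": {"city": ["Oslo", "Rome"]},
}


def jaccard(a, b):
    """Jaccard similarity of the distinct values in two columns."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_graph(tables, threshold=0.5):
    """Build a set of (subject, predicate, object) triples.

    Assumption: columns from different tables are linked as joinable
    when their value overlap reaches `threshold`.
    """
    triples = set()
    columns = []
    for table, cols in tables.items():
        for col, values in cols.items():
            node = f"{table}.{col}"
            triples.add((table, "hasColumn", node))
            columns.append((table, node, values))
    for (t1, c1, v1), (t2, c2, v2) in combinations(columns, 2):
        if t1 != t2 and jaccard(v1, v2) >= threshold:
            # Store the edge in both directions so lookups are symmetric.
            triples.add((c1, "joinableWith", c2))
            triples.add((c2, "joinableWith", c1))
    return triples


def joinable_tables(triples, table):
    """Return the tables that share a joinable column with `table`."""
    owner = {col: tbl for tbl, pred, col in triples if pred == "hasColumn"}
    return sorted(
        {
            owner[obj]
            for subj, pred, obj in triples
            if pred == "joinableWith" and owner[subj] == table
        }
    )


if __name__ == "__main__":
    graph = build_graph(TABLES)
    print(joinable_tables(graph, "salaries"))
```

In this sketch the discovery step is a lookup over precomputed edges, which mirrors the general pattern of answering discovery questions by querying a graph instead of rescanning the raw tables. A real system would likely use richer similarity signals than raw value overlap.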
PVLDB is part of the VLDB Endowment Inc.