
Building a Shared Conceptual Model of Complex, Heterogeneous Data Systems: A Demonstration

Authors:
Michael R Anderson, Yuze Lou, Jiayun Zou, Michael Cafarella, Sarah Chasins, Doug Downey, Tian Gao, Kexin Huang, Dinghao Shen, Jenny Vo-Phamhi, Yitong Wang, Yuning Wang, Anna Zeng
Abstract

The world of data objects and systems is complex and heterogeneous, making collaboration across tools, teams, and institutions difficult. Important goals like effective data science, responsible data governance, and well-informed data consumption all require participation from multiple parties who share conceptual data models despite being unfamiliar with, or organizationally distant from, each other. To be productive together, data collaborators need a shared conceptual model that includes both traditional schemas and system models, such as pipelines and procedures. This shared model does not have to be entirely correct, but to enable effective collaboration, it should be tool-, team-, and institution-independent. We describe a working demonstration system that aims to build this shared conceptual model. This system borrows ideas from knowledge graphs and other massive collaborative efforts to curate data artifacts beyond the reach of any one person or institution.