Volume 17, No. 12
UTOPIA: Automatic Pivot Table Assistant
Abstract
Data summarization is essential for comprehending large datasets, and aggregation is an effective way to summarize data. A pivot table aggregates numerical attributes grouped by categorical attributes. Spreadsheet pivot tables are particularly suitable for novices, who can rearrange, group, and aggregate data through intuitive interfaces. However, real-world spreadsheet data is often disorganized, with attributes containing multiple values or synonymous variants. For instance, in the IMDb data, multiple genres are stored as a single comma-separated value (“Action, Comedy, Drama”), and “Science Fiction” appears in various forms: “Sci-Fi”, “scifi”, “Technological Fiction”, etc. These issues create barriers for novices constructing pivot tables and result in noisy, incomprehensible summaries. Parsing multi-valued attributes forces users to resort to external tools (e.g., Power Query) or programming (e.g., Python), which requires additional expertise, and consolidating synonymous variants demands tedious manual effort. We introduce Utopia, an aUTOmatic PIvot table Assistant that extends the functionality of spreadsheet pivot tables to overcome data issues such as multi-valued attributes and synonymous variants. Utopia helps users construct pivot tables without additional expertise by (1) automatically detecting multi-valued attributes and organizing their values, achieving implicit data normalization, and (2) leveraging SimCSE embeddings and K-Means clustering to consolidate synonymous variants, enabling semantic aggregation. We will demonstrate how Utopia enables effective pivot table construction, relieving technical novices of tedious data preprocessing while letting them remain in their familiar spreadsheet environment, without external tools or additional expertise.
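To make the two mechanisms concrete, here is a minimal, standard-library-only sketch of a pivot that (1) detects and splits a multi-valued column and (2) merges synonymous variants by clustering their vectors with K-Means. It is an illustration under stated assumptions, not Utopia's implementation: the detection rule (a column is multi-valued if at least 30% of its non-empty cells contain a comma), the choice of the most frequent variant as each cluster's canonical label, the caller-supplied cluster count `k`, and the function names `is_multi_valued`, `split_cell`, `kmeans`, `canonical_map`, and `pivot_mean` are all hypothetical. The toy 2-D vectors stand in for SimCSE embeddings, which this sketch does not compute.

```python
from collections import Counter, defaultdict
from statistics import mean
import math


def is_multi_valued(cells, sep=",", min_fraction=0.3):
    """Hypothetical heuristic: treat a column as multi-valued if at least
    `min_fraction` of its non-empty cells contain the separator."""
    non_empty = [c for c in cells if c.strip()]
    if not non_empty:
        return False
    return sum(sep in c for c in non_empty) / len(non_empty) >= min_fraction


def split_cell(cell, sep=","):
    # "Action, Comedy" -> ["Action", "Comedy"]; trims spaces, drops empty parts
    return [part.strip() for part in cell.split(sep) if part.strip()]


def kmeans(vectors, k, iters=20):
    """Plain Lloyd's K-Means; the first k vectors seed the centroids."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assign each vector to its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(v, centroids[c]))
                  for v in vectors]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels


def canonical_map(variants, vectors, k, counts):
    """Map every variant to the most frequent variant in its cluster."""
    clusters = defaultdict(list)
    for variant, label in zip(variants, kmeans(vectors, k)):
        clusters[label].append(variant)
    mapping = {}
    for members in clusters.values():
        representative = max(members, key=lambda m: counts[m])
        for m in members:
            mapping[m] = representative
    return mapping


def pivot_mean(rows, group_col, value_col, embeddings, k, sep=","):
    """Mean of `value_col` per consolidated category of `group_col`.
    A row listing several categories contributes to each of them.
    `embeddings` maps each category string to a vector (SimCSE stand-in)."""
    multi = is_multi_valued([row[group_col] for row in rows], sep)
    exploded = [(key, row[value_col])
                for row in rows
                for key in (split_cell(row[group_col], sep) if multi
                            else [row[group_col].strip()])]
    counts = Counter(key for key, _ in exploded)
    variants = list(counts)  # distinct categories in first-seen order
    mapping = canonical_map(variants, [embeddings[v] for v in variants],
                            k, counts)
    buckets = defaultdict(list)
    for key, value in exploded:
        buckets[mapping[key]].append(value)
    return {key: mean(vals) for key, vals in sorted(buckets.items())}


movies = [
    {"genres": "Sci-Fi, Drama", "rating": 8.0},
    {"genres": "scifi", "rating": 6.0},
    {"genres": "Drama, Sci-Fi", "rating": 7.0},
    {"genres": "Science Fiction", "rating": 5.0},
]
toy_vectors = {  # hypothetical 2-D stand-ins for SimCSE sentence embeddings
    "Sci-Fi": [0.9, 0.1], "scifi": [0.85, 0.15],
    "Science Fiction": [0.95, 0.05], "Drama": [0.1, 0.9],
}
print(pivot_mean(movies, "genres", "rating", toy_vectors, k=2))
# {'Drama': 7.5, 'Sci-Fi': 6.5}
```

Splitting the cells before grouping is what gives the "implicit normalization": each movie is counted once per genre it lists, as if the multi-valued column had been unnested into a separate table. Clustering then collapses "scifi" and "Science Fiction" into "Sci-Fi", so the pivot shows one science-fiction row instead of three. With real embeddings, the quality of the merge depends on both the embedding model and the choice of `k`, which this sketch leaves to the caller.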
PVLDB is part of the VLDB Endowment Inc.