
Volume 17, No. 12

OSSInsight: Scalable GitHub Analysis

Authors:
Ahmad Ghazal, Zhiyuan Liang, Sunny Bains, Hanumath Maduri

Abstract

GitHub is a platform that hosts code, enables collaboration, and supports version control for a global community of over 100 million developers. Free tools are crucial for researching open-source software. Our research found that existing tools either lack real-time GitHub data processing or offer limited functionality. This demonstration presents OSSInsight, an open-source tool for researching and analyzing GitHub repositories. We first present the architecture of the tool, including its access to nearly 7 billion archived and real-time data records and how it is powered by TiDB. The demonstration shows how OSSInsight analyzes GitHub data along three dimensions: developers, repositories, and organizations. All of these analyses are based on generated SQL queries submitted to a TiDB database. TiDB has HTAP capabilities, using its row store for simple SQL queries and its column store for more complex ones. Users can view and edit these SQL queries and view their execution plans. Finally, OSSInsight provides an innovative tool based on OpenAI that conducts data analysis from English text input and presents the results visually as charts and graphs.

PVLDB is part of the VLDB Endowment Inc.
