Machine Learning, Linear Algebra, and More: Is SQL All You Need?
Abstract
SQL is the standard language for retrieving and manipulating relational data. Although SQL is ubiquitous for simple analytical queries, it is rarely used for more complex computations like machine learning, linear algebra, and other computationally intensive algorithms. These algorithms are often programmed in a procedural fashion and look very different from declarative SQL queries. However, SQL does provide constructs to perform all kinds of computations. In this paper, we show how to translate procedural constructs to SQL, enabling complex SQL-only algorithms. Using SQL for algorithms keeps computations close to the data, requires minimal user permissions, and increases software portability. The performance of the resulting SQL algorithms depends heavily on the underlying DBMS and the SQL code. Surprisingly, we find that query engines like HyPer can achieve very high performance, in some cases even outperforming state-of-the-art linear algebra packages like NumPy.
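To make the idea of expressing a procedural algorithm in SQL concrete, the following minimal sketch computes a matrix product with a single join and aggregation instead of three nested loops. It is an illustration under stated assumptions, not necessarily the translation used in this paper: the relational layout (tables `a` and `b` with columns `i`, `j`, `v`), the helper name `matmul_sql`, and the use of SQLite through Python's `sqlite3` module (rather than HyPer) are all choices made here for the example.

```python
import sqlite3


def matmul_sql(a, b):
    """Multiply two dense matrices (lists of rows) using one SQL query.

    Assumption for this sketch: each matrix is stored as a relation
    (i, j, v) holding row index, column index, and value.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE a (i INTEGER, j INTEGER, v REAL)")
    conn.execute("CREATE TABLE b (i INTEGER, j INTEGER, v REAL)")

    # Load both matrices in coordinate form.
    conn.executemany(
        "INSERT INTO a VALUES (?, ?, ?)",
        [(i, j, v) for i, row in enumerate(a) for j, v in enumerate(row)],
    )
    conn.executemany(
        "INSERT INTO b VALUES (?, ?, ?)",
        [(i, j, v) for i, row in enumerate(b) for j, v in enumerate(row)],
    )

    # The join pairs a[i][k] with b[k][j]; SUM ... GROUP BY replaces the
    # innermost accumulation loop of the procedural version.
    rows = conn.execute(
        """
        SELECT a.i, b.j, SUM(a.v * b.v) AS v
        FROM a JOIN b ON a.j = b.i
        GROUP BY a.i, b.j
        ORDER BY a.i, b.j
        """
    ).fetchall()
    conn.close()

    # Convert the result relation back into a list of rows.
    result = [[0.0] * len(b[0]) for _ in a]
    for i, j, v in rows:
        result[i][j] = v
    return result


if __name__ == "__main__":
    print(matmul_sql([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

The query itself contains no loops or mutable state; the iteration order and execution strategy are left to the query engine, which is why performance depends so heavily on the underlying DBMS.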