Volume 14, No. 12
Using VDMS to Index and Search 100M Images
Abstract
Data scientists spend most of their time on data preparation rather than on what they know best: building machine learning models and algorithms to solve previously unsolvable problems. In this paper, we describe the Visual Data Management System (VDMS) and demonstrate how it simplifies the data preparation process, gaining efficiency simply because it is a system designed for the job. To demonstrate this, we use one of the largest publicly available datasets (YFCC100M), which has 100 million images and videos plus additional data, including machine-generated tags, for a total of about 12 TB. VDMS differs from existing data management systems in its focus on supporting machine learning and data analytics pipelines that rely on images, videos, and feature vectors, treating these as first-class citizens. We demonstrate that, in an image search engine implementation, VDMS outperforms well-known and widely used data management systems by up to ∼364x, with an average improvement of about 85x across our use cases, and that the advantage is most pronounced at scale. At the same time, VDMS simplifies data preparation and data access, and provides functionality not available in the alternatives.
PVLDB is part of the VLDB Endowment Inc.