
Volume 15, No. 1

Accelerating Recommendation System Training by Leveraging Popular Choices

Authors:
Muhammad Adnan (University of British Columbia), Yassaman Ebrahimzadeh Maboud (University of British Columbia), Divya Mahajan (Microsoft)*, Prashant Nair (University of British Columbia)

Abstract

Recommender models are commonly used to suggest relevant items to a user for e-commerce and online advertisement-based applications. These models use massive embedding tables to store numerical representations of items' and users' categorical variables (memory intensive) and employ neural networks (compute intensive) to generate final recommendations. Training these large-scale recommendation models is evolving to require increasing data and compute resources. The highly parallel neural network portion of these models can benefit from GPU acceleration; however, large embedding tables often cannot fit in the limited-capacity GPU device memory. Hence, this paper deep dives into the semantics of training data and obtains insights about the feature access, transfer, and usage patterns of these models. We observe that, due to the popularity of certain inputs, accesses to the embeddings are highly skewed, with a few embedding entries being accessed up to 10000× more often than others. This paper leverages this asymmetrical access pattern to offer a framework, called FAE, and proposes a hot-embedding-aware data layout for training recommender models. This layout utilizes the scarce GPU memory for storing the highly accessed embeddings, thus reducing data transfers from CPU to GPU. At the same time, FAE engages the GPU to accelerate the execution of these hot embedding entries. Experiments on production-scale recommendation models with real datasets show that FAE reduces the overall training time by 2.3× and 1.52× in comparison to XDL CPU-only and XDL CPU-GPU execution, respectively, while maintaining baseline accuracy.
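The abstract's core idea, partitioning embedding entries into a small "hot" set (pinned in GPU memory) and a large "cold" set (kept in CPU memory) based on skewed access frequency, can be illustrated with a minimal sketch. This is not FAE's actual implementation; the function name, the `hot_fraction` parameter, and the frequency-counting heuristic are illustrative assumptions.

```python
from collections import Counter

def partition_hot_embeddings(access_log, hot_fraction=0.01):
    """Split embedding indices into a 'hot' set (would be GPU-resident)
    and a 'cold' set (would stay CPU-resident) by access frequency.

    access_log: iterable of embedding-table indices observed in training data.
    hot_fraction: fraction of distinct indices to treat as hot (assumed knob,
    sized in practice to fit the available GPU memory).
    """
    freq = Counter(access_log)
    ranked = [idx for idx, _ in freq.most_common()]
    n_hot = max(1, int(len(ranked) * hot_fraction))
    hot = set(ranked[:n_hot])
    cold = set(ranked[n_hot:])
    return hot, cold

# Toy skewed access log: index 7 is "popular" (accessed far more often),
# mimicking the skew the paper reports in real datasets.
log = [7] * 1000 + [3] * 50 + list(range(100))
hot, cold = partition_hot_embeddings(log, hot_fraction=0.02)
print(sorted(hot))  # the two most frequently accessed indices
```

During training, a lookup for an index in `hot` would then be served directly from GPU memory, avoiding a CPU-to-GPU transfer; cold indices fall back to the CPU-side table.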

PVLDB is part of the VLDB Endowment Inc.
