Volume 17, No. 12
ResLake: Towards Minimum Job Latency and Balanced Resource Utilization in Geo-distributed Job Scheduling
Abstract
At internet-scale companies like ByteDance, data is generated and consumed at enormously high speed by many different applications. Achieving low latency for big data jobs over this data is an important problem. However, the naive approach of aggregating all the data required by a job in a single location is not always feasible in a geo-distributed environment. Similarly, existing approaches to geo-distributed job scheduling often try to minimize WAN usage, which may come at the cost of latency. Another crucial element in ensuring low latency is resource load balancing among data centers (DCs), which enables flexibility in job scheduling and avoids resource bottlenecks. Therefore, to minimize latency, it is important to optimize job completion time (JCT) while maintaining balanced resource utilization. To this end, we propose ResLake, a global scheduling platform for data-intensive workloads. ResLake aims to reduce the JCT of geo-distributed applications while balancing compute (CPU/memory) and storage (disk) usage across DCs and using WAN interconnections efficiently. ResLake has been deployed in production at ByteDance for over 1.5 years and has scheduled billions of jobs since its deployment. We find that ResLake improves the JCT of jobs by at least 20% and can improve resource utilization balance across DCs by up to 53%.
PVLDB is part of the VLDB Endowment Inc.