Machine learning regression to boost scheduling performance in hyper-scale cloud-computing data centres
Authors:
- Damián Fernández-Cerero
- José A. Troyano
- Agnieszka Jakóbik
- Alejandro Fernández-Montes
Abstract
Data centres increase their size and complexity due to the increasing amount of heterogeneous workloads and patterns to be served. Such a mix of various-purpose workloads makes the optimisation of resource management systems according to temporal or application-level patterns difficult. Datacentre operators have developed multiple resource-management models to improve scheduling performance in controlled scenarios. However, the constant evolution of the workloads makes the utilisation of only one resource-management model sub-optimal in some scenarios. In this work, we propose: (a) a machine learning regression model based on gradient boosting to predict the time a resource manager needs to schedule incoming jobs for a given period; and (b) a resource management model, Boost, that takes advantage of this regression model to predict the scheduling time of a catalogue of resource managers so that the most performant can be used for a time span. The benefits of the proposed resource-management model are analysed by comparing its scheduling performance KPIs to those provided by the two most popular resource-management models: two-level, used by Apache Mesos, and shared-state, employed by Google Borg. Such gains are empirically evaluated by simulating a hyper-scale data centre that executes a realistic synthetically generated workload that follows real-world trace patterns.
- Record ID
- CUTf0b9d039da2747d3859bac0391745673
- Journal series
- Journal of King Saud University - Computer and Information Sciences, ISSN 1319-1578, e-ISSN 2213-1248
- Issue year
- 2022
- Vol
- 34
- No
- 6, Part B
- Pages
- 3191-3203
- Other elements of collation
- figs.; tabs.; charts; Bibliography (on pp.) - 3202-3203; Abstract designation - Abstr.; Journal numbering - Vol. 34, Iss. 6, Part B
- Keywords in English
- data centre, cloud computing, scheduling optimisation, machine learning, gradient boosting
- DOI
- DOI:10.1016/j.jksuci.2022.04.008
- URL
- https://www.sciencedirect.com/science/article/pii/S1319157822001367
- Language
- eng (en) English
- Score (nominal)
- 100
- Score source
- journalList
- Additional fields
- Indexed in: Scopus
- Uniform Resource Identifier
- https://cris.pk.edu.pl/info/article/CUTf0b9d039da2747d3859bac0391745673/
- URN
- urn:pkr-prod:CUTf0b9d039da2747d3859bac0391745673