Understanding the Importance of Interactions among Job Scheduling Policies

Warning

This publication does not belong to the Faculty of Arts but to the Faculty of Informatics. The official publication page is on the muni.cz website.
Authors

TÓTH Šimon KLUSÁČEK Dalibor

Year of publication 2014
Type Article in proceedings
Conference Memics 2014
MU Faculty / Unit

Faculty of Informatics

Field Informatics
Keywords Scheduling; Queues; Fairshare; Simulation
Description Many studies in the past two decades have focused on the problem of efficient job scheduling in large computational systems. While many new scheduling algorithms have been proposed, mainstream resource management systems and schedulers still use only a limited set of scheduling policies. For example, the core of the system is generally based on the simple First Come First Served (FCFS) approach, while backfilling (a trivial optimization of FCFS to increase utilization) is typically the most advanced option available. Given that backfilling was proposed in 1995, there is evidently some misunderstanding between the research community and system administrators concerning "what is really important". In this work, recently presented at the Euro-Par conference, we show that the problem of operating a production scheduler is far more complex than just choosing a proper scheduling algorithm. Using our experience from the Czech National Grid Infrastructure MetaCentrum, we explain several additional challenges that appear when searching for a functional solution. These problems stem from the fact that real systems must meet far more complicated requirements than those typically considered in classical research papers. In fact, production systems need to balance various policies that are put in place to satisfy both resource providers and users. While many works address these policies separately, e.g., fairshare for fair resource allocation, the complex interactions between policies are not properly discussed in the literature. In our work we describe how to approach these interactions when developing site-specific policies. Notably, we describe how (priority) queues interact with scheduling algorithms, with fairshare, and with anti-starvation mechanisms. Moreover, we present a case study describing how detailed simulations were used to find a new configuration for MetaCentrum, significantly increasing its performance.