Cluster management systems, such as Google’s Borg, run hundreds of thousands of jobs across tens of thousands of machines with the goal of achieving high utilization via effective load balancing, efficient task placement, and machine sharing. Load balancing is the process of distributing network traffic or computational workloads across multiple servers or computing resources, and it is one of the most critical components of a modern cluster management system. Effective load balancing is critical to improving the performance, robustness and scalability of the system.
In the classical formulation of the online load balancing problem, computational jobs arrive one-by-one and, as soon as a job arrives, it must be assigned to one of several machines. Each job may impose different processing loads on different machines, and the load incurred by a machine depends on the jobs that are assigned to it. The goal of a load balancing algorithm is to minimize the maximum load on any machine. Online algorithms are those designed for situations where the input to the system is revealed to the algorithm piece by piece.
Online problems are common in decision-making scenarios that have uncertainty, including the ski-rental problem, secretary problem, caching and scheduling problems, and many others. Scheduling and load balancing questions are prevalent in resource management for large-scale systems leading to research into many real-world scheduling problems, including maintaining consistent allocation of clients to servers and, more recently, platforms for AI workloads. Traditionally, online algorithms for scheduling and load balancing are studied through the lens of competitive analysis. The competitive ratio of an online algorithm quantifies the worst-case performance of the algorithm relative to an optimal offline algorithm that knows future jobs, specifically by determining the worst-case ratio of the cost incurred by the two algorithms over all possible sequences of jobs.
In “Online Load and Graph Balancing for Random Order Inputs”, presented at SPAA 2024, we study the competitive ratio of online load balancing problems when jobs arrive in uniformly random order (i.e., when each possible permutation of job arrival sequences is equally likely). We show new limitations on how well deterministic online algorithms can perform in this setting.