Editing Load balancing (computing) (section)

==Problem overview==
A load-balancing algorithm always tries to answer a specific problem. Among other things, the nature of the tasks, the algorithmic [[Computational complexity|complexity]], the hardware architecture on which the algorithms will run as well as required [[error tolerance]], must be taken into account. Therefore compromise must be found to best meet application-specific requirements.

===Nature of tasks===
The efficiency of load balancing algorithms critically depends on the nature of the tasks. Therefore, the more information about the tasks is available at the time of decision making, the greater the potential for optimization.

====Size of tasks====
Perfect knowledge of the [[execution time]] of each of the tasks allows to reach an optimal load distribution (see algorithm of [[prefix sum]]).<ref name="Sequential and parallel algorithms">{{cite book |last1=Sanders |first1=Peter |last2=Mehlhorn |first2=Kurt |last3=Dietzfelbinger |first3=Martin |last4=Dementiev |first4=Roman |title=Sequential and parallel algorithms and data structures : the basic toolbox |date=11 September 2019 |publisher=Springer |isbn=978-3-030-25208-3}}</ref> Unfortunately, this is in fact an idealized case. Knowing the exact [[execution time]] of each task is an extremely rare situation.

For this reason, there are several techniques to get an idea of the different execution times. First of all, in the fortunate scenario of having tasks of relatively homogeneous size, it is possible to consider that each of them will require approximately the average execution time. If, on the other hand, the execution time is very irregular, more sophisticated techniques must be used. One technique is to add some [[metadata]] to each task. Depending on the previous execution time for similar metadata, it is possible to make inferences for a future task based on statistics.<ref>{{cite journal |last1=Liu |first1=Qi |last2=Cai |first2=Weidong |last3=Jin |first3=Dandan |last4=Shen |first4=Jian |last5=Fu |first5=Zhangjie |last6=Liu |first6=Xiaodong |last7=Linge |first7=Nigel |title=Estimation Accuracy on Execution Time of Run-Time Tasks in a Heterogeneous Distributed Environment |journal=Sensors |date=30 August 2016 |volume=16 |issue=9 |pages=1386 |doi=10.3390/s16091386|pmid=27589753 |pmc=5038664 |bibcode=2016Senso..16.1386L |s2cid=391429 |doi-access=free }}</ref>

====Dependencies====
In some cases, tasks depend on each other. These interdependencies can be illustrated by a [[directed acyclic graph]]. Intuitively, some tasks cannot begin until others are completed.

Assuming that the required time for each of the tasks is known in advance, an optimal execution order must lead to the minimization of the total execution time. Although this is an [[NP-hard]] problem and therefore can be difficult to be solved exactly. There are algorithms, like [[job scheduler]], that calculate optimal task distributions using [[metaheuristic]] methods.

====Segregation of tasks====
Another feature of the tasks critical for the design of a load balancing algorithm is their ability to be broken down into subtasks during execution. The "Tree-Shaped Computation" algorithm presented later takes great advantage of this specificity.

===Static and dynamic algorithms===

====Static====
A load balancing algorithm is "static" when it does not take into account the state of the system for the distribution of tasks. Thereby, the system state includes measures such as the [[Load (computing)|load level]] (and sometimes even overload) of certain processors. Instead, assumptions about the overall system are made beforehand, such as the arrival times and resource requirements of incoming tasks. In addition, the number of processors, their respective power and communication speeds are known. Therefore, static load balancing aims to associate a known set of tasks with the available processors in order to minimize a certain performance function. The trick lies in the concept of this performance function.

Static load balancing techniques are commonly centralized around a router, or [[Master/slave (technology)|Master]], which distributes the loads and optimizes the performance function. This minimization can take into account information related to the tasks to be distributed, and derive an expected execution time.

The advantage of static algorithms is that they are easy to set up and extremely efficient in the case of fairly regular tasks (such as processing [[HTTP]] requests from a website). However, there is still some statistical variance in the assignment of tasks which can lead to the overloading of some computing units.

====Dynamic====
Unlike static load distribution algorithms, dynamic algorithms take into account the current load of each of the computing units (also called nodes) in the system. In this approach, tasks can be moved dynamically from an overloaded node to an underloaded node in order to receive faster processing. While these algorithms are much more complicated to design, they can produce excellent results, in particular, when the execution time varies greatly from one task to another.

Dynamic load balancing architecture can be more [[Modular design|modular]] since it is not mandatory to have a specific node dedicated to the distribution of work. When tasks are uniquely assigned to a processor according to their state at a given moment, it is a unique assignment. If, on the other hand, the tasks can be permanently redistributed according to the state of the system and its evolution, this is called dynamic assignment.<ref>{{cite journal |last1=Alakeel |first1=Ali |title=A Guide to Dynamic Load Balancing in Distributed Computer Systems |journal=International Journal of Computer Science and Network Security  |date=November 2009 |volume=10 |url=https://www.researchgate.net/publication/268200851}}</ref> Obviously, a load balancing algorithm that requires too much communication in order to reach its decisions runs the risk of slowing down the resolution of the overall problem.

===Hardware architecture===
====Heterogeneous machines====
[[Parallel computing]] infrastructures are often composed of units of different [[computing power]], which should be taken into account for the load distribution.

For example, lower-powered units may receive requests that require a smaller amount of computation, or, in the case of homogeneous or unknown request sizes, receive fewer requests than larger units.

====Shared and distributed memory====
Parallel computers are often divided into two broad categories: those where all processors share a single common memory on which they read and write in parallel ([[Parallel random-access machine|PRAM]] model), and those where each computing unit has its own memory ([[distributed memory]] model), and where information is exchanged by messages.

For [[shared-memory]] computers, managing write conflicts greatly slows down the speed of individual execution of each computing unit. However, they can work perfectly well in parallel. Conversely, in the case of message exchange, each of the processors can work at full speed. On the other hand, when it comes to collective message exchange, all processors are forced to wait for the slowest processors to start the communication phase.
 
In reality, few systems fall into exactly one of the categories. In general, the processors each have an internal memory to store the data needed for the next calculations and are organized in successive [[Computer cluster|clusters]]. Often, these processing elements are then coordinated through [[distributed memory]] and [[message passing]]. Therefore, the load balancing algorithm should be uniquely adapted to a parallel architecture. Otherwise, there is a risk that the efficiency of parallel problem solving will be greatly reduced.

====Hierarchy====
Adapting to the hardware structures seen above, there are two main categories of load balancing algorithms. On the one hand, the one where tasks are assigned by “master” and executed by “workers” who keep the master informed of the progress of their work, and the master can then take charge of assigning or reassigning the workload in case of the dynamic algorithm. The literature refers to this as [[Master/slave (technology)|"Master-Worker"]] architecture. On the other hand, the control can be distributed between the different nodes. The load balancing algorithm is then executed on each of them and the responsibility for assigning tasks (as well as re-assigning and splitting as appropriate) is shared. The last category assumes a dynamic load balancing algorithm.

Since the design of each load balancing algorithm is unique, the previous distinction must be qualified. Thus, it is also possible to have an intermediate strategy, with, for example, "master" nodes for each sub-cluster, which are themselves subject to a global "master". There are also multi-level organizations, with an alternation between master-slave and distributed control strategies. The latter strategies quickly become complex and are rarely encountered. Designers prefer algorithms that are easier to control.

====Adaptation to larger architectures (scalability)====
In the context of algorithms that run over the very long term (servers, cloud...), the computer architecture evolves over time. However, it is preferable not to have to design a new algorithm each time.

An extremely important parameter of a load balancing algorithm is therefore its ability to adapt to scalable hardware architecture. This is called the [[scalability]] of the algorithm. An algorithm is called scalable for an input parameter when its performance remains relatively independent of the size of that parameter.

When the algorithm is capable of adapting to a varying number of computing units, but the number of computing units must be fixed before execution, it is called moldable. If, on the other hand, the algorithm is capable of dealing with a fluctuating amount of processors during its execution, the algorithm is said to be malleable. Most load balancing algorithms are at least moldable.<ref>{{cite book |last1=Asghar |first1=Sajjad |last2=Aubanel |first2=Eric |last3=Bremner |first3=David |title=2013 42nd International Conference on Parallel Processing |chapter=A Dynamic Moldable Job Scheduling Based Parallel SAT Solver |date=October 2013 |pages=110–119 |doi=10.1109/ICPP.2013.20 |isbn=978-0-7695-5117-3 |s2cid=15124201 }}</ref>

===Fault tolerance===
Especially in large-scale [[computing cluster]]s, it is not tolerable to execute a parallel algorithm that cannot withstand the failure of one single component. Therefore, [[fault tolerant]] algorithms are being developed which can detect outages of processors and recover the computation.<ref>{{cite book |last1=Punetha Sarmila |first1=G. |last2=Gnanambigai |first2=N. |last3=Dinadayalan |first3=P. |title=2015 2nd International Conference on Electronics and Communication Systems (ICECS) |chapter=Survey on fault tolerant &mdash; Load balancing algorithmsin cloud computing |date=2015 |pages=1715–1720 |doi=10.1109/ECS.2015.7124879 |isbn=978-1-4799-7225-8 |s2cid=30175022 }}</ref>