Load balancing (computing)
====Server-side load balancers====
For Internet services, a server-side load balancer is usually a software program that listens on the [[TCP and UDP port|port]] where external clients connect to access services. The load balancer forwards requests to one of the "backend" servers, which usually replies to the load balancer. This allows the load balancer to reply to the client without the client ever knowing about the internal separation of functions. It also prevents clients from contacting backend servers directly, which may have security benefits by hiding the structure of the internal network and preventing attacks on the kernel's network stack or on unrelated services running on other ports.

Some load balancers provide a mechanism for doing something special in the event that all backend servers are unavailable, such as forwarding requests to a backup load balancer or displaying a message regarding the outage.

It is also important that the load balancer itself does not become a [[single point of failure]]. Usually, load balancers are implemented in [[High availability|high-availability]] pairs, which may also replicate session persistence data if required by the specific application.<ref>{{cite web|url= http://www.linuxvirtualserver.org/HighAvailability.html|title= High Availability|publisher=linuxvirtualserver.org | access-date=2013-11-20}}</ref> Certain applications are designed to tolerate this problem by distributing the load-balancing function itself across multiple cooperating nodes rather than concentrating it at a single point in the network, as in peer-to-peer designs.
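The behaviour described above (rotating requests across healthy backends, with a fallback action when all of them are down) can be sketched in Python. This is only an illustration: the class name, addresses, and fallback string are invented and do not correspond to any real load-balancer product.

```python
class Balancer:
    """Toy model of a server-side balancer: rotate over healthy backends
    and fall back to an outage action when none are available."""

    def __init__(self, backends):
        self.backends = list(backends)   # e.g. ["10.0.0.1:8080", ...]
        self.down = set()                # maintained by a health monitor

    def mark_down(self, backend):
        self.down.add(backend)

    def mark_up(self, backend):
        self.down.discard(backend)

    def route(self):
        """Pick a backend for the next incoming request."""
        healthy = [b for b in self.backends if b not in self.down]
        if not healthy:
            # All backends unavailable: serve an outage page, or hand
            # the request to a backup load balancer.
            return "503 outage page"
        chosen = healthy[0]
        # Move the chosen backend to the back of the rotation.
        self.backends.remove(chosen)
        self.backends.append(chosen)
        return chosen
```

A production balancer would proxy real TCP connections and run continuous health checks; this sketch captures only the selection and fallback logic.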
The scheduling used by such distributed schemes is typically governed by parameters specific to the application and its data store.<ref>{{cite journal |last1=Ranjan |first1=R |title=Peer-to-peer cloud provisioning: Service discovery and load-balancing |journal=Cloud Computing |date=2010}}</ref>

=====Scheduling algorithms=====
Numerous [[scheduling algorithm]]s, also called load-balancing methods, are used by load balancers to determine which backend server to send a request to. Simple algorithms include random choice, [[Round-robin scheduling|round robin]], or least connections.<ref name=":0">{{Cite web|url=https://f5.com/resources/white-papers/load-balancing-101-nuts-and-bolts|archive-url=https://web.archive.org/web/20171205223948/https://f5.com/resources/white-papers/load-balancing-101-nuts-and-bolts|url-status=dead|archive-date=2017-12-05|title=Load Balancing 101: Nuts and Bolts|date=2017-12-05|publisher=[[F5, Inc.|F5]]|access-date=2018-03-23}}</ref> More sophisticated load balancers may take additional factors into account, such as a server's reported load, recent response times, up/down status (determined by a monitoring poll of some kind), the number of active connections, geographic location, capabilities, or how much traffic it has recently been assigned.

=====Persistence=====
An important issue when operating a load-balanced service is how to handle information that must be kept across the multiple requests in a user's session. If this information is stored locally on one backend server, then subsequent requests going to different backend servers would not be able to find it. This might be cached information that can be recomputed, in which case load-balancing a request to a different backend server merely introduces a performance issue.<ref name=":0" />

Ideally, the cluster of servers behind the load balancer should not be session-aware, so that if a client connects to any backend server at any time the user experience is unaffected.
This is usually achieved with a shared database or an in-memory session database like [[Memcached]].

One basic solution to the session data issue is to send all requests in a user session consistently to the same backend server. This is known as "persistence" or "stickiness". A significant downside to this technique is its lack of automatic [[failover]]: if a backend server goes down, its per-session information becomes inaccessible, and any sessions depending on it are lost. The same problem usually applies to central database servers; even if web servers are "stateless" and not "sticky", the central database is (see below).

Assignment to a particular server might be based on a username or the client's IP address, or it might be random. Because of changes in the client's perceived address resulting from [[DHCP]], [[network address translation]], and [[web proxy|web proxies]], address-based assignment may be unreliable. Random assignments must be remembered by the load balancer, which creates a burden on storage. If the load balancer is replaced or fails, this information may be lost, and assignments may need to be deleted after a timeout period or during periods of high load to avoid exceeding the space available for the assignment table. The random assignment method also requires that clients maintain some state, which can be a problem, for example when a web browser has disabled the storage of cookies. Sophisticated load balancers use multiple persistence techniques to avoid some of the shortcomings of any one method.

Another solution is to keep the per-session data in a [[database]]. This is generally bad for performance because it increases the load on the database: the database is best used to store information less transient than per-session data. To prevent a database from becoming a [[single point of failure]], and to improve [[scalability]], it is often replicated across multiple machines, and load balancing is used to spread the query load across those replicas.
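The simple scheduling methods named earlier (round robin and least connections) and address-based stickiness can be sketched in Python as follows. The server names and connection counts are placeholders, not drawn from any real deployment:

```python
import hashlib
from itertools import cycle

servers = ["web1", "web2", "web3"]

# Round robin: hand out servers in strict rotation.
_rotation = cycle(servers)
def round_robin():
    return next(_rotation)

# Least connections: pick the server with the fewest active
# connections, as tracked by the balancer itself.
def least_connections(active):
    return min(active, key=active.get)

# Stickiness by client address: hash the client's IP so that every
# request from the same perceived address reaches the same backend,
# without the balancer storing a per-client assignment table.
def sticky(client_ip):
    digest = int(hashlib.sha256(client_ip.encode()).hexdigest(), 16)
    return servers[digest % len(servers)]
```

The `sticky` mapping also shows the fragility noted above: it depends on the client's perceived IP address, and adding or removing a server changes the modulus and therefore remaps most clients.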
[[Microsoft]]'s [[ASP.net]] State Server technology is an example of a session database. All servers in a web farm store their session data on State Server, and any server in the farm can retrieve the data.

In the very common case where the client is a web browser, a simple but efficient approach is to store the per-session data in the browser itself. One way to achieve this is to use a [[HTTP cookie|browser cookie]], suitably time-stamped and encrypted. Another is [[URL rewriting]]. Storing session data on the client is generally the preferred solution: the load balancer is then free to pick any backend server to handle a request. However, this method of state-data handling is poorly suited to some complex business logic scenarios, where the session state payload is large and recomputing it with every request on a server is not feasible. URL rewriting has major security issues because the end user can easily alter the submitted URL and thus change session streams.

Yet another solution to storing persistent data is to associate a name with each block of data, use a [[distributed hash table]] to pseudo-randomly assign that name to one of the available servers, and then store the block of data on the assigned server.

=====Load balancer features=====
Hardware and software load balancers may have a variety of special features. The fundamental feature of a load balancer is the ability to distribute incoming requests over a number of backend servers in the cluster according to a scheduling algorithm. Most of the following features are vendor specific:

; Asymmetric load : A ratio can be manually assigned to cause some backend servers to get a greater share of the workload than others. This is sometimes used as a crude way to account for some servers having more capacity than others and may not always work as desired.
; Priority activation : When the number of available servers drops below a certain number, or the load gets too high, standby servers can be brought online.
; [[TLS acceleration|TLS offload and acceleration]] : TLS (or its predecessor, SSL) acceleration is a technique of offloading cryptographic protocol calculations onto specialized hardware. Depending on the workload, processing the encryption and authentication requirements of a [[Transport Layer Security|TLS]] request can become a major part of the demand on the web server's CPU; as the demand increases, users will see slower response times, as the TLS overhead is distributed among the web servers. To remove this demand from the web servers, a balancer can terminate TLS connections, passing HTTPS requests to the web servers as plain HTTP requests. If the balancer itself is not overloaded, this does not noticeably degrade the performance perceived by end users. The downside of this approach is that all of the TLS processing is concentrated on a single device (the balancer), which can become a new bottleneck. Some load balancer appliances include specialized hardware to process TLS. Instead of upgrading the load balancer, which is quite expensive dedicated hardware, it may be cheaper to forgo TLS offload and add a few web servers. Also, some server vendors such as Oracle/Sun incorporate cryptographic acceleration hardware into their CPUs, such as the T2000. F5 Networks incorporates a dedicated TLS acceleration hardware card in their local traffic manager (LTM), which is used for encrypting and decrypting TLS traffic. One clear benefit to TLS offloading in the balancer is that it enables the balancer to perform load balancing or content switching based on data in the HTTPS request.
; [[Distributed denial of service|Distributed denial-of-service]] (DDoS) attack protection : Load balancers can provide features such as [[SYN cookies]] and delayed binding (the back-end servers do not see the client until it finishes its TCP handshake) to mitigate [[SYN flood]] attacks and generally offload work from the servers to a more efficient platform.
; [[HTTP compression]] : HTTP compression reduces the amount of data to be transferred for HTTP objects by utilising gzip compression, which is available in all modern web browsers. The larger the response and the further away the client is, the more this feature can improve response times. The trade-off is that it places additional CPU demand on the load balancer; the work could instead be done by the web servers.
; [[TCP offload]] : Different vendors use different terms for this, but the idea is that normally each HTTP request from each client is a different TCP connection. This feature utilises HTTP/1.1 to consolidate multiple HTTP requests from multiple clients into a single TCP socket to the back-end servers.
; TCP buffering : The load balancer can buffer responses from the server and spoon-feed the data out to slow clients, allowing the web server to free a thread for other tasks faster than it would if it had to send the entire response to the client directly.
; Direct Server Return : An option for asymmetrical load distribution, where request and reply have different network paths.
; Health checking : The balancer polls servers for application-layer health and removes failed servers from the pool.
; [[HTTP caching]] : The balancer stores static content so that some requests can be handled without contacting the servers.
; Content filtering : Some balancers can arbitrarily modify traffic on the way through.
; HTTP security : Some balancers can hide HTTP error pages, remove server identification headers from HTTP responses, and encrypt cookies so that end users cannot manipulate them.
; [[Priority queuing]] : Also known as [[rate shaping]], the ability to give different priorities to different traffic.
; Content-aware switching : Most load balancers can send requests to different servers based on the URL being requested, assuming the request is not encrypted (HTTP), or if it is encrypted (via HTTPS), that the HTTPS request is terminated (decrypted) at the load balancer.
; Client authentication : Authenticate users against a variety of authentication sources before allowing them access to a website.
; Programmatic traffic manipulation : At least one balancer allows the use of a scripting language to allow custom balancing methods, arbitrary traffic manipulations, and more.
; [[Firewall (networking)|Firewall]] : Firewalls can prevent direct connections to backend servers, for network security reasons.
; [[Intrusion prevention system]] : Intrusion prevention systems offer application-layer security in addition to the network/transport-layer protection offered by firewalls.
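The distributed-hash-table approach described earlier, assigning each named block of data pseudo-randomly but deterministically to one of the available servers, is commonly realised with consistent hashing. Here is a minimal Python sketch; the hash function, virtual-node count, and names are arbitrary illustrative choices:

```python
import bisect
import hashlib

def _hash(key):
    """Map a string to a point on the hash ring."""
    return int(hashlib.sha256(key.encode()).hexdigest(), 16)

class HashRing:
    """Sketch of DHT-style placement: each named block of data is
    assigned to the first server point clockwise from the name's hash.
    Virtual nodes smooth out the distribution across servers."""

    def __init__(self, servers, vnodes=64):
        self._ring = sorted((_hash(f"{s}#{i}"), s)
                            for s in servers for i in range(vnodes))
        self._keys = [h for h, _ in self._ring]

    def server_for(self, name):
        # First point on the ring at or after the name's hash,
        # wrapping around to the start if necessary.
        i = bisect.bisect_left(self._keys, _hash(name)) % len(self._ring)
        return self._ring[i][1]
```

A useful property of this placement, compared with `hash(name) % len(servers)`, is that removing one server only remaps the blocks that were stored on it; everything else stays put.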
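As a small illustration of the HTTP compression feature listed above, the core operation is simply gzip-compressing response bodies for clients that advertise support for it. A Python sketch with an invented example body:

```python
import gzip

# A large, repetitive HTML response such as a balancer might compress
# before sending it to a client that sent "Accept-Encoding: gzip".
body = b"<html><body>" + b"<p>hello</p>" * 2000 + b"</body></html>"

compressed = gzip.compress(body)
# The balancer would add "Content-Encoding: gzip" to the response
# headers; the browser decompresses transparently on arrival.
ratio = len(compressed) / len(body)
```

For repetitive text like HTML, the compressed body is typically a small fraction of the original, which is where the response-time benefit over slow links comes from.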