Load Balancing for Network Servers
Load balancing refers to methods used to distribute network traffic amongst multiple hosts. This can be done by having different hosts used for different tasks (for example, separate servers for image and text content for a website) or by using a pool of redundant servers from which a load balancer can choose a single host to use for a given connection. Load balancing is usually achieved transparently to the client—that is, the service requested by the client appears to come from one place, even though it may be coming from multiple servers or a server at a different IP address.
Effects of Load Balancing
Load balancing is motivated by the need to distribute load across enough resources to service network requests. It can be used to ease network traffic by routing requests to various destination hosts, to harness the combined computing power of multiple hosts, or to increase storage capacity by allowing the use of more than one host. These three needs frequently go hand in hand, and effective load balancing can address all three.
Common Scenarios
Load balancing is used in many network-related applications, but perhaps its best-documented use is in web applications. Web sites generally run on commodity server hardware, since more powerful machines (such as supercomputers) are extremely expensive, and many of the issues involved with using (relatively) cheap hardware have been solved by research in distributed computing, of which load balancing is a part. Extremely popular websites such as Google, Slashdot, or Facebook each field hundreds (if not thousands) of requests per second, and having a single server service these requests is simply untenable. Bandwidth, compute power, and storage are all bottlenecks here. Even if enough bandwidth were available, a single server would not have enough RAM to handle the overhead of maintaining that many TCP connections.
Large websites frequently use load balancing in three ways:
- Offloading bandwidth-intensive content such as images, video, and other media to Content Delivery Networks, discussed briefly below.
- Redirecting a request to a server that is physically close to the user and/or is not at its load capacity.
- Storing databases in segments on different servers. Heuristics can determine which parts of each database are stored on which server, or the segments can be distributed randomly; either way, the goal is to reduce the number of requests reaching each database. (A minimal shard-selection sketch follows this list.)
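As an illustration of the third point, the sketch below shows one simple heuristic for segmenting a database: hashing each record key to a shard. The host names and key format are hypothetical, and real systems use considerably more sophisticated partitioning schemes.

```python
import zlib

def choose_shard(key: str, shard_hosts: list[str]) -> str:
    """Map a record key to the database server that stores it."""
    # crc32 is deterministic across processes, so a given key always
    # lands on the same shard (unlike Python's salted built-in hash()).
    return shard_hosts[zlib.crc32(key.encode()) % len(shard_hosts)]

# Hypothetical shard hosts; any stable list of servers works the same way.
shards = ["db1.example.com", "db2.example.com", "db3.example.com"]
print(choose_shard("user:42", shards))  # always the same host for this key
```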
Methods of Load Balancing
Dividing Servers Based On Use
One of the simplest forms of load balancing, this method divides servers based on intended use. In the simplest configuration, one or more servers host or generate HTML pages, while a separate server (or several) hosts any images displayed on the site. For example, a user might access www.example.com, which would serve HTML containing embedded image tags that point to images.example.com, a different host.
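The following minimal sketch shows how a page generator might implement this split, using the example.com placeholder hosts from the paragraph above; the helper function is hypothetical.

```python
# The HTML host serves the page; images are fetched from a separate host.
IMAGE_HOST = "http://images.example.com"

def img_tag(path: str) -> str:
    """Build an <img> tag whose file is served by the image host."""
    return f'<img src="{IMAGE_HOST}/{path}">'

page = "<html><body>" + img_tag("logo.png") + "</body></html>"
print(page)  # the browser requests logo.png from images.example.com
```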
Content Delivery Networks are an extension of this concept. A CDN is a global network of servers owned by a third party such as Akamai which hosts content for client web sites. This content is frequently larger files such as images, video, and other media. The CDN determines the server closest to the user and serves the requested content from there. CDNs are used to increase the speed at which users can access content by decreasing bandwidth demands on client (or "source") sites. The operation of CDNs is a very complex topic and is beyond the scope of this article. See below for more information.
Web sites frequently separate their operations into two or three functional tiers or layers:
- Presentation tier – this tier takes care of rendering the web page. (For example, in a LAMP stack it executes PHP code to generate HTML and CSS.)
- Business tier – this tier involves processing data supplied by the data tier (below) and supplying the end result to the presentation tier for display to the user.
- Data tier – this tier is responsible for maintaining the data underlying the application. This frequently involves maintaining and querying databases.
This topology allows a separation of concerns, an important design methodology for better code maintainability, and, by assigning each server a specific task, it also allows computation to be easily distributed among servers. It is known as a three-tier architecture.
Multiple Redundant Servers
In this scenario there are multiple servers which all serve the same function. For example, a user connecting to a large website like Google may reach any one of several hundred or thousand servers, but regardless of which server answers, the user gets the same Google homepage and can search in exactly the same way. This is what is usually meant by the phrase "load balancing."
Traditional Load Balancing
Load balancing must occur transparently to the user. It is therefore necessary to use Virtual IPs, or VIPs. A Virtual IP is "the load-balancing instance where the world points its browsers to get to a site. A VIP has an IP address, which must be publicly available to be usable" [1]. One or more machines can answer to this IP; these are the load balancing machines. All traffic travels through these computers, which route it to hosts based on a variety of algorithms. Where there is more than one load balancing machine, failover protection is used so that if one machine fails, another load balancer can take over its traffic. This can be done in a variety of ways, such as with the Virtual Router Redundancy Protocol and its proprietary cousins, or with a fail-over cable, among others.
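As a concrete illustration of one such routing algorithm, the sketch below implements round-robin selection over a pool of real servers behind a VIP. The class name and backend addresses are hypothetical, and production balancers would also weigh backend health and current load.

```python
import itertools

class RoundRobinBalancer:
    """Cycle through the backend pool, one new connection at a time."""

    def __init__(self, backends: list[str]):
        self._pool = itertools.cycle(backends)

    def pick(self) -> str:
        # Each new connection is handed to the next backend in rotation.
        return next(self._pool)

lb = RoundRobinBalancer(["10.0.0.11", "10.0.0.12", "10.0.0.13"])
for _ in range(4):
    print(lb.pick())  # .11, .12, .13, then back around to .11
```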
Many load balancing servers address the issue of persistence, or stickiness, which stipulates that whichever server handles the start of a user's session must be used for the rest of that session. This requires greater awareness of the TCP communication on the part of the load balancer, but it is essential for stateful web applications such as online stores.
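One way to implement stickiness is a session table keyed by client address, sketched below under the assumption that the client IP identifies the session; real balancers may instead rely on cookies or TCP-level state.

```python
import itertools

class StickyBalancer:
    """Assign each client a backend once, then keep reusing it."""

    def __init__(self, backends: list[str]):
        self._pool = itertools.cycle(backends)
        self._sessions: dict[str, str] = {}  # client IP -> assigned backend

    def pick(self, client_ip: str) -> str:
        # First contact chooses a backend; later connections reuse it.
        if client_ip not in self._sessions:
            self._sessions[client_ip] = next(self._pool)
        return self._sessions[client_ip]

lb = StickyBalancer(["10.0.0.11", "10.0.0.12"])
print(lb.pick("203.0.113.7"))  # assigns a backend on first contact
print(lb.pick("203.0.113.7"))  # same backend for the rest of the session
```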
Load balancing servers are responsible for manipulating packet headers through a process termed Network Address Translation (NAT) to route packets properly. This neatly circumvents the TCP requirement that the source and destination addresses of incoming and outgoing packets on a single connection match. (This would otherwise be a problem, since the server answering the user's request does not have the same IP address as the server to which the connection was made, namely the load balancer.)
A packet in a typical load balancing system usually goes through the following progression [1]:
- The packet is sent from the user's computer (host A) to the load balancer (host B). Naturally, the source address is A and the destination address is B.
- The load balancer chooses a server (host C) which will service the user's request. It rewrites the packet header so that the destination address reads as C. The source address remains as A.
- Host C finishes serving the request and sends a response back with source C and destination A. As discussed above, host A would discard this packet, because its source address does not match the connection (A expects a packet with source B). However, the default route out of C is B, so the load balancing server can intercept the packet on its way back to the user.
- Host B rewrites the packet header again, setting the source field to B. The packet now has source B and destination A, and is delivered properly to the user's computer.
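The toy walk-through below mirrors these four steps, modeling a packet as a dictionary of header fields; all addresses are illustrative.

```python
# A = client, B = load balancer (VIP), C = real server (illustrative).
A, B, C = "198.51.100.1", "203.0.113.10", "10.0.0.11"

packet = {"src": A, "dst": B}         # step 1: client sends to the VIP
packet["dst"] = C                     # step 2: balancer rewrites destination
reply = {"src": C, "dst": A}          # step 3: C replies; intercepted at B
reply["src"] = B                      # step 4: balancer rewrites source
assert reply == {"src": B, "dst": A}  # the client sees a reply from B
```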
Optimizing the Return Trip
The advantage of the algorithm described above is that it is simple and can be implemented at a relatively high level. Since only IP addresses are rewritten, it can be handled entirely at Layer 3 (the Network layer) of the OSI model. It can be optimized, however, by eliminating the return trip through the balancer (step three), which reduces traffic through the load balancing server. (Since typical Internet usage has roughly an 8:1 ratio of response traffic to request traffic, this optimization can cut the traffic passing through the load balancer by roughly a factor of eight.) This optimization is called Direct Server Return and operates at Layer 2 (the Data Link layer) of the OSI model, using MAC Address Translation (MAT) [1].
- The real server (host C in the above scenario) is assigned the load balancer's IP address in addition to its own. Since duplicate IP addresses are not allowed on the same network, this address is bound to host C's loopback interface. (This is done through aliasing, so the loopback interface also keeps its traditional 127.0.0.1 address.)
- The server software (such as a web server like Apache) is bound to this new address on the loopback interface.
The packet goes through the following steps [1]:
- The packet is sent from the user's computer (host A) to the load balancer (host B). The source address is A and the destination address is B.
- The load balancer (B) chooses a server (host C) to service the user's request. Instead of rewriting the destination IP address to point at C, it keeps its own address (B) as the destination (remember, host C has this address configured on its loopback interface) and rewrites the destination MAC address instead. The packet is therefore delivered to C, and although it would normally be dropped there (since C would not ordinarily be configured with B's IP address), C accepts it.
- Host C finishes serving the request and sends a response back with source B and destination A. The packet is delivered directly to the user's computer, which accepts the packet.
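A parallel toy walk-through of the Direct Server Return path, again with illustrative addresses; note that only the destination MAC is rewritten and the reply never passes back through the balancer.

```python
# A = client, VIP = balancer address (also on C's loopback); MACs illustrative.
A_IP, VIP = "198.51.100.1", "203.0.113.10"
B_MAC, C_MAC = "aa:bb:cc:dd:ee:01", "aa:bb:cc:dd:ee:02"

packet = {"src_ip": A_IP, "dst_ip": VIP, "dst_mac": B_MAC}  # step 1
packet["dst_mac"] = C_MAC  # step 2: MAC rewrite only; C holds the VIP
reply = {"src_ip": VIP, "dst_ip": A_IP}  # step 3: straight to the client
print(reply)  # no return trip through the load balancer
```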
Caching
Caching is used to reduce load on servers in the data tier, and sometimes also in the business tier. The idea is that results of common database queries are cached in RAM so that fewer requests require a read from disk. Traditional, platter-based hard drives are usually the slowest component in any server, which creates a significant bottleneck in applications that rely heavily on databases. Additionally, some database operations, such as JOINs, are computationally very expensive. If the results of these costly queries can be cached in RAM, fewer disk accesses are required in normal operation, so the application runs faster and load on the database server is reduced.
In LAMP stacks one of the most popular caching tools is Memcache. Memcache implements what is essentially a very large hash map which allows very quick access to cached data. It works with PHP, Java, Ruby, Python, and several other languages.
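A minimal sketch of this cache-aside pattern, assuming the python-memcached client library and a memcached server listening on localhost; the database helper is a hypothetical stand-in for a costly query.

```python
import memcache  # python-memcached client (assumed installed)

mc = memcache.Client(["127.0.0.1:11211"])  # assumed local memcached server

def expensive_database_query(user_id: int) -> dict:
    # Hypothetical stand-in for a costly JOIN against the data tier.
    return {"id": user_id, "name": "example"}

def fetch_user(user_id: int) -> dict:
    """Cache-aside lookup: try Memcache first, fall back to the database."""
    key = f"user:{user_id}"
    user = mc.get(key)
    if user is None:                          # cache miss: hit the database
        user = expensive_database_query(user_id)
        mc.set(key, user, time=300)           # cache the result for 5 minutes
    return user
```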
See Also
- Content Delivery Network
- LAMP
- Memcache
- Multitier Architecture
References
- Tony Bourke: Server Load Balancing, O'Reilly, ISBN 0-596-00050-2
- Matthew Syme, Philip Goldie: Optimizing network performance with content switching, Prentice Hall PTR, ISBN 0-13-101468-4
- Jason Sobel: Needle in a Haystack: Efficient Storage of Billions of Photos
- Aditya Agarwal: Facebook: Science and the Social Graph