Building a failover load balancing cluster on four machines with HAProxy and Keepalived in Debian Squeeze
So you've got a big-ass VMware machine with some servers to spare? Let's put them to work creating that redundancy your boss always nags you about whenever there is a split-second of downtime. In this post, I'll dive into how you can build a basic load-balancing high availability cluster, either on VMware or with separate bare-metal servers.
I've made a few assumptions about network topology and services based on my own server environment and the services I work with, which are mainly Drupal servers and Varnish cache servers. I usually run my Drupal backends on their own servers, fronted by a Varnish server on its own box. I run Debian 5 (Lenny) and Debian 6 (Squeeze). Other than that, it's all pretty basic stuff.
What I'll build is a solution that presents one server to the outside world but on the inside consists of four servers, in essence a small high-availability cluster. The cluster can be extended at any level should the need arise.
The pros of this kind of setup are redundancy, load balancing, ease of maintenance and the ability to proxy all kinds of traffic: not just HTTP, but basically any service that runs over TCP.
The cons are that it can take quite a bit of debugging if anything starts behaving funky and, obviously, it requires a few spare servers and IPs.
[Diagram: High availability cluster]
Outline
There are a few steps we need to go through to get this up and running:
- Install HAProxy and Keepalived
- Configure HAProxy and Keepalived
- Configure sysctl
- Configure cache servers/backends
- Start services
- Verify that the system is up
- Verify failover function
The tools
To provide redundancy I'll be using Keepalived, a simple and robust Linux routing daemon written in C that provides failover via the Virtual Router Redundancy Protocol (VRRP). Keepalived also provides layer 4 load balancing. Keepalived is responsible for maintaining the shared public IP and determining which server is alive.
To provide true layer 7 load balancing I will be using HAProxy. HAProxy is fast, free and reliable TCP load balancing, proxying and high availability software that provides the parts needed to finish our cluster. HAProxy determines the health of the backends - removing any that fail - and distributes the load between them. HAProxy also provides sticky sessions through cookies that pin each visitor to its own backend.
HAProxy is also light on resources, easily handling thousands of connections on cheap hardware.
To make it all happen, I'm using four servers. Two will serve as load balancers and two as cache servers. The load balancers run HAProxy and Keepalived, and the cache servers run Varnish and Apache (which you could replace with Nginx or whatever). The reason the cache servers need to run Apache as well as Varnish is that they need to be able to serve a file on their default IP, and Varnish doesn't handle that (yet). Of course, many run the backend and the cache server on the same box, in which case this is not a problem.
The idea of it all is to have the load balancers sharing one IP in a master/backup setup. Since they share one IP there will only be one server visible to the outside at any one time. Should one of them fail, the other instantly picks up the IP and resumes business. For this to work, we need to use the Keepalived daemon.
For the load balancing part, we use HAProxy which works in tandem with Keepalived to assure that the backup server takes over if HAProxy should fail. HAProxy provides another layer of failover by monitoring our cache servers and removing them if they fail to respond.
The servers are:
- proxy1: haproxy, keepalived, IP 11.22.33.43 (sharing IP 11.22.33.44)
- proxy2: haproxy, keepalived, IP 11.22.33.42 (sharing IP 11.22.33.44)
- cache1: varnish, apache2, IP 11.22.33.45
- cache2: varnish, apache2, IP 11.22.33.46
Install HAProxy and Keepalived
I won't cover installation of web servers and caching servers.
For the load balancers you need to install HAProxy and Keepalived, which are available in the usual Debian repositories.
root@proxy1: # apt-get install haproxy keepalived
root@proxy2: # apt-get install haproxy keepalived
On Squeeze, this will also install ipvsadm and ask you to run dpkg-reconfigure to enable it, but this can safely be ignored since Keepalived will load the parts it needs itself.
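If you want to check what the Squeeze packages gave you (the vrrp_script block used below needs keepalived 1.1.13 or newer), both daemons can print their version:
root@proxy1: # haproxy -v
root@proxy1: # keepalived -v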
Configure HAProxy and Keepalived
UPDATE: Adjusted the HAProxy config as per input from Willy Tarreau, resulting in a nice extra 1000 req/s of throughput.
This is a very basic setup to get the load balancing and failover running.
/etc/keepalived/keepalived.conf on proxy1:
vrrp_script chk_haproxy {           # Requires keepalived-1.1.13
    script "killall -0 haproxy"     # cheaper than pidof
    interval 2                      # check every 2 seconds
    weight 2                        # add 2 points of prio if OK
}

vrrp_instance VI_1 {
    interface eth0
    state MASTER
    virtual_router_id 51
    priority 101                    # 101 on master, 100 on backup
    virtual_ipaddress {
        11.22.33.44                 # supply your own spare public ip
    }
    track_script {
        chk_haproxy
    }
}
/etc/keepalived/keepalived.conf on proxy2:
vrrp_script chk_haproxy {           # Requires keepalived-1.1.13
    script "killall -0 haproxy"     # cheaper than pidof
    interval 2                      # check every 2 seconds
    weight 2                        # add 2 points of prio if OK
}

vrrp_instance VI_1 {
    interface eth0
    state MASTER
    virtual_router_id 51
    priority 100                    # 101 on master, 100 on backup
    virtual_ipaddress {
        11.22.33.44                 # supply your own spare public ip
    }
    track_script {
        chk_haproxy
    }
}
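The chk_haproxy script works because signal 0 doesn't actually send anything; killall merely reports whether a process with that name exists. You can run the same check by hand to see exactly what Keepalived sees:
root@proxy1: # killall -0 haproxy && echo "haproxy is alive" || echo "haproxy is down"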
/etc/default/haproxy on proxy1 and proxy2:
# Set ENABLED to 1 if you want the init script to start haproxy.
ENABLED=1
# Add extra flags here.
#EXTRAOPTS="-de -m 16"
/etc/haproxy/haproxy.cfg on proxy1 and proxy2:
global
    log 127.0.0.1 local0
    log 127.0.0.1 local1 notice
    #log loghost local0 info
    maxconn 4096
    #debug
    #quiet
    user haproxy
    group haproxy
    daemon

defaults
    log global
    mode http
    option httplog
    option dontlognull
    retries 3
    option redispatch
    maxconn 2000
    contimeout 5000
    clitimeout 50000
    srvtimeout 50000

listen webfarm *:80
    mode http
    stats enable
    stats auth user:pass
    balance roundrobin
    cookie SERVERID insert              # pin visitor to server
    option http-server-close            # Thanks Willy!
    option forwardfor
    option httpchk HEAD /check.txt HTTP/1.0
    # change IP to your cache server's public IP
    server webA 11.22.33.45:80 cookie A check
    # change IP to your cache server's public IP
    server webB 11.22.33.46:80 cookie B check
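Before starting or reloading HAProxy, you can have it validate the file; the -c flag only parses the configuration and exits, so it's a cheap way to catch typos on both proxies:
root@proxy1: # haproxy -c -f /etc/haproxy/haproxy.cfg
root@proxy2: # haproxy -c -f /etc/haproxy/haproxy.cfg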
root@proxy1: # echo "net.ipv4.ip_forward = 1" >> /etc/sysctl.conf
root@proxy1: # echo "net.ipv4.ip_nonlocal_bind = 1" >> /etc/sysctl.conf
root@proxy1: # sysctl -p
net.ipv4.ip_forward = 1
net.ipv4.ip_nonlocal_bind = 1
root@proxy1: #
Edit /etc/sysctl.conf on proxy2:
root@proxy2: # echo "net.ipv4.ip_forward = 1" >> /etc/sysctl.conf
root@proxy2: # echo "net.ipv4.ip_nonlocal_bind = 1" >> /etc/sysctl.conf
root@proxy2: # sysctl -p
net.ipv4.ip_forward = 1
net.ipv4.ip_nonlocal_bind = 1
root@proxy2: #
Configure cache servers / backends
Make sure that your cache servers respond on port 80 of the IPs assigned to them in the HAProxy config on the load balancers.
Make sure they serve a file named check.txt, for example at 11.22.33.45:80/check.txt. The contents are not important, but since they will be serving it every 2 seconds, just put "check ok" inside it, or a simple "1" if you're really anal about optimization. Which you should be :)
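For example, on a stock Debian Apache (assuming the default /var/www document root; adjust the path to whatever your cache server actually serves on port 80), followed by the same kind of HEAD request HAProxy will make:
root@cache1: # echo "check ok" > /var/www/check.txt
root@cache1: # curl -I http://11.22.33.45/check.txt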
Start services
Start services on proxy1
root@proxy1: # service haproxy start
root@proxy1: # service keepalived start
Start services on proxy2
root@proxy2: # service haproxy start
root@proxy2: # service keepalived start
Verify that the system is up
Check your network on proxy1
root@proxy1: # ip addr sh eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:11:22:33:44:55 brd ff:ff:ff:ff:ff:ff
    inet 11.22.33.43/27 brd 11.22.33.63 scope global eth0
    inet 11.22.33.44/32 scope global eth0
       valid_lft forever preferred_lft forever
Check your network on proxy2
root@proxy2: # ip addr sh eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:11:22:33:44:54 brd ff:ff:ff:ff:ff:ff
    inet 11.22.33.42/27 brd 11.22.33.63 scope global eth0
       valid_lft forever preferred_lft forever
It's working! Proxy1 is now master and has the shared IP assigned as an alias. It assumes the master role by default since it has the higher priority defined in keepalived.conf.
To verify that the connection to your backend / cache servers works, you'll have to edit the hosts file on your local machine, adding an entry for one of your backend domains that points to 11.22.33.44. After flushing your DNS cache, you should be able to browse your site. If it doesn't work, check that haproxy.cfg points to the correct cache/backend and that your backend server is up. Also check that your hosts override actually took effect.
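To make that concrete, an example hosts entry using the benchmark domain from further down as a stand-in for your real one:
11.22.33.44    www.mydomain.com
A quick header check should then show HAProxy answering, including a Set-Cookie: SERVERID=... header from the cookie insert line in haproxy.cfg:
$ curl -I http://www.mydomain.com/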
Verify the failover function
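To provoke a failover you can, for example, take the interface down on proxy1 (do this from the console rather than over SSH, since it will cut you off) or, less drastically, stop HAProxy so that the chk_haproxy track_script lowers proxy1's priority:
root@proxy1: # ifdown eth0            # simulate a dead network/server
root@proxy1: # service haproxy stop   # softer test: the failed check drops the priority bonus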
Watch /var/log/messages on proxy2 while stopping the network on proxy1 and you should see something along the lines of this:
root@proxy2: # tail -f /var/log/messages
Sep 28 15:12:28 proxy-02 Keepalived_vrrp: VRRP_Instance(VI_1) Received lower prio advert, forcing new election
Sep 28 15:12:56 proxy-02 Keepalived_vrrp: VRRP_Instance(VI_1) Received higher prio advert
Sep 28 15:12:56 proxy-02 Keepalived_vrrp: VRRP_Instance(VI_1) Entering BACKUP STATE
Sep 28 16:29:45 proxy-02 Keepalived_vrrp: VRRP_Instance(VI_1) Transition to MASTER STATE
Sep 28 16:29:46 proxy-02 Keepalived_vrrp: VRRP_Instance(VI_1) Entering MASTER STATE
Check your network
root@proxy2: # ip addr sh eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:11:22:33:44:54 brd ff:ff:ff:ff:ff:ff
    inet 11.22.33.42/27 brd 11.22.33.63 scope global eth0
    inet 11.22.33.44/32 scope global eth0
       valid_lft forever preferred_lft forever
It's working. Proxy2 has now taken over the role of master and activated the IP.
Now watch the messages log again and re-enable networking on proxy1:
root@proxy2: # tail -f /var/log/messages
Sep 28 16:29:45 proxy-02 Keepalived_vrrp: VRRP_Instance(VI_1) Transition to MASTER STATE
Sep 28 16:29:46 proxy-02 Keepalived_vrrp: VRRP_Instance(VI_1) Entering MASTER STATE
Sep 28 16:29:55 proxy-02 mpt-statusd: detected non-optimal RAID status
Sep 28 16:32:52 proxy-02 Keepalived_vrrp: VRRP_Instance(VI_1) Received lower prio advert, forcing new election
Sep 28 16:32:53 proxy-02 Keepalived_vrrp: VRRP_Instance(VI_1) Received higher prio advert
Sep 28 16:32:53 proxy-02 Keepalived_vrrp: VRRP_Instance(VI_1) Entering BACKUP STATE
Check your network
root@proxy2: # ip addr sh eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:11:22:33:44:54 brd ff:ff:ff:ff:ff:ff
    inet 11.22.33.42/27 brd 11.22.33.63 scope global eth0
       valid_lft forever preferred_lft forever
It's working. Proxy1 has now taken over the role of master and activated the IP.
Benchmarking
A quick ApacheBench run from a separate test server, fetching a small static file through the load balancers:
testserver1:~# ab -k -n 10000 -c 1000 http://www.mydomain.com/favicon.ico
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking www.mydomain.com (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests
Server Software: Apache/2.2.16
Server Hostname: www.mydomain.com
Server Port: 80
Document Path: /favicon.ico
Document Length: 894 bytes
Concurrency Level: 1000
Time taken for tests: 1.474 seconds
Complete requests: 10000
Failed requests: 0
Write errors: 0
Keep-Alive requests: 10000
Total transferred: 14118199 bytes
HTML transferred: 8940000 bytes
Requests per second: 6783.93 [#/sec] (mean)
Time per request: 147.407 [ms] (mean)
Time per request: 0.147 [ms] (mean, across all concurrent requests)
Transfer rate: 9353.22 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 1 5.3 0 30
Processing: 23 79 88.0 29 706
Waiting: 23 79 88.0 29 706
Total: 23 80 88.6 29 706
Percentage of the requests served within a certain time (ms)
50% 29
66% 93
75% 119
80% 127
90% 204
95% 244
98% 266
99% 295
100% 706 (longest request)
Deploying
To put this all to work, all you need to do is alter your DNS records and point them towards the shared proxy IP, where you would normally point them to your backend / cache server.
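In zone-file terms (using the benchmark domain as a stand-in for your real one), the A record simply ends up pointing at the shared IP:
www.mydomain.com.    IN    A    11.22.33.44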
Monitoring
To get an overview of HAProxy's current statistics, log in to your stats page using the stats auth credentials from haproxy.cfg. With the config above, the stats are served on the webfarm listener itself at HAProxy's default stats URI (/haproxy?stats); if you want them on a dedicated port such as 11.22.33.44:1936, add a separate stats listener.
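A minimal sketch of such a listener, reusing the user:pass credentials from above; append it to haproxy.cfg on both proxies and reload:
listen stats *:1936
    mode http
    stats enable
    stats uri /
    stats auth user:pass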
Troubleshooting
- Replace all the example IPs with real addresses
- Run keepalived with the -d flag
- Check your network (a few quick checks are sketched after this list)
- Check that the cache/backend servers listen on the IPs you assigned in haproxy.cfg
- Verify that you can reach the shared IP from the outside
- Tail those logs: /var/log/daemon.log and /var/log/messages
- Read the documentation!
- Low performance? Check that your virtualization network drivers can handle the load. See this blog post: http://www.networkredux.com/blog/view/1346
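A few of the checks above as concrete commands, using the example addresses from this post:
root@proxy1: # ip addr sh eth0 | grep 11.22.33.44      # which proxy currently holds the shared IP?
root@proxy1: # tcpdump -n -i eth0 ip proto vrrp        # are VRRP advertisements flowing on the wire?
root@proxy1: # curl -I http://11.22.33.45/check.txt    # does the backend answer the health check URL?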
Gotchas
If all else fails and that shared IP won't respond to your pings from the outside - check with your hosting admin and see if your VLAN is configured correctly. I spent a few hours debugging everything without result and the day after, our provider checked and lo and behold - the VLAN was misconfigured. Click-clickety-clack, ten seconds later everything was working as intended.
Using ACLs in Varnish? Fail much? client.ip will now always be the same as your proxy IP.
Use the req.http.X-Forwarded-For header (which option forwardfor makes HAProxy set) and match your allowed addresses against it instead (a rough sketch follows below). Or write a VMOD.
Check this blog post for more info: http://zcentric.com/2012/03/16/varnish-acl-with-x-forwarded-for-header/
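A rough sketch of what that can look like in Varnish 2.x/3.x-era VCL. The /admin path and the address 192.0.2.10 are made-up examples, and this only illustrates the string-matching approach from the post above, not a drop-in ACL replacement:
# in vcl_recv: check the header set by the load balancers instead of client.ip,
# which now only ever shows the proxy's address
if (req.url ~ "^/admin" && req.http.X-Forwarded-For !~ "(^|[ ,])192\.0\.2\.10$") {
    error 403 "Forbidden";
}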