Friday, September 28, 2012

HAProxy and Keepalived on Debian Squeeze for failover and loadbalancing

Building a failover load balancing cluster on four machines with HAProxy and Keepalived in Debian Squeeze

So you've got a big-ass VMware machine with some servers to spare? Let's put them to work creating that redundancy your boss always nags you about whenever there is a split second of downtime. In this post, I'll dive into how you can build a basic load-balancing high-availability cluster, either on VMware or with separate bare-metal servers.

I've made a few assumptions about network topology and services based on my own server environment and the services I work with, which are mainly Drupal servers and Varnish cache servers. I usually run my Drupal backends on their own servers, fronted by a Varnish server on its own box. I run Debian 5 (Lenny) and Debian 6 (Squeeze). Other than that, it's all pretty basic stuff.

What I'll build is a solution that will project one server to the outside but on the inside it will consist of four servers, in essence a small high-availability cluster. The cluster can be extended infinitely in all levels should the need arise. 




High availability cluster



The pros of this kind of setup are redundancy, load balancing, ease of maintenance and the ability to proxy basically any service that runs over TCP, not just HTTP.

The cons are that it can take quite a bit of debugging if anything starts behaving funky and, obviously, it requires a few spare servers and IPs.


Outline

There are a few steps we need to go through to get this up and running:
  1. Install HAProxy and Keepalived
  2. Configure HAProxy and Keepalived
  3. Configure sysctl
  4. Configure cache servers/backends
  5. Start services
  6. Verify that the system is up
  7. Verify failover function


The tools

To provide redundancy I'll be using Keepalived, a simple and robust Linux routing software written in C that provides failover functionality via VRRP, the Virtual Router Redundancy Protocol. Keepalived also provides layer 4 load balancing. Keepalived is responsible for maintaining the shared public IP and determining which server is alive.

To provide true layer 7 load balancing I will be using HAProxy. HAProxy is a fast, free and reliable TCP load balancing, proxying and high availability software that provides us with the parts needed to finish our cluster. HAProxy determines the health of the backends, removing any one that fails, and distributes the load between them. HAProxy also provides sticky sessions through cookies that pin each visitor to its own backend.
HAProxy is also light on resources, easily handling thousands of connections on cheap hardware.


To make it all happen, I'm using four servers. Two will be serving as load balancers and two as cache servers. The load balancers run HAProxy and Keepalived and the cache servers run Varnish and Apache (which you could replace with Nginx or whatever). The reason the cache servers need to run Apache as well as Varnish is because they need to be able to serve a file on their default IP and Varnish doesn't handle that (yet). Of course, many are running backend and cache server on the same box so in that case, this is not a problem.

The idea of it all is to have the load balancers sharing one IP in a master/backup setup. Since they share one IP there will only be one server visible to the outside at any one time. Should one of them fail, the other instantly picks up the IP and resumes business. For this to work, we need to use the Keepalived daemon.

For the load balancing part, we use HAProxy which works in tandem with Keepalived to assure that the backup server takes over if HAProxy should fail. HAProxy provides another layer of failover by monitoring our cache servers and removing them if they fail to respond.

The servers are:
  • proxy1 haproxy, keepalived, ip 11.22.33.43 (sharing ip 11.22.33.44)
  • proxy2 haproxy, keepalived, ip 11.22.33.42 (sharing ip 11.22.33.44)
  • cache1 varnish, apache2, ip 11.22.33.45
  • cache2 varnish, apache2, ip 11.22.33.46


Install HAProxy and Keepalived

I won't cover installation of web servers and caching servers. 
For the load balancers you need to install HAProxy and Keepalived, which are available in the usual deb repositories.

root@proxy1: # apt-get install haproxy keepalived

root@proxy2: # apt-get install haproxy keepalived

On Squeeze, this will also install ipvsadm and ask you to run dpkg-reconfigure to enable it but this can safely be ignored since Keepalived will load necessary parts from it.


Configure HAProxy and Keepalived

UPDATE: Adjusted HAProxy config as per input from Willy Tarreau, resulting in a nice extra 1000 req/s throughput increase
This is a very basic setup to get the load balancing and failover running

/etc/keepalived/keepalived.conf on proxy1:

vrrp_script chk_haproxy {           # Requires keepalived-1.1.13
        script "killall -0 haproxy"     # cheaper than pidof
        interval 2                      # check every 2 seconds
        weight 2                        # add 2 points of prio if OK
}

vrrp_instance VI_1 {
        interface eth0
        state MASTER
        virtual_router_id 51
        priority 101                    # 101 on master, 100 on backup
        virtual_ipaddress {
            11.22.33.44                 # supply your own spare public ip 
        }
        track_script {
            chk_haproxy
        }
}

/etc/keepalived/keepalived.conf on proxy2:

vrrp_script chk_haproxy {           # Requires keepalived-1.1.13
        script "killall -0 haproxy"     # cheaper than pidof
        interval 2                      # check every 2 seconds
        weight 2                        # add 2 points of prio if OK
}

vrrp_instance VI_1 {
        interface eth0
        state MASTER
        virtual_router_id 51
        priority 100                    # 101 on master, 100 on backup
        virtual_ipaddress {
            11.22.33.44                 # supply your own spare public ip 
        }
        track_script {
            chk_haproxy
        }
}
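
To see why the backup takes over when HAProxy dies on the master, it helps to work through the priority arithmetic from the two configs above. This is only a sketch: effective_prio is a made-up helper for illustration, not part of Keepalived, and it assumes the positive weight is added only while the chk_haproxy script succeeds, as the config comment says.

```shell
#!/bin/sh
# effective_prio is a hypothetical helper, not a keepalived command.
# It returns the base priority plus the track_script weight while the
# health check succeeds, and the bare base priority otherwise.
effective_prio() {
    base=$1
    weight=$2
    check=$3
    if [ "$check" = "ok" ]; then
        echo $((base + weight))
    else
        echo "$base"
    fi
}

echo "proxy1, haproxy up:   $(effective_prio 101 2 ok)"
echo "proxy2, haproxy up:   $(effective_prio 100 2 ok)"
echo "proxy1, haproxy down: $(effective_prio 101 2 fail)"
```

With HAProxy running on both boxes, proxy1 wins with 103 against 102. If HAProxy stops on proxy1 its priority falls back to 101, which is lower than proxy2's 102, so proxy2 should claim the shared IP.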

/etc/default/haproxy on proxy1 and proxy2:

# Set ENABLED to 1 if you want the init script to start haproxy.
ENABLED=1
# Add extra flags here.
#EXTRAOPTS="-de -m 16"

/etc/haproxy/haproxy.cfg on proxy1 and proxy2:

global
        log 127.0.0.1   local0
        log 127.0.0.1   local1 notice
        #log loghost    local0 info
        maxconn 4096
        #debug
        #quiet
        user haproxy
        group haproxy
        daemon

defaults
        log     global
        mode    http
        option  httplog
        option  dontlognull
        retries 3
        option redispatch
        maxconn 2000
        contimeout      5000
        clitimeout      50000
        srvtimeout      50000

listen webfarm *:80
       mode http
       stats enable
       stats auth user:pass
       balance roundrobin
       cookie SERVERID insert # pin visitor to server
       option http-server-close # Thanks Willy!
       option forwardfor
       option httpchk HEAD /check.txt HTTP/1.0
       # change IP to your first cache server's public IP
       server webA 11.22.33.45:80 cookie A check
       # change IP to your second cache server's public IP
       server webB 11.22.33.46:80 cookie B check


Configure sysctl

Edit /etc/sysctl.conf on proxy1:

root@proxy1: # echo "net.ipv4.ip_forward = 1" >> /etc/sysctl.conf
root@proxy1: # echo "net.ipv4.ip_nonlocal_bind = 1" >> /etc/sysctl.conf


root@proxy1: # sysctl -p
net.ipv4.ip_forward = 1
net.ipv4.ip_nonlocal_bind = 1
root@proxy1: #




Edit /etc/sysctl.conf on proxy2:

root@proxy2: # echo "net.ipv4.ip_forward = 1" >> /etc/sysctl.conf
root@proxy2: # echo "net.ipv4.ip_nonlocal_bind = 1" >> /etc/sysctl.conf


root@proxy2: # sysctl -p
net.ipv4.ip_forward = 1
net.ipv4.ip_nonlocal_bind = 1
root@proxy2: #
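
One thing to watch with the echo >> approach is that running it twice leaves duplicate lines in sysctl.conf. A minimal sketch of an idempotent variant follows. ensure_line is a hypothetical helper name, and the CONF variable defaulting to a scratch file is my own assumption so the snippet is safe to try before pointing it at the real file.

```shell
#!/bin/sh
# ensure_line is a hypothetical helper: append a line to a file only if
# that exact line is not already present.
ensure_line() {
    line=$1
    file=$2
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Defaults to a scratch file; run with CONF=/etc/sysctl.conf as root
# to apply it for real, then reload with: sysctl -p
CONF=${CONF:-$(mktemp)}
ensure_line "net.ipv4.ip_forward = 1" "$CONF"
ensure_line "net.ipv4.ip_nonlocal_bind = 1" "$CONF"
```

Running it any number of times should leave exactly one copy of each setting in the file.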


Configure cache servers / backends

Make sure that your cache servers respond on port 80 of their IP as assigned in your haproxy config on the load balancers. 

Make sure they serve a file named check.txt, for example at 11.22.33.45:80/check.txt. The contents are not important but since each load balancer will be requesting it every 2 seconds (HAProxy's default check interval), just put "check ok" inside it or a simple "1" if you're really anal about optimizations. Which you should be :)
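
Creating the check file could look something like the sketch below. make_check_file is a hypothetical helper, and the DOCROOT default of /var/www is an assumption about where your web server serves files from, so adjust it to your own setup.

```shell
#!/bin/sh
# make_check_file is a hypothetical helper that writes the health check
# file into the given document root.
make_check_file() {
    docroot=$1
    printf 'check ok\n' > "$docroot/check.txt"
}

# DOCROOT is an assumption; point it at your own document root.
DOCROOT=${DOCROOT:-/var/www}
if [ -d "$DOCROOT" ] && [ -w "$DOCROOT" ]; then
    make_check_file "$DOCROOT"
fi

# Then, from one of the load balancers, mimic the HEAD request that
# HAProxy sends:
#   curl -I http://11.22.33.45/check.txt
```

A 200 response to the HEAD request is what HAProxy needs to see to keep the server in rotation.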


Start services

Start services on proxy1

root@proxy1: # service haproxy start
root@proxy1: # service keepalived start

Start services on proxy2

root@proxy2: # service haproxy start
root@proxy2: # service keepalived start



Verify that the system works

Check your network on proxy1

root@proxy1: # ip addr sh eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:11:22:33:44:55 brd ff:ff:ff:ff:ff:ff
    inet 11.22.33.43/27 brd 11.22.33.63 scope global eth0
    inet 11.22.33.44/32 scope global eth0
       valid_lft forever preferred_lft forever

Check your network on proxy2

root@proxy2: # ip addr sh eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:11:22:33:44:54 brd ff:ff:ff:ff:ff:ff
    inet 11.22.33.42/27 brd 11.22.33.63 scope global eth0
       valid_lft forever preferred_lft forever



It's working! Proxy1 is now master and has the shared IP assigned as an alias. It assumes the master role by default since it has a higher priority defined in keepalived.conf.
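
If you'd rather not eyeball the output every time, the same check can be scripted. The sketch below is just one way to do it: has_vip is a hypothetical helper that reads ip addr output on stdin and succeeds when the given address is assigned.

```shell
#!/bin/sh
# has_vip is a hypothetical helper: it reads `ip addr` output on stdin
# and exits 0 if the given IPv4 address is listed, non-zero otherwise.
has_vip() {
    grep -qF -- "inet $1/"
}

# Usage on either load balancer (interface and IP as used in this post):
#   ip addr sh eth0 | has_vip 11.22.33.44 && echo "holding shared IP" || echo "not holding shared IP"
```

Run on both load balancers, exactly one of them should report that it holds the shared IP at any given time.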

To verify that the connection to your backend / cache servers works, edit the hosts file on your local machine and add an entry that points one of your backend domains to 11.22.33.44. After flushing your DNS cache, you should be able to browse your site. If it doesn't work, check that haproxy.cfg points to the correct cache/backend and that your backend server is up. Also check that your hosts entry has taken effect.


Verify the failover function

Watch /var/log/messages on proxy2 while stopping the network on proxy1 and you should see something along the lines of this:

root@proxy2: # tail -f /var/log/messages
Sep 28 15:12:28 proxy-02 Keepalived_vrrp: VRRP_Instance(VI_1) Received lower prio advert, forcing new election
Sep 28 15:12:56 proxy-02 Keepalived_vrrp: VRRP_Instance(VI_1) Received higher prio advert
Sep 28 15:12:56 proxy-02 Keepalived_vrrp: VRRP_Instance(VI_1) Entering BACKUP STATE
Sep 28 16:29:45 proxy-02 Keepalived_vrrp: VRRP_Instance(VI_1) Transition to MASTER STATE
Sep 28 16:29:46 proxy-02 Keepalived_vrrp: VRRP_Instance(VI_1) Entering MASTER STATE

Check your network

root@proxy2: # ip addr sh eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:11:22:33:44:54 brd ff:ff:ff:ff:ff:ff
    inet 11.22.33.42/27 brd 11.22.33.63 scope global eth0
    inet 11.22.33.44/32 scope global eth0
       valid_lft forever preferred_lft forever


It's working. Proxy2 has now taken over the role of master and activated the IP.

Now watch the messages log again and re-enable networking on proxy1.

root@proxy2: # tail -f /var/log/messages
Sep 28 16:29:45 proxy-02 Keepalived_vrrp: VRRP_Instance(VI_1) Transition to MASTER STATE
Sep 28 16:29:46 proxy-02 Keepalived_vrrp: VRRP_Instance(VI_1) Entering MASTER STATE
Sep 28 16:29:55 proxy-02 mpt-statusd: detected non-optimal RAID status
Sep 28 16:32:52 proxy-02 Keepalived_vrrp: VRRP_Instance(VI_1) Received lower prio advert, forcing new election
Sep 28 16:32:53 proxy-02 Keepalived_vrrp: VRRP_Instance(VI_1) Received higher prio advert
Sep 28 16:32:53 proxy-02 Keepalived_vrrp: VRRP_Instance(VI_1) Entering BACKUP STATE


Check your network

root@proxy2: # ip addr sh eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:11:22:33:44:54 brd ff:ff:ff:ff:ff:ff
    inet 11.22.33.42/27 brd 11.22.33.63 scope global eth0
       valid_lft forever preferred_lft forever

It's working. Proxy1 has now taken over the role of master and activated the IP.


Benchmarking

testserver1:~# ab -k -n 10000 -c 1000 http://www.mydomain.com/favicon.ico
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking www.mydomain.com (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests


Server Software:        Apache/2.2.16
Server Hostname:        www.mydomain.com
Server Port:            80

Document Path:          /favicon.ico
Document Length:        894 bytes


Concurrency Level:      1000
Time taken for tests:   1.474 seconds
Complete requests:      10000
Failed requests:        0
Write errors:           0
Keep-Alive requests:    10000
Total transferred:      14118199 bytes
HTML transferred:       8940000 bytes
Requests per second:    6783.93 [#/sec] (mean)
Time per request:       147.407 [ms] (mean)
Time per request:       0.147 [ms] (mean, across all concurrent requests)
Transfer rate:          9353.22 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    1   5.3      0      30
Processing:    23   79  88.0     29     706
Waiting:       23   79  88.0     29     706
Total:         23   80  88.6     29     706

Percentage of the requests served within a certain time (ms)
  50%     29
  66%     93
  75%    119
  80%    127
  90%    204
  95%    244
  98%    266
  99%    295
 100%    706 (longest request)


Deploying

To put this all to work, all you need to do is alter your DNS records and point them at your shared proxy IP where you would normally point to your backend / cache server.


Monitoring

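To get an overview of HAProxy's current statistics, log in to the stats page with the credentials from the stats auth line in haproxy.cfg. With the configuration shown earlier, stats enable sits in the webfarm listener, so the page is served on port 80 at HAProxy's default stats URI: http://11.22.33.44/haproxy?stats. A dedicated port such as 11.22.33.44:1936 only works if you add a separate listener for it.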


HAProxy statistics overview



Troubleshooting

  • Replace all the example IPs with real addresses
  • Run keepalived with the -d flag
  • Check your network
  • Check that the cache/backend servers listen on the IPs you assigned in haproxy.cfg
  • Verify that you can reach the shared IP from the outside
  • Tail those logs.. /var/log/daemon.log /var/log/messages
  • Read the documentation!
  • Low performance? Check that your virtualization network drivers can handle the load. See this blog post - http://www.networkredux.com/blog/view/1346



Gotchas

If all else fails and that shared IP won't respond to your pings from the outside, check with your hosting admin and see if he has configured your VLAN correctly. I spent a few hours debugging everything without result and the day after, our provider checked and lo and behold, the VLAN was misconfigured. Click-clickety-clack, ten seconds later everything was working as intended.

Using ACLs in Varnish? Fail much? client.ip will now always be the same as your proxy IP, because every request reaches Varnish from the load balancer.
Use req.http.X-Forwarded-For (set by "option forwardfor" in haproxy.cfg) and match the addresses line for line instead. Or write a VMOD.
Check this blog post for more info: http://zcentric.com/2012/03/16/varnish-acl-with-x-forwarded-for-header/




Wednesday, March 28, 2012

Rendering fields correctly in Drupal 7

While searching for some field api stuff I stumbled on this rather good introduction to the Field API in Drupal 7 and specifically on how to read field contents in a safe manner. 

If you (as I did) read the excerpt below and feel a little guilty, then you should most definitely read the full article.


You may well have seen (or written!) code that looks something like this:

 
// This is a WRONG example.
$block['content'] = $node->field_name['und'][0]['safe_value'];

Poking around the node object for the value you wanted to print was fairly common in Drupal 6, and the 'safe_value' sounds like it's been sanitised, right? What's wrong with that? Oh, Let me count the ways.
  1. Firstly, the ['und'] element is part of the field localisation in Drupal 7 (see this article from Gábor Hojtsy for more on that); directly accessing that value will cause issues in any kind of multi-lingual environment. Boo.
  2. By accessing the field value directly you miss out on any theming that might come courtesy of the normal field markup.
  3. The [0][safe_value] explicitly accesses the first value of the field - if you wanted every value from a multi-value field you'd need to do some sort of loop.
  4. Some fields (such as node references) won't have a safe_value element, only a value, which can easily be printed without thought for sanitisation. This is dangerous, not because node reference fields contain dangerous data (they're just a nid), but because it's not a helpful habit to get into, especially for new developers. The 'value' of other field types may well be highly dangerous.

Thanks, Stephen. I hereby vow never to repeat my sins against Drupal. Honestly.

Saturday, February 18, 2012

Site performance - analyze and improve

Looking back at the Drupal site I built two years ago for my current employer, it's starting to look a little old and worn. Sure, we're running Varnish, we have purge rules, we have memcached for the backend and we've shut off basically all CPU hogs. But what about the visitor's side? Apart from getting a speedy response from our cache servers, there has to be more we can do. Well, load times, page rendering speed and total data size seem like good targets for improvement since they also provide (via Firebug, YSlow and New Relic) great metrics.

Record, improve, measure, profit? 

We start by collecting metrics on site usage so we can set goals for the improvements. We use a mix-and-match approach to this, utilizing tools like Yahoo's YSlow, Pingdom's Pagetest, New Relic and Firebug. Mind you, we had to disable a few rules in YSlow since they're really not applicable for us: the CDN rules (hey, we don't have a CDN), ETags (we have them but they're default Apache and YSlow hates them), Expires headers (yeah, we don't want to set them 3 years into the future) and DNS lookups (friggin' external services are really DNS-expensive).

The pages we're measuring typically show one full article, 3 banners (flash/jpg/whatever) and up to 150 teasers consisting of an image and up to 200 characters text. Insane amount of teasers.

Initial metrics
Total load time: 8 sec
Waiting for external services: 3-12 sec
Data size: 2.5 MB
DOM elements: 1377
YSlow score: D (67)


Changes

1) GZIP GZIP GZIP
Apparently, somewhere in the past we had to turn the mod_deflate off and never brought it back online. Bad robot! We turned it back on and are now gzipping css, js, html and anything text-based which resulted in the total data size dropping to about 2.1 megs. Still a shitload of data though.

2) Lazy-loading of images
Instead of loading 150+ images from the get-go, we only load what's visible in the visitor's browser. Anything else is loaded when it scrolls into view. This was the major improvement, bringing us down to about 1.5 megs. Still not slim enough!

3) Reduction of DOM elements
As we've progressed through the last two years, the site has been extended and fixed and extended again, resulting in a patchwork of DIVs and CSS classes. Some parts were done through Semantic Views and other parts through their own .tpl files. We went through and made sure all parts are rendered from their own .tpl so we have full control over the output. Then we got started reducing and simplifying the files, removing wrapper DIVs and extraneous containers. This brought us down to 1044 DOM elements and 1.1 MB in size.

4) Minimizing external services
We used to have a Google+ button on the site. It typically took anywhere between 3 and 10 seconds to load all of its resources. Need I say it's gone? We also used to have the Facebook activity stream with Faces enabled. It required between 50 and 70 HTTP requests to load all its resources and took up to 8 seconds to load. No more. We also used to load the ever-ubiquitous jQuery from Google's AJAX CDN to be sure we always had the latest version on the site. I guess we can live with having it lag a version or two if that's the price we have to pay for a faster site, so we cached it on our own servers instead.
We still have a Twitter button and a few Facebook boxes on the page but we'll have to live with them until we develop a better method to load them (preferably they will be loaded when the user hovers over the boxes).


Result

Final metrics
Total load time: 2 sec (-75%)
Waiting for external services: 2-3 sec
Data size: 1.1 MB (-56%)
DOM elements: 1044 (-24%)
YSlow score: B (89)
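
For anyone who wants to redo the sums on their own site, the percentages above are plain relative change between the initial and final metrics. A small sketch, where pct_change is a hypothetical helper name and awk does the floating point work:

```shell
#!/bin/sh
# pct_change is a hypothetical helper: relative change from "before" to
# "after" in percent, rounded to a whole number.
pct_change() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.0f\n", (after - before) / before * 100 }'
}

echo "Load time:    $(pct_change 8 2)%"
echo "Data size:    $(pct_change 2.5 1.1)%"
echo "DOM elements: $(pct_change 1377 1044)%"
```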

So here we are. The new site is now a lot slimmer, nearly ready for Beach 2012. Stuff that's left to consider is trimming the master CSS, adding the responsive parts to the CSS and trimming away some of the extraneous CSS classes which Drupal so loves to add. All in all, I think the result is quite good but surely there are some more optimizations left to do. Or at least I hope so, this performance hunting is addictive!


Other possible optimizations

A couple of points that could be worth researching are
  • Progressive loading of all content, not just images
  • Dividing the posts into categories, thereby reducing the number of posts on any page
  • Minifying *everything*
  • Writing our own sharing functions that don't require hefty scripts from external sources 
  • Spreading the downloads over multiple domains to reduce browser blocking


Tuesday, February 14, 2012

VCL Random url part

Random is random, usually. Unless it's being fetched every 20th minute and included as a part of a site with heavy traffic. That's a lesson I learned the hard way by relying on an ad network's RSS aggregator that included one of our RSS feeds on its pages. The feed is randomized on our end with a list of nodes that are straight out of Views via some custom theming. The nodes are grouped in threes from a larger set.

What I noticed was that over 24 hours, what should have been an even distribution turned out to be heavily weighted towards one specific set. The ad network's RSS aggregation ran on a schedule and, as bad luck would have it, Views liked to display the same set every time the RSS aggregator made a visit. Probably just bad luck, but to the customer it looked like one set of nodes was favored over the others.

As I run Varnish for caching, I started thinking about whether I could do this in another way. I thought about faux round-robin directors and complicated solutions involving DNS aliases and htaccess rewrites.

Then I stumbled upon this blogpost by Chris Davies who is just generally speaking one of the most knowledgeable techies I know of. So enter inline-C for my VCL.

sub vcl_recv{
    if (req.url == "/randomizeme") {
C{
char buff[5];
sprintf(buff, "%d", rand()%4+1);
VRT_SetHdr(sp, HDR_REQ, "\017X-Distribution:", buff, vrt_magic_string_end);
}C
        set req.url = "/randomizeme/" req.http.X-Distribution;
    }
 }

It works like this:
- Varnish picks up the request on url /randomizeme
- Varnish executes the inline C
- Varnish sets a special header with the randomized integer (the header works as a variable)
- Varnish then rewrites the url to /randomizeme/1-4
- Varnish finally fetches the above url from the backend which is rigged to display different content depending on which "slot" is chosen.
- Varnish delivers the randomized content to the visitor/aggregator/whatever

A little warning: if you change the header name "X-Distribution", the "\017" must be updated too. It is the length of the header name in octal, and the count includes the trailing ":" ("X-Distribution:" is 15 characters, which is 017 in octal). Get it wrong and you can expect trouble, so watch that log for segfaults!
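
To avoid counting characters by hand, the prefix can be computed. The sketch below assumes the length byte covers the header name plus its trailing colon, which is consistent with "\017" (15 decimal) for the 15 characters of "X-Distribution:". vrt_hdr_name is a hypothetical helper, not something Varnish ships.

```shell
#!/bin/sh
# vrt_hdr_name is a hypothetical helper: given a header name without
# the colon, print the length-prefixed string literal for VRT_SetHdr.
vrt_hdr_name() {
    name="$1:"
    # ${#name} is the character count including the colon; %03o prints
    # it as three octal digits after a literal backslash.
    printf '\\%03o%s\n' "${#name}" "$name"
}

vrt_hdr_name X-Distribution
```

Paste the printed string between the double quotes in the VRT_SetHdr call whenever you rename the header.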