Load Balancer for External Sites: Apache mod_proxy_balancer

From RARFORGE


INFO

  • DO NOT CHANGE the servers' hostnames - Pacemaker will freak out.
  • CentOS 5 (the hardware required it)


WHY?
  • Load-balance ANY external website
  • SSL offloading is required for SSL sites:
the balancer houses the SSL cert and directs traffic to the external backend on port 80
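
SSL offloading here just means the balancer vhost terminates TLS while its members speak plain HTTP. A minimal sketch (the balancer name, backend address, and cert paths are illustrative - the full working config is in the "Load Balanced WEBSITE - example" section below):

```apache
<VirtualHost 10.0.0.111:443>
 ServerName mb-vip8.domain.tld
 SSLEngine on
 SSLCertificateFile    /etc/httpd/conf/ssl.crt/selfsigned.crt
 SSLCertificateKeyFile /etc/httpd/conf/ssl.key/selfsigned.key
 <Proxy balancer://pool>
         # backend only speaks HTTP - the balancer holds the cert
         BalancerMember http://203.0.113.10:80
 </Proxy>
 ProxyPass / balancer://pool/
</VirtualHost>
```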


What's running
  • Corosync + Pacemaker cluster (uses heartbeat libs)
  • Apache + mod_proxy + mod_proxy_balancer
  • Two servers (as of 2013-03-26)
  • modbalancer1: host
  • modbalancer2: host


Troubleshooting

cluster failures

  • The cluster retries failed resources every 60s [ONLY if the cluster is ONLINE]
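
The 60-second retry behavior comes from two settings in the Pacemaker config (shown in full under Configs below); as crm shell commands they would look like:

```
crm configure property cluster-recheck-interval="60s"
crm configure rsc_defaults failure-timeout="60s"
```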

A working cluster looks like this

[root@modbalancer2 init.d]# crm_mon -rf1
============
Last updated: Wed Mar 27 14:39:08 2013
Stack: openais
Current DC: modbalancer2 - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
9 Resources configured.
============
 
Online: [ modbalancer1 modbalancer2 ]
 
Full list of resources:
 
 Clone Set: clone_httpd [httpd]
     Started: [ modbalancer2 modbalancer1 ]
 p_ip-10.0.0.104     (ocf::heartbeat:IPaddr):        Started modbalancer1
 p_ip-10.0.0.105     (ocf::heartbeat:IPaddr):        Started modbalancer2
 p_ip-10.0.0.106     (ocf::heartbeat:IPaddr):        Started modbalancer1
 p_ip-10.0.0.107     (ocf::heartbeat:IPaddr):        Started modbalancer2
 p_ip-10.0.0.108     (ocf::heartbeat:IPaddr):        Started modbalancer1
 p_ip-10.0.0.109     (ocf::heartbeat:IPaddr):        Started modbalancer2
 p_ip-10.0.0.110     (ocf::heartbeat:IPaddr):        Started modbalancer1
 p_ip-10.0.0.111     (ocf::heartbeat:IPaddr):        Started modbalancer2
 
Migration summary:
* Node modbalancer1: 
* Node modbalancer2:


nagios page: check_crm CRITICAL - Connection to cluster FAILED: connection failed

FIX: make sure both corosync and pacemaker are running


nagios page: httpd:0 httpd:1 Stopped [ Clone OK-NONE!]

FIX: the httpd configs are probably broken on both nodes. Fix any Apache config errors.
Once Apache starts properly, clean up the clustered httpd resource:
crm resource cleanup clone_httpd


UNCLEAN (offline)

  • stop pacemaker and corosync, then start them again - in that order
  • reason: in a bad failure (i.e. Pacemaker cannot get status) it would normally STONITH the node, but that is not enabled
# Stop
/etc/init.d/pacemaker stop
/etc/init.d/corosync stop
 
# Start
/etc/init.d/corosync start
/etc/init.d/pacemaker start


nagios page: httpd failure detected, fail-count=1

These will fix themselves if Apache is starting properly.
If you are impatient, you can run:
crm resource cleanup clone_httpd


httpd - crm

  • If Apache failed for a while and you are impatient, you can kick-start Pacemaker:
crm resource cleanup clone_httpd


Unison

  • Synchronisation failed : please check /root/unison.log file for diagnosis

FIX: remove /root/.unison/ar* files from both servers and resync manually

 ssh modbalancer1
 
 sudo su -
 
 cp -R /etc/httpd/ /etc/httpd.bak
 
 rm /root/.unison/ar*
 
 ssh modbalancer2 -C "rm /root/.unison/ar*"
 
 /usr/bin/unison -times=true -prefer newer -batch -auto /etc/httpd/ ssh://root@modbalancer2//etc/httpd/


inotify_watcher

  • PROCS CRITICAL: 0 processes with args '/usr/local/bin/inotify_watcher.pl'

FIX: check /etc/rc.local to make sure the startup line is there, then start the process in a screen session:

screen -dmS inotify /usr/local/bin/inotify_watcher.pl



IP Pool

All of these need to be public IP addresses - RFC1918 addresses are used as examples here
# Main block for servers - the others might be usable for Apache virts (might want to save them for additional servers?)
# 10.0.0.48/29:
10.0.0.49: secondary on Cisco Router (VLAN 321)
10.0.0.50: modbalancer1
10.0.0.51: modbalancer2
;52             IN      PTR
;53             IN      PTR
;54             IN      PTR
;55             IN      PTR
# These IP addresses can be used for Apache+mod_proxy_balancer VIRTs
# 10.0.0.104/29: Routed to Vlan 321 on Cisco Router - ip route 10.0.0.104 255.255.255.248 int vlan 321
10.0.0.104             IN      PTR     mb-vip1
10.0.0.105             IN      PTR     mb-vip2
10.0.0.106             IN      PTR     mb-vip3
10.0.0.107             IN      PTR     mb-vip4
10.0.0.108             IN      PTR     mb-vip5
10.0.0.109             IN      PTR     mb-vip6
10.0.0.110             IN      PTR     mb-vip7
10.0.0.111             IN      PTR     mb-vip8

Servers

modbalancer1

  • location: n/a
  • 10.0.0.50

modbalancer2

  • location: n/a
  • 10.0.0.51



Syncing / Configs

CRON

  • /etc/httpd auto-syncs (unison) every 5 minutes
[root@modbalancer1 /]# cat /etc/cron.d/unison
SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin 
MAILTO=root
HOME=/root/
 
*/5 * * * * root . /root/.bash_profile && /usr/local/bin/sync_configs.sh  &> /dev/null

[root@modbalancer1 /]# cat /usr/local/bin/sync_configs.sh
#!/bin/bash
/usr/bin/unison -terse -batch -auto /etc/httpd/ ssh://root@modbalancer2//etc/httpd/


Inotify

  • Inotify script watches /etc/httpd/conf/ and /etc/httpd/conf.d/ for changes.
  • If any *.conf files are created / modified / deleted, then ->
  1. Unison will sync modbalancer1 <-> modbalancer2
  2. Apache will be reloaded on both (ONLY if /etc/init.d/httpd configtest passes on both nodes)


  • /etc/rc.local startup
#!/bin/sh
 
## Inotify watcher for HTTPD configs
screen -dmS inotify /usr/local/bin/inotify_watcher.pl
 
touch /var/lock/subsys/local

  • /usr/local/bin/inotify_watcher.pl
#!/usr/bin/perl -w
use strict;
use Linux::Inotify2;
use Path::Class;
use File::stat;
use Data::Dumper;
 
my $debug=1;  
my $wait=10; ## how many seconds to wait after processing an event - lets us group events/updates
 
my @hosts = qw(modbalancer1
               modbalancer2
               );
 
## file names must match *.<ext> (from @file_ext) for events to fire
my @file_ext =  qw(conf);
 
## watched dirs - recursive
my @dirs = ("/etc/httpd/conf.d/",
            "/etc/httpd/conf",
            );
 
## This requires the script from prowl app
## url: http://prowlapp.com/static/prowl.pl
my $prowl = {'enabled' => '0',
             'script' => '/usr/local/bin/prowl.pl',
             'api_key' => '<enter prowl api key>',
             'app_name' => 'MOD Balancer',
         };
 
####################################################### Configuration DONE #######################################################
 
my $inotify = Linux::Inotify2->new
    or die "unable to create new inotify object: $!";
 
my $host=`hostname`; ## may be used later
my $pid=$$;
 
## on start - notify that we are initializing
 
&notifyProwl('Initializing','program startup');
 
## setup initial watch dirs
my $dirs =0;
my $restart_httpd=0;
my @start_list;  ## used for notifications
my @finish_list; ## used for notifications
my @remove_list; ## used for notifications
 
## Initialize watched dirs - INIT
foreach my $drop_dir(@dirs) {
    &log("Processing subdirs for: $drop_dir",1);
    my $dir = dir($drop_dir);
    my $c = 0;
    $dir->traverse(sub{
        my ($child, $cont, $indent) = @_;
        if ($child->is_dir) {
            $dirs++;
            &log("subdir: $child",1);
            $inotify->watch("$child",  IN_CLOSE_WRITE | IN_CREATE | IN_MOVED_TO |  IN_DELETE , \&MyWatch);
 
            ## IN_MODIFY is too verbose - it fires on every change before the final close
            ## IN_CLOSE_WRITE == after a file is written to and closed
            ## IN_CREATE == new directories/files
            ## IN_MOVED_TO == moved directories/files (into watched dirs only)
        }
        $cont->($c + 1);
    });
}
 
## log/notify status of watched dirs
my $init_s = "Watching $dirs directories";
&log($init_s,1);
&notifyProwl('Initialized',$init_s);
 
## LOOP to keep watching dirs/files
while (1) {
    $inotify->poll;
    my $had_event = 0;
    ## notify on sync start and finish
 
    if (@start_list)  {  
        my $info = join(', ', @start_list);
        my $title = 'file Started';
        &notifyProwl($title,$info);
        @start_list = ();
    }
    if (@finish_list) {  
        my $info = join(', ', @finish_list);
        my $title = 'file Finished';
        &notifyProwl($title,$info);
        @finish_list = ();
    }
    if (@remove_list)  { 
        &notifyProwl('Removed',join(', ', @remove_list));  
        @remove_list = ();
    }
 
    if ($restart_httpd) {
        $had_event = 1;
        $restart_httpd = 0;
        &restart_HTTPD();
    }
 
    if ($had_event) {
        &log("Sleeping '$wait' seconds before processing any new events");
        sleep($wait);
        &log("I\'m awake again - ready to process any new events");
    }
 
}
 
 
sub restart_HTTPD() {
    &log("Unison sync && Restarting HTTPD");
    &notifyProwl('unison-HTTPD','sync-restart');
    my $out;
    $out = `/usr/local/bin/sync_configs.sh  2>&1`;
 
    ## Verify configs on Node 1 and Node 2
    my $failed;
    foreach my $host (@hosts) {
        my $check = `ssh $host /etc/init.d/httpd configtest 2>&1`;
        if ($check !~ /Syntax OK/i) {
            $failed .= "$host: $check\n";
        }
    }    
 
    if (!$failed) {
        ## config passed - reload apache on both 
        foreach my $host (@hosts) {
            $out .= "$host: ";
            $out .= `ssh $host -C /etc/init.d/httpd reload  2>&1`;
        }
    } else {
        $out = "HTTPD Config FAIL: cannot reload - $failed";
    }
    &log($out);
    &notifyProwl('unison-HTTPD',$out);
}
 
sub MyWatch() {
    my $event = shift;
    my $name = $event->fullname;
    my $file_name = $event->name;
    my $log = 'unknown';
 
    ## files to skip -- for now, emacs turds
    if ($file_name =~ /^\..*/i || $file_name =~ /^\#.*/i ) { 
        $log =  "SKIPPING: $name  [$file_name]";
    }
 
    ## continue on..
    elsif ($event->IN_IGNORED) {
        ## this is a DIR action
        ## remove watch - if directory
        $event->w->cancel;  ## cancel watch
        $log = "DIR $name removed -- cancelling watch";
        foreach my $ext (@file_ext) {
            if ($name =~ /\.$ext$/i) {    push (@remove_list,$name);     }
        }
    } elsif ( $event->IN_DELETE) {
        ## this is a FILE action
        ## group deletes together - for notify
        $log = "FILE: $name removed";
        foreach my $ext (@file_ext) {
            $log = "FILE: $name removed - will notify prowl";
            if ($name =~ /\.$ext$/i) { 
                $restart_httpd = 1; ## restart apache - conf file removed
                push (@remove_list,$name);    
            }
        }
    } else {
        if (-d $name) {
            $inotify->watch($name,  IN_CLOSE_WRITE | IN_CREATE | IN_MOVED_TO , \&MyWatch);
            $log = "DIR: $name created -- adding to watchlist";
            &notifyProwl('new',$log);
        } elsif (-f $name && $event->IN_CREATE) {
            ## file is created, but has not finished writing - no action needed -  just logging
            $log = "$name is FILE -- in_create called.. waiting for IN_CLOSE_WRITE to process";
            foreach my $ext (@file_ext) {
                ## push new files to start_list (sync started notifications)
                ## skip notify on this
                #if ($file_name =~ /\.$ext$/i) {    push (@start_list,$name);     }
            }
        } else {
            $log = "$name is IN_CLOSE_WRITE" if $event->IN_CLOSE_WRITE;
            $log = "$name is IN_CREATE" if $event->IN_CREATE;
            $log = "$name is IN_MOVED_TO" if $event->IN_MOVED_TO;
            $log = "events for $name have been lost" if $event->IN_Q_OVERFLOW;
            if ($file_name =~ /^\..*\.\w{5}$/) {
                &log("$file_name must be rsync - skip it");  
            } else {
                ## only update on ext matching @FILES
                foreach my $ext (@file_ext) {
                    ## push new files to finish_list (sync finished notifications)
                    if ($file_name =~ /\.$ext$/i) {
                        push (@finish_list,"$name");
                        &log("$name $ext is matched file -- restart httpd");
                        $restart_httpd = 1; ## restart apache
                    }
                }
                if (!$restart_httpd) {&log("$file_name does not match ext of \@files_ext -- skipping httpd restart");   }
            }
        }
    }
    &log($log);
}
 
 
sub log() {
    my $msg = shift;
    my $print= shift;
    if ($debug || $print) {
        print localtime() . ": $msg\n";    
    }
    system("logger  -t $0\[$pid\] \"$msg\"");
}
 
 
sub notifyProwl() {
    if (defined($prowl->{enabled}) && $prowl->{enabled} == 1) {
        my ($event,$msg) = @_;
        my $cmd = sprintf("perl %s -apikey='%s' -application='%s' -event='%s' -notification='%s'",$prowl->{script}, $prowl->{api_key}, $prowl->{app_name}, $event, $msg);
        my $res = `$cmd`;
        chomp($cmd);
        chomp($res);
        &log("Notify: Prowl Event: $cmd");
        &log("Notify: Prowl Result: $res");
    }
 
}



Load Balanced WEBSITE - example

  • IP: 10.0.0.111


Apache Config

  • Location: /etc/httpd/conf.d/balancer/10.0.0.111.conf
  • Customer files: e.g. htpasswd, customer certs & keys, etc.
/etc/httpd/conf.d/balancer/10.0.0.111/
  • Verify the ServerName IS UNIQUE - this matters mainly for SSL.
If you use a ServerName that is already in use, Apache will serve the FIRST matching SSL cert
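
A quick way to audit for duplicates (sketch; `check_dup_servernames` is a hypothetical helper - point it at the balancer config directory):

```shell
# print any ServerName value that appears in more than one config file
check_dup_servernames() {
    grep -rh 'ServerName' "$1" | awk '{print $2}' | sort | uniq -d
}
# e.g.: check_dup_servernames /etc/httpd/conf.d/balancer/
```

No output means every ServerName is unique.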
## IN USE 
 
<VirtualHost 10.0.0.111:80>
 ServerName mb-vip8.domain.tld
 
 # don't lose time with IP address lookups 
 HostnameLookups Off
 UseCanonicalName Off 
 
 Header add Set-Cookie "ROUTEID=.%{BALANCER_WORKER_ROUTE}e; path=/" env=BALANCER_ROUTE_CHANGED
 ProxyPreserveHost On
 ProxyRequests off
 <Proxy balancer://robreed>
          BalancerMember http://8.8.8.8:80 route=1
          BalancerMember http://4.4.4.2:80 route=2
         # the hot standby if all fail?
         # BalancerMember http://10.0.0.5:80 status=+H
         Order Deny,Allow
         Deny from none
         Allow from all
         ProxySet lbmethod=bybusyness stickysession=ROUTEID
 </Proxy>
 
 
 <Location /server-status>
  SetHandler server-status
  Order deny,allow
  Deny from all
  Allow from 10.0.0
 </Location>
 
 <Location /balancer-manager>
        SetHandler balancer-manager
        Deny from all
        AuthUserFile /etc/httpd/conf.d/balancer/10.0.0.111/htpasswd
        AuthName authorization
        AuthType Basic
        Allow from 10.0.0
 
        Satisfy Any
        require valid-user
 </Location>
 ProxyPass /balancer-manager !
 ProxyPass /server-status !
 ProxyPass / balancer://robreed/
 
</VirtualHost>
 
 
<VirtualHost 10.0.0.111:443>
 ServerName mb-vip8.domain.tld
 
 # don't lose time with IP address lookups 
 HostnameLookups Off
 UseCanonicalName Off 
 
 Header add Set-Cookie "ROUTEID=.%{BALANCER_WORKER_ROUTE}e; path=/" env=BALANCER_ROUTE_CHANGED
 ProxyPreserveHost On
 ProxyRequests off
 
 <Proxy balancer://robreed>
         BalancerMember https://8.8.8.8:443 route=1
#        BalancerMember https://4.4.4.2:443 route=2
         # the hot standby if all fail?
         # BalancerMember https://10.0.0.5:443 status=+H
         Order Deny,Allow
         Deny from none
         Allow from all
         ProxySet lbmethod=bybusyness stickysession=ROUTEID
 </Proxy>
 
 <Location /server-status>
  SetHandler server-status
  Order deny,allow
  Deny from all
  Allow from 10.0.0
 </Location>
 
 <Location /balancer-manager>
        SetHandler balancer-manager
        Deny from all
        AuthUserFile /etc/httpd/conf.d/balancer/10.0.0.111/htpasswd
        AuthName authorization
        AuthType Basic
        Satisfy Any
        require valid-user
 </Location>
 
 ProxyPass /balancer-manager !
 ProxyPass /server-status !
 ProxyPass / balancer://robreed/
 
 SSLEngine on
 SSLProxyEngine On
 SSLProtocol all -SSLv2
 SSLHonorCipherOrder On
 SSLCipherSuite ECDHE-RSA-AES128-SHA256:AES128-GCM-SHA256:RC4:HIGH:!MD5:!aNULL:!EDH
 
 SSLCertificateFile /etc/httpd/conf/ssl.crt/selfsigned.crt
 SSLCertificateKeyFile /etc/httpd/conf/ssl.key/selfsigned.key
 
</VirtualHost>



Client Logging - X-Forwarded-For

  • logs will probably be turned off locally
  • clients can use X-Forwarded-For header
example Apache config to log remote IP
LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" proxy
CustomLog /etc/httpd/logs/rarforge-access_log_proxy proxy



Balancer Manager - Client Interface

  • clients can log in to http://<their balanced IP>/balancer-manager/ to perform basic tasks
  • an Apache restart will revert any changes they make - this might cause odd issues, so the customer needs to know


HowTo

  • Click on a worker URL to edit its values (not persistent across an Apache reload)
loadfactor 	1 	Worker load factor. Used with BalancerMember. It is a number between 1 and 100 and defines the normalized 
                        weighted load applied to the worker. 
 
lbset 	        0 	Sets the load balancer cluster set that the worker is a member of. The load balancer will try all members 
                        of a lower numbered lbset before trying higher numbered ones. 
 
route 	        - 	Route of the worker when used inside the load balancer. The route is a value appended to the session id. 
 
redirect 	- 	Redirection route of the worker. This value is usually set dynamically to enable safe removal of the node 
                        from the cluster. If set, all requests without a session id will be redirected to the BalancerMember whose 
                        route parameter equals this value.
Status Options:
 Dis:  is disabled (removes the server from the active list)
 Ign:  is ignore-errors ( monitoring is stopped?)
Stby:  is hot-standby ( only used when ALL servers are out of LB)
 
# This can also be set in the LB config for persistence if needed
# Status can be set (which is the default) by prepending with '+' or cleared by prepending with '-'. 
#   Thus, a setting of 'S-E' sets this worker to Stopped and clears the in-error flag. 
status=+[D|S|I|H|E]
 'D' is disabled, 
 'S' is stopped, 
 'I' is ignore-errors, 
 'H' is hot-standby, 
 'E' is in an error state
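
For persistence, the same knobs can also be pinned on the BalancerMember line instead of set through the manager (sketch; the addresses and route values are illustrative):

```apache
# weight this member double, keep it in the first-tried set
BalancerMember http://8.8.8.8:80 route=1 loadfactor=2 lbset=0
# park a spare as hot standby
BalancerMember http://10.0.0.5:80 route=3 status=+H
```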

Configs

corosync

#/etc/corosync/corosync.conf
totem {
 version: 2
 token: 5000
 token_retransmits_before_loss_const: 20
 join: 1000
 consensus: 7500
 vsftype: none
 max_messages: 20
 secauth: on
 threads: 0
 clear_node_high_bit: yes
 
 interface {
  ringnumber: 0
  bindnetaddr: 172.16.50.10
  mcastaddr: 226.94.50.231
  mcastport: 5405
 }
}
 
logging {
 fileline: off
 to_syslog: yes
 to_stderr: no
 syslog_facility: daemon
 debug: on
 timestamp: on
}
 
amf {
 mode: disabled
}
 
aisexec {
        # Run as root - this is necessary to be able to manage resources with Pacemaker
        user:        root
        group:       root
}
#/etc/corosync/service.d/pcmk
service {
 # Load the Pacemaker Cluster Resource Manager
 name: pacemaker
 ver: 1
 }

Pacemaker

  • use crm to modify/manage config
  • save config: crm configure save <FILENAME>
cat /root/crm.20120327-1441.crm
node modbalancer1
node modbalancer2
primitive httpd lsb:httpd \
        op monitor interval="10" timeout="30" start-delay="10" \
        op start interval="0" timeout="120" \
        op stop interval="0" timeout="120"
primitive p_ip-10.0.0.104 ocf:heartbeat:IPaddr \
        params ip="10.0.0.104" cidr_netmask="32" nic="eth2" \
        op monitor interval="2s"
primitive p_ip-10.0.0.105 ocf:heartbeat:IPaddr \
        params ip="10.0.0.105" cidr_netmask="32" nic="eth2" \
        op monitor interval="2s"
primitive p_ip-10.0.0.106 ocf:heartbeat:IPaddr \
        params ip="10.0.0.106" cidr_netmask="32" nic="eth2" \
        op monitor interval="2s"
primitive p_ip-10.0.0.107 ocf:heartbeat:IPaddr \
        params ip="10.0.0.107" cidr_netmask="32" nic="eth2" \
        op monitor interval="2s"
primitive p_ip-10.0.0.108 ocf:heartbeat:IPaddr \
        params ip="10.0.0.108" cidr_netmask="32" nic="eth2" \
        op monitor interval="2s"
primitive p_ip-10.0.0.109 ocf:heartbeat:IPaddr \
        params ip="10.0.0.109" cidr_netmask="32" nic="eth2" \
        op monitor interval="2s"
primitive p_ip-10.0.0.110 ocf:heartbeat:IPaddr \
        params ip="10.0.0.110" cidr_netmask="32" nic="eth2" \
        op monitor interval="2s"
primitive p_ip-10.0.0.111 ocf:heartbeat:IPaddr \
        params ip="10.0.0.111" cidr_netmask="32" nic="eth2" \
        op monitor interval="2s"
clone clone_httpd httpd
colocation c_10.0.0.104_on_http inf: p_ip-10.0.0.104 clone_httpd
colocation c_10.0.0.105_on_http inf: p_ip-10.0.0.105 clone_httpd
colocation c_10.0.0.106_on_http inf: p_ip-10.0.0.106 clone_httpd
colocation c_10.0.0.107_on_http inf: p_ip-10.0.0.107 clone_httpd
colocation c_10.0.0.108_on_http inf: p_ip-10.0.0.108 clone_httpd
colocation c_10.0.0.109_on_http inf: p_ip-10.0.0.109 clone_httpd
colocation c_10.0.0.110_on_http inf: p_ip-10.0.0.110 clone_httpd
colocation c_10.0.0.111_on_http inf: p_ip-10.0.0.111 clone_httpd
order o_httpd_before_10.0.0.104 inf: clone_httpd p_ip-10.0.0.104
order o_httpd_before_10.0.0.105 inf: clone_httpd p_ip-10.0.0.105
order o_httpd_before_10.0.0.106 inf: clone_httpd p_ip-10.0.0.106
order o_httpd_before_10.0.0.107 inf: clone_httpd p_ip-10.0.0.107
order o_httpd_before_10.0.0.108 inf: clone_httpd p_ip-10.0.0.108
order o_httpd_before_10.0.0.109 inf: clone_httpd p_ip-10.0.0.109
order o_httpd_before_10.0.0.110 inf: clone_httpd p_ip-10.0.0.110
order o_httpd_before_10.0.0.111 inf: clone_httpd p_ip-10.0.0.111
property $id="cib-bootstrap-options" \
        dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore" \
        last-lrm-refresh="1364418010" \
        placement-strategy="balanced" \
        cluster-recheck-interval="60s"
rsc_defaults $id="rsc-options" \
        failure-timeout="60s"
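
Adding another balanced VIP would follow the same three-part pattern as the entries above (sketch; 10.0.0.112 is outside the routed /29 in the IP Pool section and purely illustrative - a new address would also need routing):

```
crm configure primitive p_ip-10.0.0.112 ocf:heartbeat:IPaddr \
        params ip="10.0.0.112" cidr_netmask="32" nic="eth2" \
        op monitor interval="2s"
crm configure colocation c_10.0.0.112_on_http inf: p_ip-10.0.0.112 clone_httpd
crm configure order o_httpd_before_10.0.0.112 inf: clone_httpd p_ip-10.0.0.112
```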

references

http://httpd.apache.org/docs/2.2/mod/mod_proxy_balancer.html

http://clusterlabs.org/

http://savannah.nongnu.org/projects/crmsh/


Install - Corosync+Pacemaker

REPOS

corosync

  • this repo also provides a newer pacemaker (1.1.8) - however the crm shell is excluded and pcs is nowhere to be found

http://clusterlabs.org/rpm-next/rhel-5/

[clusterlabs-next-rhel5]
name=High Availability/Clustering server technologies (rhel-5-next)
baseurl=http://www.clusterlabs.org/rpm-next/rhel-5
metadata_expire=45m
type=rpm-md
gpgcheck=0
enabled=1
## do NOT update pacemaker - we want to keep crmsh
exclude=pacemaker*

pacemaker

  • version 1.1.5

http://clusterlabs.org/rpm-next/epel-5/

[clusterlabs-next-epel5]
name=High Availability/Clustering server technologies (epel-5-next)
baseurl=http://www.clusterlabs.org/rpm-next/epel-5
metadata_expire=45m
type=rpm-md
gpgcheck=0
enabled=1


Install

1) yum install -y pacemaker corosync heartbeat


2) Generate corosync key ( sync on both servers)

corosync-keygen
chown root:root /etc/corosync/authkey
chmod 400 /etc/corosync/authkey 
# copy this key to both servers
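
To confirm the copy worked, compare checksums of the two files (sketch; `same_key` is a hypothetical helper - in practice you would fetch the remote copy over ssh first):

```shell
# print "same" if two files have identical contents, "differ" otherwise
same_key() {
    if [ "$(md5sum < "$1")" = "$(md5sum < "$2")" ]; then
        echo same
    else
        echo differ
    fi
}
# e.g.: scp root@modbalancer2:/etc/corosync/authkey /tmp/authkey.remote
#       same_key /etc/corosync/authkey /tmp/authkey.remote
```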


3) Corosync config ( sync on both servers)

/etc/corosync/corosync.conf
totem {
 version: 2
 token: 5000
 token_retransmits_before_loss_const: 20
 join: 1000
 consensus: 7500
 vsftype: none
 max_messages: 20
 ## uses the key we just generated
 secauth: on
 threads: 0
 clear_node_high_bit: yes
 
 interface {
  ringnumber: 0
  ## MAKE THESE UNIQUE and uncomment
  #bindnetaddr: 172.16.50.10
  #mcastaddr: 226.94.50.231
  mcastport: 5405
 }
}
 
logging {
 fileline: off
 to_syslog: yes
 to_stderr: no
 syslog_facility: daemon
 debug: on
 timestamp: on
}
 
amf {
 mode: disabled
}
 
aisexec {
        # Run as root - this is necessary to be able to manage resources with Pacemaker
        user:        root
        group:       root
}


/etc/corosync/service.d/pcmk
service {
 # Load the Pacemaker Cluster Resource Manager
 name: pacemaker
 ver: 1
 }


4) start COROSYNC and PACEMAKER

  • must be in that order - reverse on shutdown
/etc/rc.d/init.d/corosync start 
/etc/rc.d/init.d/pacemaker start


5) disable stonith and quorum (two node cluster)

crm configure property stonith-enabled="false"
crm configure property no-quorum-policy=ignore


6) check your cluster - it may take a couple of seconds/minutes before the nodes show up (first time)

# crm_mon
 
============
Last updated: Wed Mar 27 14:54:45 2013
Stack: openais
Current DC: modbalancer2 - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
9 Resources configured.
============
 
Online: [ modbalancer1 modbalancer2 ]



packages

  • known working packages
rpm -q -a --queryformat='%{N}-%{V}-%{R}.%{arch}\n' | egrep 'pacemaker|corosync|heart'
corosync-1.4.1-7.el5.1.x86_64
heartbeat-libs-3.0.2-2.el5.x86_64
heartbeat-libs-3.0.2-2.el5.i386
corosynclib-1.4.1-7.el5.1.x86_64
corosync-1.4.1-7.el5.1.i686
pacemaker-1.1.5-1.1.el5.x86_64
pacemaker-libs-1.1.5-1.1.el5.x86_64
pacemaker-libs-1.1.5-1.1.el5.i386
corosynclib-1.4.1-7.el5.1.i686
heartbeat-3.0.2-2.el5.x86_64
pacemaker-1.1.5-1.1.el5.i386


After Install notes

  • updates are disabled - update manually if you want
/etc/yum.conf
## repackage in case you need to go back
tsflags=repackage
 
## do NOT update pacemaker or corosync - things will break!
exclude=pacemaker* corosync*

PCS [testing]

  • Redhat decided to scrap crmsh (crm: SUSE's mature baby) for PCS (an in-house infantile replacement)
  • crmsh will still be developed and used, but you will have to compile it or install it by other means


  • crmsh does not seem to build from source on CentOS 5 (other sources have rpms for CentOS 6+)
  • pcs does seem to work on CentOS 5 (I still prefer crmsh)
  • You will need EPEL to install Python 2.6

Build

cd /usr/src
git clone https://github.com/feist/pcs.git
cd /usr/src/pcs
 
# EDIT the Makefile
# replace 'python' with 'python2.6' in all places
# optional: you could just change python to point to python2.6 binary - no clue what that breaks
 
make install
/usr/bin/python2.6 /usr/sbin/pcs 
 
Usage: pcs [-f file] [-h] [commands]...
Control and configure pacemaker and corosync.
 
Options:
    -h          Display usage and exit
    -f file     Perform actions on file instead of active CIB
 
Commands:
    resource    Manage cluster resources
    cluster     Configure cluster options and nodes
    stonith     Configure fence devices
    property    Set pacemaker properties
    constraint  Set resource constraints
    status      View cluster status
    config      Print full cluster configuration
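
The Makefile edit above can be scripted (sketch; `use_python26` is a hypothetical helper name - check the .bak diff before running make install):

```shell
# rewrite bare 'python' tokens to 'python2.6' in the given file, keeping a .bak copy
use_python26() {
    sed -i.bak 's/\bpython\b/python2.6/g' "$1"
}
# e.g.: cd /usr/src/pcs && use_python26 Makefile
```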