LoadBalancer External Sites: Apache mod balancer

From RARForge

INFO

  • DO NOT CHANGE server hostnames - Pacemaker will freak out.
  • CentOS 5 (the hardware required it)


WHY?
  • Load balance ANY external website
  • SSL offloading is required for SSL:
the server houses the SSL cert and directs traffic to external:80
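In miniature, the offload pattern looks like this (IP, hostname, and cert paths are borrowed from the full example further down this page; 'pool' is a placeholder balancer name):

```
<VirtualHost 10.0.0.111:443>
    ServerName mb-vip8.domain.tld
    SSLEngine on
    SSLCertificateFile    /etc/httpd/conf/ssl.crt/selfsigned.crt
    SSLCertificateKeyFile /etc/httpd/conf/ssl.key/selfsigned.key
    # SSL terminates here; balancer members are reached over plain HTTP
    ProxyPass / balancer://pool/
</VirtualHost>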


What's running
  • Corosync + Pacemaker cluster (uses heartbeat libs)
  • Apache + mod_proxy + mod_proxy_balancer
  • Two servers (as of 2013-03-26)
  • modbalancer1: host
  • modbalancer2: host


Troubleshooting[edit]

cluster failures[edit]

  • The cluster will retry all failed services every 60s [ONLY if the cluster is ONLINE]
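The 60s interval is not magic - it comes from two values in the Pacemaker configuration shown later under Configs (a trimmed fragment of that config, not new settings):

```
property cluster-recheck-interval="60s"   # re-run the policy engine every 60s
rsc_defaults failure-timeout="60s"        # expire resource failures after 60s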

Working Cluster looks like

<source>
[root@modbalancer2 init.d]# crm_mon -rf1
============
Last updated: Wed Mar 27 14:39:08 2013
Stack: openais
Current DC: modbalancer2 - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
9 Resources configured.
============

Online: [ modbalancer1 modbalancer2 ]

Full list of resources:

Clone Set: clone_httpd [httpd]
    Started: [ modbalancer2 modbalancer1 ]
p_ip-10.0.0.104     (ocf::heartbeat:IPaddr):        Started modbalancer1
p_ip-10.0.0.105     (ocf::heartbeat:IPaddr):        Started modbalancer2
p_ip-10.0.0.106     (ocf::heartbeat:IPaddr):        Started modbalancer1
p_ip-10.0.0.107     (ocf::heartbeat:IPaddr):        Started modbalancer2
p_ip-10.0.0.108     (ocf::heartbeat:IPaddr):        Started modbalancer1
p_ip-10.0.0.109     (ocf::heartbeat:IPaddr):        Started modbalancer2
p_ip-10.0.0.110     (ocf::heartbeat:IPaddr):        Started modbalancer1
p_ip-10.0.0.111     (ocf::heartbeat:IPaddr):        Started modbalancer2

Migration summary:
* Node modbalancer1:
* Node modbalancer2:
</source>


nagios page: check_crm CRITICAL - Connection to cluster FAILED: connection failed

FIX: make sure corosync and pacemaker are both running


nagios page: httpd:0 httpd:1 Stopped [ Clone OK-NONE!]

FIX: httpd configs are probably messed up on both nodes. Fix any Apache config errors first.
Once Apache starts properly, clean up the cluster httpd resource:

<source>
crm resource cleanup clone_httpd
</source>


UNCLEAN (offline)

  • stop pacemaker then corosync - then start corosync then pacemaker (the order matters)
  • reason: in a bad failure (i.e. pacemaker cannot get status) the cluster would normally STONITH the node, but that is not enabled

<source>
# Stop
/etc/init.d/pacemaker stop
/etc/init.d/corosync stop

# Start
/etc/init.d/corosync start
/etc/init.d/pacemaker start
</source>


nagios page: httpd failure detected, fail-count=1

These will fix themselves if Apache is starting properly.
If you are impatient, you can run:

<source>
crm resource cleanup clone_httpd
</source>


httpd - crm

  • If Apache failed for a while, you may need to kick-start Pacemaker if you are impatient:

<source>
crm resource cleanup clone_httpd
</source>


Unison

  • Synchronisation failed : please check /root/unison.log file for diagnosis

FIX: remove the /root/.unison/ar* files from both servers and resync manually

<source>
ssh modbalancer1
sudo su -
cp -R /etc/httpd/ /etc/httpd.bak
rm /root/.unison/ar*
## clear the archive files on the second node too
ssh modbalancer2 -C "rm /root/.unison/ar*"
/usr/bin/unison -times=true -prefer newer -batch -auto /etc/httpd/ ssh://root@modbalancer2//etc/httpd/
</source>


inotify_watcher

  • PROCS CRITICAL: 0 processes with args '/usr/local/bin/inotify_watcher.pl'

FIX: check /etc/rc.local to make sure the startup line is there. Start the process in screen:

<source>
screen -dmS inotify /usr/local/bin/inotify_watcher.pl
</source>



IP Pool

All of these need to be public IP addresses - using RFC1918 addresses as an example

<source>
# Main block for servers - the others might be usable for apache virts (might want to save them for additional servers?)
# 10.0.0.48/29:
10.0.0.49: secondary on Cisco Router (VLAN 321)
10.0.0.50: modbalancer1
10.0.0.51: modbalancer2

52 IN PTR
53 IN PTR
54 IN PTR
55 IN PTR
</source>

<source>
# These IP addresses can be used for Apache+mod_proxy_balancer VIRTs
# 10.0.0.104/29: Routed to Vlan 321 on Cisco Router - ip route 10.0.0.104 255.255.255.248 int vlan 321
10.0.0.104 IN PTR mb-vip1
10.0.0.105 IN PTR mb-vip2
10.0.0.106 IN PTR mb-vip3
10.0.0.107 IN PTR mb-vip4
10.0.0.108 IN PTR mb-vip5
10.0.0.109 IN PTR mb-vip6
10.0.0.110 IN PTR mb-vip7
10.0.0.111 IN PTR mb-vip8
</source>
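Sanity-checking the /29 math in plain shell (nothing site-specific here, just arithmetic on the prefix length):

```shell
# a /29 (mask 255.255.255.248) leaves 2^(32-29) = 8 addresses
prefix=29
base=104
size=$(( 1 << (32 - prefix) ))
echo "10.0.0.$base - 10.0.0.$(( base + size - 1 )) ($size addresses)"
```

which matches the 10.0.0.104-111 VIP range above.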

Servers

modbalancer1

  • location: n/a
  • 10.0.0.50

modbalancer2

  • location: n/a
  • 10.0.0.51



Syncing / Configs

CRON

  • /etc/httpd auto-syncs (unison) every 5 minutes

<source>
[root@modbalancer1 /]# cat /etc/cron.d/unison
SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root
HOME=/root/

*/5 * * * * root . /root/.bash_profile && /usr/local/bin/sync_configs.sh &> /dev/null
</source>

<source>
#!/bin/bash
/usr/bin/unison -terse -batch -auto /etc/httpd/ ssh://root@modbalancer2//etc/httpd/
</source>


Inotify

  • The inotify script watches /etc/httpd/conf/ and /etc/httpd/conf.d/ for changes.
  • If any *.conf files are created / modified / deleted, then:
  1. Unison will sync modbalancer1 <-> modbalancer2
  2. Apache will be reloaded on both (ONLY if '/etc/init.d/httpd configtest' passes)


  • /etc/rc.local startup

<source>
#!/bin/sh
# Inotify watcher for HTTPD configs
screen -dmS inotify /usr/local/bin/inotify_watcher.pl

touch /var/lock/subsys/local
</source>

<source lang=perl>

#!/usr/bin/perl -w
use strict;
use Linux::Inotify2;
use Path::Class;
use File::stat;
use Data::Dumper;

my $debug = 1;
my $wait  = 10; ## how many seconds to wait after processing an event. This allows us to group events/updates.

my @hosts = qw(modbalancer1
               modbalancer2
              );

## files must match blablabl.$file_ext$ -- for events to fire
my @file_ext = qw(conf);

## watched dirs - recursive
my @dirs = ("/etc/httpd/conf.d/",
            "/etc/httpd/conf",
           );

## This requires the script from the prowl app
## url: http://prowlapp.com/static/prowl.pl
my $prowl = {'enabled'  => '0',
             'script'   => '/usr/local/bin/prowl.pl',
             'api_key'  => '<enter prowl api key>',
             'app_name' => 'MOD Balancer',
            };

###################### Configuration DONE ######################

my $inotify = Linux::Inotify2->new
    or die "unable to create new inotify object: $!";

my $host = `hostname`; ## maybe used for later
my $pid  = $$;

## on start - lets scan the library
&notifyProwl('Initializing','program startup');

## setup initial watch dirs
my $dirs = 0;
my $restart_httpd = 0;
my @start_list;  ## used for notifications
my @finish_list; ## used for notifications
my @remove_list; ## used for notifications

## Initialize watched dirs - INIT

foreach my $drop_dir(@dirs) {

   &log("Processing subdirs for: $drop_dir",1);
   my $dir = dir($drop_dir);
   my $c = 0;
   $dir->traverse(sub{
       my ($child, $cont, $indent) = @_;
       if ($child->is_dir) {
           $dirs++;
           &log("subdir: $child",1);
           $inotify->watch("$child",  IN_CLOSE_WRITE | IN_CREATE | IN_MOVED_TO |  IN_DELETE , \&MyWatch);
           
           ## IN_MODIFY is too verbose.. it grabs any changes to a file before the final close
           ## IN_CLOSE_WRITE == after a file is written to/closed
           ## IN_CREATE == new directories/files
           ## IN_MOVED_TO == moved directory/files (to watch dirs only)
       }
       $cont->($c + 1);
   });

}

## log/notify status of watched dirs
my $init_s = "Watching $dirs directories";
&log($init_s,1);
&notifyProwl('Initialized',$init_s);

## LOOP to keep watching dirs/files

while (1) {

   $inotify->poll;
   my $had_event = 0;
   ## notify on sync start and finish
   
   if (@start_list)  {  
       my $info = join(', ', @start_list);
       my $title = 'file Started';
       &notifyProwl($title,$info);
       @start_list = ();
   }
   if (@finish_list) {  
       my $info = join(', ', @finish_list);
       my $title = 'file Finished';
       &notifyProwl($title,$info);
       @finish_list = ();
   }
   if (@remove_list)  { 
       &notifyProwl('Removed',join(', ', @remove_list));  
       @remove_list = ();
   }
   
   if ($restart_httpd) {
       $had_event = 1;
       $restart_httpd = 0;
       &restart_HTTPD();
   }
   
   if ($had_event) {
       &log("Sleeping '$wait' seconds before processing any new events");
       sleep($wait);
       &log("I\'m awake again - ready to process any new events");
   }
   

}


sub restart_HTTPD() {

   &log("Unison sync && Restarting HTTPD");
   &notifyProwl('unison-HTTPD','sync-restart');
   my $out;
   $out = `/usr/local/bin/sync_configs.sh  2>&1`;
   
   ## Verify configs on Node 1 and Node 2
   my $failed;
   foreach my $host (@hosts) {
       my $check = `ssh $host /etc/init.d/httpd configtest 2>&1`;
       if ($check !~ /Syntax OK/i) {
           $failed .= "$host: $check\n";
       }
   }    
   
   if (!$failed) {
       ## config passed - reload apache on both 
       foreach my $host (@hosts) {
           $out .= "$host: ";
           $out .= `ssh $host -C /etc/init.d/httpd reload  2>&1`;
       }
   } else {
       $out = "HTTPD Config FAIL: cannot reload - $failed";
   }
   &log($out);
   &notifyProwl('unison-HTTPD',$out);

}

sub MyWatch() {

   my $event = shift;
   my $name = $event->fullname;
   my $file_name = $event->name;
   my $log = 'unknown';
   
   ## files to skip -- for now, emacs turds
   if ($file_name =~ /^\..*/i || $file_name =~ /^\#.*/i ) { 
       $log =  "SKIPPING: $name  [$file_name]";
   }
   
   ## continue on..
   elsif ($event->IN_IGNORED) {
       ## this is a DIR action
       ## remove watch - if directory
       $event->w->cancel;  ## cancel watch
       $log = "DIR $name removed -- cancelling watch";
       foreach my $ext (@file_ext) {
           if ($name =~ /\.$ext$/i) {    push (@remove_list,$name);     }
       }
   } elsif ( $event->IN_DELETE) {
       ## this is a FILE action
       ## group deletes together - for notify
       $log = "FILE: $name removed";
       foreach my $ext (@file_ext) {
           $log = "FILE: $name removed - will notify prowl";
           if ($name =~ /\.$ext$/i) { 
               $restart_httpd = 1; ## restart apache - conf file removed
               push (@remove_list,$name);    
           }
       }
   } else {
       if (-d $name) {
           $inotify->watch($name,  IN_CLOSE_WRITE | IN_CREATE | IN_MOVED_TO , \&MyWatch);
           $log = "DIR: $name created -- adding to watchlist";
           &notifyProwl('new',$log);
       } elsif (-f $name && $event->IN_CREATE) {
           ## file is created, but has not finished writing - no action needed -  just logging
           $log = "$name is FILE -- in_create called.. waiting for IN_CLOSE_WRITE to process";
           foreach my $ext (@file_ext) {
               ## push new files to start_list (sync started notifications)
               ## skip notify on this
               #if ($file_name =~ /\.$ext$/i) {    push (@start_list,$name);     }
           }
       } else {
           $log = "$name is IN_CLOSE_WRITE" if $event->IN_CLOSE_WRITE;
           $log = "$name is IN_CREATE" if $event->IN_CREATE;
           $log = "$name is IN_MOVED_TO" if $event->IN_MOVED_TO;
           $log = "events for $name have been lost" if $event->IN_Q_OVERFLOW;
           if ($file_name =~ /^\..*\.\w{5}$/) {
               &log("$file_name must be rsync - skip it");  
           } else {
               ## only update on ext matching @FILES
               foreach my $ext (@file_ext) {
                   ## push new files to finish_list (sync finished notifications)
                   if ($file_name =~ /\.$ext$/i) {
                       push (@finish_list,"$name");
                       &log("$name $ext is matched file -- restart httpd");
                       $restart_httpd = 1; ## restart apache
                   }
               }
               if (!$restart_httpd) {&log("$file_name does not match ext of \@files_ext -- skipping httpd restart");   }
           }
       }
   }
   &log($log);

}


sub log() {

   my $msg = shift;
   my $print= shift;
   if ($debug || $print) {
       print localtime() . ": $msg\n";    
   }
   system("logger  -t $0\[$pid\] \"$msg\"");

}


sub notifyProwl() {

   if (defined($prowl->{enabled}) && $prowl->{enabled} == 1) {
       my ($event,$msg) = @_;
       my $cmd = sprintf("perl %s -apikey='%s' -application='%s' -event='%s' -notification='%s'",$prowl->{script}, $prowl->{api_key}, $prowl->{app_name}, $event, $msg);
       my $res = `$cmd`;
       chomp($cmd);
       chomp($res);
       &log("Notify: Prowl Event: $cmd");
       &log("Notify: Prowl Result: $res");
   }
   

} </source>



Load Balanced WEBSITE - example

  • IP: 10.0.0.111


Apache Config

  • Location: /etc/httpd/conf.d/balancer/10.0.0.111.conf
  • Customer Files: i.e. htpasswd, customer certs & keys, etc.
/etc/httpd/conf.d/balancer/10.0.0.111/
  • Verify the ServerName IS UNIQUE - this matters mainly for SSL.
If you use a ServerName already in use, Apache will use the FIRST matching SSL cert.
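A throwaway one-liner to hunt for duplicate ServerNames before they bite (a sketch, not part of the deployed tooling; the directory argument defaults to the balancer dir above):

```shell
#!/bin/sh
# print any ServerName value that appears more than once (case-insensitive)
dir="${1:-/etc/httpd/conf.d/balancer}"
grep -rhi '^[[:space:]]*ServerName' "$dir" | awk '{print tolower($2)}' | sort | uniq -d
```

No output means no duplicates.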

<source lang=text>
# IN USE

<VirtualHost 10.0.0.111:80>

ServerName mb-vip8.domain.tld
# don't lose time with IP address lookups
HostnameLookups Off
UseCanonicalName Off 
Header add Set-Cookie "ROUTEID=.%{BALANCER_WORKER_ROUTE}e; path=/" env=BALANCER_ROUTE_CHANGED
ProxyPreserveHost On
ProxyRequests off
<Proxy balancer://robreed>
        BalancerMember http://8.8.8.8:80 route=1
       BalancerMember http://4.4.4.2:80 route=2
        # the hot standby if all fail?
        # BalancerMember http://10.0.0.5:80 status=+H
        Order Deny,Allow
        Deny from none
        Allow from all
        ProxySet lbmethod=bybusyness stickysession=ROUTEID
</Proxy>


<Location /server-status>
 SetHandler server-status
 Order deny,allow
 Deny from all
 Allow from 10.0.0
</Location>
<Location /balancer-manager>
       SetHandler balancer-manager
       Deny from all
       AuthUserFile /etc/httpd/conf.d/balancer/10.0.0.111/htpasswd
       AuthName authorization
       AuthType Basic
       Allow from 10.0.0
       Satisfy Any
       require valid-user
</Location>
ProxyPass /balancer-manager !
ProxyPass /server-status !
ProxyPass / balancer://robreed/

</VirtualHost>


<VirtualHost 10.0.0.111:443>

ServerName mb-vip8.domain.tld
# don't lose time with IP address lookups
HostnameLookups Off
UseCanonicalName Off 
Header add Set-Cookie "ROUTEID=.%{BALANCER_WORKER_ROUTE}e; path=/" env=BALANCER_ROUTE_CHANGED
ProxyPreserveHost On
ProxyRequests off
<Proxy balancer://robreed>
        BalancerMember https://8.8.8.8:443 route=1
        # BalancerMember https://4.4.4.2:443 route=2
        # the hot standby if all fail?
        # BalancerMember https://10.0.0.5:443 status=+H
        Order Deny,Allow
        Deny from none
        Allow from all
        ProxySet lbmethod=bybusyness stickysession=ROUTEID
</Proxy>
<Location /server-status>
 SetHandler server-status
 Order deny,allow
 Deny from all
 Allow from 10.0.0
</Location>
<Location /balancer-manager>
       SetHandler balancer-manager
       Deny from all
       AuthUserFile /etc/httpd/conf.d/balancer/10.0.0.111/htpasswd
       AuthName authorization
       AuthType Basic
       Satisfy Any
       require valid-user
</Location>
ProxyPass /balancer-manager !
ProxyPass /server-status !
ProxyPass / balancer://robreed/
SSLEngine on
SSLProxyEngine On
SSLProtocol all -SSLv2
SSLHonorCipherOrder On
SSLCipherSuite ECDHE-RSA-AES128-SHA256:AES128-GCM-SHA256:RC4:HIGH:!MD5:!aNULL:!EDH
SSLCertificateFile /etc/httpd/conf/ssl.crt/selfsigned.crt
SSLCertificateKeyFile /etc/httpd/conf/ssl.key/selfsigned.key

</VirtualHost>

</source>



Client Logging - X-Forwarded-For

  • logs will probably be turned off locally
  • clients can use the X-Forwarded-For header
Example Apache config to log the remote IP:

<source lang=text>
LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" proxy
CustomLog /etc/httpd/logs/rarforge-access_log_proxy proxy
</source>
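Since the first field of that LogFormat is the X-Forwarded-For value, the usual access-log one-liners still work; for example, top client IPs (log path from the CustomLog line above):

```shell
# count requests per client IP, busiest first
awk '{print $1}' /etc/httpd/logs/rarforge-access_log_proxy | sort | uniq -c | sort -rn | head
```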



Balancer Manager - Client Interface

  • clients can log in to http://<their balanced IP>/balancer-manager/ to perform basic tasks
  • an Apache restart will revert any changes - this might cause odd issues... need to let the customer know.


HowTo

  • Click on a worker URL to edit its values (not persistent across an Apache reload)

<source lang=text>
loadfactor  1  Worker load factor. Used with BalancerMember. It is a number between 1 and 100 and defines
               the normalized weighted load applied to the worker.

lbset       0  Sets the load balancer cluster set that the worker is a member of. The load balancer will
               try all members of a lower numbered lbset before trying higher numbered ones.

route       -  Route of the worker when used inside load balancer. The route is a value appended to session id.

redirect    -  Redirection Route of the worker. This value is usually set dynamically to enable safe removal
               of the node from the cluster. If set, all requests without session id will be redirected to
               the BalancerMember that has route parameter equal to this value.
</source>

<source lang=text>
Status Options:

Dis:  disabled (removes the server from the active list)
Ign:  ignore-errors (monitoring is stopped?)
Stby: hot-standby (only used when ALL servers are out of the LB)

# This can also be set in the LB config for persistence if needed
# Status can be set (which is the default) by prepending with '+' or cleared by prepending with '-'.
# Thus, a setting of 'S-E' sets this worker to Stopped and clears the in-error flag.

status=+[D|I|H|E|S]

'D' is disabled,
'I' is ignore-errors,
'H' is hot-standby,
'E' is in an error state,
'S' is stopped
</source>

Configs

corosync

<source>

# /etc/corosync/corosync.conf

totem {

version: 2
token: 5000
token_retransmits_before_loss_const: 20
join: 1000
consensus: 7500
vsftype: none
max_messages: 20
secauth: on
threads: 0
clear_node_high_bit: yes

interface {
 ringnumber: 0
 bindnetaddr: 172.16.50.10
 mcastaddr: 226.94.50.231
 mcastport: 5405
}

}

logging {

fileline: off
to_syslog: yes
to_stderr: no
syslog_facility: daemon
debug: on
timestamp: on

}

amf {

mode: disabled

}

aisexec {

       # Run as root - this is necessary to be able to manage resources with Pacemaker
       user:        root
       group:       root

}
</source>

<source>
# /etc/corosync/service.d/pcmk

service {

# Load the Pacemaker Cluster Resource Manager
name: pacemaker
ver: 1
}

</source>

Pacemaker

  • use crm to modify/manage config
  • save config: crm configure save <FILENAME>

<source>
# cat /root/crm.20120327-1441.crm
node modbalancer1
node modbalancer2
primitive httpd lsb:httpd \

       op monitor interval="10" timeout="30" start-delay="10" \
       op start interval="0" timeout="120" \
       op stop interval="0" timeout="120"

primitive p_ip-10.0.0.104 ocf:heartbeat:IPaddr \

       params ip="10.0.0.104" cidr_netmask="32" nic="eth2" \
       op monitor interval="2s"

primitive p_ip-10.0.0.105 ocf:heartbeat:IPaddr \

       params ip="10.0.0.105" cidr_netmask="32" nic="eth2" \
       op monitor interval="2s"

primitive p_ip-10.0.0.106 ocf:heartbeat:IPaddr \

       params ip="10.0.0.106" cidr_netmask="32" nic="eth2" \
       op monitor interval="2s"

primitive p_ip-10.0.0.107 ocf:heartbeat:IPaddr \

       params ip="10.0.0.107" cidr_netmask="32" nic="eth2" \
       op monitor interval="2s"

primitive p_ip-10.0.0.108 ocf:heartbeat:IPaddr \

       params ip="10.0.0.108" cidr_netmask="32" nic="eth2" \
       op monitor interval="2s"

primitive p_ip-10.0.0.109 ocf:heartbeat:IPaddr \

       params ip="10.0.0.109" cidr_netmask="32" nic="eth2" \
       op monitor interval="2s"

primitive p_ip-10.0.0.110 ocf:heartbeat:IPaddr \

       params ip="10.0.0.110" cidr_netmask="32" nic="eth2" \
       op monitor interval="2s"

primitive p_ip-10.0.0.111 ocf:heartbeat:IPaddr \

       params ip="10.0.0.111" cidr_netmask="32" nic="eth2" \
       op monitor interval="2s"

clone clone_httpd httpd
colocation c_10.0.0.104_on_http inf: p_ip-10.0.0.104 clone_httpd
colocation c_10.0.0.105_on_http inf: p_ip-10.0.0.105 clone_httpd
colocation c_10.0.0.106_on_http inf: p_ip-10.0.0.106 clone_httpd
colocation c_10.0.0.107_on_http inf: p_ip-10.0.0.107 clone_httpd
colocation c_10.0.0.108_on_http inf: p_ip-10.0.0.108 clone_httpd
colocation c_10.0.0.109_on_http inf: p_ip-10.0.0.109 clone_httpd
colocation c_10.0.0.110_on_http inf: p_ip-10.0.0.110 clone_httpd
colocation c_10.0.0.111_on_http inf: p_ip-10.0.0.111 clone_httpd
order o_httpd_before_10.0.0.104 inf: clone_httpd p_ip-10.0.0.104
order o_httpd_before_10.0.0.105 inf: clone_httpd p_ip-10.0.0.105
order o_httpd_before_10.0.0.106 inf: clone_httpd p_ip-10.0.0.106
order o_httpd_before_10.0.0.107 inf: clone_httpd p_ip-10.0.0.107
order o_httpd_before_10.0.0.108 inf: clone_httpd p_ip-10.0.0.108
order o_httpd_before_10.0.0.109 inf: clone_httpd p_ip-10.0.0.109
order o_httpd_before_10.0.0.110 inf: clone_httpd p_ip-10.0.0.110
order o_httpd_before_10.0.0.111 inf: clone_httpd p_ip-10.0.0.111
property $id="cib-bootstrap-options" \

       dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
       cluster-infrastructure="openais" \
       expected-quorum-votes="2" \
       stonith-enabled="false" \
       no-quorum-policy="ignore" \
       last-lrm-refresh="1364418010" \
       placement-strategy="balanced" \
       cluster-recheck-interval="60s"

rsc_defaults $id="rsc-options" \

       failure-timeout="60s"

</source>

references

http://httpd.apache.org/docs/2.2/mod/mod_proxy_balancer.html

http://clusterlabs.org/

http://savannah.nongnu.org/projects/crmsh/


Install - Corosync+Pacemaker

REPOS

corosync

  • this repo also provides a newer pacemaker (1.1.8) - however the crm shell is excluded from it, and pcs is nowhere to be found

http://clusterlabs.org/rpm-next/rhel-5/

<source>
[clusterlabs-next-rhel5]
name=High Availability/Clustering server technologies (rhel-5-next)
baseurl=http://www.clusterlabs.org/rpm-next/rhel-5
metadata_expire=45m
type=rpm-md
gpgcheck=0
enabled=1

# do NOT update pacemaker - we want to keep crmsh
exclude=pacemaker*
</source>

pacemaker

  • version 1.1.5

http://clusterlabs.org/rpm-next/epel-5/

<source>
[clusterlabs-next-epel5]
name=High Availability/Clustering server technologies (epel-5-next)
baseurl=http://www.clusterlabs.org/rpm-next/epel-5
metadata_expire=45m
type=rpm-md
gpgcheck=0
enabled=1
</source>


Install

1) yum install -y pacemaker corosync heartbeat


2) Generate corosync key (sync on both servers)

<source>
corosync-keygen
chown root:root /etc/corosync/authkey
chmod 400 /etc/corosync/authkey
# copy this key to both servers
</source>


3) Corosync config ( sync on both servers)

/etc/corosync/corosync.conf

<source>
totem {

version: 2
token: 5000
token_retransmits_before_loss_const: 20
join: 1000
consensus: 7500
vsftype: none
max_messages: 20
## uses the key we just generated
secauth: on
threads: 0
clear_node_high_bit: yes

interface {
 ringnumber: 0
 ## MAKE THESE UNIQUE and uncomment
 #bindnetaddr: 172.16.50.10
 #mcastaddr: 226.94.50.231
 mcastport: 5405
}

}

logging {

fileline: off
to_syslog: yes
to_stderr: no
syslog_facility: daemon
debug: on
timestamp: on

}

amf {

mode: disabled

}

aisexec {

       # Run as root - this is necessary to be able to manage resources with Pacemaker
       user:        root
       group:       root

} </source>


/etc/corosync/service.d/pcmk

<source>
service {

# Load the Pacemaker Cluster Resource Manager
name: pacemaker
ver: 1
}

</source>


4) start COROSYNC and PACEMAKER

  • must be in that order - reverse on shutdown

<source>
/etc/rc.d/init.d/corosync start
/etc/rc.d/init.d/pacemaker start
</source>


5) disable stonith and quorum (two node cluster)

<source>
crm configure property stonith-enabled="false"
crm configure property no-quorum-policy=ignore
</source>


6) check your cluster - it may take a couple of seconds/minutes before the nodes are added (first time)

<source>
# crm_mon
============
Last updated: Wed Mar 27 14:54:45 2013
Stack: openais
Current DC: modbalancer2 - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
9 Resources configured.
============

Online: [ modbalancer1 modbalancer2 ]
</source>



packages

  • known working packages

<source>
rpm -q -a --queryformat='%{N}-%{V}-%{R}.%{arch}\n' | egrep pacemaker\|corosync\|heart
corosync-1.4.1-7.el5.1.x86_64
heartbeat-libs-3.0.2-2.el5.x86_64
heartbeat-libs-3.0.2-2.el5.i386
corosynclib-1.4.1-7.el5.1.x86_64
corosync-1.4.1-7.el5.1.i686
pacemaker-1.1.5-1.1.el5.x86_64
pacemaker-libs-1.1.5-1.1.el5.x86_64
pacemaker-libs-1.1.5-1.1.el5.i386
corosynclib-1.4.1-7.el5.1.i686
heartbeat-3.0.2-2.el5.x86_64
pacemaker-1.1.5-1.1.el5.i386
</source>


After Install notes

  • updates are disabled - update manually if you want
/etc/yum.conf

<source>
# repackage in case you need to go back
tsflags=repackage

# do NOT update pacemaker or corosync - things will break!
exclude=pacemaker* corosync*
</source>

PCS [testing]

  • Redhat decided to scrap crmsh (crm: SUSE's mature baby) for pcs (an in-house, infantile replacement)
  • crmsh will still be developed and used, but you will have to compile it or install it by other means


  • crmsh does not seem to build from source on CentOS 5 (other sources have rpms for CentOS 6+)
  • pcs does seem to work on CentOS 5 (I still prefer crmsh)
  • You will need EPEL to install Python 2.6

Build

<source>
cd /usr/src
git clone https://github.com/feist/pcs.git
cd /usr/src/pcs

# EDIT the Makefile
# replace 'python' with 'python2.6' in all places
# optional: you could just change python to point to the python2.6 binary - no clue what that breaks

make install
</source>

<source>
/usr/bin/python2.6 /usr/sbin/pcs

Usage: pcs [-f file] [-h] [commands]...
Control and configure pacemaker and corosync.

Options:

   -h          Display usage and exit
   -f file     Perform actions on file instead of active CIB

Commands:

   resource    Manage cluster resources
   cluster     Configure cluster options and nodes
   stonith     Configure fence devices
   property    Set pacemaker properties
   constraint  Set resource constraints
   status      View cluster status
   config      Print full cluster configuration


</source>