File:Check crm


Check_crm (file size: 5 KB, MIME type: application/x-perl)


Nagios cluster check - pacemaker/crm

<source lang=perl>

#!/usr/bin/perl
#
# check_crm_v0_5
#
# Copyright © 2011 Philip Garner, Sysnix Consultants Limited
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
#
# Authors: Phil Garner - phil@sysnix.com & Peter Mottram - peter@sysnix.com
#
# Acknowledgements: Vadym Chepkov, Sönke Martens
#
# v0.1 09/01/2011
# v0.2 11/01/2011
# v0.3 22/08/2011 - bug fix and changes suggested by Vadym Chepkov
# v0.4 23/08/2011 - update for spelling and anchor regex capture (Vadym Chepkov)
# v0.5 29/09/2011 - Add standby warn/crit suggested by Sönke Martens & removal
#                   of 'our' to 'my' to completely avoid problems with ePN
#
# NOTES: Requires Perl 5.8 or higher & the Perl Module Nagios::Plugin
#        Nagios user will need sudo access - suggest adding line below to
#        sudoers
#        nagios ALL=(ALL) NOPASSWD: /usr/sbin/crm_mon -1 -r -f
#
#        In sudoers if requiretty is on (off state is default)
#        you will also need to add the line below
#        Defaults:nagios !requiretty

use warnings;
use strict;
use Nagios::Plugin;

# Lines below may need changing if crm_mon or sudo installed in a
# different location.

my $sudo    = '/usr/bin/sudo';
my $crm_mon = '/usr/sbin/crm_mon';

my $np = Nagios::Plugin->new(
    shortname => 'check_crm',
    version   => '0.5',
    usage     => "Usage: %s <ARGS>\n\t\t--help for help\n",
);

$np->add_arg(
    spec => 'warning|w',
    help =>
'Send Warning instead of Critical (default) when failed Nodes, stopped Resources or Standby Nodes are detected, as long as there are no other errors and there is Quorum',
    required => 0,
);

$np->add_arg(
    spec     => 'standbyignore|s',
    help     => 'Ignore any node(s) in standby, by default sends Critical',
    required => 0,
);

$np->getopts;

my @standby;

# If the -w option is set, report WARNING instead of CRITICAL
my $warn_or_crit = 'CRITICAL';
$warn_or_crit = 'WARNING' if $np->opts->warning;

my $fh;

# Run crm_mon through sudo and read its output from a pipe
open( $fh, "$sudo $crm_mon -1 -r -f|" )
  or $np->nagios_exit( CRITICAL, "Running sudo has failed" );

foreach my $line (<$fh>) {
    if ( $line =~ m/Connection to cluster failed\:(.*)/i ) {
        # Check Cluster connected
        $np->nagios_exit( CRITICAL, "Connection to cluster FAILED: $1" );
    }
    elsif ( $line =~ m/Current DC:/ ) {
        # Check for Quorum
        if ( $line =~ m/partition with quorum$/ ) {
            # Assume cluster is OK - we only add warn/crit after here
            $np->add_message( OK, "Cluster OK" );
        }
        else {
            $np->add_message( CRITICAL, "No Quorum" );
        }
    }
    elsif ( $line =~ m/^offline:\s*\[\s*(\S.*?)\s*\]/i ) {
        # Skip lines that mention /dev/block/
        next if $line =~ /\/dev\/block\//i;

        # Count offline nodes
        my @offline = split( /\s+/, $1 );
        my $numoffline = scalar @offline;
        $np->add_message( $warn_or_crit, ": $numoffline Nodes Offline" );
    }
    elsif ( $line =~ m/^node\s+(\S.*):\s*standby/i ) {
        # Check for standby nodes (suggested by Sönke Martens)
        # See later in code for message created from this
        push @standby, $1;
    }
    elsif ( $line =~ m/\s*([\w-]+)\s+\(\S+\)\:\s+Stopped/ ) {
        # Check for a stopped resource
        $np->add_message( $warn_or_crit, ": $1 Stopped" );
    }
    elsif ( $line =~ m/\s*stopped\:\s*\[(.*)\]/i ) {
        # Skip lines that mention openvz
        next if $line =~ /openvz/i;

        # Check Master/Slave stopped
        $np->add_message( $warn_or_crit, ": $1 Stopped" );
    }
    elsif ( $line =~ m/^Failed actions\:/ ) {
        # RR: I have disabled this in my production.
        #next;
        $np->add_message( CRITICAL,
            ": FAILED actions detected or not cleaned up" );
    }
    elsif (
        $line =~ m/\s*(\S+?)\s+ \(.*\)\:\s+\w+\s+\w+\s+\(unmanaged\)\s+FAILED/ )
    {
        # Check Unmanaged
        $np->add_message( CRITICAL, ": $1 unmanaged FAILED" );
    }
    elsif ( $line =~ m/\s*(\S+?)\s+ \(.*\)\:\s+not installed/i ) {
        # Check for errors
        $np->add_message( CRITICAL, ": $1 not installed" );
    }
    elsif ( $line =~ m/\s*(\S+?):.*(fail-count=\d+)/i ) {
        # Save the captures before running any further tests
        my $one = $1;
        my $two = $2;

        # Note: 'last' stops reading all remaining crm_mon output
        # when a /tmp/backup.<name> file exists for this resource
        if ( -f "/tmp/backup.$one" ) { last; }

        $np->add_message( WARNING, ": $one failure detected, $two" );
    }
}

# If any Nodes were found in standby and the -s option was not used,
# send warn/crit
if ( scalar @standby > 0 && !$np->opts->standbyignore ) {
    $np->add_message( $warn_or_crit,
        ": " . join( ', ', @standby ) . " in Standby" );
}

close($fh) or $np->nagios_exit( CRITICAL, "Running crm_mon FAILED" );

$np->nagios_exit( $np->check_messages() );


</source>
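
The line matching in the main loop can be tried out without a cluster or Nagios::Plugin. The following is a minimal sketch, assuming hand-written input: the helper name classify_line and the sample lines are hypothetical and are not captured crm_mon output. It reuses three of the script's patterns (quorum, offline nodes, standby nodes) unchanged.

<source lang=perl>
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of check_crm): classify one line of
# crm_mon-style text with the same patterns the plugin's loop uses.
sub classify_line {
    my ($line) = @_;
    chomp $line;

    if ( $line =~ m/Current DC:/ ) {
        # Quorum is only accepted when the line ends with this phrase
        return $line =~ m/partition with quorum$/ ? 'quorum' : 'no quorum';
    }
    if ( $line =~ m/^offline:\s*\[\s*(\S.*?)\s*\]/i ) {
        # Node names inside the brackets are separated by whitespace
        my @offline = split( /\s+/, $1 );
        return scalar(@offline) . ' offline';
    }
    if ( $line =~ m/^node\s+(\S.*):\s*standby/i ) {
        return "$1 standby";
    }
    return 'ignored';
}

# Illustrative lines written to exercise the patterns above; they are
# assumptions, not real crm_mon output.
my @samples = (
    'Current DC: node1 - partition with quorum',
    'Current DC: node1 - partition WITHOUT quorum',
    'OFFLINE: [ node2 node3 ]',
    'Node node4: standby',
);

print classify_line($_), "\n" for @samples;
</source>

Because the quorum pattern is case-sensitive and anchored at the end of the line, any "Current DC" line that does not end in exactly "partition with quorum" is treated as having no quorum.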

File history

Date/Time | Dimensions | User | Comment
23:47, 1 March 2013 (current) | 5 KB | Robertr | Nagios cluster check - pacemaker/crm
