File:Check crm

From RARForge

Check_crm (file size: 5 KB, MIME type: application/x-perl)


Nagios cluster check - pacemaker/crm

<source lang=perl>

#!/usr/bin/perl
#
# check_crm_v0_5
#
# Copyright © 2011 Philip Garner, Sysnix Consultants Limited
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.
#
# Authors: Phil Garner - phil@sysnix.com & Peter Mottram - peter@sysnix.com
#
# Acknowledgements: Vadym Chepkov, Sönke Martens
#
# v0.1 09/01/2011
# v0.2 11/01/2011
# v0.3 22/08/2011 - bug fix and changes suggested by Vadym Chepkov
# v0.4 23/08/2011 - update for spelling and anchor regex capture (Vadym Chepkov)
# v0.5 29/09/2011 - Add standby warn/crit suggested by Sönke Martens & removal
#                   of 'our' to 'my' to completely avoid problems with ePN
#
# NOTES: Requires Perl 5.8 or higher & the Perl Module Nagios::Plugin
#        Nagios user will need sudo access - suggest adding line below to
#        sudoers
#            nagios ALL=(ALL) NOPASSWD: /usr/sbin/crm_mon -1 -r -f
#
#        In sudoers if requiretty is on (off state is default)
#        you will also need to add the line below
#            Defaults:nagios !requiretty

use warnings;
use strict;
use Nagios::Plugin;

# Lines below may need changing if crm_mon or sudo installed in a
# different location.

my $sudo    = '/usr/bin/sudo';
my $crm_mon = '/usr/sbin/crm_mon';

my $np = Nagios::Plugin->new(
   shortname => 'check_crm',
   version   => '0.5',
   usage     => "Usage: %s <ARGS>\n\t\t--help for help\n",
);

$np->add_arg(
   spec => 'warning|w',
   help =>
'Send Warning instead of Critical (the default) when failed nodes, stopped resources or standby nodes are detected, as long as there are no other errors and the cluster has quorum',
   required => 0,
);

$np->add_arg(
   spec     => 'standbyignore|s',
   help     => 'Ignore any node(s) in standby, by default sends Critical',
   required => 0,
);

$np->getopts;

my @standby;

# Check for the -w option; if it is set, report WARNING instead of CRITICAL

my $warn_or_crit = 'CRITICAL';
$warn_or_crit = 'WARNING' if $np->opts->warning;

my $fh;

open( $fh, "$sudo $crm_mon -1 -r -f|" )
  or $np->nagios_exit( CRITICAL, "Running sudo has failed" );

foreach my $line (<$fh>) {

   if ( $line =~ m/Connection to cluster failed\:(.*)/i ) {
       # Check Cluster connected
       $np->nagios_exit( CRITICAL, "Connection to cluster FAILED: $1" );
   }
   elsif ( $line =~ m/Current DC:/ ) {
       # Check for Quorum
       if ( $line =~ m/partition with quorum$/ ) {
           # Assume cluster is OK - we only add warn/crit after here
           $np->add_message( OK, "Cluster OK" );
       }
       else {
           $np->add_message( CRITICAL, "No Quorum" );
       }
   }
   elsif ( $line =~ m/^offline:\s*\[\s*(\S.*?)\s*\]/i ) {
       # Skip offline lines that mention /dev/block/
       next if $line =~ /\/dev\/block\//i;

       # Count offline nodes
       my @offline = split( /\s+/, $1 );
       my $numoffline = scalar @offline;
       $np->add_message( $warn_or_crit, ": $numoffline Nodes Offline" );
   }
   elsif ( $line =~ m/^node\s+(\S.*):\s*standby/i ) {
       # Check for standby nodes (suggested by Sönke Martens)
       # See later in code for message created from this
       push @standby, $1;
   }
   elsif ( $line =~ m/\s*([\w-]+)\s+\(\S+\)\:\s+Stopped/ ) {
       $np->add_message( $warn_or_crit, ": $1 Stopped" );
   }
   elsif ( $line =~ m/\s*stopped\:\s*\[(.*)\]/i ) {
       # Skip stopped lines that mention openvz
       next if $line =~ /openvz/i;

       # Check Master/Slave stopped
       $np->add_message( $warn_or_crit, ": $1 Stopped" );
   }
   elsif ( $line =~ m/^Failed actions\:/ ) {
       ### RR I have disabled this in my production..
       #next;
       $np->add_message( CRITICAL,
           ": FAILED actions detected or not cleaned up" );
   }
   elsif (
       $line =~ m/\s*(\S+?)\s+ \(.*\)\:\s+\w+\s+\w+\s+\(unmanaged\)\s+FAILED/ )
   {
       # Check Unmanaged
       $np->add_message( CRITICAL, ": $1 unmanaged FAILED" );
   }
   elsif ( $line =~ m/\s*(\S+?)\s+ \(.*\)\:\s+not installed/i ) {
       # Check for errors
       $np->add_message( CRITICAL, ": $1 not installed" );
   }
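   elsif ( $line =~ m/\s*(\S+?):.*(fail-count=\d+)/i ) {
       # Save the captures so later tests cannot clobber them
       my $one = $1;
       my $two = $2;

       # If a /tmp/backup.<name> marker file exists, stop reading any
       # further crm_mon output
       if ( -f "/tmp/backup.$one" ) { last; }

       $np->add_message( WARNING, ": $one failure detected, $two" );
   }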

}

# If any nodes were found in standby and the -s option was not used, send warn/crit

if ( scalar @standby > 0 && !$np->opts->standbyignore ) {
   $np->add_message( $warn_or_crit,
       ": " . join( ', ', @standby ) . " in Standby" );
}

close($fh) or $np->nagios_exit( CRITICAL, "Running crm_mon FAILED" );

$np->nagios_exit( $np->check_messages() );


</source>
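The line-matching logic above can be tried without a cluster or Nagios::Plugin. The following is a minimal sketch, not part of check_crm: `classify_line` is a hypothetical helper that reuses three of the script's patterns (quorum, offline nodes, standby nodes), and the sample lines are invented to fit those patterns rather than captured from a real crm_mon run. It always reports CRITICAL for offline and standby nodes, which corresponds to running the plugin without -w or -s.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in check_crm): classify one line of
# crm_mon-style output with the same regexes the plugin uses.
# Returns a [severity, message] array ref, or undef if nothing matches.
sub classify_line {
    my ($line) = @_;
    chomp $line;

    if ( $line =~ m/Current DC:/ ) {
        # Quorum check, as in the plugin
        return $line =~ m/partition with quorum$/
          ? [ 'OK',       'Cluster OK' ]
          : [ 'CRITICAL', 'No Quorum' ];
    }
    if ( $line =~ m/^offline:\s*\[\s*(\S.*?)\s*\]/i ) {
        # Count the whitespace-separated node names inside the brackets
        my @offline = split /\s+/, $1;
        return [ 'CRITICAL', scalar(@offline) . ' Nodes Offline' ];
    }
    if ( $line =~ m/^node\s+(\S.*):\s*standby/i ) {
        return [ 'CRITICAL', "$1 in Standby" ];
    }
    return;
}

# Invented sample input, shaped to match the patterns above.
my @sample = (
    "Current DC: node1 - partition with quorum\n",
    "OFFLINE: [ node2 node3 ]\n",
    "Node node4: standby\n",
);

for my $line (@sample) {
    my $result = classify_line($line) or next;
    print "$result->[0]: $result->[1]\n";
}
```

With the three sample lines this prints one OK line for quorum, then `CRITICAL: 2 Nodes Offline` and `CRITICAL: node4 in Standby`. The real plugin collects such messages through `add_message` and lets `check_messages` pick the worst status instead of printing each one.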

File history


Date/Time | Dimensions | User | Comment
current: 23:47, 1 March 2013 | 5 KB | Robertr | Nagios cluster check - pacemaker/crm

The following page uses this file: