2 Node Cluster: Dual Primary DRBD + CLVM + KVM + Live Migrations

From RARForge

Revision as of 20:01, 18 April 2013



a brain dump...

OS Details

  • OS: CentOS 6.3
  • The packages listed here are a base set (others will be installed)
    • pacemaker-1.1.7-6.el6.x86_64
    • cman-
    • corosync-1.4.1-7.el6_3.1.x86_64
    • drbd83-utils-8.3.15-1.el6.elrepo.x86_64
    • libvirt-0.9.10-21.el6_3.8.x86_64



My naming conventions are not great (the VG and LV are named the same); this was a work in progress.
  • PV -> VG -> LV -> DRBD PV -> VG (CLVM) -> LV [ Raw KVM Image ]
    • PV: /dev/md10 -> VG: raid10 -> LV: drbd_spacewalk -> PV: /dev/drbd9 -> VG: drbd_spacewalk -> LV: spacewalk -- spacewalk-ha kvm
  • node1
    • PV: /dev/md10
    • VG: raid10
  • node2
    • PV: /dev/sdb1
    • VG: raid1
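
The layering above follows a fixed naming pattern, which can be sketched as a small helper. This is only an illustration: the function name `describe_stack` and its argument order are made up for this example and are not part of the original setup.

```bash
#!/usr/bin/env bash
# Print the storage layers for one virt, following the naming scheme above.
# Arguments (hypothetical helper): virt name, backing VG, backing PV, DRBD minor.
describe_stack() {
    local name=$1 vg=$2 pv=$3 minor=$4
    printf 'PV: %s -> VG: %s -> LV: drbd_%s -> PV: /dev/drbd%s -> VG: drbd_%s -> LV: %s\n' \
        "$pv" "$vg" "$name" "$minor" "$name" "$name"
}

# Example with the values used on this page
describe_stack spacewalk raid10 /dev/md10 9
```

Only the backing PV and VG differ per node; everything from the DRBD device upwards is shared.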

Dual Primary DRBD/KVM Virt Install

New KVM Virt - Details

  • NewVirt: spacewalk
  • SIZE: 20GB
  • DRBD res: 8
  • NODE1
Name: bigeye
VG: raid1
  • NODE2
Name: blindpig
VG: raid10
  • KVM DISK cache setting: none
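
A quick way to confirm the cache setting is to inspect the domain XML. The helper below is a rough sketch, not part of the original notes: `disk_cache_is_none` is a made-up name, and it assumes each `<driver .../>` element sits on one line (as libvirt normally writes it) and that only disks carry `<driver>` elements.

```bash
#!/usr/bin/env bash
# Read a libvirt domain XML on stdin and succeed only if there is at least one
# <driver> element and every one of them sets cache='none'.
# (Hypothetical helper; line-based matching, not a real XML parser.)
disk_cache_is_none() {
    local xml total ok
    xml=$(cat)
    total=$(printf '%s\n' "$xml" | grep -c '<driver ')
    ok=$(printf '%s\n' "$xml" | grep '<driver ' | grep -c "cache=['\"]none['\"]")
    [ "$total" -gt 0 ] && [ "$total" -eq "$ok" ]
}

# Usage on a running system:
#   virsh dumpxml spacewalk-ha | disk_cache_is_none && echo "cache is none"
```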

Creating the Dual Primary DRBD KVM Virt

  • Run this on NODE1

1) create LVM for DRBD device <source lang="bash">

     lvcreate --name drbd_spacewalk --size 21.1GB raid1
     ssh blindpig -C lvcreate --name drbd_spacewalk --size 21.1GB raid10

</source> 2) copy spacewalk.res to /etc/drbd.d/ <source lang="bash">

     cp spacewalk.res /etc/drbd.d/
     scp spacewalk.res blindpig:/etc/drbd.d/

</source> 3) reloading drbd <source lang="bash">

     /etc/init.d/drbd reload
     ssh blindpig -C /etc/init.d/drbd reload

</source> 4) create DRBD device on both nodes <source lang="bash">

     drbdadm -- --force create-md spacewalk
     ssh blindpig -C drbdadm -- --force create-md spacewalk

</source> 5) reloading drbd <source lang="bash">

     /etc/init.d/drbd reload
     ssh blindpig -C /etc/init.d/drbd reload

</source> 6) bring drbd up on both nodes <source lang="bash">

     drbdadm up spacewalk
     ssh blindpig -C drbdadm up spacewalk

</source> 7) set bigeye primary and overwrite blindpig <source lang="bash">

     drbdadm -- --overwrite-data-of-peer primary spacewalk

</source> 8) set blindpig secondary (should already be set) <source lang="bash">

     ssh blindpig -C drbdadm secondary spacewalk

</source> 9) create PV/VG/LV on bigeye (not setting VG to cluster aware yet due to LVM bug not using --monitor y) <source lang="bash">

     pvcreate /dev/drbd9
     vgcreate -c n drbd_spacewalk /dev/drbd9
     lvcreate -L20G -nspacewalk drbd_spacewalk

</source> 10) Activating VG drbd_spacewalk -- (should already be, but just in case) <source lang="bash">

     vgchange -a y drbd_spacewalk

</source> 11) create the POOL in virsh <source lang="bash">

     virsh pool-create-as drbd_spacewalk --type=logical --target=/dev/drbd_spacewalk

</source>

12a) If this is a NEW KVM install, continue below; otherwise go to step 12b

1. Install new virt on bigeye:/dev/drbd_spacewalk/spacewalk named spacewalk-ha
2. After it is installed and rebooted, scp the virt definition to blindpig and define it there

<source lang="bash">

scp /etc/libvirt/qemu/spacewalk-ha.xml blindpig:/etc/libvirt/qemu/spacewalk-ha.xml
ssh blindpig -C virsh define /etc/libvirt/qemu/spacewalk-ha.xml

</source>

3. Linux? Test virsh shutdown (may need to install acpid)

<source lang="bash">

virsh shutdown spacewalk-ha

</source>

4. SKIP step 12b (go to #13)

12b) If this is a migration from an existing KVM virt, continue; otherwise skip this (ONLY if you completed 12a)

1. Restore your KVM/LVM image to the new LV: of=/dev/drbd_spacewalk/spacewalk bs=1M

<source lang="bash">

dd if=<your image files.img> of=/dev/drbd_spacewalk/spacewalk bs=1M

</source>


2. Edit the existing KVM XML file -- copy the existing file first and edit the copy

<source lang="bash"> cp /etc/libvirt/qemu/spacewalk.xml ./spacewalk-ha.xml </source>

        #-modify: <name>spacewalk</name> to <name>spacewalk-ha</name>
        #-remove: <uuid>[some long uuid]</uuid>
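
The two manual edits listed above (rename the domain, drop the UUID) could also be scripted instead of done in an editor. The following is a sketch under assumptions: `make_ha_xml` is a hypothetical helper, and it expects the `<name>` and `<uuid>` elements to each sit on their own line, as in the files libvirt writes.

```bash
#!/usr/bin/env bash
# Read a domain XML on stdin, rename <name>NAME</name> to <name>NAME-ha</name>
# and delete the <uuid> line so that "virsh define" generates a fresh UUID.
# (Hypothetical helper; line-based sed, not a real XML parser.)
make_ha_xml() {
    local name=$1
    sed -e "s|<name>${name}</name>|<name>${name}-ha</name>|" \
        -e '/<uuid>.*<\/uuid>/d'
}

# Usage:
#   make_ha_xml spacewalk < /etc/libvirt/qemu/spacewalk.xml > spacewalk-ha.xml
```

Review the resulting file before defining it; a hand edit is still the safer route when the XML layout is unusual.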

<source lang="bash">

        emacs spacewalk-ha.xml
        cp spacewalk-ha.xml /etc/libvirt/qemu/spacewalk-ha.xml
        # this will set up a unique UUID, which is needed before you copy to blindpig
        virsh define /etc/libvirt/qemu/spacewalk-ha.xml
        scp /etc/libvirt/qemu/spacewalk-ha.xml blindpig:/etc/libvirt/qemu/spacewalk-ha.xml
        ssh blindpig -C virsh define /etc/libvirt/qemu/spacewalk-ha.xml

</source>

All install work is done. Deactivate the VG, set it cluster aware, and down DRBD for pacemaker provisioning.

13) deactivate VG drbd_spacewalk on blindpig <source lang="bash">

       vgchange -a n drbd_spacewalk

</source> 14) set drbd primary on blindpig to set VG cluster aware <source lang="bash">

       vgchange -a n drbd_spacewalk
       ssh blindpig -C drbdadm primary spacewalk

</source> 15) activate VG on both nodes <source lang="bash">

       vgchange -a y drbd_spacewalk
       ssh blindpig -C vgchange -a y drbd_spacewalk

</source> 16) set VG cluster aware on both nodes (only one command is needed due to drbd) <source lang="bash">

       vgchange -c y drbd_spacewalk

</source> 17) deactivate VG <source lang="bash">

       vgchange -a n drbd_spacewalk
       ssh blindpig -C vgchange -a n drbd_spacewalk

</source> 18) down drbd on both - so we can put it in pacemaker <source lang="bash">

       drbdadm down spacewalk
       ssh blindpig -C drbdadm down spacewalk

</source>

Now let's provision Pacemaker -- this assumes you already have a working pacemaker config with DLM/CLVM

19) Load the dual primary drbd/lvm RA config to the cluster <source lang="bash">

       crm configure < spacewalk.crm

</source>

20) verify all is good with crm_mon: DRBD should look something like the output below <source lang="bash">

  crm_mon -f
          Master/Slave Set:  ms_drbd-spacewalk [p_drbd-spacewalk]
              Masters: [ bigeye blindpig ]
</source>
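
Besides crm_mon, the DRBD role can be checked directly on a node. The sketch below assumes the DRBD 8.3 `drbdadm role` output format of `local/peer` (the same format the backup script further down matches against); the function name `is_dual_primary` is made up for this example.

```bash
#!/usr/bin/env bash
# Succeed only if the given "drbdadm role" output reports both sides as Primary.
# (Hypothetical helper; expects the 8.3-style "Local/Peer" role string.)
is_dual_primary() {
    [ "$1" = "Primary/Primary" ]
}

# Usage on a cluster node:
#   is_dual_primary "$(drbdadm role spacewalk)" && echo "spacewalk is dual primary"
```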


21) Load the VirtualDomain RA config to the cluster <source lang="bash">

 crm configure < spacewalk-vd.crm

</source>

Files Created
  1. spacewalk.res # for DRBD
  2. spacewalk.crm # DRBD/LVM configs to load into crm configure
  3. spacewalk-vd.crm # KVM VirtualDomain configs to load into crm configure

Config Examples

Pacemaker / crmsh

* Note - we do not monitor LVM. LVM commands sometimes hang, and that is not really an issue.
* these are all auto-created by the script below
primitive p_drbd-spacewalk ocf:linbit:drbd \
        params drbd_resource="spacewalk" \
        operations $id="p_drbd_spacewalk-operations" \
        op monitor interval="20" role="Slave" timeout="20" \
        op monitor interval="10" role="Master" timeout="20" \
        op start interval="0" timeout="240" \
        op stop interval="0" timeout="100" start-delay="0"
primitive p_lvm-spacewalk ocf:heartbeat:LVM \
        operations $id="spacewalk-LVM-operations" \
        op start interval="0" timeout="120" \
        op stop interval="0" timeout="120" \
        params volgrpname="drbd_spacewalk"
ms ms_drbd-spacewalk p_drbd-spacewalk \
        meta master-max="2" clone-max="2" notify="true" migration-threshold="1" allow-migrate="true" target-role="Started" interleave="true" is-managed="true"
clone clone_lvm-spacewalk p_lvm-spacewalk \
        meta clone-max="2" notify="true" target-role="Started" interleave="true" is-managed="true"
colocation c_lvm-spacewalk_on_drbd-spacewalk inf: clone_lvm-spacewalk ms_drbd-spacewalk:Master
KVM Virt - VirtualDomain
  • these are all auto created from the script below
primitive p_vd-spacewalk-ha ocf:heartbeat:VirtualDomain \
        params config="/etc/libvirt/qemu/spacewalk-ha.xml" migration_transport="ssh" force_stop="0" hypervisor="qemu:///system" \
        operations $id="p_vd-spacewalk-operations" \
        op start interval="0" timeout="90" \
        op stop interval="0" timeout="90" \
        op migrate_from interval="0" timeout="240" \
        op migrate_to interval="0" timeout="240" \
        op monitor interval="10" timeout="30" start-delay="0" \
        meta allow-migrate="true" failure-timeout="10min" target-role="Started"
colocation c_vd-spacewalk-on-master inf: p_vd-spacewalk-ha ms_drbd-spacewalk:Master
order o_drbm-lvm-vd-start-spacewalk inf: ms_drbd-spacewalk:promote clone_lvm-spacewalk:start p_vd-spacewalk-ha:start


DRBD - spacewalk.res
  • these are all auto-created by the script below
  resource spacewalk {
                protocol        C;

                startup {
                        become-primary-on both;
                }

                net {
                     after-sb-0pri discard-zero-changes;
                     after-sb-1pri discard-secondary;
                     after-sb-2pri disconnect;
                }

                disk {
                    on-io-error detach;
                    fencing resource-only;
                }

                handlers {
                   #split-brain "/usr/lib/drbd/notify-split-brain.sh root";
                   fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
                   after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
                }
                syncer {
                        rate    50M;
                }

                on bigeye {
                        device  /dev/drbd9;
                        disk    /dev/raid1/drbd_spacewalk;
                        meta-disk internal;
                }

                on blindpig {
                        device  /dev/drbd9;
                        disk    /dev/raid10/drbd_spacewalk;
                        meta-disk internal;
                }
  }


  • this will create the config/install above
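#cat create.new.sh
NAME=spacewalk    ## virt name
SIZE=20           ## virt size GB
LVMETA=lvmeta     ## volume group on VG stated above for metadata
DRBDNUM=8         ## how many drbds do you have right now?

NODE1_NAME=bigeye   ## hostname of node 1 (local commands below run here)
NODE2_NAME=blindpig ## hostname of node 2
NODE2=$NODE2_NAME   ## ssh/scp target for node 2
NODE1_VG=raid1      ## VolumeGroup for DRBD lvm
NODE2_VG=raid10     ## VolumeGroup for DRBD lvm
## NODE1_IP, NODE2_IP and PORT (DRBD replication addresses) are used below,
## but their values are not shown on this page; set them before running.




############ DO NOT EDIT BELOW #######################

DRBD_SIZE=$((SIZE + 1))  ## backing LV is 1GB (+ .1GB below) larger than the virt
let DRBDNUM+=1           ## next free DRBD minor number
#let DRBDNUM+=1 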

echo '  resource '$NAME' {
                protocol        C;

                startup {
                        become-primary-on both;
                }

                net {
                     after-sb-0pri discard-zero-changes;
                     after-sb-1pri discard-secondary;
                     after-sb-2pri disconnect;
                }

                disk {
                    on-io-error detach;
                    fencing resource-only;
                }

                handlers {
                   #split-brain "/usr/lib/drbd/notify-split-brain.sh root";
                   fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
                   after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
                }
                syncer {
                        rate    50M;
                }

                on '$NODE1_NAME' {
                        device  /dev/drbd'$DRBDNUM';
                        disk    /dev/'$NODE1_VG'/drbd_'$NAME';
                        address '$NODE1_IP':'$PORT';
                        meta-disk internal;
                }

                on '$NODE2_NAME' {
                        device  /dev/drbd'$DRBDNUM';
                        disk    /dev/'$NODE2_VG'/drbd_'$NAME';
                        address '$NODE2_IP':'$PORT';
                        meta-disk internal;
                }
  }
' > $NAME.res

echo 'primitive p_drbd-'$NAME' ocf:linbit:drbd \
        params drbd_resource="'$NAME'" \
        operations $id="p_drbd_'$NAME'-operations" \
        op monitor interval="20" role="Slave" timeout="20" \
        op monitor interval="10" role="Master" timeout="20" \
        op start interval="0" timeout="240" \
        op stop interval="0" timeout="100" start-delay="0"
primitive p_lvm-'$NAME' ocf:heartbeat:LVM \
        operations $id="'$NAME'-LVM-operations" \
        op start interval="0" timeout="120" \
        op stop interval="0" timeout="120" \
        params volgrpname="drbd_'$NAME'"
ms ms_drbd-'$NAME' p_drbd-'$NAME' \
        meta master-max="2" clone-max="2" notify="true" migration-threshold="1" allow-migrate="true" target-role="Started" interleave="true" is-managed="true"
clone clone_lvm-'$NAME' p_lvm-'$NAME' \
        meta clone-max="2" notify="true" target-role="Started" interleave="true" is-managed="true"
colocation c_lvm-'$NAME'_on_drbd-'$NAME' inf: clone_lvm-'$NAME' ms_drbd-'$NAME':Master
' > $NAME'.crm'

#location drbd_'$NAME'_excl ms_drbd-'$NAME' \
#        rule $id="drbd_'$NAME'_excl-rule" -inf: #uname eq '$NODE3_NAME'

echo 'primitive p_vd-'$NAME'-ha ocf:heartbeat:VirtualDomain \
        params config="/etc/libvirt/qemu/'$NAME'-ha.xml" migration_transport="ssh" force_stop="0" hypervisor="qemu:///system" \
        operations $id="p_vd-'$NAME'-operations" \
        op start interval="0" timeout="90" \
        op stop interval="0" timeout="90" \
        op migrate_from interval="0" timeout="240" \
        op migrate_to interval="0" timeout="240" \
        op monitor interval="10" timeout="30" start-delay="0" \
        meta allow-migrate="true" failure-timeout="10min" target-role="Started"
colocation c_vd-'$NAME'-on-master inf: p_vd-'$NAME'-ha ms_drbd-'$NAME':Master
order o_drbm-lvm-vd-start-'$NAME' inf: ms_drbd-'$NAME':promote clone_lvm-'$NAME':start p_vd-'$NAME'-ha:start
' > $NAME'-vd.crm'

## test the DRBD config before going any further
cmd="drbdadm dump -t $NAME.res"
$cmd  >/dev/null
rc=$?   ## capture the exit status of the test command
if [[ $rc != 0 ]] ; then
    echo -e "\n !!! DRBD config ("$NAME.res")file will not work.. need to fix this first. exiting...\n"
    echo -e " check command: "$cmd"\n";
    echo -e "\n * HINT: you might just need to remove the file /etc/drbd.d/"$NAME.res" [be careful]";
    echo -e "       mv /etc/drbd.d/"$NAME.res" ./$NAME.res.disabled."$NODE1_NAME
    echo -e "       scp "$NODE2":/etc/drbd.d/"$NAME.res" ./$NAME.res.disabled."$NODE2_NAME
    echo -e "       ssh "$NODE2" -C mv /etc/drbd.d/"$NAME.res" /tmp/$NAME.res.disabled"
#    exit $rc
else
    echo -e " * DRBD config verified (it should work)\n"
fi

echo '      '
echo -e '\n# 1) create LVM for DRBD device'
echo '      'lvcreate --name drbd_$NAME --size $DRBD_SIZE'.1GB' $NODE1_VG
echo '      'ssh $NODE2 -C lvcreate --name drbd_$NAME --size $DRBD_SIZE'.1GB' $NODE2_VG

echo -e '\n# 2) copy '$NAME'.res to /etc/drbd.d/'
echo '      'cp $NAME.res /etc/drbd.d/
echo '      'scp $NAME.res $NODE2:/etc/drbd.d/

echo -e '\n# 3) reloading drbd'
echo '      '/etc/init.d/drbd reload
echo '      'ssh $NODE2 -C /etc/init.d/drbd reload

echo -e '\n# 4) create DRBD device on both nodes'
echo '      'drbdadm -- --force create-md $NAME
echo '      'ssh $NODE2 -C drbdadm -- --force create-md $NAME

echo -e '\n# 5) reloading drbd'
echo '      '/etc/init.d/drbd reload
echo '      'ssh $NODE2 -C /etc/init.d/drbd reload

echo -e '\n# 6) bring drbd up on both nodes'
echo '      'drbdadm up $NAME
echo '      'ssh $NODE2 -C drbdadm up $NAME

echo -e '\n# 7) set '$NODE1_NAME' primary and overwrite '$NODE2_NAME
echo '      'drbdadm -- --overwrite-data-of-peer primary  $NAME

echo -e '\n# 8) set '$NODE2_NAME' secondary (should already be set)'
echo '      'ssh $NODE2 -C drbdadm secondary $NAME

echo -e '\n# 9) '$NODE1_NAME' create PV/VG/LV (not setting VG to cluster aware yet due to LVM bug not using --monitor y)'
echo '      'pvcreate /dev/drbd$DRBDNUM
echo '      'vgcreate -c n drbd_$NAME /dev/drbd$DRBDNUM
echo '      'lvcreate -L$SIZE'G' -n$NAME drbd_$NAME

echo -e '\n# 10) Activating VG drbd_'$NAME' -- (should already be, but just in case)'
echo '      'vgchange -a y drbd_$NAME
## ubuntu bug -- enable if ubuntu host
#echo '      'vgchange -a y drbd_$NAME --monitor y

echo -e '\n# 11) create the POOL in virsh'
echo '      'virsh pool-create-as drbd_$NAME --type=logical --target=/dev/drbd_$NAME

echo -e '\n# 12a) If this is NEW kvm install - continue following - else go to step 12b'
echo '        + NOW install new virt from '$NODE1_NAME' on /dev/drbd_'$NAME'/'$NAME named $NAME'-ha'
echo '        # after installed and rebooted'
echo '        ' scp /etc/libvirt/qemu/$NAME'-ha.xml' $NODE2:/etc/libvirt/qemu/$NAME'-ha.xml'
echo '        ' ssh $NODE2 -C virsh define /etc/libvirt/qemu/$NAME'-ha.xml'
echo '        # test virsh shutdown --  install acpid'
echo '        ' virsh shutdown $NAME'-ha'
echo '         * SKIP 12b '

echo ' 12b) If this is a migration from an existing KVM virt - continue, else skip this (ONLY if you completed 12a)'
echo '        ## restore your KVM/LVM to the new LV: of=/dev/drbd_'$NAME'/'$NAME' bs=1M'
echo '        command: dd if=<your image files.img> of=/dev/drbd_'$NAME'/'$NAME' bs=1M'
echo '        ## Edit the existing KVM xml file -- copy the existing file to edit'
echo '        ' cp /etc/libvirt/qemu/$NAME'.xml' ./$NAME'-ha.xml'
echo '         -modify: <name>'$NAME'</name> to <name>'$NAME'-ha</name>'
echo '         -remove: <uuid>[some long uuid]</uuid>'
echo '        ' emacs $NAME'-ha.xml'
echo '        ' cp $NAME'-ha.xml' /etc/libvirt/qemu/$NAME'-ha.xml'
echo '         #' this will set up a unique UUID, which is needed before you copy to $NODE2_NAME
echo '        ' virsh define /etc/libvirt/qemu/$NAME'-ha.xml' 
echo '        ' scp /etc/libvirt/qemu/$NAME'-ha.xml' $NODE2:/etc/libvirt/qemu/$NAME'-ha.xml'
echo '        ' ssh $NODE2 -C virsh define /etc/libvirt/qemu/$NAME'-ha.xml'

echo -e '\n#'
echo '# All install work is done. deactivate VG / set cluster aware / and down drbd for pacemaker provisioning'
echo -e "#\n"

echo -e '\n# 13) deactivate VG drbd_'$NAME' on '$NODE2_NAME
## ubuntu bug -- enable if ubuntu host
#echo '        'vgchange -a n drbd_$NAME --monitor y
echo '        'vgchange -a n drbd_$NAME

echo -e '\n# 14) set drbd primary on '$NODE2_NAME' to set VG cluster aware'
## ubuntu bug -- enable if ubuntu host
#echo '        'vgchange -a n drbd_$NAME --monitor y
echo '        'vgchange -a n drbd_$NAME
echo '        'ssh $NODE2 -C drbdadm primary $NAME

echo -e '\n# 15) activate VG on both nodes'
## ubuntu bug -- enable if ubuntu host
#echo '        'vgchange -a y drbd_$NAME --monitor y
#echo '        'ssh $NODE2 -C vgchange -a y drbd_$NAME --monitor y
echo '        'vgchange -a y drbd_$NAME
echo '        'ssh $NODE2 -C vgchange -a y drbd_$NAME

echo -e '\n# 16) set VG cluster aware on both nodes (only one command is needed due to drbd)'
echo '        'vgchange -c y drbd_$NAME

echo -e '\n# 17) deactivate VG'
## ubuntu bug -- enable if ubuntu host
#echo '        'vgchange -a n drbd_$NAME --monitor y
#echo '        'ssh $NODE2 -C vgchange -a n drbd_$NAME --monitor y
echo '        'vgchange -a n drbd_$NAME
echo '        'ssh $NODE2 -C vgchange -a n drbd_$NAME

echo -e '\n# 18) down drbd on both - so we can put it in pacemaker'
echo '        'drbdadm down $NAME
echo '        'ssh $NODE2 -C drbdadm down $NAME

echo -e '\n# 19) MAKE sure the disk cache for the virtio is set to NONE - live migrate will fail if not'

echo -e '\n#'
echo '# Now lets provision Pacemaker -- we already expect you have a working pacemaker config with DLM/CLVM'
echo -e "#\n"

echo -e '\n# 19) Load the dual primary drbd/lvm RA config to the cluster'
echo '        crm configure < '$NAME'.crm'  
echo -e '\n# 20) verify all is good with crm_mon: DRBD should look like something below'
echo -e "   crm_mon -f\n"
echo '           Master/Slave Set:  ms_drbd-'$NAME' [p_drbd-'$NAME']'
echo -e '               Masters: [ '$NODE1_NAME' '$NODE2_NAME" ]\n"

echo -e '\n# 21) Load the VirtualDomain RA config to the cluster'
echo '  crm configure < '$NAME'-vd.crm' 

echo '#####################################################################'
echo '# Files Created'
echo '# '$NAME'.res    # for DRBD' 
echo '# '$NAME'.crm    # DRBD/LVM configs to load into crm configure'
echo '# '$NAME'-vd.crm # KVM VirtualDomain configs to load into crm configure'


* running the script will test the DRBD resource and at least print a warning
 !!! DRBD config (spacewalk.res) file will not work.. need to fix this first. exiting...

 check command: drbdadm dump -t spacewalk.res

 * HINT: you might just need to remove the file /etc/drbd.d/spacewalk.res [be careful]
       mv /etc/drbd.d/spacewalk.res ./spacewalk.res.disabled.bigeye
       scp blindpig:/etc/drbd.d/spacewalk.res ./spacewalk.res.disabled.blindpig
       ssh blindpig -C mv /etc/drbd.d/spacewalk.res /tmp/spacewalk.res.disabled
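
The warning above depends on capturing the exit status of the test command right after it runs. A minimal sketch of that pattern follows; `check_config` is a hypothetical wrapper written for this example, and it simply runs whatever command it is given.

```bash
#!/usr/bin/env bash
# Run a config-test command quietly, report the result, and hand back its
# exit status so the caller can decide whether to continue.
# (Hypothetical wrapper, not part of the original script.)
check_config() {
    local rc
    "$@" >/dev/null 2>&1
    rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "config check failed (exit $rc): $*"
    else
        echo "config check passed: $*"
    fi
    return "$rc"
}

# Usage with the test command from the script:
#   check_config drbdadm dump -t spacewalk.res || echo "fix spacewalk.res first"
```

Saving `$?` into a variable immediately matters, because any later command (even an `echo`) overwrites it.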



crm configure save /path/to/file.bak


We have the option to snapshot both the DRBD backing device and the KVM virt LV.
There are major issues with backups:

1) LVM DRBD backing device snapshots hang (when primary)

  • workaround: we will set DRBD device as secondary / snapshot+backup DRBD LV / set DRBD primary

2) CLVM does not allow for snapshots

  • workaround: we will set DRBD device as secondary / remove VG cluster bit / snapshot+backup CLVM / set VG cluster bit / set DRBD primary
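
Workaround 1 can be outlined as an ordered list of commands. The sketch below only prints the plan and runs nothing; it mirrors the order used by the backup script further down, but the function name `backing_snapshot_plan`, the snapshot name and the output file name are assumptions made for illustration, and the pacemaker unmanage/manage steps are left out.

```bash
#!/usr/bin/env bash
# Print (do not run) the command sequence for snapshotting the DRBD backing LV.
# Arguments (hypothetical helper): virt name, backing VG on this node,
# optional snapshot size (defaults to 5G, as in the backup script).
backing_snapshot_plan() {
    local name=$1 vg=$2 snapsize=${3:-5G}
    local snap="drbd_${name}_snap"   # assumed snapshot name
    cat <<EOF
vgchange -aln drbd_${name}
drbdadm secondary ${name}
lvcreate -s -n ${snap} -L ${snapsize} /dev/${vg}/drbd_${name}
dd if=/dev/${vg}/${snap} bs=262144 | gzip -c > ${name}.img.gz
lvremove -f /dev/${vg}/${snap}
drbdadm primary ${name}
vgchange -ay drbd_${name}
EOF
}

# Example: show the plan for the spacewalk virt on the node whose VG is raid10
backing_snapshot_plan spacewalk raid10
```

Printing the plan first makes it easy to review the order (local VG off, DRBD secondary, snapshot, dump, cleanup, DRBD primary, VG on) before anything touches a live resource.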

DRBD backing Device

  • you will have to edit some variables for this to work properly
  • It will also set DRBD and others to unmanaged mode, so pacemaker will not potentially fence on failures
  • This is a heavily modified version of http://repo.firewall-services.com/misc/virt/virt-backup.pl (other options like cleanup do not work)

./virt-backup-drbd_backdevice.pl --vm=<virt_name> [--compress]
#!/usr/bin/perl -w

# vm == drbd

use XML::Simple;
use Sys::Virt;
use Getopt::Long;

# Set umask

# Some constant
my $drbd_dir = '/etc/drbd.d/';
our %opts = ();
our @vms = ();
our @excludes = ();
our @disks = ();
our $drbd_dev;

my $migrate_to = 'bigeye'; ## host to migrate machines to if they are running locally
my $migrate_from = 'blindpig'; ## ht

# Sets some defaults values

my $host =`hostname`;
chomp($host);   # strip the trailing newline so $host can be matched in regexes
my $migration = 0; #placeholder

# What to run. The default action is to dump
$opts{action} = 'dump';
# Where backups will be stored. This directory must already exists
$opts{backupdir} = '/NFS/_local_/_backups/DRBD/';
# Size of LVM snapshots (which will be used to backup VM with minimum downtime
# if the VM store data directly on a LV)
$opts{snapsize} = '5G';
# Debug
$opts{debug} = 1;
$opts{snapshot} = 1;
$opts{compress} = 'none';
$opts{lvcreate} = '/sbin/lvcreate -c 512';
$opts{lvremove} = '/sbin/lvremove';
$opts{blocksize} = '262144';
$opts{nice} = 'nice -n 19';
$opts{ionice} = 'ionice -c 2 -n 7';

$opts{livebackup} = 1;
$opts{wasrunning} = 1;
# get command line arguments
GetOptions(
    "debug"        => \$opts{debug},
    "keep-lock"    => \$opts{keeplock},
    "state"        => \$opts{state},
    "snapsize=s"   => \$opts{snapsize},
    "backupdir=s"  => \$opts{backupdir},
    "vm=s"         => \@vms,
    "action=s"     => \$opts{action},
    "cleanup"      => \$opts{cleanup},
    "dump"         => \$opts{dump},
    "unlock"       => \$opts{unlock},
    "connect=s"    => \$opts{connect},
    "snapshot!"    => \$opts{snapshot},
    "compress:s"   => \$opts{compress},
    "exclude=s"    => \@excludes,
    "blocksize=s" => \$opts{blocksize},
    "help"         => \$opts{help}
);

# Set compression settings
if ($opts{compress} eq 'lzop'){
    $opts{compext} = ".lzo";
    $opts{compcmd} = "lzop -c";
}
elsif ($opts{compress} eq 'bzip2'){
    $opts{compext} = ".bz2";
    $opts{compcmd} = "bzip2 -c";
}
elsif ($opts{compress} eq 'pbzip2'){
    $opts{compext} = ".bz2";
    $opts{compcmd} = "pbzip2 -c";
}
elsif ($opts{compress} eq 'xz'){
    $opts{compext} = ".xz";
    $opts{compcmd} = "xz -c";
}
elsif ($opts{compress} eq 'lzip'){
    $opts{compext} = ".lz";
    $opts{compcmd} = "lzip -c";
}
elsif ($opts{compress} eq 'plzip'){
    $opts{compext} = ".lz";
    $opts{compcmd} = "plzip -c";
}
# Default is gzip (when --compress is given without a value)
elsif (($opts{compress} eq 'gzip') || ($opts{compress} eq '')) {
    $opts{compext} = ".gz";
    $opts{compcmd} = "gzip -c";
}
# No compression requested: pass the data through unchanged
else {
    $opts{compext} = "";
    $opts{compcmd} = "cat";
}
# Allow comma separated multi-argument
@vms = split(/,/,join(',',@vms));
@excludes = split(/,/,join(',',@excludes));

# Backward compatible with --dump --cleanup --unlock
$opts{action} = 'dump' if ($opts{dump});
$opts{action} = 'cleanup' if ($opts{cleanup});
$opts{action} = 'unlock' if ($opts{unlock});

# Stop here if we have no vm
# Or the help flag is present
if ((!@vms) || ($opts{help})){
    usage();
    exit 1;
}
if (! -d $opts{backupdir} ){
    print "$opts{backupdir} is not a valid directory\n";
    exit 1;
}

print "\n" if ($opts{debug});

foreach our $vm (@vms){
    print "Checking $vm status\n\n" if ($opts{debug});
    our $backupdir = $opts{backupdir}.'/'.$vm;

    if ($opts{action} eq 'cleanup'){
        print "Running cleanup routine for $vm\n\n" if ($opts{debug});
       # run_cleanup();
    }
    elsif ($opts{action} eq 'dump'){
        print "Running dump routine for $vm\n\n" if ($opts{debug});
        prepare_backup();
        run_dump();
    }
#    else {
#        usage();
#        exit 1;
#    }
}

##############                FUNCTIONS                 ####################

sub prepare_backup{
    my ($source,$res);
    my $target = $vm;
    my $match=0;


    ## locate the backing device for this res
    my @drbd_res = &runcmd("drbdadm dump $vm");
    foreach my $line (@drbd_res) {
        $res = $line;
        if ($match == 1 && $line =~ /disk\s+(.*);/) {
            $source = $1;
            $match = 0;
        }
        if ($line =~ /device\s+.*(drbd\d+)\s+minor/) {
            $drbd_dev = $1;
        }

        ## the next "disk" line belongs to this host's "on" section
        if ($line =~ /on\s$host\s+{/i) {    $match = 1; }
    }
    if (!$source) {
        print "Did not find DRBD backing device for VM\n";
    } else {
        ## set target backup file based on device
        $target = $source;
        $target =~ s/\//_-_/g; ## rename / to _-_
        $target =~ s/^_-_//g;  ## remove leading _-_
    }

    ## Check if VM is running locally - migrate it off to backup
    ## set migration = 1, to migrate back when done
    my $local_test = join("",&runcmd("virsh list"));
    if ($local_test =~ /$vm.*running/i) {
        print "$vm running locally - migration to $migrate_to\n";
        my $pvd = &GetPVD($vm);
        &runcmd("crm resource migrate $pvd $migrate_to");
        $migration = 1;
        sleep 1;

        my $remote_test = join("",&runcmd("ssh $migrate_to -C virsh list | grep -i $vm",1));
        ## wait until the VM no longer shows up in the local virsh list
        while($local_test =~ /(.*$vm.*)/) {
            print " $migrate_from:\t" . $1 . "\n";
            print "(r)$migrate_to:\t$remote_test\n";
            sleep 5;
            $local_test = join("",&runcmd("virsh list",1));
            $remote_test = join("",&runcmd("ssh $migrate_to -C virsh list | grep -i $vm",1));
        }
        $remote_test = join("",&runcmd("ssh $migrate_to -C virsh list | grep -i $vm",1));
        print "We must have migrated ok... \n";
        print "(r)$migrate_to:\t$remote_test\n";
    }

    &runcmd("crm resource unmanage clone_lvm-" . $vm);
    &runcmd("crm resource unmanage ms_drbd-" . $vm);
    #&runcmd("crm resource unmanage p_drbd-" . $vm);
    sleep 1;
    &runcmd("vgchange -aln drbd_" . $vm,0,5);
    sleep 2;
    &runcmd("drbdadm secondary " . $vm);

    &runcmd("ssh $migrate_to -C touch /tmp/backup.$drbd_dev");
    &runcmd("ssh $migrate_to -C touch /tmp/backup.p_drbd-$vm");
    &runcmd("touch /tmp/backup.$drbd_dev");
    &runcmd("touch /tmp/backup.p_drbd-$vm");

    my $sec_check = join("",&runcmd("drbdadm role $vm"));
    if( $sec_check !~ /Secondary\/Primary/) {
        print "Fail: DRBD res [$vm] is not Secondary! result: $sec_check\n";
    } else {
        print "OK: DRBD res [$vm] is Secondary. result: $sec_check\n";
    }
    if (!-d $backupdir) {
        mkdir $backupdir or die $!;
    }
    if (!-d $backupdir.'.meta') {
        mkdir $backupdir . '.meta' or die $!;
    }
    my $time = "_".time();
    # Try to snapshot the source if snapshot is enabled
    if ( ($opts{snapshot}) && (create_snapshot($source,$time)) ){
        print "$source seems to be a valid logical volume (LVM), a snapshot has been taken as " .
            $source . $time ."\n" if ($opts{debug});
        $source = $source.$time;
        push (@disks, {source => $source, target => $target . '_' . $time, type => 'snapshot'});
    }
    # Summarize the list of disk to be dumped
    if ($opts{debug}){
        if ($opts{action} eq 'dump'){
            print "\n\nThe following disks will be dumped:\n\n";
            foreach my $disk (@disks){
                print "Source: $disk->{source}\tDest: $backupdir/$vm" . '_' . $disk->{target} .
                    ".img$opts{compext}\n";
            }
        }
    }
    if ($opts{livebackup}){
        print "\nWe can run a live backup\n" if ($opts{debug});
    }
}

sub run_dump{
    # Pause VM, dump state, take snapshots etc..
    # Now, it's time to actually dump the disks
    foreach my $disk (@disks){
        my $source = $disk->{source};
        my $dest = "$backupdir/$vm" . '_' . $disk->{target} . ".img$opts{compext}";
        print "\nStarting dump of $source to $dest\n\n" if ($opts{debug});
        my $ddcmd = "$opts{ionice} dd if=$source bs=$opts{blocksize} | $opts{nice} $opts{compcmd} > $dest 2>/dev/null";
        print $ddcmd . "\n";
        unless( system("$ddcmd") == 0 ){
            die "Couldn't dump the block device/file $source to $dest\n";
        }
        # Remove the snapshot if the current dumped disk is a snapshot
        destroy_snapshot($source) if ($disk->{type} eq 'snapshot');
    }
    ## hand the resources back to pacemaker and promote DRBD again
    &runcmd("crm resource manage p_drbd-" . $vm);
    &runcmd("crm resource manage ms_drbd-" . $vm);
    &runcmd("crm resource manage clone_lvm-" . $vm);
    &runcmd("drbdadm primary " . $vm);
    sleep 1;
    &runcmd("ssh $migrate_to -C rm /tmp/backup.$drbd_dev");
    &runcmd("ssh $migrate_to -C rm /tmp/backup.p_drbd-$vm");
    &runcmd("rm /tmp/backup.$drbd_dev");
    &runcmd("rm /tmp/backup.p_drbd-$vm");

    &runcmd("vgchange -ay drbd_" . $vm);
    sleep 1;
    &runcmd("crm_resource -r ms_drbd-$vm -C");
    sleep 1;
    &runcmd("crm_resource -r clone_lvm-$vm -C");
    sleep 3;
    my $prim_check = join("",&runcmd("drbdadm role $vm"));
    print "DRBD resource: $prim_check\n";

    ## if the VM was migrated away for the backup, move it back
    if ($migration) {
        if ($prim_check =~ /primary\/primary/i) {
            ## migrate back
            my $local_test = join("",&runcmd("virsh list"));
            if ($local_test !~ /$vm.*running/i) {
                print "$vm NOT running locally - migration to $migrate_from\n";
                my $pvd = &GetPVD($vm);
                &runcmd("crm resource migrate $pvd $migrate_from");
                sleep 1;

                my $remote_test = join("",&runcmd("ssh $migrate_to -C virsh list | grep -i $vm"));
                my $status = 'unknown';
                ## wait until the VM shows up as running locally again
                while($local_test !~ /(.*$vm.*running)/i) {
                    if ($local_test =~ /(.*$vm.*)/i) { $status = $1; }
                    print " $migrate_from:\t" . $status . "\n";
                    print "(r)$migrate_to:\t$remote_test\n";
                    sleep 5;
                    $local_test = join("",&runcmd("virsh list",1));
                    $remote_test = join("",&runcmd("ssh $migrate_to -C virsh list | grep -i $vm",1));
                }

                print "Migration is Done!\n";
                print "(r)$migrate_from:\t$local_test\n";
            }
        }
    }
    ## done

    # And remove the lock file, unless the --keep-lock flag is present
    unlock_vm() unless ($opts{keeplock});
}

sub usage{
    print "usage:\n$0 --action=[dump|cleanup|chunkmount|unlock] --vm=vm1[,vm2,vm3] [--debug] [--exclude=hda,hdb] [--compress] ".
        "[--state] [--no-snapshot] [--snapsize=<size>] [--backupdir=/path/to/dir] [--connect=<URI>] ".
        "[--keep-lock] [--blocksize=<block size>]\n" .
    "\n\n" .
    "\t--action: What action the script will run. Valid actions are\n\n" .
    "\t\t- dump: Run the dump routine (dump disk image to temp dir, pausing the VM if needed). It's the default action\n" .
    "\t\t- unlock: just remove the lock file, but don't cleanup the backup dir\n\n" .
    "\t--vm=name: The VM you want to work on (as known by libvirt). You can backup several VMs in one shot " .
        "if you separate them with comma, or with multiple --vm argument. You have to use the name of the domain, ".
        "ID and UUID are not supported at the moment\n\n" .
    "\n\nOther options:\n\n" .
    "\t--snapsize=<snapsize>: The amount of space to use for snapshots. Use the same format as -L option of lvcreate. " .
        "eg: --snapsize=15G. Default is 5G\n\n" .
    "\t--compress[=[gzip|bzip2|pbzip2|lzop|xz|lzip|plzip]]: On the fly compress the disks images during the dump. If you " .
        "don't specify a compression algo, gzip will be used.\n\n" .
    "\t--backupdir=/path/to/backup: Use an alternate backup dir. The directory must exist and be writable. " .
        "The default is /NFS/_local_/_backups/DRBD/\n\n" .
    "\t--keep-lock: Leave the lock file in place. This prevents another " .
        "dump from running while a third party backup software (BackupPC for example) saves the dumped files.\n\n";
}
# Save the DRBD resource definition
sub save_drbd_res{
    my $res = shift;
    print "\nSaving DRBD resource description for $vm to $backupdir/$vm.res\n" if ($opts{debug});
    open(XML, ">$backupdir/$vm" . ".res") || die $!;
    print XML $res;
    close XML;
}
# Create an LVM snapshot
# Pass the original logical volume and the suffix
# to be added to the snapshot name as arguments

sub create_snapshot{
    my ($blk,$suffix) = @_;
    my $ret = 0;
    print "Running: $opts{lvcreate} -s -n " . $blk . $suffix .
        " -L $opts{snapsize} $blk > /dev/null 2>&1\n" if $opts{debug};
    if ( system("$opts{lvcreate} -s -n " . $blk . $suffix .
        " -L $opts{snapsize} $blk > /dev/null 2>&1") == 0 ) {
        $ret = 1;
        open SNAPLIST, ">>$backupdir.meta/snapshots" or die "Error, couldn't open snapshot list file\n";
        print SNAPLIST $blk.$suffix ."\n";
        close SNAPLIST;
    }
    return $ret;
}
# Remove an LVM snapshot
sub destroy_snapshot{
    my $ret = 0;
    my ($snap) = @_;
    print "Removing snapshot $snap\n" if $opts{debug};
    if (system ("$opts{lvremove} -f $snap > /dev/null 2>&1") == 0 ){
        $ret = 1;
    }
    return $ret;
}
# Lock a VM backup dir
# Just creates an empty lock file
sub lock_vm{
    print "Locking $vm\n" if $opts{debug};
    open ( LOCK, ">$backupdir.meta/$vm.lock" ) || die $!;
    print LOCK "";
    close LOCK;
}
# Unlock the VM backup dir
# Just removes the lock file
sub unlock_vm{
    print "Removing lock file for $vm\n\n" if $opts{debug};
    unlink <$backupdir.meta/$vm.lock>;
}

sub runcmd() {
    my $cmd = shift;
    my $quiet = shift;
    my $ignore = shift;
    ## ignore exit code 1 with greps -- not found is OK..
    if ($cmd =~ /grep/) {       $ignore = 1;    }
    if (!$quiet) { print "exec: $cmd ... ";}
    my @output = `$cmd`;
    if ($?) {
        print $ignore . "\n" if defined $ignore;
        my $e = sprintf("%d", $? >> 8);
        if ($ignore && $ignore == $e) {
            print "exit code = $e -- ignoring exit code $e\n";
        } else {
            printf "\n******** command $cmd exited with value %d\n", $? >> 8;
            print @output;
            exit $? >> 8;
        }
    }
    if (!$quiet) {    print "success\n"; }
    return @output;
}

## get primitive VirtualDomain
sub GetPVD() {
    my $vm = shift;
    my $out = join("",&runcmd("crm resource show | grep $vm | grep VirtualDomain"));
    if ($out =~ /([\d\w\-\_]+)/) {
        return $1;
    } else {
        print "Could not locate primitive VirtualDomain for $vm\n";
    }
}

CLVM - KVM virt Snapshot

  • You will have to edit some variables (the backup directory and the two host names, for example) for this to work properly
  • It also puts the DRBD and LVM resources into unmanaged mode, so Pacemaker will not fence a node on failures during the backup
  • This is a heavily modified version of http://repo.firewall-services.com/misc/virt/virt-backup.pl (other options, such as cleanup, do not work)

./virt-backup-drbd_clvm.pl --vm=<virt_name> [--compress]
#!/usr/bin/perl -w

## lots of hacks due to bugs.. in lvm/clustered vg

use XML::Simple;
use Sys::Virt;
use Getopt::Long;
use Data::Dumper;
# Set umask

# Some constant
my $drbd_dir = '/etc/drbd.d/';
our %opts = ();
our @vms = ();
our @excludes = ();
our @disks = ();
our $drbd_dev;

my $migrate_to = 'blindpig'; ## host the backup runs on; VMs running elsewhere are migrated here first
my $migrate_from = 'bigeye'; ## peer host; demoted to DRBD secondary while the backup runs

# Sets some defaults values

my $host =`hostname`;
my $migration = 0; #placeholder

# What to run. The default action is to dump
$opts{action} = 'dump';
$opts{backupdir} = '/NFS/_local_/_backups/KVM/';
$opts{snapsize} = '1G';
# Debug
$opts{debug} = 1;
$opts{snapshot} = 1;
$opts{compress} = 'none';
$opts{lvcreate} = '/sbin/lvcreate -c 512';
$opts{lvremove} = '/sbin/lvremove';
$opts{blocksize} = '262144';
$opts{nice} = 'nice -n 19';
$opts{ionice} = 'ionice -c 2 -n 7';

$opts{livebackup} = 1;
$opts{wasrunning} = 1;
# get command line arguments
GetOptions(
    "debug"        => \$opts{debug},
    "keep-lock"    => \$opts{keeplock},
    "state"        => \$opts{state},
    "snapsize=s"   => \$opts{snapsize},
    "backupdir=s"  => \$opts{backupdir},
    "vm=s"         => \@vms,
    "action=s"     => \$opts{action},
    "cleanup"      => \$opts{cleanup},
    "dump"         => \$opts{dump},
    "unlock"       => \$opts{unlock},
    "connect=s"    => \$opts{connect},
    "snapshot!"    => \$opts{snapshot},
    "compress:s"   => \$opts{compress},
    "exclude=s"    => \@excludes,
    "blocksize=s"  => \$opts{blocksize},
    "help"         => \$opts{help}
);

# Set compression settings
if ($opts{compress} eq 'lzop'){
    $opts{compext} = ".lzo";
    $opts{compcmd} = "lzop -c";
}
elsif ($opts{compress} eq 'bzip2'){
    $opts{compext} = ".bz2";
    $opts{compcmd} = "bzip2 -c";
}
elsif ($opts{compress} eq 'pbzip2'){
    $opts{compext} = ".bz2";
    $opts{compcmd} = "pbzip2 -c";
}
elsif ($opts{compress} eq 'xz'){
    $opts{compext} = ".xz";
    $opts{compcmd} = "xz -c";
}
elsif ($opts{compress} eq 'lzip'){
    $opts{compext} = ".lz";
    $opts{compcmd} = "lzip -c";
}
elsif ($opts{compress} eq 'plzip'){
    $opts{compext} = ".lz";
    $opts{compcmd} = "plzip -c";
}
# Default is gzip
elsif (($opts{compress} eq 'gzip') || ($opts{compress} eq '')) {
    $opts{compext} = ".gz";
    $opts{compcmd} = "gzip -c";
#    $opts{compcmd} = "pigz -c -p 2";
}
# No compression requested: pass the data through unchanged
else{
    $opts{compext} = "";
    $opts{compcmd} = "cat";
}
# Allow comma separated multi-argument
@vms = split(/,/,join(',',@vms));
@excludes = split(/,/,join(',',@excludes));

# Backward compatible with --dump --cleanup --unlock
$opts{action} = 'dump' if ($opts{dump});
$opts{action} = 'cleanup' if ($opts{cleanup});
$opts{action} = 'unlock' if ($opts{unlock});
# Libvirt URI to connect to (keep a value passed with --connect)
$opts{connect} ||= "qemu:///system";

# Stop here if we have no vm
# Or the help flag is present
if ((!@vms) || ($opts{help})){
    usage();
    exit 1;
}
if (! -d $opts{backupdir} ){
    print "$opts{backupdir} is not a valid directory\n";
    exit 1;
}

print "\n" if ($opts{debug});

# Connect to libvirt
print "\n\nConnecting to libvirt daemon using $opts{connect} as URI\n" if ($opts{debug});
our $libvirt = Sys::Virt->new( uri => $opts{connect} ) || 
    die "Error connecting to libvirt on URI: $opts{connect}";

foreach our $vm (@vms){
    print "Checking $vm status\n\n" if ($opts{debug});
    our $backupdir = $opts{backupdir}.'/'.$vm;

    my $vdom = $vm . '-ha';
    $vdom =~ s/-ha-ha/-ha/;
    our $dom = $libvirt->get_domain_by_name($vdom) ||
        die "Error opening $vm object";

    if ($opts{action} eq 'cleanup'){
        print "Running cleanup routine for $vm\n\n" if ($opts{debug});
       # run_cleanup();
    }
    elsif ($opts{action} eq 'dump'){
        print "Running dump routine for $vm\n\n" if ($opts{debug});
        run_dump();
    }
#    else {
#        usage();
#        exit 1;
#    }
}

##############                FUNCTIONS                 ####################

sub prepare_backup{
    my ($source,$res);
    my $target = $vm;
    my $match=0;

    my $xml = new XML::Simple ();
    my $data = $xml->XMLin( $dom->get_xml_description(), forcearray => ['disk'] );

    my @drbd_res = &runcmd("drbdadm dump $vm");
    foreach my $line (@drbd_res) {
        $res = $line;
        if ($line =~ /device\s+.*(drbd\d+)\s+minor/) {
            $drbd_dev = $1;
        }
    }

    # Create a list of disks used by the VM
    foreach $disk (@{$data->{devices}->{disk}}){
        if ($disk->{type} eq 'block'){
            $source = $disk->{source}->{dev};
        }
        elsif ($disk->{type} eq 'file'){
            $source = $disk->{source}->{file};
        }
        else{
            print "\nSkipping $source for vm $vm as its type is $disk->{type}: " .
                " and only block is supported\n" if ($opts{debug});
            next;
        }
        ## we only support the first block device for now.
        if ($target && $source) {           last;       }
    }

    ## locate the backing device for this res
    #my @drbd_res = &runcmd("drbdadm dump $vm");
    #foreach my $line (@drbd_res) {
    #   $res = $line;
    #   if ($match == 1 && $line =~ /disk\s+(.*);/) {
    #       $source = $1;
    #       $match = 0;
    #   }
    #   if ($line =~ /on\s$host\s+{/i) {    $match = 1; }
    #   }
    #  if (!$source) {
    #   print "Did not find DRBD backing deviced for VM\n";
    #   exit;
    #   } else {
    #   ## set target backup file based on device
    #   $target = $source;
    #   $target =~ s/\//_-_/g; ## rename / to _-_
    #   $target =~ s/^_-_//g;  ## remove leading _-_
    #   }

    ## check if running on node2 - migrate here if so
    my $local_test = join("",&runcmd("virsh list"));

    if ($local_test !~ /$vm.*running/i) {
        my $status = 'not running';
        print "$vm running remotely - migration to $migrate_to\n";
        my $pvd = &GetPVD($vm);
        &runcmd("crm resource migrate $pvd $migrate_to");
        $migration = 1;
        sleep 1;

        ## poll until the VM no longer shows up on the peer
        my $remote_test = join("",&runcmd("ssh $migrate_from -C virsh list | grep -i $vm",1));
        while($remote_test =~ /(.*$vm.*)/) {
            print " $migrate_to:\t" . $status . "\n";
            print "(r)$migrate_from:\t$remote_test\n";
            sleep 5;
            $local_test = join("",&runcmd("virsh list",1));
            if ($local_test =~ /(.*$vm.*)/i) { $status = $1; }
            $remote_test = join("",&runcmd("ssh $migrate_from -C virsh list | grep -i $vm",1));
        }
        $remote_test = join("",&runcmd("ssh $migrate_from -C virsh list | grep -i $vm",1));
        print "We must have migrated OK... \n";
        print "(r)$migrate_from:\t$remote_test\n";
    }

    &runcmd("crm resource unmanage clone_lvm-" . $vm);
    &runcmd("crm resource unmanage ms_drbd-" . $vm);
    sleep 1;
    &runcmd("ssh $migrate_from -C vgchange -aln drbd_" . $vm);
#    sleep 2;
    &runcmd("ssh $migrate_from -C drbdadm secondary " . $vm);
    &runcmd("ssh $migrate_from -C touch /tmp/backup.$drbd_dev");
    &runcmd("ssh $migrate_from -C touch /tmp/backup.p_drbd-$vm");
    &runcmd("touch /tmp/backup.$drbd_dev");
    &runcmd("touch /tmp/backup.p_drbd-$vm");

    my $sec_check = join("",&runcmd("drbdadm role $vm"));
    if( $sec_check !~ /Primary\/Secondary/) {
        print "Fail: DRBD res [$vm] is not the ONLY primary! result: $sec_check\n";
    } else {
        print "OK: DRBD res [$vm] is the ONLY Primary. result: $sec_check\n";
    }
    ## "or" instead of "||" so that die fires when mkdir itself fails
    if (!-d $backupdir) {
        mkdir $backupdir or die $!;
    }
    if (!-d $backupdir.'.meta') {
        mkdir $backupdir . '.meta' or die $!;
    }

    &runcmd("vgchange -c n drbd_" . $vm);
    sleep 1;
    &runcmd("vgchange -aey drbd_" . $vm);
    my $time = "_".time();
    # Try to snapshot the source if snapshot is enabled
    if ( ($opts{snapshot}) && (create_snapshot($source,$time)) ){
        print "$source seems to be a valid logical volume (LVM), a snapshot has been taken as " .
            $source . $time ."\n" if ($opts{debug});
        $source = $source.$time;
        push (@disks, {source => $source, target => $target . '_' . $time, type => 'snapshot'});
    }
    # Summarize the list of disks to be dumped
    if ($opts{debug}){
        if ($opts{action} eq 'dump'){
            print "\n\nThe following disks will be dumped:\n\n";
            foreach $disk (@disks){
                print "Source: $disk->{source}\tDest: $backupdir/$vm" . '_' . $disk->{target} .
                    ".img$opts{compext}\n";
            }
        }
    }
    if ($opts{livebackup}){
        print "\nWe can run a live backup\n" if ($opts{debug});
    }
}

sub run_dump{
    # Pause VM, dump state, take snapshots etc..
    prepare_backup();
    # Now, it's time to actually dump the disks
    foreach $disk (@disks){
        my $source = $disk->{source};
        my $dest = "$backupdir/$vm" . '_' . $disk->{target} . ".img$opts{compext}";
        print "\nStarting dump of $source to $dest\n\n" if ($opts{debug});
        my $ddcmd = "$opts{ionice} dd if=$source bs=$opts{blocksize} | $opts{nice} $opts{compcmd} > $dest 2>/dev/null";
        unless( system("$ddcmd") == 0 ){
            die "Couldn't dump the block device/file $source to $dest\n";
        }
        # Remove the snapshot if the current dumped disk is a snapshot
        destroy_snapshot($source) if ($disk->{type} eq 'snapshot');
    }
    $meta = unlink <$backupdir.meta/*>;
    rmdir "$backupdir.meta";
    print "$meta metadata files removed\n\n" if $opts{debug};

    &runcmd("ssh $migrate_from -C drbdadm primary " . $vm);
    &runcmd("ssh $migrate_from -C rm /tmp/backup.$drbd_dev");
    &runcmd("ssh $migrate_from -C rm /tmp/backup.p_drbd-$vm");
    &runcmd("rm /tmp/backup.$drbd_dev");
    &runcmd("rm /tmp/backup.p_drbd-$vm");
    sleep 1;
    &runcmd("vgchange -c y drbd_" . $vm);
    sleep 1;
    &runcmd("vgchange -ay drbd_" . $vm);
    sleep 1;

    &runcmd("crm resource manage p_drbd-" . $vm);
    &runcmd("crm resource manage ms_drbd-" . $vm);
    &runcmd("crm resource manage clone_lvm-" . $vm);
    sleep 1;

    &runcmd("crm_resource -r ms_drbd-$vm -C");
    sleep 1;
    &runcmd("crm_resource -r clone_lvm-$vm -C");
    sleep 3;
    my $prim_check = join("",&runcmd("drbdadm role $vm"));
    print "DRBD resource: $prim_check\n";
    ## if the VM was migrated here for the backup, move it back
    if ($migration) {
        if ($prim_check =~ /primary\/primary/i) {
            ## migrate back
            my $local_test = join("",&runcmd("virsh list"));
            if ($local_test =~ /$vm.*running/i) {
                print "$vm running locally - migration to $migrate_from from $migrate_to\n";
                my $pvd = &GetPVD($vm);
                &runcmd("crm resource migrate $pvd $migrate_from");
                sleep 1;

                my $remote_test = join("",&runcmd("ssh $migrate_from -C virsh list | grep -i $vm",1));
                my $status = 'unknown';
                ## poll until the VM is no longer running locally
                while($local_test =~ /(.*$vm.*running)/i) {
                    if ($local_test =~ /(.*$vm.*)/i) { $status = $1; }
                    print " $migrate_to:\t" . $status . "\n";
                    print "(r)$migrate_from:\t$remote_test\n";
                    sleep 5;
                    $local_test = join("",&runcmd("virsh list",1));
                    $remote_test = join("",&runcmd("ssh $migrate_from -C virsh list | grep -i $vm",1));
                }

                print "Migration is Done!\n";
                print "(r)$migrate_from:\t$remote_test\n";
            }
        }
    }
    ## done

    # And remove the lock file, unless the --keep-lock flag is present
    unlock_vm() unless ($opts{keeplock});
}

sub usage{
    print "usage:\n$0 --action=[dump|cleanup|chunkmount|unlock] --vm=vm1[,vm2,vm3] [--debug] [--exclude=hda,hdb] [--compress] ".
        "[--state] [--no-snapshot] [--snapsize=<size>] [--backupdir=/path/to/dir] [--connect=<URI>] ".
        "[--keep-lock] [--blocksize=<block size>]\n" .
    "\n\n" .
    "\t--action: What action the script will run. Valid actions are\n\n" .
    "\t\t- dump: Run the dump routine (dump disk image to temp dir, pausing the VM if needed). It's the default action\n" .
    "\t\t- unlock: just remove the lock file, but don't clean up the backup dir\n\n" .
    "\t--vm=name: The VM you want to work on (as known by libvirt). You can back up several VMs in one shot " .
        "if you separate them with commas, or with multiple --vm arguments. You have to use the name of the domain, ".
        "ID and UUID are not supported at the moment\n\n" .
    "\n\nOther options:\n\n" .
    "\t--snapsize=<snapsize>: The amount of space to use for snapshots. Use the same format as the -L option of lvcreate. " .
        "eg: --snapsize=15G. Default is 1G\n\n" .
    "\t--compress[=[gzip|bzip2|pbzip2|lzop|xz|lzip|plzip]]: Compress the disk images on the fly during the dump. If you " .
        "don't specify a compression algo, gzip will be used.\n\n" .
    "\t--backupdir=/path/to/backup: Use an alternate backup dir. The directory must exist and be writable. " .
        "The default is /NFS/_local_/_backups/KVM/\n\n" .
    "\t--keep-lock: Leave the lock file in place. This prevents another " .
        "dump from running while third-party backup software (BackupPC for example) saves the dumped files.\n\n";
}
# Save the DRBD resource definition
sub save_drbd_res{
    my $res = shift;
    print "\nSaving DRBD resource description for $vm to $backupdir/$vm.res\n" if ($opts{debug});
    open(XML, ">$backupdir/$vm" . ".res") || die $!;
    print XML $res;
    close XML;
}
# Create an LVM snapshot
# Pass the original logical volume and the suffix
# to be added to the snapshot name as arguments

sub create_snapshot{
    my ($blk,$suffix) = @_;
    my $ret = 0;
    print "Running: $opts{lvcreate} -s -n " . $blk . $suffix .
        " -L $opts{snapsize} $blk > /dev/null 2>&1\n" if $opts{debug};
    if ( system("$opts{lvcreate} -s -n " . $blk . $suffix .
        " -L $opts{snapsize} $blk > /dev/null 2>&1") == 0 ) {
        $ret = 1;
        open SNAPLIST, ">>$backupdir.meta/snapshots" or die "Error, couldn't open snapshot list file\n";
        print SNAPLIST $blk.$suffix ."\n";
        close SNAPLIST;
    }
    return $ret;
}
# Remove an LVM snapshot
sub destroy_snapshot{
    my $ret = 0;
    my ($snap) = @_;
    print `lvs drbd_$vm`;
    print "Removing snapshot $snap\n" if $opts{debug};
    if (system ("$opts{lvremove} -f $snap > /dev/null 2>&1") == 0 ){
        $ret = 1;
    }
    return $ret;
}
# Lock a VM backup dir
# Just creates an empty lock file
sub lock_vm{
    print "Locking $vm\n" if $opts{debug};
    open ( LOCK, ">$backupdir.meta/$vm.lock" ) || die $!;
    print LOCK "";
    close LOCK;
}
# Unlock the VM backup dir
# Just removes the lock file
sub unlock_vm{
    print "Removing lock file for $vm\n\n" if $opts{debug};
    unlink <$backupdir.meta/$vm.lock>;
}

sub runcmd() {
    my $cmd = shift;
    my $quiet = shift;
    my $ignore;
    ## ignore exit code 1 with greps -- not found is OK..
    if ($cmd =~ /grep/) {       $ignore = 1;    }
    if (!$quiet) { print "exec: $cmd ... ";}
    my @output = `$cmd`;
    if ($?) {
        my $e = sprintf("%d", $? >> 8);
        if ($ignore && $ignore == $e) {
            print "grep - ignore exit code $e\n";
        } else {
            printf "\n******** command $cmd exited with value %d\n", $? >> 8;
            print @output;
            exit $? >> 8;
        }
    }
    if (!$quiet) {    print "success\n"; }
    return @output;
}

## get primitive VirtualDomain
sub GetPVD() {
    my $vm = shift;
    my $out = join("",&runcmd("crm resource show | grep $vm | grep VirtualDomain"));
    if ($out =~ /([\d\w\-\_]+)/) {
        return $1;
    } else {
        print "Could not locate primitive VirtualDomain for $vm\n";
    }
}

# Dump the domain description as XML
sub save_xml{
    print "\nSaving XML description for $vm to $backupdir/$vm.xml\n" if ($opts{debug});
    open(XML, ">$backupdir/$vm" . ".xml") || die $!;
    print XML $dom->get_xml_description();
    close XML;
}

Cman/Pacemaker Notes


  • Allow UDP 5405 for message layer
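
On CentOS 6 this could be done with an iptables rule. The sketch below only builds and prints the command so it can be reviewed before it is run. Only the port number comes from the note above; the INPUT chain and the lack of any source restriction are assumptions.

```shell
#!/bin/sh
# Port taken from the note above; chain and match options are assumptions.
CLUSTER_PORT=5405

# Build the rule as a string and print it instead of applying it.
rule="iptables -I INPUT -p udp --dport $CLUSTER_PORT -j ACCEPT"
echo "$rule"
```

If the printed rule looks right, run it as root on both nodes and follow it with `service iptables save` so it survives a reboot.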


  • just a basic cluster.conf config that works

<cluster name="ipa" config_version="31">
 <cman two_node="1" expected_votes="1" cluster_id="1208">
    <multicast addr=""/>
 </cman>
 <clusternodes>
     <clusternode name="blindpig" nodeid="1">
        <fence>
           <method name="pcmk-redirect">
                <device name="pcmk" port="blindpig"/>
           </method>
        </fence>
     </clusternode>
     <clusternode name="bigeye" nodeid="3">
        <fence>
           <method name="pcmk-redirect">
                <device name="pcmk" port="bigeye"/>
           </method>
        </fence>
     </clusternode>
 </clusternodes>
 <fencedevices>
       <fencedevice name="pcmk" agent="fence_pcmk"/>
 </fencedevices>
 <fence_daemon clean_start="1" post_fail_delay="10" post_join_delay="30">
 </fence_daemon>
 <logging to_syslog="yes" syslog_facility="local6" debug="off">
 </logging>
</cluster>



crm_mon -Qrf1

#!/usr/bin/perl
# check_crm_v0_5
# Copyright © 2011 Philip Garner, Sysnix Consultants Limited
#    This program is free software: you can redistribute it and/or modify
#    it under the terms of the GNU General Public License as published by
#    the Free Software Foundation, either version 3 of the License, or
#    (at your option) any later version.
#    This program is distributed in the hope that it will be useful,
#    but WITHOUT ANY WARRANTY; without even the implied warranty of
#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#    GNU General Public License for more details.
#    You should have received a copy of the GNU General Public License
#    along with this program.  If not, see <http://www.gnu.org/licenses/>.
# Authors: Phil Garner - phil@sysnix.com & Peter Mottram - peter@sysnix.com
# Acknowledgements: Vadym Chepkov, Sönke Martens
# v0.1 09/01/2011
# v0.2 11/01/2011
# v0.3 22/08/2011 - bug fix and changes suggested by Vadym Chepkov
# v0.4 23/08/2011 - update for spelling and anchor regex capture (Vadym Chepkov)
# v0.5 29/09/2011 - Add standby warn/crit suggested by Sönke Martens & removal
#                   of 'our' to 'my' to completely avoid problems with ePN
# NOTES: Requires Perl 5.8 or higher & the Perl Module Nagios::Plugin
#        Nagios user will need sudo access - suggest adding line below to
#        sudoers
#            nagios  ALL=(ALL) NOPASSWD: /usr/sbin/crm_mon -1 -r -f
#            In sudoers if requiretty is on (off state is default)
#            you will also need to add the line below
#            Defaults:nagios !requiretty

use warnings;
use strict;
use Nagios::Plugin;

# Lines below may need changing if crm_mon or sudo installed in a
# different location.

my $sudo    = '/usr/bin/sudo';
my $crm_mon = '/usr/sbin/crm_mon';

my $np = Nagios::Plugin->new(
    shortname => 'check_crm',
    version   => '0.5',
    usage     => "Usage: %s <ARGS>\n\t\t--help for help\n",
);

$np->add_arg(
    spec => 'warning|w',
    help =>
'If failed Nodes, stopped Resources detected or Standby Nodes sends Warning instead of Critical (default) as long as there are no other errors and there is Quorum',
    required => 0,
);

$np->add_arg(
    spec     => 'standbyignore|s',
    help     => 'Ignore any node(s) in standby, by default sends Critical',
    required => 0,
);

$np->getopts;

my @standby;

# Check for -w option set warn if this is case instead of crit
my $warn_or_crit = 'CRITICAL';
$warn_or_crit = 'WARNING' if $np->opts->warning;

my $fh;

open( $fh, "$sudo $crm_mon -1 -r -f|" )
  or $np->nagios_exit( CRITICAL, "Running sudo has failed" );

foreach my $line (<$fh>) {

    if ( $line =~ m/Connection to cluster failed\:(.*)/i ) {

        # Check Cluster connected
        $np->nagios_exit( CRITICAL, "Connection to cluster FAILED: $1" );
    }
    elsif ( $line =~ m/Current DC:/ ) {

        # Check for Quorum
        if ( $line =~ m/partition with quorum$/ ) {

            # Assume cluster is OK - we only add warn/crit after here

            $np->add_message( OK, "Cluster OK" );
        }
        else {
            $np->add_message( CRITICAL, "No Quorum" );
        }
    }
    elsif ( $line =~ m/^offline:\s*\[\s*(\S.*?)\s*\]/i ) {
        next if $line =~ /\/dev\/block\//i;
        # Count offline nodes
        my @offline = split( /\s+/, $1 );
        my $numoffline = scalar @offline;
        $np->add_message( $warn_or_crit, ": $numoffline Nodes Offline" );
    }
    elsif ( $line =~ m/^node\s+(\S.*):\s*standby/i ) {

        # Check for standby nodes (suggested by Sönke Martens)
        # See later in code for message created from this
        push @standby, $1;

    }
    elsif ( $line =~ m/\s*([\w-]+)\s+\(\S+\)\:\s+Stopped/ ) {
        #next if $line =~ /hopvpn/i;
        # Check Resources Stopped
        $np->add_message( $warn_or_crit, ": $1 Stopped" );
    }
    elsif ( $line =~ m/\s*stopped\:\s*\[(.*)\]/i ) {
        next if $line =~ /openvz/i;
        # Check Master/Slave stopped
        $np->add_message( $warn_or_crit, ": $1 Stopped" );
    }
    elsif ( $line =~ m/^Failed actions\:/ ) {
        # Check Failed Actions
        ### rob fix this
        $np->add_message( CRITICAL,
            ": FAILED actions detected or not cleaned up" );
    }
    elsif (
        $line =~ m/\s*(\S+?)\s+ \(.*\)\:\s+\w+\s+\w+\s+\(unmanaged\)\s+FAILED/ )
    {

        # Check Unmanaged
        $np->add_message( CRITICAL, ": $1 unmanaged FAILED" );
    }
    elsif ( $line =~ m/\s*(\S+?)\s+ \(.*\)\:\s+not installed/i ) {
        # Check for errors
        $np->add_message( CRITICAL, ": $1 not installed" );
    }
    elsif ( $line =~ m/\s*(\S+?):.*(fail-count=\d+)/i ) {
        my $one = $1;
        my $two = $2;

        # A /tmp/backup.<resource> marker means a backup is in progress
        if (-f "/tmp/backup.$1") {      last;  }

        $np->add_message( WARNING, ": $1 failure detected, $2" );
    }
}

# If found any Nodes in standby & no -s option used send warn/crit
if ( scalar @standby > 0 && !$np->opts->standbyignore ) {
    $np->add_message( $warn_or_crit,
        ": " . join( ', ', @standby ) . " in Standby" );
}

close($fh) or $np->nagios_exit( CRITICAL, "Running crm_mon FAILED" );

$np->nagios_exit( $np->check_messages() );
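
The backup script touches /tmp/backup.<resource> on both nodes before it unmanages DRBD, and the check above stops reporting fail-counts once it sees such a marker. A minimal shell sketch of that handshake follows. The function name, the MARKER_DIR variable and the sample line format are assumptions for illustration, and unlike the Perl check (which stops reading at the first marked resource) the sketch only skips the marked one.

```shell
#!/bin/sh
# Directory holding the backup.<resource> markers (the scripts above use /tmp).
MARKER_DIR=${MARKER_DIR:-/tmp}

# Read "crm_mon -1 -r -f" style lines on stdin and print one message per
# resource that has a fail-count and no backup marker.
report_failcounts() {
    while IFS= read -r line; do
        case $line in
            *fail-count=*) ;;      # only fail-count lines are of interest
            *) continue ;;
        esac
        res=${line%%:*}                       # text before the first colon
        res=$(echo "$res" | tr -d ' \t')      # drop the indentation
        # A marker means a backup is running: stay quiet about this resource.
        [ -f "$MARKER_DIR/backup.$res" ] && continue
        echo "$res failure detected"
    done
}
```

Something like `crm_mon -1 -r -f | report_failcounts` would then list only the failures that are not explained by a running backup.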

Troubleshooting Tips

ocf:heartbeat:LVM patch


  • When several clustered volume groups (and possibly non-clustered ones too) are deactivated at once, vgchange -a ln sometimes fails. Instead of giving up, the resource agent should keep retrying until Pacemaker times the operation out (the same logic LINBIT's DRBD RA uses)
  • Without the patch: vgchange -a ln fails, and the stop operation completes with a failure

<source lang=text>

      lrmd:   notice: operation_finished:      p_lvm-vhosts_stop_0:845 [ 2013/04/17_23:48:03 INFO: Deactivating volume group drbd_vhosts ]
      lrmd:   notice: operation_finished:      p_lvm-vhosts_stop_0:845 [ 2013/04/17_23:48:03 ERROR: Can't deactivate volume group "drbd_vhosts" with 1 open logical volume(s) ]
      crmd:   notice: process_lrm_event:       LRM operation p_lvm-vhosts_stop_0 (call=518, rc=1, cib-update=113, confirmed=true) unknown error
</source>


success: with patch
  • with the patch - it will try again and succeed

<source lang=text>

      lrmd:   notice: operation_finished:       p_lvm-vhosts_stop_0:24355 [ 2013/04/18_11:05:31 INFO: Deactivating volume group drbd_vhosts ]
      lrmd:   notice: operation_finished:       p_lvm-vhosts_stop_0:24355 [ 2013/04/18_11:05:31 ERROR: Can't deactivate volume group "drbd_vhosts" with 1 open logical volume(s) ]
      lrmd:   notice: operation_finished:       p_lvm-vhosts_stop_0:24355 [ 2013/04/18_11:05:31 WARNING: drbd_vhosts still Active, Deactivating volume group drbd_vhosts. ]
      lrmd:   notice: operation_finished:       p_lvm-vhosts_stop_0:24355 [ 2013/04/18_11:05:31 INFO: Deactivating volume group drbd_vhosts ]
      lrmd:   notice: operation_finished:       p_lvm-vhosts_stop_0:24355 [ 2013/04/18_11:05:31 INFO: 0 logical volume(s) in volume group "drbd_vhosts" now active ]
      crmd:   notice: process_lrm_event:        LRM operation p_lvm-vhosts_stop_0 (call=963, rc=0, cib-update=200, confirmed=true) ok
</source>



<source>
--- /usr/lib/ocf/resource.d/heartbeat/LVM.orig 2013-04-18 10:08:57.333596804 -0700
+++ /usr/lib/ocf/resource.d/heartbeat/LVM 2013-04-18 10:36:39.741388039 -0700
@@ -229,24 +229,34 @@
 #      Disable the LVM volume
 LVM_stop() {
+  local first_try=true
+  rc=$OCF_ERR_GENERIC
 
   vgdisplay "$1" 2>&1 | grep 'Volume group .* not found' >/dev/null && {
     ocf_log info "Volume group $1 not found"
     return 0
   }
-  ocf_log info "Deactivating volume group $1"
-  ocf_run vgchange -a ln $1 || return 1
 
-  if
-    LVM_status $1
-  then
-    ocf_log err "LVM: $1 did not stop correctly"
-    return $OCF_ERR_GENERIC
-  fi
+  # try to deactivate first time
+  ocf_log info "Deactivating volume group $1"
+  ocf_run vgchange -a ln $1
 
-  # TODO: This MUST run vgexport as well
+  # Keep trying to bring down the resource;
+  # wait for the CRM to time us out if this fails
+  while :; do
+    if LVM_status $1; then
+      ocf_log warn "$1 still Active, Deactivating volume group $1."
+      ocf_log info "Deactivating volume group $1"
+      ocf_run vgchange -a ln $1
+    else
+      rc=$OCF_SUCCESS
+      break;
+    fi
+    $first_try || sleep 1
+    first_try=false
+  done
 
-  return $OCF_SUCCESS
+  return $rc
 }
</source>
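
The retry idea can be tried outside the cluster with stand-in functions. In this minimal sketch `vg_is_active` and `deactivate_vg` are hypothetical placeholders for `LVM_status` and `vgchange -a ln`, and the fake volume group only goes inactive on the third attempt. It is a simplified loop, not the resource agent code.

```shell
#!/bin/sh
# Hypothetical stand-ins: the fake volume group stays active until
# deactivate_vg has been called three times.
attempts=0
vg_is_active()  { [ "$attempts" -lt 3 ]; }
deactivate_vg() { attempts=$((attempts + 1)); }

# Try once, then keep retrying while the status check still reports
# the VG as active. There is deliberately no retry cap here: in the
# resource agent the Pacemaker stop timeout ends the loop.
stop_vg() {
    deactivate_vg
    while vg_is_active; do
        sleep 1
        deactivate_vg
    done
    return 0
}

stop_vg
echo "deactivated after $attempts attempts"
```

Leaving the loop unbounded means the stop timeout configured on the primitive is the only limit, so that timeout should be long enough to cover a few retries.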



CMAN+CLVMD+Pacemaker 1.1.8+

  • Since Pacemaker 1.1.8, the pacemaker init script starts and stops CMAN itself, but it does not account for CLVMD
  • Fix: patch the init script so that it also starts and stops CLVMD

<source>
--- pacemaker.orig 2013-04-15 12:40:53.085307309 -0700
+++ pacemaker 2013-04-16 10:17:10.359833467 -0700
@@ -119,6 +119,9 @@

+    ## stop clvmd before leaving fence domain
+    [ -f /etc/rc.d/init.d/clvmd ] && service clvmd stop
+
     echo -n "Leaving fence domain"
     fence_tool leave -w 10

@@ -163,6 +166,7 @@

        # For consistency with stop
        [ -f /etc/rc.d/init.d/cman ] && service cman start
+       [ -f /etc/rc.d/init.d/clvmd ] && service clvmd start

</source>


Live Migrations Fail

error: Unsafe migration: Migration may lead to data corruption if disks use cache != none
  • Make sure you set your KVM disk cache to none
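
A quick way to spot the offending disk is to look at the `<driver>` element of each `<disk>` in the domain XML. The helper below is a rough sketch: the function name is made up for illustration, and it assumes each `<driver .../>` element sits on its own line, as it does in `virsh dumpxml` output.

```shell
#!/bin/sh
# Read a libvirt domain XML on stdin and print every disk <driver> line
# that does not set cache to none. Returns 0 if no such line is found.
find_unsafe_cache() {
    awk '
        /<disk /    { in_disk = 1 }
        /<\/disk>/  { in_disk = 0 }
        in_disk && /<driver / && !/cache=.none./ { print; bad = 1 }
        END { exit bad + 0 }
    '
}
```

Typical use would be `virsh dumpxml vhosts-ha | find_unsafe_cache`. Any line it prints belongs to a disk that still needs `cache='none'`.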

Verify that you can migrate with virsh first

<source>
virsh migrate --live <virt_name> qemu+ssh://<other_server_name>/system
</source>
<source>
virsh migrate --live vhosts-ha qemu+ssh://blindpig/system
</source>
Pacemaker 1.1.8 / libvirt-0.10.2-18

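  • Live migration fails when a node goes standby / online
Libvirt Fix (migrate-setspeed)
  • libvirt: limit the bandwidth with migrate-setspeed (the setting does not persist across libvirtd restarts)
  • persistent fix: edit /etc/init.d/libvirtd (the diff below lists the patched file first, so the lines marked - are the hook to add)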

<source>
--- libvirtd 2013-04-16 09:28:53.257824206 -0700
+++ libvirtd.orig 2013-04-16 10:25:31.358915941 -0700
@@ -85,22 +85,6 @@

     [ $RETVAL -eq 0 ] && touch /var/lock/subsys/$SERVICE
-
-    ## hook to set bandwidth for live migration
-    BW=50
-    VIRSH=`which virsh`
-    LIST_VM=`virsh list --all | grep -v Name | awk '{print $2}' | egrep "\w"`
-    DATE=`date -R`
-    LOGFILE="/var/log/kvm_setspeed.log"
-    for vm in $LIST_VM
-    do
-      BWprev=`/usr/bin/virsh migrate-getspeed $vm`
-      /usr/bin/virsh migrate-setspeed --bandwidth $BW $vm > /dev/null
-      BWcur=`/usr/bin/virsh migrate-getspeed $vm`
-      echo "$DATE : $VIRSH migrate-setspeed --bandwidth $BW $vm [cur: $BWcur -- prev: $BWprev]" >> $LOGFILE
-
-    done
-    # end BW hook


 stop() {
</source>


Pacemaker Fix (migration-limit)
  • Pacemaker - set the migration-limit (default -1 unlimited)

<source>
crm_attribute --attr-name migration-limit --attr-value 2
crm_attribute --attr-name migration-limit --get-value
scope=crm_config name=migration-limit value=2
</source>

virsh commands used
  • setting this to 30 MiB/s per virt

<source>
virsh migrate-setspeed --bandwidth 30 <VIRTNAME>
virsh migrate-getspeed <VIRTNAME>
</source>
<source>
# Default is infinite
virsh migrate-getspeed vhosts-ha
8796093022207

# set the speed to 30MB/s
virsh migrate-setspeed --bandwidth 30 vhosts-ha

# now it's limited
virsh migrate-getspeed vhosts-ha
30
</source>
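
To apply the same limit to every domain by hand, the list of names can be taken from `virsh list --all`. The sketch below is a dry run that only prints the commands. The function names are made up for illustration, and the parser assumes the usual table layout (a header line, a separator line, then one domain per row).

```shell
#!/bin/sh
# Extract domain names from "virsh list --all" table output on stdin.
# Skips the header and separator lines and any blank rows.
domain_names() {
    awk 'NR > 2 && NF >= 2 { print $2 }'
}

# Print (rather than run) one migrate-setspeed command per domain.
# $1 is the bandwidth value to pass to --bandwidth.
print_setspeed_cmds() {
    bw=$1
    domain_names | while read -r vm; do
        echo "virsh migrate-setspeed --bandwidth $bw $vm"
    done
}
```

For example, `virsh list --all | print_setspeed_cmds 30` shows what would be run, and piping that output to `sh` applies it.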

pacemaker config - dump

  • This has other examples - not just drbd/kvm
node bigeye \
        attributes standby="off"
node blindpig \
        attributes standby="off"
primitive p_cluster_mon ocf:pacemaker:ClusterMon \
        params pidfile="/var/run/crm_mon.pid" htmlfile="/var/www/html/index.html" \
        op start interval="0" timeout="20s" \
        op stop interval="0" timeout="20s" \
        op monitor interval="10s" timeout="20s"
primitive p_drbd-backuppc ocf:linbit:drbd \
        params drbd_resource="backuppc" \
        operations $id="p_drbd_backuppc-operations" \
        op monitor interval="20" role="Slave" timeout="20" \
        op monitor interval="10" role="Master" timeout="20" \
        op start interval="0" timeout="240" \
        op stop interval="0" timeout="100" start-delay="0"
primitive p_drbd-dogfish-ha ocf:linbit:drbd \
        params drbd_resource="dogfish-ha" \
        operations $id="p_drbd_dogfish-ha-operations" \
        op monitor interval="20" role="Slave" timeout="20" \
        op monitor interval="10" role="Master" timeout="20" \
        op start interval="0" timeout="240" \
        op stop interval="0" timeout="100" start-delay="0"
primitive p_drbd-hopmon ocf:linbit:drbd \
        params drbd_resource="hopmon" \
        operations $id="p_drbd_hopmon-operations" \
        op monitor interval="20" role="Slave" timeout="20" \
        op monitor interval="10" role="Master" timeout="20" \
        op start interval="0" timeout="240" \
        op stop interval="0" timeout="100" start-delay="0"
primitive p_drbd-hoptical ocf:linbit:drbd \
        params drbd_resource="hoptical" \
        operations $id="p_drbd_hoptical-operations" \
        op monitor interval="20" role="Slave" timeout="20" \
        op monitor interval="10" role="Master" timeout="20" \
        op start interval="0" timeout="240" \
        op stop interval="0" timeout="100" start-delay="0"
primitive p_drbd-hopvpn ocf:linbit:drbd \
        params drbd_resource="hopvpn" \
        operations $id="p_drbd_hopvpn-operations" \
        op monitor interval="20" role="Slave" timeout="30" \
        op monitor interval="10" role="Master" timeout="30" \
        op start interval="0" timeout="240" \
        op stop interval="0" timeout="100" start-delay="0"
primitive p_drbd-musicbrainz ocf:linbit:drbd \
        params drbd_resource="musicbrainz" \
        operations $id="p_drbd_musicbrainz-operations" \
        op monitor interval="20" role="Slave" timeout="20" \
        op monitor interval="10" role="Master" timeout="20" \
        op start interval="0" timeout="240" \
        op stop interval="0" timeout="100" start-delay="0"
primitive p_drbd-spacewalk ocf:linbit:drbd \
        params drbd_resource="spacewalk" \
        operations $id="p_drbd_spacewalk-operations" \
        op monitor interval="20" role="Slave" timeout="20" \
        op monitor interval="10" role="Master" timeout="20" \
        op start interval="0" timeout="240" \
        op stop interval="0" timeout="240" start-delay="0"
primitive p_drbd-vhosts ocf:linbit:drbd \
        params drbd_resource="vhosts" \
        operations $id="p_drbd_vhosts-operations" \
        op monitor interval="20" role="Slave" timeout="20" \
        op monitor interval="10" role="Master" timeout="20" \
        op start interval="0" timeout="240" \
        op stop interval="0" timeout="100" start-delay="0"
primitive p_drbd-vz ocf:linbit:drbd \
        params drbd_resource="vz" \
        operations $id="p_drbd_vz-operations" \
        op monitor interval="20" role="Slave" timeout="20" \
        op monitor interval="10" role="Master" timeout="20" \
        op start interval="0" timeout="240" \
        op stop interval="0" timeout="100" start-delay="0"
primitive p_drbd-win7 ocf:linbit:drbd \
        params drbd_resource="win7" \
        operations $id="p_drbd_win7-operations" \
        op monitor interval="20" role="Slave" timeout="20" \
        op monitor interval="10" role="Master" timeout="20" \
        op start interval="0" timeout="240" \
        op stop interval="0" timeout="100" start-delay="0"
primitive p_gfs2-vz-config ocf:heartbeat:Filesystem \
        params device="/dev/mapper/vg_drbd_vz-gfs_vz_config" directory="/etc/vz" fstype="gfs2" \
        op start interval="0" timeout="120" \
        op stop interval="0" timeout="120" \
        op monitor interval="120s"
primitive p_gfs2-vz-storage ocf:heartbeat:Filesystem \
        params device="/dev/mapper/vg_drbd_vz-gfs_vz_storage" directory="/vz" fstype="gfs2" \
        op start interval="0" timeout="120" \
        op stop interval="0" timeout="120" \
        op monitor interval="120s"
primitive p_ip- ocf:heartbeat:IPaddr2 \
        params ip="" cidr_netmask="32" nic="lo" \
        meta target-role="Started"
primitive p_ip- ocf:heartbeat:IPaddr2 \
        params ip="" cidr_netmask="32" nic="lo" \
        meta target-role="Started"
primitive p_lvm-backuppc ocf:heartbeat:LVM \
        operations $id="backuppc-LVM-operations" \
        op start interval="0" timeout="120" \
        op stop interval="0" timeout="120" \
        params volgrpname="drbd_backuppc"
primitive p_lvm-dogfish-ha ocf:heartbeat:LVM \
        operations $id="dogfish-ha-LVM-operations" \
        op start interval="0" timeout="120" \
        op stop interval="0" timeout="120" \
        params volgrpname="drbd_dogfish-ha"
primitive p_lvm-hopmon ocf:heartbeat:LVM \
        operations $id="hopmon-LVM-operations" \
        op start interval="0" timeout="120" \
        op stop interval="0" timeout="120" \
        params volgrpname="drbd_hopmon"
primitive p_lvm-hoptical ocf:heartbeat:LVM \
        operations $id="hoptical-LVM-operations" \
        op start interval="0" timeout="120" \
        op stop interval="0" timeout="120" \
        params volgrpname="drbd_hoptical"
primitive p_lvm-hopvpn ocf:heartbeat:LVM \
        operations $id="hopvpn-LVM-operations" \
        op start interval="0" timeout="120" \
        op stop interval="0" timeout="120" \
        params volgrpname="drbd_hopvpn"
primitive p_lvm-musicbrainz ocf:heartbeat:LVM \
        operations $id="musicbrainz-LVM-operations" \
        op start interval="0" timeout="120" \
        op stop interval="0" timeout="120" \
        params volgrpname="drbd_musicbrainz"
primitive p_lvm-spacewalk ocf:heartbeat:LVM \
        operations $id="spacewalk-LVM-operations" \
        op start interval="0" timeout="120" \
        op stop interval="0" timeout="120" \
        params volgrpname="drbd_spacewalk"
primitive p_lvm-vhosts ocf:heartbeat:LVM \
        operations $id="vhosts-LVM-operations" \
        op start interval="0" timeout="120" \
        op stop interval="0" timeout="120" \
        params volgrpname="drbd_vhosts"
primitive p_lvm-vz ocf:heartbeat:LVM \
        operations $id="vz-LVM-operations" \
        op start interval="0" timeout="120" \
        op stop interval="0" timeout="120" \
        params volgrpname="vg_drbd_vz"
primitive p_lvm-win7 ocf:heartbeat:LVM \
        operations $id="win7-LVM-operations" \
        op start interval="0" timeout="120" \
        op stop interval="0" timeout="120" \
        params volgrpname="drbd_win7"
primitive p_vd-backuppc-ha ocf:heartbeat:VirtualDomain \
        params config="/etc/libvirt/qemu/backuppc-ha.xml" migration_transport="ssh" force_stop="0" hypervisor="qemu:///system" \
        operations $id="p_vd-backuppc-operations" \
        op start interval="0" timeout="90" \
        op stop interval="0" timeout="90" \
        op migrate_from interval="0" timeout="240" \
        op migrate_to interval="0" timeout="240" \
        op monitor interval="10" timeout="30" start-delay="0" \
        meta allow-migrate="true" failure-timeout="3min" target-role="Started" is-managed="true" resource-stickiness="100"
primitive p_vd-dogfish-ha ocf:heartbeat:VirtualDomain \
        params config="/etc/libvirt/qemu/dogfish-ha.xml" migration_transport="ssh" force_stop="0" hypervisor="qemu:///system" \
        operations $id="p_vd-dogfish-ha-operations" \
        op start interval="0" timeout="90" \
        op stop interval="0" timeout="90" \
        op migrate_from interval="0" timeout="600" \
        op migrate_to interval="0" timeout="600" \
        op monitor interval="10" timeout="30" start-delay="10" \
        meta allow-migrate="true" failure-timeout="3min" target-role="Started" resource-stickiness="100" is-managed="true"
primitive p_vd-hopmon-ha ocf:heartbeat:VirtualDomain \
        params config="/etc/libvirt/qemu/hopmon-ha.xml" migration_transport="ssh" force_stop="0" hypervisor="qemu:///system" \
        operations $id="p_vd-hopmon-operations" \
        op start interval="0" timeout="90" \
        op stop interval="0" timeout="90" \
        op migrate_from interval="0" timeout="240" \
        op migrate_to interval="0" timeout="240" \
        op monitor interval="10" timeout="30" start-delay="10" \
        meta allow-migrate="true" failure-timeout="3min" target-role="Started"
primitive p_vd-hoptical-ha ocf:heartbeat:VirtualDomain \
        params config="/etc/libvirt/qemu/hoptical-ha.xml" migration_transport="ssh" force_stop="0" hypervisor="qemu:///system" \
        operations $id="p_vd-hoptical-operations" \
        op start interval="0" timeout="90" \
        op stop interval="0" timeout="90" \
        op migrate_from interval="0" timeout="240" \
        op migrate_to interval="0" timeout="240" \
        op monitor interval="10" timeout="30" start-delay="10" \
        meta allow-migrate="true" failure-timeout="3min" target-role="Started" resource-stickiness="100"
primitive p_vd-hopvpn-ha ocf:heartbeat:VirtualDomain \
        params config="/etc/libvirt/qemu/hopvpn-ha.xml" migration_transport="ssh" force_stop="0" hypervisor="qemu:///system" \
        operations $id="p_vd-hopvpn-operations" \
        op start interval="0" timeout="90" \
        op stop interval="0" timeout="90" \
        op migrate_from interval="0" timeout="240" \
        op migrate_to interval="0" timeout="240" \
        op monitor interval="10" timeout="30" start-delay="10" \
        meta allow-migrate="true" failure-timeout="3min" target-role="Started" resource-stickiness="100" is-managed="true"
primitive p_vd-musicbrainz-ha ocf:heartbeat:VirtualDomain \
        params config="/etc/libvirt/qemu/musicbrainz-ha.xml" migration_transport="ssh" force_stop="0" hypervisor="qemu:///system" \
        operations $id="p_vd-musicbrainz-operations" \
        op start interval="0" timeout="90" \
        op stop interval="0" timeout="90" \
        op migrate_from interval="0" timeout="240" \
        op migrate_to interval="0" timeout="240" \
        op monitor interval="10" timeout="30" start-delay="10" \
        meta allow-migrate="true" failure-timeout="3min" target-role="Started" resource-stickiness="100"
primitive p_vd-spacewalk-ha ocf:heartbeat:VirtualDomain \
        params config="/etc/libvirt/qemu/spacewalk-ha.xml" migration_transport="ssh" force_stop="0" hypervisor="qemu:///system" \
        operations $id="p_vd-spacewalk-operations" \
        op start interval="0" timeout="90" \
        op stop interval="0" timeout="90" \
        op migrate_from interval="0" timeout="240" \
        op migrate_to interval="0" timeout="240" \
        op monitor interval="10" timeout="30" start-delay="0" \
        meta allow-migrate="true" failure-timeout="10min" target-role="Started" is-managed="true"
primitive p_vd-vhosts-ha ocf:heartbeat:VirtualDomain \
        params config="/etc/libvirt/qemu/vhosts-ha.xml" migration_transport="ssh" force_stop="0" hypervisor="qemu:///system" \
        operations $id="p_vd-vhosts-ha-operations" \
        op start interval="0" timeout="90" \
        op stop interval="0" timeout="90" \
        op migrate_from interval="0" timeout="240" \
        op migrate_to interval="0" timeout="240" \
        op monitor interval="10" timeout="30" start-delay="10" \
        meta allow-migrate="true" failure-timeout="3min" target-role="Started" resource-stickiness="100"
primitive p_vd-win7-ha ocf:heartbeat:VirtualDomain \
        params config="/etc/libvirt/qemu/win7-ha.xml" migration_transport="ssh" force_stop="0" hypervisor="qemu:///system" \
        operations $id="p_vd-win7-operations" \
        op start interval="0" timeout="90" \
        op stop interval="0" timeout="90" \
        op migrate_from interval="0" timeout="600" \
        op migrate_to interval="0" timeout="600" \
        op monitor interval="10" timeout="30" start-delay="0" \
        meta allow-migrate="true" failure-timeout="3min" target-role="Started"
primitive st_bigeye stonith:fence_drac5 \
        params ipaddr="<hidden>" login="cman" passwd="<hidden>" action="reboot" secure="true" pcmk_host_list="bigeye" pcmk_host_check="static-list"
primitive st_blindpig stonith:fence_apc_snmp \
        params inet4_only="1" community="<hidden>" port="blindpig" action="reboot" ipaddr="<hidden>" snmp_version="1" pcmk_host_check="static-list" pcmk_host_list="blindpig" pcmk_host_map="blindpig:6"
ms ms_drbd-backuppc p_drbd-backuppc \
        meta master-max="2" clone-max="2" notify="true" migration-threshold="1" allow-migrate="true" target-role="Started" interleave="true" is-managed="true"
ms ms_drbd-dogfish-ha p_drbd-dogfish-ha \
        meta master-max="2" clone-max="2" notify="true" migration-threshold="1" allow-migrate="true" target-role="Started" interleave="true" is-managed="true"
ms ms_drbd-hopmon p_drbd-hopmon \
        meta master-max="2" clone-max="2" notify="true" migration-threshold="1" allow-migrate="true" target-role="Started" interleave="true" is-managed="true"
ms ms_drbd-hoptical p_drbd-hoptical \
        meta master-max="2" clone-max="2" notify="true" migration-threshold="1" allow-migrate="true" target-role="Started" interleave="true" is-managed="true"
ms ms_drbd-hopvpn p_drbd-hopvpn \
        meta master-max="2" clone-max="2" notify="true" migration-threshold="1" allow-migrate="true" target-role="Started" interleave="true" is-managed="true"
ms ms_drbd-musicbrainz p_drbd-musicbrainz \
        meta master-max="2" clone-max="2" notify="true" migration-threshold="1" allow-migrate="true" target-role="Started" interleave="true" is-managed="true"
ms ms_drbd-spacewalk p_drbd-spacewalk \
        meta master-max="2" clone-max="2" notify="true" migration-threshold="1" allow-migrate="true" target-role="Started" interleave="true" is-managed="true"
ms ms_drbd-vhosts p_drbd-vhosts \
        meta master-max="2" clone-max="2" notify="true" migration-threshold="1" allow-migrate="true" target-role="Started" interleave="true" is-managed="true"
ms ms_drbd-vz p_drbd-vz \
        meta master-max="2" clone-max="2" notify="true" migration-threshold="1" allow-migrate="true" target-role="Started" interleave="true" is-managed="true"
ms ms_drbd-win7 p_drbd-win7 \
        meta master-max="2" clone-max="2" notify="true" migration-threshold="1" allow-migrate="true" target-role="Started" interleave="true" is-managed="true"
clone c_cluster_mon p_cluster_mon \
        meta clone-max="2" notify="true" target-role="Started" interleave="true" is-managed="true"
clone c_st_bigeye st_bigeye \
        meta clone-max="2" notify="true" target-role="Started" interleave="true" is-managed="true"
clone c_st_blindpig st_blindpig \
        meta clone-max="2" notify="true" target-role="Started" interleave="true" is-managed="true"
clone clone_gfs2-vz-config p_gfs2-vz-config \
        meta clone-max="2" notify="true" target-role="Started" interleave="true" is-managed="true"
clone clone_gfs2-vz-storage p_gfs2-vz-storage \
        meta clone-max="2" notify="true" target-role="Started" interleave="true" is-managed="true"
clone clone_lvm-backuppc p_lvm-backuppc \
        meta clone-max="2" notify="true" target-role="Started" interleave="true" is-managed="true"
clone clone_lvm-dogfish-ha p_lvm-dogfish-ha \
        meta clone-max="2" notify="true" target-role="Started" interleave="true" is-managed="true"
clone clone_lvm-hopmon p_lvm-hopmon \
        meta clone-max="2" notify="true" target-role="Started" interleave="true" is-managed="true"
clone clone_lvm-hoptical p_lvm-hoptical \
        meta clone-max="2" notify="true" target-role="Started" interleave="true" is-managed="true"
clone clone_lvm-hopvpn p_lvm-hopvpn \
        meta clone-max="2" notify="true" target-role="Started" interleave="true" is-managed="true"
clone clone_lvm-musicbrainz p_lvm-musicbrainz \
        meta clone-max="2" notify="true" target-role="Started" interleave="true" is-managed="true"
clone clone_lvm-spacewalk p_lvm-spacewalk \
        meta clone-max="2" notify="true" target-role="Started" interleave="true" is-managed="true"
clone clone_lvm-vhosts p_lvm-vhosts \
        meta clone-max="2" notify="true" target-role="Started" interleave="true" is-managed="true"
clone clone_lvm-vz p_lvm-vz \
        meta clone-max="2" notify="true" target-role="Started" interleave="true" is-managed="true"
clone clone_lvm-win7 p_lvm-win7 \
        meta clone-max="2" notify="true" target-role="Started" interleave="true" is-managed="true"
location cli-prefer-p_cluster_mon c_cluster_mon \
        rule $id="cli-prefer-rule-p_cluster_mon" inf: #uname eq bigeye
location cli-prefer-p_ip- p_ip- \
        rule $id="cli-prefer-rule-p_ip-" inf: #uname eq blindpig
location cli-prefer-p_ip- p_ip- \
        rule $id="cli-prefer-rule-p_ip-" inf: #uname eq bigeye
location cli-prefer-p_vd-backuppc-ha p_vd-backuppc-ha \
        rule $id="cli-prefer-rule-p_vd-backuppc-ha" inf: #uname eq blindpig
location cli-prefer-p_vd-dogfish-ha p_vd-dogfish-ha \
        rule $id="cli-prefer-rule-p_vd-dogfish-ha" inf: #uname eq blindpig
location cli-prefer-p_vd-hopmon-ha p_vd-hopmon-ha \
        rule $id="cli-prefer-rule-p_vd-hopmon-ha" inf: #uname eq bigeye
location cli-prefer-p_vd-hoptical-ha p_vd-hoptical-ha \
        rule $id="cli-prefer-rule-p_vd-hoptical-ha" inf: #uname eq bigeye
location cli-prefer-p_vd-hopvpn-ha p_vd-hopvpn-ha \
        rule $id="cli-prefer-rule-p_vd-hopvpn-ha" inf: #uname eq bigeye
location cli-prefer-p_vd-musicbrainz-ha p_vd-musicbrainz-ha \
        rule $id="cli-prefer-rule-p_vd-musicbrainz-ha" inf: #uname eq blindpig
location cli-prefer-p_vd-spacewalk-ha p_vd-spacewalk-ha \
        rule $id="cli-prefer-rule-p_vd-spacewalk-ha" inf: #uname eq blindpig
location cli-prefer-p_vd-vhosts-ha p_vd-vhosts-ha \
        rule $id="cli-prefer-rule-p_vd-vhosts-ha" inf: #uname eq bigeye
location cli-prefer-p_vd-win7-ha p_vd-win7-ha \
        rule $id="cli-prefer-rule-p_vd-win7-ha" inf: #uname eq blindpig
location drbd_backuppc_excl ms_drbd-backuppc \
        rule $id="drbd_backuppc_excl-rule" -inf: #uname eq blindpig2
location drbd_dogfish-ha_excl ms_drbd-dogfish-ha \
        rule $id="drbd_dogfish-ha_excl-rule" -inf: #uname eq blindpig2
location drbd_hopmon_excl ms_drbd-hopmon \
        rule $id="drbd_hopmon_excl-rule" -inf: #uname eq blindpig2
location drbd_hoptical_excl ms_drbd-hoptical \
        rule $id="drbd_hoptical_excl-rule" -inf: #uname eq blindpig2
location drbd_hopvpn_excl ms_drbd-hopvpn \
        rule $id="drbd_hopvpn_excl-rule" -inf: #uname eq blindpig2
location drbd_musicbrainz_excl ms_drbd-musicbrainz \
        rule $id="drbd_musicbrainz_excl-rule" -inf: #uname eq blindpig2
location drbd_vhost_excl ms_drbd-vhosts \
        rule $id="drbd_vhosts_excl-rule" -inf: #uname eq blindpig2
colocation c_gfs-vz-config_on_master inf: clone_gfs2-vz-config ms_drbd-vz:Master
colocation c_gfs-vz-storage_on_master inf: clone_gfs2-vz-storage ms_drbd-vz:Master
colocation c_lvm-backuppc_on_drbd-backuppc inf: clone_lvm-backuppc ms_drbd-backuppc:Master
colocation c_lvm-dogfish-ha_on_drbd-dogfish-ha inf: clone_lvm-dogfish-ha ms_drbd-dogfish-ha:Master
colocation c_lvm-hopmon_on_drbd-hopmon inf: clone_lvm-hopmon ms_drbd-hopmon:Master
colocation c_lvm-hoptical_on_drbd-hoptical inf: clone_lvm-hoptical ms_drbd-hoptical:Master
colocation c_lvm-hopvpn_on_drbd-hopvpn inf: clone_lvm-hopvpn ms_drbd-hopvpn:Master
colocation c_lvm-musicbrainz_on_drbd-musicbrainz inf: clone_lvm-musicbrainz ms_drbd-musicbrainz:Master
colocation c_lvm-spacewalk_on_drbd-spacewalk inf: clone_lvm-spacewalk ms_drbd-spacewalk:Master
colocation c_lvm-vhosts_on_drbd-vhosts inf: clone_lvm-vhosts ms_drbd-vhosts:Master
colocation c_lvm-vz_on_drbd-vz inf: clone_lvm-vz ms_drbd-vz:Master
colocation c_lvm-win7_on_drbd-win7 inf: clone_lvm-win7 ms_drbd-win7:Master
colocation c_vd-backuppc-on-master inf: p_vd-backuppc-ha ms_drbd-backuppc:Master
colocation c_vd-dogfish-ha-on-master inf: p_vd-dogfish-ha ms_drbd-dogfish-ha:Master
colocation c_vd-hopmon-on-master inf: p_vd-hopmon-ha ms_drbd-hopmon:Master
colocation c_vd-hoptical-on-master inf: p_vd-hoptical-ha ms_drbd-hoptical:Master
colocation c_vd-hopvpn-on-master inf: p_vd-hopvpn-ha ms_drbd-hopvpn:Master
colocation c_vd-musicbrainz-on-master inf: p_vd-musicbrainz-ha ms_drbd-musicbrainz:Master
colocation c_vd-spacewalk-on-master inf: p_vd-spacewalk-ha ms_drbd-spacewalk:Master
colocation c_vd-vhosts-on-master inf: p_vd-vhosts-ha ms_drbd-vhosts:Master
colocation c_vd-win7-on-master inf: p_vd-win7-ha ms_drbd-win7:Master
order o_drbm-lvm-gfs2-vz-config-storage inf: ms_drbd-vz:promote clone_lvm-vz:start clone_gfs2-vz-config:start clone_gfs2-vz-storage:start
order o_drbm-lvm-vd-start-backuppc inf: ms_drbd-backuppc:promote clone_lvm-backuppc:start p_vd-backuppc-ha:start
order o_drbm-lvm-vd-start-dogfish-ha inf: ms_drbd-dogfish-ha:promote clone_lvm-dogfish-ha:start p_vd-dogfish-ha:start
order o_drbm-lvm-vd-start-hopmon inf: ms_drbd-hopmon:promote clone_lvm-hopmon:start p_vd-hopmon-ha:start
order o_drbm-lvm-vd-start-hoptical inf: ms_drbd-hoptical:promote clone_lvm-hoptical:start p_vd-hoptical-ha:start
order o_drbm-lvm-vd-start-hopvpn inf: ms_drbd-hopvpn:promote clone_lvm-hopvpn:start p_vd-hopvpn-ha:start
order o_drbm-lvm-vd-start-musicbrainz inf: ms_drbd-musicbrainz:promote clone_lvm-musicbrainz:start p_vd-musicbrainz-ha:start
order o_drbm-lvm-vd-start-spacewalk inf: ms_drbd-spacewalk:promote clone_lvm-spacewalk:start p_vd-spacewalk-ha:start
order o_drbm-lvm-vd-start-vhosts inf: ms_drbd-vhosts:promote clone_lvm-vhosts:start p_vd-vhosts-ha:start
order o_drbm-lvm-vd-start-win7 inf: ms_drbd-win7:promote clone_lvm-win7:start p_vd-win7-ha:start
order o_gfs_before_openvz inf: _rsc_set_ clone_gfs2-vz-config clone_gfs2-vz-storage
property $id="cib-bootstrap-options" \
        dc-version="1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14" \
        cluster-infrastructure="cman" \
        expected-quorum-votes="2" \
        stonith-enabled="true" \
        no-quorum-policy="ignore" \
        default-resource-stickiness="1" \
        last-lrm-refresh="1362432862"
rsc_defaults $id="rsc-options" \
        resource-stickiness="1"
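
Every DRBD-backed virt in the config above repeats the same set of stanzas: a DRBD primitive with its ms wrapper, an LVM primitive with its clone, a VirtualDomain primitive, two colocations and one order. As a rough sketch, the per-virt stanzas could be generated from the virt name like this. gen_vm_stanza is a made-up helper name (not part of crmsh or Pacemaker), the timeouts are the common values from the dump, and the operations $id lines and per-virt tweaks (resource-stickiness, start-delay, longer migrate timeouts) are left out, so review the output before loading it.

```bash
#!/usr/bin/env bash
# Sketch only: print the crm stanzas for one DRBD-backed KVM virt, following
# the naming pattern used in the configuration above.
# gen_vm_stanza is a hypothetical helper, not part of crmsh or Pacemaker.
gen_vm_stanza() {
    local name="$1"
    if [ -z "$name" ]; then
        echo "usage: gen_vm_stanza <virt-name>" >&2
        return 1
    fi
    # Unquoted here-doc: "\\" prints one literal backslash (crm line continuation).
    cat <<EOF
primitive p_drbd-${name} ocf:linbit:drbd \\
        params drbd_resource="${name}" \\
        op monitor interval="20" role="Slave" timeout="20" \\
        op monitor interval="10" role="Master" timeout="20" \\
        op start interval="0" timeout="240" \\
        op stop interval="0" timeout="100"
primitive p_lvm-${name} ocf:heartbeat:LVM \\
        params volgrpname="drbd_${name}" \\
        op start interval="0" timeout="120" \\
        op stop interval="0" timeout="120"
primitive p_vd-${name}-ha ocf:heartbeat:VirtualDomain \\
        params config="/etc/libvirt/qemu/${name}-ha.xml" migration_transport="ssh" force_stop="0" hypervisor="qemu:///system" \\
        op start interval="0" timeout="90" \\
        op stop interval="0" timeout="90" \\
        op migrate_from interval="0" timeout="240" \\
        op migrate_to interval="0" timeout="240" \\
        op monitor interval="10" timeout="30" \\
        meta allow-migrate="true" failure-timeout="3min" target-role="Started"
ms ms_drbd-${name} p_drbd-${name} \\
        meta master-max="2" clone-max="2" notify="true" migration-threshold="1" allow-migrate="true" target-role="Started" interleave="true" is-managed="true"
clone clone_lvm-${name} p_lvm-${name} \\
        meta clone-max="2" notify="true" target-role="Started" interleave="true" is-managed="true"
colocation c_lvm-${name}_on_drbd-${name} inf: clone_lvm-${name} ms_drbd-${name}:Master
colocation c_vd-${name}-on-master inf: p_vd-${name}-ha ms_drbd-${name}:Master
order o_drbm-lvm-vd-start-${name} inf: ms_drbd-${name}:promote clone_lvm-${name}:start p_vd-${name}-ha:start
EOF
}
```

Something like gen_vm_stanza newvirt > newvirt.crm writes the stanzas to a file (newvirt is just a placeholder name). I believe the file can then be loaded with crm configure load update newvirt.crm, but check that against your crmsh version first.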

Other References



