2 Node Cluster: Dual Primary DRBD + CLVM + KVM + Live Migrations
Info
a brain dump...
OS Details
- OS: CentOS 6.3
- Packages listed here are a base (others will be installed)
- pacemaker-1.1.7-6.el6.x86_64
- cman-3.0.12.1-32.el6_3.2.x86_64
- corosync-1.4.1-7.el6_3.1.x86_64
- drbd83-utils-8.3.15-1.el6.elrepo.x86_64
- libvirt-0.9.10-21.el6_3.8.x86_64
Hardware
LVM
- My naming conventions are not great (VG and LV named the same); this was a work in progress.
- PV -> VG -> LV -> DRBD PV -> VG (CLVM) -> LV [raw KVM image]
- PV: /dev/md10 -> VG: raid10 -> LV: drbd_spacewalk -> PV: /dev/drbd9 -> VG: drbd_spacewalk -> LV: spacewalk -- the spacewalk-ha KVM virt
- node1
- PV: /dev/md10
- VG: raid10
- node2
- PV: /dev/sdb1
- VG: raid1
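To eyeball the stacking on a node, the standard LVM reporting tools are enough (a quick sanity check, using the names from this page):
<source lang="bash">
pvs   # shows both the raw PV (/dev/md10 or /dev/sdb1) and the DRBD-backed PV (/dev/drbd9)
vgs   # the raid10/raid1 VGs plus the clustered drbd_* VGs
lvs   # the drbd_* LVs on the outer VGs, and the KVM image LVs inside the drbd_* VGs
</source>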
Dual Primary DRBD/KVM Virt Install
New KVM Virt - Details
- NewVirt: spacewalk
- SIZE: 20GB
- DRBD res: 8
- NODE1
- IP: 10.69.1.253
- Name: bigeye
- VG: raid1
- NODE2
- IP: 10.69.1.250
- Name: blindpig
- VG: raid10
- KVM DISK cache setting: none
Creating the Dual Primary DRBD KVM Virt
- Run these commands on NODE1 (bigeye); the ssh commands execute the same step on NODE2.
1) Create the LVM backing devices for DRBD on both nodes:
<source lang="bash">
lvcreate --name drbd_spacewalk --size 21.1GB raid1
ssh 10.69.1.250 -C lvcreate --name drbd_spacewalk --size 21.1GB raid10
</source>
2) Copy spacewalk.res to /etc/drbd.d/ on both nodes:
<source lang="bash">
cp spacewalk.res /etc/drbd.d/
scp spacewalk.res 10.69.1.250:/etc/drbd.d/
</source>
3) Reload drbd:
<source lang="bash">
/etc/init.d/drbd reload
ssh 10.69.1.250 -C /etc/init.d/drbd reload
</source>
4) Create the DRBD metadata on both nodes:
<source lang="bash">
drbdadm -- --force create-md spacewalk
ssh 10.69.1.250 -C drbdadm -- --force create-md spacewalk
</source>
5) Reload drbd again:
<source lang="bash">
/etc/init.d/drbd reload
ssh 10.69.1.250 -C /etc/init.d/drbd reload
</source>
6) Bring drbd up on both nodes:
<source lang="bash">
drbdadm up spacewalk
ssh 10.69.1.250 -C drbdadm up spacewalk
</source>
7) Set bigeye primary and overwrite blindpig:
<source lang="bash">
drbdadm -- --overwrite-data-of-peer primary spacewalk
</source>
8) Set blindpig secondary (should already be set):
<source lang="bash">
ssh 10.69.1.250 -C drbdadm secondary spacewalk
</source>
9) On bigeye, create the PV/VG/LV (not setting the VG cluster-aware yet, due to an LVM bug when not using --monitor y):
<source lang="bash">
pvcreate /dev/drbd9
vgcreate -c n drbd_spacewalk /dev/drbd9
lvcreate -L20G -nspacewalk drbd_spacewalk
</source>
10) Activate VG drbd_spacewalk (it should already be active, but just in case):
<source lang="bash">
vgchange -a y drbd_spacewalk
</source>
11) Create the pool in virsh:
<source lang="bash">
virsh pool-create-as drbd_spacewalk --type=logical --target=/dev/drbd_spacewalk
</source>
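Optionally, sanity-check the new pool before installing into it (standard virsh pool commands):
<source lang="bash">
virsh pool-info drbd_spacewalk   # should show State: running, type logical
virsh vol-list drbd_spacewalk    # the spacewalk LV should appear as a volume
</source>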
12a) If this is a NEW KVM install, continue with the following; otherwise go to step 12b.
- 1. Install the new virt on bigeye at /dev/drbd_spacewalk/spacewalk, named spacewalk-ha
- 2. After it is installed and rebooted, scp the virt definition over and define it:
<source lang="bash">
scp /etc/libvirt/qemu/spacewalk-ha.xml 10.69.1.250:/etc/libvirt/qemu/spacewalk-ha.xml
ssh 10.69.1.250 -C virsh define /etc/libvirt/qemu/spacewalk-ha.xml
</source>
- 3. Linux guest? Test virsh shutdown (you may need to install acpid in the guest)
<source lang="bash">
virsh shutdown spacewalk-ha
</source>
- 4. SKIP step 12b (go to #13)
12b) If this is a migration from an existing KVM virt, continue here; skip this step if you completed 12a.
- 1. Restore your KVM image to the new LV (of=/dev/drbd_spacewalk/spacewalk bs=1M):
<source lang="bash">
dd if=<your image file.img> of=/dev/drbd_spacewalk/spacewalk bs=1M
</source>
- 2. Edit the existing KVM XML file -- copy the existing file to edit:
<source lang="bash">
cp /etc/libvirt/qemu/spacewalk.xml ./spacewalk-ha.xml
</source>
- modify: <name>spacewalk</name> to <name>spacewalk-ha</name>
- remove: <uuid>[some long uuid]</uuid>
<source lang="bash">
emacs spacewalk-ha.xml
cp spacewalk-ha.xml /etc/libvirt/qemu/spacewalk-ha.xml
# defining will set up a unique UUID, which is needed before you copy to blindpig
virsh define /etc/libvirt/qemu/spacewalk-ha.xml
scp /etc/libvirt/qemu/spacewalk-ha.xml 10.69.1.250:/etc/libvirt/qemu/spacewalk-ha.xml
ssh 10.69.1.250 -C virsh define /etc/libvirt/qemu/spacewalk-ha.xml
</source>
- All install work is done. Deactivate the VG, set it cluster-aware, and down drbd for Pacemaker provisioning.
13) Deactivate VG drbd_spacewalk (it is only active on bigeye at this point):
<source lang="bash">
vgchange -a n drbd_spacewalk
</source>
14) Set drbd primary on blindpig so the VG can be made cluster-aware:
<source lang="bash">
vgchange -a n drbd_spacewalk
ssh 10.69.1.250 -C drbdadm primary spacewalk
</source>
15) Activate the VG on both nodes:
<source lang="bash">
vgchange -a y drbd_spacewalk
ssh 10.69.1.250 -C vgchange -a y drbd_spacewalk
</source>
16) Set the VG cluster-aware on both nodes (only one command is needed, since drbd replicates the metadata change):
<source lang="bash">
vgchange -c y drbd_spacewalk
</source>
17) Deactivate the VG:
<source lang="bash">
vgchange -a n drbd_spacewalk
ssh 10.69.1.250 -C vgchange -a n drbd_spacewalk
</source>
18) Down drbd on both nodes so we can put it in Pacemaker:
<source lang="bash">
drbdadm down spacewalk
ssh 10.69.1.250 -C drbdadm down spacewalk
</source>
- Now let's provision Pacemaker -- this assumes you already have a working Pacemaker config with DLM/CLVM.
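The DLM/CLVM layer itself is out of scope here; as a rough sketch of what that crm config can look like (the resource agent names are an assumption -- ocf:pacemaker:controld and ocf:lvm2:clvmd are what CentOS 6 shipped; check `crm ra list ocf` on your nodes):
<pre>
primitive p_dlm ocf:pacemaker:controld \
    op monitor interval="60" timeout="60"
primitive p_clvmd ocf:lvm2:clvmd \
    params daemon_timeout="30" \
    op monitor interval="60" timeout="60"
group g_dlm-clvmd p_dlm p_clvmd
clone clone_dlm-clvmd g_dlm-clvmd \
    meta interleave="true"
</pre>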
19) Load the dual-primary drbd/lvm RA config into the cluster:
<source lang="bash">
crm configure < spacewalk.crm
</source>
20) Verify all is good with crm_mon; the DRBD resource should look something like this:
<source lang="bash">
crm_mon -f
 Master/Slave Set: ms_drbd-spacewalk [p_drbd-spacewalk]
     Masters: [ bigeye blindpig ]
</source>
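Another quick check is /proc/drbd directly; once both nodes are promoted and synced, a DRBD 8.3 dual-primary resource reports something like this (sketch, device minor from this example):
<source lang="bash">
cat /proc/drbd
# 9: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
</source>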
21) Load the VirtualDomain RA config into the cluster:
<source lang="bash">
crm configure < spacewalk-vd.crm
</source>
- Files Created
- spacewalk.res # for DRBD
- spacewalk.crm # DRBD/LVM configs to load into crm configure
- spacewalk-vd.crm # KVM VirtualDomain configs to load into crm configure
Config Examples
Pacemaker / crmsh
DRBD/LVM
* Note - we do not monitor LVM; LVM commands sometimes hang and are not really an issue.
* These are all auto-created from the script below.
<pre>
primitive p_drbd-spacewalk ocf:linbit:drbd \
    params drbd_resource="spacewalk" \
    operations $id="p_drbd_spacewalk-operations" \
    op monitor interval="20" role="Slave" timeout="20" \
    op monitor interval="10" role="Master" timeout="20" \
    op start interval="0" timeout="240" \
    op stop interval="0" timeout="100" start-delay="0"
primitive p_lvm-spacewalk ocf:heartbeat:LVM \
    operations $id="spacewalk-LVM-operations" \
    op start interval="0" timeout="120" \
    op stop interval="0" timeout="120" \
    params volgrpname="drbd_spacewalk"
ms ms_drbd-spacewalk p_drbd-spacewalk \
    meta master-max="2" clone-max="2" notify="true" migration-threshold="1" allow-migrate="true" target-role="Started" interleave="true" is-managed="true"
clone clone_lvm-spacewalk p_lvm-spacewalk \
    meta clone-max="2" notify="true" target-role="Started" interleave="true" is-managed="true"
colocation c_lvm-spacewalk_on_drbd-spacewalk inf: clone_lvm-spacewalk ms_drbd-spacewalk:Master
</pre>
KVM Virt - VirtualDomain
- These are all auto-created from the script below.
<pre>
primitive p_vd-spacewalk-ha ocf:heartbeat:VirtualDomain \
    params config="/etc/libvirt/qemu/spacewalk-ha.xml" migration_transport="ssh" force_stop="0" hypervisor="qemu:///system" \
    operations $id="p_vd-spacewalk-operations" \
    op start interval="0" timeout="90" \
    op stop interval="0" timeout="90" \
    op migrate_from interval="0" timeout="240" \
    op migrate_to interval="0" timeout="240" \
    op monitor interval="10" timeout="30" start-delay="0" \
    meta allow-migrate="true" failure-timeout="10min" target-role="Started"
colocation c_vd-spacewalk-on-master inf: p_vd-spacewalk-ha ms_drbd-spacewalk:Master
order o_drbm-lvm-vd-start-spacewalk inf: ms_drbd-spacewalk:promote clone_lvm-spacewalk:start p_vd-spacewalk-ha:start
</pre>
DRBD
- These are all auto-created from the script below.
<pre>
resource spacewalk {
  protocol C;
  startup { become-primary-on both; }
  net {
    allow-two-primaries;
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;
  }
  disk { on-io-error detach; fencing resource-only; }
  handlers {
    #split-brain "/usr/lib/drbd/notify-split-brain.sh root";
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
  syncer { rate 50M; }
  on bigeye {
    device    /dev/drbd9;
    disk      /dev/raid1/drbd_spacewalk;
    address   10.69.1.253:7799;
    meta-disk internal;
  }
  on blindpig {
    device    /dev/drbd9;
    disk      /dev/raid10/drbd_spacewalk;
    address   10.69.1.250:7799;
    meta-disk internal;
  }
}
</pre>
Script
- This generates the configs and prints the install steps above.
<source lang="bash">
#cat create.new.sh
NAME=spacewalk       ## virt name
SIZE=20              ## virt size GB
LVMETA=lvmeta        ## volume group on VG stated above for metadata
DRBDNUM=8            ## how many drbds do you have right now?
NODE1_VG=raid1       ## VolumeGroup for DRBD lvm
NODE2_VG=raid10      ## VolumeGroup for DRBD lvm
NODE1_IP=10.69.1.253
NODE2_IP=10.69.1.250
NODE1_NAME=bigeye
NODE2_NAME=blindpig
#NODE3_NAME=blindpig2

############ DO NOT EDIT BELOW #######################
NODE2=$NODE2_IP
DRBD_SIZE=$SIZE
let DRBD_SIZE+=1
let DRBDNUM+=1
#let DRBDNUM+=1
let PORT=7790+DRBDNUM

## DRBD resource config
echo '
resource '$NAME' {
  protocol C;
  startup { become-primary-on both; }
  net {
    allow-two-primaries;
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;
  }
  disk { on-io-error detach; fencing resource-only; }
  handlers {
    #split-brain "/usr/lib/drbd/notify-split-brain.sh root";
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
  syncer { rate 50M; }
  on '$NODE1_NAME' {
    device /dev/drbd'$DRBDNUM';
    disk /dev/'$NODE1_VG'/drbd_'$NAME';
    address '$NODE1_IP':'$PORT';
    meta-disk internal;
  }
  on '$NODE2_NAME' {
    device /dev/drbd'$DRBDNUM';
    disk /dev/'$NODE2_VG'/drbd_'$NAME';
    address '$NODE2_IP':'$PORT';
    meta-disk internal;
  }
}
' > $NAME.res

## Pacemaker DRBD/LVM config
echo 'primitive p_drbd-'$NAME' ocf:linbit:drbd \
    params drbd_resource="'$NAME'" \
    operations $id="p_drbd_'$NAME'-operations" \
    op monitor interval="20" role="Slave" timeout="20" \
    op monitor interval="10" role="Master" timeout="20" \
    op start interval="0" timeout="240" \
    op stop interval="0" timeout="100" start-delay="0"
primitive p_lvm-'$NAME' ocf:heartbeat:LVM \
    operations $id="'$NAME'-LVM-operations" \
    op start interval="0" timeout="120" \
    op stop interval="0" timeout="120" \
    params volgrpname="drbd_'$NAME'"
ms ms_drbd-'$NAME' p_drbd-'$NAME' \
    meta master-max="2" clone-max="2" notify="true" migration-threshold="1" allow-migrate="true" target-role="Started" interleave="true" is-managed="true"
clone clone_lvm-'$NAME' p_lvm-'$NAME' \
    meta clone-max="2" notify="true" target-role="Started" interleave="true" is-managed="true"
colocation c_lvm-'$NAME'_on_drbd-'$NAME' inf: clone_lvm-'$NAME' ms_drbd-'$NAME':Master
' > $NAME'.crm'
#location drbd_'$NAME'_excl ms_drbd-'$NAME' \
#    rule $id="drbd_'$NAME'_excl-rule" -inf: #uname eq '$NODE3_NAME'

## Pacemaker VirtualDomain config
echo 'primitive p_vd-'$NAME'-ha ocf:heartbeat:VirtualDomain \
    params config="/etc/libvirt/qemu/'$NAME'-ha.xml" migration_transport="ssh" force_stop="0" hypervisor="qemu:///system" \
    operations $id="p_vd-'$NAME'-operations" \
    op start interval="0" timeout="90" \
    op stop interval="0" timeout="90" \
    op migrate_from interval="0" timeout="240" \
    op migrate_to interval="0" timeout="240" \
    op monitor interval="10" timeout="30" start-delay="0" \
    meta allow-migrate="true" failure-timeout="10min" target-role="Started"
colocation c_vd-'$NAME'-on-master inf: p_vd-'$NAME'-ha ms_drbd-'$NAME':Master
order o_drbm-lvm-vd-start-'$NAME' inf: ms_drbd-'$NAME':promote clone_lvm-'$NAME':start p_vd-'$NAME'-ha:start
' > $NAME'-vd.crm'

## test DRBD config before printing the steps
cmd="drbdadm dump -t $NAME.res"
$cmd >/dev/null
rc=$?
if [[ $rc != 0 ]] ; then
    echo -e "\n !!! DRBD config ("$NAME.res") file will not work.. need to fix this first. exiting...\n"
    echo -e " check command: "$cmd"\n";
    echo -e "\n * HINT: you might just need to remove the file /etc/drbd.d/"$NAME.res" [be careful]";
    echo -e "   mv /etc/drbd.d/"$NAME.res" ./$NAME.res.disabled."$NODE1_NAME
    echo -e "   scp "$NODE2":/etc/drbd.d/"$NAME.res" ./$NAME.res.disabled."$NODE2_NAME
    echo -e "   ssh "$NODE2" -C mv /etc/drbd.d/"$NAME.res" /tmp/$NAME.res.disabled"
    # exit $rc
fi
echo -e " * DRBD config verified (it should work)\n"
echo ' '
echo -e '\n# 1) create LVM for DRBD device'
echo '  'lvcreate --name drbd_$NAME --size $DRBD_SIZE'.1GB' $NODE1_VG
echo '  'ssh $NODE2 -C lvcreate --name drbd_$NAME --size $DRBD_SIZE'.1GB' $NODE2_VG
echo -e '\n# 2) copy '$NAME'.res to /etc/drbd.d/'
echo '  'cp $NAME.res /etc/drbd.d/
echo '  'scp $NAME.res $NODE2:/etc/drbd.d/
echo -e '\n# 3) reloading drbd'
echo '  '/etc/init.d/drbd reload
echo '  'ssh $NODE2 -C /etc/init.d/drbd reload
echo -e '\n# 4) create DRBD device on both nodes'
echo '  'drbdadm -- --force create-md $NAME
echo '  'ssh $NODE2 -C drbdadm -- --force create-md $NAME
echo -e '\n# 5) reloading drbd'
echo '  '/etc/init.d/drbd reload
echo '  'ssh $NODE2 -C /etc/init.d/drbd reload
echo -e '\n# 6) bring drbd up on both nodes'
echo '  'drbdadm up $NAME
echo '  'ssh $NODE2 -C drbdadm up $NAME
echo -e '\n# 7) set '$NODE1_NAME' primary and overwrite '$NODE2_NAME
echo '  'drbdadm -- --overwrite-data-of-peer primary $NAME
echo -e '\n# 8) set '$NODE2_NAME' secondary (should already be set)'
echo '  'ssh $NODE2 -C drbdadm secondary $NAME
echo -e '\n# 9) '$NODE1_NAME' create PV/VG/LV (not setting VG to cluster aware yet due to LVM bug not using --monitor y)'
echo '  'pvcreate /dev/drbd$DRBDNUM
echo '  'vgcreate -c n drbd_$NAME /dev/drbd$DRBDNUM
echo '  'lvcreate -L$SIZE'G' -n$NAME drbd_$NAME
echo -e '\n# 10) Activating VG drbd_'$NAME' -- (should already be, but just in case)'
echo '  'vgchange -a y drbd_$NAME
## ubuntu bug -- enable if ubuntu host
#echo '  'vgchange -a y drbd_$NAME --monitor y
echo -e '\n# 11) create the POOL in virsh'
echo '  'virsh pool-create-as drbd_$NAME --type=logical --target=/dev/drbd_$NAME
echo -e '\n# 12a) If this is a NEW kvm install - continue following - else go to step 12b'
echo '  + NOW install new virt from '$NODE1_NAME' on /dev/drbd_'$NAME'/'$NAME named $NAME'-ha'
echo '  # after installed and rebooted'
echo '  ' scp /etc/libvirt/qemu/$NAME'-ha.xml' $NODE2:/etc/libvirt/qemu/$NAME'-ha.xml'
echo '  ' ssh $NODE2 -C virsh define /etc/libvirt/qemu/$NAME'-ha.xml'
echo '  # test virsh shutdown -- install acpid'
echo '  ' virsh shutdown $NAME'-ha'
echo '  * SKIP 12b '
echo ' 12b) If this is a migration from an existing KVM virt - continue, else skip this (you already completed 12a, right?)'
echo '  ## restore your KVM/LVM to the new LV: of=/dev/drbd_'$NAME'/'$NAME' bs=1M'
echo '  command: dd if=<your image file.img> of=/dev/drbd_'$NAME'/'$NAME' bs=1M'
echo '  ## Edit the existing KVM xml file -- copy the existing file to edit'
echo '  ' cp /etc/libvirt/qemu/$NAME'.xml' ./$NAME'-ha.xml'
echo '    -modify: <name>'$NAME'</name> to <name>'$NAME'-ha</name>'
echo '    -remove: <uuid>[some long uuid]</uuid>'
echo '  ' emacs $NAME'-ha.xml'
echo '  ' cp $NAME'-ha.xml' /etc/libvirt/qemu/$NAME'-ha.xml'
echo '  #' this will set up a unique UUID, which is needed before you copy to $NODE2_NAME
echo '  ' virsh define /etc/libvirt/qemu/$NAME'-ha.xml'
echo '  ' scp /etc/libvirt/qemu/$NAME'-ha.xml' $NODE2:/etc/libvirt/qemu/$NAME'-ha.xml'
echo '  ' ssh $NODE2 -C virsh define /etc/libvirt/qemu/$NAME'-ha.xml'
echo -e '\n#'
echo '# All install work is done. deactivate VG / set cluster aware / and down drbd for pacemaker provisioning'
echo -e "#\n"
echo -e '\n# 13) deactivate VG drbd_'$NAME' on '$NODE1_NAME
## ubuntu bug -- enable if ubuntu host
#echo '  'vgchange -a n drbd_$NAME --monitor y
echo '  'vgchange -a n drbd_$NAME
echo -e '\n# 14) set drbd primary on '$NODE2_NAME' to set VG cluster aware'
## ubuntu bug -- enable if ubuntu host
#echo '  'vgchange -a n drbd_$NAME --monitor y
echo '  'vgchange -a n drbd_$NAME
echo '  'ssh $NODE2 -C drbdadm primary $NAME
echo -e '\n# 15) activate VG on both nodes'
## ubuntu bug -- enable if ubuntu host
#echo '  'vgchange -a y drbd_$NAME --monitor y
#echo '  'ssh $NODE2 -C vgchange -a y drbd_$NAME --monitor y
echo '  'vgchange -a y drbd_$NAME
echo '  'ssh $NODE2 -C vgchange -a y drbd_$NAME
echo -e '\n# 16) set VG cluster aware on both nodes (only one command is needed due to drbd)'
echo '  'vgchange -c y drbd_$NAME
echo -e '\n# 17) deactivate VG'
## ubuntu bug -- enable if ubuntu host
#echo '  'vgchange -a n drbd_$NAME --monitor y
#echo '  'ssh $NODE2 -C vgchange -a n drbd_$NAME --monitor y
echo '  'vgchange -a n drbd_$NAME
echo '  'ssh $NODE2 -C vgchange -a n drbd_$NAME
echo -e '\n# 18) down drbd on both - so we can put it in pacemaker'
echo '  'drbdadm down $NAME
echo '  'ssh $NODE2 -C drbdadm down $NAME
echo -e '\n# NOTE) MAKE sure the disk cache for the virtio is set to NONE - live migrate will fail if not'
echo -e '\n#'
echo '# Now lets provision Pacemaker -- we already expect you have a working pacemaker config with DLM/CLVM'
echo -e "#\n"
echo -e '\n# 19) Load the dual primary drbd/lvm RA config to the cluster'
echo '  crm configure < '$NAME'.crm'
echo -e '\n# 20) verify all is good with crm_mon: DRBD should look like something below'
echo -e "  crm_mon -f\n"
echo '   Master/Slave Set: ms_drbd-'$NAME' [p_drbd-'$NAME']'
echo -e '       Masters: [ '$NODE1_NAME' '$NODE2_NAME" ]\n"
echo -e '\n# 21) Load the VirtualDomain RA config to the cluster'
echo '  crm configure < '$NAME'-vd.crm'
echo '#####################################################################'
echo '# Files Created'
echo '# '$NAME'.res     # for DRBD'
echo '# '$NAME'.crm     # DRBD/LVM configs to load into crm configure'
echo '# '$NAME'-vd.crm  # KVM VirtualDomain configs to load into crm configure'
</source>
Notes
* Running the script will test the DRBD resource and print a warning on failure:
<pre>
 !!! DRBD config (spacewalk.res) file will not work.. need to fix this first. exiting...

 check command: drbdadm dump -t spacewalk.res

 * HINT: you might just need to remove the file /etc/drbd.d/spacewalk.res [be careful]
   mv /etc/drbd.d/spacewalk.res ./spacewalk.res.disabled.bigeye
   scp 10.69.1.250:/etc/drbd.d/spacewalk.res ./spacewalk.res.disabled.blindpig
   ssh 10.69.1.250 -C mv /etc/drbd.d/spacewalk.res /tmp/spacewalk.res.disabled
</pre>
Backups
Pacemaker
<source lang="bash">
crm configure save /path/to/file.bak
</source>
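The counterpart for restoring is crmsh's load; replace overwrites the running config, so treat it with care:
<source lang="bash">
crm configure load replace /path/to/file.bak
</source>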
DRBD & CLVM
- We have the option to snapshot both the DRBD backing device and the KVM virt LV.
- There are major issues with backups:
1) LVM snapshots of the DRBD backing device hang (when primary)
- workaround: set the DRBD device secondary / snapshot+backup the DRBD LV / set DRBD primary again (see the sketch below)
2) CLVM does not allow snapshots
- workaround: set the DRBD device secondary / remove the VG cluster bit / snapshot+backup the CLVM LV / restore the VG cluster bit / set DRBD primary again
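A hand-run sketch of workaround 1 (names from this page; assumes the virt has been migrated to the peer node and the DRBD/LVM resources are already unmanaged, as the script below takes care of; the /backup path is hypothetical):
<source lang="bash">
vgchange -a ln drbd_spacewalk        # deactivate the clustered VG locally so drbd can demote
drbdadm secondary spacewalk          # drop to secondary so the snapshot won't hang
lvcreate -s -n drbd_spacewalk_snap -L 5G /dev/raid1/drbd_spacewalk   # snapshot the DRBD backing LV
dd if=/dev/raid1/drbd_spacewalk_snap bs=1M | gzip -c > /backup/spacewalk.img.gz
lvremove -f /dev/raid1/drbd_spacewalk_snap
drbdadm primary spacewalk            # back to dual primary
vgchange -a y drbd_spacewalk
</source>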
DRBD Backing Device
- You will have to edit some variables for this to work properly.
- It also sets DRBD and related resources to unmanaged mode, so Pacemaker will not potentially fence on failures.
- This is a heavily modified version of http://repo.firewall-services.com/misc/virt/virt-backup.pl (other options like cleanup do not work)
- Usage
- ./virt-backup-drbd_backdevice.pl --vm=<virt_name> [--compress]
- virt-backup-drbd_backdevice.pl
<source lang="perl">
#!/usr/bin/perl -w
# vm == drbd

use XML::Simple;
use Sys::Virt;
use Getopt::Long;

# Set umask
umask(022);

# Some constants
my $drbd_dir = '/etc/drbd.d/';
our %opts = ();
our @vms = ();
our @excludes = ();
our @disks = ();
our $drbd_dev;
my $migrate_to   = 'bigeye';   ## host to migrate machines to if they are running locally
my $migrate_from = 'blindpig'; ## ht

# Set some default values
my $host = `hostname`;
chomp($host);
my $migration = 0; # placeholder

# What to run. The default action is to dump
$opts{action} = 'dump';

# Where backups will be stored. This directory must already exist
$opts{backupdir} = '/NFS/_local_/_backups/DRBD/';

# Size of LVM snapshots (which will be used to backup VM with minimum downtime
# if the VM stores data directly on a LV)
$opts{snapsize} = '5G';

# Debug
$opts{debug}      = 1;
$opts{snapshot}   = 1;
$opts{compress}   = 'none';
$opts{lvcreate}   = '/sbin/lvcreate -c 512';
$opts{lvremove}   = '/sbin/lvremove';
$opts{blocksize}  = '262144';
$opts{nice}       = 'nice -n 19';
$opts{ionice}     = 'ionice -c 2 -n 7';
$opts{livebackup} = 1;
$opts{wasrunning} = 1;

# get command line arguments
GetOptions(
    "debug"       => \$opts{debug},
    "keep-lock"   => \$opts{keeplock},
    "state"       => \$opts{state},
    "snapsize=s"  => \$opts{snapsize},
    "backupdir=s" => \$opts{backupdir},
    "vm=s"        => \@vms,
    "action=s"    => \$opts{action},
    "cleanup"     => \$opts{cleanup},
    "dump"        => \$opts{dump},
    "unlock"      => \$opts{unlock},
    "connect=s"   => \$opts{connect},
    "snapshot!"   => \$opts{snapshot},
    "compress:s"  => \$opts{compress},
    "exclude=s"   => \@excludes,
    "blocksize=s" => \$opts{blocksize},
    "help"        => \$opts{help}
);

# Set compression settings
if ($opts{compress} eq 'lzop'){
    $opts{compext} = ".lzo";
    $opts{compcmd} = "lzop -c";
}
elsif ($opts{compress} eq 'bzip2'){
    $opts{compext} = ".bz2";
    $opts{compcmd} = "bzip2 -c";
}
elsif ($opts{compress} eq 'pbzip2'){
    $opts{compext} = ".bz2";
    $opts{compcmd} = "pbzip2 -c";
}
elsif ($opts{compress} eq 'xz'){
    $opts{compext} = ".xz";
    $opts{compcmd} = "xz -c";
}
elsif ($opts{compress} eq 'lzip'){
    $opts{compext} = ".lz";
    $opts{compcmd} = "lzip -c";
}
elsif ($opts{compress} eq 'plzip'){
    $opts{compext} = ".lz";
    $opts{compcmd} = "plzip -c";
}
# Default is gzip
elsif (($opts{compress} eq 'gzip') || ($opts{compress} eq '')) {
    $opts{compext} = ".gz";
    $opts{compcmd} = "gzip -c";
}
else{
    $opts{compext} = "";
    $opts{compcmd} = "cat";
}

# Allow comma separated multi-argument
@vms = split(/,/,join(',',@vms));
@excludes = split(/,/,join(',',@excludes));

# Backward compatible with --dump --cleanup --unlock
$opts{action} = 'dump'    if ($opts{dump});
$opts{action} = 'cleanup' if ($opts{cleanup});
$opts{action} = 'unlock'  if ($opts{unlock});

# Stop here if we have no vm
# Or the help flag is present
if ((!@vms) || ($opts{help})){
    usage();
    exit 1;
}

if (! -d $opts{backupdir} ){
    print "$opts{backupdir} is not a valid directory\n";
    exit 1;
}

print "\n" if ($opts{debug});

foreach our $vm (@vms){
    print "Checking $vm status\n\n" if ($opts{debug});
    our $backupdir = $opts{backupdir}.'/'.$vm;
    if ($opts{action} eq 'cleanup'){
        print "Running cleanup routine for $vm\n\n" if ($opts{debug});
        # run_cleanup();
    }
    elsif ($opts{action} eq 'dump'){
        print "Running dump routine for $vm\n\n" if ($opts{debug});
        run_dump();
    }
    # else {
    #     usage();
    #     exit 1;
    # }
}

############################################################################
##############               FUNCTIONS                      ###############
############################################################################

sub prepare_backup{
    my ($source,$res);
    my $target = $vm;
    my $match = 0;

    ## locate the backing device for this res
    my @drbd_res = &runcmd("drbdadm dump $vm");
    foreach my $line (@drbd_res) {
        $res .= $line;
        if ($match == 1 && $line =~ /disk\s+(.*);/) {
            $source = $1;
            $match = 0;
        }
        if ($line =~ /device\s+.*(drbd\d+)\s+minor/) { $drbd_dev = $1; }
        if ($line =~ /on\s$host\s+{/i) { $match = 1; }
    }
    if (!$source) {
        print "Did not find DRBD backing device for VM\n";
        exit;
    } else {
        ## set target backup file name based on device
        $target = $source;
        $target =~ s/\//_-_/g;  ## rename / to _-_
        $target =~ s/^_-_//g;   ## remove leading _-_
    }

    ## Check if VM is running locally - migrate it off to backup
    ## set migration = 1, to migrate back when done
    my $local_test = join("",&runcmd("virsh list"));
    if ($local_test =~ /$vm.*running/i) {
        print "$vm running locally - migrating to $migrate_to\n";
        my $pvd = &GetPVD($vm);
        &runcmd("crm resource migrate $pvd $migrate_to");
        $migration = 1;
        sleep 1;
        my $remote_test = join("",&runcmd("ssh $migrate_to -C virsh list | grep -i $vm",1));
        while($local_test =~ /(.*$vm.*)/) {
            print "   $migrate_from:\t" . $1 . "\n";
            print "(r)$migrate_to:\t$remote_test\n";
            sleep 5;
            $local_test  = join("",&runcmd("virsh list",1));
            $remote_test = join("",&runcmd("ssh $migrate_to -C virsh list | grep -i $vm",1));
        }
        $remote_test = join("",&runcmd("ssh $migrate_to -C virsh list | grep -i $vm",1));
        print "We must have migrated ok... \n";
        print "(r)$migrate_to:\t$remote_test\n";
    }

    &runcmd("crm resource unmanage clone_lvm-" . $vm);
    &runcmd("crm resource unmanage ms_drbd-" . $vm);
    #&runcmd("crm resource unmanage p_drbd-" . $vm);
    sleep 1;
    &runcmd("vgchange -aln drbd_" . $vm,0,5);
    sleep 2;
    &runcmd("drbdadm secondary " . $vm);
    # flag files so check_crm ignores fail-counts caused by the backup
    &runcmd("ssh $migrate_to -C touch /tmp/backup.$drbd_dev");
    &runcmd("ssh $migrate_to -C touch /tmp/backup.p_drbd-$vm");
    &runcmd("touch /tmp/backup.$drbd_dev");
    &runcmd("touch /tmp/backup.p_drbd-$vm");

    my $sec_check = join("",&runcmd("drbdadm role $vm"));
    if( $sec_check !~ /Secondary\/Primary/) {
        print "Fail: DRBD res [$vm] is not Secondary! result: $sec_check\n";
        exit;
    } else {
        print "OK: DRBD res [$vm] is Secondary. result: $sec_check\n";
    }

    if (!-d $backupdir)          { mkdir $backupdir || die $!; }
    if (!-d $backupdir.'.meta')  { mkdir $backupdir . '.meta' || die $!; }
    lock_vm();
    save_drbd_res($res);

    my $time = "_".time();
    # Try to snapshot the source if snapshot is enabled
    if ( ($opts{snapshot}) && (create_snapshot($source,$time)) ){
        print "$source seems to be a valid logical volume (LVM), a snapshot has been taken as " . $source . $time ."\n" if ($opts{debug});
        $source = $source.$time;
        push (@disks, {source => $source, target => $target . '_' . $time, type => 'snapshot'});
    }

    # Summarize the list of disks to be dumped
    if ($opts{debug}){
        if ($opts{action} eq 'dump'){
            print "\n\nThe following disks will be dumped:\n\n";
            foreach $disk (@disks){
                print "Source: $disk->{source}\tDest: $backupdir/$vm" . '_' . $disk->{target} . ".img$opts{compext}\n";
            }
        }
    }
    if ($opts{livebackup}){
        print "\nWe can run a live backup\n" if ($opts{debug});
    }
}

sub run_dump{
    # Pause VM, dump state, take snapshots etc..
    prepare_backup();

    # Now, it's time to actually dump the disks
    foreach $disk (@disks){
        my $source = $disk->{source};
        my $dest = "$backupdir/$vm" . '_' . $disk->{target} . ".img$opts{compext}";
        print "\nStarting dump of $source to $dest\n\n" if ($opts{debug});
        my $ddcmd = "$opts{ionice} dd if=$source bs=$opts{blocksize} | $opts{nice} $opts{compcmd} > $dest 2>/dev/null";
        print $ddcmd . "\n";
        unless( system("$ddcmd") == 0 ){
            die "Couldn't dump the block device/file $source to $dest\n";
        }
        # Remove the snapshot if the current dumped disk is a snapshot
        destroy_snapshot($source) if ($disk->{type} eq 'snapshot');
    }

    &runcmd("crm resource manage p_drbd-" . $vm);
    &runcmd("crm resource manage ms_drbd-" . $vm);
    &runcmd("crm resource manage clone_lvm-" . $vm);
    &runcmd("drbdadm primary " . $vm);
    sleep 1;
    &runcmd("ssh $migrate_to -C rm /tmp/backup.$drbd_dev");
    &runcmd("ssh $migrate_to -C rm /tmp/backup.p_drbd-$vm");
    &runcmd("rm /tmp/backup.$drbd_dev");
    &runcmd("rm /tmp/backup.p_drbd-$vm");
    &runcmd("vgchange -ay drbd_" . $vm);
    sleep 1;
    &runcmd("crm_resource -r ms_drbd-$vm -C");
    sleep 1;
    &runcmd("crm_resource -r clone_lvm-$vm -C");
    sleep 3;

    my $prim_check = join("",&runcmd("drbdadm role $vm"));
    print "DRBD resource: $prim_check\n";

    ## if this was migrated, move it back
    if ($migration) {
        if ($prim_check =~ /primary\/primary/i) {
            ## migrate back
            my $local_test = join("",&runcmd("virsh list"));
            if ($local_test !~ /$vm.*running/i) {
                print "$vm NOT running locally - migrating to $migrate_from\n";
                my $pvd = &GetPVD($vm);
                &runcmd("crm resource migrate $pvd $migrate_from");
                sleep 1;
                my $remote_test = join("",&runcmd("ssh $migrate_to -C virsh list | grep -i $vm"));
                my $status = 'unknown';
                while($local_test !~ /(.*$vm.*running)/i) {
                    if ($local_test =~ /(.*$vm.*)/i) { $status = $1; }
                    print "   $migrate_from:\t" . $status . "\n";
                    print "(r)$migrate_to:\t$remote_test\n";
                    sleep 5;
                    $local_test  = join("",&runcmd("virsh list",1));
                    $remote_test = join("",&runcmd("ssh $migrate_to -C virsh list | grep -i $vm",1));
                }
                print "Migration is Done!\n";
                print "(r)$migrate_from:\t$local_test\n";
            }
        }
    }
    ## done

    # And remove the lock file, unless the --keep-lock flag is present
    unlock_vm() unless ($opts{keeplock});
}

sub usage{
    print "usage:\n$0 --action=[dump|cleanup|chunkmount|unlock] --vm=vm1[,vm2,vm3] [--debug] [--exclude=hda,hdb] [--compress] ".
        "[--state] [--no-snapshot] [--snapsize=<size>] [--backupdir=/path/to/dir] [--connect=<URI>] ".
        "[--keep-lock] [--bs=<block size>]\n" .
        "\n\n" .
        "\t--action: What action the script will run. Valid actions are\n\n" .
        "\t\t- dump: Run the dump routine (dump disk image to temp dir, pausing the VM if needed). It's the default action\n" .
        "\t\t- unlock: just remove the lock file, but don't cleanup the backup dir\n\n" .
        "\t--vm=name: The VM you want to work on (as known by libvirt). You can backup several VMs in one shot " .
        "if you separate them with comma, or with multiple --vm argument. You have to use the name of the domain, ".
        "ID and UUID are not supported at the moment\n\n" .
        "\n\nOther options:\n\n" .
        "\t--snapsize=<snapsize>: The amount of space to use for snapshots. Use the same format as -L option of lvcreate. " .
        "eg: --snapsize=15G. Default is 5G\n\n" .
        "\t--compress[=[gzip|bzip2|pbzip2|lzop|xz|lzip|plzip]]: On the fly compress the disks images during the dump. If you " .
        "don't specify a compression algo, gzip will be used.\n\n" .
        "\t--backupdir=/path/to/backup: Use an alternate backup dir. The directory must exist and be writable. " .
        "The default is /var/lib/libvirt/backup\n\n" .
        "\t--keep-lock: Leave the lock file present. This prevents another " .
        "dump from running while a third party backup software (BackupPC for example) saves the dumped files.\n\n";
}

# Dump the DRBD resource config
sub save_drbd_res{
    my $res = shift;
    print "\nSaving DRBD resource for $vm to $backupdir/$vm.res\n" if ($opts{debug});
    open(XML, ">$backupdir/$vm" . ".res") || die $!;
    print XML $res;
    close XML;
}

# Create an LVM snapshot
# Pass the original logical volume and the suffix
# to be added to the snapshot name as arguments
sub create_snapshot{
    my ($blk,$suffix) = @_;
    my $ret = 0;
    print "Running: $opts{lvcreate} -p r -s -n " . $blk . $suffix . " -L $opts{snapsize} $blk > /dev/null 2>&1\n" if $opts{debug};
    if ( system("$opts{lvcreate} -s -n " . $blk . $suffix . " -L $opts{snapsize} $blk > /dev/null 2>&1") == 0 ) {
        $ret = 1;
        open SNAPLIST, ">>$backupdir.meta/snapshots" or die "Error, couldn't open snapshot list file\n";
        print SNAPLIST $blk.$suffix ."\n";
        close SNAPLIST;
    }
    return $ret;
}

# Remove an LVM snapshot
sub destroy_snapshot{
    my $ret = 0;
    my ($snap) = @_;
    print "Removing snapshot $snap\n" if $opts{debug};
    if (system ("$opts{lvremove} -f $snap > /dev/null 2>&1") == 0 ){
        $ret = 1;
    }
    return $ret;
}

# Lock a VM backup dir
# Just creates an empty lock file
sub lock_vm{
    print "Locking $vm\n" if $opts{debug};
    open ( LOCK, ">$backupdir.meta/$vm.lock" ) || die $!;
    print LOCK "";
    close LOCK;
}

# Unlock the VM backup dir
# Just removes the lock file
sub unlock_vm{
    print "Removing lock file for $vm\n\n" if $opts{debug};
    unlink <$backupdir.meta/$vm.lock>;
}

sub runcmd {
    my $cmd    = shift;
    my $quiet  = shift;
    my $ignore = shift;
    ## ignore exit code 1 with greps -- not found is OK..
    if ($cmd =~ /grep/) { $ignore = 1; }
    if (!$quiet) { print "exec: $cmd ... "; }
    my @output = `$cmd`;
    if ($?) {
        my $e = sprintf("%d", $? >> 8);
        if ($ignore && $ignore == $e) {
            print "exit code = $e -- ignoring exit code $e\n";
        } else {
            printf "\n******** command $cmd exited with value %d\n", $? >> 8;
            print @output;
            exit $? >> 8;
        }
    }
    if (!$quiet) { print "success\n"; }
    return @output;
}

## get primitive VirtualDomain resource name for a VM
sub GetPVD {
    my $vm = shift;
    my $out = join("",&runcmd("crm resource show | grep $vm | grep VirtualDomain"));
    if ($out =~ /([\d\w\-\_]+)/) {
        return $1;
    } else {
        print "Could not locate primitive VirtualDomain for $vm\n";
    }
}
</source>
CLVM - KVM Virt Snapshot
- You will have to edit some variables for this to work properly.
- It also sets DRBD and related resources to unmanaged mode, so Pacemaker will not potentially fence on failures.
- This is a heavily modified version of http://repo.firewall-services.com/misc/virt/virt-backup.pl (other options like cleanup do not work)
- Usage
- ./virt-backup-drbd_clvm.pl --vm=<virt_name> [--compress]
- virt-backup-drbd_clvm.pl
<source lang="perl">
#!/usr/bin/perl -w
## lots of hacks due to bugs.. in lvm/clustered vg

use XML::Simple;
use Sys::Virt;
use Getopt::Long;
use Data::Dumper;

# Set umask
umask(022);

# Some constants
my $drbd_dir = '/etc/drbd.d/';
our %opts = ();
our @vms = ();
our @excludes = ();
our @disks = ();
our $drbd_dev;
my $migrate_to   = 'blindpig'; ## host to migrate machines to if they are running locally
my $migrate_from = 'bigeye';   ## ht

# Set some default values
my $host = `hostname`;
chomp($host);
my $migration = 0; # placeholder

# What to run. The default action is to dump
$opts{action}    = 'dump';
$opts{backupdir} = '/NFS/_local_/_backups/KVM/';
$opts{snapsize}  = '1G';

# Debug
$opts{debug}      = 1;
$opts{snapshot}   = 1;
$opts{compress}   = 'none';
$opts{lvcreate}   = '/sbin/lvcreate -c 512';
$opts{lvremove}   = '/sbin/lvremove';
$opts{blocksize}  = '262144';
$opts{nice}       = 'nice -n 19';
$opts{ionice}     = 'ionice -c 2 -n 7';
$opts{livebackup} = 1;
$opts{wasrunning} = 1;

# get command line arguments
GetOptions(
    "debug"       => \$opts{debug},
    "keep-lock"   => \$opts{keeplock},
    "state"       => \$opts{state},
    "snapsize=s"  => \$opts{snapsize},
    "backupdir=s" => \$opts{backupdir},
    "vm=s"        => \@vms,
    "action=s"    => \$opts{action},
    "cleanup"     => \$opts{cleanup},
    "dump"        => \$opts{dump},
    "unlock"      => \$opts{unlock},
    "connect=s"   => \$opts{connect},
    "snapshot!"   => \$opts{snapshot},
    "compress:s"  => \$opts{compress},
    "exclude=s"   => \@excludes,
    "blocksize=s" => \$opts{blocksize},
    "help"        => \$opts{help}
);

# Set compression settings
if ($opts{compress} eq 'lzop'){
    $opts{compext} = ".lzo";
    $opts{compcmd} = "lzop -c";
}
elsif ($opts{compress} eq 'bzip2'){
    $opts{compext} = ".bz2";
    $opts{compcmd} = "bzip2 -c";
}
elsif ($opts{compress} eq 'pbzip2'){
    $opts{compext} = ".bz2";
    $opts{compcmd} = "pbzip2 -c";
}
elsif ($opts{compress} eq 'xz'){
    $opts{compext} = ".xz";
    $opts{compcmd} = "xz -c";
}
elsif ($opts{compress} eq 'lzip'){
    $opts{compext} = ".lz";
    $opts{compcmd} = "lzip -c";
}
elsif ($opts{compress} eq 'plzip'){
    $opts{compext} = ".lz";
    $opts{compcmd} = "plzip -c";
}
# Default is gzip
elsif (($opts{compress} eq 'gzip') || ($opts{compress} eq '')) {
    $opts{compext} = ".gz";
    $opts{compcmd} = "gzip -c";
    # $opts{compcmd} = "pigz -c -p 2";
}
else{
    $opts{compext} = "";
    $opts{compcmd} = "cat";
}

# Allow comma separated multi-argument
@vms = split(/,/,join(',',@vms));
@excludes = split(/,/,join(',',@excludes));

# Backward compatible with --dump --cleanup --unlock
$opts{action} = 'dump'    if ($opts{dump});
$opts{action} = 'cleanup' if ($opts{cleanup});
$opts{action} = 'unlock'  if ($opts{unlock});

# Libvirt URI to connect to
$opts{connect} = "qemu:///system";

# Stop here if we have no vm
# Or the help flag is present
if ((!@vms) || ($opts{help})){
    usage();
    exit 1;
}

if (! -d $opts{backupdir} ){
    print "$opts{backupdir} is not a valid directory\n";
    exit 1;
}

print "\n" if ($opts{debug});

# Connect to libvirt
print "\n\nConnecting to libvirt daemon using $opts{connect} as URI\n" if ($opts{debug});
our $libvirt = Sys::Virt->new( uri => $opts{connect} ) || die "Error connecting to libvirt on URI: $opts{connect}";

foreach our $vm (@vms){
    print "Checking $vm status\n\n" if ($opts{debug});
    our $backupdir = $opts{backupdir}.'/'.$vm;
    my $vdom = $vm . '-ha';
    $vdom =~ s/-ha-ha/-ha/;
    our $dom = $libvirt->get_domain_by_name($vdom) || die "Error opening $vm object";
    if ($opts{action} eq 'cleanup'){
        print "Running cleanup routine for $vm\n\n" if ($opts{debug});
        # run_cleanup();
    }
    elsif ($opts{action} eq 'dump'){
        print "Running dump routine for $vm\n\n" if ($opts{debug});
        run_dump();
    }
    # else {
    #     usage();
    #     exit 1;
    # }
}

############################################################################
##############               FUNCTIONS                      ###############
############################################################################

sub prepare_backup{
    my ($source,$res);
    my $target = $vm;
    my $match = 0;

    my $xml = new XML::Simple ();
    my $data = $xml->XMLin( $dom->get_xml_description(), forcearray => ['disk'] );

    my @drbd_res = &runcmd("drbdadm dump $vm");
    foreach my $line (@drbd_res) {
        $res = $line;
        if ($line =~ /device\s+.*(drbd\d+)\s+minor/) { $drbd_dev = $1; last; }
    }

    # Create a list of disks used by the VM
    foreach $disk (@{$data->{devices}->{disk}}){
        if ($disk->{type} eq 'block'){
            $source = $disk->{source}->{dev};
        }
        elsif ($disk->{type} eq 'file'){
            $source = $disk->{source}->{file};
        }
        else{
            print "\nSkipping $source for vm $vm as its type is $disk->{type}: " .
                " and only block is supported\n" if ($opts{debug});
            next;
        }
        ## we only support the first block device for now.
        if ($target && $source) { last; }
    }

    ## locate the backing device for this res
    #my @drbd_res = &runcmd("drbdadm dump $vm");
    #foreach my $line (@drbd_res) {
    #    $res = $line;
    #    if ($match == 1 && $line =~ /disk\s+(.*);/) {
    #        $source = $1;
    #        $match = 0;
    #    }
    #    if ($line =~ /on\s$host\s+{/i) { $match = 1; }
    #}
    #if (!$source) {
    #    print "Did not find DRBD backing device for VM\n";
    #    exit;
    #} else {
    #    ## set target backup file name based on device
    #    $target = $source;
    #    $target =~ s/\//_-_/g;  ## rename / to _-_
    #    $target =~ s/^_-_//g;   ## remove leading _-_
    #}

    ## check if running on node2 - migrate here if so
    my $local_test = join("",&runcmd("virsh list"));
    if ($local_test !~ /$vm.*running/i) {
        my $status = 'not running';
        print "$vm running remotely - migrating to $migrate_to\n";
        my $pvd = &GetPVD($vm);
        &runcmd("crm resource migrate $pvd $migrate_to");
        $migration = 1;
        sleep 1;
        my $remote_test = join("",&runcmd("ssh $migrate_from -C virsh list | grep -i $vm",1));
        while($remote_test =~ /(.*$vm.*)/) {
            print "   $migrate_to:\t" . $status . "\n";
            print "(r)$migrate_from:\t$remote_test\n";
            sleep 5;
            $local_test = join("",&runcmd("virsh list",1));
            if ($local_test =~ /(.*$vm.*)/i) { $status = $1; }
            $remote_test = join("",&runcmd("ssh $migrate_from -C virsh list | grep -i $vm",1));
        }
        $remote_test = join("",&runcmd("ssh $migrate_from -C virsh list | grep -i $vm",1));
        print "We must have migrated ok... \n";
        print "(r)$migrate_to:\t$remote_test\n";
    }

    &runcmd("crm resource unmanage clone_lvm-" . $vm);
    &runcmd("crm resource unmanage ms_drbd-" . $vm);
    sleep 1;
    &runcmd("ssh $migrate_from -C vgchange -aln drbd_" . $vm);
    # sleep 2;
    &runcmd("ssh $migrate_from -C drbdadm secondary " . $vm);
    # flag files so check_crm ignores fail-counts caused by the backup
    &runcmd("ssh $migrate_from -C touch /tmp/backup.$drbd_dev");
    &runcmd("ssh $migrate_from -C touch /tmp/backup.p_drbd-$vm");
    &runcmd("touch /tmp/backup.$drbd_dev");
    &runcmd("touch /tmp/backup.p_drbd-$vm");

    my $sec_check = join("",&runcmd("drbdadm role $vm"));
    if( $sec_check !~ /Primary\/Secondary/) {
        print "Fail: DRBD res [$vm] is not the ONLY primary! result: $sec_check\n";
        exit;
    } else {
        print "OK: DRBD res [$vm] is the ONLY Primary. result: $sec_check\n";
    }

    if (!-d $backupdir)          { mkdir $backupdir || die $!; }
    if (!-d $backupdir.'.meta')  { mkdir $backupdir . '.meta' || die $!; }
    lock_vm();

    # drop the cluster bit so we can snapshot, and activate exclusively
    &runcmd("vgchange -c n drbd_" . $vm);
    sleep 1;
    &runcmd("vgchange -aey drbd_" . $vm);
    #save_drbd_res($res);
    save_xml($res);

    my $time = "_".time();
    # Try to snapshot the source if snapshot is enabled
    if ( ($opts{snapshot}) && (create_snapshot($source,$time)) ){
        print "$source seems to be a valid logical volume (LVM), a snapshot has been taken as " . $source . $time ."\n" if ($opts{debug});
        $source = $source.$time;
        push (@disks, {source => $source, target => $target . '_' . $time, type => 'snapshot'});
    }

    # Summarize the list of disks to be dumped
    if ($opts{debug}){
        if ($opts{action} eq 'dump'){
            print "\n\nThe following disks will be dumped:\n\n";
            foreach $disk (@disks){
                print "Source: $disk->{source}\tDest: $backupdir/$vm" . '_' . $disk->{target} . ".img$opts{compext}\n";
            }
        }
    }
    if ($opts{livebackup}){
        print "\nWe can run a live backup\n" if ($opts{debug});
    }
}

sub run_dump{
    # Pause VM, dump state, take snapshots etc..
    prepare_backup();

    # Now, it's time to actually dump the disks
    foreach $disk (@disks){
        my $source = $disk->{source};
        my $dest = "$backupdir/$vm" . '_' . $disk->{target} . ".img$opts{compext}";
        print "\nStarting dump of $source to $dest\n\n" if ($opts{debug});
        my $ddcmd = "$opts{ionice} dd if=$source bs=$opts{blocksize} | $opts{nice} $opts{compcmd} > $dest 2>/dev/null";
        unless( system("$ddcmd") == 0 ){
            die "Couldn't dump the block device/file $source to $dest\n";
        }
        # Remove the snapshot if the current dumped disk is a snapshot
        destroy_snapshot($source) if ($disk->{type} eq 'snapshot');
    }

    my $meta = unlink <$backupdir.meta/*>;
    rmdir "$backupdir.meta";
    print "$meta metadata files removed\n\n" if $opts{debug};

    &runcmd("ssh $migrate_from -C drbdadm primary " . $vm);
    &runcmd("ssh $migrate_from -C rm /tmp/backup.$drbd_dev");
    &runcmd("ssh $migrate_from -C rm /tmp/backup.p_drbd-$vm");
    &runcmd("rm /tmp/backup.$drbd_dev");
    &runcmd("rm /tmp/backup.p_drbd-$vm");
    sleep 1;
    # restore the cluster bit and reactivate
    &runcmd("vgchange -c y drbd_" . $vm);
    sleep 1;
    &runcmd("vgchange -ay drbd_" . $vm);
    sleep 1;
    &runcmd("crm resource manage p_drbd-" . $vm);
    &runcmd("crm resource manage ms_drbd-" . $vm);
    &runcmd("crm resource manage clone_lvm-" . $vm);
    sleep 1;
    &runcmd("crm_resource -r ms_drbd-$vm -C");
    sleep 1;
    &runcmd("crm_resource -r clone_lvm-$vm -C");
    sleep 3;

    my $prim_check = join("",&runcmd("drbdadm role $vm"));
    print "DRBD resource: $prim_check\n";

    ## if this was migrated, move it back
    if ($migration) {
        if ($prim_check =~ /primary\/primary/i) {
            ## migrate back
            my $local_test = join("",&runcmd("virsh list"));
            if ($local_test =~ /$vm.*running/i) {
                print "$vm running locally - migrating to $migrate_from from $migrate_to\n";
                my $pvd = &GetPVD($vm);
                &runcmd("crm resource migrate $pvd $migrate_from");
                sleep 1;
                my $remote_test = join("",&runcmd("ssh $migrate_to -C virsh list | grep -i $vm"));
                my $status = 'unknown';
                while($local_test =~ /(.*$vm.*running)/i) {
                    if ($local_test =~ /(.*$vm.*)/i) { $status = $1; }
                    print "   $migrate_to:\t" . $status . "\n";
                    print "(r)$migrate_from:\t$remote_test\n";
                    sleep 5;
                    $local_test  = join("",&runcmd("virsh list",1));
                    $remote_test = join("",&runcmd("ssh $migrate_from -C virsh list | grep -i $vm",1));
                }
                print "Migration is Done!\n";
                print "(r)$migrate_from:\t$local_test\n";
            }
        }
    }
    ## done

    # And remove the lock file, unless the --keep-lock flag is present
    unlock_vm() unless ($opts{keeplock});
}

sub usage{
    print "usage:\n$0 --action=[dump|cleanup|chunkmount|unlock] --vm=vm1[,vm2,vm3] [--debug] [--exclude=hda,hdb] [--compress] ".
        "[--state] [--no-snapshot] [--snapsize=<size>] [--backupdir=/path/to/dir] [--connect=<URI>] ".
        "[--keep-lock] [--bs=<block size>]\n" .
        "\n\n" .
        "\t--action: What action the script will run. Valid actions are\n\n" .
        "\t\t- dump: Run the dump routine (dump disk image to temp dir, pausing the VM if needed). It's the default action\n" .
        "\t\t- unlock: just remove the lock file, but don't cleanup the backup dir\n\n" .
        "\t--vm=name: The VM you want to work on (as known by libvirt). You can backup several VMs in one shot " .
        "if you separate them with comma, or with multiple --vm argument. You have to use the name of the domain, ".
        "ID and UUID are not supported at the moment\n\n" .
        "\n\nOther options:\n\n" .
        "\t--snapsize=<snapsize>: The amount of space to use for snapshots. Use the same format as -L option of lvcreate. " .
        "eg: --snapsize=15G. Default is 5G\n\n" .
        "\t--compress[=[gzip|bzip2|pbzip2|lzop|xz|lzip|plzip]]: On the fly compress the disks images during the dump. If you " .
        "don't specify a compression algo, gzip will be used.\n\n" .
        "\t--backupdir=/path/to/backup: Use an alternate backup dir. The directory must exist and be writable. " .
        "The default is /var/lib/libvirt/backup\n\n" .
        "\t--keep-lock: Leave the lock file present. This prevents another " .
        "dump from running while a third party backup software (BackupPC for example) saves the dumped files.\n\n";
}

# Dump the DRBD resource config
sub save_drbd_res{
    my $res = shift;
    print "\nSaving DRBD resource for $vm to $backupdir/$vm.res\n" if ($opts{debug});
    open(XML, ">$backupdir/$vm" . ".res") || die $!;
    print XML $res;
    close XML;
}

# Create an LVM snapshot
# Pass the original logical volume and the suffix
# to be added to the snapshot name as arguments
sub create_snapshot{
    my ($blk,$suffix) = @_;
    my $ret = 0;
    print "Running: $opts{lvcreate} -p r -s -n " . $blk . $suffix . " -L $opts{snapsize} $blk > /dev/null 2>&1\n" if $opts{debug};
    if ( system("$opts{lvcreate} -s -n " . $blk . $suffix . " -L $opts{snapsize} $blk > /dev/null 2>&1") == 0 ) {
        $ret = 1;
        open SNAPLIST, ">>$backupdir.meta/snapshots" or die "Error, couldn't open snapshot list file\n";
        print SNAPLIST $blk.$suffix ."\n";
        close SNAPLIST;
    }
    return $ret;
}

# Remove an LVM snapshot
sub destroy_snapshot{
    my $ret = 0;
    my ($snap) = @_;
    print `lvs drbd_$vm`;
    print "Removing snapshot $snap\n" if $opts{debug};
    if (system ("$opts{lvremove} -f $snap > /dev/null 2>&1") == 0 ){
        $ret = 1;
    }
    return $ret;
}

# Lock a VM backup dir
# Just creates an empty lock file
sub lock_vm{
    print "Locking $vm\n" if $opts{debug};
    open ( LOCK, ">$backupdir.meta/$vm.lock" ) || die $!;
    print LOCK "";
    close LOCK;
}

# Unlock the VM backup dir
# Just removes the lock file
sub unlock_vm{
    print "Removing lock file for $vm\n\n" if $opts{debug};
    unlink <$backupdir.meta/$vm.lock>;
}

sub runcmd {
    my $cmd   = shift;
    my $quiet = shift;
    my $ignore;
    ## ignore exit code 1 with greps -- not found is OK..
    if ($cmd =~ /grep/) { $ignore = 1; }
    if (!$quiet) { print "exec: $cmd ... "; }
    my @output = `$cmd`;
    if ($?) {
        my $e = sprintf("%d", $? >> 8);
        if ($ignore && $ignore == $e) {
            print "grep - ignore exit code $e\n";
        } else {
            printf "\n******** command $cmd exited with value %d\n", $? >> 8;
            print @output;
            exit $? >> 8;
        }
    }
    if (!$quiet) { print "success\n"; }
    return @output;
}

## get primitive VirtualDomain resource name for a VM
sub GetPVD {
    my $vm = shift;
    my $out = join("",&runcmd("crm resource show | grep $vm | grep VirtualDomain"));
    if ($out =~ /([\d\w\-\_]+)/) {
        return $1;
    } else {
        print "Could not locate primitive VirtualDomain for $vm\n";
    }
}

# Dump the domain description as XML
sub save_xml{
    print "\nSaving XML description for $vm to $backupdir/$vm.xml\n" if ($opts{debug});
    open(XML, ">$backupdir/$vm" . ".xml") || die $!;
    print XML $dom->get_xml_description();
    close XML;
}
</source>
Cman/Pacemaker Notes
Firewall
- Allow UDP 5405 for the corosync message layer.
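For example, with iptables on CentOS 6 (corosync uses UDP 5404-5405 by default; the subnet below is this cluster's, adjust to yours):
<source lang="bash">
iptables -I INPUT -p udp -s 10.69.1.0/24 --dport 5404:5405 -j ACCEPT
service iptables save
</source>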
cluster.conf
- Just a basic cluster.conf that works:
<pre>
<cluster name="ipa" config_version="31">
  <cman two_node="1" expected_votes="1" cluster_id="1208">
    <multicast addr="239.192.2.232"/>
  </cman>
  <clusternodes>
    <clusternode name="blindpig" nodeid="1">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="blindpig"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="bigeye" nodeid="3">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="bigeye"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="pcmk" agent="fence_pcmk"/>
  </fencedevices>
  <fence_daemon clean_start="1" post_fail_delay="10" post_join_delay="30"/>
  <logging to_syslog="yes" syslog_facility="local6" debug="off"/>
</cluster>
</pre>
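cman ships a schema validator; worth running after every config_version bump (assuming the CentOS 6 cluster tooling is installed):
<source lang="bash">
ccs_config_validate
</source>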
Monitoring
- crm_mon -Qrf1
- check_crm
<source lang="perl">
#!/usr/bin/perl
#
# check_crm_v0_5
#
# Copyright © 2011 Philip Garner, Sysnix Consultants Limited
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.
#
# Authors: Phil Garner - phil@sysnix.com & Peter Mottram - peter@sysnix.com
#
# Acknowledgements: Vadym Chepkov, Sönke Martens
#
# v0.1 09/01/2011
# v0.2 11/01/2011
# v0.3 22/08/2011 - bug fix and changes suggested by Vadym Chepkov
# v0.4 23/08/2011 - update for spelling and anchor regex capture (Vadym Chepkov)
# v0.5 29/09/2011 - Add standby warn/crit suggested by Sönke Martens & removal
#                   of 'our' to 'my' to completely avoid problems with ePN
#
# NOTES: Requires Perl 5.8 or higher & the Perl Module Nagios::Plugin
#        Nagios user will need sudo access - suggest adding line below to
#        sudoers
#          nagios ALL=(ALL) NOPASSWD: /usr/sbin/crm_mon -1 -r -f
#
#        In sudoers if requiretty is on (off state is default)
#        you will also need to add the line below
#          Defaults:nagios !requiretty
#

use warnings;
use strict;
use Nagios::Plugin;

# Lines below may need changing if crm_mon or sudo installed in a
# different location.
my $sudo    = '/usr/bin/sudo';
my $crm_mon = '/usr/sbin/crm_mon';

my $np = Nagios::Plugin->new(
    shortname => 'check_crm',
    version   => '0.5',
    usage     => "Usage: %s <ARGS>\n\t\t--help for help\n",
);

$np->add_arg(
    spec => 'warning|w',
    help => 'If failed Nodes, stopped Resources detected or Standby Nodes sends Warning instead of Critical (default) as long as there are no other errors and there is Quorum',
    required => 0,
);

$np->add_arg(
    spec => 'standbyignore|s',
    help => 'Ignore any node(s) in standby, by default sends Critical',
    required => 0,
);

$np->getopts;

my @standby;

# Check for -w option set warn if this is case instead of crit
my $warn_or_crit = 'CRITICAL';
$warn_or_crit = 'WARNING' if $np->opts->warning;

my $fh;
open( $fh, "$sudo $crm_mon -1 -r -f|" )
  or $np->nagios_exit( CRITICAL, "Running sudo has failed" );

foreach my $line (<$fh>) {
    if ( $line =~ m/Connection to cluster failed\:(.*)/i ) {
        # Check Cluster connected
        $np->nagios_exit( CRITICAL, "Connection to cluster FAILED: $1" );
    }
    elsif ( $line =~ m/Current DC:/ ) {
        # Check for Quorum
        if ( $line =~ m/partition with quorum$/ ) {
            # Assume cluster is OK - we only add warn/crit after here
            $np->add_message( OK, "Cluster OK" );
        }
        else {
            $np->add_message( CRITICAL, "No Quorum" );
        }
    }
    elsif ( $line =~ m/^offline:\s*\[\s*(\S.*?)\s*\]/i ) {
        next if $line =~ /\/dev\/block\//i;
        # Count offline nodes
        my @offline = split( /\s+/, $1 );
        my $numoffline = scalar @offline;
        $np->add_message( $warn_or_crit, ": $numoffline Nodes Offline" );
    }
    elsif ( $line =~ m/^node\s+(\S.*):\s*standby/i ) {
        # Check for standby nodes (suggested by Sönke Martens)
        # See later in code for message created from this
        push @standby, $1;
    }
    elsif ( $line =~ m/\s*([\w-]+)\s+\(\S+\)\:\s+Stopped/ ) {
        #next if $line =~ /hopvpn/i;
        # Check Resources Stopped
        $np->add_message( $warn_or_crit, ": $1 Stopped" );
    }
    elsif ( $line =~ m/\s*stopped\:\s*\[(.*)\]/i ) {
        next if $line =~ /openvz/i;
        # Check Master/Slave stopped
        $np->add_message( $warn_or_crit, ": $1 Stopped" );
    }
    elsif ( $line =~ m/^Failed actions\:/ ) {
        # Check Failed Actions
        ### rob fix this
        next;
        $np->add_message( CRITICAL, ": FAILED actions detected or not cleaned up" );
    }
    elsif ( $line =~ m/\s*(\S+?)\s+ \(.*\)\:\s+\w+\s+\w+\s+\(unmanaged\)\s+FAILED/ ) {
        # Check Unmanaged
        $np->add_message( CRITICAL, ": $1 unmanaged FAILED" );
    }
    elsif ( $line =~ m/\s*(\S+?)\s+ \(.*\)\:\s+not installed/i ) {
        # Check for errors
        $np->add_message( CRITICAL, ": $1 not installed" );
    }
    elsif ( $line =~ m/\s*(\S+?):.*(fail-count=\d+)/i ) {
        my $one = $1;
        my $two = $2;
        # skip resources flagged by the backup scripts (/tmp/backup.* files)
        if ( -f "/tmp/backup.$one" ) { last; }
        $np->add_message( WARNING, ": $one failure detected, $two" );
    }
}

# If found any Nodes in standby & no -s option used send warn/crit
if ( scalar @standby > 0 && !$np->opts->standbyignore ) {
    $np->add_message( $warn_or_crit,
        ": " . join( ', ', @standby ) . " in Standby" );
}

close($fh) or $np->nagios_exit( CRITICAL, "Running crm_mon FAILED" );

$np->nagios_exit( $np->check_messages() );
</source>
Troubleshooting Tips
ocf:heartbeat:LVM patch
https://github.com/ljunkie/resource-agents/commit/4ed858bf4184c1b186cdae6efa428814ce4a02f0
- When deactivating multiple clustered VGs (maybe non-clustered too) at once, sometimes vgchange -a ln fails.
- Instead, we should keep retrying until Pacemaker calls it quits (logic borrowed from linbit's drbd RA).
failure
- vgchange -a ln fails, and the stop operation completes with a failure:
<source lang=text>
lrmd:  notice: operation_finished:  p_lvm-vhosts_stop_0:845 [ 2013/04/17_23:48:03 INFO: Deactivating volume group drbd_vhosts ]
lrmd:  notice: operation_finished:  p_lvm-vhosts_stop_0:845 [ 2013/04/17_23:48:03 ERROR: Can't deactivate volume group "drbd_vhosts" with 1 open logical volume(s) ]
crmd:  notice: process_lrm_event:   LRM operation p_lvm-vhosts_stop_0 (call=518, rc=1, cib-update=113, confirmed=true) unknown error
</source>
success: with patch
- With the patch, it retries and succeeds:
<source lang=text>
lrmd:  notice: operation_finished:  p_lvm-vhosts_stop_0:24355 [ 2013/04/18_11:05:31 INFO: Deactivating volume group drbd_vhosts ]
lrmd:  notice: operation_finished:  p_lvm-vhosts_stop_0:24355 [ 2013/04/18_11:05:31 ERROR: Can't deactivate volume group "drbd_vhosts" with 1 open logical volume(s) ]
lrmd:  notice: operation_finished:  p_lvm-vhosts_stop_0:24355 [ 2013/04/18_11:05:31 WARNING: drbd_vhosts still Active, Deactivating volume group drbd_vhosts. ]
lrmd:  notice: operation_finished:  p_lvm-vhosts_stop_0:24355 [ 2013/04/18_11:05:31 INFO: Deactivating volume group drbd_vhosts ]
lrmd:  notice: operation_finished:  p_lvm-vhosts_stop_0:24355 [ 2013/04/18_11:05:31 INFO: 0 logical volume(s) in volume group "drbd_vhosts" now active ]
crmd:  notice: process_lrm_event:   LRM operation p_lvm-vhosts_stop_0 (call=963, rc=0, cib-update=200, confirmed=true) ok
</source>
patch
<source>
--- /usr/lib/ocf/resource.d/heartbeat/LVM.orig  2013-04-18 10:08:57.333596804 -0700
+++ /usr/lib/ocf/resource.d/heartbeat/LVM       2013-04-18 10:36:39.741388039 -0700
@@ -229,24 +229,34 @@
 #
 #      Disable the LVM volume
 #
 LVM_stop() {
+       local first_try=true
+       rc=$OCF_ERR_GENERIC
 
        vgdisplay "$1" 2>&1 | grep 'Volume group .* not found' >/dev/null && {
                ocf_log info "Volume group $1 not found"
                return 0
        }
 
-       ocf_log info "Deactivating volume group $1"
-       ocf_run vgchange -a ln $1 || return 1
 
-       if
-       LVM_status $1
-       then
-               ocf_log err "LVM: $1 did not stop correctly"
-               return $OCF_ERR_GENERIC
-       fi
+       # try to deactivate first time
+       ocf_log info "Deactivating volume group $1"
+       ocf_run vgchange -a ln $1
 
-       # TODO: This MUST run vgexport as well
+       # Keep trying to bring down the resource;
+       # wait for the CRM to time us out if this fails
+       while :; do
+               if LVM_status $1; then
+                       ocf_log warn "$1 still Active, Deactivating volume group $1."
+                       ocf_log info "Deactivating volume group $1"
+                       ocf_run vgchange -a ln $1
+               else
+                       rc=$OCF_SUCCESS
+                       break;
+               fi
+               $first_try || sleep 1
+               first_try=false
+       done
 
-       return $OCF_SUCCESS
+       return $rc
 }
</source>
CMAN+CLVMD+Pacemaker 1.1.8+
- Pacemaker now starts and stops CMAN; the issue is that it doesn't account for CLVMD.
- Fix the init script to also start/stop CLVMD.
- /etc/init.d/pacemaker
<source>
--- pacemaker.orig      2013-04-15 12:40:53.085307309 -0700
+++ pacemaker   2013-04-16 10:17:10.359833467 -0700
@@ -119,6 +119,9 @@
         success
         echo
 
+        ## stop clvmd before leaving fence domain
+        [ -f /etc/rc.d/init.d/clvmd ] && service clvmd stop
+
         echo -n "Leaving fence domain"
         fence_tool leave -w 10
         checkrc
@@ -163,6 +166,7 @@
     start)
         # For consistency with stop
         [ -f /etc/rc.d/init.d/cman ] && service cman start
+        [ -f /etc/rc.d/init.d/clvmd ] && service clvmd start
         start
         ;;
     restart|reload|force-reload)
</source>
Live Migrations Fail
<source lang=text>
error: Unsafe migration: Migration may lead to data corruption if disks use cache != none
</source>
- Make sure you set your KVM disk cache to none.
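The setting lives on each disk's <driver> element in the domain XML (virsh edit spacewalk-ha); it should end up looking like this (device paths from this page's example):
<source lang=text>
<disk type='block' device='disk'>
  <driver name='qemu' type='raw' cache='none'/>
  <source dev='/dev/drbd_spacewalk/spacewalk'/>
  <target dev='vda' bus='virtio'/>
</disk>
</source>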
Verify you can with virsh first
<source>
virsh migrate --live <hostname> qemu+ssh://<other_server_name>/system
</source>
<source>
virsh migrate --live vhosts-ha qemu+ssh://blindpig/system
</source>
Pacemaker 1.1.8 / libvirt-0.10.2-18
- Live migration fails when going standby / online.
- It seems that with Pacemaker 1.1.8 all live migrations run concurrently (update: use migration-limit) http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Pacemaker_Explained/_available_cluster_options.html
- libvirt used to cap live migrations at ~30MB/s, but now it's effectively infinite (8796093022207 MB/s) http://www.mail-archive.com/libvir-list@redhat.com/msg60259.html
Libvirt Fix (migrate-setspeed)
- libvirt: migrate-setspeed (this is not persistent across service restarts, though)
- Persistent fix: edit /etc/init.d/libvirtd (note the diff below compares patched against original, so the added bandwidth hook shows up as removed lines)
<source lang="diff">
--- libvirtd.orig	2013-04-16 10:25:31.358915941 -0700
+++ libvirtd	2013-04-16 09:28:53.257824206 -0700
@@ -85,6 +85,22 @@
     RETVAL=$?
     echo
     [ $RETVAL -eq 0 ] && touch /var/lock/subsys/$SERVICE
+
+    ## hook to set bandwidth for live migration
+    BW=50
+    VIRSH=`which virsh`
+    LIST_VM=`virsh list --all | grep -v Name | awk '{print $2}' | egrep "\w"`
+    DATE=`date -R`
+    LOGFILE="/var/log/kvm_setspeed.log"
+    for vm in $LIST_VM
+    do
+        BWprev=`/usr/bin/virsh migrate-getspeed $vm`
+        /usr/bin/virsh migrate-setspeed --bandwidth $BW $vm > /dev/null
+        BWcur=`/usr/bin/virsh migrate-getspeed $vm`
+        echo "$DATE : $VIRSH migrate-setspeed --bandwidth $BW $vm [cur: $BWcur -- prev: $BWprev]" >> $LOGFILE
+
+    done
+    # end BW hook
 }
 stop() {
</source>
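- With the hook in place, each libvirtd start appends one line per virt to /var/log/kvm_setspeed.log; a line should look roughly like this (values illustrative):
<source lang="bash">
tail -1 /var/log/kvm_setspeed.log
# Tue, 16 Apr 2013 10:25:31 -0700 : /usr/bin/virsh migrate-setspeed --bandwidth 50 vhosts-ha [cur: 50 -- prev: 8796093022207]
</source>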
Pacemaker Fix (migration-limit)
- Pacemaker: set migration-limit (default -1, unlimited)
<source>
crm_attribute --attr-name migration-limit --attr-value 2
crm_attribute --attr-name migration-limit --get-value
scope=crm_config  name=migration-limit value=2
</source>
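- The same cluster option can be set from the crm shell instead (equivalent, assuming crmsh is available):
<source lang="bash">
crm configure property migration-limit=2
</source>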
virsh commands used
- setting this to 30MB/s per virt
<source>
virsh migrate-setspeed --bandwidth 30 <VIRTNAME>
virsh migrate-getspeed <VIRTNAME>
</source>
<source>
# Default is infinite
virsh migrate-getspeed vhosts-ha
8796093022207

# set the speed to 30MB/s
virsh migrate-setspeed --bandwidth 30 vhosts-ha

# now it's limited
virsh migrate-getspeed vhosts-ha
30
</source>
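- To cap every defined virt in one go, the same list trick as the init-script hook can be reused (a sketch; like migrate-setspeed itself, it does not persist across libvirtd restarts):
<source lang="bash">
for vm in `virsh list --all | grep -v Name | awk '{print $2}' | egrep "\w"`; do
    virsh migrate-setspeed --bandwidth 30 $vm
done
</source>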
pacemaker config - dump
- This dump includes other examples, not just DRBD/KVM
<source>
node bigeye \
    attributes standby="off"
node blindpig \
    attributes standby="off"
primitive p_cluster_mon ocf:pacemaker:ClusterMon \
    params pidfile="/var/run/crm_mon.pid" htmlfile="/var/www/html/index.html" \
    op start interval="0" timeout="20s" \
    op stop interval="0" timeout="20s" \
    op monitor interval="10s" timeout="20s"
primitive p_drbd-backuppc ocf:linbit:drbd \
    params drbd_resource="backuppc" \
    operations $id="p_drbd_backuppc-operations" \
    op monitor interval="20" role="Slave" timeout="20" \
    op monitor interval="10" role="Master" timeout="20" \
    op start interval="0" timeout="240" \
    op stop interval="0" timeout="100" start-delay="0"
primitive p_drbd-dogfish-ha ocf:linbit:drbd \
    params drbd_resource="dogfish-ha" \
    operations $id="p_drbd_dogfish-ha-operations" \
    op monitor interval="20" role="Slave" timeout="20" \
    op monitor interval="10" role="Master" timeout="20" \
    op start interval="0" timeout="240" \
    op stop interval="0" timeout="100" start-delay="0"
primitive p_drbd-hopmon ocf:linbit:drbd \
    params drbd_resource="hopmon" \
    operations $id="p_drbd_hopmon-operations" \
    op monitor interval="20" role="Slave" timeout="20" \
    op monitor interval="10" role="Master" timeout="20" \
    op start interval="0" timeout="240" \
    op stop interval="0" timeout="100" start-delay="0"
primitive p_drbd-hoptical ocf:linbit:drbd \
    params drbd_resource="hoptical" \
    operations $id="p_drbd_hoptical-operations" \
    op monitor interval="20" role="Slave" timeout="20" \
    op monitor interval="10" role="Master" timeout="20" \
    op start interval="0" timeout="240" \
    op stop interval="0" timeout="100" start-delay="0"
primitive p_drbd-hopvpn ocf:linbit:drbd \
    params drbd_resource="hopvpn" \
    operations $id="p_drbd_hopvpn-operations" \
    op monitor interval="20" role="Slave" timeout="30" \
    op monitor interval="10" role="Master" timeout="30" \
    op start interval="0" timeout="240" \
    op stop interval="0" timeout="100" start-delay="0"
primitive p_drbd-musicbrainz ocf:linbit:drbd \
    params drbd_resource="musicbrainz" \
    operations $id="p_drbd_musicbrainz-operations" \
    op monitor interval="20" role="Slave" timeout="20" \
    op monitor interval="10" role="Master" timeout="20" \
    op start interval="0" timeout="240" \
    op stop interval="0" timeout="100" start-delay="0"
primitive p_drbd-spacewalk ocf:linbit:drbd \
    params drbd_resource="spacewalk" \
    operations $id="p_drbd_spacewalk-operations" \
    op monitor interval="20" role="Slave" timeout="20" \
    op monitor interval="10" role="Master" timeout="20" \
    op start interval="0" timeout="240" \
    op stop interval="0" timeout="240" start-delay="0"
primitive p_drbd-vhosts ocf:linbit:drbd \
    params drbd_resource="vhosts" \
    operations $id="p_drbd_vhosts-operations" \
    op monitor interval="20" role="Slave" timeout="20" \
    op monitor interval="10" role="Master" timeout="20" \
    op start interval="0" timeout="240" \
    op stop interval="0" timeout="100" start-delay="0"
primitive p_drbd-vz ocf:linbit:drbd \
    params drbd_resource="vz" \
    operations $id="p_drbd_vz-operations" \
    op monitor interval="20" role="Slave" timeout="20" \
    op monitor interval="10" role="Master" timeout="20" \
    op start interval="0" timeout="240" \
    op stop interval="0" timeout="100" start-delay="0"
primitive p_drbd-win7 ocf:linbit:drbd \
    params drbd_resource="win7" \
    operations $id="p_drbd_win7-operations" \
    op monitor interval="20" role="Slave" timeout="20" \
    op monitor interval="10" role="Master" timeout="20" \
    op start interval="0" timeout="240" \
    op stop interval="0" timeout="100" start-delay="0"
primitive p_gfs2-vz-config ocf:heartbeat:Filesystem \
    params device="/dev/mapper/vg_drbd_vz-gfs_vz_config" directory="/etc/vz" fstype="gfs2" \
    op start interval="0" timeout="120" \
    op stop interval="0" timeout="120" \
    op monitor interval="120s"
primitive p_gfs2-vz-storage ocf:heartbeat:Filesystem \
    params device="/dev/mapper/vg_drbd_vz-gfs_vz_storage" directory="/vz" fstype="gfs2" \
    op start interval="0" timeout="120" \
    op stop interval="0" timeout="120" \
    op monitor interval="120s"
primitive p_ip-10.69.1.1 ocf:heartbeat:IPaddr2 \
    params ip="10.69.1.1" cidr_netmask="32" nic="lo" \
    meta target-role="Started"
primitive p_ip-10.69.1.2 ocf:heartbeat:IPaddr2 \
    params ip="10.69.1.2" cidr_netmask="32" nic="lo" \
    meta target-role="Started"
primitive p_lvm-backuppc ocf:heartbeat:LVM \
    operations $id="backuppc-LVM-operations" \
    op start interval="0" timeout="120" \
    op stop interval="0" timeout="120" \
    params volgrpname="drbd_backuppc"
primitive p_lvm-dogfish-ha ocf:heartbeat:LVM \
    operations $id="dogfish-ha-LVM-operations" \
    op start interval="0" timeout="120" \
    op stop interval="0" timeout="120" \
    params volgrpname="drbd_dogfish-ha"
primitive p_lvm-hopmon ocf:heartbeat:LVM \
    operations $id="hopmon-LVM-operations" \
    op start interval="0" timeout="120" \
    op stop interval="0" timeout="120" \
    params volgrpname="drbd_hopmon"
primitive p_lvm-hoptical ocf:heartbeat:LVM \
    operations $id="hoptical-LVM-operations" \
    op start interval="0" timeout="120" \
    op stop interval="0" timeout="120" \
    params volgrpname="drbd_hoptical"
primitive p_lvm-hopvpn ocf:heartbeat:LVM \
    operations $id="hopvpn-LVM-operations" \
    op start interval="0" timeout="120" \
    op stop interval="0" timeout="120" \
    params volgrpname="drbd_hopvpn"
primitive p_lvm-musicbrainz ocf:heartbeat:LVM \
    operations $id="musicbrainz-LVM-operations" \
    op start interval="0" timeout="120" \
    op stop interval="0" timeout="120" \
    params volgrpname="drbd_musicbrainz"
primitive p_lvm-spacewalk ocf:heartbeat:LVM \
    operations $id="spacewalk-LVM-operations" \
    op start interval="0" timeout="120" \
    op stop interval="0" timeout="120" \
    params volgrpname="drbd_spacewalk"
primitive p_lvm-vhosts ocf:heartbeat:LVM \
    operations $id="vhosts-LVM-operations" \
    op start interval="0" timeout="120" \
    op stop interval="0" timeout="120" \
    params volgrpname="drbd_vhosts"
primitive p_lvm-vz ocf:heartbeat:LVM \
    operations $id="vz-LVM-operations" \
    op start interval="0" timeout="120" \
    op stop interval="0" timeout="120" \
    params volgrpname="vg_drbd_vz"
primitive p_lvm-win7 ocf:heartbeat:LVM \
    operations $id="win7-LVM-operations" \
    op start interval="0" timeout="120" \
    op stop interval="0" timeout="120" \
    params volgrpname="drbd_win7"
primitive p_vd-backuppc-ha ocf:heartbeat:VirtualDomain \
    params config="/etc/libvirt/qemu/backuppc-ha.xml" migration_transport="ssh" force_stop="0" hypervisor="qemu:///system" \
    operations $id="p_vd-backuppc-operations" \
    op start interval="0" timeout="90" \
    op stop interval="0" timeout="90" \
    op migrate_from interval="0" timeout="240" \
    op migrate_to interval="0" timeout="240" \
    op monitor interval="10" timeout="30" start-delay="0" \
    meta allow-migrate="true" failure-timeout="3min" target-role="Started" is-managed="true" resource-stickiness="100"
primitive p_vd-dogfish-ha ocf:heartbeat:VirtualDomain \
    params config="/etc/libvirt/qemu/dogfish-ha.xml" migration_transport="ssh" force_stop="0" hypervisor="qemu:///system" \
    operations $id="p_vd-dogfish-ha-operations" \
    op start interval="0" timeout="90" \
    op stop interval="0" timeout="90" \
    op migrate_from interval="0" timeout="600" \
    op migrate_to interval="0" timeout="600" \
    op monitor interval="10" timeout="30" start-delay="10" \
    meta allow-migrate="true" failure-timeout="3min" target-role="Started" resource-stickiness="100" is-managed="true"
primitive p_vd-hopmon-ha ocf:heartbeat:VirtualDomain \
    params config="/etc/libvirt/qemu/hopmon-ha.xml" migration_transport="ssh" force_stop="0" hypervisor="qemu:///system" \
    operations $id="p_vd-hopmon-operations" \
    op start interval="0" timeout="90" \
    op stop interval="0" timeout="90" \
    op migrate_from interval="0" timeout="240" \
    op migrate_to interval="0" timeout="240" \
    op monitor interval="10" timeout="30" start-delay="10" \
    meta allow-migrate="true" failure-timeout="3min" target-role="Started"
primitive p_vd-hoptical-ha ocf:heartbeat:VirtualDomain \
    params config="/etc/libvirt/qemu/hoptical-ha.xml" migration_transport="ssh" force_stop="0" hypervisor="qemu:///system" \
    operations $id="p_vd-hoptical-operations" \
    op start interval="0" timeout="90" \
    op stop interval="0" timeout="90" \
    op migrate_from interval="0" timeout="240" \
    op migrate_to interval="0" timeout="240" \
    op monitor interval="10" timeout="30" start-delay="10" \
    meta allow-migrate="true" failure-timeout="3min" target-role="Started" resource-stickiness="100"
primitive p_vd-hopvpn-ha ocf:heartbeat:VirtualDomain \
    params config="/etc/libvirt/qemu/hopvpn-ha.xml" migration_transport="ssh" force_stop="0" hypervisor="qemu:///system" \
    operations $id="p_vd-hopvpn-operations" \
    op start interval="0" timeout="90" \
    op stop interval="0" timeout="90" \
    op migrate_from interval="0" timeout="240" \
    op migrate_to interval="0" timeout="240" \
    op monitor interval="10" timeout="30" start-delay="10" \
    meta allow-migrate="true" failure-timeout="3min" target-role="Started" resource-stickiness="100" is-managed="true"
primitive p_vd-musicbrainz-ha ocf:heartbeat:VirtualDomain \
    params config="/etc/libvirt/qemu/musicbrainz-ha.xml" migration_transport="ssh" force_stop="0" hypervisor="qemu:///system" \
    operations $id="p_vd-musicbrainz-operations" \
    op start interval="0" timeout="90" \
    op stop interval="0" timeout="90" \
    op migrate_from interval="0" timeout="240" \
    op migrate_to interval="0" timeout="240" \
    op monitor interval="10" timeout="30" start-delay="10" \
    meta allow-migrate="true" failure-timeout="3min" target-role="Started" resource-stickiness="100"
primitive p_vd-spacewalk-ha ocf:heartbeat:VirtualDomain \
    params config="/etc/libvirt/qemu/spacewalk-ha.xml" migration_transport="ssh" force_stop="0" hypervisor="qemu:///system" \
    operations $id="p_vd-spacewalk-operations" \
    op start interval="0" timeout="90" \
    op stop interval="0" timeout="90" \
    op migrate_from interval="0" timeout="240" \
    op migrate_to interval="0" timeout="240" \
    op monitor interval="10" timeout="30" start-delay="0" \
    meta allow-migrate="true" failure-timeout="10min" target-role="Started" is-managed="true"
primitive p_vd-vhosts-ha ocf:heartbeat:VirtualDomain \
    params config="/etc/libvirt/qemu/vhosts-ha.xml" migration_transport="ssh" force_stop="0" hypervisor="qemu:///system" \
    operations $id="p_vd-vhosts-ha-operations" \
    op start interval="0" timeout="90" \
    op stop interval="0" timeout="90" \
    op migrate_from interval="0" timeout="240" \
    op migrate_to interval="0" timeout="240" \
    op monitor interval="10" timeout="30" start-delay="10" \
    meta allow-migrate="true" failure-timeout="3min" target-role="Started" resource-stickiness="100"
primitive p_vd-win7-ha ocf:heartbeat:VirtualDomain \
    params config="/etc/libvirt/qemu/win7-ha.xml" migration_transport="ssh" force_stop="0" hypervisor="qemu:///system" \
    operations $id="p_vd-win7-operations" \
    op start interval="0" timeout="90" \
    op stop interval="0" timeout="90" \
    op migrate_from interval="0" timeout="600" \
    op migrate_to interval="0" timeout="600" \
    op monitor interval="10" timeout="30" start-delay="0" \
    meta allow-migrate="true" failure-timeout="3min" target-role="Started"
primitive st_bigeye stonith:fence_drac5 \
    params ipaddr="<hidden>" login="cman" passwd="<hidden>" action="reboot" secure="true" pcmk_host_list="bigeye" pcmk_host_check="static-list"
primitive st_blindpig stonith:fence_apc_snmp \
    params inet4_only="1" community="<hidden>" port="blindpig" action="reboot" ipaddr="<hidden>" snmp_version="1" pcmk_host_check="static-list" pcmk_host_list="blindpig" pcmk_host_map="blindpig:6"
ms ms_drbd-backuppc p_drbd-backuppc \
    meta master-max="2" clone-max="2" notify="true" migration-threshold="1" allow-migrate="true" target-role="Started" interleave="true" is-managed="true"
ms ms_drbd-dogfish-ha p_drbd-dogfish-ha \
    meta master-max="2" clone-max="2" notify="true" migration-threshold="1" allow-migrate="true" target-role="Started" interleave="true" is-managed="true"
ms ms_drbd-hopmon p_drbd-hopmon \
    meta master-max="2" clone-max="2" notify="true" migration-threshold="1" allow-migrate="true" target-role="Started" interleave="true" is-managed="true"
ms ms_drbd-hoptical p_drbd-hoptical \
    meta master-max="2" clone-max="2" notify="true" migration-threshold="1" allow-migrate="true" target-role="Started" interleave="true" is-managed="true"
ms ms_drbd-hopvpn p_drbd-hopvpn \
    meta master-max="2" clone-max="2" notify="true" migration-threshold="1" allow-migrate="true" target-role="Started" interleave="true" is-managed="true"
ms ms_drbd-musicbrainz p_drbd-musicbrainz \
    meta master-max="2" clone-max="2" notify="true" migration-threshold="1" allow-migrate="true" target-role="Started" interleave="true" is-managed="true"
ms ms_drbd-spacewalk p_drbd-spacewalk \
    meta master-max="2" clone-max="2" notify="true" migration-threshold="1" allow-migrate="true" target-role="Started" interleave="true" is-managed="true"
ms ms_drbd-vhosts p_drbd-vhosts \
    meta master-max="2" clone-max="2" notify="true" migration-threshold="1" allow-migrate="true" target-role="Started" interleave="true" is-managed="true"
ms ms_drbd-vz p_drbd-vz \
    meta master-max="2" clone-max="2" notify="true" migration-threshold="1" allow-migrate="true" target-role="Started" interleave="true" is-managed="true"
ms ms_drbd-win7 p_drbd-win7 \
    meta master-max="2" clone-max="2" notify="true" migration-threshold="1" allow-migrate="true" target-role="Started" interleave="true" is-managed="true"
clone c_cluster_mon p_cluster_mon \
    meta clone-max="2" notify="true" target-role="Started" interleave="true" is-managed="true"
clone c_st_bigeye st_bigeye \
    meta clone-max="2" notify="true" target-role="Started" interleave="true" is-managed="true"
clone c_st_blindpig st_blindpig \
    meta clone-max="2" notify="true" target-role="Started" interleave="true" is-managed="true"
clone clone_gfs2-vz-config p_gfs2-vz-config \
    meta clone-max="2" notify="true" target-role="Started" interleave="true" is-managed="true"
clone clone_gfs2-vz-storage p_gfs2-vz-storage \
    meta clone-max="2" notify="true" target-role="Started" interleave="true" is-managed="true"
clone clone_lvm-backuppc p_lvm-backuppc \
    meta clone-max="2" notify="true" target-role="Started" interleave="true" is-managed="true"
clone clone_lvm-dogfish-ha p_lvm-dogfish-ha \
    meta clone-max="2" notify="true" target-role="Started" interleave="true" is-managed="true"
clone clone_lvm-hopmon p_lvm-hopmon \
    meta clone-max="2" notify="true" target-role="Started" interleave="true" is-managed="true"
clone clone_lvm-hoptical p_lvm-hoptical \
    meta clone-max="2" notify="true" target-role="Started" interleave="true" is-managed="true"
clone clone_lvm-hopvpn p_lvm-hopvpn \
    meta clone-max="2" notify="true" target-role="Started" interleave="true" is-managed="true"
clone clone_lvm-musicbrainz p_lvm-musicbrainz \
    meta clone-max="2" notify="true" target-role="Started" interleave="true" is-managed="true"
clone clone_lvm-spacewalk p_lvm-spacewalk \
    meta clone-max="2" notify="true" target-role="Started" interleave="true" is-managed="true"
clone clone_lvm-vhosts p_lvm-vhosts \
    meta clone-max="2" notify="true" target-role="Started" interleave="true" is-managed="true"
clone clone_lvm-vz p_lvm-vz \
    meta clone-max="2" notify="true" target-role="Started" interleave="true" is-managed="true"
clone clone_lvm-win7 p_lvm-win7 \
    meta clone-max="2" notify="true" target-role="Started" interleave="true" is-managed="true"
location cli-prefer-p_cluster_mon c_cluster_mon \
    rule $id="cli-prefer-rule-p_cluster_mon" inf: #uname eq bigeye
location cli-prefer-p_ip-10.69.1.1 p_ip-10.69.1.1 \
    rule $id="cli-prefer-rule-p_ip-10.69.1.1" inf: #uname eq blindpig
location cli-prefer-p_ip-10.69.1.2 p_ip-10.69.1.2 \
    rule $id="cli-prefer-rule-p_ip-10.69.1.2" inf: #uname eq bigeye
location cli-prefer-p_vd-backuppc-ha p_vd-backuppc-ha \
    rule $id="cli-prefer-rule-p_vd-backuppc-ha" inf: #uname eq blindpig
location cli-prefer-p_vd-dogfish-ha p_vd-dogfish-ha \
    rule $id="cli-prefer-rule-p_vd-dogfish-ha" inf: #uname eq blindpig
location cli-prefer-p_vd-hopmon-ha p_vd-hopmon-ha \
    rule $id="cli-prefer-rule-p_vd-hopmon-ha" inf: #uname eq bigeye
location cli-prefer-p_vd-hoptical-ha p_vd-hoptical-ha \
    rule $id="cli-prefer-rule-p_vd-hoptical-ha" inf: #uname eq bigeye
location cli-prefer-p_vd-hopvpn-ha p_vd-hopvpn-ha \
    rule $id="cli-prefer-rule-p_vd-hopvpn-ha" inf: #uname eq bigeye
location cli-prefer-p_vd-musicbrainz-ha p_vd-musicbrainz-ha \
    rule $id="cli-prefer-rule-p_vd-musicbrainz-ha" inf: #uname eq blindpig
location cli-prefer-p_vd-spacewalk-ha p_vd-spacewalk-ha \
    rule $id="cli-prefer-rule-p_vd-spacewalk-ha" inf: #uname eq blindpig
location cli-prefer-p_vd-vhosts-ha p_vd-vhosts-ha \
    rule $id="cli-prefer-rule-p_vd-vhosts-ha" inf: #uname eq bigeye
location cli-prefer-p_vd-win7-ha p_vd-win7-ha \
    rule $id="cli-prefer-rule-p_vd-win7-ha" inf: #uname eq blindpig
location drbd_backuppc_excl ms_drbd-backuppc \
    rule $id="drbd_backuppc_excl-rule" -inf: #uname eq blindpig2
location drbd_dogfish-ha_excl ms_drbd-dogfish-ha \
    rule $id="drbd_dogfish-ha_excl-rule" -inf: #uname eq blindpig2
location drbd_hopmon_excl ms_drbd-hopmon \
    rule $id="drbd_hopmon_excl-rule" -inf: #uname eq blindpig2
location drbd_hoptical_excl ms_drbd-hoptical \
    rule $id="drbd_hoptical_excl-rule" -inf: #uname eq blindpig2
location drbd_hopvpn_excl ms_drbd-hopvpn \
    rule $id="drbd_hopvpn_excl-rule" -inf: #uname eq blindpig2
location drbd_musicbrainz_excl ms_drbd-musicbrainz \
    rule $id="drbd_musicbrainz_excl-rule" -inf: #uname eq blindpig2
location drbd_vhost_excl ms_drbd-vhosts \
    rule $id="drbd_vhosts_excl-rule" -inf: #uname eq blindpig2
colocation c_gfs-vz-config_on_master inf: clone_gfs2-vz-config ms_drbd-vz:Master
colocation c_gfs-vz-storage_on_master inf: clone_gfs2-vz-storage ms_drbd-vz:Master
colocation c_lvm-backuppc_on_drbd-backuppc inf: clone_lvm-backuppc ms_drbd-backuppc:Master
colocation c_lvm-dogfish-ha_on_drbd-dogfish-ha inf: clone_lvm-dogfish-ha ms_drbd-dogfish-ha:Master
colocation c_lvm-hopmon_on_drbd-hopmon inf: clone_lvm-hopmon ms_drbd-hopmon:Master
colocation c_lvm-hoptical_on_drbd-hoptical inf: clone_lvm-hoptical ms_drbd-hoptical:Master
colocation c_lvm-hopvpn_on_drbd-hopvpn inf: clone_lvm-hopvpn ms_drbd-hopvpn:Master
colocation c_lvm-musicbrainz_on_drbd-musicbrainz inf: clone_lvm-musicbrainz ms_drbd-musicbrainz:Master
colocation c_lvm-spacewalk_on_drbd-spacewalk inf: clone_lvm-spacewalk ms_drbd-spacewalk:Master
colocation c_lvm-vhosts_on_drbd-vhosts inf: clone_lvm-vhosts ms_drbd-vhosts:Master
colocation c_lvm-vz_on_drbd-vz inf: clone_lvm-vz ms_drbd-vz:Master
colocation c_lvm-win7_on_drbd-win7 inf: clone_lvm-win7 ms_drbd-win7:Master
colocation c_vd-backuppc-on-master inf: p_vd-backuppc-ha ms_drbd-backuppc:Master
colocation c_vd-dogfish-ha-on-master inf: p_vd-dogfish-ha ms_drbd-dogfish-ha:Master
colocation c_vd-hopmon-on-master inf: p_vd-hopmon-ha ms_drbd-hopmon:Master
colocation c_vd-hoptical-on-master inf: p_vd-hoptical-ha ms_drbd-hoptical:Master
colocation c_vd-hopvpn-on-master inf: p_vd-hopvpn-ha ms_drbd-hopvpn:Master
colocation c_vd-musicbrainz-on-master inf: p_vd-musicbrainz-ha ms_drbd-musicbrainz:Master
colocation c_vd-spacewalk-on-master inf: p_vd-spacewalk-ha ms_drbd-spacewalk:Master
colocation c_vd-vhosts-on-master inf: p_vd-vhosts-ha ms_drbd-vhosts:Master
colocation c_vd-win7-on-master inf: p_vd-win7-ha ms_drbd-win7:Master
order o_drbm-lvm-gfs2-vz-config-storage inf: ms_drbd-vz:promote clone_lvm-vz:start clone_gfs2-vz-config:start clone_gfs2-vz-storage:start
order o_drbm-lvm-vd-start-backuppc inf: ms_drbd-backuppc:promote clone_lvm-backuppc:start p_vd-backuppc-ha:start
order o_drbm-lvm-vd-start-dogfish-ha inf: ms_drbd-dogfish-ha:promote clone_lvm-dogfish-ha:start p_vd-dogfish-ha:start
order o_drbm-lvm-vd-start-hopmon inf: ms_drbd-hopmon:promote clone_lvm-hopmon:start p_vd-hopmon-ha:start
order o_drbm-lvm-vd-start-hoptical inf: ms_drbd-hoptical:promote clone_lvm-hoptical:start p_vd-hoptical-ha:start
order o_drbm-lvm-vd-start-hopvpn inf: ms_drbd-hopvpn:promote clone_lvm-hopvpn:start p_vd-hopvpn-ha:start
order o_drbm-lvm-vd-start-musicbrainz inf: ms_drbd-musicbrainz:promote clone_lvm-musicbrainz:start p_vd-musicbrainz-ha:start
order o_drbm-lvm-vd-start-spacewalk inf: ms_drbd-spacewalk:promote clone_lvm-spacewalk:start p_vd-spacewalk-ha:start
order o_drbm-lvm-vd-start-vhosts inf: ms_drbd-vhosts:promote clone_lvm-vhosts:start p_vd-vhosts-ha:start
order o_drbm-lvm-vd-start-win7 inf: ms_drbd-win7:promote clone_lvm-win7:start p_vd-win7-ha:start
order o_gfs_before_openvz inf: _rsc_set_ clone_gfs2-vz-config clone_gfs2-vz-storage
property $id="cib-bootstrap-options" \
    dc-version="1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14" \
    cluster-infrastructure="cman" \
    expected-quorum-votes="2" \
    stonith-enabled="true" \
    no-quorum-policy="ignore" \
    default-resource-stickiness="1" \
    last-lrm-refresh="1362432862" \
    maintenance-mode="off"
rsc_defaults $id="rsc-options" \
    resource-stickiness="1" \
    failure-timeout="60s"
</source>