Cluster Cheat Sheet
Config files
- /etc/corosync/corosync.conf – config file for corosync cluster membership and quorum
- /var/lib/pacemaker/crm/cib.xml – config file for cluster nodes and resources
Log files
- /var/log/cluster/corosync.log
- /var/log/pacemaker.log
- /var/log/pcsd/pcsd.log
- /var/log/messages – used by some other cluster services, including crmd and pengine
Resources and Resource Groups
A cluster resource refers to any object or service which is managed by the Pacemaker cluster.
A number of different resource types are defined by Pacemaker (a short example sketch follows these lists):
- Primitive – the basic resource managed by the cluster
- Clone – a resource which can run on multiple nodes simultaneously
- Multi-state (Master/Slave) – a resource in which one instance serves as master and the others as slaves. A common example of this is DRBD
- Resource Group – a set of primitives or clones used to group resources together for easier administration
Resource Classes:
- OCF (Open Cluster Framework) – the most commonly used resource class for Pacemaker clusters
- Service – used for implementing systemd, upstart, and LSB services
- Systemd – used for systemd units
- Fencing – used for Stonith fencing resources
- Nagios – used for Nagios plugins
- LSB (Linux Standard Base) – used for the older Linux init scripts; now deprecated
Resource stickiness – the preference for keeping a resource on the node where it is currently running, even after a problem with its original node has been rectified. This is advised, since migrating resources between nodes should generally be avoided.
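As a minimal sketch of how these concepts map onto pcs (the names web-ip, web-fs, webgroup and ping-check are hypothetical examples, not taken from this sheet):
pcs resource create web-ip ocf:heartbeat:IPaddr2 ip=192.168.125.50   # a primitive resource
pcs resource clone ping-check   # run an existing primitive on every node as a clone
pcs resource group add webgroup web-ip web-fs   # group primitives together for easier admin
pcs resource defaults resource-stickiness=100   # prefer keeping resources on their current node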
Constraints
A set of rules that defines where and how resources or resource groups should be run and started.
Constraint Types (a short example sketch follows this list):
- Location – defines on which node a resource should run, or must not run if the score is set to -INFINITY
- Colocation – defines which resources should be started together, or kept apart in the case of -INFINITY
- Order – defines the order in which resources should be started, so that prerequisite services are started first
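A minimal sketch of each constraint type, assuming two hypothetical resources web-ip and web-server and the example node names used elsewhere in this sheet:
pcs constraint location web-server prefers node1.example.com   # location: prefer node1
pcs constraint colocation add web-ip with web-server INFINITY   # colocation: always keep these together
pcs constraint order web-ip then web-server   # order: start the IP before the web server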
Resource Order Priority Scores
These scores are used with the constraint types above.
The score can be set to any value between -INFINITY (-1,000,000 = the event will never happen) and INFINITY (1,000,000 = the event must happen).
A score of -INFINITY prevents the resource from running on a node; other negative scores only express a preference against it.
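For example (again with hypothetical resource and node names), a finite score only expresses a preference, while INFINITY on an avoids rule forbids the placement outright:
pcs constraint location web-server prefers node1.example.com=200   # preferred, but may still run elsewhere
pcs constraint location web-server avoids node3.example.com=INFINITY   # never run on node3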
Cluster Admin Commands
On Red Hat Pacemaker clusters, the pcs command ("Pacemaker Configuration System") is used to manage the cluster. Its main subcommands are listed below; the built-in help shown after the list covers the full syntax:
pcs status – View cluster status.
pcs config – View and manage cluster configuration.
pcs cluster – Configure cluster options and nodes.
pcs resource – Manage cluster resources.
pcs stonith – Manage fence devices.
pcs constraint – Manage resource constraints.
pcs property – Manage Pacemaker properties.
pcs node – Manage cluster nodes.
pcs quorum – Manage cluster quorum settings.
pcs alert – Manage Pacemaker alerts.
pcs pcsd – Manage the pcs daemon.
pcs acl – Manage Pacemaker access control lists.
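Each subcommand also has built-in help, which is the quickest syntax reference (assuming pcs is installed):
pcs --help   # top-level usage summary
pcs resource --help   # full syntax for resource management
pcs constraint --help   # full syntax for constraints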
Installation
To install packages:
yum install pcs -y
yum install fence-agents-all -y
echo CHANGE_ME | passwd --stdin hacluster
systemctl start pcsd
systemctl enable pcsd
Authenticate new nodes
pcs cluster auth \
node1.example.com node2.example.com node3.example.com
Username: hacluster
Password:
node1.example.com: Authorized
node2.example.com: Authorized
node3.example.com: Authorized
To create and start a new cluster:
pcs cluster setup <option> <member> …
e.g.
pcs cluster setup --start --enable --name mycluster \
node1.example.com node2.example.com node3.example.com
To enable cluster services to start on reboot:
pcs cluster enable --all
To enable the cluster service on specific node(s):
pcs cluster enable [--all] [node] […]
To disable the cluster service on specific node(s):
pcs cluster disable [--all] [node] […]
To display cluster status:
pcs status
pcs config
pcs cluster status
pcs quorum status
pcs resource show
crm_verify -L -V
crm_mon – the equivalent status monitor from the crmsh/crmd tooling for Pacemaker
To delete a cluster:
pcs cluster destroy [--all]
To start/stop a cluster:
pcs cluster start --all
pcs cluster stop --all
To start/stop a cluster node:
pcs cluster start <node>
pcs cluster stop <node>
To carry out maintenance on a specific node:
pcs cluster standby <node>
Then to restore the node to the cluster service:
pcs cluster unstandby <node>
To set a cluster property
pcs property set <property>=<value>
To disable stonith fencing: NOTE: you should usually not do this on a live production cluster!
pcs property set stonith-enabled=false
To re-enable stonith fencing:
pcs property set stonith-enabled=true
To configure firewalling for the cluster:
firewall-cmd --permanent --add-service=high-availability
firewall-cmd --reload
To add a node to the cluster:
On the new node, check that the hacluster user and password are configured and that pcsd is running:
systemctl status pcsd
Then on an active node:
pcs cluster auth node4.example.com
pcs cluster node add node4.example.com
Then, on the new node:
pcs cluster start
pcs cluster enable
To display the XML configuration (CIB):
pcs cluster cib
To display current cluster status:
pcs status
To manage cluster resources:
pcs resource <tab>
To relocate resources or resource groups:
pcs resource move <resource>
or alternatively with:
pcs resource relocate <resource>
To allow the resource to move back to its original node (clears the constraint created by move/relocate):
pcs resource clear <resource>
To manage constraints:
pcs constraint <type> <options>
To create a new resource:
pcs resource create <resource_name> <resource_type> <resource_options>
To create new resources, reference the appropriate resource agents or RAs.
To list OCF resource types (example below with ocf:heartbeat):
pcs resource list heartbeat
ocf:heartbeat:IPaddr2
ocf:heartbeat:LVM
ocf:heartbeat:Filesystem
ocf:heartbeat:oracle
ocf:heartbeat:apache
To display the option details of a resource type or agent:
pcs resource describe <resource_type>
pcs resource describe ocf:heartbeat:IPaddr2
To create resources using these agents (--group places the new resource in a group):
pcs resource create vip_cluster ocf:heartbeat:IPaddr2 ip=192.168.125.10 --group myservices
pcs resource create apache-ip ocf:heartbeat:IPaddr2 ip=192.168.125.20 cidr_netmask=24
To display a resource:
pcs resource show
Cluster Troubleshooting
Logging functions:
journalctl
tail -f /var/log/messages
tail -f /var/log/cluster/corosync.log
Debug information commands:
pcs resource debug-start <resource>
pcs resource debug-stop <resource>
pcs resource debug-monitor <resource>
pcs resource failcount show <resource>
To update a resource after modification:
pcs resource update <resource> <options>
To reset the failcount:
pcs resource cleanup <resource>
To move a resource off its current node (optionally onto a specified node):
pcs resource move <resource> [ <node> ]
To start a resource or a resource group:
pcs resource enable <resource>
To stop a resource or resource group:
pcs resource disable <resource>
To create a resource group and add a new resource:
pcs resource create <resource_name> <resource_type> <resource_options> --group <group>
To delete a resource:
pcs resource delete <resource>
To add a resource to a group:
pcs resource group add <group> <resource>
To list resource groups and resources:
pcs resource group list
pcs resource list
To add a constraint to a resource group:
pcs constraint colocation add apache-group with ftp-group -100000
pcs constraint order apache-group then ftp-group
To reset a constraint for a resource or a resource group:
pcs resource clear <resource>
To list resource agent (RA) classes:
pcs resource standards
To list available RAs:
pcs resource agents <standard>   # e.g. ocf, service, or stonith
To list specific resource agents of a specific RA provider:
pcs resource agents ocf:pacemaker
To list RA information:
pcs resource describe RA
pcs resource describe ocf:heartbeat:RA
To create a resource:
pcs resource create ClusterIP ocf:heartbeat:IPaddr2 ip=192.168.100.125 cidr_netmask=24 op monitor interval=60s
To delete a resource:
pcs resource delete resourceid
To display a resource (example with ClusterIP):
pcs resource show ClusterIP
To start a resource:
pcs resource enable ClusterIP
To stop a resource:
pcs resource disable ClusterIP
To remove a resource:
pcs resource delete ClusterIP
To modify a resource:
pcs resource update ClusterIP clusterip_hash=sourceip
To update a resource parameter (here changing the IP address of ClusterIP):
pcs resource update ClusterIP ip=192.168.100.25
To list the current resource defaults:
pcs resource defaults
To set resource defaults:
pcs resource defaults resource-stickiness=100
To list current operation defaults:
pcs resource op defaults
To set operation defaults:
pcs resource op defaults timeout=240s
To set colocation:
pcs constraint colocation add ClusterIP with WebSite INFINITY
To set colocation with roles:
pcs constraint colocation add Started AnotherIP with Master WebSite INFINITY
To set constraint ordering:
pcs constraint order ClusterIP then WebSite
To display constraint list:
pcs constraint list --full
To show a resource failure count:
pcs resource failcount show RA
To reset a resource failure count:
pcs resource failcount reset RA
To create a resource clone:
pcs resource clone ClusterIP globally-unique=true clone-max=2 clone-node-max=2
To manage a resource:
pcs resource manage RA
To unmanage a resource:
pcs resource unmanage RA
Fencing (Stonith)
To check power management manually with ipmitool and via the fence agent:
ipmitool -H rh7-node1-irmc -U admin -P password power on
fence_ipmilan --ip=rh7-node1-irmc.localdomain --username=admin --password=password --action=status
Status: ON
To display Stonith options and create an IPMI fencing resource:
pcs stonith
pcs stonith describe fence_ipmilan
pcs stonith create ipmi-fencing1 fence_ipmilan \
  pcmk_host_list="rh7-node1.localdomain" \
  ipaddr=192.168.100.125 \
  login=admin passwd=password \
  op monitor interval=60s
pcs property set stonith-enabled=true
To fence a node manually:
pcs stonith fence pcmk-2
stonith_admin --reboot pcmk-2
To display fencing resources:
pcs stonith show
To display Stonith RA information:
pcs stonith describe fence_ipmilan
To list available fencing agents:
pcs stonith list
To add a filter to list available resource agents for Stonith:
pcs stonith list <string>
To setup properties for Stonith:
pcs property set no-quorum-policy=ignore
pcs property set stonith-action=poweroff # default is reboot
To create a fencing device:
pcs stonith create stonith-rsa-node1 fence_rsa action=off ipaddr="node1_rsa" login=<user> passwd=<pass> pcmk_host_list=node1 secure=true
To display fencing devices:
pcs stonith show
To fence a node off from the rest of the cluster:
pcs stonith fence <node>
To modify a fencing device:
pcs stonith update stonithid [options]
To display fencing device options:
pcs stonith describe <stonith_ra>
To delete a fencing device:
pcs stonith delete stonithid