:compat-mode: legacy

= Configure Fencing =

== What is Fencing? ==

Fencing protects your data from being corrupted, and your application from
becoming unavailable, due to unintended concurrent access by rogue nodes.

Just because a node is unresponsive doesn't mean it has stopped
accessing your data. The only way to be 100% sure that your data is
safe is to use fencing to ensure that the node is truly
offline before allowing the data to be accessed from another node.

Fencing also has a role to play in the event that a clustered service
cannot be stopped. In this case, the cluster uses fencing to force the
whole node offline, thereby making it safe to start the service
elsewhere.

Fencing is also known as STONITH, an acronym for "Shoot The Other Node In The
Head", since the most popular form of fencing is cutting a host's power.

In order to guarantee the safety of your data,
footnote:[If the data is corrupt, there is little point in continuing to make it available]
fencing is enabled by default.

[NOTE]
====
It is possible to tell the cluster not to use fencing, by setting the
*stonith-enabled* cluster option to false:
----
[root@pcmk-1 ~]# pcs property set stonith-enabled=false
[root@pcmk-1 ~]# crm_verify -L
----

However, this is completely inappropriate for a production cluster. It tells
the cluster to simply pretend that failed nodes are safely powered off. Some
vendors will refuse to support clusters that have fencing disabled. Even
disabling it for a test cluster means you won't be able to test real failure
scenarios.
====

== Choose a Fence Device ==

The two broad categories of fence device are power fencing, which cuts off
power to the target, and fabric fencing, which cuts off the target's access to
some critical resource, such as a shared disk or the local network.

Power fencing devices include:

* Intelligent power switches
* IPMI
* Hardware watchdog device (alone, or in combination with shared storage used
as a "poison pill" mechanism)

Fabric fencing devices include:

* Shared storage that can be cut off for a target host by another host (for
example, an external storage device that supports SCSI-3 persistent
reservations)
* Intelligent network switches

Using IPMI as a power fencing device may seem like a good choice. However,
if the IPMI shares power and/or network access with the host (as most
onboard IPMI controllers do), a power or network failure will cause both the
host and its fencing device to fail. The cluster will be unable to recover,
and must stop all resources to avoid a possible split-brain situation.

Likewise, any device that relies on the machine being active (such as
SSH-based "devices" sometimes used during testing) is inappropriate,
because fencing will be required when the node is completely unresponsive.

== Configure the Cluster for Fencing ==

. Install the fence agent(s). To see what packages are available, run `yum
search fence-`. Be sure to install the package(s) on all cluster nodes.

. Configure the fence device itself to be able to fence your nodes and accept
fencing requests. This includes any necessary configuration on the device and
on the nodes, and any firewall or SELinux changes needed. Test the
communication between the device and your nodes.

. Find the name of the correct fence agent: `pcs stonith list`

. Find the parameters associated with the device:
+pcs stonith describe pass:[<replaceable>agent_name</replaceable>]+

. Create a local copy of the CIB: `pcs cluster cib stonith_cfg`

. Create the fencing resource: +pcs -f stonith_cfg stonith create pass:[<replaceable>stonith_id
stonith_device_type [stonith_device_options]</replaceable>]+
+
Any flags that do not take arguments, such as +--ssl+, should be passed as +ssl=1+.

. Enable fencing in the cluster: `pcs -f stonith_cfg property set stonith-enabled=true`

. If the device does not know how to fence nodes based on their cluster node
name, you may also need to set the special *pcmk_host_map* parameter. See
`man pacemaker-fenced` for details.

. If the device does not support the *list* command, you may also need
to set the special *pcmk_host_list* and/or *pcmk_host_check*
parameters. See `man pacemaker-fenced` for details.

. If the device does not expect the target to be specified with the
*port* parameter, you may also need to set the special
*pcmk_host_argument* parameter. See `man pacemaker-fenced` for details.

. Commit the new configuration: `pcs cluster cib-push stonith_cfg`

. Once the fence device resource is running, test it (you might want to stop
the cluster on that machine first):
+stonith_admin --reboot pass:[<replaceable>nodename</replaceable>]+
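
The *pcmk_host_map* step above is easier to follow with a concrete value. The sketch below assembles one in the `node1:1;node2:2,3` form given in the agent parameter descriptions. The helper name `build_host_map` and the node and port values are hypothetical, chosen only for illustration; check `man pacemaker-fenced` for the authoritative syntax.

```shell
#!/bin/sh
# Hypothetical helper (not part of pcs or Pacemaker): join
# "node:port[,port...]" pairs into a pcmk_host_map value.
build_host_map() {
    map=""
    for pair in "$@"; do
        case "$pair" in
            *:*) ;;  # looks like node:port, accept it
            *) echo "invalid pair: $pair" >&2; return 1 ;;
        esac
        if [ -z "$map" ]; then
            map="$pair"
        else
            map="$map;$pair"   # entries are separated by semicolons
        fi
    done
    printf '%s\n' "$map"
}

# Example values only: node1 on port 1, node2 on ports 2 and 3
build_host_map node1:1 node2:2,3
```

Because the value contains semicolons, it needs quoting when passed on a command line, for example as `pcmk_host_map="node1:1;node2:2,3"` among the options to `pcs stonith create`.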

== Example ==

For this example, assume we have a chassis containing four nodes
and a separately powered IPMI device active on 10.0.0.1. Following the steps
above would go something like this:

Step 1: Install the *fence-agents-ipmilan* package on all cluster nodes.

Step 2: Configure the IP address, authentication credentials, etc. in the IPMI device itself.

Step 3: Choose the *fence_ipmilan* STONITH agent.

Step 4: Obtain the agent's possible parameters:
----
[root@pcmk-1 ~]# pcs stonith describe fence_ipmilan
fence_ipmilan - Fence agent for IPMI

fence_ipmilan is an I/O Fencing agentwhich can be used with machines controlled by IPMI.This agent calls support software ipmitool (http://ipmitool.sf.net/). WARNING! This fence agent might report success before the node is powered off. You should use -m/method onoff if your fence device works correctly with that option.

Stonith options:
ipport: TCP/UDP port to use for connection with device
hexadecimal_kg: Hexadecimal-encoded Kg key for IPMIv2 authentication
port: IP address or hostname of fencing device (together with --port-as-ip)
inet6_only: Forces agent to use IPv6 addresses only
ipaddr: IP Address or Hostname
passwd_script: Script to retrieve password
method: Method to fence (onoff|cycle)
inet4_only: Forces agent to use IPv4 addresses only
passwd: Login password or passphrase
lanplus: Use Lanplus to improve security of connection
auth: IPMI Lan Auth type.
cipher: Ciphersuite to use (same as ipmitool -C parameter)
target: Bridge IPMI requests to the remote target address
privlvl: Privilege level on IPMI device
timeout: Timeout (sec) for IPMI operation
login: Login Name
verbose: Verbose mode
debug: Write debug information to given file
power_wait: Wait X seconds after issuing ON/OFF
login_timeout: Wait X seconds for cmd prompt after login
delay: Wait X seconds before fencing is started
power_timeout: Test X seconds for status change after ON/OFF
ipmitool_path: Path to ipmitool binary
shell_timeout: Wait X seconds for cmd prompt after issuing command
port_as_ip: Make "port/plug" to be an alias to IP address
retry_on: Count of attempts to retry power on
sudo: Use sudo (without password) when calling 3rd party sotfware.
priority: The priority of the stonith resource. Devices are tried in order of highest priority to lowest.
pcmk_host_map: A mapping of host names to ports numbers for devices that do not support host names. Eg. node1:1;node2:2,3 would tell the cluster to use port 1 for node1 and ports 2 and
3 for node2
pcmk_host_list: A list of machines controlled by this device (Optional unless pcmk_host_check=static-list).
pcmk_host_check: How to determine which machines are controlled by the device. Allowed values: dynamic-list (query the device), static-list (check the pcmk_host_list attribute), none
(assume every device can fence every machine)
pcmk_delay_max: Enable a random delay for stonith actions and specify the maximum of random delay. This prevents double fencing when using slow devices such as sbd. Use this to enable a
random delay for stonith actions. The overall delay is derived from this random delay value adding a static delay so that the sum is kept below the maximum delay.
pcmk_delay_base: Enable a base delay for stonith actions and specify base delay value. This prevents double fencing when different delays are configured on the nodes. Use this to enable
a static delay for stonith actions. The overall delay is derived from a random delay value adding this static delay so that the sum is kept below the maximum delay.
pcmk_action_limit: The maximum number of actions can be performed in parallel on this device Pengine property concurrent-fencing=true needs to be configured first. Then use this to
specify the maximum number of actions can be performed in parallel on this device. -1 is unlimited.

Default operations:
monitor: interval=60s
----

Step 5: `pcs cluster cib stonith_cfg`

Step 6: Here are example parameters for creating our fence device resource:
----
[root@pcmk-1 ~]# pcs -f stonith_cfg stonith create ipmi-fencing fence_ipmilan \
      pcmk_host_list="pcmk-1 pcmk-2" ipaddr=10.0.0.1 login=testuser \
      passwd=acd123 op monitor interval=60s
[root@pcmk-1 ~]# pcs -f stonith_cfg stonith
ipmi-fencing (stonith:fence_ipmilan): Stopped
----
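
The procedure notes that agent flags without arguments, such as +--ssl+, are passed to `pcs stonith create` as +ssl=1+. The following is a minimal sketch of that translation. The helper name `to_pcs_options` is hypothetical. Mapping hyphens in option names to underscores is an assumption based on the `--port-as-ip` flag and `port_as_ip` parameter pairing in the agent description, so verify each name against `pcs stonith describe` before relying on it.

```shell
#!/bin/sh
# Hypothetical helper (not part of pcs): rewrite agent command-line
# options as name=value pairs suitable for "pcs stonith create".
to_pcs_options() {
    for opt in "$@"; do
        opt="${opt#--}"                        # drop the leading "--"
        case "$opt" in
            *=*) name="${opt%%=*}"; value="${opt#*=}" ;;  # already name=value
            *)   name="$opt"; value=1 ;;                  # bare flag becomes name=1
        esac
        # Assumption: long options use hyphens where parameters use underscores
        name=$(printf '%s' "$name" | tr '-' '_')
        printf '%s=%s\n' "$name" "$value"
    done
}

# Example values only
to_pcs_options --ssl --port-as-ip --ipaddr=10.0.0.1
```

With the example arguments, the helper prints `ssl=1`, `port_as_ip=1` and `ipaddr=10.0.0.1`, one per line, which can then be appended to a `pcs -f stonith_cfg stonith create` command.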

Steps 7-10: Enable fencing in the cluster:
----
[root@pcmk-1 ~]# pcs -f stonith_cfg property set stonith-enabled=true
[root@pcmk-1 ~]# pcs -f stonith_cfg property
Cluster Properties:
cluster-infrastructure: corosync
cluster-name: mycluster
dc-version: 1.1.18-11.el7_5.3-2b07d5c5a9
have-watchdog: false
stonith-enabled: true
----
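
Before pushing the CIB copy, it can be worth checking mechanically that the property listing really shows fencing enabled. The sketch below assumes the `name: value` listing format shown in the output above; other pcs versions may print properties differently. The helper name `fencing_enabled` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read a property listing on stdin and succeed
# only if it contains a "stonith-enabled: true" line.
fencing_enabled() {
    grep -Eq '^[[:space:]]*stonith-enabled:[[:space:]]*true[[:space:]]*$'
}

# Sample listing in the same format as the example output
sample='Cluster Properties:
cluster-infrastructure: corosync
stonith-enabled: true'

if printf '%s\n' "$sample" | fencing_enabled; then
    echo "fencing is enabled in this CIB copy"
else
    echo "fencing is NOT enabled; do not push this configuration" >&2
fi
```

On a real cluster node, the sample text would be replaced by piping `pcs -f stonith_cfg property` into `fencing_enabled`.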

Step 11: `pcs cluster cib-push stonith_cfg --config`

Step 12: Test:
----
[root@pcmk-1 ~]# pcs cluster stop pcmk-2
[root@pcmk-1 ~]# stonith_admin --reboot pcmk-2
----

After a successful test, log in to any rebooted nodes, and start the cluster
(with `pcs cluster start`).