:compat-mode: legacy
= Start and Verify Cluster =
== Start the Cluster ==
Now that corosync is configured, it is time to start the cluster.
The command below will start corosync and pacemaker on both nodes
in the cluster. If you are issuing the start command from a different
node than the one you ran the `pcs cluster auth` command on earlier, you
must first authenticate on the node you are currently logged into before
you will be allowed to start the cluster.
----
[root@pcmk-1 ~]# pcs cluster start --all
pcmk-1: Starting Cluster...
pcmk-2: Starting Cluster...
----
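For example, if you had run `pcs cluster auth` on pcmk-1 earlier but are now
logged into pcmk-2, you would first authenticate there. A sketch of what that
looks like (you will be prompted for the `hacluster` user's credentials):
----
[root@pcmk-2 ~]# pcs cluster auth pcmk-1 pcmk-2
Username: hacluster
Password:
pcmk-1: Authorized
pcmk-2: Authorized
----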
[NOTE]
======
An alternative to using the `pcs cluster start --all` command
is to issue either of the following command sequences on each node in the
cluster separately:
----
# pcs cluster start
Starting Cluster...
----
or
----
# systemctl start corosync.service
# systemctl start pacemaker.service
----
======
[IMPORTANT]
====
In this example, we are not enabling the corosync and pacemaker services
to start at boot. If a cluster node fails or is rebooted, you will need to run
+pcs cluster start pass:[<replaceable>nodename</replaceable>]+ (or `--all`) to start the cluster on it.
While you could enable the services to start at boot, requiring a manual
start of cluster services gives you the opportunity to do a post-mortem investigation
of a node failure before returning it to the cluster.
====
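If you do decide later that you want cluster services to come up automatically
at boot, a sketch of the pcs way to toggle that (run once, from any node;
`pcs cluster disable --all` reverses it):
----
# pcs cluster enable --all
----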
== Verify Corosync Installation ==
First, use `corosync-cfgtool` to check whether cluster communication is happy:
----
[root@pcmk-1 ~]# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
id = 192.168.122.101
status = ring 0 active with no faults
----
We can see here that everything appears normal with our fixed IP
address (not a 127.0.0.x loopback address) listed as the *id*, and *no
faults* for the status.
If you see something different, you might want to start by checking
the node's network, firewall and SELinux configurations.
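For instance, on a Firewalld- and SELinux-based distribution, checks along
these lines can quickly rule out the usual suspects (hypothetical commands to
adapt to your own environment; the `high-availability` firewalld service
should appear in the first command's output if it was opened during setup):
----
[root@pcmk-1 ~]# firewall-cmd --list-services
[root@pcmk-1 ~]# getenforce
[root@pcmk-1 ~]# ip addr show
----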
Next, check the membership and quorum APIs:
----
[root@pcmk-1 ~]# corosync-cmapctl | grep members
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(192.168.122.101)
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(192.168.122.102)
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
[root@pcmk-1 ~]# pcs status corosync
Membership information
\----------------------
    Nodeid      Votes Name
         1          1 pcmk-1 (local)
         2          1 pcmk-2
----
You should see both nodes have joined the cluster.
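Another way to see membership and quorum together is `corosync-quorumtool`,
which prints a combined summary; look for `Quorate: Yes` and for both nodes
under its membership list (the exact layout varies between corosync versions):
----
[root@pcmk-1 ~]# corosync-quorumtool
----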
== Verify Pacemaker Installation ==
Now that we have confirmed that Corosync is functional, we can check
the rest of the stack. Pacemaker has already been started, so verify
the necessary processes are running:
----
[root@pcmk-1 ~]# ps axf
  PID TTY      STAT   TIME COMMAND
    2 ?        S      0:00 [kthreadd]
...lots of processes...
11635 ?        SLsl   0:03 corosync
11642 ?        Ss     0:00 /usr/sbin/pacemakerd -f
11643 ?        Ss     0:00  \_ /usr/libexec/pacemaker/cib
11644 ?        Ss     0:00  \_ /usr/libexec/pacemaker/stonithd
11645 ?        Ss     0:00  \_ /usr/libexec/pacemaker/lrmd
11646 ?        Ss     0:00  \_ /usr/libexec/pacemaker/attrd
11647 ?        Ss     0:00  \_ /usr/libexec/pacemaker/pengine
11648 ?        Ss     0:00  \_ /usr/libexec/pacemaker/crmd
----
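If you prefer querying systemd over scanning the process tree, an equivalent
check is to ask for the status of both units; each should report
`active (running)`:
----
[root@pcmk-1 ~]# systemctl status corosync.service pacemaker.service
----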
If that looks OK, check the `pcs status` output:
----
[root@pcmk-1 ~]# pcs status
Cluster name: mycluster
WARNING: no stonith devices and stonith-enabled is not false
Stack: corosync
Current DC: pcmk-2 (version 1.1.18-11.el7_5.3-2b07d5c5a9) - partition with quorum
Last updated: Mon Sep 10 16:37:34 2018
Last change: Mon Sep 10 16:30:53 2018 by hacluster via crmd on pcmk-2
2 nodes configured
0 resources configured
Online: [ pcmk-1 pcmk-2 ]
No resources
Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled
----
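`pcs status` is essentially a convenience wrapper around Pacemaker's own
`crm_mon` monitoring tool, so if pcs is unavailable for some reason, a
one-shot monitor call shows much the same information (the header layout
differs slightly):
----
[root@pcmk-1 ~]# crm_mon -1
----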
Finally, ensure there are no start-up errors from corosync or pacemaker (aside
from messages relating to not having STONITH configured, which are OK at this
point):
----
[root@pcmk-1 ~]# journalctl -b | grep -i error
----
[NOTE]
======
Other operating systems may report startup errors in other locations,
for example +/var/log/messages+.
======
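On such systems, the equivalent check would be something along the lines of:
----
# grep -i error /var/log/messages
----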
Repeat these checks on the other node. The results should be the same.
== Explore the Existing Configuration ==
For those who are not afraid of XML, you can see the raw cluster
configuration and status by using the `pcs cluster cib` command.
.The last XML you'll see in this document
======
----
[root@pcmk-1 ~]# pcs cluster cib
----
[source,XML]
----
<cib crm_feature_set="3.0.14" validate-with="pacemaker-2.10" epoch="5" num_updates="4" admin_epoch="0" cib-last-written="Mon Sep 10 16:30:53 2018" update-origin="pcmk-2" update-client="crmd" update-user="hacluster" have-quorum="1" dc-uuid="2">
<configuration>
<crm_config>
<cluster_property_set id="cib-bootstrap-options">
<nvpair id="cib-bootstrap-options-have-watchdog" name="have-watchdog" value="false"/>
<nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.18-11.el7_5.3-2b07d5c5a9"/>
<nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="corosync"/>
<nvpair id="cib-bootstrap-options-cluster-name" name="cluster-name" value="mycluster"/>
</cluster_property_set>
</crm_config>
<nodes>
<node id="1" uname="pcmk-1"/>
<node id="2" uname="pcmk-2"/>
</nodes>
<resources/>
<constraints/>
</configuration>
<status>
<node_state id="1" uname="pcmk-1" in_ccm="true" crmd="online" crm-debug-origin="do_state_transition" join="member" expected="member">
<lrm id="1">
<lrm_resources/>
</lrm>
</node_state>
<node_state id="2" uname="pcmk-2" in_ccm="true" crmd="online" crm-debug-origin="do_state_transition" join="member" expected="member">
<lrm id="2">
<lrm_resources/>
</lrm>
</node_state>
</status>
</cib>
----
======
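pcs can also save that raw XML to a file, which is handy if you want to stage
several configuration changes and push them back as one unit later. A
hypothetical example, with an arbitrary file name:
----
[root@pcmk-1 ~]# pcs cluster cib my_config.xml
----
The saved copy can then be modified with pcs commands using the `-f` option
and applied to the live cluster with `pcs cluster cib-push`.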
Before we make any changes, it's a good idea to check the validity of
the configuration.
----
[root@pcmk-1 ~]# crm_verify -L -V
error: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
error: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
error: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
----
As you can see, the tool has found some errors. The cluster will not start any
resources until we configure STONITH.
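These errors stem from the `stonith-enabled` cluster property, which defaults
to true while no fencing devices have been defined yet. If you want to confirm
the current value, something like the following should do it (`--all` also
lists properties still at their defaults, so it should report true at this
point):
----
[root@pcmk-1 ~]# pcs property list --all | grep stonith-enabled
----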