Clustered LVM on DRBD resource in Fedora Linux

(Crossposted from Mirantis Official Blog)

As Florian Haas has pointed out in my previous post’s comment, our shared storage configuration requires special precautions to avoid corruption of data when two hosts connected via DRBD try to manage LVM volumes simultaneously. Generally, these precautions concern locking LVM metadata operations while running DRBD in ‘dual-primary’ mode.

Let’s examine it in detail. The LVM locking mechanism is configured in the [global] section of /etc/lvm/lvm.conf. The ‘locking_type’ parameter is the most important here. It defines which locking LVM is used while changing metadata. It can be equal to:

  • ‘0’: disables locking completely – it’s dangerous to use;
  • ‘1’: default, local file-based locking. It knows nothing about the cluster and possible conflicting metadata changes;
  • ‘2’: uses an external shared library and is defined by the ‘locking_library’ parameter;
  • ‘3’: uses built-in LVM clustered locking;
  • ‘4’: read-only locking which forbids any changes of metadata.

The simplest way is to use local locking on one of the drbd peers and to disable metadata operations on another one. This has a serious drawback though: we won’t have our Volume Groups and Logical Volumes activated automatically upon creation on the other, ‘passive’ peer. The thing is that it’s not good for the production environment and cannot be automated easily.

But there is another, more sophisticated way. We can use the Linux-HA (Heartbeat) coupled with the LVM Resource Agent. It automates activation of the newly created LVM resources on the shared storage, but still provides no locking mechanism suitable for a ‘dual-primary’ DRBD operation.

It should be noted that full support of clustered locking for the LVM can be achieved by the lvm2-cluster Fedora RPM package stored in the repository. It contains the clvmd service which runs on all hosts in the cluster and controls LVM locking on shared storage. In this case, we have only 2 drbd-peers in the cluster.

clvmd requires a cluster engine in order to function properly. It’s provided by the cman service, installed as a dependency of the lvm2-cluster (other dependencies may vary from installation to installation):

(drbd-node1)# yum install clvmd
...
Dependencies Resolved

================================================================================
Package Arch Version Repository Size
================================================================================
Installing:
lvm2-cluster x86_64 2.02.84-1.fc15 fedora 331 k
Installing for dependencies:
clusterlib x86_64 3.1.1-1.fc15 fedora 70 k
cman x86_64 3.1.1-1.fc15 fedora 364 k
fence-agents x86_64 3.1.4-1.fc15 updates 182 k
fence-virt x86_64 0.2.1-4.fc15 fedora 33 k
ipmitool x86_64 1.8.11-6.fc15 fedora 273 k
lm_sensors-libs x86_64 3.3.0-2.fc15 fedora 36 k
modcluster x86_64 0.18.7-1.fc15 fedora 187 k
net-snmp-libs x86_64 1:5.6.1-7.fc15 fedora 1.6 M
net-snmp-utils x86_64 1:5.6.1-7.fc15 fedora 180 k
oddjob x86_64 0.31-2.fc15 fedora 61 k
openais x86_64 1.1.4-2.fc15 fedora 190 k
openaislib x86_64 1.1.4-2.fc15 fedora 88 k
perl-Net-Telnet noarch 3.03-12.fc15 fedora 55 k
pexpect noarch 2.3-6.fc15 fedora 141 k
python-suds noarch 0.3.9-3.fc15 fedora 195 k
ricci x86_64 0.18.7-1.fc15 fedora 584 k
sg3_utils x86_64 1.29-3.fc15 fedora 465 k
sg3_utils-libs x86_64 1.29-3.fc15 fedora 54 k

 

Transaction Summary
================================================================================
Install 19 Package(s)

The only thing we need the cluster for is the use of clvmd; the configuration of cluster itself is pretty basic. Since we don’t need advanced features like automated fencing yet, we specify manual handling. As we have only 2 nodes in the cluster, we can tell cman about it. Configuration for cman resides in the /etc/cluster/cluster.conf file:

<?xml version="1.0"?\>
<cluster name="cluster" config_version="1"\>
  <!-- post_join_delay: number of seconds the daemon will wait before
        fencing any victims after a node joins the domain
  post_fail_delay: number of seconds the daemon will wait before
        fencing any victims after a domain member fails
  clean_start    : prevent any startup fencing the daemon might do.
        It indicates that the daemon should assume all nodes
        are in a clean state to start. --\>
  <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
  <clusternodes>
   <clusternode name="drbd-node1" votes="1" nodeid="1">
    <fence>
    <!-- Handle fencing manually -->
     <method name="human">
      <device name="human" nodename="drbd-node1"/>
     </method>
    </fence>
   </clusternode>
   <clusternode name="drbd-node2" votes="1" nodeid="2">
    <fence>
    <!-- Handle fencing manually -->
     <method name="human">
      <device name="human" nodename="drbd-node2"/>
     </method>
    </fence>
   </clusternode>
  </clusternodes>
  <!-- cman two nodes specification -->
  <cman expected_votes="1" two_node="1"/>
  <fencedevices>
  <!-- Define manual fencing -->
   <fencedevice name="human" agent="fence_manual"/>
  </fencedevices>
</cluster>

clusternode name should be a fully qualified domain name and should be resolved by DNS or be present in /etc/hosts. Number of votes is used to determine quorum of the cluster. In this case, we have two nodes, one vote per node, and expect one vote to make the cluster run (to have a quorum), as configured by cman expected attribute.

The second thing we need to configure is the cluster engine (corosync). Its configuration goes to /etc/corosync/corosync.conf:

compatibility: whitetank
totem {
  version: 2
  secauth: off
  threads: 0
  interface {
    ringnumber: 0
    bindnetaddr: 10.0.0.0
    mcastaddr: 226.94.1.1
    mcastport: 5405
  }
}
logging {
  fileline: off
  to_stderr: no
  to_logfile: yes
  to_syslog: yes
  # the pathname of the log file
  logfile: /var/log/cluster/corosync.log
  debug: off
  timestamp: on
logger_subsys {
  subsys: AMF
  debug: off
}
}
amf {
  mode: disabled
}

The bindinetaddr parameter must contain a network address. We configure corosync to work on eth1 interfaces, connecting our nodes back-to-back on 1Gbps network. Also, we should configure iptables to accept multicast traffic on both hosts.

It’s noteworthy that these configurations should be identical on both cluster nodes.

After the cluster has been prepared, we can change the LVM locking type in /etc/lvm/lvm.conf on both drbd-connected nodes:

global {
  ...
  locking_type = 3
  ...
}

Start cman and clvmd services on drbd-peers and get our cluster ready for the action:

(drbd-node1)# service cman start
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... [ OK ]
Waiting for quorum... [ OK ]
Starting fenced... [ OK ]
Starting dlm_controld... [ OK ]
Unfencing self... [ OK ]
Joining fence domain... [ OK ]
(drbd-node1)# service clvmd start
Starting clvmd:
Activating VG(s): 2 logical volume(s) in volume group "vg-sys" now active
2 logical volume(s) in volume group "vg_shared" now active
[ OK ]

Now, as we already have a Volume Group on the shared storage, we can easily make it cluster-aware:

(drbd-node1)# vgchange -c y vg_shared

Now we see the ‘c’ flag in VG Attributes:

(drbd-node1)# vgs
VG        #PV #LV #SN Attr    VSize   VFree
vg_shared   1   3   0 wz--nc  1.29t   1.04t
vg_sys      1   2   0 wz--n-  19.97g  5.97g

As a result, Logical Volumes created in the vg_shared volume group will be active on both nodes, and clustered locking is enabled for operations with volumes in this group. LVM commands can be issued on both hosts and clvmd takes care of possible concurrent metadata changes.

Tags:

Trackbacks/Pingbacks

  1.  Clustered LVM on DRBD resource in Fedora Linux « « Boston 2 Phoenix Boston 2 Phoenix
  2.  Clustered LVM on DRBD resource in Fedora Linux - Boston 2 Phoenix
  3.  Lineo’s Warp 2 boots to Fedora on Atom in 4 seconds, MPC Data’s SwiftBoot warms up embedded Linux in an instant

Leave a Reply

Your email address will not be published. Required fields are marked *