{"id":1042,"date":"2011-06-09T07:09:12","date_gmt":"2011-06-09T12:09:12","guid":{"rendered":"http:\/\/www.openstack.org\/blog\/?p=1042"},"modified":"2011-06-09T08:25:55","modified_gmt":"2011-06-09T13:25:55","slug":"clustered-lvm-on-drbd-resource-in-fedora-linux","status":"publish","type":"post","link":"https:\/\/www.openstack.org\/blog\/clustered-lvm-on-drbd-resource-in-fedora-linux\/","title":{"rendered":"Clustered LVM on DRBD resource in Fedora Linux"},"content":{"rendered":"<p class=\"lead\"><em>(Crossposted from <a href=\"http:\/\/mirantis.blogspot.com\/2011\/06\/clustered-lvm-on-drbd-resource-in.html\">Mirantis Official Blog<\/a>)<\/em><\/p>\n<p>As <a href=\"http:\/\/fghaas.wordpress.com\/\">Florian Haas<\/a> has <a href=\"http:\/\/mirantis.blogspot.com\/2011\/05\/shared-storage-for-openstack-based-on.html?showComment=1306528806951#c5028478804503884300\">pointed out<\/a> in my previous post&#8217;s comment, our shared storage configuration requires special precautions to avoid corruption of data when two hosts connected via DRBD try to manage LVM volumes simultaneously. Generally, these precautions concern locking LVM metadata operations while running DRBD in &#8216;dual-primary&#8217; mode.<\/p>\n<p>Let&#8217;s examine it in detail. The LVM locking mechanism is configured in the [global] section of <em>\/etc\/lvm\/lvm.conf<\/em>. The &#8216;locking_type&#8217; parameter is the most important  here. It defines which locking LVM is used while changing metadata. It can be equal to:<\/p>\n<ul>\n<li>&#8216;0&#8217;: disables locking completely &#8211; it&#8217;s dangerous to use;<\/li>\n<li>&#8216;1&#8217;: default, local file-based locking. 
It knows nothing about the cluster or possible conflicting metadata changes;<\/li>\n<li>&#8216;2&#8217;: uses an external shared library, specified by the &#8216;locking_library&#8217; parameter;<\/li>\n<li>&#8216;3&#8217;: uses built-in LVM clustered locking;<\/li>\n<li>&#8216;4&#8217;: read-only locking, which forbids any metadata changes.<\/li>\n<\/ul>\n<p>The simplest approach is to use local locking on one of the DRBD peers and to disable metadata operations on the other. This has a serious drawback though: newly created Volume Groups and Logical Volumes are not activated automatically on the &#8216;passive&#8217; peer, which is unsuitable for a production environment and cannot be automated easily.<\/p>\n<p>There is another, more sophisticated way: <a href=\"http:\/\/www.linux-ha.org\/doc\/users-guide\/users-guide.html\">Linux-HA<\/a> (Heartbeat) coupled with the <a href=\"http:\/\/linux-ha.org\/doc\/man-pages\/re-ra-LVM.html\">LVM Resource Agent<\/a>. It automates activation of newly created LVM resources on the shared storage, but still provides no locking mechanism suitable for &#8216;dual-primary&#8217; DRBD operation.<\/p>\n<p>Full clustered locking support for LVM is provided by the <strong>lvm2-cluster<\/strong> Fedora RPM package. It contains the <strong>clvmd<\/strong> service, which runs on all hosts in the cluster and controls LVM locking on the shared storage. In our case, the cluster consists of just the two DRBD peers.<\/p>\n<p><strong>clvmd<\/strong> requires a cluster engine in order to function properly. 
It&#8217;s provided by the <strong>cman<\/strong> service, installed as a dependency of <strong>lvm2-cluster<\/strong> (the other dependencies may vary between installations):<\/p>\n<p><code>(drbd-node1)# yum install lvm2-cluster<br \/>\n...<br \/>\nDependencies Resolved<\/code><br \/>\n<code>================================================================================<br \/>\nPackage               Arch         Version                 Repository     Size<br \/>\n================================================================================<br \/>\nInstalling:<br \/>\nlvm2-cluster          x86_64       2.02.84-1.fc15          fedora        331 k<br \/>\nInstalling for dependencies:<br \/>\nclusterlib            x86_64       3.1.1-1.fc15            fedora         70 k<br \/>\ncman                  x86_64       3.1.1-1.fc15            fedora        364 k<br \/>\nfence-agents          x86_64       3.1.4-1.fc15            updates       182 k<br \/>\nfence-virt            x86_64       0.2.1-4.fc15            fedora         33 k<br \/>\nipmitool              x86_64       1.8.11-6.fc15           fedora        273 k<br \/>\nlm_sensors-libs       x86_64       3.3.0-2.fc15            fedora         36 k<br \/>\nmodcluster            x86_64       0.18.7-1.fc15           fedora        187 k<br \/>\nnet-snmp-libs         x86_64       1:5.6.1-7.fc15          fedora        1.6 M<br \/>\nnet-snmp-utils        x86_64       1:5.6.1-7.fc15          fedora        180 k<br \/>\noddjob                x86_64       0.31-2.fc15             fedora         61 k<br \/>\nopenais               x86_64       1.1.4-2.fc15            fedora        190 k<br \/>\nopenaislib            x86_64       1.1.4-2.fc15            fedora         88 k<br \/>\nperl-Net-Telnet       noarch       3.03-12.fc15            fedora         55 k<br \/>\npexpect               noarch       2.3-6.fc15              fedora        141 k<br \/>\npython-suds           noarch       0.3.9-3.fc15            fedora        
195 k<br \/>\nricci                 x86_64       0.18.7-1.fc15           fedora        584 k<br \/>\nsg3_utils             x86_64       1.29-3.fc15             fedora        465 k<br \/>\nsg3_utils-libs        x86_64       1.29-3.fc15             fedora         54 k<\/code><\/p>\n<p><code>Transaction Summary<br \/>\n================================================================================<br \/>\nInstall      19 Package(s)<\/code><\/p>\n<p>The only thing we need the cluster for is running clvmd; the configuration of the cluster itself is quite basic. Since we don&#8217;t need advanced features like automated <a href=\"https:\/\/fedorahosted.org\/cluster\/wiki\/Fence\">fencing<\/a> yet, we specify manual handling. As we have only two nodes in the cluster, we tell <strong>cman<\/strong> about that as well. The configuration for <strong>cman<\/strong> resides in the <em>\/etc\/cluster\/cluster.conf<\/em> file:<\/p>\n<p><code>&lt;?xml version=\"1.0\"?&gt;<br \/>\n&lt;cluster name=\"cluster\" config_version=\"1\"&gt;<br \/>\n&nbsp;&nbsp;&lt;!-- post_join_delay: number of seconds the daemon will wait before<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;fencing any victims after a node joins the domain<br \/>\n&nbsp;&nbsp;post_fail_delay: number of seconds the daemon will wait before<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;fencing any victims after a domain member fails<br \/>\n&nbsp;&nbsp;clean_start \u00a0 \u00a0: prevent any startup fencing the daemon might do.<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;It indicates that the daemon should assume all nodes<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;are in a clean state to start. 
--&gt;<br \/>\n&nbsp;&nbsp;&lt;fence_daemon clean_start=\"0\" post_fail_delay=\"0\" post_join_delay=\"3\"\/&gt;<br \/>\n&nbsp;&nbsp;&lt;clusternodes&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&lt;clusternode name=\"drbd-node1\" votes=\"1\" nodeid=\"1\"&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&lt;fence&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&lt;!-- Handle fencing manually --&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;method name=\"human\"&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;device name=\"human\" nodename=\"drbd-node1\"\/&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;\/method&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&lt;\/fence&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&lt;\/clusternode&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&lt;clusternode name=\"drbd-node2\" votes=\"1\" nodeid=\"2\"&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&lt;fence&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&lt;!-- Handle fencing manually --&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;method name=\"human\"&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;device name=\"human\" nodename=\"drbd-node2\"\/&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;\/method&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;&lt;\/fence&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&lt;\/clusternode&gt;<br \/>\n&nbsp;&nbsp;&lt;\/clusternodes&gt;<br \/>\n&nbsp;&nbsp;&lt;!-- cman two nodes specification --&gt;<br \/>\n&nbsp;&nbsp;&lt;cman expected_votes=\"1\" two_node=\"1\"\/&gt;<br \/>\n&nbsp;&nbsp;&lt;fencedevices&gt;<br \/>\n&nbsp;&nbsp;&lt;!-- Define manual fencing --&gt;<br \/>\n&nbsp;&nbsp;&nbsp;&lt;fencedevice name=\"human\" agent=\"fence_manual\"\/&gt;<br \/>\n&nbsp;&nbsp;&lt;\/fencedevices&gt;<br \/>\n&lt;\/cluster&gt;<br \/>\n<\/code><\/p>\n<p><strong>clusternode name<\/strong> must be a fully qualified domain name, resolvable via DNS or present in <em>\/etc\/hosts<\/em>. The number of <strong>votes<\/strong> is used to determine the <strong>quorum<\/strong> of the cluster. 
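<\/p>\n<p>Once <strong>cman<\/strong> is running (we start it later in this post), membership and quorum can be inspected with <strong>cman_tool<\/strong>, which ships with the <strong>cman<\/strong> package; the exact output depends on the installation:<\/p>\n<p><code>(drbd-node1)# cman_tool status<br \/>\n(drbd-node1)# cman_tool nodes<\/code><\/p>\n<p>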
In this case, we have two nodes with one vote per node, and a single vote is enough to run the cluster (i.e. to have a quorum), as configured by the <strong>expected_votes<\/strong> attribute of the <strong>cman<\/strong> element.<\/p>\n<p>The second thing we need to configure is the cluster engine (<strong>corosync<\/strong>). Its configuration goes to <em>\/etc\/corosync\/corosync.conf<\/em>:<\/p>\n<p><code>compatibility: whitetank<br \/>\ntotem {<br \/>\n&nbsp;&nbsp;version: 2<br \/>\n&nbsp;&nbsp;secauth: off<br \/>\n&nbsp;&nbsp;threads: 0<br \/>\n&nbsp;&nbsp;interface {<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;ringnumber: 0<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;bindnetaddr: 10.0.0.0<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;mcastaddr: 226.94.1.1<br \/>\n&nbsp;&nbsp;&nbsp;&nbsp;mcastport: 5405<br \/>\n&nbsp;&nbsp;}<br \/>\n}<br \/>\nlogging {<br \/>\n&nbsp;&nbsp;fileline: off<br \/>\n&nbsp;&nbsp;to_stderr: no<br \/>\n&nbsp;&nbsp;to_logfile: yes<br \/>\n&nbsp;&nbsp;to_syslog: yes<br \/>\n&nbsp;&nbsp;# the pathname of the log file<br \/>\n&nbsp;&nbsp;logfile: \/var\/log\/cluster\/corosync.log<br \/>\n&nbsp;&nbsp;debug: off<br \/>\n&nbsp;&nbsp;timestamp: on<br \/>\nlogger_subsys {<br \/>\n&nbsp;&nbsp;subsys: AMF<br \/>\n&nbsp;&nbsp;debug: off<br \/>\n}<br \/>\n}<br \/>\namf {<br \/>\n&nbsp;&nbsp;mode: disabled<br \/>\n}<\/code><\/p>\n<p>The <strong>bindnetaddr<\/strong> parameter must contain a <em><strong>network<\/strong><\/em> address. We configure <strong>corosync<\/strong> to work on the <strong>eth1<\/strong> interfaces, which connect our nodes back-to-back over a 1 Gbps network. 
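<\/p>\n<p>Since <strong>corosync<\/strong> uses multicast (the <em>mcastaddr<\/em> and <em>mcastport<\/em> above), the firewall on each node must let this traffic through. A minimal sketch with <strong>iptables<\/strong>, assuming the interface and addresses from the configuration above (the actual rules depend on the local firewall policy):<\/p>\n<p><code>(drbd-node1)# iptables -I INPUT -i eth1 -d 226.94.1.1 -j ACCEPT<br \/>\n(drbd-node1)# iptables -I INPUT -i eth1 -p udp --dport 5405 -j ACCEPT<\/code><\/p>\n<p>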
We should also configure <strong>iptables<\/strong> on both hosts to accept this multicast traffic.<\/p>\n<p>Note that these configuration files must be identical on both cluster nodes.<\/p>\n<p>After the cluster has been prepared, we can change the LVM locking type in <em>\/etc\/lvm\/lvm.conf<\/em> on both DRBD-connected nodes:<\/p>\n<p><code>global {<br \/>\n&nbsp;&nbsp;...<br \/>\n&nbsp;&nbsp;locking_type = 3<br \/>\n&nbsp;&nbsp;...<br \/>\n}<\/code><\/p>\n<p>Now we start the <strong>cman<\/strong> and <strong>clvmd<\/strong> services on both DRBD peers to get the cluster ready for action:<\/p>\n<p><code>(drbd-node1)# service cman start<br \/>\nStarting cluster:<br \/>\nChecking if cluster has been disabled at boot...        [  OK  ]<br \/>\nChecking Network Manager...                             [  OK  ]<br \/>\nGlobal setup...                                         [  OK  ]<br \/>\nLoading kernel modules...                               [  OK  ]<br \/>\nMounting configfs...                                    [  OK  ]<br \/>\nStarting cman...                                        [  OK  ]<br \/>\nWaiting for quorum...                                   [  OK  ]<br \/>\nStarting fenced...                                      [  OK  ]<br \/>\nStarting dlm_controld...                                [  OK  ]<br \/>\nUnfencing self...                                       [  OK  ]<br \/>\nJoining fence domain...                                 
[  OK  ]<br \/>\n(drbd-node1)# service clvmd start<br \/>\nStarting clvmd:<br \/>\nActivating VG(s):   2 logical volume(s) in volume group \"vg-sys\" now active<br \/>\n2 logical volume(s) in volume group \"vg_shared\" now active<br \/>\n[  OK  ]<\/code><\/p>\n<p>As we already have a Volume Group on the shared storage, we can easily make it cluster-aware:<\/p>\n<p><code>(drbd-node1)# vgchange -c y vg_shared<\/code><\/p>\n<p>The &#8216;c&#8217; (clustered) flag now appears in the VG attributes:<\/p>\n<p><code>(drbd-node1)# vgs<br \/>\nVG \u00a0 \u00a0 \u00a0 \u00a0#PV #LV #SN Attr \u00a0 \u00a0VSize \u00a0 VFree<br \/>\nvg_shared \u00a0 1 \u00a0 3 \u00a0 0 wz--nc \u00a01.29t \u00a0 1.04t<br \/>\nvg_sys \u00a0 \u00a0 \u00a01 \u00a0 2 \u00a0 0 wz--n- \u00a019.97g \u00a05.97g<\/code><\/p>\n<p>As a result, Logical Volumes created in the <em>vg_shared<\/em> volume group will be active on both nodes, and clustered locking is enabled for operations with volumes in this group. LVM commands can be issued on either host, and <strong>clvmd<\/strong> takes care of possible concurrent metadata changes.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>(Crossposted from Mirantis Official Blog) As Florian Haas has pointed out in my previous post&#8217;s comment, our shared storage configuration requires special precautions to avoid corruption of data when two hosts connected via DRBD try to manage LVM volumes simultaneously. Generally, these precautions concern locking LVM metadata operations while running DRBD in &#8216;dual-primary&#8217; mode. 
Let&#8217;s&#8230;  <a href=\"https:\/\/www.openstack.org\/blog\/clustered-lvm-on-drbd-resource-in-fedora-linux\/\" class=\"more-link\" title=\"Read Clustered LVM on DRBD resource in Fedora Linux\">Read more &raquo;<\/a><\/p>\n","protected":false},"author":13,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[21,60],"tags":[201,214],"_links":{"self":[{"href":"https:\/\/www.openstack.org\/blog\/wp-json\/wp\/v2\/posts\/1042"}],"collection":[{"href":"https:\/\/www.openstack.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.openstack.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.openstack.org\/blog\/wp-json\/wp\/v2\/users\/13"}],"replies":[{"embeddable":true,"href":"https:\/\/www.openstack.org\/blog\/wp-json\/wp\/v2\/comments?post=1042"}],"version-history":[{"count":7,"href":"https:\/\/www.openstack.org\/blog\/wp-json\/wp\/v2\/posts\/1042\/revisions"}],"predecessor-version":[{"id":1049,"href":"https:\/\/www.openstack.org\/blog\/wp-json\/wp\/v2\/posts\/1042\/revisions\/1049"}],"wp:attachment":[{"href":"https:\/\/www.openstack.org\/blog\/wp-json\/wp\/v2\/media?parent=1042"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.openstack.org\/blog\/wp-json\/wp\/v2\/categories?post=1042"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.openstack.org\/blog\/wp-json\/wp\/v2\/tags?post=1042"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}