From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 8495 invoked by alias); 23 Mar 2009 14:50:28 -0000
Received: (qmail 8382 invoked by alias); 23 Mar 2009 14:50:28 -0000
X-SWARE-Spam-Status: No, hits=-1.6 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_23,SPF_HELO_PASS
X-Spam-Status: No, hits=-1.6 required=5.0 tests=AWL,BAYES_00,J_CHICKENPOX_23,SPF_HELO_PASS
X-Spam-Check-By: sourceware.org
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on bastion.fedora.phx.redhat.com
Subject: cluster: RHEL53 - rgmanager: Fix VM restart issue
To: cluster-cvs-relay@redhat.com
X-Project: Cluster Project
X-Git-Module: cluster.git
X-Git-Refname: refs/heads/RHEL53
X-Git-Reftype: branch
X-Git-Oldrev: 2e7ffe3afad987f26ea1cbd8f2d9594af5004c3e
X-Git-Newrev: 6581d5aef1ed4e316dc9453ee7ca912437f97a1f
From: Lon Hohberger
Message-Id: <20090323145003.3F33C12015A@lists.fedorahosted.org>
Date: Mon, 23 Mar 2009 14:50:00 -0000
X-Scanned-By: MIMEDefang 2.58 on 172.16.52.254
Mailing-List: contact cluster-cvs-help@sourceware.org; run by ezmlm
Precedence: bulk
Sender: cluster-cvs-owner@sourceware.org
X-SW-Source: 2009-q1/txt/msg00871.txt.bz2

Gitweb:        http://git.fedorahosted.org/git/cluster.git?p=cluster.git;a=commitdiff;h=6581d5aef1ed4e316dc9453ee7ca912437f97a1f
Commit:        6581d5aef1ed4e316dc9453ee7ca912437f97a1f
Parent:        2e7ffe3afad987f26ea1cbd8f2d9594af5004c3e
Author:        Lon Hohberger
AuthorDate:    Thu Mar 19 11:56:57 2009 -0400
Committer:     Lon Hohberger
CommitterDate: Mon Mar 23 10:49:54 2009 -0400

rgmanager: Fix VM restart issue

Problem description:

* node A starts vm:foo. Before starting vm:foo, it asks the rest of
  the cluster if they have seen vm:foo

* node B receives a status inquiry request from node A. It then
  executes a status check on that VM to see if it is running. It's
  not, so status returns 1. At this point, node B sets a NEEDSTOP flag.

* Suppose you disable the VM on node A and start it on node B now.
  At this point, the NEEDSTOP flag is still persisted on node B, but is
  ignored by the start/status checks.

* If you then do a configuration update, the NEEDSTOP flag is -still-
  there. After a configuration update (or during a special "recover"
  operation), the NEEDSTOP flag is used by rgmanager to decide what
  resources need to be stopped or not. Presence of this flag does NOT
  alter service state.

* Rgmanager does its reconfiguration, sees the NEEDSTOP flag, and
  stops the virtual machine. Because the state has not actually
  changed according to rgmanager (NEEDSTOP is succeeded by NEEDSTART
  if a resource's parameters have changed, for example), the next
  status check causes a recovery of the VM and then the VM is
  restarted.

Solution:

* Don't set NEEDSTOP during STATUS_INQUIRY

Signed-off-by: Lon Hohberger
---
 rgmanager/include/res-ocf.h      |    1 +
 rgmanager/src/daemons/groups.c   |    3 +++
 rgmanager/src/daemons/restree.c  |   28 +++++++++++++++++++++++++++-
 rgmanager/src/daemons/rg_state.c |    2 +-
 4 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/rgmanager/include/res-ocf.h b/rgmanager/include/res-ocf.h
index 89a7d56..ce8e1f3 100644
--- a/rgmanager/include/res-ocf.h
+++ b/rgmanager/include/res-ocf.h
@@ -64,5 +64,6 @@
 #define RS_VALIDATE	(12)
 #define RS_MIGRATE	(13)
 #define RS_RECONFIG	(14)
+#define RS_STATUS_INQUIRY (15)	/** Quick status */
 
 #endif
diff --git a/rgmanager/src/daemons/groups.c b/rgmanager/src/daemons/groups.c
index 627953b..d973e73 100644
--- a/rgmanager/src/daemons/groups.c
+++ b/rgmanager/src/daemons/groups.c
@@ -982,6 +982,9 @@ group_op(char *groupname, int op)
 	case RG_STATUS:
 		ret = res_status(&_tree, res, NULL);
 		break;
+	case RG_STATUS_INQUIRY:
+		ret = res_status_inquiry(&_tree, res, NULL);
+		break;
 	case RG_CONDSTOP:
 		ret = res_condstop(&_tree, res, NULL);
 		break;
diff --git a/rgmanager/src/daemons/restree.c b/rgmanager/src/daemons/restree.c
index 802685e..e88fbe2 100644
--- a/rgmanager/src/daemons/restree.c
+++ b/rgmanager/src/daemons/restree.c
@@ -1359,7 +1359,19 @@ _res_op_internal(resource_node_t __attribute__ ((unused)) **tree,
 			++node->rn_resource->r_incarnations;
 			node->rn_state = RES_STARTED;
 		}
-	} else if (me && (op == RS_STATUS)) {
+	} else if (me && (op == RS_STATUS || op == RS_STATUS_INQUIRY)) {
+
+		/* Special quick-check for status inquiry */
+		if (op == RS_STATUS_INQUIRY) {
+			if (res_exec(node, RS_STATUS, NULL, 0) != 0)
+				return SFL_FAILURE;
+
+			/* XXX: A migratable service (the only place this
+			 * check can be used) cannot have child dependencies
+			 * anyway, so this is a short-circuit. */
+			return 0;
+		}
+
 		/* Check status before children*/
 		rv = do_status(node);
 		if (rv != 0) {
@@ -1545,6 +1557,20 @@ res_status(resource_node_t **tree, resource_t *res, void *ret)
 
 
 /**
+   Check status of all occurrences of a resource in a tree
+
+   @param tree		Tree to search for our resource.
+   @param res		Resource to start/stop
+   @param ret		Unused
+ */
+int
+res_status_inquiry(resource_node_t **tree, resource_t *res, void *ret)
+{
+	return _res_op(tree, res, NULL, ret, RS_STATUS_INQUIRY);
+}
+
+
+/**
    Grab resource info for all occurrences of a resource in a tree
 
    @param tree		Tree to search for our resource.
diff --git a/rgmanager/src/daemons/rg_state.c b/rgmanager/src/daemons/rg_state.c
index d0e0da6..3b4df10 100644
--- a/rgmanager/src/daemons/rg_state.c
+++ b/rgmanager/src/daemons/rg_state.c
@@ -1271,7 +1271,7 @@ svc_status_inquiry(char *svcName)
 	if (svcStatus.rs_flags & RG_FLAG_FROZEN)
 		return 0;
 
-	return group_op(svcName, RG_STATUS);
+	return group_op(svcName, RG_STATUS_INQUIRY);
 }
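
The failure sequence in the problem description can be replayed with a small model. What follows is a minimal sketch in C, not rgmanager code: every name prefixed `sketch_` or `SK_` is hypothetical. It assumes, following the description, that a failed full status check leaves a NEEDSTOP flag set on the node, and that reconfiguration stops any running resource carrying that flag. That reconfiguration clears the flag afterwards is a further assumption of the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the rgmanager definitions. */
enum sketch_op { SK_STATUS, SK_STATUS_INQUIRY };

#define SK_FLAG_NEEDSTOP (1u << 0)

struct sketch_node {
    bool running;    /* is the VM actually running on this node? */
    unsigned flags;  /* flags persisted on this node */
};

/*
 * Status check: returns 0 if the VM is running, 1 otherwise.
 * With fix_applied == false an inquiry takes the full status path and
 * leaves NEEDSTOP set on failure (the behaviour described as the bug).
 * With fix_applied == true an inquiry is a quick check with no flag
 * side effects.
 */
int sketch_status(struct sketch_node *n, enum sketch_op op, bool fix_applied)
{
    assert(n != NULL);

    if (fix_applied && op == SK_STATUS_INQUIRY)
        return n->running ? 0 : 1;

    if (!n->running) {
        n->flags |= SK_FLAG_NEEDSTOP;
        return 1;
    }
    return 0;
}

/*
 * Reconfiguration: stop a running VM that carries NEEDSTOP.
 * Returns true if the VM was stopped. Clearing the flag here is an
 * assumption of this model.
 */
bool sketch_reconfigure(struct sketch_node *n)
{
    assert(n != NULL);

    if ((n->flags & SK_FLAG_NEEDSTOP) && n->running) {
        n->running = false;
        n->flags &= ~SK_FLAG_NEEDSTOP;
        return true;
    }
    return false;
}

/* Replays the scenario on node B; returns true on a spurious stop. */
bool sketch_scenario(bool fix_applied)
{
    struct sketch_node b = { .running = false, .flags = 0 };

    /* Node A asks node B whether it has seen the VM. */
    (void)sketch_status(&b, SK_STATUS_INQUIRY, fix_applied);

    /* Later the VM is disabled on A and started on B. */
    b.running = true;

    /* A configuration update triggers reconfiguration on B. */
    return sketch_reconfigure(&b);
}
```

Calling `sketch_scenario(false)` models the pre-patch behaviour, where the stale flag causes the running VM to be stopped, and `sketch_scenario(true)` models the patched inquiry path, where it is left alone.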