public inbox for cluster-cvs@sourceware.org
From: Christine Caulfield <chrissie@fedoraproject.org>
To: cluster-cvs-relay@redhat.com
Subject: cluster: RHEL55 - cman: Fix a situation where cman could kill the wrong nodes
Date: Fri, 14 Aug 2009 08:24:00 -0000
Message-ID: <20090814081813.5BEB012037F@lists.fedorahosted.org>

Gitweb:        http://git.fedorahosted.org/git/cluster.git?p=cluster.git;a=commitdiff;h=34bccfffdb35f368a72e2fa6859f15f6e8f9ebb8
Commit:        34bccfffdb35f368a72e2fa6859f15f6e8f9ebb8
Parent:        5afd2776b87869189497eb722fe631de60cd09c1
Author:        Christine Caulfield <ccaulfie@redhat.com>
AuthorDate:    Wed Jul 29 11:17:47 2009 +0100
Committer:     Christine Caulfield <ccaulfie@redhat.com>
CommitterDate: Fri Aug 14 09:06:02 2009 +0100

cman: Fix a situation where cman could kill the wrong nodes

Hmm, how to describe this... OK, let's try:

There were a couple of places in the cman code where the handling of a
transition message assumed that the node in question (either this node or the
sending node) was joining the cluster, rather than just sending its current
post-transition state. This was wrong. It's a common problem we have with
openais/corosync: it always merges clusters rather than joining from scratch,
so we need some way to tell the two cases apart.

The code in ais.c has a flag called 'first_trans' which it sets when it first
encounters another node in the cluster. We should use this more often as it's
really helpful. So this is what we now do. The comments in the existing code
make it clear that it assumed the node was joining and not just part of an
existing transition, but the first_trans flag was not checked, so it was
fairly obvious what was going on.

So, now we check the first_trans flag in all places where the code assumes
that the node is joining a new cluster.
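
As a rough illustration of the two checks described above, here is a minimal,
self-contained sketch. It is not the cman code itself: the struct, enum and
function names (trans_msg, must_die, remote_state and so on) are hypothetical
stand-ins, and the real conditions are the ones in the diff below.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the cman data structures.
   All names here are illustrative assumptions, not the real API. */
enum node_state { STATE_MEMBER, STATE_AISONLY };

struct trans_msg {
	bool first_trans;      /* sender says this is its first transition (a join) */
	bool sees_disallowed;  /* sender can see AISONLY (disallowed) nodes */
};

/* Local side: only give up if *we* are the one joining, the remote node
   can see disallowed nodes, and we have none of our own. */
static bool must_die(bool local_first_trans, const struct trans_msg *msg,
		     bool have_disallowed)
{
	return local_first_trans && msg->sees_disallowed && !have_disallowed;
}

/* Remote side: only force the sender to AISONLY if it is genuinely joining,
   it is not us, and the existing cluster already has disallowed nodes.
   Otherwise its current state is left alone. */
static enum node_state remote_state(const struct trans_msg *msg, bool is_us,
				    bool have_disallowed,
				    enum node_state current)
{
	if (msg->first_trans && !is_us && have_disallowed)
		return STATE_AISONLY;
	return current;
}
```

In this sketch, a node that is already a member and merely reports its
post-transition state (first_trans unset) passes through both checks
unchanged, which is the behaviour the patch is aiming for.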

rhbz#513260

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
---
 cman/daemon/commands.c |   16 +++++++++-------
 1 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/cman/daemon/commands.c b/cman/daemon/commands.c
index 710449e..402efbb 100644
--- a/cman/daemon/commands.c
+++ b/cman/daemon/commands.c
@@ -80,6 +80,7 @@ static openais_timer_handle quorum_device_timer;
 static openais_timer_handle ccsd_timer;
 static unsigned int wanted_config_version;
 static int config_error;
+static int local_first_trans;
 
 static openais_timer_handle shutdown_timer;
 static struct connection *shutdown_con;
@@ -1544,6 +1545,7 @@ void send_transition_msg(int last_memb_count, int first_trans)
 	int len = sizeof(struct cl_transmsg);
 
 	we_are_a_cluster_member = 1;
+	local_first_trans = first_trans;
 
 	P_MEMB("sending TRANSITION message. cluster_name = %s\n", cluster_name);
 	msg->cmd = CLUSTER_MSG_TRANSITION;
@@ -1720,9 +1722,9 @@ static void do_process_transition(int nodeid, char *data, int len)
 		return; // PJC ???
 	}
 
-	/* If the remote node can see AISONLY nodes then we can't join as we don't
-	   know the full state */
-	if (msg->flags & NODE_FLAGS_SEESDISALLOWED && !have_disallowed()) {
+	/* If the remote node can see AISONLY nodes and we want to join,
+	   then we can't, as we don't know the full state */
+	if (local_first_trans && msg->flags & NODE_FLAGS_SEESDISALLOWED && !have_disallowed()) {
 		/* Must use syslog directly here or the message will never arrive */
 		syslog(LOG_CRIT, "CMAN: Joined a cluster with disallowed nodes. must die");
 		openais_shutdown(CMAN_EXIT_AISEXEC_DISALLOWED);
@@ -1785,10 +1787,10 @@ static void do_process_transition(int nodeid, char *data, int len)
 		add_ais_node(nodeid, incarnation, num_ais_nodes);
 	}
 
-	/* If the cluster already has some AISONLY nodes then we can't make
-	   sense of the membership. So the new node has to also be AISONLY
-	   until we are consistent again */
-	if (have_disallowed() && !node->us)
+	/* If the new node is joining and the existing cluster already has some AISONLY
+	   nodes then we can't make sense of the membership.
+	   So the new node has to also be AISONLY until we are consistent again */
+	if (msg->first_trans && !node->us && have_disallowed())
 		node->state = NODESTATE_AISONLY;
 
 	node->flags = msg->flags; /* This will clear the BEENDOWN flag of course */

