From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 13298 invoked by alias); 11 Mar 2008 17:20:53 -0000
Received: (qmail 13266 invoked by uid 9453); 11 Mar 2008 17:20:52 -0000
Date: Tue, 11 Mar 2008 17:20:00 -0000
Message-ID: <20080311172052.13251.qmail@sourceware.org>
From: teigland@sourceware.org
To: cluster-cvs@sources.redhat.com, cluster-devel@redhat.com
Subject: Cluster Project branch, RHEL5, updated. cmirror_1_1_15-7-gb70ad6f
X-Git-Refname: refs/heads/RHEL5
X-Git-Reftype: branch
X-Git-Oldrev: d5e690aa185be4fcaa411f82016c59921a729e5e
X-Git-Newrev: b70ad6fe5a5795a699ad208ab009a4c952e9078f
Mailing-List: contact cluster-cvs-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id:
List-Subscribe:
List-Post:
List-Help: ,
Sender: cluster-cvs-owner@sourceware.org
X-SW-Source: 2008-q1/txt/msg00295.txt.bz2

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "Cluster Project".

http://sources.redhat.com/git/gitweb.cgi?p=cluster.git;a=commitdiff;h=b70ad6fe5a5795a699ad208ab009a4c952e9078f

The branch, RHEL5 has been updated
       via  b70ad6fe5a5795a699ad208ab009a4c952e9078f (commit)
      from  d5e690aa185be4fcaa411f82016c59921a729e5e (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
commit b70ad6fe5a5795a699ad208ab009a4c952e9078f
Author: David Teigland
Date:   Tue Mar 11 12:18:23 2008 -0500

    groupd: purge messages from dead nodes

    bz 436984

    In the fix for bug 258121, 70294dd8b717de89f2d168c0837c011648908558,
    we began taking nodedown events via the groupd cpg, instead of via the
    per group cpg. Messages still come in via the per group cpg. I believe
    that that opened the possibility of processing a message from a node
    after processing the nodedown for it.

    In Nate's revolver test, we saw it happen; revolver killed nodes 1,2,3,
    leaving just node 4:

    1205198713 0:default confchg left 3 joined 0 total 1
    1205198713 0:default confchg removed node 1 reason 3
    1205198713 0:default confchg removed node 2 reason 3
    1205198713 0:default confchg removed node 3 reason 3
    ...
    1205198713 0:default mark_node_started: event not starting 12 from 2

-----------------------------------------------------------------------

Summary of changes:
 group/daemon/app.c         |   25 +++++++++++++++++++++++++
 group/daemon/cpg.c         |    1 +
 group/daemon/gd_internal.h |    1 +
 3 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/group/daemon/app.c b/group/daemon/app.c
index db5f88b..df17896 100644
--- a/group/daemon/app.c
+++ b/group/daemon/app.c
@@ -702,6 +702,31 @@ int queue_app_message(group_t *g, struct save_msg *save)
 	return 0;
 }
 
+/* This is called when we get the nodedown for the per-group cpg; we know
+   that after the cpg nodedown we won't get any further messages. bz 436984
+   It's conceivable but unlikely that the nodedown processing (initiated by
+   the groupd cpg nodedown) could begin before the per-group cpg nodedown
+   is received where this purging occurs. If it does, then we may need to
+   add code to wait for the nodedown to happen in both the groupd cpg and the
+   per-group cpg before processing the nodedown. */
+
+void purge_node_messages(group_t *g, int nodeid)
+{
+	struct save_msg *save, *tmp;
+
+	list_for_each_entry_safe(save, tmp, &g->messages, list) {
+		if (save->nodeid != nodeid)
+			continue;
+
+		log_group(g, "purge msg from dead node %d", nodeid);
+
+		list_del(&save->list);
+		if (save->msg_long)
+			free(save->msg_long);
+		free(save);
+	}
+}
+
 static void del_app_nodes(app_t *a)
 {
 	node_t *node, *tmp;
diff --git a/group/daemon/cpg.c b/group/daemon/cpg.c
index edd593f..d5cfb8d 100644
--- a/group/daemon/cpg.c
+++ b/group/daemon/cpg.c
@@ -413,6 +413,7 @@ void process_confchg(void)
 		case CPG_REASON_NODEDOWN:
 		case CPG_REASON_PROCDOWN:
 			/* process_node_down(g, saved_left[i].nodeid); */
+			purge_node_messages(g, saved_left[i].nodeid);
 			break;
 		default:
 			log_error(g, "unknown leave reason %d node %d",
diff --git a/group/daemon/gd_internal.h b/group/daemon/gd_internal.h
index 9c0026c..691d187 100644
--- a/group/daemon/gd_internal.h
+++ b/group/daemon/gd_internal.h
@@ -263,6 +263,7 @@ void groupd_down(int nodeid);
 char *msg_type(int type);
 int process_app(group_t *g);
 int is_our_join(event_t *ev);
+void purge_node_messages(group_t *g, int nodeid);
 
 /* main.c */
 void app_stop(app_t *a);


hooks/post-receive
--
Cluster Project
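
Editorial sketch (not part of the commit): the purge in the diff above relies on removing entries from a queue while walking it. The standalone C below illustrates that idea under simplifying assumptions. struct saved_msg, queue_msg(), purge_msgs_from() and count_msgs() are hypothetical stand-ins invented for this illustration; they are not groupd's group_t, struct save_msg, or its kernel-style list macros, and a plain singly linked list is swapped in for the real list type.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a queued per-group message. */
struct saved_msg {
	int nodeid;              /* node that sent the message */
	char *msg_long;          /* optional heap-allocated payload, may be NULL */
	struct saved_msg *next;  /* next queued message, NULL at the tail */
};

/* Append a message to the tail of the queue.
   Returns 0 on success, -1 if an allocation fails. */
int queue_msg(struct saved_msg **head, int nodeid, const char *payload)
{
	struct saved_msg *m = calloc(1, sizeof(*m));

	if (!m)
		return -1;
	m->nodeid = nodeid;
	if (payload) {
		m->msg_long = malloc(strlen(payload) + 1);
		if (!m->msg_long) {
			free(m);
			return -1;
		}
		strcpy(m->msg_long, payload);
	}
	while (*head)
		head = &(*head)->next;
	*head = m;
	return 0;
}

/* Unlink and free every queued message sent by nodeid.
   Walking a pointer-to-pointer keeps iteration valid after each removal.
   Returns the number of messages purged. */
int purge_msgs_from(struct saved_msg **head, int nodeid)
{
	int purged = 0;

	assert(head != NULL);
	while (*head) {
		struct saved_msg *m = *head;

		if (m->nodeid != nodeid) {
			head = &m->next;   /* keep this entry, advance */
			continue;
		}
		*head = m->next;           /* unlink before freeing */
		free(m->msg_long);         /* free(NULL) is a no-op */
		free(m);
		purged++;
	}
	return purged;
}

/* Count the messages currently queued. */
int count_msgs(const struct saved_msg *head)
{
	int n = 0;

	for (; head; head = head->next)
		n++;
	return n;
}
```

In this sketch a caller would invoke purge_msgs_from(&queue, nodeid) at the point where it learns that nodeid has left, so that a message such as the "from 2" entry in the log excerpt could no longer be picked up afterwards. Whether that ordering fully closes the race in groupd is the open question the commit's own code comment raises.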