From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 20298 invoked by alias); 20 Apr 2005 05:51:15 -0000
Mailing-List: contact cluster-cvs-help@sources.redhat.com; run by ezmlm
Precedence: bulk
List-Subscribe:
List-Post:
List-Help: ,
Sender: cluster-cvs-owner@sources.redhat.com
Received: (qmail 20281 invoked by uid 9453); 20 Apr 2005 05:51:15 -0000
Date: Wed, 20 Apr 2005 05:51:00 -0000
Message-ID: <20050420055115.20278.qmail@sourceware.org>
From: teigland@sourceware.org
To: cluster-cvs@sources.redhat.com
Subject: cluster/fence/fenced recover.c
X-SW-Source: 2005-q2/txt/msg00098.txt.bz2
List-Id:

CVSROOT:	/cvs/cluster
Module name:	cluster
Branch:		RHEL4
Changes by:	teigland@sourceware.org	2005-04-20 05:51:15

Modified files:
	fence/fenced   : recover.c

Log message:
	Improve logic that delays and reduces fencing.

	When fenced is recovering for a failed node, the 'post_fail_delay'
	is used to give victims some time to rejoin the cluster and avoid
	being fenced. If a victim rejoins during that delay, others are
	likely to rejoin as well, and the 'post_join_delay' is more
	appropriate, so fenced switches to the 'post_join_delay' value
	(if it is larger, which is usually the case).

	The common situation where this helps is when multiple nodes fail,
	causing the cluster to lose quorum, and the failed nodes then all
	rejoin the cluster at about the same time. The rejoining nodes are
	more likely to all avoid being fenced if fenced uses the larger
	post_join_delay.

Patches:
http://sources.redhat.com/cgi-bin/cvsweb.cgi/cluster/fence/fenced/recover.c.diff?cvsroot=cluster&only_with_tag=RHEL4&r1=1.10.2.6&r2=1.10.2.7