From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 11498 invoked by alias); 19 May 2009 19:57:57 -0000
Received: (qmail 11492 invoked by alias); 19 May 2009 19:57:57 -0000
X-SWARE-Spam-Status: No, hits=-2.0 required=5.0 tests=AWL,BAYES_00,SPF_HELO_PASS
X-Spam-Status: No, hits=-2.0 required=5.0 tests=AWL,BAYES_00,SPF_HELO_PASS
X-Spam-Check-By: sourceware.org
X-Spam-Checker-Version: SpamAssassin 3.2.5 (2008-06-10) on bastion2.fedora.phx.redhat.com
Subject: rgmanager: master - rgmanager: Allow reboot if main proc. is killed
To: cluster-cvs-relay@redhat.com
X-Project: Cluster Project
X-Git-Module: rgmanager.git
X-Git-Refname: refs/heads/master
X-Git-Reftype: branch
X-Git-Oldrev: 07e55b5fb5a82b5e1ee61b6145e6b2b6f16f1cb4
X-Git-Newrev: aa4d48b19cd3925cab71f2d2e34b9362ebbfcad2
From: Lon Hohberger
Message-Id: <20090519195730.97FDA120152@lists.fedorahosted.org>
Date: Tue, 19 May 2009 19:57:00 -0000
X-Scanned-By: MIMEDefang 2.58 on 172.16.52.254
Mailing-List: contact cluster-cvs-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id:
List-Subscribe:
List-Post:
List-Help: ,
Sender: cluster-cvs-owner@sourceware.org
X-SW-Source: 2009-q2/txt/msg00336.txt.bz2

Gitweb:        http://git.fedorahosted.org/git/rgmanager.git?p=rgmanager.git;a=commitdiff;h=aa4d48b19cd3925cab71f2d2e34b9362ebbfcad2
Commit:        aa4d48b19cd3925cab71f2d2e34b9362ebbfcad2
Parent:        07e55b5fb5a82b5e1ee61b6145e6b2b6f16f1cb4
Author:        Lon Hohberger
AuthorDate:    Tue May 19 15:45:13 2009 -0400
Committer:     Lon Hohberger
CommitterDate: Tue May 19 15:57:10 2009 -0400

rgmanager: Allow reboot if main proc. is killed

The Linux OOM killer uses SIGKILL to destroy processes.  While
rgmanager isn't likely to die due to high memory pressure due to
a low 'badness' score, inadvertently dying and not rebooting the
node can have unintended consequences.

Resolves: 488072

Signed-off-by: Lon Hohberger
---
 rgmanager/src/daemons/watchdog.c |   24 ++++++++++++++----------
 1 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/rgmanager/src/daemons/watchdog.c b/rgmanager/src/daemons/watchdog.c
index 7dc004d..3846104 100644
--- a/rgmanager/src/daemons/watchdog.c
+++ b/rgmanager/src/daemons/watchdog.c
@@ -3,6 +3,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -50,6 +51,7 @@ watchdog_init(void)
 		return parent;
 	redirect_signals();
+	mlockall(MCL_CURRENT); /* shouldn't need MCL_FUTURE */
 	while (1) {
 		if (waitpid(child, &status, 0) <= 0)
@@ -60,20 +62,22 @@ watchdog_init(void)
 		if (WIFSIGNALED(status)) {
 			if (WTERMSIG(status) == SIGKILL) {
-				logt_print(LOG_CRIT, "Watchdog: Daemon killed, exiting\n");
-				raise(SIGKILL);
-				while(1) ;
+				/* Assume the admin did a 'killall' - it will
+				 * kill us within a couple of seconds. If
+				 * we are still alive after this sleep, it
+				 * could have been the OOM killer killing
+				 * rgmanager proper and we need to reboot.
+				 */
+				sleep(3);
 			}
-			else {
 #ifdef DEBUG
-				logt_print(LOG_CRIT, "Watchdog: Daemon died, but not rebooting because DEBUG is set\n");
+			logt_print(LOG_CRIT, "Watchdog: Daemon died, but not rebooting because DEBUG is set\n");
 #else
-				logt_print(LOG_CRIT, "Watchdog: Daemon died, rebooting...\n");
-				sync();
-				reboot(RB_AUTOBOOT);
+			logt_print(LOG_CRIT, "Watchdog: Daemon died, rebooting...\n");
+			sync();
+			reboot(RB_AUTOBOOT);
 #endif
-				exit(255);
-			}
+			exit(255);
 		}
 	}
 }
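
The decision the patched loop makes on the child's wait status can be
sketched in isolation. This is only an illustration under stated
assumptions, not the rgmanager code: the names wd_action, wd_classify
and wd_demo are hypothetical, and the handling of a child that was not
killed by a signal is an assumption, since that part of watchdog_init()
is not shown in the diff. The sketch only classifies the status; it
never sleeps, syncs or reboots.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical names, used only for this illustration. */
enum wd_action {
	WD_NO_ACTION,		/* not killed by a signal (assumed: nothing to do) */
	WD_REBOOT,		/* killed by a signal other than SIGKILL */
	WD_WAIT_THEN_REBOOT	/* killed by SIGKILL: sleep first, then reboot */
};

/* Map a waitpid() status to the path the patched watchdog would take. */
static enum wd_action
wd_classify(int status)
{
	if (!WIFSIGNALED(status))
		return WD_NO_ACTION;
	if (WTERMSIG(status) == SIGKILL)
		return WD_WAIT_THEN_REBOOT;	/* admin 'killall' or OOM killer */
	return WD_REBOOT;
}

/*
 * Fork a child that exits cleanly (sig == 0) or kills itself with
 * 'sig', wait for it, and classify the resulting status.
 */
static enum wd_action
wd_demo(int sig)
{
	int status = 0;
	pid_t child = fork();

	if (child < 0)
		return WD_NO_ACTION;
	if (child == 0) {
		if (sig != 0) {
			/* Restore the default action in case it was inherited as ignored. */
			signal(sig, SIG_DFL);
			raise(sig);
		}
		_exit(0);
	}
	if (waitpid(child, &status, 0) <= 0)
		return WD_NO_ACTION;
	return wd_classify(status);
}
```

Calling wd_demo(SIGKILL) takes the path where the real watchdog sleeps
for 3 seconds before rebooting, so that an administrator's 'killall'
has time to kill the watchdog as well. Calling wd_demo(SIGTERM) takes
the immediate reboot path.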