public inbox for cluster-cvs@sourceware.org
From: Lon Hohberger <lon@fedoraproject.org>
To: cluster-cvs-relay@redhat.com
Subject: cluster: STABLE3 - rgmanager: Allow reboot if main proc. is killed
Date: Tue, 19 May 2009 19:55:00 -0000
Message-ID: <20090519195451.15A1E120152@lists.fedorahosted.org>

Gitweb:        http://git.fedorahosted.org/git/cluster.git?p=cluster.git;a=commitdiff;h=62b8e821509c36b6b35777a0f3c1a459ac084766
Commit:        62b8e821509c36b6b35777a0f3c1a459ac084766
Parent:        38810c9725b5295c2d97ae5034c421b65a5d9863
Author:        Lon Hohberger <lhh@redhat.com>
AuthorDate:    Tue May 19 15:45:13 2009 -0400
Committer:     Lon Hohberger <lhh@redhat.com>
CommitterDate: Tue May 19 15:54:44 2009 -0400

rgmanager: Allow reboot if main proc. is killed

The Linux OOM killer uses SIGKILL to destroy processes.
rgmanager has a low 'badness' score, so it is unlikely to
be killed under high memory pressure. If it does die,
however, leaving the node running instead of rebooting it
can have unintended consequences.

Resolves: 488072

Signed-off-by: Lon Hohberger <lhh@redhat.com>
---
 rgmanager/src/daemons/watchdog.c |   24 ++++++++++++++----------
 1 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/rgmanager/src/daemons/watchdog.c b/rgmanager/src/daemons/watchdog.c
index 7dc004d..3846104 100644
--- a/rgmanager/src/daemons/watchdog.c
+++ b/rgmanager/src/daemons/watchdog.c
@@ -3,6 +3,7 @@
 #include <sys/wait.h>
 #include <sys/reboot.h>
 #include <stdlib.h>
+#include <sys/mman.h>
 
 #include <signals.h>
 #include <logging.h>
@@ -50,6 +51,7 @@ watchdog_init(void)
 		return parent;
 	
 	redirect_signals();
+	mlockall(MCL_CURRENT); /* shouldn't need MCL_FUTURE */
 	
 	while (1) {
 	        if (waitpid(child, &status, 0) <= 0)
@@ -60,20 +62,22 @@ watchdog_init(void)
 		
 		if (WIFSIGNALED(status)) {
 		        if (WTERMSIG(status) == SIGKILL) {
-				logt_print(LOG_CRIT, "Watchdog: Daemon killed, exiting\n");
-				raise(SIGKILL);
-				while(1) ;
+				/* Assume the admin did a 'killall' - it will
+				 * kill us within a couple of seconds.  If 
+				 * we are still alive after this sleep, it
+				 * could have been the OOM killer killing
+				 * rgmanager proper and we need to reboot.
+				 */
+				sleep(3);
 			}
-			else {
 #ifdef DEBUG
-			        logt_print(LOG_CRIT, "Watchdog: Daemon died, but not rebooting because DEBUG is set\n");
+		        logt_print(LOG_CRIT, "Watchdog: Daemon died, but not rebooting because DEBUG is set\n");
 #else
-				logt_print(LOG_CRIT, "Watchdog: Daemon died, rebooting...\n");
-				sync();
-			        reboot(RB_AUTOBOOT);
+			logt_print(LOG_CRIT, "Watchdog: Daemon died, rebooting...\n");
+			sync();
+		        reboot(RB_AUTOBOOT);
 #endif
-				exit(255);
-			}
+			exit(255);
 		}
 	}
 }
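
The decision the patched watchdog makes can be sketched outside of rgmanager as follows. This is a minimal illustration, not the code from watchdog.c: the names `wd_action`, `wd_decide`, `wd_status_of_signaled_child` and the `grace_seconds` parameter are hypothetical, the real code calls sync() and reboot(RB_AUTOBOOT) where this sketch only returns a value, and treating a child that was not killed by a signal as "no reboot" is an assumption, since that path is not shown in the diff.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical names for this sketch; none of them appear in watchdog.c. */
enum wd_action { WD_NO_REBOOT, WD_REBOOT };

/*
 * Decide what a watchdog should do with a status returned by waitpid().
 * grace_seconds stands in for the patch's sleep(3).
 */
static enum wd_action
wd_decide(int status, unsigned int grace_seconds)
{
	/* Assumption: a child that was not killed by a signal needs no reboot. */
	if (!WIFSIGNALED(status))
		return WD_NO_REBOOT;

	/*
	 * On SIGKILL, wait briefly.  An admin's 'killall' would kill the
	 * watchdog too during this window, so this function never returns.
	 */
	if (WTERMSIG(status) == SIGKILL)
		sleep(grace_seconds);

	/* Still alive: treat it as a crash or an OOM kill. */
	return WD_REBOOT;
}

/* Fork a child that dies from `sig`; return its wait status, or -1 on error. */
static int
wd_status_of_signaled_child(int sig)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		signal(sig, SIG_DFL);	/* no effect for SIGKILL, harmless */
		raise(sig);
		_exit(0);		/* not reached if the signal terminates us */
	}
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return status;
}
```

Passing the status of a child killed with SIGKILL or SIGTERM to `wd_decide()` yields `WD_REBOOT`, which mirrors how the patch removes the `else` branch so that both cases fall through to the reboot path.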

