public inbox for cluster-cvs@sourceware.org
* cluster: RHEL55 - "fsck.gfs2: invalid option -- a" on boot when mounting gfs2 root
@ 2009-08-17 16:14 Bob Peterson
From: Bob Peterson @ 2009-08-17 16:14 UTC (permalink / raw)
To: cluster-cvs-relay
Gitweb: http://git.fedorahosted.org/git/cluster.git?p=cluster.git;a=commitdiff;h=01439d0fcb9466245ecd87c002e59949864f1fcc
Commit: 01439d0fcb9466245ecd87c002e59949864f1fcc
Parent: 93cd1d93eadd44ea67ff4a2e486564c0254a1105
Author: Bob Peterson <rpeterso@redhat.com>
AuthorDate: Thu Jul 30 14:06:22 2009 -0500
Committer: Bob Peterson <rpeterso@redhat.com>
CommitterDate: Mon Aug 17 10:56:25 2009 -0500
"fsck.gfs2: invalid option -- a" on boot when mounting gfs2 root
This patch fixes a problem whereby fsck.gfs2 did not accept the
-a parameter passed to it by rc.sysinit. That caused booting from
a gfs2 root partition to fail.
rhbz#507596
This patch adds three new parameters to fsck.gfs2:
-a Same as -p (repair file system if dirty)
-p Preen (Check file system and repair if it is dirty and if it's safe)
-f force check, even if the file system seems clean
---
gfs2/fsck/fs_recovery.c | 107 +++++++++++++++++++++++++++++++++++++++++------
gfs2/fsck/fs_recovery.h | 4 +-
gfs2/fsck/fsck.h | 4 +-
gfs2/fsck/initialize.c | 48 ++++++++++++---------
gfs2/fsck/main.c | 27 ++++++++++--
gfs2/man/fsck.gfs2.8 | 31 +++++++++++++-
6 files changed, 179 insertions(+), 42 deletions(-)
diff --git a/gfs2/fsck/fs_recovery.c b/gfs2/fsck/fs_recovery.c
index 63eda74..a90aa77 100644
--- a/gfs2/fsck/fs_recovery.c
+++ b/gfs2/fsck/fs_recovery.c
@@ -384,8 +384,34 @@ int fix_journal_seq_no(struct gfs2_inode *ip)
}
/**
+ * preen_is_safe - Can we safely preen the file system?
+ *
+ * If a preen option was specified (-a or -p) we're likely to have been
+ * called from rc.sysinit. We need to determine whether this is shared
+ * storage or not. If it's local storage (locking protocol==lock_nolock)
+ * it's safe to preen the file system. If it's lock_dlm, it's likely
+ * mounted by other nodes in the cluster, which is dangerous and therefore,
+ * we should warn the user to run fsck.gfs2 manually when it's safe.
+ */
+int preen_is_safe(struct gfs2_sbd *sdp, int preen, int force_check)
+{
+ if (!preen) /* If preen was not specified */
+ return 1; /* not called by rc.sysinit--we're okay to preen */
+ if (force_check) /* If check was forced by the user? */
+ return 1; /* user's responsibility--we're okay to preen */
+ if(!memcmp(sdp->sd_sb.sb_lockproto + 5, "nolock", 6))
+ return 1; /* local file system--preen is okay */
+ return 0; /* might be mounted on another node--not guaranteed safe */
+}
+
+/**
* gfs2_recover_journal - recovery a given journal
* @ip: the journal incore inode
+ * j: which journal to check
+ * preen: Was preen (-a or -p) specified?
+ * force_check: Was -f specified to force the check?
+ * @was_clean: if the journal was originally clean, this is set to 1.
+ * if the journal was dirty from the start, this is set to 0.
*
* Acquire the journal's lock, check to see if the journal is clean, and
* do recovery if necessary.
@@ -393,21 +419,42 @@ int fix_journal_seq_no(struct gfs2_inode *ip)
* Returns: errno
*/
-int gfs2_recover_journal(struct gfs2_inode *ip, int j)
+static int gfs2_recover_journal(struct gfs2_inode *ip, int j, int preen,
+ int force_check, int *was_clean)
{
struct gfs2_sbd *sdp = ip->i_sbd;
struct gfs2_log_header head;
unsigned int pass;
int error;
+ *was_clean = 0;
log_info("jid=%u: Looking at journal...\n", j);
osi_list_init(&sd_revoke_list);
error = gfs2_find_jhead(ip, &head);
if (error) {
- if (!query(&opts, "\nJournal #%d (\"journal%d\") is corrupt. "
- "Okay to repair it? (y/n)", j+1, j)) {
- log_err("jid=%u: The journal was not repaired.\n", j);
+ if (opts.no) {
+ log_err("Journal #%d (\"journal%d\") is corrupt.\n"
+ "Not fixing it due to the -n option.\n",
+ j+1, j);
+ goto out;
+ }
+ if (!preen_is_safe(sdp, preen, force_check)) {
+ log_err("Journal #%d (\"journal%d\") is corrupt.\n",
+ j+1, j);
+ log_err("I'm not fixing it because it may be unsafe:\n"
+ "Locking protocol is not lock_nolock and "
+ "the -a or -p option was specified.\n");
+ log_err("Please make sure no node has the file system "
+ "mounted then rerun fsck.gfs2 manually "
+ "without -a or -p.\n");
+ goto out;
+ }
+ if (!query(&opts, "\nJournal #%d (\"journal%d\") is "
+ "corrupt. Okay to repair it? (y/n)",
+ j+1, j)) {
+ log_err("jid=%u: The journal was not repaired.\n",
+ j);
goto out;
}
log_info("jid=%u: Repairing journal...\n", j);
@@ -426,10 +473,28 @@ int gfs2_recover_journal(struct gfs2_inode *ip, int j)
}
if (head.lh_flags & GFS2_LOG_HEAD_UNMOUNT) {
log_info("jid=%u: Journal is clean.\n", j);
+ *was_clean = 1;
return 0;
}
- if (query(&opts, "\nJournal #%d (\"journal%d\") is dirty. Okay to replay it? (y/n)",
- j+1, j)) {
+ if (opts.no) {
+ log_err("Journal #%d (\"journal%d\") is dirty; not replaying"
+ " due to the -n option.\n",
+ j+1, j);
+ goto out;
+ }
+ if (!preen_is_safe(sdp, preen, force_check)) {
+ log_err("Journal #%d (\"journal%d\") is dirty.\n", j+1, j);
+ log_err("I'm not replaying it because it may be unsafe:\n"
+ "Locking protocol is not lock_nolock and "
+ "the -a or -p option was specified.\n");
+ log_err("Please make sure no node has the file system "
+ "mounted then rerun fsck.gfs2 manually "
+ "without -a or -p.\n");
+ error = FSCK_ERROR;
+ goto out;
+ }
+ if (query(&opts, "\nJournal #%d (\"journal%d\") is dirty. Okay to "
+ "replay it? (y/n)", j+1, j)) {
log_info("jid=%u: Replaying journal...\n", j);
sd_found_jblocks = sd_replayed_jblocks = 0;
@@ -470,6 +535,9 @@ out:
/*
* replay_journals - replay the journals
* sdp: the super block
+ * preen: Was preen (-a or -p) specified?
+ * force_check: Was -f specified to force the check?
+ * @clean_journals - set to the number of clean journals we find
*
* There should be a flag to the fsck to enable/disable this
* feature. The fsck falls back to clearing the journal if an
@@ -477,10 +545,13 @@ out:
*
* Returns: 0 on success, -1 on failure
*/
-int replay_journals(struct gfs2_sbd *sdp){
+int replay_journals(struct gfs2_sbd *sdp, int preen, int force_check,
+ int *clean_journals)
+{
int i;
+ int clean = 0, dirty_journals = 0, error = 0, gave_msg = 0;
- log_notice("Recovering journals (this may take a while)");
+ *clean_journals = 0;
/* Get master dinode */
sdp->master_dir = gfs2_load_inode(sdp,
@@ -494,17 +565,27 @@ int replay_journals(struct gfs2_sbd *sdp){
}
for(i = 0; i < sdp->md.journals; i++) {
- if((i % 2) == 0)
- log_at_notice(".");
- gfs2_recover_journal(sdp->md.journal[i], i);
+ if (!error) {
+ error = gfs2_recover_journal(sdp->md.journal[i], i,
+ preen, force_check,
+ &clean);
+ if (!clean)
+ dirty_journals++;
+ if (!gave_msg && dirty_journals == 1 && !opts.no &&
+ preen_is_safe(sdp, preen, force_check)) {
+ gave_msg = 1;
+ log_notice("Recovering journals (this may "
+ "take a while)\n");
+ }
+ *clean_journals += clean;
+ }
inode_put(sdp->md.journal[i],
(opts.no ? not_updated : updated));
}
- log_notice("\nJournal recovery complete.\n");
inode_put(sdp->master_dir, not_updated);
inode_put(sdp->md.jiinode, not_updated);
/* Sync the buffers to disk so we get a fresh start. */
bsync(&sdp->buf_list);
bsync(&sdp->nvbuf_list);
- return 0;
+ return error;
}
diff --git a/gfs2/fsck/fs_recovery.h b/gfs2/fsck/fs_recovery.h
index b7ae2b0..c86a85d 100644
--- a/gfs2/fsck/fs_recovery.h
+++ b/gfs2/fsck/fs_recovery.h
@@ -16,7 +16,9 @@
#include "libgfs2.h"
-int replay_journals(struct gfs2_sbd *sdp);
+int replay_journals(struct gfs2_sbd *sdp, int preen, int force_check,
+ int *clean_journals);
+int preen_is_safe(struct gfs2_sbd *sdp, int preen, int force_check);
#endif /* __FS_RECOVERY_H__ */
diff --git a/gfs2/fsck/fsck.h b/gfs2/fsck/fsck.h
index e6043ac..57dd6b4 100644
--- a/gfs2/fsck/fsck.h
+++ b/gfs2/fsck/fsck.h
@@ -77,9 +77,9 @@ struct gfs2_inode *fsck_inode_get(struct gfs2_sbd *sdp,
struct gfs2_buffer_head *bh);
void fsck_inode_put(struct gfs2_inode *ip, enum update_flags update);
-int initialize(struct gfs2_sbd *sbp);
+int initialize(struct gfs2_sbd *sbp, int force_check, int preen,
+ int *all_clean);
void destroy(struct gfs2_sbd *sbp);
-int block_mounters(struct gfs2_sbd *sbp, int block_em);
int pass1(struct gfs2_sbd *sbp);
int pass1b(struct gfs2_sbd *sbp);
int pass1c(struct gfs2_sbd *sbp);
diff --git a/gfs2/fsck/initialize.c b/gfs2/fsck/initialize.c
index ede9f43..10c8ecc 100644
--- a/gfs2/fsck/initialize.c
+++ b/gfs2/fsck/initialize.c
@@ -21,6 +21,7 @@
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
+#include <errno.h>
#include "libgfs2.h"
#include "fsck.h"
@@ -35,20 +36,6 @@
}
/**
- * init_journals
- *
- * Go through journals and replay them - then clear them
- */
-int init_journals(struct gfs2_sbd *sbp)
-{
- if(!opts.no) {
- if(replay_journals(sbp))
- return 1;
- }
- return 0;
-}
-
-/**
* block_mounters
*
* Change the lock protocol so nobody can mount the fs
@@ -122,7 +109,8 @@ static void empty_super_block(struct gfs2_sbd *sdp)
}
}
- gfs2_block_list_destroy(sdp, bl);
+ if (bl)
+ gfs2_block_list_destroy(sdp, bl);
}
@@ -339,8 +327,13 @@ static int fill_super_block(struct gfs2_sbd *sdp)
* initialize - initialize superblock pointer
*
*/
-int initialize(struct gfs2_sbd *sbp)
+int initialize(struct gfs2_sbd *sbp, int force_check, int preen,
+ int *all_clean)
{
+ int clean_journals = 0;
+
+ *all_clean = 0;
+
if(opts.no) {
if ((sbp->device_fd = open(opts.device, O_RDONLY)) < 0) {
log_crit("Unable to open device: %s\n", opts.device);
@@ -348,8 +341,12 @@ int initialize(struct gfs2_sbd *sbp)
}
} else {
/* read in sb from disk */
- if ((sbp->device_fd = open(opts.device, O_RDWR)) < 0){
- log_crit("Unable to open device: %s\n", opts.device);
+ if ((sbp->device_fd = open(opts.device, O_RDWR | O_EXCL)) < 0){
+ if (errno == EBUSY)
+ log_crit("Device %s is busy.\n", opts.device);
+ else
+ log_crit("Unable to open device: %s\n",
+ opts.device);
return FSCK_USAGE;
}
}
@@ -359,7 +356,7 @@ int initialize(struct gfs2_sbd *sbp)
}
/* Change lock protocol to be fsck_* instead of lock_* */
- if(!opts.no) {
+ if(!opts.no && preen_is_safe(sbp, preen, force_check)) {
if(block_mounters(sbp, 1)) {
log_err("Unable to block other mounters\n");
return FSCK_USAGE;
@@ -368,12 +365,21 @@ int initialize(struct gfs2_sbd *sbp)
/* verify various things */
- if(init_journals(sbp)) {
- if(!opts.no)
+ if(replay_journals(sbp, preen, force_check, &clean_journals)) {
+ if(!opts.no && preen_is_safe(sbp, preen, force_check))
block_mounters(sbp, 0);
stack;
return FSCK_ERROR;
}
+ if (sbp->md.journals == clean_journals)
+ *all_clean = 1;
+ else {
+ if (force_check || !preen)
+ log_notice("\nJournal recovery complete.\n");
+ }
+
+ if (!force_check && *all_clean && preen)
+ return FSCK_OK;
if (init_system_inodes(sbp))
return FSCK_ERROR;
diff --git a/gfs2/fsck/main.c b/gfs2/fsck/main.c
index a3759d2..8c85c8d 100644
--- a/gfs2/fsck/main.c
+++ b/gfs2/fsck/main.c
@@ -29,7 +29,7 @@ struct gfs2_options opts = {0};
struct gfs2_inode *lf_dip; /* Lost and found directory inode */
osi_list_t dir_hash[FSCK_HASH_SIZE];
osi_list_t inode_hash[FSCK_HASH_SIZE];
-struct gfs2_block_list *bl;
+struct gfs2_block_list *bl = NULL;
uint64_t last_fs_block, last_reported_block = -1;
int skip_this_pass = FALSE, fsck_abort = FALSE;
int errors_found = 0, errors_corrected = 0;
@@ -37,6 +37,7 @@ const char *pass = "";
uint64_t last_data_block;
uint64_t first_data_block;
char *prog_name = "gfs2_fsck"; /* needed by libgfs2 */
+int preen = 0, force_check = 0;
/* This function is for libgfs2's sake. */
void print_it(const char *label, const char *fmt, const char *fmt2, ...)
@@ -51,7 +52,7 @@ void print_it(const char *label, const char *fmt, const char *fmt2, ...)
void usage(char *name)
{
- printf("Usage: %s [-hnqvVy] <device> \n", basename(name));
+ printf("Usage: %s [-afhnpqvVy] <device> \n", basename(name));
}
void version(void)
@@ -65,9 +66,16 @@ int read_cmdline(int argc, char **argv, struct gfs2_options *opts)
{
int c;
- while((c = getopt(argc, argv, "hnqvyV")) != -1) {
+ while((c = getopt(argc, argv, "afhnpqvyV")) != -1) {
switch(c) {
+ case 'a':
+ preen = 1;
+ opts->yes = 1;
+ break;
+ case 'f':
+ force_check = 1;
+ break;
case 'h':
usage(argv[0]);
exit(FSCK_OK);
@@ -75,6 +83,10 @@ int read_cmdline(int argc, char **argv, struct gfs2_options *opts)
case 'n':
opts->no = 1;
break;
+ case 'p':
+ preen = 1;
+ opts->yes = 1;
+ break;
case 'q':
decrease_verbosity();
break;
@@ -264,6 +276,7 @@ int main(int argc, char **argv)
int j;
enum update_flags update_sys_files;
int error = 0;
+ int all_clean = 0;
memset(sbp, 0, sizeof(*sbp));
@@ -271,9 +284,15 @@ int main(int argc, char **argv)
exit(error);
setbuf(stdout, NULL);
log_notice("Initializing fsck\n");
- if ((error = initialize(sbp)))
+ if ((error = initialize(sbp, force_check, preen, &all_clean)))
exit(error);
+ if (!force_check && all_clean && preen) {
+ log_err("%s: clean.\n", opts.device);
+ destroy(sbp);
+ exit(FSCK_OK);
+ }
+
signal(SIGINT, interrupt);
log_notice("Starting pass1\n");
pass = "pass 1";
diff --git a/gfs2/man/fsck.gfs2.8 b/gfs2/man/fsck.gfs2.8
index ef6d43a..6910760 100644
--- a/gfs2/man/fsck.gfs2.8
+++ b/gfs2/man/fsck.gfs2.8
@@ -22,15 +22,29 @@ fsck.gfs2 can do. If important file system structures are destroyed, such that
the checker cannot determine what the repairs should be, reparations could
fail.
-GFS2 is a journaled file system, and as such should be able to repair damages to
+GFS2 is a journaled file system, and as such should be able to repair damage to
the file system on its own. However, faulty hardware has the ability to write
incomplete blocks to a file system thereby causing corruption that GFS2 cannot
fix. The first step to ensuring a healthy file system is the selection of
reliable hardware (i.e. storage systems that will write complete blocks - even
in the event of power failure).
+Note: Most file system checkers will not check the file system if it is
+"clean" (i.e. unmounted since the last use). The fsck.gfs program behaves
+differently because the storage may be shared among several nodes in a
+cluster, and therefore problems may have been introduced on a different
+computer. Therefore, fsck.gfs2 will always check the file system unless
+the -p (preen) option is used, in which case it follows special rules
+(see below).
+
.SH OPTIONS
.TP
+\fB-a\fP
+Same as the -p (preen) option.
+.TP
+\fB-f\fP
+Force checking even if the file system seems clean.
+.TP
\fB-h\fP
Help.
@@ -45,6 +59,21 @@ No to all questions.
By specifying this option, fsck.gfs2 will only show the changes that
would be made, but not make any changes to the filesystem.
.TP
+\fB-p\fP
+Preen (same as -a: automatically repair the file system if it is dirty,
+and safe to do so, otherwise exit.)
+
+Note: If the file system has locking protocol lock_nolock, the file system
+is considered a non-shared storage device and the fsck is deemed safe.
+However, fsck.gfs2 does not know whether it was called automatically
+from the init process, due to options in the /etc/fstab file. Therefore, if
+the locking protocol is lock_dlm and -a or -p was specified, fsck.gfs2
+cannot determine whether the disk is mounted by other nodes in the cluster.
+Therefore, the fsck is deemed to be unsafe and a warning is given
+if any damage or dirty journals are found. In that case, the file system
+should be unmounted from all nodes in the cluster and fsck.gfs2 should be
+run manually without the -a or -p options.
+.TP
\fB-V\fP
Version.
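
Note (not part of the commit): the preen rules described in the man page text above can be summarized as a small decision function. This is a sketch under stated assumptions, not the patch's implementation. The enum `fsck_action` and the functions `preen_is_safe_sketch` and `decide` are hypothetical, and the lock protocol is compared as a whole string with strcmp(), whereas the patch compares the six bytes after the "lock_" prefix with memcmp().

```c
#include <assert.h>
#include <string.h>

/* Hypothetical summary of what fsck.gfs2 does after looking at the journals. */
enum fsck_action {
	ACTION_EXIT_CLEAN,    /* preen, all journals clean: report "clean", exit */
	ACTION_REFUSE_UNSAFE, /* preen on possibly shared storage with dirty journals */
	ACTION_FULL_CHECK     /* replay journals if needed and run the passes */
};

/* Same three rules as preen_is_safe() in the patch. */
int preen_is_safe_sketch(const char *lockproto, int preen, int force_check)
{
	assert(lockproto != NULL);
	if (!preen)       /* not started with -a/-p: the user is in control */
		return 1;
	if (force_check)  /* -f given: the user takes responsibility */
		return 1;
	/* Assumption: local storage is identified by the full protocol name. */
	return strcmp(lockproto, "lock_nolock") == 0;
}

enum fsck_action decide(const char *lockproto, int preen, int force_check,
			int all_clean)
{
	if (preen && !force_check && all_clean)
		return ACTION_EXIT_CLEAN;
	if (!all_clean && !preen_is_safe_sketch(lockproto, preen, force_check))
		return ACTION_REFUSE_UNSAFE;
	return ACTION_FULL_CHECK;
}
```

For example, `decide("lock_dlm", 1, 0, 0)` yields `ACTION_REFUSE_UNSAFE`, which corresponds to the case where the man page asks the administrator to unmount the file system on all nodes and rerun fsck.gfs2 manually without -a or -p.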