public inbox for lvm2-cvs@sourceware.org help / color / mirror / Atom feed
From: jbrassow@sourceware.org To: lvm-devel@redhat.com, lvm2-cvs@sourceware.org Subject: LVM2 ./WHATS_NEW daemons/cmirrord/functions.c Date: Tue, 17 Aug 2010 23:56:00 -0000 [thread overview] Message-ID: <20100817235625.9805.qmail@sourceware.org> (raw) CVSROOT: /cvs/lvm2 Module name: LVM2 Changes by: jbrassow@sourceware.org 2010-08-17 23:56:24 Modified files: . : WHATS_NEW daemons/cmirrord: functions.c Log message: Fix for bug 596453: multiple mirror image failures cause lvm repair... The lvm repair issues I believe are the superficial symptoms of this bug - there are worse issues that are not as clearly seen. From my inline comments: * If the mirror was successfully recovered, we want to always * force every machine to write to all devices - otherwise, * corruption will occur. Here's how: * Node1 suffers a failure and marks a region out-of-sync * Node2 attempts a write, gets by is_remote_recovering, * and queries the sync status of the region - finding * it out-of-sync. * Node2 thinks the write should be a nosync write, but it * hasn't suffered the drive failure that Node1 has yet. * It then issues a generic_make_request directly to * the primary image only - which is exactly the device * that has suffered the failure. * Node2 suffers a lost write - which completely bypasses the * mirror layer because it had gone through generic_m_r. * The file system will likely explode at this point due to * I/O errors. If it wasn't the primary that failed, it is * easily possible in this case to issue writes to just one * of the remaining images - also leaving the mirror inconsistent. * * We let in_sync() return 1 in a cluster regardless of what is * in the bitmap once recovery has successfully completed on a * mirror. This ensures the mirroring code will continue to * attempt to write to all mirror images. The worst that can * happen for reads is that additional read attempts may be * taken. Patches: http://sourceware.org/cgi-bin/cvsweb.cgi/LVM2/WHATS_NEW.diff?cvsroot=lvm2&r1=1.1709&r2=1.1710 http://sourceware.org/cgi-bin/cvsweb.cgi/LVM2/daemons/cmirrord/functions.c.diff?cvsroot=lvm2&r1=1.22&r2=1.23 --- LVM2/WHATS_NEW 2010/08/17 19:25:05 1.1709 +++ LVM2/WHATS_NEW 2010/08/17 23:56:23 1.1710 @@ -1,5 +1,6 @@ Version 2.02.73 - ================================ + Fix potential for corruption during cluster mirror device failure. Use 'SINGLENODE' instead of 'dead' in clvmd singlenode messages. Ignore snapshots when performing mirror recovery beneath an origin. Pass LCK_ORIGIN_ONLY flag around cluster. --- LVM2/daemons/cmirrord/functions.c 2010/08/04 18:18:18 1.22 +++ LVM2/daemons/cmirrord/functions.c 2010/08/17 23:56:24 1.23 @@ -54,6 +54,7 @@ time_t delay; /* limits how fast a resume can happen after suspend */ int touched; + int in_sync; /* An in-sync that stays set until suspend/resume */ uint32_t region_size; uint32_t region_count; uint64_t sync_count; @@ -720,6 +721,7 @@ if (!lc) return -EINVAL; + lc->in_sync = 0; switch (lc->resume_override) { case 1000: LOG_ERROR("[%s] Additional resume issued before suspend", @@ -963,6 +965,42 @@ return -EINVAL; *rtn = log_test_bit(lc->sync_bits, region); + + /* + * If the mirror was successfully recovered, we want to always + * force every machine to write to all devices - otherwise, + * corruption will occur. Here's how: + * Node1 suffers a failure and marks a region out-of-sync + * Node2 attempts a write, gets by is_remote_recovering, + * and queries the sync status of the region - finding + * it out-of-sync. + * Node2 thinks the write should be a nosync write, but it + * hasn't suffered the drive failure that Node1 has yet. + * It then issues a generic_make_request directly to + * the primary image only - which is exactly the device + * that has suffered the failure. + * Node2 suffers a lost write - which completely bypasses the + * mirror layer because it had gone through generic_m_r. + * The file system will likely explode at this point due to + * I/O errors. If it wasn't the primary that failed, it is + * easily possible in this case to issue writes to just one + * of the remaining images - also leaving the mirror inconsistent. + * + * We let in_sync() return 1 in a cluster regardless of what is + * in the bitmap once recovery has successfully completed on a + * mirror. This ensures the mirroring code will continue to + * attempt to write to all mirror images. The worst that can + * happen for reads is that additional read attempts may be + * taken. + * + * Futher investigation may be required to determine if there are + * similar possible outcomes when the mirror is in the process of + * recovering. In that case, lc->in_sync would not have been set + * yet. + */ + if (!*rtn && lc->in_sync) + *rtn = 1; + if (*rtn) LOG_DBG("[%s] Region is in-sync: %llu", SHORT_UUID(lc->uuid), (unsigned long long)region); @@ -1282,7 +1320,7 @@ lc->skip_bit_warning = lc->region_count; if (pkg->region > (lc->skip_bit_warning + 5)) { - LOG_ERROR("*** Region #%llu skipped during recovery ***", + LOG_SPRINT(lc, "*** Region #%llu skipped during recovery ***", (unsigned long long)lc->skip_bit_warning); lc->skip_bit_warning = lc->region_count; #ifdef DEBUG @@ -1324,6 +1362,9 @@ "(lc->sync_count > lc->region_count) - this is bad", rq->seq, SHORT_UUID(lc->uuid), originator); + if (lc->sync_count == lc->region_count) + lc->in_sync = 1; + rq->data_size = 0; return 0; }
next reply other threads:[~2010-08-17 23:56 UTC|newest] Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top 2010-08-17 23:56 jbrassow [this message] -- strict thread matches above, loose matches on Subject: below -- 2012-02-08 11:34 zkabelac 2011-09-06 18:24 zkabelac 2010-12-20 13:57 zkabelac 2010-08-30 18:37 jbrassow 2010-08-04 18:18 jbrassow 2010-06-18 20:58 jbrassow 2010-01-15 18:48 jbrassow
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20100817235625.9805.qmail@sourceware.org \ --to=jbrassow@sourceware.org \ --cc=lvm-devel@redhat.com \ --cc=lvm2-cvs@sourceware.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).