public inbox for overseers@sourceware.org
From: Joseph Myers <joseph@codesourcery.com>
To: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Sourceware Overseers <overseers@sourceware.org>
Subject: Re: ongoing sourceware.org recovery from disk corruption
Date: Tue, 15 Aug 2017 14:02:00 -0000
Message-ID: <alpine.DEB.2.20.1708151352020.10784@digraph.polyomino.org.uk>
In-Reply-To: <20170815133516.GH18258@redhat.com>

On Tue, 15 Aug 2017, Frank Ch. Eigler wrote:

> So we're proceeding to restore bits, file by file, when/as corruption
> is found.  It's silly laborious, and we'll appreciate your patience
> and help identifying affected files.  The version control repositories
> appear fine now, /ftp is getting mass-restored (since it's apprx. all
> old), so the most important stuff seems OK.  There are reports of some
> mailing list archives and wiki pages being broken; will look at those
> next.  Please come hang out on #overseers on irc.freenode.net to chat.

Suggested general comparison methodology (for the whole backup against 
live data, not just areas we know to be broken, and with due disregard of 
any areas we are confident have been safely fixed now - and of course any 
areas of old files that shouldn't change at all can just be restored with 
rsync -c without such comparisons needed):

* If the file contents have changed but the timestamp hasn't, it's almost 
certainly corruption (especially if the changes involve blocks of NUL 
bytes) and restoring the old version makes sense.

* If a file has disappeared, it's very probably corruption and restoring 
the old file is very probably safe (it *could* be a temporary file 
captured by the backup, or something that was properly removed, but more 
likely something lost by metadata corruption).

* If a file is new since the last backup, it may or may not also be 
corrupted, but there's not much that can be done about it if it is 
corrupted (except in special cases, e.g. getting copies of release files 
or version control data from elsewhere).

* If a file's contents and timestamp have both changed, it may or may not 
also be corrupted, and if it is, it may or may not make sense to restore 
from backup.  Depending on how many such files there are, we may need to 
consider case by case what should be done for particular kinds of files 
(e.g. if they should be text, checking for NUL bytes would help indicate 
corruption).
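
The four cases above can be sketched in Python roughly as follows.  This 
is only an illustration under stated assumptions, not a script anyone on 
the list has run: the names (Verdict, classify, has_nul_run), the 512-byte 
NUL-run threshold, and the whole-second timestamp comparison are all 
hypothetical, and it assumes regular files and a backup that preserved 
modification times.

```python
import os
from enum import Enum


class Verdict(Enum):
    UNCHANGED = "identical to backup"
    RESTORE_SILENT_CHANGE = "contents changed, timestamp unchanged: restore"
    RESTORE_MISSING = "missing from live tree: very probably restore"
    NEW_UNVERIFIABLE = "new since backup: nothing to compare against"
    REVIEW = "contents and timestamp both changed: decide case by case"


def same_contents(path_a, path_b, chunk=1 << 20):
    """Byte-for-byte comparison of two regular files."""
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            block_a = fa.read(chunk)
            block_b = fb.read(chunk)
            if block_a != block_b:
                return False
            if not block_a:
                return True


def has_nul_run(path, min_run=512, chunk=1 << 20):
    """True if the file contains at least min_run consecutive NUL bytes.

    The threshold is an assumption; for files that should be plain text,
    min_run=1 flags any NUL byte at all.
    """
    needle = b"\0" * min_run
    tail = b""
    with open(path, "rb") as f:
        while block := f.read(chunk):
            data = tail + block
            if needle in data:
                return True
            # Keep enough trailing bytes to catch a run spanning two reads.
            tail = data[-(min_run - 1):] if min_run > 1 else b""
    return False


def classify(backup_path, live_path):
    """Map one backup/live pair of paths onto the four cases in the list."""
    in_backup = os.path.isfile(backup_path)
    in_live = os.path.isfile(live_path)
    if not in_backup and not in_live:
        raise FileNotFoundError(f"{backup_path} and {live_path} both missing")
    if in_backup and not in_live:
        return Verdict.RESTORE_MISSING
    if in_live and not in_backup:
        return Verdict.NEW_UNVERIFIABLE
    if same_contents(backup_path, live_path):
        return Verdict.UNCHANGED
    # Compare whole seconds only, in case the backup dropped sub-second
    # precision (an assumption about the backup, not a known fact).
    same_mtime = (int(os.stat(backup_path).st_mtime)
                  == int(os.stat(live_path).st_mtime))
    return Verdict.RESTORE_SILENT_CHANGE if same_mtime else Verdict.REVIEW
```

For files that land in the REVIEW or NEW_UNVERIFIABLE cases, 
has_nul_run(live_path) gives one further hint, since blocks of NUL bytes 
in what should be text point towards corruption.  Walking both trees (for 
example with os.walk) and calling classify on each relative path would 
cover the whole backup against live data.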

-- 
Joseph S. Myers
joseph@codesourcery.com

