public inbox for overseers@sourceware.org
* A time to pause and fsck
@ 2004-04-08 19:02 Christopher Faylor
  2004-04-08 19:12 ` law
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Christopher Faylor @ 2004-04-08 19:02 UTC (permalink / raw)
  To: overseers

We'd like to take the system down for a thorough fscking sometime
soon.  Unfortunately, the only time we really have anyone available
to do that is in the afternoon EDT.

So, I'm proposing 2PM EDT 2004-04-15 for a few hours of down time, i.e.

  Thu 2004-04-15 18:00 GMT

  Thu 2004-04-15 11:00 US/Pacific
  Thu 2004-04-15 11:00 US/Arizona
  Thu 2004-04-15 12:00 US/Mountain
  Thu 2004-04-15 13:00 US/Central
  Thu 2004-04-15 14:00 US/Eastern
  Thu 2004-04-15 14:00 Canada/Eastern
  Thu 2004-04-15 15:00 America/Sao_Paulo
  Thu 2004-04-15 19:00 Europe/London
  Thu 2004-04-15 20:00 Europe/Berlin
  Fri 2004-04-16 04:00 Australia/Victoria
  Fri 2004-04-16 04:00 Australia/Sydney
  Fri 2004-04-16 03:00 Japan

I know that this isn't the best of times for this interruption but it
is pretty important that we check out the health of the system disks.

Objections?

cgf
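
For anyone who wants to double-check a table like the one above, here is a minimal sketch of the conversion using Python's standard zoneinfo module. It assumes a system tz database is installed, and it substitutes canonical IANA zone names for the legacy aliases used in the message (for example America/Los_Angeles for US/Pacific); the function name local_times is made up for illustration.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Canonical IANA names, assumed to be equivalent to the legacy aliases
# in the announcement (noted in the trailing comments).
ZONES = [
    "America/Los_Angeles",  # US/Pacific
    "America/Phoenix",      # US/Arizona
    "America/Denver",       # US/Mountain
    "America/Chicago",      # US/Central
    "America/New_York",     # US/Eastern
    "America/Toronto",      # Canada/Eastern
    "America/Sao_Paulo",
    "Europe/London",
    "Europe/Berlin",
    "Australia/Melbourne",  # Australia/Victoria
    "Australia/Sydney",
    "Asia/Tokyo",           # Japan
]


def local_times(utc_moment, zones=ZONES):
    """Map each zone name to 'Dow YYYY-MM-DD HH:MM' for one UTC instant."""
    return {
        zone: utc_moment.astimezone(ZoneInfo(zone)).strftime("%a %Y-%m-%d %H:%M")
        for zone in zones
    }


# The proposed downtime: 18:00 GMT on 2004-04-15.
proposed = datetime(2004, 4, 15, 18, 0, tzinfo=timezone.utc)
for zone, stamp in local_times(proposed).items():
    print(f"{stamp} {zone}")
```

Generating the list from a single UTC instant avoids hand-editing each row when the date moves.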


* Re: A time to pause and fsck
  2004-04-08 19:02 A time to pause and fsck Christopher Faylor
  2004-04-08 19:12 ` law
@ 2004-04-08 19:12 ` David Edelsohn
  2004-04-08 19:58   ` Christopher Faylor
  2004-04-08 19:14 ` Matthew Galgoci
  2 siblings, 1 reply; 14+ messages in thread
From: David Edelsohn @ 2004-04-08 19:12 UTC (permalink / raw)
  To: overseers

	Do we know that the 3 hour reboot this morning did not fsck the
disks?  Why else would it take that long?

	Also, I think this should be done much sooner than next week.
2004-04-15 is around the time when GCC 3.4.0 might be released.

David


* Re: A time to pause and fsck
  2004-04-08 19:02 A time to pause and fsck Christopher Faylor
@ 2004-04-08 19:12 ` law
  2004-04-08 19:12 ` David Edelsohn
  2004-04-08 19:14 ` Matthew Galgoci
  2 siblings, 0 replies; 14+ messages in thread
From: law @ 2004-04-08 19:12 UTC (permalink / raw)
  To: Christopher Faylor; +Cc: overseers

In message <20040408185310.GA13499@coc.bosbc.com>, Christopher Faylor writes:
 >We'd like to take the system down for a thorough fscking sometime
 >soon.  Unfortunately, the only time we really have anyone available
 >to do that is in the afternoon EDT.
 >
 >So, I'm proposing 2PM EDT 2004-04-15 for a few hours of down time, i.e.
 >
 >  Thu 2004-04-15 18:00 GMT
 >
 >  Thu 2004-04-15 11:00 US/Pacific
 >  Thu 2004-04-15 11:00 US/Arizona
 >  Thu 2004-04-15 12:00 US/Mountain
 >  Thu 2004-04-15 13:00 US/Central
 >  Thu 2004-04-15 14:00 US/Eastern
 >  Thu 2004-04-15 14:00 Canada/Eastern
 >  Thu 2004-04-15 15:00 America/Sao_Paulo
 >  Thu 2004-04-15 19:00 Europe/London
 >  Thu 2004-04-15 20:00 Europe/Berlin
 >  Fri 2004-04-16 04:00 Australia/Victoria
 >  Fri 2004-04-16 04:00 Australia/Sydney
 >  Fri 2004-04-16 03:00 Japan
 >
 >I know that this isn't the best of times for this interruption but it
 >is pretty important that we check out the health of the system disks.
 >
 >Objections?
None.  Please take it down and get it checked out -- better to be offline
for a few hours this afternoon and ensure things are OK than to leave it
running and risk something getting horked in a bad way.

jeff



* Re: A time to pause and fsck
  2004-04-08 19:02 A time to pause and fsck Christopher Faylor
  2004-04-08 19:12 ` law
  2004-04-08 19:12 ` David Edelsohn
@ 2004-04-08 19:14 ` Matthew Galgoci
  2004-04-08 19:17   ` Matthew Galgoci
  2004-04-08 19:55   ` Christopher Faylor
  2 siblings, 2 replies; 14+ messages in thread
From: Matthew Galgoci @ 2004-04-08 19:14 UTC (permalink / raw)
  To: Christopher Faylor; +Cc: overseers


I know it is short notice, but given the nature of the beast, I would
suggest something sooner. I planned on working tomorrow (a holiday for
many) and I can also do the fsck at that time.

Translation:

Thu 2004-04-9 18:00 GMT
Thu 2004-04-9 11:00 US/Pacific
Thu 2004-04-9 11:00 US/Arizona
Thu 2004-04-9 12:00 US/Mountain
Thu 2004-04-9 13:00 US/Central
Thu 2004-04-9 14:00 US/Eastern
Thu 2004-04-9 14:00 Canada/Eastern
Thu 2004-04-9 15:00 America/Sao_Paulo
Thu 2004-04-9 19:00 Europe/London
Thu 2004-04-9 20:00 Europe/Berlin
Fri 2004-04-10 04:00 Australia/Victoria
Fri 2004-04-10 04:00 Australia/Sydney
Fri 2004-04-10 03:00 Japan

Filesystem corruption only gets worse as time goes on ;(

On Thu, 8 Apr 2004, Christopher Faylor wrote:

> We'd like to take the system down for a thorough fscking sometime
> soon.  Unfortunately, the only time we really have anyone available
> to do that is in the afternoon EDT.
> 
> So, I'm proposing 2PM EDT 2004-04-15 for a few hours of down time, i.e.
> 
>   Thu 2004-04-15 18:00 GMT
> 
>   Thu 2004-04-15 11:00 US/Pacific
>   Thu 2004-04-15 11:00 US/Arizona
>   Thu 2004-04-15 12:00 US/Mountain
>   Thu 2004-04-15 13:00 US/Central
>   Thu 2004-04-15 14:00 US/Eastern
>   Thu 2004-04-15 14:00 Canada/Eastern
>   Thu 2004-04-15 15:00 America/Sao_Paulo
>   Thu 2004-04-15 19:00 Europe/London
>   Thu 2004-04-15 20:00 Europe/Berlin
>   Fri 2004-04-16 04:00 Australia/Victoria
>   Fri 2004-04-16 04:00 Australia/Sydney
>   Fri 2004-04-16 03:00 Japan
> 
> I know that this isn't the best of times for this interruption but it
> is pretty important that we check out the health of the system disks.
> 
> Objections?
> 
> cgf
> 

-- 
Matthew Galgoci
System Administrator and Sr. Manager of Ruminants
Red Hat, Inc
919.754.3700 x44155


* Re: A time to pause and fsck
  2004-04-08 19:14 ` Matthew Galgoci
@ 2004-04-08 19:17   ` Matthew Galgoci
  2004-04-08 19:55   ` Christopher Faylor
  1 sibling, 0 replies; 14+ messages in thread
From: Matthew Galgoci @ 2004-04-08 19:17 UTC (permalink / raw)
  To: Christopher Faylor; +Cc: overseers


It's been a long day. Sigh.
 
> Translation:
> 
> Fri 2004-04-9 18:00 GMT
> Fri 2004-04-9 11:00 US/Pacific
> Fri 2004-04-9 11:00 US/Arizona
> Fri 2004-04-9 12:00 US/Mountain
> Fri 2004-04-9 13:00 US/Central
> Fri 2004-04-9 14:00 US/Eastern
> Fri 2004-04-9 14:00 Canada/Eastern
> Fri 2004-04-9 15:00 America/Sao_Paulo
> Fri 2004-04-9 19:00 Europe/London
> Fri 2004-04-9 20:00 Europe/Berlin
> Sat 2004-04-10 04:00 Australia/Victoria
> Sat 2004-04-10 04:00 Australia/Sydney
> Sat 2004-04-10 03:00 Japan
 
> Filesystem corruption only gets worse as time goes on ;(

-- 
Matthew Galgoci
System Administrator and Sr. Manager of Ruminants
Red Hat, Inc
919.754.3700 x44155


* Re: A time to pause and fsck
  2004-04-08 19:14 ` Matthew Galgoci
  2004-04-08 19:17   ` Matthew Galgoci
@ 2004-04-08 19:55   ` Christopher Faylor
  1 sibling, 0 replies; 14+ messages in thread
From: Christopher Faylor @ 2004-04-08 19:55 UTC (permalink / raw)
  To: Matthew Galgoci; +Cc: overseers

On Thu, Apr 08, 2004 at 03:14:14PM -0400, Matthew Galgoci wrote:
>I know it is short notice, but given the nature of the beast, I would
>suggest something sooner. I planned on working tomorrow (a holiday for
>many) and I can also do the fsck at that time.

Sooner is fine with me.


* Re: A time to pause and fsck
  2004-04-08 19:12 ` David Edelsohn
@ 2004-04-08 19:58   ` Christopher Faylor
  2004-04-08 20:10     ` Matthew Galgoci
  0 siblings, 1 reply; 14+ messages in thread
From: Christopher Faylor @ 2004-04-08 19:58 UTC (permalink / raw)
  To: David Edelsohn; +Cc: overseers

On Thu, Apr 08, 2004 at 03:12:20PM -0400, David Edelsohn wrote:
>Do we know that the 3 hour reboot this morning did not fsck the disks?
>Why else would it take that long?

It didn't.  I should have figured this out from /var/log/messages, actually.

The system was actually down for three hours.

>Also, I think this should be done much sooner than next week.
>2004-04-15 is around the time when GCC 3.4.0 might be released.

Ok.  Given that the system seems to be behaving now and that we often get
complaints when we bring down the system without enough advance notice,
I obviously erred too much on the side of advance notice.

I, too, would prefer that it be fscked ASAP.

cgf


* Re: A time to pause and fsck
  2004-04-08 19:58   ` Christopher Faylor
@ 2004-04-08 20:10     ` Matthew Galgoci
  2004-04-08 20:51       ` law
  0 siblings, 1 reply; 14+ messages in thread
From: Matthew Galgoci @ 2004-04-08 20:10 UTC (permalink / raw)
  To: Christopher Faylor; +Cc: David Edelsohn, overseers


I'm happy to fsck it right now if I have the go-ahead.

On Thu, 8 Apr 2004, Christopher Faylor wrote:

> On Thu, Apr 08, 2004 at 03:12:20PM -0400, David Edelsohn wrote:
> >Do we know that the 3 hour reboot this morning did not fsck the disks?
> >Why else would it take that long?
> 
> It didn't.  I should have figured this out from /var/log/messages, actually.
> 
> The system was actually down for three hours.
> 
> >Also, I think this should be done much sooner than next week.
> >2004-04-15 is around the time when GCC 3.4.0 might be released.
> 
> Ok.  Given that the system seems to be behaving now and that we often get
> complaints when we bring down the system without enough advance notice,
> I obviously erred too much on the side of advance notice.
> 
> I, too, would prefer that it be fscked ASAP.

-- 
Matthew Galgoci
System Administrator and Sr. Manager of Ruminants
Red Hat, Inc
919.754.3700 x44155


* Re: A time to pause and fsck
  2004-04-08 20:10     ` Matthew Galgoci
@ 2004-04-08 20:51       ` law
  2004-04-08 22:31         ` Matthew Galgoci
  2004-04-08 22:38         ` Angela Marie Thomas
  0 siblings, 2 replies; 14+ messages in thread
From: law @ 2004-04-08 20:51 UTC (permalink / raw)
  To: Matthew Galgoci; +Cc: Christopher Faylor, David Edelsohn, overseers

In message <Pine.LNX.4.44.0404081610120.447-100000@lacrosse.corp.redhat.com>,
Matthew Galgoci writes:
 >
 >I'm happy to fsck it right now if I have the go-ahead.
That's fine by me.

I'd suggest that if there aren't any objections within the next 30 minutes
that you go ahead with the fsck.  Again, I'd rather be safe than sorry given
the problems we're having.

jeff


* Re: A time to pause and fsck
  2004-04-08 20:51       ` law
@ 2004-04-08 22:31         ` Matthew Galgoci
  2004-04-08 22:56           ` Christopher Faylor
  2004-04-08 22:38         ` Angela Marie Thomas
  1 sibling, 1 reply; 14+ messages in thread
From: Matthew Galgoci @ 2004-04-08 22:31 UTC (permalink / raw)
  To: law; +Cc: Christopher Faylor, David Edelsohn, overseers

>  >I'm happy to fsck it right now if I have the go-ahead.
> That's fine by me.
> 
> I'd suggest that if there aren't any objections within the next 30 minutes
> that you go ahead with the fsck.  Again, I'd rather be safe than sorry given
> the problems we're having.

Ok, this is done.

Apparently forcefsck doesn't seem to work for logical volumes. I had to drop the machine
to single user mode, unmount everything one LV at a time, and fsck by hand.

cvs-temp was indeed dirty, requiring multiple passes of e2fsck. Any inodes that were orphaned
were connected to lost+found.

/home also had a few deleted inodes that needed to be cleared.

Recommendation: after each crash, the system needs to be fscked by hand.
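
A rough sketch of the by-hand procedure described above, written as a dry run so nothing is executed by default. The volume group name vg0, the /dev/<vg>/<lv> device paths, and the helper names are assumptions for illustration, not the actual layout of the machine. It presumes the box is already in single user mode.

```python
import subprocess


def fsck_plan(volume_group, logical_volumes):
    """Build the umount + e2fsck command pair for each logical volume."""
    plan = []
    for lv in logical_volumes:
        device = f"/dev/{volume_group}/{lv}"     # assumed LVM device path
        plan.append(["umount", device])          # unmount one LV at a time
        plan.append(["e2fsck", "-f", device])    # -f forces a full check
    return plan


def run(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in plan:
        print(" ".join(argv))
        if not dry_run:
            # e2fsck exits non-zero when it repairs something, so a
            # non-zero status is not treated as fatal here.
            subprocess.run(argv, check=False)


# Hypothetical volume group name; the LV names echo the ones mentioned above.
run(fsck_plan("vg0", ["cvs-temp", "home"]))
```

Since e2fsck reports repairs through its exit status, a real run would probably want to repeat the check on any volume that did not come back clean, which matches the multiple passes needed on cvs-temp.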

-- 
Matthew Galgoci
System Administrator and Sr. Manager of Ruminants
Red Hat, Inc
919.754.3700 x44155


* Re: A time to pause and fsck
  2004-04-08 20:51       ` law
  2004-04-08 22:31         ` Matthew Galgoci
@ 2004-04-08 22:38         ` Angela Marie Thomas
  1 sibling, 0 replies; 14+ messages in thread
From: Angela Marie Thomas @ 2004-04-08 22:38 UTC (permalink / raw)
  To: law; +Cc: Matthew Galgoci, Christopher Faylor, David Edelsohn, overseers


> I'd suggest that if there aren't any objections within the next 30 minutes
> that you go ahead with the fsck.  Again, I'd rather be safe than sorry given
> the problems we're having.

I don't have an objection, but given the number of tags created,
would it be worthwhile to do the fsck immediately after I have a
more up-to-date snapshot of /sourceware/projects?

--Angela


* Re: A time to pause and fsck
  2004-04-08 22:31         ` Matthew Galgoci
@ 2004-04-08 22:56           ` Christopher Faylor
  2004-04-09  0:20             ` Angela Marie Thomas
  0 siblings, 1 reply; 14+ messages in thread
From: Christopher Faylor @ 2004-04-08 22:56 UTC (permalink / raw)
  To: Matthew Galgoci; +Cc: law, David Edelsohn, overseers

On Thu, Apr 08, 2004 at 06:31:48PM -0400, Matthew Galgoci wrote:
>>  >I'm happy to fsck it right now if I have the go-ahead.
>> That's fine by me.
>> 
>> I'd suggest that if there aren't any objections within the next 30 minutes
>> that you go ahead with the fsck.  Again, I'd rather be safe than sorry given
>> the problems we're having.
>
>Ok, this is done.
>
>Apparently forcefsck doesn't seem to work for logical volumes. I had to drop the machine
>to single user mode, unmount everything one LV at a time, and fsck by hand.
>
>cvs-temp was indeed dirty, requiring multiple passes of e2fsck. Any inodes that were orphaned
>were connected to lost+found.

Thanks, Matt.

The fsck problem is due to the way fstab is laid out with labels rather
than partitions, IIRC.  I keep expecting that some new e2fsprogs update
will fix this.

One thing I was thinking about was just doing a mke2fs on cvs-temp on
every reboot.  There is nothing on that partition worth keeping and it
just gets wiped out at every reboot anyway.

>/home also had a few deleted inodes that needed to be cleared.
>
>Recommendation: after each crash, the system needs to be fscked by hand.

Huh.  It's surprising that it was /home that needed cleaning since I would expect
that to be relatively stable.

cgf


* Re: A time to pause and fsck
  2004-04-08 22:56           ` Christopher Faylor
@ 2004-04-09  0:20             ` Angela Marie Thomas
  2004-04-13 20:19               ` Frank Ch. Eigler
  0 siblings, 1 reply; 14+ messages in thread
From: Angela Marie Thomas @ 2004-04-09  0:20 UTC (permalink / raw)
  To: Christopher Faylor; +Cc: Matthew Galgoci, law, David Edelsohn, overseers


> Huh.  It's surprising that it was /home that needed cleaning since I would expect
> that to be relatively stable.

Ben's still writing his sql data to his homedir.  That keeps home
fairly active.

--Angela


* Re: A time to pause and fsck
  2004-04-09  0:20             ` Angela Marie Thomas
@ 2004-04-13 20:19               ` Frank Ch. Eigler
  0 siblings, 0 replies; 14+ messages in thread
From: Frank Ch. Eigler @ 2004-04-13 20:19 UTC (permalink / raw)
  To: angela; +Cc: overseers, bje


Hi -

On Thu, Apr 08, 2004 at 05:36:21PM -0700, Angela Marie Thomas wrote:
> > Huh.  It's surprising that it was /home that needed cleaning since
> > I would expect that to be relatively stable.
> 
> Ben's still writing his sql data to his homedir.  That keeps home
> fairly active.

According to bje, it's an accident that this is currently running.
I'll zap it wherever I find it.  However, he says he'd like to find
some official host for the database back-end & web front-end for the
service, so it can come back up sometime.

- FChE



Thread overview: 14+ messages
2004-04-08 19:02 A time to pause and fsck Christopher Faylor
2004-04-08 19:12 ` law
2004-04-08 19:12 ` David Edelsohn
2004-04-08 19:58   ` Christopher Faylor
2004-04-08 20:10     ` Matthew Galgoci
2004-04-08 20:51       ` law
2004-04-08 22:31         ` Matthew Galgoci
2004-04-08 22:56           ` Christopher Faylor
2004-04-09  0:20             ` Angela Marie Thomas
2004-04-13 20:19               ` Frank Ch. Eigler
2004-04-08 22:38         ` Angela Marie Thomas
2004-04-08 19:14 ` Matthew Galgoci
2004-04-08 19:17   ` Matthew Galgoci
2004-04-08 19:55   ` Christopher Faylor
