public inbox for overseers@sourceware.org
* web/665: gcc list archives for Jan 1999 still broken (fwd)
  2000-12-30  6:08 web/665: gcc list archives for Jan 1999 still broken (fwd) Gerald Pfeifer
@ 2000-11-09 18:04 ` Gerald Pfeifer
  2000-12-30  6:08 ` Jason Molenda
  1 sibling, 0 replies; 13+ messages in thread
From: Gerald Pfeifer @ 2000-11-09 18:04 UTC (permalink / raw)
  To: overseers

Can someone please have a look?

I believe Jason's mbox archives could be used to regenerate these
broken archives?

Gerald

---------- Forwarded message ----------
>Number:         665
>Category:       web
>Synopsis:       gcc list archives for Jan 1999 still broken
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    gerald
>State:          open
>Class:          doc-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Oct 20 16:36:00 PDT 2000
>Closed-Date:
>Last-Modified:
>Originator:     Joseph S. Myers
>Release:        2.97
>Organization:
none
>Environment:
System: Linux decomino 2.2.17 #1 Mon Sep 4 20:22:16 UTC 2000 i686 unknown
Architecture: i686


host: i686-pc-linux-gnu
build: i686-pc-linux-gnu
target: i686-pc-linux-gnu
>Description:

As I previously reported to gcc-bugs back in April, and still the case:

	The gcc list archives for January 1999
	( http://gcc.gnu.org/ml/gcc/1999-01n/ ) are broken.  Note the lack of any
	messages listed that month from before 20 January, and the inclusion of
	some messages from February 2000 and March 1998.

The strange dates may be due to broken clocks (perhaps regeneration
should list dates according to when the message arrived at the GCC
server instead of the Date header) and aren't that important, but the
lack of messages for much of January should be fixed.  (The FTP
archive for that month seems to have the same problem.)

>How-To-Repeat:

>Fix:

If someone has a personal archive of the gcc/egcs list for that month,
use it to replace the missing messages.

For the strange dates, regenerate the old archives (again) using a
different source for the date in the index, if this can be done
without changing the URLs to the archived messages.
>Release-Note:
>Audit-Trail:
>Unformatted:

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: web/665: gcc list archives for Jan 1999 still broken (fwd)
  2000-12-30  6:08 ` Jason Molenda
@ 2000-11-09 19:11   ` Jason Molenda
  2000-12-30  6:08   ` Gerald Pfeifer
  1 sibling, 0 replies; 13+ messages in thread
From: Jason Molenda @ 2000-11-09 19:11 UTC (permalink / raw)
  To: Gerald Pfeifer; +Cc: overseers

On Fri, Nov 10, 2000 at 03:04:28AM +0100, Gerald Pfeifer wrote:
> Can someone please have a look?
>
> I believe Jason's mbox archives could be used to regenerate these
> broken archives?

My memory of the details from this timeframe is not good - everything
was happening very quickly back then.

The gcc list was one of the first mailing lists moved from @cygnus.com
to @sourceware (aka egcs aka gcc, etc).  JeffL did this because
the cygnus.com mail server was taking 24 hours to distribute mail
notes to the whole mailing list - it was a huge mess.

Jeff made the switch on Dec 7 1998.

I didn't have any mbox style list archiving in place until late
January (I'm betting Jan 20), so the egcs-1999-01 file only has
entries starting then.

My early archiving attempts were, um, not very well thought out.
The funky dates on there (and those weirdo anon notes at the front)
are at least partially due to this.  You'll see this in many of the
old web archives on gcc/sourceware until around May when I got that
cleaned up.


Now keep in mind, ezmlm maintains its own archives of all mail
notes sent to every list, and I've never deleted a single one of
those.  They aren't in mbox format, but they're pretty close - pipe
them through formail and they are acceptable (although I think the
date on the From_ header is the date that you did the processing).
The egcs-1999-01 mbox archive could be recreated from these ezmlm
archives.
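
A rough sketch of that kind of rebuild, for illustration only.  It uses
Python's standard mailbox and email modules in place of formail, and it
assumes each ezmlm archive entry is a file holding exactly one message.  The
function names (arrival_time, rebuild_mbox) are made up for this example.
To avoid stamping every From_ line with the processing date, it takes the
date from the topmost Received header, falling back to the Date header.

```python
import email
import email.utils
import mailbox
import time


def arrival_time(msg):
    """Return a UTC struct_time for the message, or None if no date parses.

    Prefers the topmost Received header (added by the last server to handle
    the message) and falls back to the sender-supplied Date header.
    """
    candidates = []
    for received in msg.get_all("Received") or []:
        # In a Received header the timestamp follows the final ';'.
        _, sep, stamp = str(received).rpartition(";")
        if sep:
            candidates.append(stamp)
    if msg.get("Date"):
        candidates.append(str(msg["Date"]))
    for stamp in candidates:
        parsed = email.utils.parsedate_tz(stamp.strip())
        if parsed is not None:
            return time.gmtime(email.utils.mktime_tz(parsed))
    return None


def rebuild_mbox(message_paths, mbox_path):
    """Append each single-message file to mbox_path; return the count added."""
    box = mailbox.mbox(mbox_path)
    added = 0
    try:
        for path in message_paths:
            with open(path, "rb") as fh:
                msg = email.message_from_binary_file(fh)
            entry = mailbox.mboxMessage(msg)
            sender = email.utils.parseaddr(
                str(msg.get("Return-Path") or msg.get("From") or ""))[1]
            # set_from() appends the formatted time to the envelope sender.
            # Passing True stamps the current time instead, which is the
            # processing-date effect mentioned above; it is only the fallback.
            entry.set_from(sender or "MAILER-DAEMON", arrival_time(msg) or True)
            box.add(entry)
            added += 1
    finally:
        box.close()
    return added
```

Messages are appended in whatever order the caller supplies the paths, so
the caller would need to sort them (for ezmlm, presumably by message number)
before calling rebuild_mbox.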

In fact, it appears to me that egcs-1998-12 was created in May 1999
by me by running the archive messages through formail like that.
Up until Dec 4, the messages all have normal From_ headers - after
that, they're dated in May.

I probably didn't recreate the egcs-1999-01 archive out of laziness or
thinking it wasn't that big of a deal or something.  Or maybe I just
didn't notice that it was only two-thirds of the month.


Jason

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: web/665: gcc list archives for Jan 1999 still broken (fwd)
  2000-12-30  6:08   ` Gerald Pfeifer
@ 2000-11-10 14:52     ` Gerald Pfeifer
  2000-12-30  6:08     ` Gerald Pfeifer
  1 sibling, 0 replies; 13+ messages in thread
From: Gerald Pfeifer @ 2000-11-10 14:52 UTC (permalink / raw)
  To: Jason Molenda; +Cc: overseers

On Thu, 9 Nov 2000, Jason Molenda wrote:
> In fact, it appears to me that egcs-1998-12 was created in May 1999
> by me by running the archive messages through formail like that.
> Up until Dec 4, the messages all have normal From_ headers - after
> that, they're dated in May.
>
> I probably didn't recreate the egcs-1999-01 archive out of laziness or
> thinking it wasn't that big of a deal or something.  Or maybe I just
> didn't notice that it was only two-thirds of the month.

So, could you (or someone else) regenerate that archive? :-)

Ya' know, there is this GNATS report about this that has been
auto-assigned to me. :-/

Gerald
-- 
Gerald "Jerry" pfeifer@dbai.tuwien.ac.at http://www.dbai.tuwien.ac.at/~pfeifer/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: web/665: gcc list archives for Jan 1999 still broken (fwd)
  2000-12-30  6:08     ` Gerald Pfeifer
@ 2000-11-24 11:46       ` Gerald Pfeifer
  2001-03-13 14:40       ` Gerald Pfeifer
  1 sibling, 0 replies; 13+ messages in thread
From: Gerald Pfeifer @ 2000-11-24 11:46 UTC (permalink / raw)
  To: overseers

I am not going to push anyone (let alone in any way that might be
interpretable as bossy ;-) ), but I'd really appreciate if someone
could help me with this.

So, a "Packerl Mozartkugeln" from the heart of Europe to whoever
regenerates those archives. :-)

Gerald

PS: I also realized now that not only the gcc list archive for 1999-01
but also 1999-02 needs to be regenerated. :-(

On Fri, 10 Nov 2000, Gerald Pfeifer wrote:
> On Thu, 9 Nov 2000, Jason Molenda wrote:
>> In fact, it appears to me that egcs-1998-12 was created in May 1999
>> by me by running the archive messages through formail like that.
>> Up until Dec 4, the messages all have normal From_ headers - after
>> that, they're dated in May.
>>
>> I probably didn't recreate the egcs-1999-01 archive out of laziness or
>> thinking it wasn't that big of a deal or something.  Or maybe I just
>> didn't notice that it was only two-thirds of the month.
>
> So, could you (or someone else) regenerate that archive? :-)
>
> Ya' know, there is this GNATS report about this that has been
> auto-assigned to me. :-/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: web/665: gcc list archives for Jan 1999 still broken (fwd)
  2000-12-30  6:08     ` Gerald Pfeifer
  2000-11-24 11:46       ` Gerald Pfeifer
@ 2001-03-13 14:40       ` Gerald Pfeifer
  1 sibling, 0 replies; 13+ messages in thread
From: Gerald Pfeifer @ 2001-03-13 14:40 UTC (permalink / raw)
  To: overseers

I really hope we never experience a disk full situation, as apparently
we have no one able/willing to regenerate mailing list archives. :-(

This has been unresolved for months. I'd do it by myself, but I just
don't have any capacity left.

Gerald

On Fri, 24 Nov 2000, Gerald Pfeifer wrote:
> I am not going to push anyone (let alone in any way that might be
> interpretable as bossy ;-) ), but I'd really appreciate if someone
> could help me with this.
>
> So, a "Packerl Mozartkugeln" from the heart of Europe to whoever
> regenerates those archives. :-)
>
> Gerald
>
> PS: I also realized now that not only the gcc list archive for 1999-01
> but also 1999-02 needs to be regenerated. :-(
>
> On Fri, 10 Nov 2000, Gerald Pfeifer wrote:
>> On Thu, 9 Nov 2000, Jason Molenda wrote:
>>> In fact, it appears to me that egcs-1998-12 was created in May 1999
>>> by me by running the archive messages through formail like that.
>>> Up until Dec 4, the messages all have normal From_ headers - after
>>> that, they're dated in May.
>>>
>>> I probably didn't recreate the egcs-1999-01 archive out of laziness or
>>> thinking it wasn't that big of a deal or something.  Or maybe I just
>>> didn't notice that it was only two-thirds of the month.
>>
>> So, could you (or someone else) regenerate that archive? :-)
>>
>> Ya' know, there is this GNATS report about this that has been
>> auto-assigned to me. :-/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: web/665: gcc list archives for Jan 1999 still broken (fwd)
@ 2001-12-31 19:40 Jason Molenda
  2001-03-18  1:34 ` Jason Molenda
  0 siblings, 1 reply; 13+ messages in thread
From: Jason Molenda @ 2001-12-31 19:40 UTC (permalink / raw)
  To: overseers

Gerald made an offer I couldn't refuse, so I fixed this problem
for him.  I spent a few hours and fixed some other problems I saw.

   Moved libstdc++ and java mbox ftp archives to
   ~ftp/pub/gcc/mail-archives.

   Redirected future mbox archives so they go there too.

   Made gcc-prs a per-month archive (it is too high volume for the quarterly
   mode).  This will become visible on 2001-04-01.

   Stopped per-quarter archiving of libstdc++.  This should have been done
   back when the quarter-to-month conversion of that list was done.

   Moved gcc's year 2000 ftp-mbox mail archives into 2000 subdir.

   Created subdirs for gcc-regression and gcc-testresults so mbox ftp
   archives aren't dropped into /pub/gcc/mail-archives.

   Removed old error logs to reclaim a little bit of space on the web
   disk.  (it's still too tight)

   Started scrolling of ha.redhat.com* web logs.  The person who created
   these logs should have set this up, although I'll admit it took me
   a minute to figure out what two scripts needed tweaking to do it.

   Installed texi2html 1.64 in /usr/local/bin (Joseph wanted this).

   Upgraded mhonarc to the new version, 2.4.7 (minor fixes).

   Moved the libstdc++ and libg++ old software releases into 
   ~ftp/pub/gcc/old-releases under their own directories.  Same 
   thing for libgcj 2.95 releases.

   Removed a couple of old libg++ releases/snapshots/diffs from
   ~ftp/pub/gcc/infrastructure that are present in 
   ~ftp/pub/gcc/old-releases/libg++ (and were present in 
   ~ftp/pub/libstdc++/... in the past.)  I left the most recent
   release and the diffs added on to that release in ~ftp/pub/gcc/infra.



There's lots of little cleanup things that really need to be done on the 
system, but I would have been up all night going over the system.


<stating-the-obvious-soapbox>

Sourceware is a big system, but it generally runs itself just fine.
Anything this large changes over time, and periodically requires
fixes, maintenance, and tweaks.  It needs to be done by someone
with real attention to detail and thoroughness.  The current system
maintainers handle outright breakage, but that's not enough IMHO.

I know that all three of the maintainers are hella busy with their
jobs, their side-work in the evening on their software projects,
and maybe even a little time away from their computers.  There's
no time left over to notice that a mailing list is too high volume
for its current archiving, or that the mailing list archive software
or texi2html software could usefully be upgraded, or what have you.

If this system is going to continue to work smoothly over time,
time must be allocated for a suitably clever RH engineer to keep
an eye on the box and keep it tuned up.  Anything as dynamic and
large-scale as sourceware can't run on its own over a long period
of time without these kinds of periodic tweaks.

</stating-the-obvious-soapbox>

Jason

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: web/665: gcc list archives for Jan 1999 still broken (fwd)
  2001-03-19 14:14 Jason Molenda
@ 2001-03-20 11:27 ` Benjamin Kosnik
  0 siblings, 0 replies; 13+ messages in thread
From: Benjamin Kosnik @ 2001-03-20 11:27 UTC (permalink / raw)
  To: Jason Molenda; +Cc: overseers

>    1.  The high-volume date index for the web archiving is better
>        suited for lists with more than ~5 notes a day.  The "low-volume"
>        date index (used for quarterly/yearly lists) is better suited
>        for lists with fewer than that.  I originally tried to create a
>        single date index that looked good for both types of lists, but
>        it wasn't working out, so I ended up making two. [1]

right. FYI I think the two formats are a good balance, especially 
considering the projected usage of sourceware when first deployed.

> [1]  There should really be four types of archives -- super-high volume,
>      high-volume, medium-volume, and low-volume -- corresponding to
>      "one dir per week", "one dir per month", "one dir per quarter", and
>      "one dir per year".  Some lists like gcc and cygwin are so high
>      volume that they'd be better suited by weekly or semi-monthly archives.
>      I'd still only have two types of date indexes, I don't think other
>      index layouts would be warranted.

... as somebody who reads all the gcc lists with the web archive, volume 
on these lists has definitely increased substantially in the last year. 
It feels that way to me, at least. It might make sense to try for a new 
format so that gcc-patches et al. are easier to deal with.

For these larger lists a threaded view by default might make sense.

-benjamin

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: web/665: gcc list archives for Jan 1999 still broken (fwd)
@ 2001-03-19 14:14 Jason Molenda
  2001-03-20 11:27 ` Benjamin Kosnik
  0 siblings, 1 reply; 13+ messages in thread
From: Jason Molenda @ 2001-03-19 14:14 UTC (permalink / raw)
  To: overseers

Chris asked me in direct e-mail why I changed the gcc-prs list from
quarterly to monthly.  It's a useful answer to share with other folks,
so here it is.

There are a few reasons:

   1.  The high-volume date index for the web archiving is better
       suited for lists with more than ~5 notes a day.  The "low-volume"
       date index (used for quarterly/yearly lists) is better suited
       for lists with fewer than that.  I originally tried to create a
       single date index that looked good for both types of lists, but
       it wasn't working out, so I ended up making two. [1]


[1]  There should really be four types of archives -- super-high volume,
     high-volume, medium-volume, and low-volume -- corresponding to
     "one dir per week", "one dir per month", "one dir per quarter", and
     "one dir per year".  Some lists like gcc and cygwin are so high
     volume that they'd be better suited by weekly or semi-monthly archives.
     I'd still only have two types of date indexes, I don't think other
     index layouts would be warranted.

   2.  The amount of time mhonarc needs to add a message to a web archive
       is directly proportional to the # of already-archived messages.
       When you get a couple thousand messages, it can take so long to
       add new messages that mhonarc can time out on waiting for a lock;
       messages can be omitted from the archive.  This is happening
       on the gcc-prs list - look at /www/gcc/ml/gcc-prs/Log and you'll
       see a few error messages from Mhonarc.  This is pretty bad -- the
       www-archives.sh script has mhonarc wait 20 seconds between trying
       to get a lock, and it retries 60 times -- a total of 20 minutes
       without being able to get a lock.    (it's not a strict FIFO wait
       to get a lock or anything, so presumably there were other processes
       grabbing the lock during that 20 minute period)

       This also adds a big CPU/disk load on sourceware.

   3.  As your HTML index pages get really long, it adds more load
       on the network as people download them all the time.  Network
       resources are (of course) precious, and the index is a very
       commonly downloaded item -- if it gets too big, you're
       wasting net bandwidth.  (and it is more annoying for users
       on slow links)

   4.  Medium- and low-volume lists have their archives put in 
       ~ftp/pub/XXX/mail-archives on a per-message basis as the messages
       are sent.  High-volume lists are staged in 
       /sourceware/projects/XXXX/mail-archives and then put in ~ftp when
       the month is over.

       The reason for this is pretty obvious - if you've got a really high
       volume list, you're wasting big-time network bandwidth when mirror
       sites download the ever-changing mbox-formatted file every time
       they connect.

       gcc-prs is especially guilty in this case because large output
       files are sent through the list.  Look at this:

          ~ftp/pub/gcc/mail-archives% ls -l gcc-prs/gcc-prs-2001-q1
          -rw-rw-r--   1 listarch gcc      40552008 Mar 18 17:46 gcc-prs/gcc-prs-2001-q1
          ~ftp/pub/gcc/mail-archives%

       So every time a gcc mirror site connects, it has a 40MB uncompressed
       file to download.  (messages are being sent to this list all the
       time, so you're guaranteed that it's changed).  How expensive is
       this, exactly?

           % bzcat ftp-xferlog.11.01.bz2 | grep gcc-prs/gcc-prs-2001-q1 | 
                   awk '{n+=$8} END{printf ("%f\n",  n)}'
           7865394669.000000
           %

        7.86GB of the ftp traffic from last week was completely
        unnecessary and probably unintentional.  According to the
        bandwidth report here:

http://sourceware.cygnus.com/sourceware/bandwidth/bw-report-2001W11.txt

        All of the gcc ftp traffic from last week was 27GB.  i.e. this one
        mbox archive accounted for 26% of all the gcc ftp traffic last week.
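
For anyone who wants to repeat the measurement from item 4 without awk, here
is a minimal Python sketch of the same pipeline.  It assumes, as the awk
one-liner does, that the 8th whitespace-separated field of each xferlog line
is the number of bytes transferred.  The function names are invented for this
example.

```python
import bz2


def transferred_bytes(lines, needle):
    """Sum field 8 (bytes transferred) of every log line containing needle."""
    total = 0
    for line in lines:
        if needle not in line:
            continue
        fields = line.split()
        # Skip short or malformed lines rather than guessing at their layout.
        if len(fields) >= 8 and fields[7].isdigit():
            total += int(fields[7])
    return total


def transferred_bytes_in_log(path, needle):
    """Same as transferred_bytes, reading a bzip2-compressed log file."""
    with bz2.open(path, "rt", errors="replace") as log:
        return transferred_bytes(log, needle)


def share(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Figures quoted above: bytes for the one mbox file vs. the rounded weekly total.
print("%.1f%%" % share(7865394669, 27e9))
```

With the rounded 27GB total the share comes out at about 29% (about 27% if
27GB is read as 27 * 2^30 bytes), so the percentage depends on the unrounded
total in the bandwidth report.  Either way, this one file accounts for more
than a quarter of the week's gcc ftp traffic.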


As an aside, a really useful background task for a maintainer to
do is to look over the ftp downloads and see where the network
bandwidth is going.  The bandwidth report is a nice hint, but it
can be really informative to say, "What files in the gcc dir are
responsible for the majority of the network bandwidth usage?"  ftp
is the largest bandwidth hog of all the services (we don't have
any accounting on cvs traffic) so it pays to look into where the
bytes are going.
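
One possible way to answer the "what files are responsible" question is
sketched below.  It assumes the same xferlog layout as the awk command
earlier (field 8 is the byte count, field 9 the file name, and file names
contain no whitespace).  The names bytes_per_file and top_files are
hypothetical.

```python
from collections import Counter


def bytes_per_file(lines, prefix=""):
    """Total bytes transferred per file name, for names starting with prefix."""
    totals = Counter()
    for line in lines:
        fields = line.split()
        # Ignore lines that do not look like transfer records.
        if len(fields) < 9 or not fields[7].isdigit():
            continue
        size, name = int(fields[7]), fields[8]
        if name.startswith(prefix):
            totals[name] += size
    return totals


def top_files(lines, prefix="", n=10):
    """Return (name, bytes, percent of matching traffic) for the n largest."""
    totals = bytes_per_file(lines, prefix)
    grand_total = sum(totals.values())
    return [(name, size, 100.0 * size / grand_total)
            for name, size in totals.most_common(n)]
```

Fed the lines of a week's xferlog with the gcc ftp directory as the prefix,
top_files would list the heaviest files first, each with its share of the
matching traffic.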

Even once a system is at a colo, IIRC you'll be paying for your
network resources on a per-byte basis, so it'll pay off to keep a
close eye on the system for wasted bandwidth like this.

Jason

^ permalink raw reply	[flat|nested] 13+ messages in thread
