From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <overseers-return-6732-listarch-overseers=sources.redhat.com@sources.redhat.com>
Received: (qmail 9789 invoked by alias); 5 Apr 2004 22:51:47 -0000
Mailing-List: contact overseers-help@sources.redhat.com; run by ezmlm
Precedence: bulk
List-Archive: <http://sources.redhat.com/ml/overseers/>
List-Post: <mailto:overseers@sources.redhat.com>
List-Help: <mailto:overseers-help@sources.redhat.com>,
	<http://sources.redhat.com/ml/#faqs>
Sender: overseers-owner@sources.redhat.com
Received: (qmail 9693 invoked from network); 5 Apr 2004 22:51:43 -0000
Received: from unknown (HELO molenda.com) (192.220.74.81)
  by sources.redhat.com with SMTP; 5 Apr 2004 22:51:43 -0000
Received: (qmail 52228 invoked by uid 19025); 5 Apr 2004 22:51:43 -0000
Date: Mon, 05 Apr 2004 22:51:00 -0000
From: Jason Molenda <jason-swarelist@molenda.com>
To: overseers@sources.redhat.com
Subject: Re: htdig and sources.redhat.com loadavg
Message-ID: <20040405155143.A42999@molenda.com>
References: <200404051849.i35InoT27980@makai.watson.ibm.com> <20040405205147.GA21949@coc.bosbc.com> <200404052103.i35L3mT30622@makai.watson.ibm.com> <m3y8pab0al.fsf@gossamer.airs.com> <ian@airs.com> <200404052114.i35LEKT28958@makai.watson.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5.1i
In-Reply-To: <200404052114.i35LEKT28958@makai.watson.ibm.com>; from dje@watson.ibm.com on Mon, Apr 05, 2004 at 05:14:20PM -0400
X-SW-Source: 2004-q2/txt/msg00051.txt.bz2

Hi all, sorry I was in meetings all afternoon - just catching
up.


On Mon, Apr 05, 2004 at 05:14:20PM -0400, David Edelsohn wrote:

> 	I am guessing that htdig pushed sourceware into a thrashing mode.
> Because CVS doesn't time out or service requests sequentially, people are
> just hanging around with long CVS operations while they do other things.
> sourceware needs workload management.


The schedule of crontabs is rather carefully considered.  The
service load on sourceware varies considerably by hour of the day;
Monday mornings EST happen to be the highest load of the week.
Sunday is the lowest load.

The htdig index update jobs are scheduled so they finish before
the weekday morning load happens.  When something goes wrong - as
it did in this case - and htdig runs into the morning rush, the
system thrashes for quite a long time.


Matthew Galgoci wrote:

> Ideally the searching should be offloaded to another machine that
> is dedicated to the purpose of indexing and running the search
> database. 

The reason for keeping the search engine on the main system is
that the search engine has direct access to the files; it doesn't
have to go through httpd.  NFS could be used to access the files
from a different system, but you're still introducing a slowdown
by not having local access.

I'm not trying to preclude such a change, I'm just pointing out
the thinking behind the current arrangement.

(well, and the fact that we only had one computer allocated
for the original sourceware system.)


No one has mentioned my favorite possibility:  Not archiving older
e-mail notes.  Or having multiple search archives, divided by time
period.  e.g. epoch - 2001.  2001 - 2003.  2004 - ...

I can't remember if there's a good reason to not do this.  It seems
like a good idea to me, with the obvious caveat that this complicates
the web search engine UI.
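
To make the time-period split concrete, here is a rough sketch of how
a message's year could be mapped to one of several search archives.
The bucket names and the exact boundary years are assumptions for
illustration only (I'm reading "2001 - 2003" as inclusive), and
archive_for_year is a hypothetical helper, not anything htdig provides.

```python
from bisect import bisect_right

# Assumed boundaries: the first year covered by each archive after the oldest.
BOUNDARIES = [2001, 2004]

# Hypothetical archive names, one more than the number of boundaries.
NAMES = ["epoch-2000", "2001-2003", "2004-onward"]


def archive_for_year(year: int) -> str:
    """Return the name of the search archive that would hold this year."""
    # bisect_right counts how many boundaries are <= year,
    # which is exactly the index of the matching bucket.
    return NAMES[bisect_right(BOUNDARIES, year)]


if __name__ == "__main__":
    for y in (1999, 2001, 2003, 2004):
        print(y, archive_for_year(y))
```

Only the newest bucket would need regular reindexing under this
scheme; the older ones could be built once and left alone, at the
cost of the UI having to ask which period to search.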

J