From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (qmail 9789 invoked by alias); 5 Apr 2004 22:51:47 -0000
Mailing-List: contact overseers-help@sources.redhat.com; run by ezmlm
Precedence: bulk
Sender: overseers-owner@sources.redhat.com
Received: (qmail 9693 invoked from network); 5 Apr 2004 22:51:43 -0000
Received: from unknown (HELO molenda.com) (192.220.74.81)
  by sources.redhat.com with SMTP; 5 Apr 2004 22:51:43 -0000
Received: (qmail 52228 invoked by uid 19025); 5 Apr 2004 22:51:43 -0000
Date: Mon, 05 Apr 2004 22:51:00 -0000
From: Jason Molenda
To: overseers@sources.redhat.com
Subject: Re: htdig and sources.redhat.com loadavg
Message-ID: <20040405155143.A42999@molenda.com>
References: <200404051849.i35InoT27980@makai.watson.ibm.com>
  <20040405205147.GA21949@coc.bosbc.com>
  <200404052103.i35L3mT30622@makai.watson.ibm.com>
  <200404052114.i35LEKT28958@makai.watson.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5.1i
In-Reply-To: <200404052114.i35LEKT28958@makai.watson.ibm.com>; from dje@watson.ibm.com on Mon, Apr 05, 2004 at 05:14:20PM -0400
X-SW-Source: 2004-q2/txt/msg00051.txt.bz2

Hi all, sorry I was in meetings all afternoon - just catching up.

On Mon, Apr 05, 2004 at 05:14:20PM -0400, David Edelsohn wrote:
> I am guessing that htdig pushed sourceware into a thrashing mode.
> Because CVS doesn't time out or service requests sequentially, people are
> just hanging around with long CVS operations while they do other things.
> sourceware needs workload management.

The schedule of crontabs is rather carefully considered. The service
load on sourceware varies considerably by hour-of-the-day -- Monday
mornings EST happen to be the highest load of the week. Sunday is
the lowest load. The htdig index update jobs are scheduled so they
finish before the weekday morning load happens.
When something goes wrong - as it did in this case - and htdig runs
into the morning rush, the system thrashes for quite a long time.

Matthew Galgoci wrote:
> Ideally the searching should be offloaded to another machine that
> is dedicated to the purpose of indexing and running the search
> database.

The reason for keeping the search engine on the main system is that
the search engine has direct access to the files; it doesn't have
to go through httpd. NFS could be used to access the files from a
different system, but you're still introducing a slowdown by not
having local access. I'm not trying to preclude such a change, I'm
just pointing out the thinking behind the current arrangement.
(Well, and the fact that we only had one computer allocated for the
original sourceware system.)

No one has mentioned my favorite possibility: not archiving older
e-mail notes. Or having multiple search archives, divided by time
period, e.g. epoch - 2001, 2001 - 2003, 2004 - ...

I can't remember if there's a good reason not to do this. It seems
like a good idea to me, with the obvious caveat that this complicates
the web search engine UI.

J