From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ian Lance Taylor
To: overseers@gcc.gnu.org
Subject: [Michael Matz] Re: Don't let search bots look at buglist.cgi
Date: Tue, 17 May 2011 14:26:00 -0000
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
Mailing-List: contact overseers-help@sourceware.org; run by ezmlm
Sender: overseers-owner@sourceware.org
X-SW-Source: 2011-q2/txt/msg00074.txt.bz2

--=-=-=

fche: a comment about MnoGoSearch.

Ian

--=-=-=
Content-Type: message/rfc822
Content-Disposition: inline

Date: Tue, 17 May 2011 13:12:23 +0200 (CEST)
From: Michael Matz
To: Ian Lance Taylor
Cc: Richard Guenther, Andrew Haley, gcc-patches@gcc.gnu.org
Subject: Re: Don't let search bots look at buglist.cgi
References: <4DD10623.40705@redhat.com> <4DD120F3.9050100@redhat.com> <4DD1240F.8080809@redhat.com> <4DD125C8.8090105@redhat.com> <4DD12802.4040705@redhat.com>
MIME-Version: 1.0

Hi,

On Mon, 16 May 2011, Ian Lance Taylor wrote:

> >>> httpd being in the top-10 always, fiddling with bugzilla URLs?
> >>> (Note, I don't have access to gcc.gnu.org, I'm relaying info from
> >>> multiple instances of discussion on #gcc and richi poking on it;
> >>> that said, it still might not be web crawlers, that's right, but
> >>> I'll happily accept
> >>> _any_ load improvement on gcc.gnu.org, how unfounded they might seem)
>
> I think that simply blocking buglist.cgi has dropped bugzilla off the
> immediate radar.  It also seems to have lowered the load, although I'm
> not sure if we are still keeping historical data.

Btw. FWIW, I had a quick look at one of the httpd log files, and in seven
hours on last Saturday (from 5:30 to 12:30), there were overall 435203
GET requests, and 391319 of them came from our own MnoGoSearch engine,
that's 90%.  Granted many are then in fact 304 (not modified) responses,
but still, perhaps the eagerness of our own crawler can be turned down a
bit.

Ciao,
Michael.

--=-=-=--
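
For anyone wanting to repeat the log check described above, here is a rough sketch of how such a tally might be done. It assumes the httpd log is in Apache's "combined" format, which the thread does not confirm, and the function names (tally_get_requests, share) and the user-agent substring passed in are hypothetical; the real crawler's agent string would have to be read from the log itself.

```python
import re
from collections import Counter

# Assumed Apache "combined" layout:
#   ... "METHOD path proto" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'"(?P<method>[A-Z]+) \S+ [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def tally_get_requests(lines, agent_substring):
    """Count GET requests overall and those whose user agent contains
    agent_substring (case-insensitive). Also tallies status codes for
    the matching agent, e.g. to see how many were 304 responses."""
    total = 0
    from_agent = 0
    statuses = Counter()
    needle = agent_substring.lower()
    for line in lines:
        m = LINE_RE.search(line)
        if m is None or m.group("method") != "GET":
            continue  # skip unparsable lines and non-GET requests
        total += 1
        if needle in m.group("agent").lower():
            from_agent += 1
            statuses[m.group("status")] += 1
    return total, from_agent, statuses


def share(part, whole):
    """Percentage of whole made up by part; 0.0 when whole is zero."""
    return 100.0 * part / whole if whole else 0.0
```

With the figures quoted in the message, share(391319, 435203) comes out just under 90. To run it on a real file, pass an open file object as lines, for example tally_get_requests(open(path), "mnogosearch"), where both the path and the substring are placeholders.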