Date: Tue, 17 May 2011 11:12:00 -0000
From: Axel Freyn
To: gcc-patches@gcc.gnu.org
Subject: Re: Don't let search bots look at buglist.cgi
Message-ID: <20110517081200.GP10499@axel>
References: <4DD120F3.9050100@redhat.com> <4DD1240F.8080809@redhat.com> <4DD125C8.8090105@redhat.com> <4DD12802.4040705@redhat.com>

On Mon, May 16, 2011 at 10:27:44PM -0700, Ian Lance Taylor wrote:
> On Mon, May 16, 2011 at 6:42 AM, Richard Guenther wrote:
> >>>
> >>> httpd being in the top-10 always, fiddling with bugzilla URLs?
> >>> (Note, I don't have access to gcc.gnu.org, I'm relaying info from multiple
> >>> instances of discussion on #gcc and richi poking on it; that said, it
> >>> still might not be web crawlers, that's right, but I'll happily accept
> >>> _any_ load improvement on gcc.gnu.org, how unfounded they might seem)
>
> I think that simply blocking buglist.cgi has dropped bugzilla off the
> immediate radar. It also seems to have lowered the load, although I'm
> not sure if we are still keeping historical data.
>
> > I for example see also
> >
> > 66.249.71.59 - - [16/May/2011:13:37:58 +0000] "GET
> > /viewcvs?view=revision&revision=169814 HTTP/1.1" 200 1334 "-"
> > "Mozilla/5.0 (compatible; Googlebot/2.1;
> > +http://www.google.com/bot.html)" (35%) 2060117us
> >
> > and viewvc is certainly even worse (from an I/O perspective). I thought
> > we blocked all bot traffic from the viewvc stuff ...
>
> This is only happening at top level. I committed this patch to fix this.

You probably know this much better than I do, but would it be possible
to allow only some of Google's crawlers (assuming several of them try to
crawl bugzilla)?

As I read
http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=1061943
it should be possible to block the crawlers Googlebot-Mobile,
Mediapartners-Google and AdsBot-Google (which seem to be independent
crawlers) while still allowing the main Googlebot.

(Well, I don't know how often each of these crawlers actually shows up
on bugzilla...)

Axel
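
P.S. To make the suggestion concrete, here is a rough sketch of what such per-crawler rules might look like, checked with Python's standard urllib.robotparser. The /bugzilla/ paths, the grouping of the three secondary crawlers, and the helper name allowed() are my own assumptions for illustration; they are not the rules actually installed on gcc.gnu.org. Python's parser applies the first group whose name is a substring of the user agent, so the Googlebot-Mobile group is listed before the Googlebot one here; real crawlers may resolve precedence differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: paths and grouping are assumptions, not the
# rules deployed on gcc.gnu.org.
ROBOTS_TXT = """\
User-agent: Googlebot-Mobile
User-agent: Mediapartners-Google
User-agent: AdsBot-Google
Disallow: /bugzilla/

User-agent: Googlebot
Disallow: /bugzilla/buglist.cgi

User-agent: *
Disallow: /bugzilla/buglist.cgi
"""


def allowed(agent, path):
    """Return True if `agent` may fetch `path` under ROBOTS_TXT."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    agents = ("Googlebot", "Googlebot-Mobile", "AdsBot-Google", "SomeOtherBot")
    paths = ("/bugzilla/show_bug.cgi", "/bugzilla/buglist.cgi")
    for agent in agents:
        for path in paths:
            verdict = "allowed" if allowed(agent, path) else "blocked"
            print(f"{agent:20} {path:26} {verdict}")
```

With these rules the main Googlebot can still index individual bug pages, the three secondary crawlers are kept out of bugzilla entirely, and buglist.cgi stays blocked for everyone.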