From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (qmail 3564 invoked by alias); 16 May 2011 13:18:30 -0000
Received: (qmail 3504 invoked by uid 22791); 16 May 2011 13:18:29 -0000
X-SWARE-Spam-Status: No, hits=-6.6 required=5.0 tests=AWL,BAYES_00,RCVD_IN_DNSWL_HI,SPF_HELO_PASS,T_RP_MATCHES_RCVD
X-Spam-Check-By: sourceware.org
Received: from mx1.redhat.com (HELO mx1.redhat.com) (209.132.183.28) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Mon, 16 May 2011 13:18:10 +0000
Received: from int-mx12.intmail.prod.int.phx2.redhat.com (int-mx12.intmail.prod.int.phx2.redhat.com [10.5.11.25]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id p4GDIAnu014347 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Mon, 16 May 2011 09:18:10 -0400
Received: from zebedee.pink (ovpn-113-73.phx2.redhat.com [10.3.113.73]) by int-mx12.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP id p4GDI8ed022057; Mon, 16 May 2011 09:18:09 -0400
Message-ID: <4DD1240F.8080809@redhat.com>
Date: Mon, 16 May 2011 13:39:00 -0000
From: Andrew Haley
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.17) Gecko/20110428 Fedora/3.1.10-1.fc14 Thunderbird/3.1.10
MIME-Version: 1.0
To: Richard Guenther
CC: Michael Matz, gcc-patches@gcc.gnu.org
Subject: Re: Don't let search bots look at buglist.cgi
References: <4DD10623.40705@redhat.com> <4DD120F3.9050100@redhat.com>
In-Reply-To:
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-IsSubscribed: yes
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-patches-owner@gcc.gnu.org
X-SW-Source: 2011-05/txt/msg01119.txt.bz2

On 05/16/2011 02:10 PM, Richard Guenther wrote:
> On Mon, May 16, 2011 at 3:04 PM, Andrew Haley wrote:
>> On 05/16/2011 01:09 PM, Michael Matz wrote:
>>> Hi,
>>>
>>> On Mon, 16 May 2011, Andrew Haley wrote:
>>>
>>>> On 16/05/11 10:45, Richard Guenther wrote:
>>>>> On Fri, May 13, 2011 at 7:14 PM, Ian Lance Taylor wrote:
>>>>>> I noticed that buglist.cgi was taking quite a bit of CPU time.  I looked
>>>>>> at some of the long running instances, and they were coming from
>>>>>> searchbots.  I can't think of a good reason for this, so I have
>>>>>> committed this patch to the gcc.gnu.org robots.txt file to not let
>>>>>> searchbots search through lists of bugs.  I plan to make a similar
>>>>>> change on the sourceware.org and cygwin.com sides.  Please let me know
>>>>>> if this seems like a mistake.
>>>>>>
>>>>>> Does anybody have any experience with
>>>>>> http://code.google.com/p/bugzilla-sitemap/ ?  That might be a slightly
>>>>>> better approach.
>>>>>
>>>>> Shouldn't we keep searchbots way from bugzilla completely?  Searchbots
>>>>> can crawl the gcc-bugs mailinglist archives.
>>>>
>>>> I don't understand this.  Surely it is super-useful for Google etc. to
>>>> be able to search gcc's Bugzilla.
>>>
>>> gcc-bugs provides exactly the same information, and doesn't have to
>>> regenerate the full web page for each access to a bug report.
>>
>> It's not quite the same information, surely.  Wouldn't searchers be directed
>> to an email rather than the bug itself?
>
> Yes, though there is a link in all mails.

Right, so we are contemplating a reduction in search quality in
exchange for a reduction in server load.  That is not an improvement
from the point of view of our users, and is therefore not the sort of
thing we should do unless the server load is so great that it impedes
our mission.

Andrew.