From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: (qmail 13493 invoked by alias); 18 Nov 2011 01:00:45 -0000
Received: (qmail 13425 invoked by uid 22791); 18 Nov 2011 01:00:42 -0000
X-SWARE-Spam-Status: No, hits=-1.9 required=5.0 tests=AWL,BAYES_05,RP_MATCHES_RCVD,TW_CG
X-Spam-Check-By: sourceware.org
Received: from yosemite.airs.com (HELO yosemite.airs.com) (64.13.131.148) by sourceware.org (qpsmtpd/0.43rc1) with SMTP; Fri, 18 Nov 2011 01:00:28 +0000
Received: (qmail 4492 invoked by uid 10); 18 Nov 2011 01:00:28 -0000
Received: (qmail 5733 invoked by uid 500); 18 Nov 2011 01:00:23 -0000
Mail-Followup-To: overseers@sourceware.org
From: Ian Lance Taylor
To: overseers@sourceware.org
Subject: Re: [v-amwilc at microsoft period com: Robots.txt file restricting msnbot with crawl-delay at http://gcc.gnu.org/robots.txt]
References: <20111117175629.GA3124@sourceware.org>
Date: Fri, 18 Nov 2011 01:00:00 -0000
In-Reply-To: <20111117175629.GA3124@sourceware.org> (Chris Faylor's message of "Thu, 17 Nov 2011 17:56:29 +0000")
Message-ID: 
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Mailing-List: contact overseers-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: 
List-Archive: 
List-Post: 
List-Help: 
Sender: overseers-owner@sourceware.org
X-SW-Source: 2011-q4/txt/msg00040.txt.bz2

Chris Faylor writes:
> [Reply-To set to overseers]
> Should we change the Crawl-Delay?

I think it would be fine to try.

Ian

> From: "Amy Wilcox (Murphy & Associates)"
> Subject: Robots.txt file restricting msnbot with crawl-delay at http://gcc.gnu.org/robots.txt
> To: gcc
> Date: Wed, 16 Nov 2011 20:05:26 +0000
>
> Hi,
>
> I am contacting you from the Microsoft Corporation and its Internet search engine Bing (http://www.bing.com) regarding your robots.txt file at http://gcc.gnu.org/robots.txt. Our customers have alerted us that some of your site content was not visible in our results.
> We have discovered that you are preventing us from crawling this content through the following crawl-delay setting in your robots.txt:
>
> User-agent: *
> Disallow: /viewcvs
> Disallow: /cgi-bin/
> Disallow: /bugzilla/buglist.cgi
> Crawl-Delay: 60
>
> Your current crawl-delay setting of 60 allows us to crawl only around 1440 URLs per day (86,400 seconds per day / 60-second crawl-delay), which is not enough to guarantee that new URLs are crawled and indexed. This rate also does not allow us to re-crawl older URLs to verify whether they have been updated or are still available on your site.
>
> Since you have a large number of URLs on your site, we would be pleased if you removed the crawl-delay setting from your robots.txt, which would also increase traffic to your site via Bing and Yahoo search results. If you would like to use a slower or faster crawl rate at different times of the day, our Bing Webmaster Tools will allow you to configure these settings (http://www.bing.com/community/site_blogs/b/webmaster/archive/2011/06/08/updates-to-bing-webmaster-tools-data-and-content.aspx) and also assist you further in obtaining the best results possible for your business or website (http://www.bing.com/toolbox/webmaster/).
>
> If you have further questions, please let me know.
>
> Best regards,
>
> Amy Wilcox
> Web Analyst, Bing from Microsoft
> v-amwilc at microsoft period com

----------
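[Archive note: the crawl-budget arithmetic quoted above can be reproduced with Python's standard-library robots.txt parser. This is a minimal sketch, not part of the original thread; it assumes Python 3.6+, where urllib.robotparser understands the (non-standard but widely honored) Crawl-Delay directive. The robots.txt text is the one quoted in the message.]

```python
# A crawler that honors Crawl-Delay waits that many seconds between
# requests, so it can fetch at most 86,400 / delay URLs per day --
# the 1440-URLs-per-day figure cited in the message.
from urllib import robotparser

ROBOTS_TXT = """\
User-agent: *
Disallow: /viewcvs
Disallow: /cgi-bin/
Disallow: /bugzilla/buglist.cgi
Crawl-Delay: 60
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The '*' group matches any bot, msnbot included.
delay = parser.crawl_delay("msnbot")        # 60 seconds between requests
budget = 86_400 // delay                    # seconds per day / delay
print(f"Crawl-Delay {delay} -> at most {budget} URLs per day")
```

Dropping the Crawl-Delay line (or lowering it via the Bing Webmaster Tools schedule mentioned above) raises that daily ceiling proportionally; for example, a delay of 10 seconds would permit 8640 URLs per day.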