From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 4144 invoked by alias); 17 Nov 2011 17:56:51 -0000
Received: (qmail 4110 invoked by uid 0); 17 Nov 2011 17:56:29 -0000
Date: Thu, 17 Nov 2011 17:56:00 -0000
From: Chris Faylor
To: overseers@sourceware.org
Subject: [v-amwilc at microsoft period com: Robots.txt file restricting msnbot with crawl-delay at http://gcc.gnu.org/robots.txt]
Message-ID: <20111117175629.GA3124@sourceware.org>
Reply-To: overseers@sourceware.org
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.4.2.1i
Mailing-List: contact overseers-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help: ,
Sender: overseers-owner@sourceware.org
X-SW-Source: 2011-q4/txt/msg00038.txt.bz2

[Reply-To set to overseers]

Should we change the Crawl-Delay?

cgf

----- Forwarded message from "Amy Wilcox (Murphy & Associates)" -----

From: "Amy Wilcox (Murphy & Associates)"
To: gcc
Subject: Robots.txt file restricting msnbot with crawl-delay at http://gcc.gnu.org/robots.txt
Date: Wed, 16 Nov 2011 20:05:26 +0000
X-SWARE-Spam-Status: No, hits=-0.4 required=5.0 tests=BAYES_50,HTML_MESSAGE,RCVD_IN_DNSWL_HI,RP_MATCHES_RCVD
X-Spam-Status: No, hits=-0.4 required=5.0 tests=BAYES_50,HTML_MESSAGE,RCVD_IN_DNSWL_HI,RP_MATCHES_RCVD
X-Spam-Check-By: sourceware.org
Thread-Topic: Robots.txt file restricting msnbot with crawl-delay at http://gcc.gnu.org/robots.txt
Thread-Index: AcykmurVUWZdOAcKRlO9QHDwspi2hA==
Accept-Language: en-US
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
x-originating-ip: [157.54.51.34]
X-Virus-Checked: Checked by ClamAV on sourceware.org

Hi,

I am contacting you from the Microsoft Corporation and its Internet search engine Bing (http://www.bing.com) in regard to your robots.txt file at http://gcc.gnu.org/robots.txt. Our customers have alerted us that some of your site content was not visible in our results.
We have discovered that you are preventing us from crawling this content through the following crawl-delay settings in your robots.txt:

User-agent: *
Disallow: /viewcvs
Disallow: /cgi-bin/
Disallow: /bugzilla/buglist.cgi
Crawl-Delay: 60

Your current crawl-delay setting of 60 limits us to crawling around 1,440 URLs per day (86,400 seconds per day / 60-second crawl delay), which is not enough to guarantee that new URLs are crawled and indexed. This rate also will not allow us to recrawl older URLs to verify whether they have been updated or are still available on your site. Since your site has a large number of URLs, we would be pleased if you removed the crawl-delay settings from your robots.txt, which would additionally increase traffic to your site via Bing and Yahoo search results. If you would like to use a slower or faster crawl rate at different times of the day, our Bing Webmaster Tools will allow you to configure these settings (http://www.bing.com/community/site_blogs/b/webmaster/archive/2011/06/08/updates-to-bing-webmaster-tools-data-and-content.aspx) and also assist you further in obtaining the best results possible for your business or website (http://www.bing.com/toolbox/webmaster/).

If you have further questions, please let me know.

Best regards,

Amy Wilcox
Web Analyst, Bing from Microsoft
v-amwilc at microsoft period com

----- End forwarded message -----
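[Archive note: the crawl-budget arithmetic in the forwarded message can be sketched as below. This is a minimal illustration, not Bing's actual scheduler: it only assumes that a crawler honoring a Crawl-Delay of N seconds makes at most 86,400 / N requests per day, as the message states. The function name is made up for this sketch.]

```python
# Upper bound on daily fetches implied by a robots.txt Crawl-Delay value:
# one request every N seconds means at most 86,400 / N requests per day.

SECONDS_PER_DAY = 86_400


def max_urls_per_day(crawl_delay_seconds: int) -> int:
    """URLs per day a crawler can fetch while honoring Crawl-Delay."""
    return SECONDS_PER_DAY // crawl_delay_seconds


print(max_urls_per_day(60))  # gcc.gnu.org's setting at the time -> 1440
print(max_urls_per_day(10))  # a hypothetical lower delay -> 8640
```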