public inbox for overseers@sourceware.org
* [v-amwilc at microsoft period com: Robots.txt file restricting msnbot with crawl-delay at http://gcc.gnu.org/robots.txt]
@ 2011-11-17 17:56 Chris Faylor
  2011-11-18  0:22 ` Jonathan Larmour
  2011-11-18  1:00 ` Ian Lance Taylor
  0 siblings, 2 replies; 3+ messages in thread
From: Chris Faylor @ 2011-11-17 17:56 UTC
  To: overseers

[Reply-To set to overseers]
Should we change the Crawl-Delay?

cgf

----- Forwarded message from "Amy Wilcox (Murphy & Associates)"  -----

From: "Amy Wilcox (Murphy & Associates)" <v-amwilc at microsoft period com>
To: gcc
Subject: Robots.txt file restricting msnbot with crawl-delay at http://gcc.gnu.org/robots.txt
Date: Wed, 16 Nov 2011 20:05:26 +0000
X-SWARE-Spam-Status: No, hits=-0.4 required=5.0	tests=BAYES_50,HTML_MESSAGE,RCVD_IN_DNSWL_HI,RP_MATCHES_RCVD
X-Spam-Status: No, hits=-0.4 required=5.0	tests=BAYES_50,HTML_MESSAGE,RCVD_IN_DNSWL_HI,RP_MATCHES_RCVD
X-Spam-Check-By: sourceware.org
Thread-Topic: Robots.txt file restricting msnbot with crawl-delay at http://gcc.gnu.org/robots.txt
Thread-Index: AcykmurVUWZdOAcKRlO9QHDwspi2hA==
Accept-Language: en-US
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
x-originating-ip: [157.54.51.34]
X-Virus-Checked: Checked by ClamAV on sourceware.org

Hi,

I am contacting you from the Microsoft Corporation and its Internet search engine Bing (http://www.bing.com) regarding your robots.txt file at http://gcc.gnu.org/robots.txt. Our customers have alerted us that some of your site content was not visible in our results. We have discovered that you are preventing us from crawling this content with the following crawl-delay setting in your robots.txt.

User-agent: *
Disallow: /viewcvs
Disallow: /cgi-bin/
Disallow: /bugzilla/buglist.cgi
Crawl-Delay: 60

Your current crawl-delay setting of 60 authorizes us to crawl around 1440 URLs per day (86,400 seconds per day / 60-second crawl delay), which is not enough to guarantee that new URLs are crawled and indexed. Also, this rate will not allow us to crawl older URLs to verify whether they have been updated or are still available on your site.

Since you have a large number of URLs on your site, we would be pleased if you removed the crawl-delay setting from your robots.txt, which will additionally increase traffic to your site via Bing and Yahoo search results. If you would like to use a slower or faster crawl rate at different times of the day, our Bing Webmaster Tools will allow you to configure these settings (http://www.bing.com/community/site_blogs/b/webmaster/archive/2011/06/08/updates-to-bing-webmaster-tools-data-and-content.aspx) and can also assist you further in obtaining the best results possible for your business or website (http://www.bing.com/toolbox/webmaster/).

If you have further questions please let me know.

Best regards,

Amy Wilcox
Web Analyst, Bing from Microsoft
v-amwilc at microsoft period com


----- End forwarded message -----
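
For anyone weighing a different Crawl-Delay value, the arithmetic in the forwarded message can be checked with a small sketch like the one below. It assumes Python's standard urllib.robotparser module and the robots.txt rules exactly as quoted above; the helper name daily_fetch_budget is made up for illustration, "msnbot" is taken from the subject line, and the result is only the theoretical ceiling for a crawler that honors the delay, not a measurement of what Bing actually fetches.

```python
from urllib.robotparser import RobotFileParser

# Rules as quoted in the forwarded message.
ROBOTS_TXT = """\
User-agent: *
Disallow: /viewcvs
Disallow: /cgi-bin/
Disallow: /bugzilla/buglist.cgi
Crawl-Delay: 60
"""

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def daily_fetch_budget(robots_txt, user_agent="msnbot"):
    """Hypothetical helper: upper bound on fetches per day for a crawler
    that waits the full Crawl-Delay between requests.

    Returns None when no Crawl-Delay applies to the user agent.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    delay = parser.crawl_delay(user_agent)
    if not delay:
        return None
    return SECONDS_PER_DAY // int(delay)


if __name__ == "__main__":
    print(daily_fetch_budget(ROBOTS_TXT))
```

Swapping in a smaller delay (10 seconds, say) shows how the ceiling scales, which may be a more useful basis for the decision than dropping the directive outright.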
