public inbox for overseers@sourceware.org
From: Jonathan Larmour <jifl@jifvik.org>
To: overseers@sourceware.org
Subject: Re: [v-amwilc at microsoft period com: Robots.txt file restricting msnbot with crawl-delay at http://gcc.gnu.org/robots.txt]
Date: Fri, 18 Nov 2011 00:22:00 -0000
Message-ID: <4EC5A52E.1040402@jifvik.org>
In-Reply-To: <20111117175629.GA3124@sourceware.org>

On 17/11/11 17:56, Chris Faylor wrote:
> [Reply-To set to overseers]
> Should we change the Crawl-Delay ?

Probably yes, as the sheer number of links on sourceware makes using a
search engine all the more important.

But on my own website, I find that the occasional problems caused by
crawler activity have less to do with any individual crawler than with
several of them happening to hit the server simultaneously. In one sense
that's not their fault, as they can't know about server load or other
crawlers, but there is a potential solution...

We already use mod_bw version 0.6 in Apache, but according to
<http://ivn.cl/>, if we upgraded to version 0.8 or later we would be able
to use a regexp to match the user-agent of the various search engines, and
therefore restrict them collectively.

For example:
BandWidth "u:(Slurp|BaiduSpider|Googlebot|msnbot)" 100000
MaxConnection "u:(Slurp|BaiduSpider|Googlebot|msnbot)" 10

This is what I've been meaning to do on my own site, although I've never
got round to it. I'm sure there are many more user agents that could be
added into that.
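
Before adding more user agents, it may help to run the pattern over a few
User-Agent strings and see what it actually catches. The following is a
rough Python sketch, not mod_bw itself. It assumes the "u:" pattern is
applied as an unanchored, case-sensitive regex search on the User-Agent
header, which has not been checked against the mod_bw source. The sample
strings and the matched_agents helper are hypothetical and only there for
illustration.

```python
import re

# Same alternation as in the BandWidth/MaxConnection lines above.
CRAWLER_PATTERN = r"(Slurp|BaiduSpider|Googlebot|msnbot)"

# Illustrative User-Agent strings (assumed, not taken from server logs).
SAMPLE_AGENTS = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)",
    "msnbot/2.0b (+http://search.msn.com/msnbot.htm)",
    "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)",
    "Mozilla/5.0 (X11; Linux x86_64; rv:8.0) Gecko/20100101 Firefox/8.0",
]


def matched_agents(pattern, agents, ignore_case=False):
    """Return the agents that the pattern would select for throttling.

    Uses an unanchored search, which is an assumption about how the
    server-side match behaves.
    """
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    return [ua for ua in agents if regex.search(ua)]


if __name__ == "__main__":
    throttled = set(matched_agents(CRAWLER_PATTERN, SAMPLE_AGENTS))
    for ua in SAMPLE_AGENTS:
        print("throttled" if ua in throttled else "untouched", "-", ua)
```

Under those assumptions the Baidu sample is not matched, because that
crawler identifies itself as "Baiduspider" with a lower-case "s". Whether
the same happens in Apache depends on how mod_bw compiles the regex, so
the spelling is worth checking against real access logs.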

Jifl

Thread overview: 3+ messages
2011-11-17 17:56 Chris Faylor
2011-11-18  0:22 ` Jonathan Larmour [this message]
2011-11-18  1:00 ` Ian Lance Taylor
