From: "mark at klomp dot org" <sourceware-bugzilla@sourceware.org>
To: overseers@sourceware.org
Subject: [Bug Infrastructure/31551] New: Better fail2ban scripts for search/ai spider fighting
Date: Mon, 25 Mar 2024 00:22:36 +0000
Message-ID: <bug-31551-14326@http.sourceware.org/bugzilla/>
https://sourceware.org/bugzilla/show_bug.cgi?id=31551
Bug ID: 31551
Summary: Better fail2ban scripts for search/ai spider fighting
Product: sourceware
Version: unspecified
Status: NEW
Severity: normal
Priority: P2
Component: Infrastructure
Assignee: overseers at sourceware dot org
Reporter: mark at klomp dot org
Target Milestone: ---
Search and AI spiders are a difficult problem. Since everything we do
is open and public, we actually want people to easily find anything
our projects publish. But these spiders (especially the new AI ones)
are often very aggressive and ignore our robots.txt, causing service
overload.
We have some fail2ban scripts that help and, in the worst case, we add
aggressive spider IP addresses to the httpd block.include list
(by hand). But this doesn't really scale. One solution is smarter
fail2ban scripts. Another is providing sitemaps (https://www.sitemaps.org/)
so spiders have a known list of resources to index and we can more
easily block any that go outside those.
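As a rough illustration of the sitemap idea, here is a minimal sketch that
turns a list of URLs into a sitemaps.org-style document. The function name
build_sitemap and the example URL list are hypothetical; how the real list
of indexable resources would be collected is not specified here.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical list of pages we would like spiders to index.
    print(build_sitemap(["https://sourceware.org/"]))
```

The same URL list could then serve as an allow-list when deciding which
requests fall "outside" the sitemap.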
We should have some kind of automation tying fail2ban to robots.txt.
Anything that aggressively hits URLs that are disallowed in robots.txt
should get banned.
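The rule above could be prototyped roughly as follows. This is only a
sketch under several assumptions: the access log is in Apache
common/combined format, the function names (disallowed_hits,
ban_candidates) and the threshold are made up for illustration, and no
time window is applied. In a real setup the counting over time would
more likely be left to fail2ban's own maxretry/findtime settings.

```python
import re
from collections import Counter
from urllib.robotparser import RobotFileParser

# Matches the client address and request path of an Apache
# common/combined log line (assumed log format).
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "\S+ (\S+)')


def disallowed_hits(robots_txt, log_lines, user_agent="*"):
    """Count, per client IP, requests for paths that robots.txt disallows."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    hits = Counter()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        ip, path = match.groups()
        if not rules.can_fetch(user_agent, path):
            hits[ip] += 1
    return hits


def ban_candidates(hits, threshold=20):
    """Return the IPs with at least `threshold` disallowed requests."""
    return sorted(ip for ip, count in hits.items() if count >= threshold)
```

The output of ban_candidates could be fed to whatever currently maintains
the block list; the threshold of 20 is an arbitrary placeholder.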