public inbox for overseers@sourceware.org
From: "mark at klomp dot org" <sourceware-bugzilla@sourceware.org>
To: overseers@sourceware.org
Subject: [Bug Infrastructure/31551] New: Better fail2ban scripts for search/ai spider fighting
Date: Mon, 25 Mar 2024 00:22:36 +0000
Message-ID: <bug-31551-14326@http.sourceware.org/bugzilla/>

https://sourceware.org/bugzilla/show_bug.cgi?id=31551

            Bug ID: 31551
           Summary: Better fail2ban scripts for search/ai spider fighting
           Product: sourceware
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Infrastructure
          Assignee: overseers at sourceware dot org
          Reporter: mark at klomp dot org
  Target Milestone: ---

Search and AI spiders are a difficult problem. Since everything we do is
open and public, we actually want people to easily find anything our
projects publish. But these spiders (especially the new AI ones) are
often very aggressive and ignore our robots.txt, causing service
overload.

We have some fail2ban scripts that help, and in the worst case we add
aggressive spider IP addresses to the httpd block.include list
(by hand). But this doesn't really scale. One solution is smarter
fail2ban scripts. Another is providing sitemaps (https://www.sitemaps.org/)
so spiders have a known list of resources to index and we can more
easily block any that go outside those.
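
As a rough illustration of the sitemap idea, a minimal sitemap in the
sitemaps.org format could be generated like this. The function name and
the example.org URLs are placeholders, not anything sourceware runs today.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal sitemap XML string listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for u in urls:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as '&' in the text for us.
        ET.SubElement(url, "loc").text = u
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Placeholder URLs for illustration only.
    print(build_sitemap(["https://example.org/", "https://example.org/docs/"]))
```

The same URL list could then double as the set of paths a well-behaved
spider is expected to stay within.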

We should have some kind of automation tying fail2ban to robots.txt.
Anything that aggressively hits URLs that are disallowed in robots.txt
should get banned.
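
A minimal sketch of that rule, assuming a Common Log Format access log
and treating every Disallow line as a plain path prefix (it ignores
User-agent groups and wildcards, which a real setup would have to
handle). The function names and the max_hits threshold are made up for
illustration; in practice the matching would more likely live in a
fail2ban filter than in a standalone script.

```python
import re
from collections import Counter

# Common Log Format: ip ident user [time] "METHOD path PROTO" status size
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "\S+ (\S+)')


def disallowed_prefixes(robots_txt):
    """Collect Disallow path prefixes (simplification: all groups merged)."""
    prefixes = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        field, _, value = line.partition(":")
        # An empty Disallow value means "allow everything", so skip it.
        if field.strip().lower() == "disallow" and value.strip():
            prefixes.append(value.strip())
    return prefixes


def offenders(log_lines, prefixes, max_hits=10):
    """Return sorted IPs with more than max_hits requests to disallowed paths."""
    hits = Counter()
    wanted = tuple(prefixes)
    for line in log_lines:
        m = LOG_RE.match(line)
        if m and m.group(2).startswith(wanted):
            hits[m.group(1)] += 1
    return sorted(ip for ip, n in hits.items() if n > max_hits)
```

The resulting address list is what would be handed to whatever does the
actual banning.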

-- 
You are receiving this mail because:
You are the assignee for the bug.
