public inbox for overseers@sourceware.org
* [Bug Infrastructure/31551] New: Better fail2ban scripts for search/ai spider fighting
@ 2024-03-25  0:22 mark at klomp dot org
  2024-03-25  0:24 ` [Bug Infrastructure/31551] " mark at klomp dot org
  0 siblings, 1 reply; 2+ messages in thread
From: mark at klomp dot org @ 2024-03-25  0:22 UTC (permalink / raw)
  To: overseers

https://sourceware.org/bugzilla/show_bug.cgi?id=31551

            Bug ID: 31551
           Summary: Better fail2ban scripts for search/ai spider fighting
           Product: sourceware
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Infrastructure
          Assignee: overseers at sourceware dot org
          Reporter: mark at klomp dot org
  Target Milestone: ---

Search and AI spiders are difficult to deal with. Since everything we do
is open and public, we actually want people to easily find anything our
projects publish. But these spiders (especially the new AI ones) are
often very aggressive and ignore our robots.txt, causing service
overload.

We have some fail2ban scripts that help, and in the worst case we add
aggressive spider IP addresses to the httpd block.include list
(by hand). But this doesn't really scale. One solution is smarter
fail2ban scripts. Another is providing sitemaps (https://www.sitemaps.org/)
so spiders have a known list of resources to index and we can more
easily block any that go outside those.

We should have some kind of automation tying fail2ban to robots.txt.
Anything that aggressively hits URLs that are disallowed in robots.txt
should get banned.
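
As a rough illustration of that idea (not an existing sourceware script),
the sketch below reads the Disallow prefixes from a robots.txt, scans
access log lines, and lists client IPs that requested disallowed paths
more than a threshold number of times. The log format (Common Log Format),
the function names, and the max_hits threshold are all assumptions made
for the example. It also ignores User-agent grouping and wildcard rules.

```python
import re
from collections import Counter

# Assumed Common Log Format: client IP first, then the quoted request line.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(?:GET|HEAD|POST) (\S+)')


def disallowed_prefixes(robots_txt):
    """Collect Disallow path prefixes from robots.txt text.

    Simplification: rules from every User-agent group are merged, and
    wildcard patterns are treated as plain prefixes.
    """
    prefixes = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        key, _, value = line.partition(":")
        if key.strip().lower() == "disallow" and value.strip():
            prefixes.append(value.strip())
    return prefixes


def offenders(log_lines, prefixes, max_hits=20):
    """Return sorted IPs with more than max_hits requests to disallowed paths."""
    hits = Counter()
    for line in log_lines:
        m = LOG_LINE.match(line)
        if m and m.group(2).startswith(tuple(prefixes)):
            hits[m.group(1)] += 1
    return sorted(ip for ip, n in hits.items() if n > max_hits)
```

The resulting list could then be handed to fail2ban, for instance with
"fail2ban-client set <jail> banip <ip>", where the jail name depends on
the local setup. Counting only over a recent time window, rather than a
whole log file, would be closer to how fail2ban itself works.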

-- 
You are receiving this mail because:
You are the assignee for the bug.


* [Bug Infrastructure/31551] Better fail2ban scripts for search/ai spider fighting
  2024-03-25  0:22 [Bug Infrastructure/31551] New: Better fail2ban scripts for search/ai spider fighting mark at klomp dot org
@ 2024-03-25  0:24 ` mark at klomp dot org
  0 siblings, 0 replies; 2+ messages in thread
From: mark at klomp dot org @ 2024-03-25  0:24 UTC (permalink / raw)
  To: overseers

https://sourceware.org/bugzilla/show_bug.cgi?id=31551

Mark Wielaard <mark at klomp dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://sourceware.org/bugzilla/show_bug.cgi?id=31549

