From: "mark at klomp dot org"
To: overseers@sourceware.org
Subject: [Bug Infrastructure/31551] New: Better fail2ban scripts for search/ai spider fighting
Date: Mon, 25 Mar 2024 00:22:36 +0000

https://sourceware.org/bugzilla/show_bug.cgi?id=31551

            Bug ID: 31551
           Summary: Better fail2ban scripts for search/ai spider fighting
           Product: sourceware
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Infrastructure
          Assignee: overseers at sourceware dot org
          Reporter: mark at klomp dot org
  Target Milestone: ---

Search and AI spiders are difficult to deal with. Since everything we do is
open and public, we actually like people to easily find anything our projects
publish. But these spiders (especially the new AI ones) are often very
aggressive and ignore our robots.txt, causing service overload.

We have some fail2ban scripts that help, and in the worst case we add
aggressive spider IP addresses to the httpd block.include list by hand. But
this doesn't really scale.

One solution is smarter fail2ban scripts. Another is providing sitemaps
(https://www.sitemaps.org/) so spiders have a known list of resources to
index, and we can more easily block any that go outside it.

We should have some kind of automation tying fail2ban to robots.txt: anything
that aggressively hits URLs disallowed by robots.txt should get banned.

-- 
You are receiving this mail because:
You are the assignee for the bug.
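For illustration, a minimal sketch of the robots.txt-driven automation meant
above: a Python script that scans an httpd access log for clients repeatedly
requesting paths that robots.txt disallows, and prints candidate IPs for
banning. The log path, robots.txt URL, log format and threshold are
assumptions for the sketch, not the actual sourceware setup; the output could
feed a fail2ban action or the httpd block.include list instead of being
maintained by hand.

#!/usr/bin/env python3
# Sketch only: find clients that keep requesting URLs disallowed by robots.txt.
# Paths, log format and threshold below are assumptions, not sourceware's real config.
import re
from collections import Counter
from urllib.robotparser import RobotFileParser

ACCESS_LOG = "/var/log/httpd/access_log"          # assumed combined-format log
ROBOTS_URL = "https://sourceware.org/robots.txt"
THRESHOLD = 50                                    # disallowed hits before reporting

# Combined log format: client IP is the first field, the path is in the quoted request line.
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|POST|HEAD) (\S+)')

rp = RobotFileParser()
rp.set_url(ROBOTS_URL)
rp.read()

hits = Counter()
with open(ACCESS_LOG, encoding="utf-8", errors="replace") as log:
    for line in log:
        m = LOG_RE.match(line)
        if not m:
            continue
        ip, path = m.groups()
        # Count requests for paths that robots.txt tells well-behaved crawlers to avoid.
        if not rp.can_fetch("*", path):
            hits[ip] += 1

# Print the worst offenders; these could be fed to fail2ban or block.include.
for ip, count in hits.most_common():
    if count < THRESHOLD:
        break
    print(ip, count)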