Date: Wed, 24 Aug 2022 23:06:06 +0200
From: Mark Wielaard
To: Overseers mailing list
Cc: Simon Marchi
Subject: Re: inbox.sourceware.org experiment
References: <20220813141403.GL5520@gnu.wildebeest.org>

Hi,

On Wed, Aug 24, 2022 at 12:05:03PM +0200, Mark Wielaard via Overseers wrote:
> I noticed two issues some lists seem to have a bad/corrupt xapian
> database and generate an error while indexing (gcc-patches).

I tried reindexing and compacting the largest lists. This did not
help, but the compacting did reduce the disk size of the xapian
indexes by 10GB (!).

There is now a bit more logging in
/home/inbox/logs/public-inbox-mda.out.log and it shows this error:

rollback ineffective with AutoCommit enabled at /usr/share/perl5/vendor_perl/PublicInbox/V2Writable.pm line 621.
checkpoint: Exception: Error writing block 147232
shard close: Exception: Error writing block 147236

It only happens after importing a new gcc-patches message. The
message isn't fully indexed, but can be referenced normally. It won't
show up in full text searches though. I haven't figured out why.
I'll ask upstream how to better debug this.

> emails with slashes / in the Message-ID sometimes get wrongly
> escaped and appear to not be in the archive while they really are.
> e.g. the message I am replying to shows as:
> https://inbox.sourceware.org/overseers/YwVP8+LHvyLzUG%2F+@wildebeest.org/
> But should be:
> https://inbox.sourceware.org/overseers/YwVP8+LHvyLzUG/+@wildebeest.org/

This isn't a big deal except when the / is at the end of the
Message-ID, which unfortunately happens for bugzilla emails: their
Message-IDs end in @http.sourceware.org/bugzilla/ and that last slash
seems to be a real problem. I don't know a workaround for that yet.

You can see that public-inbox does know about the Message-ID by
searching for:
https://inbox.sourceware.org/libabigail/bug-29464-9487@http.sourceware.org/bugzilla//

That will suggest the actual URL as a "partial match", but when
following that link the slashes get escaped again... I will ask
upstream if there is any solution for this.

Finally, there are some lists that accept HTML emails (by stripping
off the HTML part). public-inbox however simply rejects those emails:

*** We only accept plain-text mail, No HTML ***

Again, we should ask upstream if there could be an option to accept
just the text/plain part of such emails. Note that such emails do end
up in the .public-inbox/emergency mailbox, so in theory we could
remove the text/html part and then reinsert the message.

So there are some issues, but in general I think it works just fine
now.

Cheers,

Mark
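
For illustration only, the "remove the text/html part and then reinsert
the message" idea could look roughly like the sketch below. It assumes
Python's standard email package; the function name strip_html_part is
hypothetical and is not something public-inbox provides. It keeps only
the text/plain body (so any attachments are dropped as well), and how the
cleaned message would be fed back into public-inbox is deliberately left
out.

```python
from email import message_from_bytes, policy


def strip_html_part(raw: bytes) -> bytes:
    """Return the message with only its text/plain body kept.

    Hypothetical helper: non-multipart messages are returned unchanged,
    and multipart messages without a text/plain part are also returned
    unchanged so nothing is silently lost.
    """
    msg = message_from_bytes(raw, policy=policy.default)
    if not msg.is_multipart():
        return raw

    # Look for the text/plain body only; text/html is never selected.
    body = msg.get_body(preferencelist=("plain",))
    if body is None:
        return raw

    text = body.get_content()

    # Drop the multipart payload and all Content-* headers, then store
    # the plain text as the sole body. Other headers (Subject, From,
    # Message-ID, ...) are left in place.
    msg.clear_content()
    msg.set_content(text)
    return msg.as_bytes()
```

The result could then be inspected by hand before trying to deliver it
again.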