From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 17 Aug 2022 23:18:35 +0200
From: Mark Wielaard
To: "Frank Ch. Eigler"
Cc: Overseers mailing list, Simon Marchi
Subject: Re: inbox.sourceware.org experiment
References: <20220813141403.GL5520@gnu.wildebeest.org>
List-Id: Overseers mailing list

Hi Frank,

On Wed, Aug 17, 2022 at 09:24:56AM -0400, Frank Ch. Eigler wrote:
> > I don't think we should be using v1 archives, those or deprecated
> > upstream and they strongly recommend using v2 archives which are much
> > more scalable.
>
> Given that v1 is the default of public-inbox-init, they can't be that bad.

Looks like it is just for backward compatibility. They actively warn
against using it for new installations and strongly recommend using -V2.
See also the public-inbox-init, public-inbox-v1-format and
public-inbox-v2-format man pages. I don't expect support for v1 will
disappear, but new projects around public-inbox, like lei, only support
v2. So it is better to simply use the v2 format from the start.

> > Reimporting the lists as v2 archives using the import_from_mbox
> > script should be much more efficient and can be done in a couple of
> > hours instead of days.
>
> That speed is nice, but I suspect that's not a v1/v2 representation
> efficiency issue but something else.

The v2 format allows parallel imports, so it defaults to using multiple
threads. Also, the import_from_mbox script streams the import of
messages using just one perl process per mbox instead of starting a new
perl process per message.

> Yes, understood that the extra indexing can do extra searches. My
> question was about utility/need for this.

The use seems obvious to me for anybody using the web based archives to
generate tailored message/mbox results. Date-ranged searches in
particular seem pretty mandatory, since otherwise you essentially just
have to keep clicking next, next, next. It also lets you find specific
messages by author or subject. One specific use case for public-inbox is
not having to be subscribed to a list to read it, and not needing a
local copy to search through it (public-inbox does make mirroring a
mailing list easy, but not everybody has the space or network to do
that).

> For elfutils-devel, note
> that the full xapian indexes are about 10x the size of the
> git-compressed email archive, whereas in the case of the systemtap
> import, it's only about 0.2x, so there is a serious cost/benefit
> question.

That is a concern and much bigger than I anticipated. So we should
probably only enable full indexing for active discussion and patch lists
and keep it at basic for autogenerated lists like -cvs or old/inactive
lists.

> > That would be great. But I would need some time reading up on
> > postfix/mailman configs.
> > Do you have an example of where/how this hack
> > would be done?
>
> postfix delivers mailing list traffic via /etc/mailman/aliases,
> e.g.:
>
> autobook-cvs:          "|/usr/local/mailman/mailman post autobook-cvs"
> autobook-cvs-bounces:  "|/usr/local/mailman/mailman bounces autobook-cvs"
> autobook-cvs-confirm:  "|/usr/local/mailman/mailman confirm autobook-cvs"
> autobook-cvs-join:     "|/usr/local/mailman/mailman join autobook-cvs"
>
> I would use a script to generate a new config file from that, so that the
> primary mailing list incoming aliases are forked:
>
> autobook-cvs:          autobook-cvs-mailman, autobook-cvs-inbox
> autobook-cvs-mailman:  "|/usr/local/mailman/mailman post autobook-cvs"
> autobook-cvs-inbox:    "|env SOMETHING /usr/bin/public-inbox-mda SOMETHING"
> autobook-cvs-bounces:  "|/usr/local/mailman/mailman bounces autobook-cvs"
> autobook-cvs-confirm:  "|/usr/local/mailman/mailman confirm autobook-cvs"
> autobook-cvs-join:     "|/usr/local/mailman/mailman join autobook-cvs"
>
> and then switch postfix to this alias file instead.

OK, that could work and should be easy to generate by combining
/etc/mailman/aliases with the lists in /home/inbox/.public-inbox/config.

This happens before mailman sees the message, so we do need to do a
spam-check. And I think postfix sets ORIGINAL_RECIPIENT already, so we
just need to make sure it is one of the addresses for a list in the
config.

But what generates /etc/mailman/aliases itself? Can we hook into that to
trigger generation of this aliases-inbox file? Otherwise if we add a new
mailman list it won't work. And do we need to update/regenerate
/etc/aliases.db and/or /etc/mailman/aliases.db?

Cheers,

Mark
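
For illustration, here is a minimal sketch of the alias forking discussed
above, written in Python. It is not an existing script. The function name
fork_aliases, the inbox_lists argument (the list names taken from the
public-inbox config by some other means) and the mda_command argument are
all hypothetical, and the public-inbox-mda invocation is left as the same
SOMETHING placeholder used in the quoted example. It only rewrites the
"post" alias of lists that have an inbox and passes every other line
through unchanged.

```python
import re

# Matches the primary posting alias of a mailman list, e.g.
#   autobook-cvs: "|/usr/local/mailman/mailman post autobook-cvs"
POST_ALIAS = re.compile(
    r'^(?P<name>[^\s:#]+):\s*"(?P<cmd>\|\S+ post (?P<list>[^\s"]+))"\s*$'
)


def fork_aliases(lines, inbox_lists, mda_command):
    """Return alias lines with each posting alias forked in two.

    lines:       lines of the existing mailman alias file
    inbox_lists: names of lists that have a public-inbox (assumption:
                 extracted separately from the public-inbox config)
    mda_command: pipe command for the public-inbox side (placeholder)
    """
    out = []
    for line in lines:
        line = line.rstrip("\n")
        m = POST_ALIAS.match(line)
        if m and m.group("name") == m.group("list") \
                and m.group("name") in inbox_lists:
            name = m.group("name")
            # Fan the list address out to both delivery targets.
            out.append(f"{name}: {name}-mailman, {name}-inbox")
            out.append(f'{name}-mailman: "{m.group("cmd")}"')
            out.append(f'{name}-inbox: "{mda_command}"')
        else:
            # bounces/confirm/join aliases, comments and lists without
            # an inbox are copied through untouched.
            out.append(line)
    return out


if __name__ == "__main__":
    sample = [
        'autobook-cvs: "|/usr/local/mailman/mailman post autobook-cvs"',
        'autobook-cvs-bounces: "|/usr/local/mailman/mailman bounces autobook-cvs"',
    ]
    placeholder = "|env SOMETHING /usr/bin/public-inbox-mda SOMETHING"
    for alias in fork_aliases(sample, {"autobook-cvs"}, placeholder):
        print(alias)
```

Lists that are not in inbox_lists keep their original single alias, so
the generated file could replace the mailman one wholesale. The sketch
does not answer the open questions above about what regenerates the file
or the .db files.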