Date: Wed, 17 Aug 2022 14:25:17 +0200
From: Mark Wielaard
To: "Frank Ch. Eigler"
Cc: Overseers mailing list, Simon Marchi
Subject: Re: inbox.sourceware.org experiment
References: <20220813141403.GL5520@gnu.wildebeest.org>

Hi Frank,

On Tue, Aug 16, 2022 at 06:10:58PM -0400, Frank Ch. Eigler wrote:
> > It turns out public-inbox does support importing a full mbox in one
> > go. But it doesn't have a nice binary for it yet. There is however
> > scripts/import_vger_from_mbox in upstream git which is easily adapted
> > (just remove the vger specific filtering).
>
> This is already 99% done for the sourceware mailing lists.

Nice. Was this done using the mailman2inbox.sh script?
I believe that is still generating v1 archives.
Which is why I regenerated the elfutils-devel one.

> > [...]
> > Note this is V2 plus full indexing and includes and extra historical
> > elfutils-devel.nospam.mbox
>
> Is there a need for "full" indexing as opposed to "basic"?  I don't
> see why we'd need another text search engine for this stuff, we
> already have.  The basic "v1" with basic indexing seems fine and
> effective for web and nntp.

Note that full indexing is separate from using v1 or v2 archives.

I don't think we should be using v1 archives. Those are deprecated
upstream, and upstream strongly recommends v2 archives, which are
much more scalable. Reimporting the lists as v2 archives using the
import_from_mbox script should be much more efficient and can be
done in a couple of hours instead of days.

A full index does not just make full text search of the mailing list
really fast, it also indexes addresses, date ranges, subjects,
headers, body, attachments, etc. And the results are also available
as mbox.

So you would then be able to easily express "give me all
emails/threads in gcc-patches from the last 6 months that discuss
dwarf2out.cc where I was not the sender or one of the receivers" and
then download the whole mbox or browse all those messages/threads
online.

See e.g. https://inbox.sourceware.org/elfutils-devel/_/text/help/
for the xapian queries you can execute.

> > [...]
> > I don't have a solution for keeping the archive up to date. [...]
>
> We can hack a postfix->|mailman and |inbox-mda alias-fork
> and dual pipe delivery for each mailing list.

That would be great. But I would need some time reading up on
postfix/mailman configs. Do you have an example of where/how this
hack would be done?

Thanks,

Mark
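
P.S. To make the gcc-patches example above concrete, here is a rough
sketch of how such a search could be composed. It assumes the b:
(body), d: (date range) and a: (any of To/Cc/From) prefixes and the
NOT operator listed on the help page linked above, and it assumes a
gcc-patches inbox exists under inbox.sourceware.org. The helper names
and the me@example.org address are made up for illustration.

```python
from datetime import date
from urllib.parse import urlencode


def build_query(term: str, since: date, exclude_addr: str) -> str:
    """Compose a public-inbox style search string (hypothetical helper).

    b:   match in the message body
    d:   open-ended date range starting at `since` (YYYYMMDD)
    a:   any of To, Cc or From; negated here with NOT
    """
    return f"b:{term} d:{since:%Y%m%d}.. NOT a:{exclude_addr}"


def search_url(base: str, inbox: str, query: str) -> str:
    """Build the search URL for one inbox; the ?q= layout is an assumption."""
    return f"{base.rstrip('/')}/{inbox}/?" + urlencode({"q": query})


# Six months before the date of this mail, written out by hand.
query = build_query("dwarf2out.cc", date(2022, 2, 17), "me@example.org")
url = search_url("https://inbox.sourceware.org", "gcc-patches", query)
print(query)
print(url)
```

The query string can also simply be typed into the search box of the
inbox web page; the URL form is only handy for scripting.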