Date: Wed, 17 Aug 2022 09:24:56 -0400
From: "Frank Ch. Eigler"
To: Mark Wielaard
Cc: Overseers mailing list, Simon Marchi
Subject: Re: inbox.sourceware.org experiment
References: <20220813141403.GL5520@gnu.wildebeest.org>

Hi -

> [...]
> > Is there a need for "full" indexing as opposed to "basic"?  I don't
> > see why we'd need another text search engine for this stuff, we
> > already have.  The basic "v1" with basic indexing seems fine and
> > effective for web and nntp.
> [...]
> I don't think we should be using v1 archives, those are deprecated
> upstream and they strongly recommend using v2 archives which are much
> more scalable.

Given that v1 is the default of public-inbox-init, they can't be that
bad.

> Reimporting the lists as v2 archives using the import_from_mbox
> script should be much more efficient and can be done in a couple of
> hours instead of days.

That speed is nice, but I suspect that's not a v1/v2 representation
efficiency issue but something else.

> A full index does not just make full text search of the mailinglist
> really fast, it also indexes addresses, date ranges, subjects, headers,
> body, attachments, etc. And the results are also available as mbox. So
> you would then be able to easily express "give me all emails/threads
> in gcc-patches from the last 6 months that discuss dwarf2out.cc where
> I was not the sender or one of the receivers" and then download the
> whole mbox or browse all those messages/threads online.
[...]

Yes, understood that the extra indexing can do extra searches.  My
question was about utility/need for this.  For elfutils-devel, note
that the full xapian indexes are about 10x the size of the
git-compressed email archive, whereas in the case of the systemtap
import, it's only about 0.2x, so there is a serious cost/benefit
question.

(In both v1 and v2 cases, the git representation of the mailboxes is
about 60% of the size of the raw mbox files.  That's pretty puny
compression TBH, I expected much better.)

> That would be great. But I would need some time reading up on
> postfix/mailman configs. Do you have an example of where/how this hack
> would be done?

postfix delivers mailing list traffic via /etc/mailman/aliases, e.g.:

autobook-cvs: "|/usr/local/mailman/mailman post autobook-cvs"
autobook-cvs-bounces: "|/usr/local/mailman/mailman bounces autobook-cvs"
autobook-cvs-confirm: "|/usr/local/mailman/mailman confirm autobook-cvs"
autobook-cvs-join: "|/usr/local/mailman/mailman join autobook-cvs"

I would use a script to generate a new config file from that, so that
the primary mailing list incoming aliases are forked:

autobook-cvs: autobook-cvs-mailman, autobook-cvs-inbox
autobook-cvs-mailman: "|/usr/local/mailman/mailman post autobook-cvs"
autobook-cvs-inbox: "|env SOMETHING /usr/bin/public-inbox-mda SOMETHING"
autobook-cvs-bounces: "|/usr/local/mailman/mailman bounces autobook-cvs"
autobook-cvs-confirm: "|/usr/local/mailman/mailman confirm autobook-cvs"
autobook-cvs-join: "|/usr/local/mailman/mailman join autobook-cvs"

and then switch postfix to this alias file instead.

- FChE
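
For illustration only, the alias forking described in this message could be generated by a script along the following lines. This is a minimal sketch, not the script actually used on sourceware: the function name fork_aliases and the regular expression are hypothetical, it assumes every alias line has the one-line "name: "|program action list"" shape shown in the message, and the SOMETHING placeholders for public-inbox-mda are copied verbatim because the message does not specify them.

```python
import re

# Matches one mailman pipe alias, e.g.
#   autobook-cvs: "|/usr/local/mailman/mailman post autobook-cvs"
# Assumption: aliases are single-line and use exactly this quoting style.
ALIAS_RE = re.compile(
    r'^(?P<alias>[^\s:#]+):\s*"\|(?P<prog>\S+)\s+(?P<action>\S+)\s+(?P<list>[^"\s]+)"\s*$'
)

# Placeholder taken verbatim from the message; the real environment and
# arguments for public-inbox-mda are not given there.
INBOX_COMMAND = '"|env SOMETHING /usr/bin/public-inbox-mda SOMETHING"'


def fork_aliases(lines):
    """Return alias lines in which each primary 'post' alias is forked.

    A primary alias (action 'post', alias name equal to the list name)
    becomes three lines: a fan-out alias, the original mailman delivery
    under '<list>-mailman', and a '<list>-inbox' delivery. Every other
    line (bounces, confirm, join, comments, blanks) passes through as-is.
    """
    out = []
    for raw in lines:
        line = raw.rstrip("\n")
        m = ALIAS_RE.match(line)
        if m and m.group("action") == "post" and m.group("alias") == m.group("list"):
            name = m.group("alias")
            prog = m.group("prog")
            out.append(f"{name}: {name}-mailman, {name}-inbox")
            out.append(f'{name}-mailman: "|{prog} post {name}"')
            out.append(f"{name}-inbox: {INBOX_COMMAND}")
        else:
            out.append(line)
    return out
```

Feeding it the lines of /etc/mailman/aliases and writing the result to a separate file keeps the original alias file untouched, so postfix can be switched back if the forked delivery misbehaves.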