From: "Frank Ch. Eigler" <fche@elastic.org>
To: Mark Wielaard <mark@klomp.org>
Cc: Overseers mailing list <overseers@sourceware.org>,
Simon Marchi <simon.marchi@polymtl.ca>
Subject: Re: inbox.sourceware.org experiment
Date: Wed, 17 Aug 2022 09:24:56 -0400
Message-ID: <YvzsKJqAMxnF9tcz@elastic.org>
In-Reply-To: <YvzeLWYlQdzvfPAM@wildebeest.org>
Hi -
> [...]
> > Is there a need for "full" indexing as opposed to "basic"? I don't
> > see why we'd need another text search engine for this stuff; we
> > already have one. The basic "v1" format with basic indexing seems
> > fine and effective for web and NNTP.
> [...]
> I don't think we should be using v1 archives; those are deprecated
> upstream, and upstream strongly recommends using v2 archives, which
> are much more scalable.
Given that v1 is still the default for public-inbox-init, they can't be that bad.
> Reimporting the lists as v2 archives using the import_from_mbox
> script should be much more efficient and can be done in a couple of
> hours instead of days.
That speed is nice, but I suspect it comes from something other than
the efficiency of the v1 vs. v2 representation.
> A full index does not just make full-text search of the mailing list
> really fast, it also indexes addresses, date ranges, subjects, headers,
> body, attachments, etc. And the results are also available as mbox. So
> you would then be able to easily express "give me all emails/threads
> in gcc-patches from the last 6 months that discuss dwarf2out.cc where
> I was not the sender or one of the recipients" and then download the
> whole mbox or browse all those messages/threads online. [...]
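To make the quoted example concrete, here is a hedged Python sketch that only builds such a query string and a search URL. The search prefixes (dfn: for a file name in a patch, d: for an approxidate date range, a: for any of From/To/Cc) and the "-" negation are assumed from public-inbox's search help, the URL layout is hypothetical, and "discuss dwarf2out.cc" is approximated as "contains a patch touching dwarf2out.cc".

```python
from urllib.parse import quote_plus


def build_query(filename, since, exclude_addr):
    """Compose a public-inbox search string (assumed prefix syntax).

    dfn:  file name touched by a patch in the message
    d:    date range; an open-ended "X.." range means "from X onward"
    a:    any of the From/To/Cc headers; the leading "-" excludes matches
    """
    return f"dfn:{filename} d:{since}.. -a:{exclude_addr}"


def search_url(base, inbox, query):
    # Hypothetical URL layout: <base>/<inbox>/?q=<url-encoded query>
    return f"{base}/{inbox}/?q={quote_plus(query)}"


# Illustrative values only: the address excluded is the quoted author's.
q = build_query("dwarf2out.cc", "6.months.ago", "mark@klomp.org")
print(q)
print(search_url("https://inbox.sourceware.org", "gcc-patches", q))
```

Keeping the query as plain text means the same string could be pasted into the web search box; whether mbox download of the results is offered depends on the full index being present.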
Yes, I understand that the extra indexing enables extra kinds of
searches. My question was about the utility of, and need for, this.
For elfutils-devel, note that the full Xapian indexes are about 10x
the size of the git-compressed email archive, whereas for the
systemtap import they are only about 0.2x, so there is a serious
cost/benefit question.
(In both the v1 and v2 cases, the git representation of the mailboxes
is about 60% of the size of the raw mbox files. That's pretty puny
compression, TBH; I expected much better.)
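Combining the approximate ratios above gives a rough sense of the cost side. This Python sketch assumes the ~60% git-vs-mbox figure applies to both lists; with that assumption the archive plus full index comes to roughly 6.6x the raw mbox size for elfutils-devel, versus roughly 0.72x for systemtap.

```python
def total_storage_vs_mbox(git_to_mbox, index_to_git):
    """Size of git archive plus Xapian index, as a multiple of raw mbox size.

    git_to_mbox:  git archive size / raw mbox size
    index_to_git: Xapian index size / git archive size
    """
    git = git_to_mbox
    index = git_to_mbox * index_to_git
    return git + index


# Approximate ratios from the message (60% git compression is assumed
# to apply to both lists).
for name, index_ratio in [("elfutils-devel", 10), ("systemtap", 0.2)]:
    print(name, round(total_storage_vs_mbox(0.6, index_ratio), 2))
```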
> That would be great. But I would need some time reading up on
> postfix/mailman configs. Do you have an example of where/how this hack
> would be done?
Postfix delivers mailing list traffic via the aliases in
/etc/mailman/aliases, e.g.:
    autobook-cvs: "|/usr/local/mailman/mailman post autobook-cvs"
    autobook-cvs-bounces: "|/usr/local/mailman/mailman bounces autobook-cvs"
    autobook-cvs-confirm: "|/usr/local/mailman/mailman confirm autobook-cvs"
    autobook-cvs-join: "|/usr/local/mailman/mailman join autobook-cvs"
I would use a script to generate a new alias file from that, in which
the primary incoming alias of each mailing list is forked:
    autobook-cvs: autobook-cvs-mailman, autobook-cvs-inbox
    autobook-cvs-mailman: "|/usr/local/mailman/mailman post autobook-cvs"
    autobook-cvs-inbox: "|env SOMETHING /usr/bin/public-inbox-mda SOMETHING"
    autobook-cvs-bounces: "|/usr/local/mailman/mailman bounces autobook-cvs"
    autobook-cvs-confirm: "|/usr/local/mailman/mailman confirm autobook-cvs"
    autobook-cvs-join: "|/usr/local/mailman/mailman join autobook-cvs"
and then switch Postfix to this alias file instead.
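The alias-forking script described above could look roughly like this Python sketch. It assumes every list's main alias has the form `NAME: "|CMD post LIST"` and passes all other lines through unchanged. The inbox command is a caller-supplied template (a hypothetical parameter); its "SOMETHING" placeholders are copied from the message and deliberately not filled in.

```python
import re

# Matches a Mailman "post" alias line such as:
#   autobook-cvs: "|/usr/local/mailman/mailman post autobook-cvs"
POST_RE = re.compile(
    r'^(?P<alias>[^:#\s]+):\s*"\|(?P<cmd>[^"\s]+) post (?P<list>[^"\s]+)"\s*$'
)


def fork_aliases(lines, inbox_cmd):
    """Return alias-file lines where each list's main alias feeds both
    Mailman and a public-inbox delivery command.

    inbox_cmd is a str.format template; "{list}" (if present) is replaced
    by the list name. Hypothetical parameter: the real public-inbox-mda
    invocation is left unspecified here, as in the message.
    """
    out = []
    for raw in lines:
        line = raw.rstrip("\n")
        m = POST_RE.match(line)
        if m is None:
            # bounces/confirm/join aliases, comments, blanks: unchanged
            out.append(line)
            continue
        alias, cmd, name = m["alias"], m["cmd"], m["list"]
        out.append(f"{alias}: {alias}-mailman, {alias}-inbox")
        out.append(f'{alias}-mailman: "|{cmd} post {name}"')
        out.append(f'{alias}-inbox: "|{inbox_cmd.format(list=name)}"')
    return out


example = [
    'autobook-cvs: "|/usr/local/mailman/mailman post autobook-cvs"',
    'autobook-cvs-bounces: "|/usr/local/mailman/mailman bounces autobook-cvs"',
    'autobook-cvs-confirm: "|/usr/local/mailman/mailman confirm autobook-cvs"',
    'autobook-cvs-join: "|/usr/local/mailman/mailman join autobook-cvs"',
]
template = "env SOMETHING /usr/bin/public-inbox-mda SOMETHING"
for out_line in fork_aliases(example, template):
    print(out_line)
```

Run on the four Mailman lines, this prints the six-line forked block shown above. The generated file would still need to be compiled with postalias(1) and referenced from alias_maps in main.cf, and regenerated whenever lists are added.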
- FChE