public inbox for overseers@sourceware.org
From: "Frank Ch. Eigler" <fche@elastic.org>
To: Mark Wielaard <mark@klomp.org>
Cc: Overseers mailing list <overseers@sourceware.org>,
	Simon Marchi <simon.marchi@polymtl.ca>
Subject: Re: inbox.sourceware.org experiment
Date: Wed, 17 Aug 2022 09:24:56 -0400
Message-ID: <YvzsKJqAMxnF9tcz@elastic.org>
In-Reply-To: <YvzeLWYlQdzvfPAM@wildebeest.org>

Hi -

> [...]
> > Is there a need for "full" indexing as opposed to "basic"?  I don't
> > see why we'd need another text search engine for this stuff, we
> > already have one.  The basic "v1" with basic indexing seems fine and
> > effective for web and nntp.
> [...]
> I don't think we should be using v1 archives, those are deprecated
> upstream and they strongly recommend using v2 archives which are much
> more scalable.

Given that v1 is the default for public-inbox-init, it can't be that bad.

> Reimporting the lists as v2 archives using the import_from_mbox
> script should be much more efficient and can be done in a couple of
> hours instead of days.

That speed is nice, but I suspect it comes from something other than
v1/v2 representation efficiency.


> A full index does not just make full text search of the mailing list
> really fast, it also indexes addresses, date ranges, subjects, headers,
> body, attachments, etc. And the results are also available as mbox. So
> you would then be able to easily express "give me all emails/threads
> in gcc-patches from the last 6 months that discuss dwarf2out.cc where
> I was not the sender or one of the receivers" and then download the
> whole mbox or browse all those messages/threads online.  [...]

Yes, understood that the extra indexing can do extra searches.  My
question was about utility/need for this.  For elfutils-devel, note
that the full xapian indexes are about 10x the size of the
git-compressed email archive, whereas in the case of the systemtap
import, it's only about 0.2x, so there is a serious cost/benefit
question.

(In both v1 and v2 cases, the git representation of the mailboxes is
about 60% of the size of the raw mbox files.  That's pretty puny
compression TBH, I expected much better.)


> That would be great. But I would need some time reading up on
> postfix/mailman configs. Do you have an example of where/how this hack
> would be done?

postfix delivers mailing list traffic via /etc/mailman/aliases,
e.g.:

autobook-cvs:             "|/usr/local/mailman/mailman post autobook-cvs"
autobook-cvs-bounces:     "|/usr/local/mailman/mailman bounces autobook-cvs"
autobook-cvs-confirm:     "|/usr/local/mailman/mailman confirm autobook-cvs"
autobook-cvs-join:        "|/usr/local/mailman/mailman join autobook-cvs"

I would use a script to generate a new config file from that, so that the
primary mailing list incoming aliases are forked:

autobook-cvs:             autobook-cvs-mailman, autobook-cvs-inbox
autobook-cvs-mailman:     "|/usr/local/mailman/mailman post autobook-cvs"
autobook-cvs-inbox:       "|env SOMETHING /usr/bin/public-inbox-mda SOMETHING"
autobook-cvs-bounces:     "|/usr/local/mailman/mailman bounces autobook-cvs"
autobook-cvs-confirm:     "|/usr/local/mailman/mailman confirm autobook-cvs"
autobook-cvs-join:        "|/usr/local/mailman/mailman join autobook-cvs"

and then switch postfix to this alias file instead.
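
A rough sketch of what such a generator script could look like, in
Python.  It assumes every primary list alias has the exact
'NAME: "|.../mailman post NAME"' shape shown above; the function and
variable names are made up for illustration, and the public-inbox-mda
command is kept as the same SOMETHING placeholder, since the real
environment and arguments still need to be worked out.

```python
import re

# Matches primary posting aliases such as:
#   autobook-cvs:   "|/usr/local/mailman/mailman post autobook-cvs"
POST_ALIAS = re.compile(
    r'^(?P<name>[^\s:#]+):\s+"\|(?P<cmd>\S+) post (?P<list>\S+)"\s*$'
)

# Placeholder only: the real public-inbox-mda invocation is not settled.
INBOX_COMMAND = '"|env SOMETHING /usr/bin/public-inbox-mda SOMETHING"'


def alias_line(key, value):
    """Format one alias entry, padding the key like the existing file."""
    return f"{key + ':':<25} {value}"


def fork_aliases(lines):
    """Return new alias lines with each 'post' alias forked in two.

    Every other line (bounces, confirm, join, comments, blanks) is
    passed through unchanged.
    """
    out = []
    for line in lines:
        m = POST_ALIAS.match(line)
        if m is None:
            out.append(line.rstrip("\n"))
            continue
        name = m.group("name")
        mailman = f'"|{m.group("cmd")} post {m.group("list")}"'
        out.append(alias_line(name, f"{name}-mailman, {name}-inbox"))
        out.append(alias_line(f"{name}-mailman", mailman))
        out.append(alias_line(f"{name}-inbox", INBOX_COMMAND))
    return out
```

Feeding it the lines of /etc/mailman/aliases and writing the result to
a separate file would leave the mailman-generated file untouched, so
the new file can be regenerated whenever lists are added or removed.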

- FChE
