public inbox for overseers@sourceware.org
From: Mark Wielaard <mark@klomp.org>
To: "Frank Ch. Eigler" <fche@elastic.org>
Cc: Overseers mailing list <overseers@sourceware.org>,
	Simon Marchi <simon.marchi@polymtl.ca>
Subject: Re: inbox.sourceware.org experiment
Date: Wed, 17 Aug 2022 23:18:35 +0200
Message-ID: <Yv1bK/fAu86QXP98@wildebeest.org>
In-Reply-To: <YvzsKJqAMxnF9tcz@elastic.org>

Hi Frank,

On Wed, Aug 17, 2022 at 09:24:56AM -0400, Frank Ch. Eigler wrote:
> > I don't think we should be using v1 archives, those are deprecated
> > upstream and they strongly recommend using v2 archives which are much
> > more scalable.
> 
> Given that v1 is the default of public-inbox-init, they can't be that bad.

Looks like it is just for backward compatibility. They actively warn
against using it for new installations and strongly recommend using
-V2. See also the public-inbox-init, public-inbox-v1-format and
public-inbox-v2-format man pages.

I don't expect support for v1 will disappear, but new projects around
public-inbox, like lei, only support v2. So it is better to simply use
the v2 format from the start.

> > Reimporting the lists as v2 archives using the import_from_mbox
> > script should be much more efficient and can be done in a couple of
> > hours instead of days.
> 
> That speed is nice, but I suspect that's not a v1/v2 representation
> efficiency issue but something else.

The v2 format allows parallel imports, so it defaults to using multiple
threads. Also, the import_from_mbox script streams the messages of an
mbox through a single perl process instead of starting a new perl
process per message.
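
To illustrate the streaming point, here is a rough Python sketch. It is
not the import_from_mbox script itself (that one is perl); stream_mbox
and handle_message are made-up names, and the only thing it shows is
one process walking a whole mbox instead of one process per message.

```python
import mailbox


def stream_mbox(path, handle_message):
    """Open one mbox and pass every message to handle_message in-process.

    handle_message is a hypothetical callback standing in for whatever
    the real importer does with each message.
    """
    box = mailbox.mbox(path, create=False)
    count = 0
    try:
        for msg in box:
            # No new interpreter is started here; the same process
            # handles message after message.
            handle_message(msg)
            count += 1
    finally:
        box.close()
    return count
```

The per-message startup cost is paid once per mbox, which is probably
where most of the difference between hours and days comes from.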

> Yes, understood that the extra indexing can do extra searches.  My
> question was about utility/need for this.

The use seems obvious to me for anybody using the web-based archives
to generate tailored message/mbox results. Date-ranged searches in
particular seem pretty much mandatory, since otherwise you essentially
just have to keep clicking next, next, next. It also helps to get
specific messages based on author or subject. One specific use case for
public-inbox is not having to be subscribed to a list to read it, or
to keep a local copy to search through it (public-inbox does make
mirroring a mailing list easy, but not everybody has the space or
network to do that).

> For elfutils-devel, note
> that the full xapian indexes are about 10x the size of the
> git-compressed email archive, whereas in the case of the systemtap
> import, it's only about 0.2x, so there is a serious cost/benefit
> question.

That is a concern and much bigger than I anticipated. So we should
probably only enable full indexing for active discussion and patch
lists and keep it at basic for autogenerated lists like -cvs or
old/inactive lists.
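
As a sketch of that policy in Python: the function name, the "active"
flag and the "-cvs" suffix test are my assumptions for illustration,
only the "full" and "basic" level names come from public-inbox.

```python
def choose_indexlevel(list_name, active):
    """Pick a public-inbox indexlevel for a list.

    Assumption: autogenerated lists can be recognized by a "-cvs"
    suffix; a real setup would probably need an explicit list.
    """
    autogenerated = list_name.endswith("-cvs")
    if active and not autogenerated:
        # Active discussion and patch lists get the full xapian index.
        return "full"
    # Autogenerated or old/inactive lists stay at the cheap level.
    return "basic"
```

For example, choose_indexlevel("elfutils-devel", True) gives "full",
while choose_indexlevel("autobook-cvs", True) gives "basic".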

> > That would be great. But I would need some time reading up on
> > postfix/mailman configs. Do you have an example of where/how this hack
> > would be done?
> 
> postfix delivers mailing list traffic via /etc/mailman/aliases,
> e.g.:
> 
> autobook-cvs:             "|/usr/local/mailman/mailman post autobook-cvs"
> autobook-cvs-bounces:     "|/usr/local/mailman/mailman bounces autobook-cvs"
> autobook-cvs-confirm:     "|/usr/local/mailman/mailman confirm autobook-cvs"
> autobook-cvs-join:        "|/usr/local/mailman/mailman join autobook-cvs"
> 
> I would use a script to generate a new config file from that, so that the
> primary mailing list incoming aliases are forked:
> 
> autobook-cvs:             autobook-cvs-mailman, autobook-cvs-inbox
> autobook-cvs-mailman:     "|/usr/local/mailman/mailman post autobook-cvs"
> autobook-cvs-inbox:       "|env SOMETHING /usr/bin/public-inbox-mda SOMETHING"
> autobook-cvs-bounces:     "|/usr/local/mailman/mailman bounces autobook-cvs"
> autobook-cvs-confirm:     "|/usr/local/mailman/mailman confirm autobook-cvs"
> autobook-cvs-join:        "|/usr/local/mailman/mailman join autobook-cvs"
> 
> and then switch postfix to this alias file instead.

OK, that could work, and it should be easy to generate by combining
/etc/mailman/aliases with the lists in
/home/inbox/.public-inbox/config
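
A minimal sketch of such a generator in Python, assuming the alias
lines look exactly like the ones quoted above. fork_aliases and
inbox_lists are hypothetical names; inbox_lists is assumed to be the
set of list names already extracted from the public-inbox config, and
the SOMETHING placeholders are kept as-is from the example because the
real public-inbox-mda invocation still has to be worked out.

```python
import re

# Matches 'name:   target' alias lines; comments and blank lines do not match.
ALIAS_RE = re.compile(r'^(?P<name>[^\s:#]+):\s*(?P<target>.+?)\s*$')


def fork_aliases(alias_lines, inbox_lists):
    """Return new alias lines with the posting alias of each inbox list forked.

    Only the primary "post" alias of a list named in inbox_lists is
    split into -mailman and -inbox targets; everything else is copied.
    """
    out = []
    for line in alias_lines:
        m = ALIAS_RE.match(line)
        is_post = (
            m is not None
            and m["name"] in inbox_lists
            and m["target"].strip('"').endswith(" post " + m["name"])
        )
        if is_post:
            name = m["name"]
            out.append(f"{name}: {name}-mailman, {name}-inbox")
            out.append(f"{name}-mailman: {m['target']}")
            # Placeholder command copied verbatim from the proposal.
            out.append(
                f'{name}-inbox: "|env SOMETHING /usr/bin/public-inbox-mda SOMETHING"'
            )
        else:
            out.append(line.rstrip("\n"))
    return out
```

Lists that are not in the public-inbox config pass through unchanged,
so the generated file can replace the original alias file as a whole.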

This happens before mailman sees the message, so we do need to do a
spam check. I think postfix already sets ORIGINAL_RECIPIENT; we
just need to make sure it is one of the addresses for a list in the
config.
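
The recipient check could be as simple as the following Python sketch.
recipient_is_known and list_addresses are made-up names, and
list_addresses is assumed to hold the addresses read from the
public-inbox config; it only covers the address check, not the spam
check.

```python
import os


def recipient_is_known(list_addresses, environ=os.environ):
    """Return True if ORIGINAL_RECIPIENT names one of the configured lists.

    The comparison is case-insensitive; a missing or empty variable is
    treated as unknown.
    """
    rcpt = environ.get("ORIGINAL_RECIPIENT", "").strip().lower()
    known = {addr.lower() for addr in list_addresses}
    return bool(rcpt) and rcpt in known
```

A wrapper around the delivery command could refuse the message when
this returns False, so that mail for addresses without an inbox never
reaches the archive.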

But what generates /etc/mailman/aliases itself? Can we hook into that
to trigger generation of this aliases-inbox file? Otherwise, if we add
a new mailman list, it won't work. And do we need to update/regenerate
/etc/aliases.db and/or /etc/mailman/aliases.db?

Cheers,

Mark
