public inbox for overseers@sourceware.org
From: Mark Wielaard <mark@klomp.org>
To: Overseers mailing list <overseers@sourceware.org>
Cc: Simon Marchi <simon.marchi@polymtl.ca>
Subject: Re: inbox.sourceware.org experiment
Date: Tue, 16 Aug 2022 23:36:17 +0200
Message-ID: <YvwN0QZbKA+N9hN8@wildebeest.org>
In-Reply-To: <20220813141403.GL5520@gnu.wildebeest.org>

Hi,

On Sat, Aug 13, 2022 at 04:14:03PM +0200, Mark Wielaard via Overseers wrote:
> Looking at the mailman2inbox.sh script I have a few suggestions (I can
> make them to the script myself, but don't know if you are currently
> editing/running it):
> 
> - public-inbox-init should probably use -V2 (see above). You can then
>   also use -j JOBS to speed up the import.
> 
> - --indexlevel should be full to make the Xapian searching more useful
>   (this is the default, so you can also not set it). Note that this
>   also affects the incremental updating done by public-inbox-mda.
> 
> - You want to kill public-inbox-httpd using -SIGHUP so it just reloads
>   the new config files. You also want to kill the other daemons,
>   public-inbox-imapd and public-inbox-nntpd
> 
> - The --ng name should be based on the primary domain name (see
>   above). I don't know how to determine that easily though. Maybe
>   mailman knows, then we can also set the initial ADDRESS properly.
> 
> The formail -s public-inbox-mda seems to work well for batch
> importing, but is it efficient enough for keeping the importing up to
> date? It looks like the last .mbox file is just really big and new
> messages are appended at the end, so we would be trying to import all
> messages all the time. And how do we make sure it is triggered when new
> messages come in?

It turns out public-inbox does support importing a full mbox in one
go, but it doesn't have a nice binary for it yet. There is, however,
scripts/import_vger_from_mbox in the upstream public-inbox git
repository, which is easily adapted (just remove the vger-specific
filtering).

I put this in the inbox homedir as import_from_mbox. To test it, I
removed the already imported elfutils-devel and reimported it with the
import_from_mbox script:

$ public-inbox-init -V2 --ng inbox.sourceware.elfutils-devel -L full elfutils-devel /home/inbox/lists/elfutils-devel https://inbox.sourceware.org/elfutils-devel elfutils@sourceware.org elfutils-devel@lists.fedorahosted.org

$ ./import_from_mbox elfutils-devel elfutils-devel@lists.fedorahosted.org lists/elfutils-devel < /sourceware/projects/elfutils-home/elfutils-devel.nospam.mbox

$ for i in /var/lib/mailman/archives/private/elfutils-devel.mbox/*mbox; do ./import_from_mbox elfutils-devel elfutils-devel@sourceware.org lists/elfutils-devel < "$i"; done

Note this is V2 plus full indexing, and it includes an extra historical
elfutils-devel.nospam.mbox.

Surprisingly this only took ~30 seconds in total.

The elfutils-devel.nospam.mbox doesn't contain enough headers to do
proper threading unfortunately. But the full index does make it
possible to match on similar subject.

I don't have a solution for keeping the archive up to date. Parsing
mboxes is really discouraged upstream because it requires reparsing
all messages, and there is no locking mechanism for mboxes, so if
mailman writes to the mbox while public-inbox reads from it, odd
things can happen.
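
To illustrate the reparsing cost: a naive workaround would remember
how many bytes of the mbox were already imported and feed only the
appended tail to the import script. The following is only a sketch
under that assumption. The function name new_mbox_bytes and the state
file are hypothetical (not part of public-inbox or mailman), it relies
on head -c being available, and it does nothing about the missing
locking, so a read racing with a mailman write can still see a partial
message.

```shell
# Sketch only: print the bytes appended to an mbox since the last call.
# new_mbox_bytes and the state-file layout are hypothetical helpers.
new_mbox_bytes() {
    mbox=$1
    state=$2
    # Offset recorded by the previous run (0 if there was none).
    old=0
    [ -f "$state" ] && old=$(cat "$state")
    # Current size in bytes; tr strips any padding wc may add.
    new=$(wc -c < "$mbox" | tr -d ' ')
    if [ "$new" -gt "$old" ]; then
        # tail -c +N starts at byte N (1-based); head -c caps the output
        # at the size measured above, so later appends are not emitted
        # twice on the next run.
        tail -c "+$((old + 1))" "$mbox" | head -c "$((new - old))"
    fi
    # Remember how far we got for the next run.
    printf '%s\n' "$new" > "$state"
}
```

The output could then be piped into the import script in place of the
full file, for example
new_mbox_bytes LIST.mbox LIST.offset | ./import_from_mbox ...
with the same arguments as above.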

One way to make it work with public-inbox-watch is to subscribe the
inbox user to each list and create a Maildir of messages. But then the
message headers will have been rewritten by mailman. So it would be
better to somehow get the inbox user the messages before mailman sees
them, or somehow get the inbox user a copy of the message as mailman
would add it to the mbox archive, instead of what it sends to list
subscribers.
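
For reference, the public-inbox-watch side of such a setup might look
roughly like this in the public-inbox config file (by default
~/.public-inbox/config). This is a sketch only: the Maildir path is
made up for illustration, and the section is assumed to be the one
public-inbox-init created for the list.

```
[publicinbox "elfutils-devel"]
	; address, url, inboxdir etc. as written by public-inbox-init
	; hypothetical Maildir that receives the inbox user's copy
	watch = maildir:/home/inbox/Maildir/elfutils-devel/
```

public-inbox-watch would then run as a long-lived process under the
inbox user and import messages as they are delivered to that Maildir.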

Cheers,

Mark
