public inbox for overseers@sourceware.org
From: Mark Wielaard <mark@klomp.org>
To: "Frank Ch. Eigler" <fche@elastic.org>
Cc: Overseers mailing list <overseers@sourceware.org>,
	Simon Marchi <simon.marchi@polymtl.ca>
Subject: Re: inbox.sourceware.org experiment
Date: Wed, 17 Aug 2022 14:25:17 +0200
Message-ID: <YvzeLWYlQdzvfPAM@wildebeest.org>
In-Reply-To: <YvwV8nmPT8LFkx4X@elastic.org>

Hi Frank,

On Tue, Aug 16, 2022 at 06:10:58PM -0400, Frank Ch. Eigler wrote:
> > It turns out public-inbox does support importing a full mbox in one
> > go. But it doesn't have a nice binary for it yet. There is however
> > scripts/import_vger_from_mbox in upstream git which is easily adapted
> > (just remove the vger specific filtering).
> 
> This is already 99% done for the sourceware mailing lists.

Nice. Was this done using the mailman2inbox.sh script? I believe that
is still generating v1 archives, which is why I regenerated the
elfutils-devel one.

> > [...]
> > Note this is V2 plus full indexing and includes an extra historical
> > elfutils-devel.nospam.mbox
> 
> Is there a need for "full" indexing as opposed to "basic"?  I don't
> see why we'd need another text search engine for this stuff, we
> already have.  The basic "v1" with basic indexing seems fine and
> effective for web and nntp.

Note that full indexing is separate from using v1 or v2 archives.

I don't think we should be using v1 archives. Those are deprecated
upstream, and upstream strongly recommends using v2 archives, which are
much more scalable. Reimporting the lists as v2 archives using the
import_from_mbox script should be much more efficient and can be done
in a couple of hours instead of days.

A full index does not just make full text search of the mailing list
really fast, it also indexes addresses, date ranges, subjects, headers,
body, attachments, etc. And the results are also available as mbox. So
you would then be able to easily express "give me all emails/threads
in gcc-patches from the last 6 months that discuss dwarf2out.cc where
I was not the sender or one of the receivers" and then download the
whole mbox or browse all those messages/threads online. See
e.g. https://inbox.sourceware.org/elfutils-devel/_/text/help/ for the
xapian queries you can execute.
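
As a rough sketch of the kind of query described above, the following
Python snippet assembles a search URL for an inbox. It assumes the d:
(date range, YYYYMMDD) and a: (To, Cc and From address) prefixes listed
on the help page, and it assumes that x=m is the parameter that asks for
the results as an mbox; the server may insist on a POST request for
that. The search_url helper and the example date are hypothetical,
chosen only for illustration.

```python
from urllib.parse import urlencode

# Inbox base URL, taken from the help link above.
INBOX_URL = "https://inbox.sourceware.org/elfutils-devel/"


def search_url(terms, since=None, exclude_addr=None, mbox=False,
               base_url=INBOX_URL):
    """Build a search URL from a few query pieces (hypothetical helper)."""
    parts = list(terms)
    if since:
        # d: takes a date range; a trailing ".." leaves the end open.
        parts.append(f"d:{since}..")
    if exclude_addr:
        # a: matches To, Cc and From, so NOT a: drops mail that the
        # address either sent or received.
        parts.append(f"NOT a:{exclude_addr}")
    params = {"q": " ".join(parts)}
    if mbox:
        # Assumption: x=m requests the results as a gzipped mbox.
        params["x"] = "m"
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Everything mentioning dwarf2out.cc since 2022-02-17 that did not
    # involve the given address.
    print(search_url(["dwarf2out.cc"], since="20220217",
                     exclude_addr="mark@klomp.org"))
```

Opening the printed URL in a browser should show the matching messages;
passing mbox=True only changes the URL, it does not download anything.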

> > [...]
> > I don't have a solution for keeping the archive up to date. [...]
> 
> We can hack a postfix->|mailman and |inbox-mda alias-fork
> and dual pipe delivery for each mailing list.

That would be great. But I would need some time reading up on
postfix/mailman configs. Do you have an example of where/how this hack
would be done?

Thanks,

Mark
