public inbox for overseers@sourceware.org
* inbox.sourceware.org experiment
@ 2022-08-13 14:14 Mark Wielaard
  2022-08-15 13:00 ` Mark Wielaard
  2022-08-16 21:36 ` Mark Wielaard
  0 siblings, 2 replies; 15+ messages in thread
From: Mark Wielaard @ 2022-08-13 14:14 UTC
  To: overseers; +Cc: Simon Marchi

Hi,

It looks like our public-inbox experiment at
https://inbox.sourceware.org/ is starting to work out.

Currently only Simon and I have access to the inbox account through
ssh, but I think we can automate it enough that no manual intervention
is needed unless new lists are added. But please ask if you want to
help with the setup.

I have set up sourceware-vhost-inbox.conf with a corresponding
Let's Encrypt certificate, and public-inbox-nntpd, public-inbox-imapd
and public-inbox-httpd through systemd socket and service files.

So you should be able to access the mailboxes through git mirroring,
https, mbox downloads, atom feeds, nntp and imap.

You can already look at the experimental setup per list, e.g.
web-archive:
  https://inbox.sourceware.org/elfutils-devel/
individual messages and mbox per thread instructions:
  https://inbox.sourceware.org/elfutils-devel/_/text/help/
git mirror instructions:
  https://inbox.sourceware.org/elfutils-devel/_/text/mirror/
atom feed:
  https://inbox.sourceware.org/elfutils-devel/new.atom
imap:
  imap://inbox.sourceware.org/ (readonly, port 143, any user/pass)
nntp:
  nntp://inbox.sourceware.org/ (readonly, port 119)

Note that nntp group names and imap folder names might still change.
All current mailboxes are imported/mirrored in public-inbox-v1-format,
but for scalability we will want to import them into
public-inbox-v2-format (this also parallelizes Xapian indexing and
uses an SQLite database).

It looks like the inbox user can access the original emails to the
lists before mailman mangles the headers, but it cannot easily see which
domain (sourceware, gcc, cygwin, ecos, etc.) they belong to. It would
be nice if we could name the newsgroups/folders after the primary
domain, e.g. inbox.sourceware.elfutils-devel, inbox.gcc.gcc-patches,
inbox.cygwin.cygwin-talk.

The inbox.sourceware.test group at https://inbox.sourceware.org/test
is a simple mirror of http://try.public-inbox.org/test/ and I will
remove it soon (plus the cronjob that does the mirroring).

Looking at the mailman2inbox.sh script, I have a few suggestions (I can
make the changes to the script myself, but I don't know whether you are
currently editing/running it):

- public-inbox-init should probably use -V2 (see above). You can then
  also use -j JOBS to speed up the import.

- --indexlevel should be full to make the Xapian searching more useful
  (this is the default, so you can also leave it unset). Note that this
  also affects the incremental updating done by public-inbox-mda.

- You want to kill public-inbox-httpd using -SIGHUP so it just reloads
  the new config files. You also want to signal the other daemons,
  public-inbox-imapd and public-inbox-nntpd, the same way.

- The --ng name should be based on the primary domain name (see
  above). I don't know how to determine that easily, though. Maybe
  mailman knows; then we can also set the initial ADDRESS properly.
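For illustration: once the primary domain of a list is known, the naming scheme proposed above (inbox.sourceware.elfutils-devel, inbox.gcc.gcc-patches, inbox.cygwin.cygwin-talk) can be derived from the first label of the domain. A minimal sketch; ng_name is a hypothetical helper, not part of public-inbox:

```shell
#!/bin/bash
# Hypothetical helper (not part of public-inbox): build a newsgroup name
# such as inbox.gcc.gcc-patches from a list's primary domain and its name.
ng_name() {
    # $1: primary mailman domain, e.g. gcc.gnu.org
    # $2: list name, e.g. gcc-patches
    # ${1%%.*} drops everything from the first dot: gcc.gnu.org -> gcc
    printf 'inbox.%s.%s\n' "${1%%.*}" "$2"
}
```

Its output would be passed to the --ng option, e.g. `public-inbox-init -V2 --ng "$(ng_name gcc.gnu.org gcc-patches)" ...`.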

The formail -s public-inbox-mda pipeline seems to work well for batch
importing, but is it efficient enough to keep the archive up to date?
It looks like the last .mbox file is just really big and new messages
are appended at the end, so we would be trying to import all messages
all the time. And how do we make sure the import is triggered when new
messages come in?
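One possible way to avoid re-feeding the whole mbox is to remember how many bytes were already imported and pass on only what was appended since. A minimal sketch, assuming the mbox is strictly append-only; new_mbox_bytes and its state file are hypothetical, and `head -c` is widely available but not strictly POSIX. It does nothing about concurrent writes: without mbox locking, a run can still see a half-written message.

```shell
#!/bin/bash
# Hypothetical helper: print only the bytes appended to an mbox since the
# previous call, remembering the last seen size in a state file.
# Assumes the mbox is only ever appended to; if it shrinks, start over.
new_mbox_bytes() {
    local mbox="$1" state="$2" old=0 new
    if [ -s "$state" ]; then
        old=$(cat "$state")
    fi
    # tr removes the padding some wc implementations add
    new=$(wc -c < "$mbox" | tr -d ' ')
    if [ "$new" -lt "$old" ]; then
        old=0   # file was truncated or rewritten: re-emit everything
    fi
    # Emit exactly the bytes between the old and the new size, so data
    # appended while we run is left for the next call.
    tail -c +"$((old + 1))" "$mbox" | head -c "$((new - old))"
    printf '%s\n' "$new" > "$state"
}
```

Its output could replace the full mbox as input to the existing `formail -s public-inbox-mda` pipeline; how to trigger it when new mail arrives is still the open question.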

Cheers,

Mark

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: inbox.sourceware.org experiment
  2022-08-13 14:14 inbox.sourceware.org experiment Mark Wielaard
@ 2022-08-15 13:00 ` Mark Wielaard
  2022-08-16 21:36 ` Mark Wielaard
  1 sibling, 0 replies; 15+ messages in thread
From: Mark Wielaard @ 2022-08-15 13:00 UTC
  To: Overseers mailing list; +Cc: Simon Marchi

[-- Attachment #1: Type: text/plain, Size: 1901 bytes --]

Hi,

On Sat, Aug 13, 2022 at 04:14:03PM +0200, Mark Wielaard via Overseers wrote:
> Looking at the mailman2inbox.sh script I have a few suggestions (I can
> make them to the script myself, but don't know if you are currently
> editing/running it):
> 
> - public-inbox-init should probably use -V2 (see above). You can then
>   also use -j JOBS to speed up the import.
> 
> - --indexlevel should be full to make the Xapian searching more useful
>   (this is the default, so you can also not set it). Note that this
>   also affects the incremental updating done by public-inbox-mda.
> 
> - You want to kill public-inbox-httpd using -SIGHUP so it just reloads
>   the new config files. You also want to kill the other daemons,
>   public-inbox-imapd and public-inbox-nntpd
> 
> - The --ng name should be based on the primary domain name (see
>   above). I don't know how to determine that easily though. Maybe
>   mailman knows, then we can also set the initial ADDRESS properly.

And mailman does know, but you need to be in the mailman group to
generate the lists. We support three virtual domains: sourceware.org,
cygwin.com and gcc.gnu.org. Using /usr/lib/mailman/bin/list_lists we
can generate per-domain lists that only include advertised, publicly
archived lists. There are 212 sourceware.org lists, 11 cygwin.com
lists and 28 gcc.gnu.org lists.

Attached is the output of:

/usr/lib/mailman/bin/list_lists -b -a -p -V sourceware.org > sourceware.org.lists
/usr/lib/mailman/bin/list_lists -b -a -p -V cygwin.com > cygwin.com.lists
/usr/lib/mailman/bin/list_lists -b -a -p -V gcc.gnu.org > gcc.gnu.org.lists

I placed the same in the inbox homedir under mailman.lists/ so it can
be used as input to the import script.
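As a sketch of how those files could drive an import script: each file is named after its domain and holds one list name per line, so they can be flattened into "DOMAIN LIST" pairs. lists_by_domain is a hypothetical helper, assuming the DOMAIN.lists file naming used above:

```shell
#!/bin/bash
# Hypothetical helper: turn per-domain list_lists output (files named
# DOMAIN.lists, one list name per line) into "DOMAIN LIST" pairs.
lists_by_domain() {
    local dir="$1" f domain list
    for f in "$dir"/*.lists; do
        [ -e "$f" ] || continue          # glob matched nothing
        domain=$(basename "$f" .lists)   # sourceware.org.lists -> sourceware.org
        # "|| [ -n ... ]" also handles a last line without trailing newline
        while IFS= read -r list || [ -n "$list" ]; do
            if [ -n "$list" ]; then
                printf '%s %s\n' "$domain" "$list"
            fi
        done < "$f"
    done
}
```

For example, `lists_by_domain ~/mailman.lists | while read -r domain list; do ...; done` would visit every advertised, publicly archived list once.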

For sourceware.org lists, @sourceware.cygnus.org and
@sources.redhat.com should be alternate/historical names. For
cygwin.com lists, @cygwin.org should be an alternate name.
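public-inbox-init accepts more than one address after the inbox URL, so the alternate names could be passed along with the primary one. A minimal sketch, assuming each list's address is LIST@DOMAIN (worth checking per list); list_addresses is a hypothetical helper:

```shell
#!/bin/bash
# Hypothetical helper: all addresses a list may have been posted to,
# primary domain first, followed by its alternate/historical names.
list_addresses() {
    local domain="$1" list="$2" out="" d
    case $domain in
        sourceware.org) set -- sourceware.org sourceware.cygnus.org sources.redhat.com ;;
        cygwin.com)     set -- cygwin.com cygwin.org ;;
        *)              set -- "$domain" ;;
    esac
    # join LIST@DOMAIN entries with single spaces
    for d in "$@"; do
        out="$out${out:+ }$list@$d"
    done
    printf '%s\n' "$out"
}
```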

Cheers,

Mark

[-- Attachment #2: sourceware.org.lists --]
[-- Type: text/plain, Size: 2931 bytes --]

anonymous
archer
archer-commits
autobook-cvs
autobook-webpages-cvs
autoconf-cvs
autoconf-webpages-cvs
bfd
binutils
binutils-cvs
binutils-webpages-cvs
buildbot
bunsen
bzip2-cvs
bzip2-devel
bzip2-webpages-cvs
c++-embedded
c++-embedded-cvs
c++-embedded-webpages-cvs
catapult-cvs
catapult-webpages-cvs
cgen
cgen-cvs
cgen-prs
cgen-webpages-cvs
cluster-cvs
cluster-webpages-cvs
crossgcc
debugedit
dm-cvs
dm-webpages-cvs
docbook-tools-announce
docbook-tools-cvs
docbook-tools-discuss
docbook-tools-hackers
docbook-tools-webpages-cvs
dominion-announce
dominion-cvs
dominion-discuss
dominion-hackers
dominion-webpages-cvs
dwz
eclipse
ecos-announce
ecos-bugs
ecos-cvs
ecos-devel
ecos-discuss
ecos-maintainers
ecos-patches
ecos-webpages-cvs
elfutils-devel
elix
elix-announce
elix-cvs
elix-webpages-cvs
frysk
frysk-bugzilla
frysk-cvs
frysk-testresults
frysk-webpages-cvs
gas2
gdb
gdb-announce
gdb-cvs
gdb-patches
gdb-patches-prs
gdb-prs
gdb-testers
gdb-testresults
gdb-webpages-cvs
gdbadmin
gettext-alpha
gettext-announce
gettext-cvs
gettext-webpages-cvs
glibc-bugs
glibc-bugs-regex
glibc-cvs
glibc-webpages-cvs
global
gnats-admin
gnats-announce
gnats-cvs
gnats-devel
gnats-prs
gnats-webpages-cvs
gnu-gabi
gsl-announce
gsl-cvs
gsl-discuss
gsl-webpages-cvs
guile-cvs
guile-emacs
guile-emacs-cvs
guile-gtk
guile-prs
guile-webpages-cvs
infinity
insight
insight-announce
insight-cvs
insight-prs
insight-webpages-cvs
installshell
installshell-cvs
inti
inti-cvs
inti-webpages-cvs
ip-over-scsi-cvs
ip-over-scsi-webpages-cvs
jffs2-cvs
jffs2-webpages-cvs
kawa
kawa-cvs
kawa-webpages-cvs
libabigail
libabigail-webpages-cvs
libaio
libaio-cvs
libaio-webpages-cvs
libc-alpha
libc-alpha1
libc-announce
libc-hacker
libc-help
libc-locales
libc-ports
libc-stable
libc-testresults
libffi-announce
libffi-cvs
libffi-discuss
libffi-webpages-cvs
lvm-cvs
lvm-webpages-cvs
lvm2-cvs
lvm2-webpages-cvs
mailer-daemon
mauve-announce
mauve-cvs
mauve-discuss
mauve-patches
mauve-webpages-cvs
mingw-cvs
mingw-dvlpr
netresolve
newlib
newlib-cvs
newlib-webpages-cvs
patchutils-cvs
patchutils-list
patchutils-webpages-cvs
piranha-webpages-cvs
prelink
prelink-svn
psim-cvs
psim-webpages-cvs
pthreads-win32
pthreads-win32-cvs
pthreads-win32-webpages-cvs
rda
rhdb
rhdb-admin
rhdb-announce
rhdb-cc
rhdb-cvs
rhdb-explain
rhdb-installer
rhdb-jdbc
rhdb-utils
rhdb-webpages-cvs
rhl-cvs
rhug-cvs
rhug-rhats
rpm2html
rpm2html-cvs
rpm2html-prs
sharutils-alpha
sharutils-announce
sharutils-cvs
sharutils-webpages-cvs
sid
sid-announce
sid-cvs
sid-webpages-cvs
sourcenav
sourcenav-announce
sourcenav-cvs
sourcenav-prs
sourcenav-webpages-cvs
sourceware-announce
sourceware-cvs
sourceware-cvs-sourceware
sourceware-cvs-sourceware-webpages
sourceware-infra-cvs
sourceware-webpages-cvs
springfield
src-cvs
systemtap
systemtap-cvs
systemtap-webpages-cvs
testcvs-cvs
testcvs-webpages-cvs
webmaster
win32-x11-cvs
win32-x11-webpages-cvs
xconq-announce
xconq-cvs
xconq-prs
xconq-webpages-cvs
xconq7

[-- Attachment #3: cygwin.com.lists --]
[-- Type: text/plain, Size: 161 bytes --]

cygwin
cygwin-announce
cygwin-apps
cygwin-apps-cvs
cygwin-cvs
cygwin-developers
cygwin-licensing
cygwin-patches
cygwin-talk
cygwin-webpages-cvs
cygwin-xfree-cvs

[-- Attachment #4: gcc.gnu.org.lists --]
[-- Type: text/plain, Size: 322 bytes --]

fortran
gcc
gcc-announce
gcc-bugs
gcc-cvs
gcc-cvs-testrun
gcc-cvs-wwwdocs
gcc-help
gcc-maintainers
gcc-patches
gcc-ppc
gcc-prs
gcc-regression
gcc-rust
gcc-testlist
gcc-testresults
gccadmin
gnutools-advocacy
java
java-announce
java-cvs
java-patches
java-prs
jit
libstdc++
libstdc++-cvs
libstdc++-prs
libstdc++-webpages-cvs


* Re: inbox.sourceware.org experiment
  2022-08-13 14:14 inbox.sourceware.org experiment Mark Wielaard
  2022-08-15 13:00 ` Mark Wielaard
@ 2022-08-16 21:36 ` Mark Wielaard
  2022-08-16 22:10   ` Frank Ch. Eigler
  1 sibling, 1 reply; 15+ messages in thread
From: Mark Wielaard @ 2022-08-16 21:36 UTC
  To: Overseers mailing list; +Cc: Simon Marchi

Hi,

On Sat, Aug 13, 2022 at 04:14:03PM +0200, Mark Wielaard via Overseers wrote:
> Looking at the mailman2inbox.sh script I have a few suggestions (I can
> make them to the script myself, but don't know if you are currently
> editing/running it):
> 
> - public-inbox-init should probably use -V2 (see above). You can then
>   also use -j JOBS to speed up the import.
> 
> - --indexlevel should be full to make the Xapian searching more useful
>   (this is the default, so you can also not set it). Note that this
>   also affects the incremental updating done by public-inbox-mda.
> 
> - You want to kill public-inbox-httpd using -SIGHUP so it just reloads
>   the new config files. You also want to kill the other daemons,
>   public-inbox-imapd and public-inbox-nntpd
> 
> - The --ng name should be based on the primary domain name (see
>   above). I don't know how to determine that easily though. Maybe
>   mailman knows, then we can also set the initial ADDRESS properly.
> 
> The formail -s public-inbox-mda seems to work well for batch
> importing, but is it efficient enough for keeping the importing up to
> date? It looks like the last .mbox file is just really big and new
> messages are appended at the end, so we would be trying to import all
> messages all the time. And how do we make sure it is triggered when new
> messages come in?

It turns out public-inbox does support importing a full mbox in one
go, but it doesn't have a nice command for it yet. There is, however,
scripts/import_vger_from_mbox in upstream git, which is easily adapted
(just remove the vger-specific filtering).

I put this in the inbox homedir as import_from_mbox. To test it, I
removed the already imported elfutils-devel archive and reimported it
with the import_from_mbox script:

$ public-inbox-init -V2 --ng inbox.sourceware.elfutils-devel -L full elfutils-devel /home/inbox/lists/elfutils-devel https://inbox.sourceware.org/elfutils-devel elfutils@sourceware.org elfutils-devel@lists.fedorahosted.org

$ ./import_from_mbox elfutils-devel elfutils-devel@lists.fedorahosted.org lists/elfutils-devel < /sourceware/projects/elfutils-home/elfutils-devel.nospam.mbox

$ for i in /var/lib/mailman/archives/private/elfutils-devel.mbox/*mbox; do ./import_from_mbox elfutils-devel elfutils-devel@sourceware.org lists/elfutils-devel < $i; done

Note that this is V2 plus full indexing, and it includes an extra
historical elfutils-devel.nospam.mbox.

Surprisingly this only took ~30 seconds in total.

Unfortunately, elfutils-devel.nospam.mbox doesn't contain enough
headers for proper threading, but the full index does make it possible
to match on similar subjects.

I don't have a solution for keeping the archive up to date. Parsing
mboxes is strongly discouraged upstream because it requires reparsing
all messages, and there is no locking mechanism for mboxes, so if
mailman writes to the mbox while public-inbox reads from it, odd
things can happen.

One way to make it work with public-inbox-watch is to subscribe the
inbox user to each list and create a Maildir of messages. But then the
message headers will have been rewritten by mailman. So it would be
better to somehow get the messages to the inbox user before mailman
sees them, or to give the inbox user a copy of each message as mailman
adds it to the mbox archive, instead of what it sends to list
subscribers.

Cheers,

Mark


* Re: inbox.sourceware.org experiment
  2022-08-16 21:36 ` Mark Wielaard
@ 2022-08-16 22:10   ` Frank Ch. Eigler
  2022-08-17 12:25     ` Mark Wielaard
  0 siblings, 1 reply; 15+ messages in thread
From: Frank Ch. Eigler @ 2022-08-16 22:10 UTC
  To: Overseers mailing list; +Cc: Mark Wielaard, Simon Marchi

Hi -

> It turns out public-inbox does support importing a full mbox in one
> go. But it doesn't have a nice binary for it yet. There is however
> scripts/import_vger_from_mbox in upstream git which is easily adapted
> (just remove the vger specific filtering).

This is already 99% done for the sourceware mailing lists.

> [...]
> Note this is V2 plus full indexing and includes and extra historical
> elfutils-devel.nospam.mbox

Is there a need for "full" indexing as opposed to "basic"?  I don't
see why we'd need another text search engine for this stuff; we
already have one.  The basic "v1" with basic indexing seems fine and
effective for web and nntp.


> [...]
> I don't have a solution for keeping the archive up to date. [...]

We can hack a postfix->|mailman and |inbox-mda alias-fork
and dual pipe delivery for each mailing list.

- FChE


* Re: inbox.sourceware.org experiment
  2022-08-16 22:10   ` Frank Ch. Eigler
@ 2022-08-17 12:25     ` Mark Wielaard
  2022-08-17 13:24       ` Frank Ch. Eigler
  0 siblings, 1 reply; 15+ messages in thread
From: Mark Wielaard @ 2022-08-17 12:25 UTC
  To: Frank Ch. Eigler; +Cc: Overseers mailing list, Simon Marchi

Hi Frank,

On Tue, Aug 16, 2022 at 06:10:58PM -0400, Frank Ch. Eigler wrote:
> > It turns out public-inbox does support importing a full mbox in one
> > go. But it doesn't have a nice binary for it yet. There is however
> > scripts/import_vger_from_mbox in upstream git which is easily adapted
> > (just remove the vger specific filtering).
> 
> This is already 99% done for the sourceware mailing lists.

Nice. Was this done using the mailman2inbox.sh script? I believe that
still generates v1 archives, which is why I regenerated the
elfutils-devel one.

> > [...]
> > Note this is V2 plus full indexing and includes and extra historical
> > elfutils-devel.nospam.mbox
> 
> Is there a need for "full" indexing as opposed to "basic"?  I don't
> see why we'd need another text search engine for this stuff, we
> already have.  The basic "v1" with basic indexing seems fine and
> effective for web and nntp.

Note that full indexing is separate from using v1 or v2 archives.

I don't think we should be using v1 archives: they are deprecated
upstream, and upstream strongly recommends v2 archives, which are much
more scalable. Reimporting the lists as v2 archives using the
import_from_mbox script should be much more efficient and can be done
in a couple of hours instead of days.

A full index does not just make full text search of the mailing list
really fast; it also indexes addresses, date ranges, subjects, headers,
body, attachments, etc. And the results are also available as mbox. So
you would then be able to easily express "give me all emails/threads
in gcc-patches from the last 6 months that discuss dwarf2out.cc where
I was not the sender or one of the receivers" and then download the
whole mbox or browse all those messages/threads online. See
e.g. https://inbox.sourceware.org/elfutils-devel/_/text/help/ for the
Xapian queries you can execute.
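For illustration, such a query can be turned into a search URL for a single inbox. A minimal sketch: search_url is a hypothetical helper that only escapes a handful of characters, and the b: (body), d: (date range) and a: (any address) prefixes used below should be double-checked against the help page linked above.

```shell
#!/bin/bash
# Hypothetical helper: build a search URL for one inbox on
# inbox.sourceware.org from a Xapian query string.
search_url() {
    local inbox="$1" query="$2" q
    # Minimal escaping only: '%', '+', '&' and '#' are percent-encoded
    # first, then spaces become '+'. This is not a complete URL encoder.
    q=$(printf '%s' "$query" |
        sed -e 's/%/%25/g' -e 's/+/%2B/g' -e 's/&/%26/g' -e 's/#/%23/g' -e 's/ /+/g')
    printf 'https://inbox.sourceware.org/%s/?q=%s\n' "$inbox" "$q"
}
```

The gcc-patches example above might then look like `search_url gcc-patches 'b:dwarf2out.cc AND d:6.months.ago.. AND NOT a:you@example.org'`, where the address is a placeholder and the boolean operators assume the query parser has them enabled.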

> > [...]
> > I don't have a solution for keeping the archive up to date. [...]
> 
> We can hack a postfix->|mailman and |inbox-mda alias-fork
> and dual pipe delivery for each mailing list.

That would be great. But I would need some time reading up on
postfix/mailman configs. Do you have an example of where/how this hack
would be done?

Thanks,

Mark


* Re: inbox.sourceware.org experiment
  2022-08-17 12:25     ` Mark Wielaard
@ 2022-08-17 13:24       ` Frank Ch. Eigler
  2022-08-17 21:18         ` Mark Wielaard
  0 siblings, 1 reply; 15+ messages in thread
From: Frank Ch. Eigler @ 2022-08-17 13:24 UTC
  To: Mark Wielaard; +Cc: Overseers mailing list, Simon Marchi

Hi -

> [...]
> > Is there a need for "full" indexing as opposed to "basic"?  I don't
> > see why we'd need another text search engine for this stuff, we
> > already have.  The basic "v1" with basic indexing seems fine and
> > effective for web and nntp.
> [...]
> I don't think we should be using v1 archives, those are deprecated
> upstream and they strongly recommend using v2 archives which are much
> more scalable.

Given that v1 is the default of public-inbox-init, they can't be that bad.

> Reimporting the lists as v2 archives using the import_from_mbox
> script should be much more efficient and can be done in a couple of
> hours instead of days.

That speed is nice, but I suspect that's not a v1/v2 representation
efficiency issue but something else.


> A full index does not just make full text search of the mailinglist
> really fast, it also indexes addresses, date ranges, subjects, headers,
> body, attachments, etc. And the results are also available as mbox. So
> you would then be able to easily express "give me all emails/threads
> in gcc-patches from the last 6 months that discuss dwarf2out.cc where
> I was not the sender or one of the receivers" and then download the
> whole mbox or browse all those messages/threads online.  [...]

Yes, understood that the extra indexing can do extra searches.  My
question was about utility/need for this.  For elfutils-devel, note
that the full xapian indexes are about 10x the size of the
git-compressed email archive, whereas in the case of the systemtap
import, it's only about 0.2x, so there is a serious cost/benefit
question.

(In both v1 and v2 cases, the git representation of the mailboxes is
about 60% of the size of the raw mbox files.  That's pretty puny
compression TBH, I expected much better.)


> That would be great. But I would need some time reading up on
> postfix/mailman configs. Do you have an example of where/how this hack
> would be done?

postfix delivers mailing list traffic via /etc/mailman/aliases,
e.g.:

autobook-cvs:             "|/usr/local/mailman/mailman post autobook-cvs"
autobook-cvs-bounces:     "|/usr/local/mailman/mailman bounces autobook-cvs"
autobook-cvs-confirm:     "|/usr/local/mailman/mailman confirm autobook-cvs"
autobook-cvs-join:        "|/usr/local/mailman/mailman join autobook-cvs"

I would use a script to generate a new config file from that, so that the
primary mailing list incoming aliases are forked:

autobook-cvs:             autobook-cvs-mailman, autobook-cvs-inbox
autobook-cvs-mailman:     "|/usr/local/mailman/mailman post autobook-cvs"
autobook-cvs-inbox:       "|env SOMETHING /usr/bin/public-inbox-mda SOMETHING"
autobook-cvs-bounces:     "|/usr/local/mailman/mailman bounces autobook-cvs"
autobook-cvs-confirm:     "|/usr/local/mailman/mailman confirm autobook-cvs"
autobook-cvs-join:        "|/usr/local/mailman/mailman join autobook-cvs"

and then switch postfix to this alias file instead.
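A script along those lines could be a small awk filter over /etc/mailman/aliases. A minimal sketch, assuming every list's incoming alias has the `"|/usr/local/mailman/mailman post LIST"` shape shown above; gen_inbox_aliases and the @LIST@ placeholder are hypothetical, and the actual public-inbox-mda invocation (the "SOMETHING" parts) is left to the caller because it is not settled yet:

```shell
#!/bin/bash
# Hypothetical generator: read a mailman alias file on stdin and write a
# version in which each list's "post" alias is forked to two local aliases,
# LIST-mailman (the original pipe) and LIST-inbox (a public-inbox delivery).
# $1 is the inbox delivery command template; every @LIST@ in it is
# replaced by the list name. All other lines (bounces, confirm, join,
# comments, blank lines) are passed through unchanged.
gen_inbox_aliases() {
    awk -v tmpl="$1" '
    $3 == "post" && $2 ~ /mailman$/ {
        list = $1
        sub(/:$/, "", list)
        cmd = tmpl
        gsub(/@LIST@/, list, cmd)
        rest = $0
        sub(/^[^:]*:[ \t]*/, "", rest)
        printf "%s: %s-mailman, %s-inbox\n", list, list, list
        printf "%s-mailman: %s\n", list, rest
        printf "%s-inbox: \"|%s\"\n", list, cmd
        next
    }
    { print }
    '
}
```

Usage would be roughly `gen_inbox_aliases "$TEMPLATE" < /etc/mailman/aliases > /etc/postfix/mailman-inbox-aliases`, with TEMPLATE holding whatever env/public-inbox-mda command line is eventually chosen.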

- FChE


* Re: inbox.sourceware.org experiment
  2022-08-17 13:24       ` Frank Ch. Eigler
@ 2022-08-17 21:18         ` Mark Wielaard
  2022-08-17 21:33           ` Frank Ch. Eigler
  0 siblings, 1 reply; 15+ messages in thread
From: Mark Wielaard @ 2022-08-17 21:18 UTC
  To: Frank Ch. Eigler; +Cc: Overseers mailing list, Simon Marchi

Hi Frank,

On Wed, Aug 17, 2022 at 09:24:56AM -0400, Frank Ch. Eigler wrote:
> > I don't think we should be using v1 archives, those are deprecated
> > upstream and they strongly recommend using v2 archives which are much
> > more scalable.
> 
> Given that v1 is the default of public-inbox-init, they can't be that bad.

Looks like it is just for backward compatibility. They actively warn
against using it for new installations and strongly recommend using
-V2. See also the public-inbox-init, public-inbox-v1-format and
public-inbox-v2-format man pages.

I don't expect support for v1 to disappear, but new projects around
public-inbox, like lei, only support v2. So it is better to simply use
the v2 format from the start.

> > Reimporting the lists as v2 archives using the import_from_mbox
> > script should be much more efficient and can be done in a couple of
> > hours instead of days.
> 
> That speed is nice, but I suspect that's not a v1/v2 representation
> efficiency issue but something else.

The v2 format allows parallel imports, so by default it imports in
parallel. Also, the import_from_mbox script streams the import of
messages through just one perl process per mbox instead of starting a
new perl process per message.

> Yes, understood that the extra indexing can do extra searches.  My
> question was about utility/need for this.

The use seems obvious to me for anybody using the web-based archives
to generate tailored message/mbox results. Date-ranged searches in
particular seem pretty essential, since otherwise you just have to
keep clicking next, next, next. It also helps to get specific
messages based on author or subject. One specific use case for
public-inbox is reading a list without having to subscribe to it or
keep a local copy to search through (it does make mirroring a mailing
list easy, but not everybody has the space or bandwidth to do that).

> For elfutils-devel, note
> that the full xapian indexes are about 10x the size of the
> git-compressed email archive, whereas in the case of the systemtap
> import, it's only about 0.2x, so there is a serious cost/benefit
> question.

That is a concern and much bigger than I anticipated. So we should
probably only enable full indexing for active discussion and patch
lists and keep it at basic for autogenerated lists like -cvs or
old/inactive lists.
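Such a split could be encoded as a simple per-list policy. A minimal sketch, assuming that matching the list name against *-cvs (and *-cvs-*, for lists like gcc-cvs-wwwdocs) is enough to catch the autogenerated lists; index_level is hypothetical, and old/inactive lists cannot be recognized by name, so they would need an explicit exception list:

```shell
#!/bin/bash
# Hypothetical policy helper: choose the public-inbox index level for a list.
# Autogenerated commit lists (*-cvs, *-cvs-*) get "basic"; everything else
# gets "full". Extend the patterns for other list types as needed.
index_level() {
    case $1 in
        *-cvs|*-cvs-*) echo basic ;;
        *)             echo full ;;
    esac
}
```

The result would feed the -L option used in the elfutils-devel import above, e.g. `public-inbox-init -V2 -L "$(index_level "$list")" ...`.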

> > That would be great. But I would need some time reading up on
> > postfix/mailman configs. Do you have an example of where/how this hack
> > would be done?
> 
> postfix delivers mailing list traffic via /etc/mailman/aliases,
> e.g.:
> 
> autobook-cvs:             "|/usr/local/mailman/mailman post autobook-cvs"
> autobook-cvs-bounces:     "|/usr/local/mailman/mailman bounces autobook-cvs"
> autobook-cvs-confirm:     "|/usr/local/mailman/mailman confirm autobook-cvs"
> autobook-cvs-join:        "|/usr/local/mailman/mailman join autobook-cvs"
> 
> I would use a script to generate a new config file from that, so that the
> primary mailing list incoming aliases are forked:
> 
> autobook-cvs:             autobook-cvs-mailman, autobook-cvs-inbox
> autobook-cvs-mailman:     "|/usr/local/mailman/mailman post autobook-cvs"
> autobook-cvs-inbox:       "|env SOMETHING /usr/bin/public-inbox-mda SOMETHING"
> autobook-cvs-bounces:     "|/usr/local/mailman/mailman bounces autobook-cvs"
> autobook-cvs-confirm:     "|/usr/local/mailman/mailman confirm autobook-cvs"
> autobook-cvs-join:        "|/usr/local/mailman/mailman join autobook-cvs"
> 
> and then switch postfix to this alias file instead.

OK, that could work, and it should be easy to generate by combining
/etc/mailman/aliases with the lists in
/home/inbox/.public-inbox/config.
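A sketch of the combining step: take the inbox names from the `[publicinbox "NAME"]` sections of the public-inbox config and keep only those that also have a mailman "post" alias. inbox_names and inboxes_with_aliases are hypothetical helpers; the sed pattern only handles that exact section syntax, so `git config --file` would be a more robust way to read the same git-config-format file.

```shell
#!/bin/bash
# Hypothetical helper: print the inbox names configured in a public-inbox
# config file, taken from section headers of the form [publicinbox "NAME"].
inbox_names() {
    sed -n 's/^[[:space:]]*\[publicinbox "\(.*\)"][[:space:]]*$/\1/p' "$1"
}

# Hypothetical helper: print the inboxes that also have a mailman "post"
# alias. $1: public-inbox config file, $2: mailman aliases file.
inboxes_with_aliases() {
    inbox_names "$1" | awk '
        NR == FNR { want[$1] = 1; next }   # first input (stdin): inbox names
        $3 == "post" && $2 ~ /mailman$/ {  # second input: alias lines
            name = $1
            sub(/:$/, "", name)
            if (name in want) print name
        }' - "$2"
}
```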

Since this happens before mailman sees the message, we do need to do
a spam check. And I think postfix already sets ORIGINAL_RECIPIENT; we
just need to make sure it is one of the addresses for a list in the
config.

But what generates /etc/mailman/aliases itself?  Can we hook into that
to trigger generation of this aliases-inbox file? Otherwise if we add
a new mailman list it won't work. And do we need to update/regenerate
/etc/aliases.db and/or /etc/mailman/aliases.db ?

Cheers,

Mark


* Re: inbox.sourceware.org experiment
  2022-08-17 21:18         ` Mark Wielaard
@ 2022-08-17 21:33           ` Frank Ch. Eigler
  2022-08-18 13:50             ` Mark Wielaard
  0 siblings, 1 reply; 15+ messages in thread
From: Frank Ch. Eigler @ 2022-08-17 21:33 UTC
  To: Mark Wielaard; +Cc: Overseers mailing list, Simon Marchi

Hi -

> [...]
> > Yes, understood that the extra indexing can do extra searches.  My
> > question was about utility/need for this.
> 
> The use seems obvious to me for anybody using the web based archives
> to generate tailored message/mbox results, specifically date ranged
> searches seem pretty mandatory since otherwise you essentially just
> need to keep clicking, next, next, next.

I was under the impression that your main interest in p-i was the easy
addressability and availability of raw emails, for use such as with
git-am.  Are there other users pining for this kind of thing?

> But also to get specific messages based on author or subject. On
> specific use case for public-inbox is to not have to be subscribed
> to a list to read it [...]

You are expecting people to use the xapian query language for this
stuff?  Mailman offers that style of click-click browsing already.


> [...]
> So this is before mailman sees the message, so we do need to do a
> spam-check.

No, postfix already spam checks everything upon receipt, before delivery.

> And I think postfix sets ORIGINAL_RECIPIENT already, we just need to
> make sure it is one of the addresses for a list in the config.

This shouldn't be something that we need to write code to do, if it needs
to be done at all.


> But what generates /etc/mailman/aliases itself?  Can we hook into that
> to trigger generation of this aliases-inbox file? Otherwise if we add
> a new mailman list it won't work.

It must be some mailman administrative script.  Just crontab another
one.
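A cron job like that would ideally regenerate the file only when /etc/mailman/aliases has changed, and replace it atomically so postfix never reads a half-written file. A minimal sketch; regen_if_needed is a hypothetical helper, and the generator command is passed in as arguments and reads the source file on stdin:

```shell
#!/bin/bash
# Hypothetical helper: regenerate DST from SRC with the given command, but
# only if DST is missing or SRC was modified after DST. The command reads
# SRC on stdin; its output goes to DST.tmp and is then renamed over DST,
# so readers never see a partially written file.
regen_if_needed() {
    local src="$1" dst="$2"
    shift 2
    if [ -e "$dst" ] && ! [ "$src" -nt "$dst" ]; then
        return 0   # up to date, nothing to do
    fi
    "$@" < "$src" > "$dst.tmp" && mv "$dst.tmp" "$dst"
}
```

A cron-run script could then call it with /etc/mailman/aliases as the source and /etc/postfix/mailman-inbox-aliases as the destination. Whether postfix then needs a rebuild or reload depends on the map type discussed below.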

> And do we need to update/regenerate
> /etc/aliases.db and/or /etc/mailman/aliases.db ?

The proposal is to not touch /etc/aliases* NOR /etc/mailman/aliases*.
The proposal is to generate a new file like
/etc/postfix/mailman-inbox-aliases from /etc/mailman/aliases.  That
new file would be the one postfix would read.  It could be texthash:
rather than hash: so postmap would not even be necessary for updates.
That depends on whether the relevant alias-expansion postfix process
is short- or long-lived.


- FChE


* Re: inbox.sourceware.org experiment
  2022-08-17 21:33           ` Frank Ch. Eigler
@ 2022-08-18 13:50             ` Mark Wielaard
  2022-08-18 14:40               ` Simon Marchi
  2022-08-23 22:08               ` Mark Wielaard
  0 siblings, 2 replies; 15+ messages in thread
From: Mark Wielaard @ 2022-08-18 13:50 UTC
  To: Frank Ch. Eigler; +Cc: Overseers mailing list, Simon Marchi

Hi Frank,

On Wed, Aug 17, 2022 at 05:33:40PM -0400, Frank Ch. Eigler wrote:
> > [...]
> > > Yes, understood that the extra indexing can do extra searches.  My
> > > question was about utility/need for this.
> > 
> > The use seems obvious to me for anybody using the web based archives
> > to generate tailored message/mbox results, specifically date ranged
> > searches seem pretty mandatory since otherwise you essentially just
> > need to keep clicking, next, next, next.
> 
> I was under the impression that your main interest in p-i was the easy
> addressability and availability of raw emails, for use such as with
> git-am.  Are there other users pining for this kind of thing?

I am not sure that is my main interest in public-inbox, but yes, I do
really like public-inbox because it allows tools like b4 (which I have
already tested against our instance) and piem (not tested yet) to
easily pick up and apply patch emails.
https://git.kernel.org/pub/scm/utils/b4/b4.git/tree/README.rst
https://docs.kyleam.com/piem/

I think others will also use those (or similar) tools. But I primarily
expect users to use the public-inbox archives as a way to access the
mailing lists without having to subscribe, while still being able to
easily get the actual (raw) messages (through git, Atom, mbox, NNTP or
IMAP) to follow the conversations. I think that is the most
interesting thing public-inbox offers.

> > But also to get specific messages based on author or subject. One
> > specific use case for public-inbox is to not have to be subscribed
> > to a list to read it [...]
> 
> You are expecting people to use the xapian query language for this
> stuff?  Mailman offers that style of click-click browsing already.

Not just users, but also tools, yes. And not for clicking through the
archive, but to generate tailored sets of messages they are interested
in. IMHO the public-inbox archives are a lot more usable than the
mailman style archives.

> > [...]
> > So this is before mailman sees the message, so we do need to do a
> > spam-check.
> 
> No, postfix already spam checks everything upon receipt, before delivery.

OK, but mailman still also blocks some messages which I have to
approve/deny as list admin (this only happens once or twice a month,
so maybe that is just spam we have to tolerate?)

> > And I think postfix sets ORIGINAL_RECIPIENT already, we just need to
> > make sure it is one of the addresses for a list in the config.
> 
> This shouldn't be something that we need to write code to do, if it needs
> to be done at all.

OK. Assuming the process runs as the inbox user it will pick up the
/home/inbox/.public-inbox/config file which should have all
information.

> > But what generates /etc/mailman/aliases itself?  Can we hook into that
> > to trigger generation of this aliases-inbox file? Otherwise if we add
> > a new mailman list it won't work.
> 
> It must be some mailman administrative script.  Just crontab another
> one.

Under which account should this crontab run? mailman doesn't seem to
have any crontabs at the moment.

> > And do we need to update/regenerate
> > /etc/aliases.db and/or /etc/mailman/aliases.db ?
> 
> The proposal is to not touch /etc/aliases* NOR /etc/mailman/aliases*.
> The proposal is to generate a new file like
> /etc/postfix/mailman-inbox-aliases from /etc/mailman/aliases.  That
> new file would be the one postfix would read.  It could be texthash:
> rather than hash: so postmap would not even be necessary for updates.
> That depends on whether the relevant alias-expansion postfix process
> is short- or long-lived.

OK, I see the following in /etc/postfix/main.cf:

# CGF 2020-03-08 12:49
alias_maps = hash:/etc/aliases, hash:/etc/mailman/aliases

# CGF 2020-03-18 14:10 EST - newaliases wasn't affecting /etc/mailman/aliases
alias_database = hash:/etc/aliases, hash:/etc/mailman/aliases

So I assume calling newaliases regenerates the hash/.db files.

I can write a script to generate mailman-inbox-aliases this weekend
when I have stable internet access again. Will post to the list before
installing to make sure I don't accidentally break something.

Cheers,

Mark

* Re: inbox.sourceware.org experiment
  2022-08-18 13:50             ` Mark Wielaard
@ 2022-08-18 14:40               ` Simon Marchi
  2022-08-21 17:41                 ` Mark Wielaard
  2022-08-23 22:08               ` Mark Wielaard
  1 sibling, 1 reply; 15+ messages in thread
From: Simon Marchi @ 2022-08-18 14:40 UTC
  To: Mark Wielaard, Frank Ch. Eigler; +Cc: Overseers mailing list

On 8/18/22 09:50, Mark Wielaard wrote:
> I am not sure that is my main interest in public-inbox, but yes, I do
> really like public-inbox because it allows tools like b4 (which I have
> already tested against our instance) and piem (not tested yet) to
> easily pick up and apply patch emails.
> https://git.kernel.org/pub/scm/utils/b4/b4.git/tree/README.rst
> https://docs.kyleam.com/piem/
>
> I think others will also use those (or similar) tools. But I primarily
> expect users to use the public-inbox archives as a way to access the
> mailinglists without having to subscribe, but still be able to easily
> get the actual (raw) messages (either through git, atom, mbox, nntp or
> imap) to follow the conversations. Which I think is the main
> interesting thing public-inbox offers.

For me it's:

 - Being able to download the raw emails in order to apply patches or to
   properly reply to messages on lists I'm not subscribed to

 - I never thought about the feature Mark mentioned, downloading an
   mbox for a given query.  But if you want to download a very long
   patch series to apply it locally, it could be useful.

 - Better display and browsing to read longer threads that span multiple
   months.  For example, trying to follow this thread on Mailman would
   be complicated:

     https://inbox.sourceware.org/gdb-patches/20220428033542.1636284-1-simon.marchi@polymtl.ca/T/#r5f31e373eeb958095add41686e0ae7d1dcac9f1a

 - Search: I find it useful to be able to find a message by Message-ID.
   For instance, I'm reading the message in my client, and I want to
   send someone the link to that message in the web interface.  In my
   instance (pi.simark.ca) I can paste the Message-ID in the search box
   and it gets me directly to the right message.  On
   inbox.sourceware.org, I don't see the same search box, maybe it is
   because of the V1/V2 thing you have been talking about?

 - Not super important, but I like that the URLs to messages contain the
   Message-IDs.  This way, in a distant future where
   inbox.sourceware.org does not exist anymore, someone with the archive
   can still find out which message a given URL refers to.  A bit like
   if I give you this URL:

     https://gitlab.com/gnutools/binutils-gdb/-/commit/243cf0f69c36c4ee09c3c2b0bc7a97dc16119c51

   and Gitlab does not exist anymore, you can still find out which
   commit I am talking about if you have a copy of the binutils-gdb git
   repo.

Also, you were talking about space.  If you want to save some space, I
don't think it's very useful to have the *-cvs lists on there.  And
there are lists that are pretty much dead that you could skip too.

Simon

* Re: inbox.sourceware.org experiment
  2022-08-18 14:40               ` Simon Marchi
@ 2022-08-21 17:41                 ` Mark Wielaard
  2022-08-23 20:15                   ` Mark Wielaard
  0 siblings, 1 reply; 15+ messages in thread
From: Mark Wielaard @ 2022-08-21 17:41 UTC
  To: Overseers mailing list; +Cc: Frank Ch. Eigler, Simon Marchi

Hi Simon,

On Thu, Aug 18, 2022 at 10:40:00AM -0400, Simon Marchi via Overseers wrote:
> For me it's:
> 
>  - Being able to download the raw emails in order to apply patches or to
>    properly reply to messages on lists I'm not subscribed to
> 
>  - I never thought about the feature Mark mentioned, downloading an
>    mbox for a given query.  But if you want to download a very long
>    patch series to apply it locally, it could be useful.

Note that b4 (and piem, if you use emacs) helps with this. It will
create a local mbox containing the whole series based on any
message-id in the thread. It can also look for a newer version of the
series and collect (and add) Reviewed-by tags by looking through the
messages.

>  - Better display and browsing to read longer threads that span multiple
>    months.  For example, trying to follow this thread on Mailman would
>    be complicated:
> 
>      https://inbox.sourceware.org/gdb-patches/20220428033542.1636284-1-simon.marchi@polymtl.ca/T/#r5f31e373eeb958095add41686e0ae7d1dcac9f1a
> 
>  - Search: I find it useful to be able to find a message by Message-ID.
>    For instance, I'm reading the message in my client, and I want to
>    send someone the link to that message in the web interface.  In my
>    instance (pi.simark.ca) I can paste the Message-ID in the search box
>    and it gets me directly to the right message.  On
>    inbox.sourceware.org, I don't see the same search box, maybe it is
>    because of the V1/V2 thing you have been talking about?

You can always add the message-id to the URL directly, as you did
above (add /T/ to get the full thread that message belongs to). You
didn't see the search box because the lists had not been fully
indexed. I reimported some lists (see below) and now you should also
be able to use the search box in most lists.
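The URL layout this relies on can be sketched in Python. This is a minimal, hypothetical helper (`inbox_urls` is not part of public-inbox): the `/T/` suffix is the one described above, while `raw` and `t.mbox.gz` are the per-message and per-thread download suffixes public-inbox normally offers (check the instance's `_/text/help/` page). Which characters are left unescaped in the Message-ID is an assumption.

```python
from urllib.parse import quote

BASE = "https://inbox.sourceware.org"


def inbox_urls(inbox: str, msgid: str, base: str = BASE) -> dict:
    """Build public-inbox style URLs for one message (hypothetical helper).

    Angle brackets around the Message-ID are dropped. '/' is percent-encoded
    so the Message-ID stays a single path segment; the exact set of
    characters left unescaped here is an assumption.
    """
    mid = quote(msgid.strip("<>"), safe="@+=!~*")
    root = f"{base}/{inbox}/{mid}"
    return {
        "message": f"{root}/",
        "thread": f"{root}/T/",       # flat view of the whole thread
        "raw": f"{root}/raw",         # the original message as received
        "mbox": f"{root}/t.mbox.gz",  # whole thread as a gzipped mbox
    }
```

For example, `inbox_urls("gdb-patches", "<20220428033542.1636284-1-simon.marchi@polymtl.ca>")["thread"]` gives the `/T/` link to the long gdb-patches thread quoted above.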

>  - Not super important, but I like that the URLs to messages contain the
>    Message-IDs.  This way, in a distant future where
>    inbox.sourceware.org does not exist anymore, someone with the archive
>    can still find out which message a given URL refers to.  A bit like
>    if I give you this URL:
> 
>      https://gitlab.com/gnutools/binutils-gdb/-/commit/243cf0f69c36c4ee09c3c2b0bc7a97dc16119c51
> 
>    and Gitlab does not exist anymore, you can still find out which
>    commit I am talking about if you have a copy of the binutils-gdb git
>    repo.

Yes! I do think that is super important. It also allows you to put a
link in a commit message pointing to the original patch submission
and discussion.

> Also, you were talking about space.  If you want to save some space, I
> don't think it's very useful to have the *-cvs lists on there.

Yeah, there are also -prs/-bugs lists which are better searched
through bugzilla, and -testresult lists that can be indexed/searched
through bunsen.

> And there are lists that are pretty much dead that you could skip
> too.

To preserve history, let's not skip "dead" lists unless they are
archived at some new location. I think we are responsible for keeping
the history of old projects/lists.

I did reimport some of the lists as V2 archives with full indexing. So
you can now easily search through the following cygwin lists:

cygwin, cygwin-announce, cygwin-apps, cygwin-developers,
cygwin-licensing, cygwin-patches and cygwin-talk

The following gcc lists:

fortran, gcc, gcc-announce, gcc-help, gcc-patches, gcc-rust,
gnutools-advocacy, java, java-announce, java-patches, jit and
libstdc++

But https://inbox.sourceware.org/libstdc++ doesn't work, I suspect the
++ should be URL escaped somehow.

And the following sourceware lists:

archer, bfd, binutils, buildbot, bunsen, bzip2-devel, c++-embedded,
cgen, crossgcc, debugedit, docbook-tools-announce,
docbook-tools-discuss, dominion-hackers, dwz, eclipse, ecos-announce,
ecos-devel, ecos-discuss, ecos-maintainers, ecos-patches,
elfutils-devel, elix, elix-announce, frysk, gas2, gdb, gdb-announce,
gdb-patches, gnats-announce, gnats-devel, gnu-gabi, gsl-announce,
gsl-discuss, guile-emacs, guile-gtk, infinity, insight,
insight-announce, installshell, kawa, libabigail, libc-alpha,
libc-announce, libc-hacker, libc-help, libc-locales, libc-ports,
libc-stable, libffi-announce, libffi-discuss, mauve-discuss,
mauve-patches, mingw-dvlpr, netresolve, newlib, patchutils-list,
prelink, pthreads-win32, rda, rhdb, rhdb-admin, rhdb-cc, rhdb-explain,
rhug-rhats, sharutils-alpha, sid, sid-announce, sourcenav,
sourcenav-announce, sourceware-announce, springfield, systemtap,
xconq-announce and xconq7

See also the
mailman.lists/{cygwin.com,gcc.gnu.org,sourceware.org}.lists.full lists
and import_{cygwin,gcc,sourceware}_from_mbox scripts in the inbox
homedir.

I did remove the "test" list, and the cronjob that kept it
populated. But all other lists have been kept as V1 with basic
indexing.

There are some lists which have never seen any messages. I think we
should remove them because they probably won't see any messages ever,
and they make looking for real lists more difficult (~37% of the lists,
97 out of the total 260 lists, are just empty).

anonymous, autobook-cvs, autobook-webpages-cvs, autoconf-cvs,
autoconf-webpages-cvs, binutils-webpages-cvs, bzip2-cvs,
bzip2-webpages-cvs, catapult-cvs, catapult-webpages-cvs,
c++-embedded-cvs, c++-embedded-webpages-cvs, cgen-prs,
cgen-webpages-cvs, cluster-webpages-cvs, cygwin-webpages-cvs, dm-cvs,
dm-webpages-cvs, docbook-tools-hackers, docbook-tools-webpages-cvs,
dominion-announce, dominion-cvs, dominion-discuss,
dominion-webpages-cvs, ecos-webpages-cvs, elix-cvs, elix-webpages-cvs,
gcc-cvs-testrun, gcc-maintainers, gcc-ppc, gcc-sc, gcc-testlist,
gdb-webpages-cvs, gettext-alpha, gettext-announce,
gettext-webpages-cvs, glibc-webpages-cvs, global, gnats-admin,
gnats-webpages-cvs, gsl-webpages-cvs, guile-emacs-cvs,
guile-webpages-cvs, insight-cvs, insight-webpages-cvs, inti-cvs, inti,
inti-webpages-cvs, ip-over-scsi-cvs, ip-over-scsi-webpages-cvs,
java-cvs, jffs2-webpages-cvs, kawa-cvs, kawa-webpages-cvs, libaio,
libaio-webpages-cvs, libc-alpha1, libffi-cvs, libffi-webpages-cvs,
libstdc++-webpages-cvs, mailer-daemon, mauve-announce, mauve-cvs,
mauve-webpages-cvs, newlib-webpages-cvs, piranha-webpages-cvs,
postmaster, prelink-svn, psim-cvs, psim-webpages-cvs,
pthreads-win32-cvs, pthreads-win32-webpages-cvs, rhdb-installer,
rhdb-jdbc, rhdb-utils, root, rpm2html, rpm2html-prs,
sharutils-announce, sharutils-cvs, sharutils-webpages-cvs,
sid-webpages-cvs, sourcemaster, sourcenav-prs, sourceware-cvs,
sourceware-cvs-sourceware, sourceware-cvs-sourceware-webpages,
sourceware-infra-cvs, sourceware-webpages-cvs, systemtap-webpages-cvs,
testcvs-cvs, testcvs-webpages-cvs, webmaster, win32-x11-cvs,
win32-x11-webpages-cvs, xconq-prs and xconq-webpages-cvs

There are also the following 9 lists, which are either private (in
which case they show up as empty above) or not publicly advertised
lists:

cygwin-xfree, cygwin-xfree-announce, gcc-sc, mailman, overseers,
postmaster, root, sourcemaster and test-list.

The cygwin lists might still be interesting. Likewise for this list,
overseers. But the others probably should be removed.

Opinions?

Thanks,

Mark

* Re: inbox.sourceware.org experiment
  2022-08-21 17:41                 ` Mark Wielaard
@ 2022-08-23 20:15                   ` Mark Wielaard
  0 siblings, 0 replies; 15+ messages in thread
From: Mark Wielaard @ 2022-08-23 20:15 UTC
  To: Overseers mailing list; +Cc: Simon Marchi

Hi,

On Sun, Aug 21, 2022 at 07:41:43PM +0200, Mark Wielaard via Overseers wrote:
> I did reimport some of the lists as V2 archives with full indexing. So
> you can now easily search through the following cygwin lists:
> 
> cygwin, cygwin-announce, cygwin-apps, cygwin-developers,
> cygwin-licensing, cygwin-patches and cygwin-talk
> 
> The following gcc lists:
> 
> fortran, gcc, gcc-announce, gcc-help, gcc-patches, gcc-rust,
> gnutools-advocacy, java, java-announce, java-patches, jit and
> libstdc++
> 
> But https://inbox.sourceware.org/libstdc++ doesn't work, I suspect the
> ++ should be URL escaped somehow.

This has been fixed now with a minimal inbox_re_workaround.psgi as
Eric suggested.

> And the following sourceware lists:
> 
> archer, bfd, binutils, buildbot, bunsen, bzip2-devel, c++-embedded,
> cgen, crossgcc, debugedit, docbook-tools-announce,
> docbook-tools-discuss, dominion-hackers, dwz, eclipse, ecos-announce,
> ecos-devel, ecos-discuss, ecos-maintainers, ecos-patches,
> elfutils-devel, elix, elix-announce, frysk, gas2, gdb, gdb-announce,
> gdb-patches, gnats-announce, gnats-devel, gnu-gabi, gsl-announce,
> gsl-discuss, guile-emacs, guile-gtk, infinity, insight,
> insight-announce, installshell, kawa, libabigail, libc-alpha,
> libc-announce, libc-hacker, libc-help, libc-locales, libc-ports,
> libc-stable, libffi-announce, libffi-discuss, mauve-discuss,
> mauve-patches, mingw-dvlpr, netresolve, newlib, patchutils-list,
> prelink, pthreads-win32, rda, rhdb, rhdb-admin, rhdb-cc, rhdb-explain,
> rhug-rhats, sharutils-alpha, sid, sid-announce, sourcenav,
> sourcenav-announce, sourceware-announce, springfield, systemtap,
> xconq-announce and xconq7
> 
> See also the
> mailman.lists/{cygwin.com,gcc.gnu.org,sourceware.org}.lists.full lists
> and import_{cygwin,gcc,sourceware}_from_mbox scripts in the inbox
> homedir.

It should be said that this increases the disk usage by ~3.5x. These
lists used to take ~9GB of storage, now they take ~32GB. The total
public-inbox lists storage is now 52GB. There is still 250GB of free
space. And when we are happy we can reclaim the 9GB of storage used by
lists.old.

> I did remove the "test" list, and the cronjob that kept it
> populated. But all other lists have been kept as V1 and basic
> indexing.
> 
> There are some lists which have never seen any messages. I think we
> should remove them because they probably won't see any messages ever,
> and they make looking for real lists more difficult (~37% of the lists,
> 97 out of the total 260 lists, are just empty).
> 
> anonymous, autobook-cvs, autobook-webpages-cvs, autoconf-cvs,
> autoconf-webpages-cvs, binutils-webpages-cvs, bzip2-cvs,
> bzip2-webpages-cvs, catapult-cvs, catapult-webpages-cvs,
> c++-embedded-cvs, c++-embedded-webpages-cvs, cgen-prs,
> cgen-webpages-cvs, cluster-webpages-cvs, cygwin-webpages-cvs, dm-cvs,
> dm-webpages-cvs, docbook-tools-hackers, docbook-tools-webpages-cvs,
> dominion-announce, dominion-cvs, dominion-discuss,
> dominion-webpages-cvs, ecos-webpages-cvs, elix-cvs, elix-webpages-cvs,
> gcc-cvs-testrun, gcc-maintainers, gcc-ppc, gcc-sc, gcc-testlist,
> gdb-webpages-cvs, gettext-alpha, gettext-announce,
> gettext-webpages-cvs, glibc-webpages-cvs, global, gnats-admin,
> gnats-webpages-cvs, gsl-webpages-cvs, guile-emacs-cvs,
> guile-webpages-cvs, insight-cvs, insight-webpages-cvs, inti-cvs, inti,
> inti-webpages-cvs, ip-over-scsi-cvs, ip-over-scsi-webpages-cvs,
> java-cvs, jffs2-webpages-cvs, kawa-cvs, kawa-webpages-cvs, libaio,
> libaio-webpages-cvs, libc-alpha1, libffi-cvs, libffi-webpages-cvs,
> libstdc++-webpages-cvs, mailer-daemon, mauve-announce, mauve-cvs,
> mauve-webpages-cvs, newlib-webpages-cvs, piranha-webpages-cvs,
> postmaster, prelink-svn, psim-cvs, psim-webpages-cvs,
> pthreads-win32-cvs, pthreads-win32-webpages-cvs, rhdb-installer,
> rhdb-jdbc, rhdb-utils, root, rpm2html, rpm2html-prs,
> sharutils-announce, sharutils-cvs, sharutils-webpages-cvs,
> sid-webpages-cvs, sourcemaster, sourcenav-prs, sourceware-cvs,
> sourceware-cvs-sourceware, sourceware-cvs-sourceware-webpages,
> sourceware-infra-cvs, sourceware-webpages-cvs, systemtap-webpages-cvs,
> testcvs-cvs, testcvs-webpages-cvs, webmaster, win32-x11-cvs,
> win32-x11-webpages-cvs, xconq-prs and xconq-webpages-cvs

I removed them all. If people want them back it will be easy since
they didn't contain any messages to begin with.

> There are also the following 9 lists, which are either private (in
> which case they show up as empty above) or not publicly advertised
> lists:
> 
> cygwin-xfree, cygwin-xfree-announce, gcc-sc, mailman, overseers,
> postmaster, root, sourcemaster and test-list.
> 
> The cygwin lists might still be interesting. Likewise for this list
> overseers. But the others probably should be removed.

I kept cygwin-xfree, cygwin-xfree-announce, overseers and test-list
(which I used to test :)

This leaves us with 162 public-inbox lists.

Cheers,

Mark

* Re: inbox.sourceware.org experiment
  2022-08-18 13:50             ` Mark Wielaard
  2022-08-18 14:40               ` Simon Marchi
@ 2022-08-23 22:08               ` Mark Wielaard
  2022-08-24 10:05                 ` Mark Wielaard
  1 sibling, 1 reply; 15+ messages in thread
From: Mark Wielaard @ 2022-08-23 22:08 UTC
  To: Overseers mailing list; +Cc: Frank Ch. Eigler, Simon Marchi

Hi,

On Thu, Aug 18, 2022 at 03:50:54PM +0200, Mark Wielaard via Overseers wrote:
> > > And do we need to update/regenerate
> > > /etc/aliases.db and/or /etc/mailman/aliases.db ?
> > 
> > The proposal is to not touch /etc/aliases* NOR /etc/mailman/aliases*.
> > The proposal is to generate a new file like
> > /etc/postfix/mailman-inbox-aliases from /etc/mailman/aliases.  That
> > new file would be the one postfix would read.  It could be texthash:
> > rather than hash: so postmap would not even be necessary for updates.
> > That depends on whether the relevant alias-expansion postfix process
> > is short- or long-lived.
> 
> OK, I see the following in /etc/postfix/main.cf:
> 
> # CGF 2020-03-08 12:49
> alias_maps = hash:/etc/aliases, hash:/etc/mailman/aliases
> 
> # CGF 2020-03-18 14:10 EST - newaliases wasn't affecting /etc/mailman/aliases
> alias_database = hash:/etc/aliases, hash:/etc/mailman/aliases
> 
> So I assume calling newaliases regenerates the hash/.db files.
> 
> I can write a script to generate mailman-inbox-aliases this weekend
> when I have stable internet access again. Will post to the list before
> installing to make sure I don't accidentally break something.

Sorry this took a bit longer, but I wanted to make sure I got it right.
I solved it in a slightly simpler way by installing a
/home/inbox/.forward with:
|/usr/bin/public-inbox-mda

And then simply added the inbox user as an extra recipient. So the
STANZA looks like:

# STANZA START: test-list
# CREATED: Sat Mar  7 13:49:45 2020
test-list:             "|/usr/local/mailman/mailman post test-list", inbox
test-list-bounces:     "|/usr/local/mailman/mailman bounces test-list"
test-list-confirm:     "|/usr/local/mailman/mailman confirm test-list"
test-list-join:        "|/usr/local/mailman/mailman join test-list"
test-list-leave:       "|/usr/local/mailman/mailman leave test-list"
test-list-owner:       "|/usr/local/mailman/mailman owner test-list"
test-list-request:     "|/usr/local/mailman/mailman request test-list"
test-list-subscribe:   "|/usr/local/mailman/mailman subscribe test-list"
test-list-unsubscribe: "|/usr/local/mailman/mailman unsubscribe test-list"
# STANZA END: test-list

The script to generate those is in
/etc/mailman/mailman-aliases-to-inbox.sh
And the postfix main.cf has been updated to use the generated
/etc/mailman/aliases-inbox
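The transformation such a generator performs can be sketched as follows. This is a Python illustration, not the contents of `mailman-aliases-to-inbox.sh`; the function name and regular expression are assumptions based only on the STANZA format shown above. Only the `post` alias of each list gets the extra `inbox` recipient, and running it again on its own output changes nothing.

```python
import re

# A Mailman "post" alias such as:
#   test-list:             "|/usr/local/mailman/mailman post test-list"
# "head" is the alias name, colon and padding; "target" is the quoted pipe.
POST_ALIAS = re.compile(
    r'^(?P<head>[^#\s][^:]*:\s+)(?P<target>"\|\S+ post \S+")\s*$')


def add_inbox_recipient(aliases_text: str, recipient: str = "inbox") -> str:
    """Append an extra recipient to every Mailman post alias (sketch).

    Comment lines, non-post aliases (-bounces, -request, ...) and lines that
    already carry extra recipients are copied unchanged, so the function is
    idempotent.
    """
    out = []
    for line in aliases_text.splitlines():
        m = POST_ALIAS.match(line)
        if m:
            line = f'{m.group("head")}{m.group("target")}, {recipient}'
        out.append(line)
    return "\n".join(out) + "\n"
```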

The only thing I don't know is how to automate the
/etc/mailman/mailman-aliases-to-inbox.sh running when new lists are
added. Should this be a mailman trigger or cronjob check?
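If a cronjob is chosen, one cheap check is to regenerate only when `/etc/mailman/aliases` is newer than the generated file. A minimal sketch, assuming Mailman rewrites `/etc/mailman/aliases` whenever a list is added; `needs_regeneration` is a hypothetical name. A Mailman-side trigger would avoid the polling delay altogether.

```python
import os


def needs_regeneration(source: str, generated: str) -> bool:
    """True when the generated alias file is missing or older than its source.

    Assumes (hypothetically) that adding a list always rewrites `source`
    (e.g. /etc/mailman/aliases), so a newer mtime means a possibly new list.
    """
    if not os.path.exists(generated):
        return True
    return os.path.getmtime(source) > os.path.getmtime(generated)
```

A cron entry would then run this check every few minutes and call the generator script only when it returns `True`.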

Thanks,

Mark

* Re: inbox.sourceware.org experiment
  2022-08-23 22:08               ` Mark Wielaard
@ 2022-08-24 10:05                 ` Mark Wielaard
  2022-08-24 21:06                   ` Mark Wielaard
  0 siblings, 1 reply; 15+ messages in thread
From: Mark Wielaard @ 2022-08-24 10:05 UTC
  To: Overseers mailing list; +Cc: Simon Marchi

Hi,

On Wed, Aug 24, 2022 at 12:08:51AM +0200, Mark Wielaard via Overseers wrote:
> Sorry this took a bit longer. But I wanted to make sure I got it right.
> I solved it slightly simpler by installing a /home/inbox/.forward with:
> |/usr/bin/public-inbox-mda

This is now |/home/inbox/public-inbox-mda-true.sh
Which does:
/usr/bin/public-inbox-mda --no-precheck 2>&1 | ts >> /home/inbox/log/public-inbox-mda.out.log || true

The || true makes sure that errors don't cause a bounce. --no-precheck
is used because public-inbox-mda is very picky and rejects various
emails that seem just fine. A timestamped log of errors goes to
/home/inbox/log/public-inbox-mda.out.log
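The same "log everything, never fail" wrapper could also be written in Python, which may be easier to extend. A sketch under stated assumptions: `run_logged` is a hypothetical name, the child inherits stdin (so the message piped in by `.forward` reaches it), and all output lines of one run share a single timestamp, whereas `ts` stamps each line as it arrives.

```python
import subprocess
import time


def run_logged(cmd: list, log_path: str) -> int:
    """Run cmd with stdout and stderr merged, append its output to log_path
    with a timestamp prefix, and always return 0 (the "|| true" part), so a
    failing archiver never turns into a bounce.

    The child inherits our stdin, so a message piped in by .forward reaches it.
    """
    try:
        proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT,
                              text=True, errors="replace")
        output = proc.stdout
    except OSError as exc:  # e.g. the MDA binary is missing or not executable
        output = f"failed to run {cmd[0]}: {exc}\n"
    stamp = time.strftime("%b %d %H:%M:%S")  # same layout as ts(1)'s default
    try:
        with open(log_path, "a") as log:
            for line in output.splitlines():
                log.write(f"{stamp} {line}\n")
    except OSError:
        pass  # losing a log line is better than bouncing the mail
    return 0
```

A `.forward`-invoked script would then end with `sys.exit(run_logged(["/usr/bin/public-inbox-mda", "--no-precheck"], "/home/inbox/log/public-inbox-mda.out.log"))`.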

> And then simply add the inbox user as extra recipient. So the STANZA
> looks like:
> 
> # STANZA START: test-list
> # CREATED: Sat Mar  7 13:49:45 2020
> test-list:             "|/usr/local/mailman/mailman post test-list", inbox
> test-list-bounces:     "|/usr/local/mailman/mailman bounces test-list"
> test-list-confirm:     "|/usr/local/mailman/mailman confirm test-list"
> test-list-join:        "|/usr/local/mailman/mailman join test-list"
> test-list-leave:       "|/usr/local/mailman/mailman leave test-list"
> test-list-owner:       "|/usr/local/mailman/mailman owner test-list"
> test-list-request:     "|/usr/local/mailman/mailman request test-list"
> test-list-subscribe:   "|/usr/local/mailman/mailman subscribe test-list"
> test-list-unsubscribe: "|/usr/local/mailman/mailman unsubscribe test-list"
> # STANZA END: test-list
> 
> The script to generate those is in
> /etc/mailman/mailman-aliases-to-inbox.sh
> And the postfix main.cf has been updated to use the generated
> /etc/mailman/aliases-inbox

This was a little too naive. public-inbox-mda does ignore emails to
addresses it doesn't know about, but some addresses generated odd/bad
loops, in particular the "root" list (now removed by Frank) and the
mailman and postmaster lists (I removed the inbox recipient by hand).

The script really should be updated to only add inbox to those mailman
post lists it is archiving.
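One way to do that is to derive the set of archived lists from the public-inbox config itself, which uses git-config syntax with one `[publicinbox "NAME"]` section and one or more `address` keys per inbox. The sketch below assumes (hypothetically) that a Mailman alias name equals the local part of the archived address, and its tiny parser ignores git-config features such as includes and quoting; `git config --file ~/.public-inbox/config --get-regexp 'publicinbox\..*\.address'` would be a more robust source for the same data. Function names are illustrative.

```python
import re

SECTION = re.compile(r'^\s*\[publicinbox\s+"(?P<name>[^"]+)"\]\s*$')
ADDRESS = re.compile(r'^\s*address\s*=\s*(?P<addr>\S+)\s*$', re.IGNORECASE)
# A Mailman "post" alias, e.g.:  gdb:  "|/usr/local/mailman/mailman post gdb"
POST_ALIAS = re.compile(r'^(?P<alias>[^#\s:][^:]*):\s+"\|\S+ post \S+"\s*$')


def archived_lists(config_text: str) -> set:
    """Local parts of every 'address' inside [publicinbox "..."] sections."""
    names, in_inbox = set(), False
    for line in config_text.splitlines():
        if line.lstrip().startswith("["):
            in_inbox = SECTION.match(line) is not None
            continue
        m = ADDRESS.match(line)
        if in_inbox and m:
            names.add(m.group("addr").split("@", 1)[0].lower())
    return names


def add_inbox_for_archived(aliases_text: str, archived: set,
                           recipient: str = "inbox") -> str:
    """Append recipient only to the post alias of archived lists, so aliases
    such as root, mailman or postmaster are left alone."""
    out = []
    for line in aliases_text.splitlines():
        m = POST_ALIAS.match(line)
        if m and m.group("alias").lower() in archived:
            line = f"{line.rstrip()}, {recipient}"
        out.append(line)
    return "\n".join(out) + "\n"
```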

> The only thing I don't know is how to automate the
> /etc/mailman/mailman-aliases-to-inbox.sh running when new lists are
> added. Should this be a mailman trigger or cronjob check?

So once this is automated, make sure the above changes are also
applied automatically.

I noticed two issues. Some lists (e.g. gcc-patches) seem to have a
bad/corrupt xapian database and generate an error while indexing. And
emails with slashes (/) in the Message-ID sometimes get wrongly escaped
and appear to not be in the archive while they really are. For example,
the message I am replying to shows as:
https://inbox.sourceware.org/overseers/YwVP8+LHvyLzUG%2F+@wildebeest.org/
But should be:
https://inbox.sourceware.org/overseers/YwVP8+LHvyLzUG/+@wildebeest.org/

Cheers,

Mark

* Re: inbox.sourceware.org experiment
  2022-08-24 10:05                 ` Mark Wielaard
@ 2022-08-24 21:06                   ` Mark Wielaard
  0 siblings, 0 replies; 15+ messages in thread
From: Mark Wielaard @ 2022-08-24 21:06 UTC
  To: Overseers mailing list; +Cc: Simon Marchi

Hi,

On Wed, Aug 24, 2022 at 12:05:03PM +0200, Mark Wielaard via Overseers wrote:
> I noticed two issues. Some lists (e.g. gcc-patches) seem to have a
> bad/corrupt xapian database and generate an error while indexing.

I tried reindexing and compacting the largest lists. This did not
help. But the compacting did reduce the disk size of the xapian
indexes by 10GB (!).

There is now a bit more logging in
/home/inbox/log/public-inbox-mda.out.log

It looks like this error:

rollback ineffective with AutoCommit enabled at
/usr/share/perl5/vendor_perl/PublicInbox/V2Writable.pm line 621.
checkpoint: Exception: Error writing block 147232
shard close: Exception: Error writing block 147236

It only happens after importing a new gcc-patches message. The message
isn't fully indexed, but can be referenced normally. It won't show up
in full text searches though. I haven't figured out why. I'll ask
upstream how to better debug this.

> emails with slashes / in the Message-ID sometimes get wrongly
> escaped and appear to not be in the archive while they really are.
> e.g. the message I am replying to shows as:
> https://inbox.sourceware.org/overseers/YwVP8+LHvyLzUG%2F+@wildebeest.org/
> But should be:
> https://inbox.sourceware.org/overseers/YwVP8+LHvyLzUG/+@wildebeest.org/

This isn't a big deal except when the / is at the end of the
Message-ID, which unfortunately happens for bugzilla emails, whose
Message-IDs end in @http.sourceware.org/bugzilla/. That last slash
seems to be a real problem. I don't know a workaround for that yet.

You can see that public-inbox does know about the Message-ID by
searching for:
https://inbox.sourceware.org/libabigail/bug-29464-9487@http.sourceware.org/bugzilla//
which will suggest the actual URL as a "partial match", but when
following that link the slashes get escaped again. I will ask upstream
if there is any solution for this.
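The difference between the escaped and unescaped forms can be illustrated with a simple router that splits the still-encoded path on `/` and only then decodes the Message-ID segment. This is an illustrative sketch, not public-inbox's routing code, and `split_inbox_path` is a hypothetical name. One plausible way such links break is a front end that decodes `%2F` before routing, which turns the first form below into the second.

```python
from urllib.parse import quote, unquote


def split_inbox_path(path: str) -> tuple:
    """Return (inbox, message_id) from '/INBOX/MSGID/' by splitting on '/'
    *before* percent-decoding, then decoding only the Message-ID part."""
    parts = path.strip("/").split("/")
    return parts[0], unquote(parts[1])


mid = "bug-29464-9487@http.sourceware.org/bugzilla/"

# Escaped: every '/' inside the Message-ID becomes %2F, so it survives routing.
escaped = "/libabigail/" + quote(mid, safe="@+") + "/"
print(escaped)                    # /libabigail/bug-29464-9487@http.sourceware.org%2Fbugzilla%2F/
print(split_inbox_path(escaped))  # ('libabigail', 'bug-29464-9487@http.sourceware.org/bugzilla/')

# Unescaped: the router cannot tell where the Message-ID ends.
unescaped = "/libabigail/" + mid + "/"
print(split_inbox_path(unescaped))  # ('libabigail', 'bug-29464-9487@http.sourceware.org')
```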

Finally, there are some lists that accept HTML emails (by stripping off
the HTML part). public-inbox, however, simply rejects those emails:

*** We only accept plain-text mail, No HTML ***

Again, we should ask upstream if there could be an option to accept
just the text/plain part of such emails.

Note that such emails do end up in the .public-inbox/emergency mailbox
so in theory we could remove the text/html part and then reinsert the
message.
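Stripping the HTML alternative before reinserting could look roughly like this, using Python's standard `email` package. A sketch only: `keep_plain_text` is a hypothetical helper, it handles just the common top-level multipart/alternative case, it re-encodes the body as UTF-8, and how the cleaned message is fed back into the archive is not shown.

```python
from email import policy
from email.parser import BytesParser


def keep_plain_text(raw: bytes) -> bytes:
    """Turn a top-level multipart/alternative message into a single
    text/plain message, dropping the text/html alternative.

    Other messages, or alternatives without a text/plain part, are
    returned unchanged.
    """
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    if msg.get_content_type() != "multipart/alternative":
        return raw
    plain = msg.get_body(preferencelist=("plain",))
    if plain is None:
        return raw
    text = plain.get_content()
    # set_content() refuses to work on a multipart message, so first drop
    # the Content-* headers and the old payload; other headers
    # (From, Subject, Message-ID, ...) are kept.
    msg.clear_content()
    msg.set_content(text)
    return msg.as_bytes()
```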

So there are some issues, but in general I think it works just fine
now.

Cheers,

Mark

end of thread, other threads:[~2022-08-24 21:06 UTC | newest]

Thread overview: 15+ messages
2022-08-13 14:14 inbox.sourceware.org experiment Mark Wielaard
2022-08-15 13:00 ` Mark Wielaard
2022-08-16 21:36 ` Mark Wielaard
2022-08-16 22:10   ` Frank Ch. Eigler
2022-08-17 12:25     ` Mark Wielaard
2022-08-17 13:24       ` Frank Ch. Eigler
2022-08-17 21:18         ` Mark Wielaard
2022-08-17 21:33           ` Frank Ch. Eigler
2022-08-18 13:50             ` Mark Wielaard
2022-08-18 14:40               ` Simon Marchi
2022-08-21 17:41                 ` Mark Wielaard
2022-08-23 20:15                   ` Mark Wielaard
2022-08-23 22:08               ` Mark Wielaard
2022-08-24 10:05                 ` Mark Wielaard
2022-08-24 21:06                   ` Mark Wielaard
