* inbox.sourceware.org experiment
@ 2022-08-13 14:14 Mark Wielaard
  2022-08-15 13:00 ` Mark Wielaard
  2022-08-16 21:36 ` Mark Wielaard
  0 siblings, 2 replies; 15+ messages in thread
From: Mark Wielaard @ 2022-08-13 14:14 UTC (permalink / raw)
  To: overseers; +Cc: Simon Marchi

Hi,

It looks like our public-inbox experiment at
https://inbox.sourceware.org/ is starting to work out.

Currently only Simon and I have access to the inbox account through
ssh, but I think we can automate it enough to not need any manual
intervention unless new lists are added. But please ask if you want to
help with the setup.

I have set up sourceware-vhost-inbox.conf with a corresponding
letsencrypt certificate, and public-inbox-nntpd, public-inbox-imapd and
public-inbox-httpd through systemd socket and service files. So you
should be able to access the mailboxes through git mirroring, https,
mbox downloads, atom feeds, nntp and imap.

You can already look at the experimental setup per list, e.g.

web-archive:
  https://inbox.sourceware.org/elfutils-devel/
individual messages and mbox per thread instructions:
  https://inbox.sourceware.org/elfutils-devel/_/text/help/
git mirror instructions:
  https://inbox.sourceware.org/elfutils-devel/_/text/mirror/
atom feed:
  https://inbox.sourceware.org/elfutils-devel/new.atom
imap:
  imap://inbox.sourceware.org/ (readonly, port 143, any user/pass)
nntp:
  nntp://inbox.sourceware.org/ (readonly, port 119)

Note that nntp group names and imap folder names might still change.

All current mailboxes are imported/mirrored as public-inbox-v1-format
but for scalability we will want to import them into
public-inbox-v2-format (this also parallelizes xapian indexing and uses
an sqlite database).

It looks like the inbox user can access the original emails to the
lists before mailman mangles the headers, but it cannot easily see
which domain (sourceware, gcc, cygwin, ecos, etc.) they belong to. It
would be nice if we could name the news groups/folders after the
primary domain, e.g. inbox.sourceware.elfutils-devel,
inbox.gcc.gcc-patches, inbox.cygwin.cygwin-talk.

The inbox.sourceware.test group at https://inbox.sourceware.org/test is
a simple mirror of http://try.public-inbox.org/test/ and I will remove
it soon (plus the cronjob that does the mirroring).

Looking at the mailman2inbox.sh script I have a few suggestions (I can
make them to the script myself, but don't know if you are currently
editing/running it):

- public-inbox-init should probably use -V2 (see above). You can then
  also use -j JOBS to speed up the import.

- --indexlevel should be full to make the Xapian searching more useful
  (this is the default, so you can also not set it). Note that this
  also affects the incremental updating done by public-inbox-mda.

- You want to kill public-inbox-httpd using -SIGHUP so it just reloads
  the new config files. You also want to kill the other daemons,
  public-inbox-imapd and public-inbox-nntpd.

- The --ng name should be based on the primary domain name (see
  above). I don't know how to determine that easily though. Maybe
  mailman knows, then we can also set the initial ADDRESS properly.

The formail -s public-inbox-mda seems to work well for batch
importing, but is it efficient enough for keeping the importing up to
date? It looks like the last .mbox file is just really big and new
messages are appended at the end, so we would be trying to import all
messages all the time. And how do we make sure it is triggered when new
messages come in?

Cheers,

Mark
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: inbox.sourceware.org experiment 2022-08-13 14:14 inbox.sourceware.org experiment Mark Wielaard @ 2022-08-15 13:00 ` Mark Wielaard 2022-08-16 21:36 ` Mark Wielaard 1 sibling, 0 replies; 15+ messages in thread From: Mark Wielaard @ 2022-08-15 13:00 UTC (permalink / raw) To: Overseers mailing list; +Cc: Simon Marchi [-- Attachment #1: Type: text/plain, Size: 1901 bytes --] Hi, On Sat, Aug 13, 2022 at 04:14:03PM +0200, Mark Wielaard via Overseers wrote: > Looking at the mailman2inbox.sh script I have a few suggestions (I can > make them to the script myself, but don't know if you are currently > editing/running it): > > - public-inbox-init should probably use -V2 (see above). You can then > also use -j JOBS to speed up the import. > > - --indexlevel shuld be full to make the Xapian searching more useful > (this is the default, so you can also not set it). Note that this > also affects the incremental updating done by public-inbox-mda. > > - You want to kill public-inbox-httpd using -SIGHUP so it just reloads > the new config files. Yo also want to kill the other daemons, > public-inbox-imapd and public-inbox-nntpd > > - The --ng name should be based on the primary domain name (see > above). I don't know how to determine that easily though. Maybe > mailman knows, then we can also set the initial ADDRESS properly. And mailman does know, but you need to be in the mailman group to generate the lists. We support 3 virtual domains, sourceware.org, cygwin.com and gcc.gnu.org. Using /usr/lib/mailman/bin/list_lists we can generate lists per domain that only include advertised, public archived lists. There are 212 sourceware.org lists, 11 cygwin.com lists and 28 gcc.gnu.org lists. 
Attached is the output of: /usr/lib/mailman/bin/list_lists -b -a -p -V sourceware.org > sourceware.org.lists /usr/lib/mailman/bin/list_lists -b -a -p -V cygwin.com > cygwin.com.lists /usr/lib/mailman/bin/list_lists -b -a -p -V gcc.gnu.org > gcc.gnu.org.lists I placed the same in the inbox homedir under mailman.lists/ so it can be used as input to the import script. For sourceware.org lists @sourceware.cygnus.org and @sources.redhat.com should be alternate/historical names. For cygwin.com lists @cygwin.org should be an alternate name. Cheers, Mark [-- Attachment #2: sourceware.org.lists --] [-- Type: text/plain, Size: 2931 bytes --] anonymous archer archer-commits autobook-cvs autobook-webpages-cvs autoconf-cvs autoconf-webpages-cvs bfd binutils binutils-cvs binutils-webpages-cvs buildbot bunsen bzip2-cvs bzip2-devel bzip2-webpages-cvs c++-embedded c++-embedded-cvs c++-embedded-webpages-cvs catapult-cvs catapult-webpages-cvs cgen cgen-cvs cgen-prs cgen-webpages-cvs cluster-cvs cluster-webpages-cvs crossgcc debugedit dm-cvs dm-webpages-cvs docbook-tools-announce docbook-tools-cvs docbook-tools-discuss docbook-tools-hackers docbook-tools-webpages-cvs dominion-announce dominion-cvs dominion-discuss dominion-hackers dominion-webpages-cvs dwz eclipse ecos-announce ecos-bugs ecos-cvs ecos-devel ecos-discuss ecos-maintainers ecos-patches ecos-webpages-cvs elfutils-devel elix elix-announce elix-cvs elix-webpages-cvs frysk frysk-bugzilla frysk-cvs frysk-testresults frysk-webpages-cvs gas2 gdb gdb-announce gdb-cvs gdb-patches gdb-patches-prs gdb-prs gdb-testers gdb-testresults gdb-webpages-cvs gdbadmin gettext-alpha gettext-announce gettext-cvs gettext-webpages-cvs glibc-bugs glibc-bugs-regex glibc-cvs glibc-webpages-cvs global gnats-admin gnats-announce gnats-cvs gnats-devel gnats-prs gnats-webpages-cvs gnu-gabi gsl-announce gsl-cvs gsl-discuss gsl-webpages-cvs guile-cvs guile-emacs guile-emacs-cvs guile-gtk guile-prs guile-webpages-cvs infinity insight insight-announce 
insight-cvs insight-prs insight-webpages-cvs installshell installshell-cvs inti inti-cvs inti-webpages-cvs ip-over-scsi-cvs ip-over-scsi-webpages-cvs jffs2-cvs jffs2-webpages-cvs kawa kawa-cvs kawa-webpages-cvs libabigail libabigail-webpages-cvs libaio libaio-cvs libaio-webpages-cvs libc-alpha libc-alpha1 libc-announce libc-hacker libc-help libc-locales libc-ports libc-stable libc-testresults libffi-announce libffi-cvs libffi-discuss libffi-webpages-cvs lvm-cvs lvm-webpages-cvs lvm2-cvs lvm2-webpages-cvs mailer-daemon mauve-announce mauve-cvs mauve-discuss mauve-patches mauve-webpages-cvs mingw-cvs mingw-dvlpr netresolve newlib newlib-cvs newlib-webpages-cvs patchutils-cvs patchutils-list patchutils-webpages-cvs piranha-webpages-cvs prelink prelink-svn psim-cvs psim-webpages-cvs pthreads-win32 pthreads-win32-cvs pthreads-win32-webpages-cvs rda rhdb rhdb-admin rhdb-announce rhdb-cc rhdb-cvs rhdb-explain rhdb-installer rhdb-jdbc rhdb-utils rhdb-webpages-cvs rhl-cvs rhug-cvs rhug-rhats rpm2html rpm2html-cvs rpm2html-prs sharutils-alpha sharutils-announce sharutils-cvs sharutils-webpages-cvs sid sid-announce sid-cvs sid-webpages-cvs sourcenav sourcenav-announce sourcenav-cvs sourcenav-prs sourcenav-webpages-cvs sourceware-announce sourceware-cvs sourceware-cvs-sourceware sourceware-cvs-sourceware-webpages sourceware-infra-cvs sourceware-webpages-cvs springfield src-cvs systemtap systemtap-cvs systemtap-webpages-cvs testcvs-cvs testcvs-webpages-cvs webmaster win32-x11-cvs win32-x11-webpages-cvs xconq-announce xconq-cvs xconq-prs xconq-webpages-cvs xconq7 [-- Attachment #3: cygwin.com.lists --] [-- Type: text/plain, Size: 161 bytes --] cygwin cygwin-announce cygwin-apps cygwin-apps-cvs cygwin-cvs cygwin-developers cygwin-licensing cygwin-patches cygwin-talk cygwin-webpages-cvs cygwin-xfree-cvs [-- Attachment #4: gcc.gnu.org.lists --] [-- Type: text/plain, Size: 322 bytes --] fortran gcc gcc-announce gcc-bugs gcc-cvs gcc-cvs-testrun gcc-cvs-wwwdocs gcc-help 
gcc-maintainers gcc-patches gcc-ppc gcc-prs gcc-regression gcc-rust gcc-testlist gcc-testresults gccadmin gnutools-advocacy java java-announce java-cvs java-patches java-prs jit libstdc++ libstdc++-cvs libstdc++-prs libstdc++-webpages-cvs ^ permalink raw reply [flat|nested] 15+ messages in thread
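The per-domain list files described above could feed the newsgroup naming roughly as follows. This is only a sketch: the functions ng_prefix and ng_names are hypothetical helpers, not part of mailman2inbox.sh, and the domain-to-prefix mapping is an assumption taken from the example names earlier in the thread (inbox.sourceware.elfutils-devel, inbox.gcc.gcc-patches, inbox.cygwin.cygwin-talk).

```shell
#!/bin/sh
# Map a mailman virtual domain to a newsgroup prefix.
# Assumption: the prefixes follow the examples given in the thread.
ng_prefix() {
    case "$1" in
        sourceware.org) echo inbox.sourceware ;;
        cygwin.com)     echo inbox.cygwin ;;
        gcc.gnu.org)    echo inbox.gcc ;;
        *)              return 1 ;;   # unknown domain
    esac
}

# Read list names (whitespace or newline separated) on stdin and print
# "<newsgroup> <list-address>" for each of them.
ng_names() {
    domain=$1
    prefix=$(ng_prefix "$domain") || return 1
    tr -s ' \t' '\n\n' | while read -r list; do
        [ -n "$list" ] || continue
        printf '%s.%s %s@%s\n' "$prefix" "$list" "$list" "$domain"
    done
}
```

It could be run as `ng_names gcc.gnu.org < mailman.lists/gcc.gnu.org.lists`; the two output columns would then supply the --ng name and the primary ADDRESS for public-inbox-init. The historical addresses (@sourceware.cygnus.org, @sources.redhat.com, @cygwin.org) are not handled here.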
* Re: inbox.sourceware.org experiment 2022-08-13 14:14 inbox.sourceware.org experiment Mark Wielaard 2022-08-15 13:00 ` Mark Wielaard @ 2022-08-16 21:36 ` Mark Wielaard 2022-08-16 22:10 ` Frank Ch. Eigler 1 sibling, 1 reply; 15+ messages in thread From: Mark Wielaard @ 2022-08-16 21:36 UTC (permalink / raw) To: Overseers mailing list; +Cc: Simon Marchi Hi, On Sat, Aug 13, 2022 at 04:14:03PM +0200, Mark Wielaard via Overseers wrote: > Looking at the mailman2inbox.sh script I have a few suggestions (I can > make them to the script myself, but don't know if you are currently > editing/running it): > > - public-inbox-init should probably use -V2 (see above). You can then > also use -j JOBS to speed up the import. > > - --indexlevel shuld be full to make the Xapian searching more useful > (this is the default, so you can also not set it). Note that this > also affects the incremental updating done by public-inbox-mda. > > - You want to kill public-inbox-httpd using -SIGHUP so it just reloads > the new config files. Yo also want to kill the other daemons, > public-inbox-imapd and public-inbox-nntpd > > - The --ng name should be based on the primary domain name (see > above). I don't know how to determine that easily though. Maybe > mailman knows, then we can also set the initial ADDRESS properly. > > The formail -s public-inbox-mda seems to work well for batch > importing, but is it efficient enough for keeping the importing up to > date? It looks like the last .mbox file is just really big and new > messages are appended at the end, so we would be trying to import all > messages all the ime. And how do we make sure it is triggered when new > messages come in? It turns out public-inbox does support importing a full mbox in one go. But it doesn't have a nice binary for it yet. There is however scripts/import_vger_from_mbox in upstream git which is easily adapted (just remove the vger specific filtering). I put this in the inbox homedir as import_from_mbox. 
And to test I removed the already imported elfutils-devel and
reimported it with the import_from_mbox script using:

$ public-inbox-init -V2 --ng inbox.sourceware.elfutils-devel -L full elfutils-devel /home/inbox/lists/elfutils-devel https://inbox.sourceware.org/elfutils-devel elfutils@sourceware.org elfutils-devel@lists.fedorahosted.org
$ ./import_from_mbox elfutils-devel elfutils-devel@lists.fedorahosted.org lists/elfutils-devel < /sourceware/projects/elfutils-home/elfutils-devel.nospam.mbox
$ for i in /var/lib/mailman/archives/private/elfutils-devel.mbox/*mbox; do ./import_from_mbox elfutils-devel elfutils-devel@sourceware.org lists/elfutils-devel < "$i"; done

Note this is V2 plus full indexing and includes an extra historical
elfutils-devel.nospam.mbox.

Surprisingly this only took ~30 seconds in total.

The elfutils-devel.nospam.mbox doesn't contain enough headers to do
proper threading unfortunately. But the full index does make it
possible to match on similar subjects.

I don't have a solution for keeping the archive up to date. Parsing
mboxes is really discouraged upstream because it requires reparsing
all messages, and there is no locking mechanism for mboxes, so if
mailman writes to the mbox while public-inbox reads from it odd things
can happen.

One way to make it work with public-inbox-watch is to subscribe the
inbox user to each list and create a Maildir of messages. But then the
message headers will have been rewritten by mailman. So it would be
better to somehow get the inbox user the messages before mailman sees
them, or somehow get the inbox user a copy of the message as mailman
would add it to the mbox archive instead of what it sends to list
subscribers.

Cheers,

Mark
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: inbox.sourceware.org experiment 2022-08-16 21:36 ` Mark Wielaard @ 2022-08-16 22:10 ` Frank Ch. Eigler 2022-08-17 12:25 ` Mark Wielaard 0 siblings, 1 reply; 15+ messages in thread From: Frank Ch. Eigler @ 2022-08-16 22:10 UTC (permalink / raw) To: Overseers mailing list; +Cc: Mark Wielaard, Simon Marchi Hi - > It turns out public-inbox does support importing a full mbox in one > go. But it doesn't have a nice binary for it yet. There is however > scripts/import_vger_from_mbox in upstream git which is easily adapted > (just remove the vger specific filtering). This is already 99% done for the sourceware mailing lists. > [...] > Note this is V2 plus full indexing and includes and extra historical > elfutils-devel.nospam.mbox Is there a need for "full" indexing as opposed to "basic"? I don't see why we'd need another text search engine for this stuff, we already have. The basic "v1" with basic indexing seems fine and effective for web and nntp. > [...] > I don't have a solution for keeping the archive up to date. [...] We can hack a postfix->|mailman and |inbox-mda alias-fork and dual pipe delivery for each mailing list. - FChE ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: inbox.sourceware.org experiment
  2022-08-16 22:10 ` Frank Ch. Eigler
@ 2022-08-17 12:25 ` Mark Wielaard
  2022-08-17 13:24 ` Frank Ch. Eigler
  0 siblings, 1 reply; 15+ messages in thread
From: Mark Wielaard @ 2022-08-17 12:25 UTC (permalink / raw)
  To: Frank Ch. Eigler; +Cc: Overseers mailing list, Simon Marchi

Hi Frank,

On Tue, Aug 16, 2022 at 06:10:58PM -0400, Frank Ch. Eigler wrote:
> > It turns out public-inbox does support importing a full mbox in one
> > go. But it doesn't have a nice binary for it yet. There is however
> > scripts/import_vger_from_mbox in upstream git which is easily adapted
> > (just remove the vger specific filtering).
>
> This is already 99% done for the sourceware mailing lists.

Nice. Was this done using the mailman2inbox.sh script? I believe that
is still generating v1 archives. Which is why I regenerated the
elfutils-devel one.

> > [...]
> > Note this is V2 plus full indexing and includes an extra historical
> > elfutils-devel.nospam.mbox
>
> Is there a need for "full" indexing as opposed to "basic"? I don't
> see why we'd need another text search engine for this stuff, we
> already have. The basic "v1" with basic indexing seems fine and
> effective for web and nntp.

Note that full indexing is separate from using v1 or v2 archives.

I don't think we should be using v1 archives, those are deprecated
upstream and they strongly recommend using v2 archives which are much
more scalable.

Reimporting the lists as v2 archives using the import_from_mbox
script should be much more efficient and can be done in a couple of
hours instead of days.

A full index does not just make full text search of the mailing list
really fast, it also indexes addresses, date ranges, subjects, headers,
body, attachments, etc. And the results are also available as mbox.
So you would then be able to easily express "give me all emails/threads in gcc-patches from the last 6 months that discuss dwarf2out.cc where I was not the sender or one of the receivers" and then download the whole mbox or browse all those messages/threads online. See e.g. https://inbox.sourceware.org/elfutils-devel/_/text/help/ for the xapian queries you can execute. > > [...] > > I don't have a solution for keeping the archive up to date. [...] > > We can hack a postfix->|mailman and |inbox-mda alias-fork > and dual pipe delivery for each mailing list. That would be great. But I would need some time reading up on postfix/mailman configs. Do you have an example of where/how this hack would be done? Thanks, Mark ^ permalink raw reply [flat|nested] 15+ messages in thread
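The example search described above (a term, a start date, and excluding one's own address) might be written along these lines. This is a sketch under assumptions: the d: (date range) and a: (From, To or Cc) prefixes and the NOT operator are used as I understand them from the /_/text/help/ page and should be checked there; pi_query is a hypothetical helper and me@example.org a placeholder address.

```shell
#!/bin/sh
# Build a public-inbox search query: messages since a given date
# (YYYYMMDD) that mention TERM, excluding those where ADDRESS appears
# in From, To or Cc.
# Assumption: the d:, a: and NOT syntax matches the instance's help page.
pi_query() {
    term=$1
    since=$2
    me=$3
    printf '%s d:%s.. NOT a:%s\n' "$term" "$since" "$me"
}

# Example use (GNU date assumed for the relative date; $inbox_url is the
# URL of a fully indexed inbox):
#   q=$(pi_query dwarf2out.cc "$(date -d '6 months ago' +%Y%m%d)" me@example.org)
#   curl -G --data-urlencode "q=$q" "$inbox_url"
```

The query string is kept separate from the HTTP request so it can be pasted into the web search box just as well as passed to a tool.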
* Re: inbox.sourceware.org experiment 2022-08-17 12:25 ` Mark Wielaard @ 2022-08-17 13:24 ` Frank Ch. Eigler 2022-08-17 21:18 ` Mark Wielaard 0 siblings, 1 reply; 15+ messages in thread From: Frank Ch. Eigler @ 2022-08-17 13:24 UTC (permalink / raw) To: Mark Wielaard; +Cc: Overseers mailing list, Simon Marchi Hi - > [...] > > Is there a need for "full" indexing as opposed to "basic"? I don't > > see why we'd need another text search engine for this stuff, we > > already have. The basic "v1" with basic indexing seems fine and > > effective for web and nntp. > [...] > I don't think we should be using v1 archives, those or deprecated > upstream and they strongly recommend using v2 archives which are much > more scalable. Given that v1 is the default of public-inbox-init, they can't be that bad. > Reimporting the lists as v2 archives using the import_from_mbox > script should be much more efficient and can be done in a couple of > hours instead of days. That speed is nice, but I suspect that's not a v1/v2 representation efficiency issue but something else. > A full index does not just make full text search of the mailinglist > really fast, it also indexes addresses, date ranges, subjects, headers, > body, attachments, etc. And the results are also available as mbox. So > you would then be able to easily express "give me all emails/threads > in gcc-patches from the last 6 months that discuss dwarf2out.cc where > I was not the sender or one of the receivers" and then download the > whole mbox or browse all those messages/threads online. [...] Yes, understood that the extra indexing can do extra searches. My question was about utility/need for this. For elfutils-devel, note that the full xapian indexes are about 10x the size of the git-compressed email archive, whereas in the case of the systemtap import, it's only about 0.2x, so there is a serious cost/benefit question. 
(In both v1 and v2 cases, the git representation of the mailboxes is about 60% of the size of the raw mbox files. That's pretty puny compression TBH, I expected much better.) > That would be great. But I would need some time reading up on > postfix/mailman configs. Do you have an example of where/how this hack > would be done? postfix delivers mailing list traffic via /etc/mailman/aliases, e.g.: autobook-cvs: "|/usr/local/mailman/mailman post autobook-cvs" autobook-cvs-bounces: "|/usr/local/mailman/mailman bounces autobook-cvs" autobook-cvs-confirm: "|/usr/local/mailman/mailman confirm autobook-cvs" autobook-cvs-join: "|/usr/local/mailman/mailman join autobook-cvs" I would use a script to generate a new config file from that, so that the primary mailing list incoming aliases are forked: autobook-cvs: autobook-cvs-mailman, autobook-cvs-inbox autobook-cvs-mailman: "|/usr/local/mailman/mailman post autobook-cvs" autobook-cvs-inbox: "|env SOMETHING /usr/bin/public-inbox-mda SOMETHING" autobook-cvs-bounces: "|/usr/local/mailman/mailman bounces autobook-cvs" autobook-cvs-confirm: "|/usr/local/mailman/mailman confirm autobook-cvs" autobook-cvs-join: "|/usr/local/mailman/mailman join autobook-cvs" and then switch postfix to this alias file instead. - FChE ^ permalink raw reply [flat|nested] 15+ messages in thread
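The generation step proposed above could be sketched with a small awk filter. Assumptions: every "post" alias is forked (a real script would probably restrict this to lists that have a public-inbox configured), fork_aliases and the INBOX_CMD variable are hypothetical names, and the inbox delivery command is deliberately left as the SOMETHING placeholder from the example since the thread does not specify it.

```shell
#!/bin/sh
# Read a mailman aliases file on stdin and write a forked version on
# stdout: each primary "post" alias becomes a pair of -mailman and -inbox
# aliases; all other lines are passed through unchanged.
fork_aliases() {
    awk -v inbox_cmd="${INBOX_CMD:-|env SOMETHING /usr/bin/public-inbox-mda SOMETHING}" '
    # A primary posting alias looks like:  name: "|/path/to/mailman post name"
    /^[^#: \t]+:[ \t]*"[|][^"]* post [^"]*"[ \t]*$/ {
        name = $0
        sub(/:.*/, "", name)               # alias name before the colon
        target = $0
        sub(/^[^:]+:[ \t]*/, "", target)   # original quoted pipe command
        printf "%s: %s-mailman, %s-inbox\n", name, name, name
        printf "%s-mailman: %s\n", name, target
        printf "%s-inbox: \"%s\"\n", name, inbox_cmd
        next
    }
    { print }   # bounces, confirm, join, comments, ...
    '
}
```

A run such as `fork_aliases < /etc/mailman/aliases > mailman-inbox-aliases.new` would write to a scratch file first, so the result can be inspected before postfix is pointed at it.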
* Re: inbox.sourceware.org experiment 2022-08-17 13:24 ` Frank Ch. Eigler @ 2022-08-17 21:18 ` Mark Wielaard 2022-08-17 21:33 ` Frank Ch. Eigler 0 siblings, 1 reply; 15+ messages in thread From: Mark Wielaard @ 2022-08-17 21:18 UTC (permalink / raw) To: Frank Ch. Eigler; +Cc: Overseers mailing list, Simon Marchi Hi Frank, On Wed, Aug 17, 2022 at 09:24:56AM -0400, Frank Ch. Eigler wrote: > > I don't think we should be using v1 archives, those or deprecated > > upstream and they strongly recommend using v2 archives which are much > > more scalable. > > Given that v1 is the default of public-inbox-init, they can't be that bad. Looks like it is just for backward compatibility. They actively warn against using it for new installations and strongly recommend using -V2. See also the public-inbox-init, public-inbox-v1-format and public-inbox-v2-format man pages. I don't expect support for v1 will disappear, but new projects around public-inbox, like lei, only support v2. So it is better to simply use the v2 format from the start. > > Reimporting the lists as v2 archives using the import_from_mbox > > script should be much more efficient and can be done in a couple of > > hours instead of days. > > That speed is nice, but I suspect that's not a v1/v2 representation > efficiency issue but something else. The v2 format allows parallel imports so it defaults to using multiple threads. Also using the import_from_mbox script allows to stream the import of messages using just one perl process per mbox instead of starting a new perl process per message. > Yes, understood that the extra indexing can do extra searches. My > question was about utility/need for this. The use seems obvious to me for anybody using the web based archives to generate tailored message/mbox results, specifically date ranged searches seem pretty mandatory since otherwise you essentially just need to keep clicking, next, next, next. But also to get specific messages based on author or subject. 
One specific use case for public-inbox is to not have to be subscribed
to a list to read it, or to need a local copy to search through it
(public-inbox does make mirroring a mailing list easy, but not
everybody has the space or network to do that).

> For elfutils-devel, note
> that the full xapian indexes are about 10x the size of the
> git-compressed email archive, whereas in the case of the systemtap
> import, it's only about 0.2x, so there is a serious cost/benefit
> question.

That is a concern and much bigger than I anticipated. So we should
probably only enable full indexing for active discussion and patch
lists and keep it at basic for autogenerated lists like -cvs or
old/inactive lists.

> > That would be great. But I would need some time reading up on
> > postfix/mailman configs. Do you have an example of where/how this hack
> > would be done?
>
> postfix delivers mailing list traffic via /etc/mailman/aliases,
> e.g.:
>
> autobook-cvs:             "|/usr/local/mailman/mailman post autobook-cvs"
> autobook-cvs-bounces:     "|/usr/local/mailman/mailman bounces autobook-cvs"
> autobook-cvs-confirm:     "|/usr/local/mailman/mailman confirm autobook-cvs"
> autobook-cvs-join:        "|/usr/local/mailman/mailman join autobook-cvs"
>
> I would use a script to generate a new config file from that, so that the
> primary mailing list incoming aliases are forked:
>
> autobook-cvs:             autobook-cvs-mailman, autobook-cvs-inbox
> autobook-cvs-mailman:     "|/usr/local/mailman/mailman post autobook-cvs"
> autobook-cvs-inbox:       "|env SOMETHING /usr/bin/public-inbox-mda SOMETHING"
> autobook-cvs-bounces:     "|/usr/local/mailman/mailman bounces autobook-cvs"
> autobook-cvs-confirm:     "|/usr/local/mailman/mailman confirm autobook-cvs"
> autobook-cvs-join:        "|/usr/local/mailman/mailman join autobook-cvs"
>
> and then switch postfix to this alias file instead.
OK that could work and should be easy to generate combining /etc/mailman/aliases with the lists in /home/inbox/.public-inbox/config So this is before mailman sees the message, so we do need to do a spam-check. And I think postfix sets ORIGINAL_RECIPIENT already, we just need to make sure it is one of the addresses for a list in the config. But what generates /etc/mailman/aliases itself? Can we hook into that to trigger generation of this aliases-inbox file? Otherwise if we add a new mailman list it won't work. And do we need to update/regenerate /etc/aliases.db and/or /etc/mailman/aliases.db ? Cheers, Mark ^ permalink raw reply [flat|nested] 15+ messages in thread
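The idea of full indexing for discussion and patch lists and basic indexing for autogenerated ones could be expressed as a small helper for the import script. This is a sketch only: index_level is a hypothetical function, and treating names that end in -cvs or contain -cvs- as the autogenerated lists is an assumption; old or inactive lists cannot be recognized by name and would need an explicit list.

```shell
#!/bin/sh
# Choose the public-inbox index level for a list based on its name.
# Assumption: "*-cvs" and "*-cvs-*" lists are the autogenerated ones.
index_level() {
    case "$1" in
        *-cvs|*-cvs-*) echo basic ;;
        *)             echo full ;;
    esac
}
```

The result could be passed to the -L option already used in the elfutils-devel import, e.g. `-L "$(index_level "$list")"`.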
* Re: inbox.sourceware.org experiment 2022-08-17 21:18 ` Mark Wielaard @ 2022-08-17 21:33 ` Frank Ch. Eigler 2022-08-18 13:50 ` Mark Wielaard 0 siblings, 1 reply; 15+ messages in thread From: Frank Ch. Eigler @ 2022-08-17 21:33 UTC (permalink / raw) To: Mark Wielaard; +Cc: Overseers mailing list, Simon Marchi Hi - > [...] > > Yes, understood that the extra indexing can do extra searches. My > > question was about utility/need for this. > > The use seems obvious to me for anybody using the web based archives > to generate tailored message/mbox results, specifically date ranged > searches seem pretty mandatory since otherwise you essentially just > need to keep clicking, next, next, next. I was under the impression that your main interest in p-i was the easy addressability and availability of raw emails, for use such as with git-am. Are there other users pining for this kind of thing? > But also to get specific messages based on author or subject. On > specific use case for public-inbox is to not have to be subscribed > to a list to read it [...] You are expecting people to use the xapian query language for this stuff? Mailman offers that style of click-click browsing already. > [...] > So this is before mailman sees the message, so we do need to do a > spam-check. No, postfix already spam checks everything upon receipt, before delivery. > And I think postfix sets ORIGINAL_RECIPIENT already, we just need to > make sure it is one of the addresses for a list in the config. This shouldn't be something that we need to write code to do, if it needs to be done at all. > But what generates /etc/mailman/aliases itself? Can we hook into that > to trigger generation of this aliases-inbox file? Otherwise if we add > a new mailman list it won't work. It must be some mailman administrative script. Just crontab another one. > And do we need to update/regenerate > /etc/aliases.db and/or /etc/mailman/aliases.db ? The proposal is to not touch /etc/aliases* NOR /etc/mailman/aliases*. 
The proposal is to generate a new file like /etc/postfix/mailman-inbox-aliases from /etc/mailman/aliases. That new file would be the one postfix would read. It could be texthash: rather than hash: so postmap would not even be necessary for updates. That depends on whether the relevant alias-expansion postfix process is short- or long-lived. - FChE ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: inbox.sourceware.org experiment 2022-08-17 21:33 ` Frank Ch. Eigler @ 2022-08-18 13:50 ` Mark Wielaard 2022-08-18 14:40 ` Simon Marchi 2022-08-23 22:08 ` Mark Wielaard 0 siblings, 2 replies; 15+ messages in thread From: Mark Wielaard @ 2022-08-18 13:50 UTC (permalink / raw) To: Frank Ch. Eigler; +Cc: Overseers mailing list, Simon Marchi Hi Frank, On Wed, Aug 17, 2022 at 05:33:40PM -0400, Frank Ch. Eigler wrote: > > [...] > > > Yes, understood that the extra indexing can do extra searches. My > > > question was about utility/need for this. > > > > The use seems obvious to me for anybody using the web based archives > > to generate tailored message/mbox results, specifically date ranged > > searches seem pretty mandatory since otherwise you essentially just > > need to keep clicking, next, next, next. > > I was under the impression that your main interest in p-i was the easy > addressability and availability of raw emails, for use such as with > git-am. Are there other users pining for this kind of thing? I am not sure that is my main interest in public-inbox, but yes, I do really like public-inbox because it allows tools like b4 (which I have already tested against our instance) and piem (not tested yet) to easily pick up and apply patch emails. https://git.kernel.org/pub/scm/utils/b4/b4.git/tree/README.rst https://docs.kyleam.com/piem/ I think others will also use those (or similar) tools. But I primarily expect users to use the public-inbox archives as a way to access the mailinglists without having to subscribe, but still be able to easily get the actual (raw) messages (either through git, atom, mbox, nntp or imap) to follow the conversations. Which I think is the main interesting thing public-inbox offers. > > But also to get specific messages based on author or subject. On > > specific use case for public-inbox is to not have to be subscribed > > to a list to read it [...] > > You are expecting people to use the xapian query language for this > stuff? 
Mailman offers that style of click-click browsing already. Not just users, but also tools, yes. And not for clicking through the archive, but to generate tailored sets of messages they are interested in. IMHO the public-inbox archives are a lot more usable than the mailman style archives. > > [...] > > So this is before mailman sees the message, so we do need to do a > > spam-check. > > No, postfix already spam checks everything upon receipt, before delivery. OK, but mailman still also blocks some messages which I have to approve/deny as list admin (this only happens once or twice a month, so maybe that is just spam we have to tolerate?) > > And I think postfix sets ORIGINAL_RECIPIENT already, we just need to > > make sure it is one of the addresses for a list in the config. > > This shouldn't be something that we need to write code to do, if it needs > to be done at all. OK. Assuming the process runs as the inbox user it will pick up the /home/inbox/.public-inbox/config file which should have all information. > > But what generates /etc/mailman/aliases itself? Can we hook into that > > to trigger generation of this aliases-inbox file? Otherwise if we add > > a new mailman list it won't work. > > It must be some mailman administrative script. Just crontab another > one. Under which account should this crontab run? mailman doesn't seem to have any crontabs at the moment. > > And do we need to update/regenerate > > /etc/aliases.db and/or /etc/mailman/aliases.db ? > > The proposal is to not touch /etc/aliases* NOR /etc/mailman/aliases*. > The proposal is to generate a new file like > /etc/postfix/mailman-inbox-aliases from /etc/mailman/aliases. That > new file would be the one postfix would read. It could be texthash: > rather than hash: so postmap would not even be necessary for updates. > That depends on whether the relevant alias-expansion postfix process > is short- or long-lived. 
OK, I see the following in /etc/postfix.main:

# CGF 2020-03-08 12:49
alias_maps = hash:/etc/aliases, hash:/etc/mailman/aliases
# CGF 2020-03-18 14:10 EST - newaliases wasn't affecting /etc/mailman/aliases
alias_database = hash:/etc/aliases, hash:/etc/mailman/aliases

So I assume calling newaliases regenerates the hash/.db files.

I can write a script to generate mailman-inbox-aliases this weekend
when I have stable internet access again. I will post it to the list
before installing it to make sure I don't accidentally break something.

Cheers,

Mark
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: inbox.sourceware.org experiment 2022-08-18 13:50 ` Mark Wielaard @ 2022-08-18 14:40 ` Simon Marchi 2022-08-21 17:41 ` Mark Wielaard 2022-08-23 22:08 ` Mark Wielaard 1 sibling, 1 reply; 15+ messages in thread From: Simon Marchi @ 2022-08-18 14:40 UTC (permalink / raw) To: Mark Wielaard, Frank Ch. Eigler; +Cc: Overseers mailing list On 8/18/22 09:50, Mark Wielaard wrote: > I am not sure that is my main interest in public-inbox, but yes, I do > really like public-inbox because it allows tools like b4 (which I have > already tested against our instance) and piem (not tested yet) to > easily pick up and apply patch emails. > https://git.kernel.org/pub/scm/utils/b4/b4.git/tree/README.rst > https://docs.kyleam.com/piem/ > > I think others will also use those (or similar) tools. But I primarily > expect users to use the public-inbox archives as a way to access the > mailinglists without having to subscribe, but still be able to easily > get the actual (raw) messages (either through git, atom, mbox, nntp or > imap) to follow the conversations. Which I think is the main > interesting thing public-inbox offers. For me it's: - Being able to download the raw emails in order to apply patches or to properly reply to messages on lists I'm not subscribed to - I never thought about the feature Mark mentioned, about download an mbox for a given query. But if you want to download a very long patch series to apply it locally, it could be useful. - Better display and browsing to read longer threads that span multiple months. For example, trying to follow this thread on Mailman would be complicated: https://inbox.sourceware.org/gdb-patches/20220428033542.1636284-1-simon.marchi@polymtl.ca/T/#r5f31e373eeb958095add41686e0ae7d1dcac9f1a - Search: I find it useful to be able to find a message by Message-ID. For instance, I'm reading the message in my client, and I want to send someone the link to that message in the web interface. 
In my instance (pi.simark.ca) I can paste the Message-ID in the search box and it gets me directly to the right message. On inbox.sourceware.org, I don't see the same search box; maybe it is because of the V1/V2 thing you have been talking about? - Not super important, but I like that the URLs to messages contain the Message-IDs. This way, in a distant future where inbox.sourceware.org does not exist anymore, someone with the archive can still find out which message a given URL refers to. A bit like if I give you this URL: https://gitlab.com/gnutools/binutils-gdb/-/commit/243cf0f69c36c4ee09c3c2b0bc7a97dc16119c51 and GitLab does not exist anymore, you can still find out which commit I am talking about if you have a copy of the binutils-gdb git repo. Also, you were talking about space. If you want to save some space, I don't think it's very useful to have the *-cvs lists on there. And there are lists that are pretty much dead that you could skip too. Simon ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: inbox.sourceware.org experiment 2022-08-18 14:40 ` Simon Marchi @ 2022-08-21 17:41 ` Mark Wielaard 2022-08-23 20:15 ` Mark Wielaard 0 siblings, 1 reply; 15+ messages in thread From: Mark Wielaard @ 2022-08-21 17:41 UTC (permalink / raw) To: Overseers mailing list; +Cc: Frank Ch. Eigler, Simon Marchi Hi Simon, On Thu, Aug 18, 2022 at 10:40:00AM -0400, Simon Marchi via Overseers wrote: > For me it's: > > - Being able to download the raw emails in order to apply patches or to > properly reply to messages on lists I'm not subscribed to > > - I never thought about the feature Mark mentioned, about download an > mbox for a given query. But if you want to download a very long > patch series to apply it locally, it could be useful. Note that b4 (and piem if you use emacs) help with this. It will create a local mbox containing the whole series based on any message-id in the thread. It can also look for a newer series and collect (and add) Reviewed-by tags looking through the messages. > - Better display and browsing to read longer threads that span multiple > months. For example, trying to follow this thread on Mailman would > be complicated: > > https://inbox.sourceware.org/gdb-patches/20220428033542.1636284-1-simon.marchi@polymtl.ca/T/#r5f31e373eeb958095add41686e0ae7d1dcac9f1a > > - Search: I find it useful to be able to find a message by Message-ID. > For instance, I'm reading the message in my client, and I want to > send someone the link to that message in the web interface. In my > instance (pi.simark.ca) I can paste the Message-ID in the search box > and it gets me directly to the right message. On > inbox.sourceware.org, I don't see the same search box, maybe it is > because of the V1/V2 thing you have been talking about? You can always add the message-id to the URL directly, as you did above (add /T/ to get the full thread that message belongs to). You didn't see the search box because the lists have not been fully indexed. 
I reimported some lists (see below) and now you should also be able to use the search box in most lists. > - Not super important, but I like that the URLs to messages contain the > Message-IDs. This way, in a distant future where > inbox.sourceware.org does not exist anymore, someone with the archive > can still find out which message a given URL refers to. A bit like > if I give you this URL: > > https://gitlab.com/gnutools/binutils-gdb/-/commit/243cf0f69c36c4ee09c3c2b0bc7a97dc16119c51 > > and Gitlab does not exist anymore, you can still find you which > commit I am talking about if you have a copy of the binutils-gdb git > repo. Yes! I do think that is super important. It also allows you to add a message link in a commit message, pointing to the original patch submission and discussion. > Also, you were talking about space. If you want to save some space, I > don't think it's very useful to have the *-cvs lists on there. Yeah, there are also -prs/-bugs lists which are better searched through bugzilla and -testresult lists that can be indexed/searched through bunsen. > And there are lists that are pretty much dead that you could skip > too. To preserve history let's not skip "dead" lists unless they are archived at some new location. I think we are responsible for keeping the history of old projects/lists. I did reimport some of the lists as V2 archives with full indexing. So you can now easily search through the following cygwin lists: cygwin, cygwin-announce, cygwin-apps, cygwin-developers, cygwin-licensing, cygwin-patches and cygwin-talk The following gcc lists: fortran, gcc, gcc-announce, gcc-help, gcc-patches, gcc-rust, gnutools-advocacy, java, java-announce, java-patches, jit and libstdc++ But https://inbox.sourceware.org/libstdc++ doesn't work, I suspect the ++ should be URL escaped somehow. 
And the following sourceware lists: archer, bfd, binutils, buildbot, bunsen, bzip2-devel, c++-embedded, cgen, crossgcc, debugedit, docbook-tools-announce, docbook-tools-discuss, dominion-hackers, dwz, eclipse, ecos-announce, ecos-devel, ecos-discuss, ecos-maintainers, ecos-patches, elfutils-devel, elix, elix-announce, frysk, gas2, gdb, gdb-announce, gdb-patches, gnats-announce, gnats-devel, gnu-gabi, gsl-announce, gsl-discuss, guile-emacs, guile-gtk, infinity, insight, insight-announce, installshell, kawa, libabigail, libc-alpha, libc-announce, libc-hacker, libc-help, libc-locales, libc-ports, libc-stable, libffi-announce, libffi-discuss, mauve-discuss, mauve-patches, mingw-dvlpr, netresolve, newlib, patchutils-list, prelink, pthreads-win32, rda, rhdb, rhdb-admin, rhdb-cc, rhdb-explain, rhug-rhats, sharutils-alpha, sid, sid-announce, sourcenav, sourcenav-announce, sourceware-announce, springfield, systemtap, xconq-announce and xconq7 See also the mailman.lists/{cygwin.com,gcc.gnu.org,sourceware.org}.lists.full lists and import_{cygwin,gcc,sourceware}_from_mbox scripts in the inbox homedir. I did remove the "test" list, and the cronjob that kept it populated. But all other lists have been kept as V1 with basic indexing. There are some lists which have never seen any messages. I think we should remove them because they probably won't see any messages ever. And it makes looking for real lists more difficult (it is ~37% of the lists, 97 out of the total 260 lists are just empty). 
anonymous, autobook-cvs, autobook-webpages-cvs, autoconf-cvs, autoconf-webpages-cvs, binutils-webpages-cvs, bzip2-cvs, bzip2-webpages-cvs, catapult-cvs, catapult-webpages-cvs, c++-embedded-cvs, c++-embedded-webpages-cvs, cgen-prs, cgen-webpages-cvs, cluster-webpages-cvs, cygwin-webpages-cvs, dm-cvs, dm-webpages-cvs, docbook-tools-hackers, docbook-tools-webpages-cvs, dominion-announce, dominion-cvs, dominion-discuss, dominion-webpages-cvs, ecos-webpages-cvs, elix-cvs, elix-webpages-cvs, gcc-cvs-testrun, gcc-maintainers, gcc-ppc, gcc-sc, gcc-testlist, gdb-webpages-cvs, gettext-alpha, gettext-announce, gettext-webpages-cvs, glibc-webpages-cvs, global, gnats-admin, gnats-webpages-cvs, gsl-webpages-cvs, guile-emacs-cvs, guile-webpages-cvs, insight-cvs, insight-webpages-cvs, inti-cvs, inti, inti-webpages-cvs, ip-over-scsi-cvs, ip-over-scsi-webpages-cvs, java-cvs, jffs2-webpages-cvs, kawa-cvs, kawa-webpages-cvs, libaio, libaio-webpages-cvs, libc-alpha1, libffi-cvs, libffi-webpages-cvs, libstdc++-webpages-cvs, mailer-daemon, mauve-announce, mauve-cvs, mauve-webpages-cvs, newlib-webpages-cvs, piranha-webpages-cvs, postmaster, prelink-svn, psim-cvs, psim-webpages-cvs, pthreads-win32-cvs, pthreads-win32-webpages-cvs, rhdb-installer, rhdb-jdbc, rhdb-utils, root, rpm2html, rpm2html-prs, sharutils-announce, sharutils-cvs, sharutils-webpages-cvs, sid-webpages-cvs, sourcemaster, sourcenav-prs, sourceware-cvs, sourceware-cvs-sourceware, sourceware-cvs-sourceware-webpages, sourceware-infra-cvs, sourceware-webpages-cvs, systemtap-webpages-cvs, testcvs-cvs, testcvs-webpages-cvs, webmaster, win32-x11-cvs, win32-x11-webpages-cvs, xconq-prs and xconq-webpages-cvs There are also the following 9 lists, which are either private (in which case they show up as empty above) or not publicly advertised lists: cygwin-xfree, cygwin-xfree-announce, gcc-sc, mailman, overseers, postmaster, root, sourcemaster and test-list. The cygwin lists might still be interesting. Likewise for this list overseers. 
But the others probably should be removed. Opinions? Thanks, Mark ^ permalink raw reply [flat|nested] 15+ messages in thread
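The URL conventions discussed in this message (a Message-ID appended to the list URL, and /T/ for the full thread) can be sketched as follows. This is a minimal illustration, not part of the sourceware setup: the function names are hypothetical, the "Link:" trailer spelling is an assumption borrowed from common commit-message practice, and escaping of special characters in the Message-ID is deliberately not handled here.

```python
BASE_URL = "https://inbox.sourceware.org"  # archive root taken from the thread


def message_url(inbox: str, message_id: str) -> str:
    """Permalink of a single message; surrounding <> are dropped."""
    return f"{BASE_URL}/{inbox}/{message_id.strip('<>')}/"


def thread_url(inbox: str, message_id: str) -> str:
    """Full thread view: the message permalink followed by T/."""
    return message_url(inbox, message_id) + "T/"


def link_trailer(inbox: str, message_id: str) -> str:
    """Commit-message trailer pointing back at the patch submission.

    The 'Link:' prefix is an assumed convention, not something the
    thread prescribes.
    """
    return "Link: " + message_url(inbox, message_id)


if __name__ == "__main__":
    msgid = "<20220428033542.1636284-1-simon.marchi@polymtl.ca>"
    print(thread_url("gdb-patches", msgid))
    print(link_trailer("gdb-patches", msgid))
```

Because the URL is derived only from the list name and the Message-ID, anyone holding a copy of the archive can map such a link back to the message even if the host goes away.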
* Re: inbox.sourceware.org experiment 2022-08-21 17:41 ` Mark Wielaard @ 2022-08-23 20:15 ` Mark Wielaard 0 siblings, 0 replies; 15+ messages in thread From: Mark Wielaard @ 2022-08-23 20:15 UTC (permalink / raw) To: Overseers mailing list; +Cc: Simon Marchi Hi, On Sun, Aug 21, 2022 at 07:41:43PM +0200, Mark Wielaard via Overseers wrote: > I did reimport some of the lists as V2 archives with full indexing. So > you can now easily search through the following cygwin lists: > > cygwin, cygwin-announce, cygwin-apps, cygwin-developers, > cygwin-licensing, cygwin-patches and cygwin-talk > > The following gcc lists: > > fortran, gcc, gcc-announce, gcc-help, gcc-patches, gcc-rust, > gnutools-advocacy, java, java-announce, java-patches, jit and > libstdc++ > > But https://inbox.sourceware.org/libstdc++ doesn't work, I suspect the > ++ should be URL escaped somehow. This has been fixed now with a minimal inbox_re_workaround.psgi as Eric suggested. > And the following sourceware lists: > > archer, bfd, binutils, buildbot, bunsen, bzip2-devel, c++-embedded, > cgen, crossgcc, debugedit, docbook-tools-announce, > docbook-tools-discuss, dominion-hackers, dwz, eclipse, ecos-announce, > ecos-devel, ecos-discuss, ecos-maintainers, ecos-patches, > elfutils-devel, elix, elix-announce, frysk, gas2, gdb, gdb-announce, > gdb-patches, gnats-announce, gnats-devel, gnu-gabi, gsl-announce, > gsl-discuss, guile-emacs, guile-gtk, infinity, insight, > insight-announce, installshell, kawa, libabigail, libc-alpha, > libc-announce, libc-hacker, libc-help, libc-locales, libc-ports, > libc-stable, libffi-announce, libffi-discuss, mauve-discuss, > mauve-patches, mingw-dvlpr, netresolve, newlib, patchutils-list, > prelink, pthreads-win32, rda, rhdb, rhdb-admin, rhdb-cc, rhdb-explain, > rhug-rhats, sharutils-alpha, sid, sid-announce, sourcenav, > sourcenav-announce, sourceware-announce, springfield, systemtap, > xconq-announce and xconq7 > > See also the > 
mailman.lists/{cygwin.com,gcc.gnu.org,sourceware.org}.lists.full lists > and import_{cygwin,gcc,sourceware}_from_mbox scripts in the inbox > homedir. It should be said that this increases the disk usage by ~3.5x. These lists used to take ~9GB of storage, now they take ~32GB. The total public-inbox lists storage is now 52GB. There is still 250G free space. And when we are happy we can reclaim the lists.old 9GB of storage. > I did remove the "test" list, and the cronjob that kept it > populated. But all other lists have been kept as V1 and basic > indexing. > > There are some lists which never seen any messages. I think we should > remove them because they probably won't see any messages ever. And it > makes looking for real lists more difficult (it is ~25% of the lists, > 97 out of the total 260 lists are just empty). > > anonymous, autobook-cvs, autobook-webpages-cvs, autoconf-cvs, > autoconf-webpages-cvs, binutils-webpages-cvs, bzip2-cvs, > bzip2-webpages-cvs, catapult-cvs, catapult-webpages-cvs, > c++-embedded-cvs, c++-embedded-webpages-cvs, cgen-prs, > cgen-webpages-cvs, cluster-webpages-cvs, cygwin-webpages-cvs, dm-cvs, > dm-webpages-cvs, docbook-tools-hackers, docbook-tools-webpages-cvs, > dominion-announce, dominion-cvs, dominion-discuss, > dominion-webpages-cvs, ecos-webpages-cvs, elix-cvs, elix-webpages-cvs, > gcc-cvs-testrun, gcc-maintainers, gcc-ppc, gcc-sc, gcc-testlist, > gdb-webpages-cvs, gettext-alpha, gettext-announce, > gettext-webpages-cvs, glibc-webpages-cvs, global, gnats-admin, > gnats-webpages-cvs, gsl-webpages-cvs, guile-emacs-cvs, > guile-webpages-cvs, insight-cvs, insight-webpages-cvs, inti-cvs, inti, > inti-webpages-cvs, ip-over-scsi-cvs, ip-over-scsi-webpages-cvs, > java-cvs, jffs2-webpages-cvs, kawa-cvs, kawa-webpages-cvs, libaio, > libaio-webpages-cvs, libc-alpha1, libffi-cvs, libffi-webpages-cvs, > libstdc++-webpages-cvs, mailer-daemon, mauve-announce, mauve-cvs, > mauve-webpages-cvs, newlib-webpages-cvs, piranha-webpages-cvs, > 
postmaster, prelink-svn, psim-cvs, psim-webpages-cvs, > pthreads-win32-cvs, pthreads-win32-webpages-cvs, rhdb-installer, > rhdb-jdbc, rhdb-utils, root, rpm2html, rpm2html-prs, > sharutils-announce, sharutils-cvs, sharutils-webpages-cvs, > sid-webpages-cvs, sourcemaster, sourcenav-prs, sourceware-cvs, > sourceware-cvs-sourceware, sourceware-cvs-sourceware-webpages, > sourceware-infra-cvs, sourceware-webpages-cvs, systemtap-webpages-cvs, > testcvs-cvs, testcvs-webpages-cvs, webmaster, win32-x11-cvs, > win32-x11-webpages-cvs, xconq-prs and xconq-webpages-cvs I removed them all. If people want them back it will be easy since they didn't contain any messages to begin with. > There are also the following 9 lists, which are either private (in > which case they show up as empty above) or not publicly advertised > lists: > > cygwin-xfree, cygwin-xfree-announce, gcc-sc, mailman, overseers, > postmaster, root, sourcemaster and test-list. > > The cygwin lists might still be interesting. Likewise for this list > overseers. But the others probably should be removed. I kept cygwin-xfree, cygwin-xfree-announce, overseers and test-list (which I used to test :) This leaves us with 162 public-inbox lists. Cheers, Mark ^ permalink raw reply [flat|nested] 15+ messages in thread
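The numbers in this message can be cross-checked with a few lines of arithmetic. The sketch below assumes two things the message does not state outright: that the total of 260 lists includes the nine private or unadvertised lists, and that every one of those nine that was not explicitly kept was removed. Under those assumptions the list names and sizes quoted above reproduce the final count.

```python
TOTAL_LISTS = 260          # total number of lists, from the thread
EMPTY_LISTS_REMOVED = 97   # lists that never saw a message, all removed

# The nine private or not publicly advertised lists named in the thread.
private_or_unadvertised = {
    "cygwin-xfree", "cygwin-xfree-announce", "gcc-sc", "mailman",
    "overseers", "postmaster", "root", "sourcemaster", "test-list",
}
kept = {"cygwin-xfree", "cygwin-xfree-announce", "overseers", "test-list"}

# These four also appear in the list of 97 empty lists, so they must not
# be subtracted a second time.
already_counted_as_empty = {"gcc-sc", "postmaster", "root", "sourcemaster"}

extra_removed = private_or_unadvertised - kept - already_counted_as_empty
remaining = TOTAL_LISTS - EMPTY_LISTS_REMOVED - len(extra_removed)

# Disk usage of the reimported lists: ~9GB as V1, ~32GB as V2 with full indexing.
growth_factor = 32 / 9

if __name__ == "__main__":
    print("additionally removed:", sorted(extra_removed))
    print("remaining lists:", remaining)
    print(f"storage growth: {growth_factor:.2f}x")
```

The only list removed beyond the 97 empty ones is then `mailman`, which gives 260 - 97 - 1 = 162, matching the count reported in the message.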
* Re: inbox.sourceware.org experiment 2022-08-18 13:50 ` Mark Wielaard 2022-08-18 14:40 ` Simon Marchi @ 2022-08-23 22:08 ` Mark Wielaard 2022-08-24 10:05 ` Mark Wielaard 1 sibling, 1 reply; 15+ messages in thread From: Mark Wielaard @ 2022-08-23 22:08 UTC (permalink / raw) To: Overseers mailing list; +Cc: Frank Ch. Eigler, Simon Marchi Hi, On Thu, Aug 18, 2022 at 03:50:54PM +0200, Mark Wielaard via Overseers wrote: > > > And do we need to update/regenerate > > > /etc/aliases.db and/or /etc/mailman/aliases.db ? > > > > The proposal is to not touch /etc/aliases* NOR /etc/mailman/aliases*. > > The proposal is to generate a new file like > > /etc/postfix/mailman-inbox-aliases from /etc/mailman/aliases. That > > new file would be the one postfix would read. It could be texthash: > > rather than hash: so postmap would not even be necessary for updates. > > That depends on whether the relevant alias-expansion postfix process > > is short- or long-lived. > > OK, I see the following in /etc/postfix.main: > > # CGF 2020-03-08 12:49 > alias_maps = hash:/etc/aliases, hash:/etc/mailman/aliases > > # CGF 2020-03-18 14:10 EST - newaliases wasn't affecting /etc/mailman/aliases > alias_database = hash:/etc/aliases, hash:/etc/mailman/aliases > > So I assume calling newaliases regenerates the hash/.db files. > > I can write a script to generate mailman-inbox-aliases this weekend > when I have stable internet access again. Will post to the list before > installing to make sure I don't accidentially break something. Sorry this took a bit longer. But I wanted to make sure I got it right. I solved it slightly simpler by installing a /home/inbox/.forward with: |/usr/bin/public-inbox-mda And then simply add the inbox user as extra recipient. 
So the STANZA looks like:

# STANZA START: test-list
# CREATED: Sat Mar 7 13:49:45 2020
test-list: "|/usr/local/mailman/mailman post test-list", inbox
test-list-bounces: "|/usr/local/mailman/mailman bounces test-list"
test-list-confirm: "|/usr/local/mailman/mailman confirm test-list"
test-list-join: "|/usr/local/mailman/mailman join test-list"
test-list-leave: "|/usr/local/mailman/mailman leave test-list"
test-list-owner: "|/usr/local/mailman/mailman owner test-list"
test-list-request: "|/usr/local/mailman/mailman request test-list"
test-list-subscribe: "|/usr/local/mailman/mailman subscribe test-list"
test-list-unsubscribe: "|/usr/local/mailman/mailman unsubscribe test-list"
# STANZA END: test-list

The script to generate those is in /etc/mailman/mailman-aliases-to-inbox.sh
And the postfix main.cf has been updated to use the generated /etc/mailman/aliases-inbox

The only thing I don't know is how to automate the /etc/mailman/mailman-aliases-to-inbox.sh running when new lists are added. Should this be a mailman trigger or cronjob check?

Thanks, Mark ^ permalink raw reply [flat|nested] 15+ messages in thread
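The transformation described in this message (append the inbox user to each list's "post" alias and leave the other aliases alone) could be sketched as below. This is not the actual /etc/mailman/mailman-aliases-to-inbox.sh, whose contents are not shown in the thread; the function name, the regular expression and the `archived` parameter are assumptions made for illustration. Restricting the change to an explicit set of archived lists is an extra precaution, so that administrative aliases are never given the additional recipient.

```python
import re
from typing import Iterable

# Matches only the "post" alias of a list, e.g.
#   test-list: "|/usr/local/mailman/mailman post test-list"
# The backreference ensures the alias name and the list name agree.
POST_ALIAS = re.compile(
    r'^(?P<name>[^\s:#]+):\s+"\|/usr/local/mailman/mailman post (?P=name)"\s*$'
)


def add_inbox_recipient(aliases_text: str,
                        archived: Iterable[str],
                        user: str = "inbox") -> str:
    """Return aliases_text with ', <user>' appended to archived post aliases."""
    archived_set = set(archived)
    out = []
    for line in aliases_text.splitlines():
        match = POST_ALIAS.match(line)
        if match and match.group("name") in archived_set:
            line = f"{line.rstrip()}, {user}"
        out.append(line)  # comments and all other aliases pass through unchanged
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = (
        '# STANZA START: test-list\n'
        'test-list: "|/usr/local/mailman/mailman post test-list"\n'
        'test-list-bounces: "|/usr/local/mailman/mailman bounces test-list"\n'
        '# STANZA END: test-list\n'
    )
    print(add_inbox_recipient(sample, ["test-list"]), end="")
```

Since the output is a separate file, the original /etc/mailman/aliases stays untouched and the generated copy can be diffed against it before postfix is pointed at it.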
* Re: inbox.sourceware.org experiment 2022-08-23 22:08 ` Mark Wielaard @ 2022-08-24 10:05 ` Mark Wielaard 2022-08-24 21:06 ` Mark Wielaard 0 siblings, 1 reply; 15+ messages in thread From: Mark Wielaard @ 2022-08-24 10:05 UTC (permalink / raw) To: Overseers mailing list; +Cc: Simon Marchi

Hi,

On Wed, Aug 24, 2022 at 12:08:51AM +0200, Mark Wielaard via Overseers wrote:
> Sorry this took a bit longer. But I wanted to make sure I got it right.
> I solved it slightly simpler by installing a /home/inbox/.forward with:
> |/usr/bin/public-inbox-mda

This is now |/home/inbox/public-inbox-mda-true.sh
Which does:
/usr/bin/public-inbox-mda --no-precheck 2>&1 | ts >> /home/inbox/log/public-inbox-mda.out.log || true

|| true to make sure any errors don't cause a bounce.
--no-precheck because public-inbox-mda is very picky and rejects various emails that seem just fine.
And a timestamped log of errors goes to /home/inbox/log/public-inbox-mda.out.log

> And then simply add the inbox user as extra recipient. 
So the STANZA > looks like: > > # STANZA START: test-list > # CREATED: Sat Mar 7 13:49:45 2020 > test-list: "|/usr/local/mailman/mailman post test-list", inbox > test-list-bounces: "|/usr/local/mailman/mailman bounces test-list" > test-list-confirm: "|/usr/local/mailman/mailman confirm test-list" > test-list-join: "|/usr/local/mailman/mailman join test-list" > test-list-leave: "|/usr/local/mailman/mailman leave test-list" > test-list-owner: "|/usr/local/mailman/mailman owner test-list" > test-list-request: "|/usr/local/mailman/mailman request test-list" > test-list-subscribe: "|/usr/local/mailman/mailman subscribe test-list" > test-list-unsubscribe: "|/usr/local/mailman/mailman unsubscribe test-list" > # STANZA END: test-list > > The script to generate those is in > /etc/mailman/mailman-aliases-to-inbox.sh > And the postfix main.cf has been updated to use the generated > /etc/mailman/aliases-inbox This was a little too naive. public-inbox-mda does ignore emails to addresses it doesn't know about, but some addresses generated odd/bad loops. In particular the "root" list (now removed by Frank) and the mailman and postmaster lists (I removed the inbox recipient by hand). The script really should be updated to only add inbox to those mailman post lists it is archiving. > The only thing I don't know is how to automate the > /etc/mailman/mailman-aliases-to-inbox.sh running when new lists are > added. Should this be a mailman trigger or cronjob check? So once automated, make sure the above changes are also done automatically. I noticed two issues. First, some lists seem to have a bad/corrupt Xapian database and generate an error while indexing (gcc-patches). Second, emails with slashes (/) in the Message-ID sometimes get wrongly escaped and appear not to be in the archive while they really are, e.g. 
the message I am replying to shows as: https://inbox.sourceware.org/overseers/YwVP8+LHvyLzUG%2F+@wildebeest.org/ But should be: https://inbox.sourceware.org/overseers/YwVP8+LHvyLzUG/+@wildebeest.org/ Cheers, Mark ^ permalink raw reply [flat|nested] 15+ messages in thread
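The relation between the two URL forms quoted above can be reproduced with Python's standard library. This is only a sketch of how the escaped form arises; which form the web front end should accept is exactly the open question in this message, and keeping "@" and "+" unescaped is an assumption chosen to match the URLs shown in the thread. The function name is hypothetical.

```python
from urllib.parse import quote, unquote


def escape_message_id(message_id: str) -> str:
    """Percent-encode a Message-ID for use as one URL path segment.

    '/' becomes %2F so it is not mistaken for a path separator;
    '@' and '+' are left as-is (an assumption matching the thread's URLs).
    """
    return quote(message_id.strip("<>"), safe="@+")


if __name__ == "__main__":
    msgid = "<YwVP8+LHvyLzUG/+@wildebeest.org>"
    escaped = escape_message_id(msgid)
    print("https://inbox.sourceware.org/overseers/" + escaped + "/")
    # Decoding gives back the raw Message-ID (without the angle brackets).
    print(unquote(escaped))
```

A proxy or web server that decodes %2F before passing the request on, or encodes it a second time, would produce the kind of mismatch described here, which is one thing worth checking when debugging such links.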
* Re: inbox.sourceware.org experiment 2022-08-24 10:05 ` Mark Wielaard @ 2022-08-24 21:06 ` Mark Wielaard 0 siblings, 0 replies; 15+ messages in thread From: Mark Wielaard @ 2022-08-24 21:06 UTC (permalink / raw) To: Overseers mailing list; +Cc: Simon Marchi

Hi,

On Wed, Aug 24, 2022 at 12:05:03PM +0200, Mark Wielaard via Overseers wrote:
> I noticed two issues some lists seem to have a bad/corrupt xapian
> database and generate an error while indexing (gcc-patches).

I tried reindexing and compacting the largest lists. This did not help. But the compacting did reduce the disk size of the xapian indexes by 10GB (!). There is now a bit more logging in /home/inbox/logs/public-inbox-mda.out.log

It looks like this error:

rollback ineffective with AutoCommit enabled at /usr/share/perl5/vendor_perl/PublicInbox/V2Writable.pm line 621.
checkpoint: Exception: Error writing block 147232
shard close: Exception: Error writing block 147236

only happens after importing a new gcc-patches message. The message isn't fully indexed, but can be referenced normally. It won't show up in full text searches though. I haven't figured out why. I'll ask upstream how to better debug this.

> emails with slashes / in the Message-ID sometimes get wrongly
> escaped and appear to not be in the archive while they really are.
> e.g. the message I am replying to shows as:
> https://inbox.sourceware.org/overseers/YwVP8+LHvyLzUG%2F+@wildebeest.org/
> But should be:
> https://inbox.sourceware.org/overseers/YwVP8+LHvyLzUG/+@wildebeest.org/

This isn't a big deal except when the / is at the end of the Message-ID. Which unfortunately happens for bugzilla emails, which end in @http.sourceware.org/bugzilla/ and that last slash seems to be a real problem. Don't know a workaround for that yet. 
You can see that public-inbox does know about the Message-ID by searching for: https://inbox.sourceware.org/libabigail/bug-29464-9487@http.sourceware.org/bugzilla// Which will suggest that actual URL as "partial match" but then when following that link the slashes get escaped again... Will ask upstream if there is any solution for this. Finally there are some lists that accept HTML emails (by stripping off the HTML part). public-inbox however simply rejects those emails. *** We only accept plain-text mail, No HTML *** Again, we should ask upstream if there could be an option to accept just the text/plain part of such emails. Note that such emails do end up in the .public-inbox/emergency mailbox so in theory we could remove the text/html part and then reinsert the message. So there are some issues, but in general I think it works just fine now. Cheers, Mark ^ permalink raw reply [flat|nested] 15+ messages in thread
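The idea mentioned at the end of this message, removing the text/html part of a rejected email before reinserting it, could look roughly like this with Python's standard email package. It is a sketch under assumptions: the function name is hypothetical, how messages are read from the emergency mailbox and fed back to public-inbox-mda is not covered, and a message with duplicated unique headers would need extra handling.

```python
from email import message_from_string, policy
from email.message import EmailMessage

# Headers describing the old MIME structure; set_content() writes new ones.
MIME_HEADERS = {"content-type", "content-transfer-encoding", "mime-version"}


def keep_plain_text_only(raw: str) -> str:
    """Return a copy of the message that carries only its text/plain body."""
    msg = message_from_string(raw, policy=policy.default)
    plain = msg.get_body(preferencelist=("plain",))
    if plain is None:
        raise ValueError("message has no text/plain part")

    out = EmailMessage(policy=policy.default)
    for name, value in msg.items():
        if name.lower() not in MIME_HEADERS:
            out[name] = value  # keep From, To, Subject, Message-ID, ...
    out.set_content(plain.get_content())
    return out.as_string()


if __name__ == "__main__":
    demo = EmailMessage()
    demo["From"] = "someone@example.org"          # example address
    demo["Subject"] = "patch review"
    demo["Message-ID"] = "<demo-1@example.org>"   # example Message-ID
    demo.set_content("Looks good to me.\n")
    demo.add_alternative("<p>Looks good to me.</p>", subtype="html")
    print(keep_plain_text_only(demo.as_string()))
```

Keeping the original Message-ID matters here, since the archive URLs and threading are derived from it.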
end of thread [~2022-08-24 21:06 UTC]

Thread overview: 15+ messages
2022-08-13 14:14 inbox.sourceware.org experiment Mark Wielaard
2022-08-15 13:00 ` Mark Wielaard
2022-08-16 21:36 ` Mark Wielaard
2022-08-16 22:10 ` Frank Ch. Eigler
2022-08-17 12:25 ` Mark Wielaard
2022-08-17 13:24 ` Frank Ch. Eigler
2022-08-17 21:18 ` Mark Wielaard
2022-08-17 21:33 ` Frank Ch. Eigler
2022-08-18 13:50 ` Mark Wielaard
2022-08-18 14:40 ` Simon Marchi
2022-08-21 17:41 ` Mark Wielaard
2022-08-23 20:15 ` Mark Wielaard
2022-08-23 22:08 ` Mark Wielaard
2022-08-24 10:05 ` Mark Wielaard
2022-08-24 21:06 ` Mark Wielaard