* Re: Stability of pipermail ml archive URLs
2020-05-06 14:44 ` Frank Ch. Eigler
@ 2020-05-06 14:54 ` Arseny Solokha
2020-05-06 15:24 ` Christopher Faylor
2020-05-06 14:55 ` Arseny Solokha
2020-05-07 9:48 ` Thomas Schwinge
2 siblings, 1 reply; 18+ messages in thread
From: Arseny Solokha @ 2020-05-06 14:54 UTC (permalink / raw)
To: Frank Ch. Eigler, Jakub Jelinek, Overseers mailing list; +Cc: GCC Development
Hi,
>> https://gcc.gnu.org/pipermail/gcc/2020-February/232205.html
>> Looking around, the last two months of gcc now have very small
>> numbers, but e.g. on gcc-patches the mails have very high numbers like
>> 545238.html. Can pipermail provide stable URLs at all? We really
>> need those, we reference those in commit messages, other mails, bugzilla
>> etc.
>
> Argh, that is a problem, sorry. We get mailman to regenerate web
> archives for example in the case of spam that has gone through. Our
> recipe has been to delete the spam from the appropriate .mbox, but this
> does renumber things.
>
> The big vs. little numbers are probably an accidental function of
> whether the email .mbox files were processed chronologically or not.
> I'll tweak the mrefresh script to make sure it's chronological; that
> should avoid gross jumps like that. I believe gcc-patches just wasn't
> regenerated for spam removal whereas others have. There should not be
> gross jumps in the future, except we'll have to regenerate everything
> one more time. :-(
>
> Small jumps though --- darn, we'd have to do something else with spam
> in the mbox, maybe replace it somehow in situ with something else. Or
> catch it so quickly that subsequent URLs aren't archived anywhere
> important.
>
> It would be good to have another way of making permanent URLs for
> individual messages in mailing list archives.
May I also chime in with a related, though separate, issue? It seems the
URL rewriting rules that redirect the old-style
https://gcc.gnu.org/ml/<list name>/current
URLs, which pointed to the current monthly digest, to the new
https://gcc.gnu.org/pipermail/<list name>/<year-month>/date.html#end
ones broke with the onset of May. That is, if I open
https://gcc.gnu.org/ml/gcc/current
I still get
https://gcc.gnu.org/pipermail/gcc/2020-April/date.html#end
(note the 2020-April) instead of the May page.
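
For illustration, the target such a redirect has to compute can be sketched in Python. This is only a sketch of the expected mapping, not the rewrite rules actually deployed on the server (those are not shown in this thread); the function name and the MONTHS table are hypothetical, and the month is taken from an explicitly passed date so that the result does not depend on a value cached at some earlier time.

```python
import datetime
import re

# Pipermail month directories look like "2020-April": year, dash, English month name.
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

def current_archive_url(old_url, today):
    """Map an old-style .../ml/<list>/current URL to the pipermail page for `today`.

    Hypothetical helper: returns None when the URL is not of the old style.
    """
    match = re.fullmatch(r"https://gcc\.gnu\.org/ml/([^/]+)/current/?", old_url)
    if match is None:
        return None
    period = f"{today.year}-{MONTHS[today.month - 1]}"
    return f"https://gcc.gnu.org/pipermail/{match.group(1)}/{period}/date.html#end"
```

Called with datetime.date.today() on every request, the month component follows the calendar instead of staying at whatever month was current when the rule was written.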
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Stability of pipermail ml archive URLs
2020-05-06 14:44 ` Frank Ch. Eigler
2020-05-06 14:54 ` Arseny Solokha
@ 2020-05-06 14:55 ` Arseny Solokha
2020-05-07 9:48 ` Thomas Schwinge
2 siblings, 0 replies; 18+ messages in thread
From: Arseny Solokha @ 2020-05-06 14:55 UTC (permalink / raw)
To: Frank Ch. Eigler, Jakub Jelinek, Overseers mailing list; +Cc: GCC Development
Hi,
>> https://gcc.gnu.org/pipermail/gcc/2020-February/232205.html
>> Looking around, the last two months of gcc now have very small
>> numbers, but e.g. on gcc-patches the mails have very high numbers like
>> 545238.html. Can pipermail provide stable URLs at all? We really
>> need those, we reference those in commit messages, other mails, bugzilla
>> etc.
>
> Argh, that is a problem, sorry. We get mailman to regenerate web
> archives for example in the case of spam that has gone through. Our
> recipe has been to delete the spam from the appropriate .mbox, but this
> does renumber things.
>
> The big vs. little numbers are probably an accidental function of
> whether the email .mbox files were processed chronologically or not.
> I'll tweak the mrefresh script to make sure it's chronological; that
> should avoid gross jumps like that. I believe gcc-patches just wasn't
> regenerated for spam removal whereas others have. There should not be
> gross jumps in the future, except we'll have to regenerate everything
> one more time. :-(
>
> Small jumps though --- darn, we'd have to do something else with spam
> in the mbox, maybe replace it somehow in situ with something else. Or
> catch it so quickly that subsequent URLs aren't archived anywhere
> important.
>
> It would be good to have another way of making permanent URLs for
> individual messages in mailing list archives.
May I also chime in with a related, though separate, issue? It seems the
URL rewriting rules that redirect the old-style
https://gcc.gnu.org/ml/<list name>/current
URLs, which pointed to the current monthly digest, to the new
https://gcc.gnu.org/pipermail/<list name>/<year-month>/date.html#end
ones broke with the onset of May. That is, if I open
https://gcc.gnu.org/ml/gcc/current
I still get
https://gcc.gnu.org/pipermail/gcc/2020-April/date.html#end
(note the 2020-April) instead of the May page.
* Re: Stability of pipermail ml archive URLs
2020-05-06 14:44 ` Frank Ch. Eigler
2020-05-06 14:54 ` Arseny Solokha
2020-05-06 14:55 ` Arseny Solokha
@ 2020-05-07 9:48 ` Thomas Schwinge
2020-05-07 10:14 ` Frank Ch. Eigler
` (2 more replies)
2 siblings, 3 replies; 18+ messages in thread
From: Thomas Schwinge @ 2020-05-07 9:48 UTC (permalink / raw)
To: Frank Ch. Eigler, Jakub Jelinek, overseers; +Cc: gcc
Hi!
On 2020-05-06T10:44:46-0400, "Frank Ch. Eigler via Gcc" <gcc@gcc.gnu.org> wrote:
>> Can pipermail provide stable URLs at all? We really
>> need those, we reference those in commit messages, other mails, bugzilla
>> etc.
> It would be good to have another way of making permanent URLs for
> individual messages in mailing list archives.
Look up by Message-ID?
<http://mid.mail-archive.com/20200506141139.GJ2375@tucnak>, for example.
See <https://en.wikipedia.org/wiki/Message-ID>, etc. The idea is that
for all practical purposes, Message-IDs are "sufficiently unique".
(Compare conceptually to the Git SHA-1 hashes.)
Such a service is not currently available on sourceware, but it'd be
possible to implement: as messages come in, you'd build a database
mapping from the Message-ID header to "current Mailman's Pipermail URL".
(That's one reason why when posting such links I used to use Gmane's
Message-ID lookup, now using The Mail Archive's. The other reason is
that compared to Mailman's Pipermail these services don't artificially
break discussion threads at month boundaries.)
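
The Message-ID lookup described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not an existing sourceware service: the table name mid_url and the helpers open_index, record and lookup are hypothetical, and how the "current" Pipermail URL of an incoming message would be obtained is left to the caller.

```python
import email
import sqlite3

def open_index(path=":memory:"):
    """Open (or create) the hypothetical Message-ID -> URL index."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS mid_url (mid TEXT PRIMARY KEY, url TEXT NOT NULL)"
    )
    return db

def record(db, raw_message, url):
    """Store the current archive URL for one incoming message.

    Returns the normalized Message-ID, or None if the header is missing.
    """
    msg = email.message_from_string(raw_message)
    mid = (msg["Message-ID"] or "").strip().strip("<>")
    if not mid:
        return None
    # REPLACE so that a later renumbering can simply overwrite the old URL.
    db.execute("INSERT OR REPLACE INTO mid_url VALUES (?, ?)", (mid, url))
    db.commit()
    return mid

def lookup(db, mid):
    """Return the recorded URL for a Message-ID, or None if unknown."""
    row = db.execute(
        "SELECT url FROM mid_url WHERE mid = ?", (mid.strip().strip("<>"),)
    ).fetchone()
    return row[0] if row else None
```

Because the key is the Message-ID rather than the sequence number, a link built on lookup() would keep working after the archive is regenerated, provided the index is refreshed at the same time.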
By the way, the public-inbox software
(<https://public-inbox.org/README.html>), as recently mentioned in a
different thread discussing deficiencies of Mailman's Pipermail, also
does support this:
<https://public-inbox.org/libc-alpha/129c8494-bfd0-87f0-ddb5-e56f6d4a6e0c@gotplt.org>
(random example). (I have not yet really looked into that software
myself, but from the little I read about it, it seems conceptually
simple, "easy", good.)
If there's sufficient interest (users) and commitment (overseers), we
could install this on sourceware, in addition to what we've currently
got?
Grüße
Thomas
-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter
* Re: Stability of pipermail ml archive URLs
2020-05-07 9:48 ` Thomas Schwinge
@ 2020-05-07 10:14 ` Frank Ch. Eigler
2020-05-07 15:54 ` Christopher Faylor
2020-05-07 15:48 ` Christopher Faylor
2020-05-07 19:23 ` Segher Boessenkool
2 siblings, 1 reply; 18+ messages in thread
From: Frank Ch. Eigler @ 2020-05-07 10:14 UTC (permalink / raw)
To: Thomas Schwinge; +Cc: Jakub Jelinek, overseers, gcc
Hi -
> Such a service is not currently available on sourceware, but it'd be
> possible to implement: as messages come in, you'd build a database
> mapping from the Message-ID header to "current Mailman's Pipermail URL".
I was thinking we might be able to trick pipermail (the web archiver
component) to simply name the message web urls after some function of
the message-id instead of the sequence number. Will give this a try
very shortly.
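
As a rough illustration of "some function of the message-id", one option would be a truncated hash. This is a sketch only: stable_name is a hypothetical helper, not part of pipermail, and the choice of SHA-1 and of a 16-character prefix is an assumption made for the example.

```python
import hashlib

def stable_name(message_id, length=16):
    """Derive an archive file name from a Message-ID instead of a sequence number.

    Hypothetical helper: angle brackets and surrounding whitespace are ignored
    so that "<id>" and "id" give the same name.
    """
    mid = message_id.strip().strip("<>")
    digest = hashlib.sha1(mid.encode("utf-8")).hexdigest()
    return digest[:length] + ".html"
```

The name depends only on the header, so deleting or reordering messages in the mbox would not change it. Two messages that carry the same Message-ID would map to the same name, which such a scheme would have to handle separately.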
- FChE
* Re: Stability of pipermail ml archive URLs
2020-05-07 10:14 ` Frank Ch. Eigler
@ 2020-05-07 15:54 ` Christopher Faylor
2020-05-07 17:56 ` Frank Ch. Eigler
0 siblings, 1 reply; 18+ messages in thread
From: Christopher Faylor @ 2020-05-07 15:54 UTC (permalink / raw)
To: overseers, gcc
On Thu, May 07, 2020 at 06:14:55AM -0400, Frank Ch. Eigler wrote:
>>Such a service is not currently available on sourceware, but it'd be
>>possible to implement: as messages come in, you'd build a database
>>mapping from the Message-ID header to "current Mailman's Pipermail
>>URL".
>
>I was thinking we might be able to trick pipermail (the web archiver
>component) to simply name the message web urls after some function of
>the message-id instead of the sequence number. Will give this a try
>very shortly.
I just want to go on record as saying that I think this is a bad idea.
We can fix this problem simply without redesigning pipermail. The
problem that we're seeing is caused by a script that I wrote to migrate
ezmlm to mailman. The fix for the problem is "Don't run that script".
But, if we are going to make this level of change to pipermail we might
as well go wild and just implement all of the other things that people
want and forget about our supposed desire to use "supported" software.
Changing pipermail to use message-ids rather than sequence numbers
negates the argument that we want to be standard since we likely won't
be able to get this change in upstream. I doubt that mailman2
developers will want to consider a change this major in a product that
is supposedly close to EOL.
cgf
* Re: Stability of pipermail ml archive URLs
2020-05-07 15:54 ` Christopher Faylor
@ 2020-05-07 17:56 ` Frank Ch. Eigler
2020-05-07 18:27 ` Christopher Faylor
0 siblings, 1 reply; 18+ messages in thread
From: Frank Ch. Eigler @ 2020-05-07 17:56 UTC (permalink / raw)
To: overseers, gcc
Hi -
> >I was thinking we might be able to trick pipermail (the web archiver
> >component) to simply name the message web urls after some function of
> >the message-id instead of the sequence number. Will give this a try
> >very shortly.
>
> I just want to go on record as saying that I think this is a bad idea.
> We can fix this problem simply without redesigning pipermail.
If the idea requires more than a dozenish lines of code, then I agree
it's not worth doing. "redesigning" - indeed no thanks.
> The problem that we're seeing is caused by a script that I wrote to
> migrate ezmlm to mailman. The fix for the problem is "Don't run
> that script".
Yeah, but that is the official mailman2 method for this. Spam/malware
that gets through can sit in multiple locations unless we clean it out
in the proper thorough manner, through the entire pipeline (which
starts with the mbox files). Not super keen on building much
complexity that operates on all the intermediate results and html
files.
- FChE
* Re: Stability of pipermail ml archive URLs
2020-05-07 17:56 ` Frank Ch. Eigler
@ 2020-05-07 18:27 ` Christopher Faylor
0 siblings, 0 replies; 18+ messages in thread
From: Christopher Faylor @ 2020-05-07 18:27 UTC (permalink / raw)
To: overseers, gcc
On Thu, May 07, 2020 at 01:56:04PM -0400, Frank Ch. Eigler wrote:
>>>I was thinking we might be able to trick pipermail (the web archiver
>>>component) to simply name the message web urls after some function of
>>>the message-id instead of the sequence number. Will give this a try
>>>very shortly.
>>
>>I just want to go on record as saying that I think this is a bad idea.
>>We can fix this problem simply without redesigning pipermail.
>
>If the idea requires more than a dozenish lines of code, then I agree
>it's not worth doing. "redesigning" - indeed no thanks.
I'd call a major change to the way that mailman archives files a
"redesign".
>>The problem that we're seeing is caused by a script that I wrote to
>>migrate ezmlm to mailman. The fix for the problem is "Don't run that
>>script".
>
>Yeah, but that is the official mailman2 method for this.
One recommended method is to edit the mbox file and leave the message
around but blank and then regenerate the archive. But, that could cause
renumbering issues.
They also mention what I'm suggesting - edit the mbox and html files and
leave the content blank. You'd have to be careful not to step on incoming
email in that scenario, of course.
https://wiki.list.org/DOC/How%20can%20I%20remove%20a%20post%20from%20the%20list%20archive%20or%20remove%20an%20entire%20archive%3F
The above mentions that the message would be in three places which are
easily editable. There are also prev and next links which apparently
live in a database but there are scripts available to fix that too.
Spam used to be in multiple places when we were running ezmlm. It never
occurred to me that we needed to modify ezmlm to deal with the issue. I
used to get rid of viruses using a "mlzap" script that hit the right
files. That technique should work here too.
OTOH, maybe we should just give up on mailman2 and move to something
more modern even if we can't use dnf to install it on RHEL. I'm surely
not a fan of mailman2. If we have to do head-standing to get it to work
the way we want then maybe we should just move on and forget that we
said we wanted to use something "stable".
* Re: Stability of pipermail ml archive URLs
2020-05-07 9:48 ` Thomas Schwinge
2020-05-07 10:14 ` Frank Ch. Eigler
@ 2020-05-07 15:48 ` Christopher Faylor
2020-05-07 19:23 ` Segher Boessenkool
2 siblings, 0 replies; 18+ messages in thread
From: Christopher Faylor @ 2020-05-07 15:48 UTC (permalink / raw)
To: gcc, overseers
On Thu, May 07, 2020 at 11:48:18AM +0200, Thomas Schwinge wrote:
>On 2020-05-06T10:44:46-0400, "Frank Ch. Eigler via Gcc"
><gcc@gcc.gnu.org> wrote:
>>>Can pipermail provide stable URLs at all? We really need those, we
>>>reference those in commit messages, other mails, bugzilla etc.
>
>>It would be good to have another way of making permanent URLs for
>>individual messages in mailing list archives.
>
>Look up by Message-ID?
><http://mid.mail-archive.com/20200506141139.GJ2375@tucnak>, for
>example. See <https://en.wikipedia.org/wiki/Message-ID>, etc. The
>idea is that for all practical purposes, Message-IDs are "sufficiently
>unique". (Compare conceptually to the Git SHA-1 hashes.)
IMO, we're making way too big a deal out of this. The message archives
are changing because we are resequencing them. Mailman doesn't, AFAIK,
take it upon itself to randomly renumber them. fche and cgf have been
renumbering them when we remove spam.
If we stopped doing that there would be no issue.
When we were using ezmlm, I was careful not to remove message files when
dealing with spam. We haven't been that careful with mailman and, so,
we're seeing problems.
If we just changed the way that we deal with spam to keep the message
around but blank it out, we wouldn't have this problem.
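
A minimal sketch of "keep the message around but blank it out", using Python's standard mailbox module. It is an illustration under assumptions, not the script used on sourceware: blank_message is a hypothetical helper, the choice to keep the Date and Message-ID headers is an assumption, and whether a later archive regeneration then reproduces the same numbers has not been verified here.

```python
import mailbox

def blank_message(mbox_path, index, notice="[message removed]\n"):
    """Replace the message at position `index` with an empty placeholder.

    The message count and order in the mbox stay the same, so the positions
    of all later messages are unchanged.
    """
    box = mailbox.mbox(mbox_path)
    box.lock()  # avoid stepping on mail being delivered at the same time
    try:
        key = box.keys()[index]
        original = box[key]
        placeholder = mailbox.mboxMessage()
        placeholder.set_from(original.get_from())  # keep the "From " separator line
        for header in ("Date", "Message-ID"):
            if original[header] is not None:
                placeholder[header] = original[header]
        placeholder["Subject"] = "[removed]"
        placeholder.set_payload(notice)
        box[key] = placeholder  # same key, so the slot is reused rather than deleted
        box.flush()
    finally:
        box.unlock()
        box.close()
```

The point of the design is that nothing is deleted: the spam content goes away, but the slot it occupied remains.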
In addition, when I was migrating the mail archives from ezmlm to mailman
I came across a number of cases where the same message-id was used in
two messages. Possibly it was someone just bouncing email or maybe
it was something else.
Maybe it's a corner case but we wouldn't have to worry about this at all
if we just used mailman's current numbering and didn't ever take it upon
ourselves to rescan the archives.
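
For what it's worth, the repeated message-ids mentioned above are easy to detect before relying on them as keys. A small sketch, with duplicate_message_ids as a hypothetical helper name:

```python
from collections import Counter

def duplicate_message_ids(message_ids):
    """Return {message-id: count} for every id that occurs more than once.

    `message_ids` is any iterable of raw Message-ID header values; missing
    headers (None or empty strings) are skipped.
    """
    counts = Counter(mid.strip().strip("<>") for mid in message_ids if mid)
    return {mid: n for mid, n in counts.items() if n > 1}
```

It could be fed from an mbox with something like duplicate_message_ids(msg["Message-ID"] for msg in mailbox.mbox(path)), where path is the archive file to check.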
cgf
* Re: Stability of pipermail ml archive URLs
2020-05-07 9:48 ` Thomas Schwinge
2020-05-07 10:14 ` Frank Ch. Eigler
2020-05-07 15:48 ` Christopher Faylor
@ 2020-05-07 19:23 ` Segher Boessenkool
2020-05-07 20:28 ` Christopher Faylor
2 siblings, 1 reply; 18+ messages in thread
From: Segher Boessenkool @ 2020-05-07 19:23 UTC (permalink / raw)
To: Thomas Schwinge; +Cc: Frank Ch. Eigler, Jakub Jelinek, overseers, gcc
Hi!
On Thu, May 07, 2020 at 11:48:18AM +0200, Thomas Schwinge wrote:
> By the way, the public-inbox software
> (<https://public-inbox.org/README.html>), as recently mentioned in a
> different thread discussing deficiencies of Mailman's Pipermail, also
> does support this:
> <https://public-inbox.org/libc-alpha/129c8494-bfd0-87f0-ddb5-e56f6d4a6e0c@gotplt.org>
> (random example). (I have not yet really looked into that software
> myself, but from the little I read about it, it seems conceptually
> simple, "easy", good.)
>
> If there's sufficient interest (users) and commitment (overseers), we
> could install this on sourceware, in addition to what we've currently
> got?
I would very much like this. *All* of the problems with the current
mail archive, as well as all of the problems with the one we had before,
do not exist with public-inbox.
(It probably has problems all of its own, of course ;-) )
Segher
* Re: Stability of pipermail ml archive URLs
2020-05-07 19:23 ` Segher Boessenkool
@ 2020-05-07 20:28 ` Christopher Faylor
0 siblings, 0 replies; 18+ messages in thread
From: Christopher Faylor @ 2020-05-07 20:28 UTC (permalink / raw)
To: gcc, overseers
On Thu, May 07, 2020 at 02:23:30PM -0500, Segher Boessenkool wrote:
>On Thu, May 07, 2020 at 11:48:18AM +0200, Thomas Schwinge wrote:
>>By the way, the public-inbox software
>>(<https://public-inbox.org/README.html>), as recently mentioned in a
>>different thread discussing deficiencies of Mailman's Pipermail, also
>>does support this:
>><https://public-inbox.org/libc-alpha/129c8494-bfd0-87f0-ddb5-e56f6d4a6e0c@gotplt.org>
>>(random example). (I have not yet really looked into that software
>>myself, but from the little I read about it, it seems conceptually
>>simple, "easy", good.)
>>
>>If there's sufficient interest (users) and commitment (overseers), we
>>could install this on sourceware, in addition to what we've currently
>>got?
>
>I would very much like this. *All* of the problems with the current
>mail archive, as well as all of the problems with the one we had
>before, do not exist with public-inbox.
>
>(It probably has problems all of its own, of course ;-) )
It's been suggested many times both before we rolled out the new
sourceware and after.
I'm not a real fan of the interface but at least it's being supported.
It's just not supported in RHEL 8 right now, as far as I know.
To reiterate our current philosophy: We're trying to use supported
software on sourceware and not have to roll our own and worry about
keeping track of upstream fixes and security issues.