* [Bug Infrastructure/30436] New: inbox: strip HTML attachements
@ 2023-05-09 21:55 mark at klomp dot org
2023-06-23 14:22 ` [Bug Infrastructure/30436] " mark at klomp dot org
` (4 more replies)
0 siblings, 5 replies; 6+ messages in thread
From: mark at klomp dot org @ 2023-05-09 21:55 UTC
To: overseers
https://sourceware.org/bugzilla/show_bug.cgi?id=30436
Bug ID: 30436
Summary: inbox: strip HTML attachements
Product: sourceware
Version: unspecified
Status: NEW
Severity: normal
Priority: P2
Component: Infrastructure
Assignee: overseers at sourceware dot org
Reporter: mark at klomp dot org
Target Milestone: ---
Currently public-inbox just drops emails that have HTML.
public-inbox-mda says:
May 09 20:40:46 *** We only accept plain-text mail, No HTML ***
This check is hardcoded into public-inbox, so we might want to put a filter in
front of public-inbox-mda that strips any text/html attachments, as mailman
does.
--
You are receiving this mail because:
You are the assignee for the bug.
* [Bug Infrastructure/30436] inbox: strip HTML attachements
From: mark at klomp dot org @ 2023-06-23 14:22 UTC
To: overseers
https://sourceware.org/bugzilla/show_bug.cgi?id=30436
--- Comment #1 from Mark Wielaard <mark at klomp dot org> ---
mimedefang has been installed but not yet configured to do the actual stripping
for the inbox user.
* [Bug Infrastructure/30436] inbox: strip HTML attachements
From: mark at klomp dot org @ 2023-07-01 22:04 UTC
To: overseers
https://sourceware.org/bugzilla/show_bug.cgi?id=30436
--- Comment #2 from Mark Wielaard <mark at klomp dot org> ---
With mimedefang we could use the following simple mimedefang-filter:

    # -*- Perl -*-
    sub filter_end {
        my($entity) = @_;
        # Drop text/html parts that have a text/plain alternative.
        remove_redundant_html_parts($entity);
    }
    # DO NOT delete the next line, or Perl will complain.
    1;

But it isn't clear to me whether, or how, the milter setup can be limited to
messages sent to the inbox user, or how to integrate it into the inbox .forward
filter, /home/inbox/public-inbox-mda-true.sh.
Running mimedefang.pl directly by hand seems to work, but then we need another
wrapper to set up the COMMANDS and interpret the RESULTS as described in
mimedefang-protocol. Maybe such a wrapper already exists?
* [Bug Infrastructure/30436] inbox: strip HTML attachements
From: mark at klomp dot org @ 2023-07-09 14:10 UTC
To: overseers
https://sourceware.org/bugzilla/show_bug.cgi?id=30436
--- Comment #3 from Mark Wielaard <mark at klomp dot org> ---
Created attachment 14957
--> https://sourceware.org/bugzilla/attachment.cgi?id=14957&action=edit
remove_redundant_html_parts.pl filter
Trying to use the milter interface might be tricky. But the actual
functionality required from mimedefang can be easily extracted. The attached
remove_redundant_html_parts.pl script acts as a filter that takes an email as
input and outputs either the original email or the email with redundant HTML
parts removed.
This could be used as a filter in front of public-inbox-mda.
* [Bug Infrastructure/30436] inbox: strip HTML attachements
From: mark at klomp dot org @ 2023-07-09 19:06 UTC
To: overseers
https://sourceware.org/bugzilla/show_bug.cgi?id=30436
Mark Wielaard <mark at klomp dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED

--- Comment #4 from Mark Wielaard <mark at klomp dot org> ---
(In reply to Mark Wielaard from comment #3)
> Created attachment 14957 [details]
> remove_redundant_html_parts.pl filter
>
> This could be used as filter to public-inbox-mda
This has been installed now as filter-public-inbox-mda-true.sh which is the
.forward script for the inbox calling public-inbox-mda. It seems to work as
intended.
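For illustration, a .forward wrapper of the kind described above could look
roughly like the sketch below. This is not the script installed on sourceware:
the function name deliver_filtered and the FILTER and MDA variables are made up
for this example, the default paths are simply the ones mentioned in this
thread, and always returning success (so the MTA never bounces a message) is
only a guess based on the "-true" in the script name.

```shell
#!/bin/sh
# Hypothetical sketch of a .forward wrapper; not the real
# filter-public-inbox-mda-true.sh. Paths default to those named in the thread.
FILTER=${FILTER:-/home/inbox/remove_redundant_html_parts.pl}
MDA=${MDA:-/usr/bin/public-inbox-mda}

# Read one message on stdin, strip redundant HTML parts, and hand the
# result to public-inbox-mda.
deliver_filtered() {
    "$FILTER" | "$MDA"
    # Assumption: report success regardless of what the MDA returned,
    # so a rejected message is never bounced back to the sender.
    return 0
}
```

The script would end with a plain call to deliver_filtered, and the inbox
user's .forward would pipe each incoming message into it.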
We do still have to (re)import old (rejected by public-inbox) emails containing
HTML. Those are in the pipermail archives (already stripped).
* [Bug Infrastructure/30436] inbox: strip HTML attachements
From: mark at klomp dot org @ 2023-07-10 9:08 UTC
To: overseers
https://sourceware.org/bugzilla/show_bug.cgi?id=30436
Mark Wielaard <mark at klomp dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED

--- Comment #5 from Mark Wielaard <mark at klomp dot org> ---
(In reply to Mark Wielaard from comment #4)
> We do still have to (re)import old (rejected by public-inbox) emails
> containing HTML. Those are in the pipermail archives (already stripped).
This was done overnight using the .public-inbox/emergency mailbox (which stores
all rejected messages):

    for i in .public-inbox/emergency/cur/*; do
        orig_to=$(grep ^X-Original-To: "$i" | cut -f2 -d' ')
        export ORIGINAL_RECIPIENT="$orig_to"
        cat "$i" | /home/inbox/remove_redundant_html_parts.pl \
            | /usr/bin/public-inbox-mda --no-precheck
    done

which was also a good test of the remove_redundant_html_parts.pl script.
A quick inspection of inbox.sourceware.org shows that messages with (redundant)
HTML parts are now archived as they were with pipermail.
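A somewhat more defensive variant of the re-import loop above might look like
the following sketch. The helper names original_recipient and reimport_maildir
and the FILTER and MDA variables are invented for this example. It assumes the
X-Original-To header is not folded across lines and is spelled exactly that
way, reads the header block only (so a quoted header in the body cannot match),
and skips messages that lack the header.

```shell
#!/bin/sh
# Hypothetical helpers; paths default to those named in the thread.
FILTER=${FILTER:-/home/inbox/remove_redundant_html_parts.pl}
MDA=${MDA:-/usr/bin/public-inbox-mda}

# Print the value of the first X-Original-To header of the message in $1.
# sed stops at the first blank line, i.e. at the end of the header block.
original_recipient() {
    sed -n -e '/^$/q' -e 's/^X-Original-To:[[:space:]]*//p' "$1" | head -n 1
}

# Feed every message under $1/cur through the HTML filter into the MDA.
reimport_maildir() {
    dir=$1
    for msg in "$dir"/cur/*; do
        [ -f "$msg" ] || continue
        rcpt=$(original_recipient "$msg")
        if [ -z "$rcpt" ]; then
            echo "skipping $msg: no X-Original-To header" >&2
            continue
        fi
        "$FILTER" < "$msg" | ORIGINAL_RECIPIENT="$rcpt" "$MDA" --no-precheck
    done
}
```

It would be run as: reimport_maildir .public-inbox/emergency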