* [Bug Infrastructure/30436] New: inbox: strip HTML attachements
From: mark at klomp dot org @ 2023-05-09 21:55 UTC
To: overseers

https://sourceware.org/bugzilla/show_bug.cgi?id=30436

            Bug ID: 30436
           Summary: inbox: strip HTML attachements
           Product: sourceware
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Infrastructure
          Assignee: overseers at sourceware dot org
          Reporter: mark at klomp dot org
  Target Milestone: ---

Currently public-inbox simply drops emails that contain HTML. public-inbox-mda
reports:

    May 09 20:40:46 *** We only accept plain-text mail, No HTML ***

This behavior is largely hardcoded into public-inbox. So we might want to add
a filter in front of public-inbox-mda that strips out any text/html
attachments, like mailman does.

--
You are receiving this mail because:
You are the assignee for the bug.
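To see which messages such a filter would touch, one can look for a text/html MIME part. The sketch below is only a rough heuristic under stated assumptions: it matches an unfolded `Content-Type: text/html` line anywhere in the message, so it misses folded headers and does not distinguish an HTML-only mail from one with a redundant plain-text alternative. It is not public-inbox's own check, and the function name `has_html_part` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of public-inbox or mimedefang): crude check
# for a text/html MIME part. Reads the files given as arguments, or stdin.
# Assumption: the Content-Type header sits on a single, unfolded line.
has_html_part() {
    grep -qi '^Content-Type:[[:space:]]*text/html' "$@"
}
```

For example, `has_html_part message.eml && echo "contains HTML"`.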
* [Bug Infrastructure/30436] inbox: strip HTML attachements
From: mark at klomp dot org @ 2023-06-23 14:22 UTC
To: overseers

https://sourceware.org/bugzilla/show_bug.cgi?id=30436

--- Comment #1 from Mark Wielaard <mark at klomp dot org> ---
mimedefang has been installed but not yet configured to do the actual
stripping for the inbox user.
* [Bug Infrastructure/30436] inbox: strip HTML attachements
From: mark at klomp dot org @ 2023-07-01 22:04 UTC
To: overseers

https://sourceware.org/bugzilla/show_bug.cgi?id=30436

--- Comment #2 from Mark Wielaard <mark at klomp dot org> ---
With mimedefang we could use the following simple mimedefang-filter:

    # -*- Perl -*-

    # Called once per message, after all MIME parts have been processed.
    sub filter_end {
        my($entity) = @_;
        # Drop HTML parts that merely duplicate a plain-text alternative.
        remove_redundant_html_parts($entity);
    }

    # DO NOT delete the next line, or Perl will complain.
    1;

But it isn't clear to me how, or whether, we can use the milter setup to
filter only messages sent to the inbox user, or how to integrate it into the
inbox .forward filter /home/inbox/public-inbox-mda-true.sh.

Running mimedefang.pl directly by hand seems to work, but then we need another
wrapper to set up the COMMANDS and interpret the RESULTS as described in
mimedefang-protocol. Maybe such a wrapper already exists?
* [Bug Infrastructure/30436] inbox: strip HTML attachements
From: mark at klomp dot org @ 2023-07-09 14:10 UTC
To: overseers

https://sourceware.org/bugzilla/show_bug.cgi?id=30436

--- Comment #3 from Mark Wielaard <mark at klomp dot org> ---
Created attachment 14957
  --> https://sourceware.org/bugzilla/attachment.cgi?id=14957&action=edit
remove_redundant_html_parts.pl filter

Trying to use the milter interface might be tricky, but the actual
functionality required from mimedefang can easily be extracted.

The attached remove_redundant_html_parts.pl script acts as a filter: it takes
an email as input and outputs either the original email unchanged or the email
with its redundant HTML parts removed.

This could be used as a filter in front of public-inbox-mda.
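A minimal sketch of such a wrapper, assuming the filter reads a message on stdin and writes the (possibly stripped) message to stdout as described above. The function name `filter_then_deliver`, the `FILTER`/`MDA` variables and the fall-back-to-original behavior are illustrative assumptions, not the script that was actually installed; the default paths are the ones mentioned in this thread.

```shell
#!/bin/sh
# Hypothetical wrapper sketch. Defaults are the paths used in this thread;
# both can be overridden from the environment.
FILTER=${FILTER:-/home/inbox/remove_redundant_html_parts.pl}
MDA=${MDA:-/usr/bin/public-inbox-mda}

# Read one message on stdin, run it through $FILTER and hand the result
# to $MDA (extra arguments are passed on to $MDA). If the filter fails,
# deliver the original message unchanged instead of losing it.
filter_then_deliver() {
    raw=$(mktemp) || return 75                        # 75 = EX_TEMPFAIL
    filtered=$(mktemp) || { rm -f "$raw"; return 75; }
    cat > "$raw"
    if "$FILTER" < "$raw" > "$filtered"; then
        "$MDA" "$@" < "$filtered"
    else
        "$MDA" "$@" < "$raw"
    fi
    status=$?                                         # status of the MDA run
    rm -f "$raw" "$filtered"
    return "$status"
}
```

Buffering the message in a temporary file means a crashing filter cannot swallow the mail: public-inbox-mda still receives the original, which (if it contains HTML) is rejected into the emergency mailbox as before. Exit status 75 is EX_TEMPFAIL from sysexits.h, which MTAs conventionally treat as "try again later".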
* [Bug Infrastructure/30436] inbox: strip HTML attachements
From: mark at klomp dot org @ 2023-07-09 19:06 UTC
To: overseers

https://sourceware.org/bugzilla/show_bug.cgi?id=30436

Mark Wielaard <mark at klomp dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED

--- Comment #4 from Mark Wielaard <mark at klomp dot org> ---
(In reply to Mark Wielaard from comment #3)
> Created attachment 14957 [details]
> remove_redundant_html_parts.pl filter
>
> This could be used as filter to public-inbox-mda

This has now been installed as filter-public-inbox-mda-true.sh, the .forward
script for the inbox user, which calls public-inbox-mda. It seems to work as
intended.

We still have to (re)import old emails containing HTML that public-inbox
rejected. Those are in the pipermail archives (already stripped).
* [Bug Infrastructure/30436] inbox: strip HTML attachements
From: mark at klomp dot org @ 2023-07-10 9:08 UTC
To: overseers

https://sourceware.org/bugzilla/show_bug.cgi?id=30436

Mark Wielaard <mark at klomp dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED

--- Comment #5 from Mark Wielaard <mark at klomp dot org> ---
(In reply to Mark Wielaard from comment #4)
> We do still have to (re)import old (rejected by public-inbox) emails
> containing HTML. Those are in the pipermail archives (already stripped).

This was done overnight using the .public-inbox/emergency mailbox (which
stores all rejected messages):

    # Re-deliver every rejected message through the HTML-stripping filter.
    for i in .public-inbox/emergency/cur/*; do
        # public-inbox-mda uses ORIGINAL_RECIPIENT to pick the target inbox.
        orig_to=$(grep '^X-Original-To:' "$i" | cut -f2 -d' ')
        export ORIGINAL_RECIPIENT="$orig_to"
        /home/inbox/remove_redundant_html_parts.pl < "$i" |
            /usr/bin/public-inbox-mda --no-precheck
    done

which was also a good test of the remove_redundant_html_parts.pl script.

A quick inspection of inbox.sourceware.org shows that messages with
(redundant) HTML parts are now archived just as they were with pipermail.
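A slightly more defensive variant of the re-import loop, as a sketch: it reads X-Original-To only from the header block and stops at the first match (so a second X-Original-To line, or one quoted in a message body, cannot leak into ORIGINAL_RECIPIENT), quotes file names, skips the literal glob when the folder is empty, and reports failed deliveries. The function names `original_recipient` and `reimport_emergency` and the `FILTER`/`MDA` variables are hypothetical; the default paths are the ones used above.

```shell
#!/bin/sh
# Hypothetical re-import helpers. Defaults are the paths used in this
# thread; both can be overridden from the environment.
FILTER=${FILTER:-/home/inbox/remove_redundant_html_parts.pl}
MDA=${MDA:-/usr/bin/public-inbox-mda}

# Print the value of the first X-Original-To header of message file $1,
# looking only at the header block (everything before the first empty line).
original_recipient() {
    sed -n -e '/^$/q' \
        -e '/^X-Original-To:/{s/^X-Original-To:[[:space:]]*//;s/[[:space:]]*$//;p;q;}' \
        "$1"
}

# Re-deliver every message in the maildir "cur" folder under $1 through
# $FILTER into $MDA, reporting messages whose delivery failed.
reimport_emergency() {
    for msg in "$1"/cur/*; do
        [ -f "$msg" ] || continue    # empty folder: glob stays literal
        ORIGINAL_RECIPIENT=$(original_recipient "$msg")
        export ORIGINAL_RECIPIENT
        "$FILTER" < "$msg" | "$MDA" --no-precheck ||
            echo "re-import failed: $msg" >&2
    done
}
```

Usage would be `reimport_emergency .public-inbox/emergency`, run from the inbox user's home directory.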