public inbox for overseers@sourceware.org
* [Bug Infrastructure/30436] New: inbox: strip HTML attachements
@ 2023-05-09 21:55 mark at klomp dot org
  2023-06-23 14:22 ` [Bug Infrastructure/30436] " mark at klomp dot org
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: mark at klomp dot org @ 2023-05-09 21:55 UTC (permalink / raw)
  To: overseers

https://sourceware.org/bugzilla/show_bug.cgi?id=30436

            Bug ID: 30436
           Summary: inbox: strip HTML attachements
           Product: sourceware
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Infrastructure
          Assignee: overseers at sourceware dot org
          Reporter: mark at klomp dot org
  Target Milestone: ---

Currently public-inbox just drops emails that have HTML.

public-inbox-mda says:
May 09 20:40:46 *** We only accept plain-text mail, No HTML ***

This is fairly hardcoded into public-inbox.

So we might want to add a filter in front of public-inbox-mda that filters out
any text/html attachments like mailman does.

-- 
You are receiving this mail because:
You are the assignee for the bug.

* [Bug Infrastructure/30436] inbox: strip HTML attachements
From: mark at klomp dot org @ 2023-06-23 14:22 UTC (permalink / raw)
  To: overseers

https://sourceware.org/bugzilla/show_bug.cgi?id=30436

--- Comment #1 from Mark Wielaard <mark at klomp dot org> ---
mimedefang has been installed but not yet configured to do the actual stripping
for the inbox user.

* [Bug Infrastructure/30436] inbox: strip HTML attachements
From: mark at klomp dot org @ 2023-07-01 22:04 UTC (permalink / raw)
  To: overseers

https://sourceware.org/bugzilla/show_bug.cgi?id=30436

--- Comment #2 from Mark Wielaard <mark at klomp dot org> ---
With mimedefang we could use the following simple mimedefang-filter:

# -*- Perl -*-
sub filter_end {
    my($entity) = @_;
    remove_redundant_html_parts($entity);
}
# DO NOT delete the next line, or Perl will complain.
1;

But it isn't clear to me whether (or how) the milter setup can be restricted to
messages sent to the inbox user, or how to integrate it into the inbox .forward
filter /home/inbox/public-inbox-mda-true.sh.

Running mimedefang.pl directly by hand seems to work, but then we need another
wrapper to set up the COMMANDS and interpret the RESULTS as described in
mimedefang-protocol. Maybe such a wrapper already exists?

* [Bug Infrastructure/30436] inbox: strip HTML attachements
From: mark at klomp dot org @ 2023-07-09 14:10 UTC (permalink / raw)
  To: overseers

https://sourceware.org/bugzilla/show_bug.cgi?id=30436

--- Comment #3 from Mark Wielaard <mark at klomp dot org> ---
Created attachment 14957
  --> https://sourceware.org/bugzilla/attachment.cgi?id=14957&action=edit
remove_redundant_html_parts.pl filter

Trying to use the milter interface might be tricky. But the actual
functionality required from mimedefang can be easily extracted. The attached
remove_redundant_html_parts.pl script acts as a filter: it takes an email as
input and outputs either the original email or the email with redundant HTML
parts removed.

This could be used as a filter in front of public-inbox-mda.

* [Bug Infrastructure/30436] inbox: strip HTML attachements
From: mark at klomp dot org @ 2023-07-09 19:06 UTC (permalink / raw)
  To: overseers

https://sourceware.org/bugzilla/show_bug.cgi?id=30436

Mark Wielaard <mark at klomp dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED

--- Comment #4 from Mark Wielaard <mark at klomp dot org> ---
(In reply to Mark Wielaard from comment #3)
> Created attachment 14957 [details]
> remove_redundant_html_parts.pl filter
> 
> This could be used as filter to public-inbox-mda

This has been installed now as filter-public-inbox-mda-true.sh which is the
.forward script for the inbox calling public-inbox-mda. It seems to work as
intended.

We do still have to (re)import old (rejected by public-inbox) emails containing
HTML. Those are in the pipermail archives (already stripped).
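
A minimal sketch of how such a .forward wrapper might be structured, assuming
the filter script reads a message on stdin and writes the (possibly stripped)
message on stdout. The actual contents of filter-public-inbox-mda-true.sh are
not shown in this bug: the function name deliver_filtered and the FILTER and
MDA variables are hypothetical, and falling back to the unmodified message
when the filter fails is an assumed design choice.

```shell
#!/bin/sh
# Hypothetical wrapper sketch; NOT the installed script.
# FILTER and MDA are assumed, overridable paths.
FILTER=${FILTER:-/home/inbox/remove_redundant_html_parts.pl}
MDA=${MDA:-/usr/bin/public-inbox-mda}

deliver_filtered() {
    # Spool the incoming message so it can be re-read if the filter fails.
    tmp=$(mktemp) || return 75      # 75 = EX_TEMPFAIL, asks the MTA to retry
    out=$(mktemp) || { rm -f "$tmp"; return 75; }
    cat > "$tmp"
    if "$FILTER" < "$tmp" > "$out" && [ -s "$out" ]; then
        # Filter succeeded and produced output: deliver the filtered message.
        "$MDA" < "$out"
    else
        # Filter failed or produced nothing: hand over the original message.
        "$MDA" < "$tmp"
    fi
    status=$?
    rm -f "$tmp" "$out"
    return "$status"
}
```

A script built this way would end by calling deliver_filtered, so that the
message arriving on stdin from the .forward pipe is filtered and then handed
to public-inbox-mda.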

* [Bug Infrastructure/30436] inbox: strip HTML attachements
From: mark at klomp dot org @ 2023-07-10  9:08 UTC (permalink / raw)
  To: overseers

https://sourceware.org/bugzilla/show_bug.cgi?id=30436

Mark Wielaard <mark at klomp dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED

--- Comment #5 from Mark Wielaard <mark at klomp dot org> ---
(In reply to Mark Wielaard from comment #4)
> We do still have to (re)import old (rejected by public-inbox) emails
> containing HTML. Those are in the pipermail archives (already stripped).

This was done overnight using the .public-inbox/emergency mailbox (which stores
all rejected messages):

for i in .public-inbox/emergency/cur/*; do
  # Recover the recipient the rejected message was originally sent to.
  orig_to=$(grep '^X-Original-To:' "$i" | cut -f2 -d' ')
  export ORIGINAL_RECIPIENT="$orig_to"
  /home/inbox/remove_redundant_html_parts.pl < "$i" |
    /usr/bin/public-inbox-mda --no-precheck
done

which was also a good test of the remove_redundant_html_parts.pl script.
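
The grep in the loop above matches X-Original-To: anywhere in the file,
including a quoted or forwarded message in the body. A possible refinement,
sketched here under the assumption that the header is not folded across lines
and has a space after the colon, reads only the header block (everything
before the first blank line). The function name original_recipient is
hypothetical.

```shell
# Hypothetical helper: print the first X-Original-To address found in the
# header block of the message file given as $1. Prints nothing if absent.
original_recipient() {
    awk '
        /^$/ { exit }                        # first blank line ends the headers
        tolower($1) == "x-original-to:" {    # header names are case-insensitive
            print $2
            exit
        }
    ' "$1"
}
```

In the loop this could replace the grep/cut pipeline, for example
orig_to=$(original_recipient "$i").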

A quick inspection of inbox.sourceware.org shows that messages with (redundant)
HTML parts are now archived as they were with pipermail.
