public inbox for libc-alpha@sourceware.org
From: Max Gautier <mg@max.gautier.name>
To: libc-alpha@sourceware.org
Subject: [PATCH v4 0/4] iconv: Add support for UTF-7-IMAP
Date: Thu,  9 Dec 2021 10:31:48 +0100
Message-ID: <20211209093152.313872-1-mg@max.gautier.name>
In-Reply-To: <87blcw9ptq.fsf@oldenburg.str.redhat.com>

I finally took the time to work on this again.

This new series implements UTF-7-IMAP in the UTF-7 module, using, as
advised, the same approach as iso646.c.

Unresolved issues (would appreciate advice on those):
- There is a slight inconsistency (to me) in the UTF-7 RFC[1], and the
  current implementation does not follow it exactly.
  In "UTF-7 Definition/Rule 2":

  "The '+' signals that subsequent octets are to be interpreted as
  elements of the Modified Base64 alphabet until a character not in that
  alphabet is encountered. Such characters include control characters
  such as carriage returns and line feeds"

  The UTF-7 module implements this by making the characters '\n', '\r'
  and '\t' part of the "direct characters" set, even though the RFC's
  definition of that set does not include them.

  So, going by the RFC, these characters should be encoded, yet they
  should also be interpreted literally and implicitly terminate base64
  sequences.

  On this, I'm inclined to leave the current behavior as is. Changing it
  might break things, and I don't see much benefit.

- For UTF-7-IMAP:
  The IMAPv4 RFC (UTF-7-IMAP definition)[2] specifies that:
  
  - The character "&" (0x26) is represented by the two-octet sequence "&-"
  - null shifts ("-&" while in BASE64; note that "&-" while in US-ASCII
    means "&") are not permitted
  - The purpose of these modifications is to correct the following
    problems with UTF-7:
      ...

      5) UTF-7 permits multiple alternate forms to represent the same
         string; in particular, printable US-ASCII characters can be
         represented in encoded form.

   Consider the following cases:

   A- When encoding to UTF-7-IMAP, if we encounter '&' while in base64
   mode, should we:
       1) encode it in base64
       2) terminate the base64 sequence with '-' and emit "&-"
   B- When encoding to UTF-7-IMAP, if we encounter "&&" while in
   US-ASCII mode, should we:
       1) start base64 mode and encode the two '&'
       2) encode them as "&-&-"
   It seems to me that, for both A and B, solution 2 allows null shifts
   and solution 1 allows multiple representations.

   However, A-2 and B-2 still feel cleaner to me, since they avoid
   alternate forms for '&'. One can argue that the resulting
   sequences are not null shifts, merely a special case in US-ASCII.
   I've used that approach in PATCH 4/4, but it should be quite easy to
   change if necessary.

- Also, I'm not sure how to add negative test cases, i.e. invalid
  sequences which need to trigger an iconv error.
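
For the negative test cases mentioned in the last item, one possible
shape is a small helper around iconv(3) that checks whether an input is
rejected with EILSEQ. This is only a sketch: the helper name
`rejects_as_eilseq`, the "WCHAR_T" target and the 256-byte output buffer
are assumptions made for illustration, not part of this series or of
the glibc test suite.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if converting IN (INLEN bytes) from
   charset FROM fails with EILSEQ, 0 if the conversion succeeds or
   fails for another reason, and -1 if the converter cannot be opened.
   The output buffer is small, so long inputs fail with E2BIG
   and are reported as 0.  */
int
rejects_as_eilseq (const char *from, const char *in, size_t inlen)
{
  iconv_t cd = iconv_open ("WCHAR_T", from);
  if (cd == (iconv_t) -1)
    return -1;

  char outbuf[256];
  char *inp = (char *) in;      /* iconv takes a non-const pointer */
  char *outp = outbuf;
  size_t outleft = sizeof outbuf;

  errno = 0;
  size_t r = iconv (cd, &inp, &inlen, &outp, &outleft);
  int saved_errno = errno;      /* iconv_close may change errno */

  iconv_close (cd);
  return r == (size_t) -1 && saved_errno == EILSEQ;
}
```

A test could then assert that `rejects_as_eilseq` returns 1 for each
sequence the RFC forbids, passing the charset name added by the series
as FROM. Note that iconv(3) reports an incomplete sequence at the end
of the input as EINVAL rather than EILSEQ, so truncated inputs would
need a separate check.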

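To make the first issue concrete, the character classes involved can be
written down as predicates. This is a sketch of my reading of RFC 2152,
not the code in utf-7.c, and the function names are made up for the
example. Set D is the RFC's "directly encoded characters" set; Rule 3
of the same RFC separately lists space, tab, CR and LF as characters
that may be represented directly.

```c
#include <string.h>

/* RFC 2152 Set D: letters, digits and the nine characters '(),-./:?  */
int
utf7_in_set_d (unsigned char c)
{
  return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
         || (c >= '0' && c <= '9')
         || (c != '\0' && strchr ("'(),-./:?", c) != NULL);
}

/* RFC 2152 Rule 3: space, tab, CR and LF may be represented directly.  */
int
utf7_in_rule3 (unsigned char c)
{
  return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

/* Modified Base64 alphabet of RFC 2152: A-Z, a-z, 0-9, '+' and '/'.  */
int
utf7_in_base64 (unsigned char c)
{
  return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
         || (c >= '0' && c <= '9') || c == '+' || c == '/';
}

/* Rule 2: any character outside the alphabet ends a base64 run.  */
int
utf7_ends_base64_run (unsigned char c)
{
  return !utf7_in_base64 (c);
}
```

With these definitions '\n', '\r' and '\t' are outside Set D, inside the
Rule 3 group, and all three end a base64 run, which is the combination
discussed in the first item.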

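The A-2/B-2 choice from the UTF-7-IMAP item can be sketched as a small
encoder. This is an illustration only, not the code from PATCH 4/4: the
function name `imap_utf7_encode`, its UTF-16 code unit input and its
buffer interface are assumptions made for the example. It always emits
'&' as "&-", closing any open base64 run first, and uses the modified
alphabet of RFC 3501 (',' instead of '/').

```c
#include <stddef.h>
#include <stdint.h>

static const char imap_b64[] =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+,";

/* Append CH to OUT, keeping one byte free for the final NUL.  */
#define PUT(ch) \
  do { if (o + 1 >= cap) return (size_t) -1; out[o++] = (char) (ch); } \
  while (0)

/* Hypothetical encoder: converts N UTF-16 code units to UTF-7-IMAP.
   Returns the number of bytes written (excluding the NUL), or
   (size_t) -1 if OUT (CAP bytes) is too small.  */
size_t
imap_utf7_encode (const uint16_t *in, size_t n, char *out, size_t cap)
{
  size_t o = 0;
  uint32_t bits = 0;    /* bit accumulator for the base64 run */
  int nbits = 0;        /* number of pending bits in BITS */
  int in_b64 = 0;       /* nonzero while inside "&...-" */

  for (size_t i = 0; i <= n; i++)
    {
      /* The extra iteration at i == n only closes an open run.  */
      int at_end = (i == n);
      uint16_t c = at_end ? 0 : in[i];
      int printable = !at_end && c >= 0x20 && c <= 0x7e;

      if (at_end || printable)
        {
          if (in_b64)
            {
              /* Flush leftover bits, zero-padded, then shift out.  */
              if (nbits > 0)
                PUT (imap_b64[(bits << (6 - nbits)) & 0x3f]);
              PUT ('-');
              in_b64 = 0;
              nbits = 0;
            }
          if (printable)
            {
              PUT (c);
              if (c == '&')     /* A-2 and B-2: '&' is always "&-" */
                PUT ('-');
            }
        }
      else
        {
          if (!in_b64)
            {
              PUT ('&');
              in_b64 = 1;
            }
          bits = (bits << 16) | c;
          nbits += 16;
          while (nbits >= 6)
            {
              nbits -= 6;
              PUT (imap_b64[(bits >> nbits) & 0x3f]);
            }
        }
    }
  out[o] = '\0';
  return o;
}
```

With this sketch, U+53F0 followed by '&' comes out as "&U,A-&-" (case
A-2), and "a&&b" comes out as "a&-&-b" (case B-2). Option 1 would only
need the `printable` test to exclude '&' when a run is open or when the
next character is also '&'.
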
Thanks for your time.

[1]: https://datatracker.ietf.org/doc/html/rfc2152
[2]: https://datatracker.ietf.org/doc/html/rfc3501#section-5.1.3



