From: Max Gautier <mg@max.gautier.name>
To: Florian Weimer via Libc-alpha <libc-alpha@sourceware.org>
Cc: Max Gautier <mg@max.gautier.name>
Subject: Re: [PATCH v3 1/5] Copy utf-7 module to modified-utf-7
Date: Sun, 7 Feb 2021 13:29:15 +0100
Message-ID: <YB/dG6Fxxr/tve51@ol-mgautier.localdomain>
In-Reply-To: <87blcw9ptq.fsf@oldenburg.str.redhat.com>
* Florian Weimer via Libc-alpha:
> Given that UTF-7 conversion (either variant) is not
> performance-critical, I suggest to have just one implementation file.
>
> You can use step->__data to keep track of which variant is active. See
> iconvdata/iso646.c for an example. There is no need to allocate a
> separate object; you can store the flag directly in the __data member.
I'll work on that.
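To check that I understand the suggestion, here is a minimal sketch of keeping the variant flag directly in the pointer-sized member, without allocating anything. The struct below is only a stand-in for glibc's step structure (its `data` member plays the role of `__data`), and the enum and helper names are hypothetical:

```c
#include <stdint.h>

/* Hypothetical stand-in for glibc's struct __gconv_step; only the
   pointer-sized slot (called __data in glibc) matters here.  */
struct step_sketch
{
  void *data;
};

/* Hypothetical variant identifiers.  */
enum variant
{
  UTF7 = 0,
  UTF_7_IMAP = 1
};

/* Store the variant in the pointer value itself, no allocation.  */
static void
set_variant (struct step_sketch *step, enum variant var)
{
  step->data = (void *) (uintptr_t) var;
}

/* Recover the variant from the pointer value.  */
static enum variant
get_variant (const struct step_sketch *step)
{
  return (enum variant) (uintptr_t) step->data;
}
```

The init function would call something like set_variant once, and the conversion loop would read the flag back with get_variant.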
I could use some advice on specific parts, starting with the
classification of characters (direct or not, etc.):
- utf-7.c
+ utf-7-imap.c
> -static const unsigned char direct_tab[128 / 8] =
> - {
> - 0x00, 0x26, 0x00, 0x00, 0x81, 0xf3, 0xff, 0x87,
> - 0xfe, 0xff, 0xff, 0x07, 0xfe, 0xff, 0xff, 0x07
> - };
> -
> static int
> isdirect (uint32_t ch)
> {
> - return (ch < 128 && ((direct_tab[ch >> 3] >> (ch & 7)) & 1));
> -}
> -
> -
> -/* The set of "direct and optional direct characters":
> - A-Z a-z 0-9 ' ( ) , - . / : ? space tab lf cr
> - ! " # $ % & * ; < = > @ [ ] ^ _ ` { | }
> -*/
> -
> -static const unsigned char xdirect_tab[128 / 8] =
> - {
> - 0x00, 0x26, 0x00, 0x00, 0xff, 0xf7, 0xff, 0xff,
> - 0xff, 0xff, 0xff, 0xef, 0xff, 0xff, 0xff, 0x3f
> - };
> -
> -static int
> -isxdirect (uint32_t ch)
> -{
> - return (ch < 128 && ((xdirect_tab[ch >> 3] >> (ch & 7)) & 1));
> + return ((ch == '\n' || ch == '\t' || ch == '\r')
> + || (ch >= 0x20 && ch <= 0x7e && ch != '&'));
> }
>
> -
> -/* The set of "extended base64 characters":
> - A-Z a-z 0-9 + / -
> +/* The set of "modified base64 characters":
> + A-Z a-z 0-9 + , -
> */
>
> -static const unsigned char xbase64_tab[128 / 8] =
> - {
> - 0x00, 0x00, 0x00, 0x00, 0x00, 0xa8, 0xff, 0x03,
> - 0xfe, 0xff, 0xff, 0x07, 0xfe, 0xff, 0xff, 0x07
> - };
> -
> static int
> -isxbase64 (uint32_t ch)
> +ismbase64 (uint32_t ch)
> {
> - return (ch < 128 && ((xbase64_tab[ch >> 3] >> (ch & 7)) & 1));
> + return ((ch >= 'a' && ch <= 'z')
> + || (ch >= 'A' && ch <= 'Z')
> + || (ch >= '0' && ch <= '9')
> + || (ch == '+' || ch == ','));
> }
When I first looked at utf-7.c, the _tab arrays of magic values and the
bit shifting applied to them didn't make much sense to me, which is why
I replaced them with explicit comparisons for utf-7-imap. If both
variants live in the same file, they should probably use the same method.
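With a single file, the explicit-comparison style could take the variant as a parameter. This is a rough sketch only: the enum, the `between` helper and the two-argument `isxbase64` signature are hypothetical, and I am assuming '-' belongs to the set for both variants, as the comments in the patch list it:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical variant identifiers.  */
enum variant
{
  UTF7,
  UTF_7_IMAP
};

static bool
between (uint32_t ch, uint32_t lo, uint32_t hi)
{
  return ch >= lo && ch <= hi;
}

/* Base64 alphabet plus '-', which ends a base64 run.
   UTF-7 uses '/' as the last base64 character, the IMAP variant
   uses ',' instead.  */
static bool
isxbase64 (uint32_t ch, enum variant var)
{
  return (between (ch, 'A', 'Z')
          || between (ch, 'a', 'z')
          || between (ch, '0', '9')
          || ch == '+'
          || ch == '-'
          || (var == UTF7 && ch == '/')
          || (var == UTF_7_IMAP && ch == ','));
}
```

Only the last two lines differ between the variants, which is the part that would be harder to see with two separate tables.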
So do you see any benefit in keeping the old method?
Testing directly for the actual characters seems a lot more readable to
me, is shorter, and maps directly to the RFC definition.
I couldn't find the reason for the current method in the git history.
The only one I can think of is performance, but I don't see how or what
it would improve. If someone has hints, let me know.
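For reference, this is how I read the table method: each byte of the table covers eight code points, and bit (ch & 7) of byte (ch >> 3) says whether ch is in the set. For example, 0x26 at index 1 has bits 1, 2 and 5 set, which are code points 9, 10 and 13 (tab, lf, cr). The sketch below puts the table test next to an explicit version of the same set; the names isdirect_tab, isdirect_explicit and count_mismatches are made up for this comparison:

```c
#include <stdint.h>

/* Table copied from utf-7.c: bit (ch & 7) of byte (ch >> 3) is set
   when ch is a direct character.  */
static const unsigned char direct_tab[128 / 8] =
  {
    0x00, 0x26, 0x00, 0x00, 0x81, 0xf3, 0xff, 0x87,
    0xfe, 0xff, 0xff, 0x07, 0xfe, 0xff, 0xff, 0x07
  };

static int
isdirect_tab (uint32_t ch)
{
  return (ch < 128 && ((direct_tab[ch >> 3] >> (ch & 7)) & 1));
}

/* The same set spelled out:
   A-Z a-z 0-9 ' ( ) , - . / : ? space tab lf cr  */
static int
isdirect_explicit (uint32_t ch)
{
  return ((ch >= 'A' && ch <= 'Z')
          || (ch >= 'a' && ch <= 'z')
          || (ch >= '0' && ch <= '9')
          || ch == '\'' || ch == '(' || ch == ')' || ch == ','
          || ch == '-' || ch == '.' || ch == '/' || ch == ':'
          || ch == '?' || ch == ' ' || ch == '\t' || ch == '\n'
          || ch == '\r');
}

/* Count the code points below 256 on which the two tests disagree.  */
static int
count_mismatches (void)
{
  int n = 0;
  for (uint32_t ch = 0; ch < 256; ch++)
    if (!!isdirect_tab (ch) != !!isdirect_explicit (ch))
      n++;
  return n;
}
```

If my reading of the table is right, count_mismatches () returns 0, so the two forms accept the same characters and the choice is only about readability and whatever the table lookup saves.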
> You could remove the UTF7_ENCODE_OPTIONAL_CHARS from the existing UTF-7
> codec in a first, separate patch.
Do you mean I should modify the UTF-7 conversion so that it does not
encode the optional chars? That would change the result of UTF-7
conversions, wouldn't it? I'm not opposed to it, but isn't that going to
break things?
Thanks
--
Max Gautier
mg@max.gautier.name