* [PATCH v4 1/4] iconv: Always encode "optional direct" UTF-7 characters
2021-12-09 9:31 ` [PATCH v4 0/4] iconv: Add support for UTF-7-IMAP Max Gautier
@ 2021-12-09 9:31 ` Max Gautier
2022-03-07 12:10 ` Adhemerval Zanella
2021-12-09 9:31 ` [PATCH v4 2/4] iconv: Better mapping to RFC for UTF-7 Max Gautier
` (5 subsequent siblings)
6 siblings, 1 reply; 60+ messages in thread
From: Max Gautier @ 2021-12-09 9:31 UTC (permalink / raw)
To: libc-alpha; +Cc: Max Gautier
Signed-off-by: Max Gautier <mg@max.gautier.name>
---
iconvdata/utf-7.c | 12 ++----------
1 file changed, 2 insertions(+), 10 deletions(-)
diff --git a/iconvdata/utf-7.c b/iconvdata/utf-7.c
index 0ed46c948d..9ba0974959 100644
--- a/iconvdata/utf-7.c
+++ b/iconvdata/utf-7.c
@@ -29,14 +29,6 @@
#include <stdlib.h>
-/* Define this to 1 if you want the so-called "optional direct" characters
- ! " # $ % & * ; < = > @ [ ] ^ _ ` { | }
- to be encoded. Define to 0 if you want them to be passed straight
- through, like the so-called "direct" characters.
- We set this to 1 because it's safer.
- */
-#define UTF7_ENCODE_OPTIONAL_CHARS 1
-
/* The set of "direct characters":
A-Z a-z 0-9 ' ( ) , - . / : ? space tab lf cr
@@ -323,7 +315,7 @@ base64 (unsigned int i)
if ((statep->__count & 0x18) == 0) \
{ \
/* base64 encoding inactive */ \
- if (UTF7_ENCODE_OPTIONAL_CHARS ? isdirect (ch) : isxdirect (ch)) \
+ if (isdirect (ch)) \
{ \
*outptr++ = (unsigned char) ch; \
} \
@@ -375,7 +367,7 @@ base64 (unsigned int i)
else \
{ \
/* base64 encoding active */ \
- if (UTF7_ENCODE_OPTIONAL_CHARS ? isdirect (ch) : isxdirect (ch)) \
+ if (isdirect (ch)) \
{ \
/* deactivate base64 encoding */ \
size_t count; \
--
2.34.1
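
For readers following along, here is a minimal sketch of the behaviour change, not the glibc code itself. It assumes the character sets listed in the comments of utf-7.c; the names direct_sketch, optional_direct_sketch, passes_through_old and passes_through_new are hypothetical. With the macro removed, only "direct" characters are copied verbatim while base64 is inactive, and "optional direct" characters such as '!' always go through base64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* "Direct" characters: A-Z a-z 0-9 ' ( ) , - . / : ? space tab lf cr.  */
static bool
direct_sketch (uint32_t ch)
{
  return ((ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z')
          || (ch >= '0' && ch <= '9')
          || ch == '\'' || ch == '(' || ch == ')'
          || (ch >= ',' && ch <= '/')
          || ch == ':' || ch == '?'
          || ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r');
}

/* "Optional direct" characters, as listed in the removed comment.  */
static bool
optional_direct_sketch (uint32_t ch)
{
  return (ch != 0 && ch < 128
          && strchr ("!\"#$%&*;<=>@[]^_`{|}", (int) ch) != NULL);
}

/* Old behaviour: the macro selected the predicate at compile time.  */
static bool
passes_through_old (uint32_t ch, bool encode_optional)
{
  return (encode_optional
          ? direct_sketch (ch)
          : direct_sketch (ch) || optional_direct_sketch (ch));
}

/* New behaviour: equivalent to the old code with the macro fixed to 1.  */
static bool
passes_through_new (uint32_t ch)
{
  return direct_sketch (ch);
}

/* Checks that the new predicate matches the old one with the macro at 1.  */
void
passes_through_self_check (void)
{
  for (uint32_t ch = 0; ch < 256; ch++)
    assert (passes_through_new (ch) == passes_through_old (ch, true));
}
```

Since the macro was already set to 1, the patch should not change the encoder's output; it only removes the unused alternative.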
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [PATCH v4 1/4] iconv: Always encode "optional direct" UTF-7 characters
2021-12-09 9:31 ` [PATCH v4 1/4] iconv: Always encode "optional direct" UTF-7 characters Max Gautier
@ 2022-03-07 12:10 ` Adhemerval Zanella
0 siblings, 0 replies; 60+ messages in thread
From: Adhemerval Zanella @ 2022-03-07 12:10 UTC (permalink / raw)
To: Max Gautier, libc-alpha
On 09/12/2021 06:31, Max Gautier via Libc-alpha wrote:
> Signed-off-by: Max Gautier <mg@max.gautier.name>
LGTM, thanks.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
> ---
> iconvdata/utf-7.c | 12 ++----------
> 1 file changed, 2 insertions(+), 10 deletions(-)
>
> diff --git a/iconvdata/utf-7.c b/iconvdata/utf-7.c
> index 0ed46c948d..9ba0974959 100644
> --- a/iconvdata/utf-7.c
> +++ b/iconvdata/utf-7.c
> @@ -29,14 +29,6 @@
> #include <stdlib.h>
>
>
> -/* Define this to 1 if you want the so-called "optional direct" characters
> - ! " # $ % & * ; < = > @ [ ] ^ _ ` { | }
> - to be encoded. Define to 0 if you want them to be passed straight
> - through, like the so-called "direct" characters.
> - We set this to 1 because it's safer.
> - */
> -#define UTF7_ENCODE_OPTIONAL_CHARS 1
> -
>
> /* The set of "direct characters":
> A-Z a-z 0-9 ' ( ) , - . / : ? space tab lf cr
> @@ -323,7 +315,7 @@ base64 (unsigned int i)
> if ((statep->__count & 0x18) == 0) \
> { \
> /* base64 encoding inactive */ \
> - if (UTF7_ENCODE_OPTIONAL_CHARS ? isdirect (ch) : isxdirect (ch)) \
> + if (isdirect (ch)) \
> { \
> *outptr++ = (unsigned char) ch; \
> } \
> @@ -375,7 +367,7 @@ base64 (unsigned int i)
> else \
> { \
> /* base64 encoding active */ \
> - if (UTF7_ENCODE_OPTIONAL_CHARS ? isdirect (ch) : isxdirect (ch)) \
> + if (isdirect (ch)) \
> { \
> /* deactivate base64 encoding */ \
> size_t count; \
* [PATCH v4 2/4] iconv: Better mapping to RFC for UTF-7
2021-12-09 9:31 ` [PATCH v4 0/4] iconv: Add support for UTF-7-IMAP Max Gautier
2021-12-09 9:31 ` [PATCH v4 1/4] iconv: Always encode "optional direct" UTF-7 characters Max Gautier
@ 2021-12-09 9:31 ` Max Gautier
2022-03-07 12:14 ` Adhemerval Zanella
2022-03-20 16:41 ` [PATCH v5 " Max Gautier
2021-12-09 9:31 ` [PATCH v4 3/4] iconv: make utf-7.c able to use variants Max Gautier
` (4 subsequent siblings)
6 siblings, 2 replies; 60+ messages in thread
From: Max Gautier @ 2021-12-09 9:31 UTC (permalink / raw)
To: libc-alpha; +Cc: Max Gautier
- Direct use of characters instead of arcane arrays
- isxbase64 is not the Modified BASE64 alphabet, but the set of characters
that need to trigger an explicit shift back to US-ASCII. Make that clearer.
Signed-off-by: Max Gautier <mg@max.gautier.name>
---
iconvdata/utf-7.c | 56 +++++++++++++++++++++++++++--------------------
1 file changed, 32 insertions(+), 24 deletions(-)
diff --git a/iconvdata/utf-7.c b/iconvdata/utf-7.c
index 9ba0974959..ac7d78141a 100644
--- a/iconvdata/utf-7.c
+++ b/iconvdata/utf-7.c
@@ -30,20 +30,27 @@
+static int
+between(uint32_t const ch,
+ uint32_t const lower_bound, uint32_t const upper_bound)
+{
+ return (ch >= lower_bound && ch <= upper_bound);
+}
+
/* The set of "direct characters":
A-Z a-z 0-9 ' ( ) , - . / : ? space tab lf cr
*/
-static const unsigned char direct_tab[128 / 8] =
- {
- 0x00, 0x26, 0x00, 0x00, 0x81, 0xf3, 0xff, 0x87,
- 0xfe, 0xff, 0xff, 0x07, 0xfe, 0xff, 0xff, 0x07
- };
-
static int
isdirect (uint32_t ch)
{
- return (ch < 128 && ((direct_tab[ch >> 3] >> (ch & 7)) & 1));
+ return (between(ch, 'A', 'Z')
+ || between(ch, 'a', 'z')
+ || between(ch, '0', '9')
+ || ch == '\'' || ch == '(' || ch == ')'
+ || between(ch, ',', '/')
+ || ch == ':' || ch == '?'
+ || ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r');
}
@@ -52,33 +59,33 @@ isdirect (uint32_t ch)
! " # $ % & * ; < = > @ [ ] ^ _ ` { | }
*/
-static const unsigned char xdirect_tab[128 / 8] =
- {
- 0x00, 0x26, 0x00, 0x00, 0xff, 0xf7, 0xff, 0xff,
- 0xff, 0xff, 0xff, 0xef, 0xff, 0xff, 0xff, 0x3f
- };
static int
isxdirect (uint32_t ch)
{
- return (ch < 128 && ((xdirect_tab[ch >> 3] >> (ch & 7)) & 1));
+ return (ch == '\t'
+ || ch == '\n'
+ || ch == '\r'
+ || (between(ch, ' ','}')
+ && ch != '+' && ch != '\\')
+ );
}
-/* The set of "extended base64 characters":
+/* Characters which needs to trigger an explicit shift back to US-ASCII (UTF-7
+ only): Modified base64 + '-' (shift back character)
A-Z a-z 0-9 + / -
*/
-static const unsigned char xbase64_tab[128 / 8] =
- {
- 0x00, 0x00, 0x00, 0x00, 0x00, 0xa8, 0xff, 0x03,
- 0xfe, 0xff, 0xff, 0x07, 0xfe, 0xff, 0xff, 0x07
- };
-
static int
-isxbase64 (uint32_t ch)
+needs_explicit_shift (uint32_t ch)
{
- return (ch < 128 && ((xbase64_tab[ch >> 3] >> (ch & 7)) & 1));
+ return (between(ch, 'A', 'Z')
+ || between(ch, 'a', 'z')
+ || between(ch, '/', '9')
+ || ch == '+'
+ || ch == '-'
+ );
}
@@ -372,7 +379,8 @@ base64 (unsigned int i)
/* deactivate base64 encoding */ \
size_t count; \
\
- count = ((statep->__count & 0x18) >= 0x10) + isxbase64 (ch) + 1; \
+ count = ((statep->__count & 0x18) >= 0x10) \
+ + needs_explicit_shift (ch) + 1; \
if (__glibc_unlikely (outptr + count > outend)) \
{ \
result = __GCONV_FULL_OUTPUT; \
@@ -381,7 +389,7 @@ base64 (unsigned int i)
\
if ((statep->__count & 0x18) >= 0x10) \
*outptr++ = base64 ((statep->__count >> 3) & ~3); \
- if (isxbase64 (ch)) \
+ if (needs_explicit_shift (ch)) \
*outptr++ = '-'; \
*outptr++ = (unsigned char) ch; \
statep->__count = 0; \
--
2.34.1
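
One way to gain confidence in this kind of refactoring is to compare the old bit table against the new range-based predicate for every input. The sketch below does that for isdirect only; the table and both predicates are copied from the diff above, while the names isdirect_table, isdirect_ranges and count_mismatches are hypothetical, and ASCII character constants are assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Old representation: one bit per ASCII code point.  */
static const unsigned char direct_tab[128 / 8] =
  {
    0x00, 0x26, 0x00, 0x00, 0x81, 0xf3, 0xff, 0x87,
    0xfe, 0xff, 0xff, 0x07, 0xfe, 0xff, 0xff, 0x07
  };

static bool
isdirect_table (uint32_t ch)
{
  return ch < 128 && ((direct_tab[ch >> 3] >> (ch & 7)) & 1);
}

static bool
between (uint32_t ch, uint32_t lower_bound, uint32_t upper_bound)
{
  return ch >= lower_bound && ch <= upper_bound;
}

/* New representation: explicit characters and ranges.  */
static bool
isdirect_ranges (uint32_t ch)
{
  return (between (ch, 'A', 'Z')
          || between (ch, 'a', 'z')
          || between (ch, '0', '9')
          || ch == '\'' || ch == '(' || ch == ')'
          || between (ch, ',', '/')
          || ch == ':' || ch == '?'
          || ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r');
}

/* Number of code points in [0, 255] on which the two versions differ.  */
unsigned int
count_mismatches (void)
{
  unsigned int mismatches = 0;
  for (uint32_t ch = 0; ch < 256; ch++)
    if (isdirect_table (ch) != isdirect_ranges (ch))
      mismatches++;
  return mismatches;
}
```

The same loop could be repeated for xdirect_tab and xbase64_tab against isxdirect and needs_explicit_shift.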
* Re: [PATCH v4 2/4] iconv: Better mapping to RFC for UTF-7
2021-12-09 9:31 ` [PATCH v4 2/4] iconv: Better mapping to RFC for UTF-7 Max Gautier
@ 2022-03-07 12:14 ` Adhemerval Zanella
2022-03-20 16:41 ` [PATCH v5 " Max Gautier
1 sibling, 0 replies; 60+ messages in thread
From: Adhemerval Zanella @ 2022-03-07 12:14 UTC (permalink / raw)
To: Max Gautier, libc-alpha
On 09/12/2021 06:31, Max Gautier via Libc-alpha wrote:
> - Direct use of characters instead of arcane arrays
> - isxbase64 is not the Modified BASE64 alphabet, but the characters who
> needs to trigger an explicit shift back to US-ASCII. Make that clearer
>
> Signed-off-by: Max Gautier <mg@max.gautier.name>
LGTM with style fixes below.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
> ---
> iconvdata/utf-7.c | 56 +++++++++++++++++++++++++++--------------------
> 1 file changed, 32 insertions(+), 24 deletions(-)
>
> diff --git a/iconvdata/utf-7.c b/iconvdata/utf-7.c
> index 9ba0974959..ac7d78141a 100644
> --- a/iconvdata/utf-7.c
> +++ b/iconvdata/utf-7.c
> @@ -30,20 +30,27 @@
>
>
>
> +static int
> +between(uint32_t const ch,
Add a space before '(' here and in the other usages below. Also, 'const' does
not change much here.
> + uint32_t const lower_bound, uint32_t const upper_bound)
> +{
> + return (ch >= lower_bound && ch <= upper_bound);
> +}
> +
> /* The set of "direct characters":
> A-Z a-z 0-9 ' ( ) , - . / : ? space tab lf cr
> */
>
> -static const unsigned char direct_tab[128 / 8] =
> - {
> - 0x00, 0x26, 0x00, 0x00, 0x81, 0xf3, 0xff, 0x87,
> - 0xfe, 0xff, 0xff, 0x07, 0xfe, 0xff, 0xff, 0x07
> - };
> -
> static int
> isdirect (uint32_t ch)
> {
> - return (ch < 128 && ((direct_tab[ch >> 3] >> (ch & 7)) & 1));
> + return (between(ch, 'A', 'Z')
Ok, it is indeed clear.
> + || between(ch, 'a', 'z')
> + || between(ch, '0', '9')
> + || ch == '\'' || ch == '(' || ch == ')'
> + || between(ch, ',', '/')
> + || ch == ':' || ch == '?'
> + || ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r');
> }
>
>
> @@ -52,33 +59,33 @@ isdirect (uint32_t ch)
> ! " # $ % & * ; < = > @ [ ] ^ _ ` { | }
> */
>
> -static const unsigned char xdirect_tab[128 / 8] =
> - {
> - 0x00, 0x26, 0x00, 0x00, 0xff, 0xf7, 0xff, 0xff,
> - 0xff, 0xff, 0xff, 0xef, 0xff, 0xff, 0xff, 0x3f
> - };
>
> static int
> isxdirect (uint32_t ch)
> {
> - return (ch < 128 && ((xdirect_tab[ch >> 3] >> (ch & 7)) & 1));
> + return (ch == '\t'
> + || ch == '\n'
> + || ch == '\r'
> + || (between(ch, ' ','}')
> + && ch != '+' && ch != '\\')
> + );
> }
>
>
Ok.
> -/* The set of "extended base64 characters":
> +/* Characters which needs to trigger an explicit shift back to US-ASCII (UTF-7
> + only): Modified base64 + '-' (shift back character)
> A-Z a-z 0-9 + / -
> */
>
> -static const unsigned char xbase64_tab[128 / 8] =
> - {
> - 0x00, 0x00, 0x00, 0x00, 0x00, 0xa8, 0xff, 0x03,
> - 0xfe, 0xff, 0xff, 0x07, 0xfe, 0xff, 0xff, 0x07
> - };
> -
> static int
> -isxbase64 (uint32_t ch)
> +needs_explicit_shift (uint32_t ch)
> {
> - return (ch < 128 && ((xbase64_tab[ch >> 3] >> (ch & 7)) & 1));
> + return (between(ch, 'A', 'Z')
> + || between(ch, 'a', 'z')
> + || between(ch, '/', '9')
> + || ch == '+'
> + || ch == '-'
> + );
> }
>
>
Ok.
> @@ -372,7 +379,8 @@ base64 (unsigned int i)
> /* deactivate base64 encoding */ \
> size_t count; \
> \
> - count = ((statep->__count & 0x18) >= 0x10) + isxbase64 (ch) + 1; \
> + count = ((statep->__count & 0x18) >= 0x10) \
> + + needs_explicit_shift (ch) + 1; \
> if (__glibc_unlikely (outptr + count > outend)) \
> { \
> result = __GCONV_FULL_OUTPUT; \
> @@ -381,7 +389,7 @@ base64 (unsigned int i)
> \
> if ((statep->__count & 0x18) >= 0x10) \
> *outptr++ = base64 ((statep->__count >> 3) & ~3); \
> - if (isxbase64 (ch)) \
> + if (needs_explicit_shift (ch)) \
> *outptr++ = '-'; \
> *outptr++ = (unsigned char) ch; \
> statep->__count = 0; \
Ok, it just changes the function name.
* [PATCH v5 2/4] iconv: Better mapping to RFC for UTF-7
2021-12-09 9:31 ` [PATCH v4 2/4] iconv: Better mapping to RFC for UTF-7 Max Gautier
2022-03-07 12:14 ` Adhemerval Zanella
@ 2022-03-20 16:41 ` Max Gautier
2022-03-21 11:53 ` Adhemerval Zanella
1 sibling, 1 reply; 60+ messages in thread
From: Max Gautier @ 2022-03-20 16:41 UTC (permalink / raw)
To: libc-alpha; +Cc: mg
- Direct use of characters instead of arcane arrays
- isxbase64 is not the Modified BASE64 alphabet, but the set of characters
that need to trigger an explicit shift back to US-ASCII. Make that clearer.
Signed-off-by: Max Gautier <mg@max.gautier.name>
---
iconvdata/utf-7.c | 64 ++++++++++++++++++++++++-----------------------
1 file changed, 33 insertions(+), 31 deletions(-)
diff --git a/iconvdata/utf-7.c b/iconvdata/utf-7.c
index 9ba0974959..15f3669ac8 100644
--- a/iconvdata/utf-7.c
+++ b/iconvdata/utf-7.c
@@ -30,20 +30,27 @@
+static bool
+between (uint32_t const ch,
+ uint32_t const lower_bound, uint32_t const upper_bound)
+{
+ return (ch >= lower_bound && ch <= upper_bound);
+}
+
/* The set of "direct characters":
A-Z a-z 0-9 ' ( ) , - . / : ? space tab lf cr
*/
-static const unsigned char direct_tab[128 / 8] =
- {
- 0x00, 0x26, 0x00, 0x00, 0x81, 0xf3, 0xff, 0x87,
- 0xfe, 0xff, 0xff, 0x07, 0xfe, 0xff, 0xff, 0x07
- };
-
-static int
-isdirect (uint32_t ch)
+static bool
+isdirect (uint32_t ch, enum variant var)
{
- return (ch < 128 && ((direct_tab[ch >> 3] >> (ch & 7)) & 1));
+ return (between (ch, 'A', 'Z')
+ || between (ch, 'a', 'z')
+ || between (ch, '0', '9')
+ || ch == '\'' || ch == '(' || ch == ')'
+ || between (ch, ',', '/')
+ || ch == ':' || ch == '?'
+ || ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r');
}
@@ -52,33 +59,27 @@ isdirect (uint32_t ch)
! " # $ % & * ; < = > @ [ ] ^ _ ` { | }
*/
-static const unsigned char xdirect_tab[128 / 8] =
- {
- 0x00, 0x26, 0x00, 0x00, 0xff, 0xf7, 0xff, 0xff,
- 0xff, 0xff, 0xff, 0xef, 0xff, 0xff, 0xff, 0x3f
- };
-
-static int
-isxdirect (uint32_t ch)
+static bool
+isxdirect (uint32_t ch, enum variant var)
{
- return (ch < 128 && ((xdirect_tab[ch >> 3] >> (ch & 7)) & 1));
+ return (ch == '\t'
+ || ch == '\n'
+ || ch == '\r'
+ || (between (ch, ' ', '}') && ch != '+' && ch != '\\'));
}
-/* The set of "extended base64 characters":
+/* Characters which needs to trigger an explicit shift back to US-ASCII (UTF-7
+ only): Modified base64 + '-' (shift back character)
A-Z a-z 0-9 + / -
*/
-static const unsigned char xbase64_tab[128 / 8] =
- {
- 0x00, 0x00, 0x00, 0x00, 0x00, 0xa8, 0xff, 0x03,
- 0xfe, 0xff, 0xff, 0x07, 0xfe, 0xff, 0xff, 0x07
- };
-
-static int
-isxbase64 (uint32_t ch)
+static bool
+needs_explicit_shift (uint32_t ch)
{
- return (ch < 128 && ((xbase64_tab[ch >> 3] >> (ch & 7)) & 1));
+ return (between (ch, 'A', 'Z')
+ || between (ch, 'a', 'z')
+ || between (ch, '/', '9') || ch == '+' || ch == '-');
}
@@ -252,7 +253,7 @@ base64 (unsigned int i)
indeed form a Low Surrogate. */ \
uint32_t wc2 = wch & 0xffff; \
\
- if (! __builtin_expect (wc2 >= 0xdc00 && wc2 < 0xe000, 1)) \
+ if (! __glibc_likely (wc2 >= 0xdc00 && wc2 < 0xe000)) \
{ \
STANDARD_FROM_LOOP_ERR_HANDLER ((statep->__count = 0, 1));\
} \
@@ -372,7 +373,8 @@ base64 (unsigned int i)
/* deactivate base64 encoding */ \
size_t count; \
\
- count = ((statep->__count & 0x18) >= 0x10) + isxbase64 (ch) + 1; \
+ count = ((statep->__count & 0x18) >= 0x10) \
+ + needs_explicit_shift (ch) + 1; \
if (__glibc_unlikely (outptr + count > outend)) \
{ \
result = __GCONV_FULL_OUTPUT; \
@@ -381,7 +383,7 @@ base64 (unsigned int i)
\
if ((statep->__count & 0x18) >= 0x10) \
*outptr++ = base64 ((statep->__count >> 3) & ~3); \
- if (isxbase64 (ch)) \
+ if (needs_explicit_shift (ch)) \
*outptr++ = '-'; \
*outptr++ = (unsigned char) ch; \
statep->__count = 0; \
--
2.35.1
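
To illustrate why the rename matters, here is a simplified sketch of the step that ends a base64 run. It is not the glibc loop body: flushing of pending bits and output-space checks are left out, and the names needs_shift_sketch and end_base64_run are hypothetical. A decoder leaves base64 mode implicitly on any character outside the base64 alphabet, so the explicit '-' is only needed when the next direct character could otherwise be read as part of the run, or is '-' itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static bool
between (uint32_t ch, uint32_t lower_bound, uint32_t upper_bound)
{
  return ch >= lower_bound && ch <= upper_bound;
}

/* Same set as needs_explicit_shift in the patch: A-Z a-z 0-9 + / -.  */
static bool
needs_shift_sketch (uint32_t ch)
{
  return (between (ch, 'A', 'Z')
          || between (ch, 'a', 'z')
          || between (ch, '/', '9') || ch == '+' || ch == '-');
}

/* Write the bytes that close a base64 run before the direct character
   NEXT.  OUT must have room for two bytes.  Returns the number of bytes
   written: 2 when an explicit '-' is required, 1 otherwise.  */
size_t
end_base64_run (char *out, uint32_t next)
{
  size_t n = 0;
  if (needs_shift_sketch (next))
    out[n++] = '-';
  out[n++] = (char) next;
  return n;
}
```

For example, a run followed by 'a' is closed as "-a", while a run followed by '.' needs only "." because '.' cannot be mistaken for base64 data.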
* Re: [PATCH v5 2/4] iconv: Better mapping to RFC for UTF-7
2022-03-20 16:41 ` [PATCH v5 " Max Gautier
@ 2022-03-21 11:53 ` Adhemerval Zanella
2022-03-21 11:59 ` Adhemerval Zanella
0 siblings, 1 reply; 60+ messages in thread
From: Adhemerval Zanella @ 2022-03-21 11:53 UTC (permalink / raw)
To: Max Gautier, libc-alpha
On 20/03/2022 13:41, Max Gautier via Libc-alpha wrote:
> - Direct use of characters instead of arcane arrays
> - isxbase64 is not the Modified BASE64 alphabet, but the characters who
> needs to trigger an explicit shift back to US-ASCII. Make that clearer
>
> Signed-off-by: Max Gautier <mg@max.gautier.name>
LGTM, thanks.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
> ---
> iconvdata/utf-7.c | 64 ++++++++++++++++++++++++-----------------------
> 1 file changed, 33 insertions(+), 31 deletions(-)
>
> diff --git a/iconvdata/utf-7.c b/iconvdata/utf-7.c
> index 9ba0974959..15f3669ac8 100644
> --- a/iconvdata/utf-7.c
> +++ b/iconvdata/utf-7.c
> @@ -30,20 +30,27 @@
>
>
>
> +static bool
> +between (uint32_t const ch,
> + uint32_t const lower_bound, uint32_t const upper_bound)
> +{
> + return (ch >= lower_bound && ch <= upper_bound);
> +}
> +
> /* The set of "direct characters":
> A-Z a-z 0-9 ' ( ) , - . / : ? space tab lf cr
> */
>
> -static const unsigned char direct_tab[128 / 8] =
> - {
> - 0x00, 0x26, 0x00, 0x00, 0x81, 0xf3, 0xff, 0x87,
> - 0xfe, 0xff, 0xff, 0x07, 0xfe, 0xff, 0xff, 0x07
> - };
> -
> -static int
> -isdirect (uint32_t ch)
> +static bool
> +isdirect (uint32_t ch, enum variant var)
> {
> - return (ch < 128 && ((direct_tab[ch >> 3] >> (ch & 7)) & 1));
> + return (between (ch, 'A', 'Z')
> + || between (ch, 'a', 'z')
> + || between (ch, '0', '9')
> + || ch == '\'' || ch == '(' || ch == ')'
> + || between (ch, ',', '/')
> + || ch == ':' || ch == '?'
> + || ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r');
> }
>
>
> @@ -52,33 +59,27 @@ isdirect (uint32_t ch)
> ! " # $ % & * ; < = > @ [ ] ^ _ ` { | }
> */
>
> -static const unsigned char xdirect_tab[128 / 8] =
> - {
> - 0x00, 0x26, 0x00, 0x00, 0xff, 0xf7, 0xff, 0xff,
> - 0xff, 0xff, 0xff, 0xef, 0xff, 0xff, 0xff, 0x3f
> - };
> -
> -static int
> -isxdirect (uint32_t ch)
> +static bool
> +isxdirect (uint32_t ch, enum variant var)
> {
> - return (ch < 128 && ((xdirect_tab[ch >> 3] >> (ch & 7)) & 1));
> + return (ch == '\t'
> + || ch == '\n'
> + || ch == '\r'
> + || (between (ch, ' ', '}') && ch != '+' && ch != '\\'));
> }
>
>
> -/* The set of "extended base64 characters":
> +/* Characters which needs to trigger an explicit shift back to US-ASCII (UTF-7
> + only): Modified base64 + '-' (shift back character)
> A-Z a-z 0-9 + / -
> */
>
> -static const unsigned char xbase64_tab[128 / 8] =
> - {
> - 0x00, 0x00, 0x00, 0x00, 0x00, 0xa8, 0xff, 0x03,
> - 0xfe, 0xff, 0xff, 0x07, 0xfe, 0xff, 0xff, 0x07
> - };
> -
> -static int
> -isxbase64 (uint32_t ch)
> +static bool
> +needs_explicit_shift (uint32_t ch)
> {
> - return (ch < 128 && ((xbase64_tab[ch >> 3] >> (ch & 7)) & 1));
> + return (between (ch, 'A', 'Z')
> + || between (ch, 'a', 'z')
> + || between (ch, '/', '9') || ch == '+' || ch == '-');
> }
>
>
> @@ -252,7 +253,7 @@ base64 (unsigned int i)
> indeed form a Low Surrogate. */ \
> uint32_t wc2 = wch & 0xffff; \
> \
> - if (! __builtin_expect (wc2 >= 0xdc00 && wc2 < 0xe000, 1)) \
> + if (! __glibc_likely (wc2 >= 0xdc00 && wc2 < 0xe000)) \
> { \
> STANDARD_FROM_LOOP_ERR_HANDLER ((statep->__count = 0, 1));\
> } \
> @@ -372,7 +373,8 @@ base64 (unsigned int i)
> /* deactivate base64 encoding */ \
> size_t count; \
> \
> - count = ((statep->__count & 0x18) >= 0x10) + isxbase64 (ch) + 1; \
> + count = ((statep->__count & 0x18) >= 0x10) \
> + + needs_explicit_shift (ch) + 1; \
> if (__glibc_unlikely (outptr + count > outend)) \
> { \
> result = __GCONV_FULL_OUTPUT; \
> @@ -381,7 +383,7 @@ base64 (unsigned int i)
> \
> if ((statep->__count & 0x18) >= 0x10) \
> *outptr++ = base64 ((statep->__count >> 3) & ~3); \
> - if (isxbase64 (ch)) \
> + if (needs_explicit_shift (ch)) \
> *outptr++ = '-'; \
> *outptr++ = (unsigned char) ch; \
> statep->__count = 0; \
* Re: [PATCH v5 2/4] iconv: Better mapping to RFC for UTF-7
2022-03-21 11:53 ` Adhemerval Zanella
@ 2022-03-21 11:59 ` Adhemerval Zanella
2022-03-21 12:06 ` Adhemerval Zanella
2022-03-21 14:07 ` Max Gautier
0 siblings, 2 replies; 60+ messages in thread
From: Adhemerval Zanella @ 2022-03-21 11:59 UTC (permalink / raw)
To: Max Gautier, libc-alpha
On 21/03/2022 08:53, Adhemerval Zanella wrote:
>
>
> On 20/03/2022 13:41, Max Gautier via Libc-alpha wrote:
>> - Direct use of characters instead of arcane arrays
>> - isxbase64 is not the Modified BASE64 alphabet, but the characters who
>> needs to trigger an explicit shift back to US-ASCII. Make that clearer
>>
>> Signed-off-by: Max Gautier <mg@max.gautier.name>
>
>
> LGTM, thanks.
>
> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
>
>> ---
>> iconvdata/utf-7.c | 64 ++++++++++++++++++++++++-----------------------
>> 1 file changed, 33 insertions(+), 31 deletions(-)
>>
>> diff --git a/iconvdata/utf-7.c b/iconvdata/utf-7.c
>> index 9ba0974959..15f3669ac8 100644
>> --- a/iconvdata/utf-7.c
>> +++ b/iconvdata/utf-7.c
>> @@ -30,20 +30,27 @@
>>
>>
>>
>> +static bool
>> +between (uint32_t const ch,
>> + uint32_t const lower_bound, uint32_t const upper_bound)
>> +{
>> + return (ch >= lower_bound && ch <= upper_bound);
>> +}
>> +
>> /* The set of "direct characters":
>> A-Z a-z 0-9 ' ( ) , - . / : ? space tab lf cr
>> */
>>
>> -static const unsigned char direct_tab[128 / 8] =
>> - {
>> - 0x00, 0x26, 0x00, 0x00, 0x81, 0xf3, 0xff, 0x87,
>> - 0xfe, 0xff, 0xff, 0x07, 0xfe, 0xff, 0xff, 0x07
>> - };
>> -
>> -static int
>> -isdirect (uint32_t ch)
>> +static bool
>> +isdirect (uint32_t ch, enum variant var)
>> {
In fact I am seeing this failure:
utf-7.c:45:29: error: ‘enum variant’ declared inside parameter list will not be visible outside of this definition or declaration [-Werror]
45 | isdirect (uint32_t ch, enum variant var)
| ^~~~~~~
This happens because 'enum variant' is only defined in the next patch.
The usual best practice is to keep each patch consistent on its own, so
could you move the definition into this patch?
Or I can fix it for you before installing; it is up to you.
>> - return (ch < 128 && ((direct_tab[ch >> 3] >> (ch & 7)) & 1));
>> + return (between (ch, 'A', 'Z')
>> + || between (ch, 'a', 'z')
>> + || between (ch, '0', '9')
>> + || ch == '\'' || ch == '(' || ch == ')'
>> + || between (ch, ',', '/')
>> + || ch == ':' || ch == '?'
>> + || ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r');
>> }
>>
>>
>> @@ -52,33 +59,27 @@ isdirect (uint32_t ch)
>> ! " # $ % & * ; < = > @ [ ] ^ _ ` { | }
>> */
>>
>> -static const unsigned char xdirect_tab[128 / 8] =
>> - {
>> - 0x00, 0x26, 0x00, 0x00, 0xff, 0xf7, 0xff, 0xff,
>> - 0xff, 0xff, 0xff, 0xef, 0xff, 0xff, 0xff, 0x3f
>> - };
>> -
>> -static int
>> -isxdirect (uint32_t ch)
>> +static bool
>> +isxdirect (uint32_t ch, enum variant var)
>> {
>> - return (ch < 128 && ((xdirect_tab[ch >> 3] >> (ch & 7)) & 1));
>> + return (ch == '\t'
>> + || ch == '\n'
>> + || ch == '\r'
>> + || (between (ch, ' ', '}') && ch != '+' && ch != '\\'));
>> }
>>
>>
>> -/* The set of "extended base64 characters":
>> +/* Characters which needs to trigger an explicit shift back to US-ASCII (UTF-7
>> + only): Modified base64 + '-' (shift back character)
>> A-Z a-z 0-9 + / -
>> */
>>
>> -static const unsigned char xbase64_tab[128 / 8] =
>> - {
>> - 0x00, 0x00, 0x00, 0x00, 0x00, 0xa8, 0xff, 0x03,
>> - 0xfe, 0xff, 0xff, 0x07, 0xfe, 0xff, 0xff, 0x07
>> - };
>> -
>> -static int
>> -isxbase64 (uint32_t ch)
>> +static bool
>> +needs_explicit_shift (uint32_t ch)
>> {
>> - return (ch < 128 && ((xbase64_tab[ch >> 3] >> (ch & 7)) & 1));
>> + return (between (ch, 'A', 'Z')
>> + || between (ch, 'a', 'z')
>> + || between (ch, '/', '9') || ch == '+' || ch == '-');
>> }
>>
>>
>> @@ -252,7 +253,7 @@ base64 (unsigned int i)
>> indeed form a Low Surrogate. */ \
>> uint32_t wc2 = wch & 0xffff; \
>> \
>> - if (! __builtin_expect (wc2 >= 0xdc00 && wc2 < 0xe000, 1)) \
>> + if (! __glibc_likely (wc2 >= 0xdc00 && wc2 < 0xe000)) \
>> { \
>> STANDARD_FROM_LOOP_ERR_HANDLER ((statep->__count = 0, 1));\
>> } \
>> @@ -372,7 +373,8 @@ base64 (unsigned int i)
>> /* deactivate base64 encoding */ \
>> size_t count; \
>> \
>> - count = ((statep->__count & 0x18) >= 0x10) + isxbase64 (ch) + 1; \
>> + count = ((statep->__count & 0x18) >= 0x10) \
>> + + needs_explicit_shift (ch) + 1; \
>> if (__glibc_unlikely (outptr + count > outend)) \
>> { \
>> result = __GCONV_FULL_OUTPUT; \
>> @@ -381,7 +383,7 @@ base64 (unsigned int i)
>> \
>> if ((statep->__count & 0x18) >= 0x10) \
>> *outptr++ = base64 ((statep->__count >> 3) & ~3); \
>> - if (isxbase64 (ch)) \
>> + if (needs_explicit_shift (ch)) \
>> *outptr++ = '-'; \
>> *outptr++ = (unsigned char) ch; \
>> statep->__count = 0; \
* Re: [PATCH v5 2/4] iconv: Better mapping to RFC for UTF-7
2022-03-21 11:59 ` Adhemerval Zanella
@ 2022-03-21 12:06 ` Adhemerval Zanella
2022-03-21 14:07 ` Max Gautier
1 sibling, 0 replies; 60+ messages in thread
From: Adhemerval Zanella @ 2022-03-21 12:06 UTC (permalink / raw)
To: Max Gautier, libc-alpha
On 21/03/2022 08:59, Adhemerval Zanella wrote:
>
>
> On 21/03/2022 08:53, Adhemerval Zanella wrote:
>>
>>
>> On 20/03/2022 13:41, Max Gautier via Libc-alpha wrote:
>>> - Direct use of characters instead of arcane arrays
>>> - isxbase64 is not the Modified BASE64 alphabet, but the characters who
>>> needs to trigger an explicit shift back to US-ASCII. Make that clearer
>>>
>>> Signed-off-by: Max Gautier <mg@max.gautier.name>
>>
>>
>> LGTM, thanks.
>>
>> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
>>
>>> ---
>>> iconvdata/utf-7.c | 64 ++++++++++++++++++++++++-----------------------
>>> 1 file changed, 33 insertions(+), 31 deletions(-)
>>>
>>> diff --git a/iconvdata/utf-7.c b/iconvdata/utf-7.c
>>> index 9ba0974959..15f3669ac8 100644
>>> --- a/iconvdata/utf-7.c
>>> +++ b/iconvdata/utf-7.c
>>> @@ -30,20 +30,27 @@
>>>
>>>
>>>
>>> +static bool
>>> +between (uint32_t const ch,
>>> + uint32_t const lower_bound, uint32_t const upper_bound)
>>> +{
>>> + return (ch >= lower_bound && ch <= upper_bound);
>>> +}
>>> +
>>> /* The set of "direct characters":
>>> A-Z a-z 0-9 ' ( ) , - . / : ? space tab lf cr
>>> */
>>>
>>> -static const unsigned char direct_tab[128 / 8] =
>>> - {
>>> - 0x00, 0x26, 0x00, 0x00, 0x81, 0xf3, 0xff, 0x87,
>>> - 0xfe, 0xff, 0xff, 0x07, 0xfe, 0xff, 0xff, 0x07
>>> - };
>>> -
>>> -static int
>>> -isdirect (uint32_t ch)
>>> +static bool
>>> +isdirect (uint32_t ch, enum variant var)
>>> {
>
> In fact I am seeing this failure:
>
> utf-7.c:45:29: error: ‘enum variant’ declared inside parameter list will not be visible outside of this definition or declaration [-Werror]
> 45 | isdirect (uint32_t ch, enum variant var)
> | ^~~~~~~
>
> Since 'enum variant' in only defined on next patch. Usually the best
> practice is keep each patch consistent, so could you move the definition
> on this patch?
>
> Or I can fix it for you before installing, it is up to you.
And this patch does not actually require the variant argument on either
isdirect or isxdirect. The obvious fix for this patch is:
diff --git a/iconvdata/utf-7.c b/iconvdata/utf-7.c
index 4a89de235a..815b1891c7 100644
--- a/iconvdata/utf-7.c
+++ b/iconvdata/utf-7.c
@@ -42,7 +42,7 @@ between (uint32_t const ch,
*/
static bool
-isdirect (uint32_t ch, enum variant var)
+isdirect (uint32_t ch)
{
return (between (ch, 'A', 'Z')
|| between (ch, 'a', 'z')
@@ -60,7 +60,7 @@ isdirect (uint32_t ch, enum variant var)
*/
static bool
-isxdirect (uint32_t ch, enum variant var)
+isxdirect (uint32_t ch)
{
return (ch == '\t'
|| ch == '\n'
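
As background on the diagnostic quoted above: in C, a tag that first appears inside a parameter list is scoped to that declaration only, so callers cannot name a compatible type. Besides dropping the unused parameter as in the diff above, the other way out is to define the enum before its first use. A minimal sketch of that ordering follows; takes_variant_sketch and variant_sketch_self_check are hypothetical names, and the predicate is deliberately reduced to A-Z.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Defined at file scope before any function names it.  If this
   definition came after the function below, 'enum variant' in the
   parameter list would introduce a new type visible only inside that
   declaration, which is what GCC warns about.  */
enum variant
{
  UTF7,
};

static bool
takes_variant_sketch (uint32_t ch, enum variant var)
{
  /* Only one variant exists in this sketch, so VAR is not consulted.  */
  (void) var;
  return ch >= 'A' && ch <= 'Z';
}

bool
variant_sketch_self_check (void)
{
  return (takes_variant_sketch ('Q', UTF7)
          && !takes_variant_sketch ('!', UTF7));
}
```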
* Re: [PATCH v5 2/4] iconv: Better mapping to RFC for UTF-7
2022-03-21 11:59 ` Adhemerval Zanella
2022-03-21 12:06 ` Adhemerval Zanella
@ 2022-03-21 14:07 ` Max Gautier
1 sibling, 0 replies; 60+ messages in thread
From: Max Gautier @ 2022-03-21 14:07 UTC (permalink / raw)
To: Adhemerval Zanella; +Cc: libc-alpha
On Mon, Mar 21, 2022 at 08:59:27AM -0300, Adhemerval Zanella wrote:
>
>
> On 21/03/2022 08:53, Adhemerval Zanella wrote:
> >
> >
> > On 20/03/2022 13:41, Max Gautier via Libc-alpha wrote:
> >> - Direct use of characters instead of arcane arrays
> >> - isxbase64 is not the Modified BASE64 alphabet, but the characters who
> >> needs to trigger an explicit shift back to US-ASCII. Make that clearer
> >>
> >> Signed-off-by: Max Gautier <mg@max.gautier.name>
> >
> >
> > LGTM, thanks.
> >
> > Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
> >
> >> ---
> >> iconvdata/utf-7.c | 64 ++++++++++++++++++++++++-----------------------
> >> 1 file changed, 33 insertions(+), 31 deletions(-)
> >>
> >> diff --git a/iconvdata/utf-7.c b/iconvdata/utf-7.c
> >> index 9ba0974959..15f3669ac8 100644
> >> --- a/iconvdata/utf-7.c
> >> +++ b/iconvdata/utf-7.c
> >> @@ -30,20 +30,27 @@
> >>
> >>
> >>
> >> +static bool
> >> +between (uint32_t const ch,
> >> + uint32_t const lower_bound, uint32_t const upper_bound)
> >> +{
> >> + return (ch >= lower_bound && ch <= upper_bound);
> >> +}
> >> +
> >> /* The set of "direct characters":
> >> A-Z a-z 0-9 ' ( ) , - . / : ? space tab lf cr
> >> */
> >>
> >> -static const unsigned char direct_tab[128 / 8] =
> >> - {
> >> - 0x00, 0x26, 0x00, 0x00, 0x81, 0xf3, 0xff, 0x87,
> >> - 0xfe, 0xff, 0xff, 0x07, 0xfe, 0xff, 0xff, 0x07
> >> - };
> >> -
> >> -static int
> >> -isdirect (uint32_t ch)
> >> +static bool
> >> +isdirect (uint32_t ch, enum variant var)
> >> {
>
> In fact I am seeing this failure:
>
> utf-7.c:45:29: error: ‘enum variant’ declared inside parameter list will not be visible outside of this definition or declaration [-Werror]
> 45 | isdirect (uint32_t ch, enum variant var)
> | ^~~~~~~
>
> Since 'enum variant' in only defined on next patch. Usually the best
> practice is keep each patch consistent, so could you move the definition
> on this patch?
>
> Or I can fix it for you before installing, it is up to you.
>
I think I mixed up my patches while integrating the corrections and
style fixes you mentioned, sorry.
No problem for me if you fix it before applying.
Thanks!
--
Max Gautier
* [PATCH v4 3/4] iconv: make utf-7.c able to use variants
2021-12-09 9:31 ` [PATCH v4 0/4] iconv: Add support for UTF-7-IMAP Max Gautier
2021-12-09 9:31 ` [PATCH v4 1/4] iconv: Always encode "optional direct" UTF-7 characters Max Gautier
2021-12-09 9:31 ` [PATCH v4 2/4] iconv: Better mapping to RFC for UTF-7 Max Gautier
@ 2021-12-09 9:31 ` Max Gautier
2022-03-07 12:34 ` Adhemerval Zanella
2022-03-20 16:42 ` [PATCH v5 " Max Gautier
2021-12-09 9:31 ` [PATCH v4 4/4] iconv: Add UTF-7-IMAP variant in utf-7.c Max Gautier
` (3 subsequent siblings)
6 siblings, 2 replies; 60+ messages in thread
From: Max Gautier @ 2021-12-09 9:31 UTC (permalink / raw)
To: libc-alpha; +Cc: Max Gautier
Add infrastructure in utf-7.c to handle variants. The approach comes from
iso646.c
The variant is defined at gconv_init time and is passed as a
supplementary variable.
Signed-off-by: Max Gautier <mg@max.gautier.name>
---
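
For reference, the name lookup borrowed from iso646.c can be sketched in portable C as below. This is an illustration under assumptions, not the patch itself: it uses strlen and strcasecmp in place of the glibc-internal __rawmemchr and __strcasecmp, the second table entry "UTF-7-IMAP//" and the enumerator UTF7_IMAP are hypothetical and only there to show that the table order must match the enum order, and lookup_variant is a hypothetical name.

```c
#include <assert.h>
#include <string.h>
#include <strings.h>

enum variant
{
  UTF7,
  UTF7_IMAP,    /* Hypothetical second entry, for illustration only.  */
};

/* Packed list of NUL-terminated names, ended by an empty string.
   Must be in the same order as enum variant above.  */
static const char names[] =
  "UTF-7//\0"
  "UTF-7-IMAP//\0"
  "\0";

/* Return the position of WANTED in the packed table (which doubles as
   the enum variant value), or -1 if it is not listed.  */
int
lookup_variant (const char *wanted)
{
  int idx = 0;
  for (const char *name = names; *name != '\0';
       name += strlen (name) + 1, idx++)
    if (strcasecmp (wanted, name) == 0)
      return idx;
  return -1;
}
```

The packed string avoids an array of pointers, so the table needs no relocations when the module is loaded; the price is that the order of the names and of the enumerators has to be kept in sync by hand.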
iconvdata/utf-7.c | 239 +++++++++++++++++++++++++++++++++-------------
1 file changed, 174 insertions(+), 65 deletions(-)
diff --git a/iconvdata/utf-7.c b/iconvdata/utf-7.c
index ac7d78141a..965d4220f1 100644
--- a/iconvdata/utf-7.c
+++ b/iconvdata/utf-7.c
@@ -29,6 +29,24 @@
#include <stdlib.h>
+enum variant
+{
+ UTF7,
+};
+
+/* Must be in the same order as enum variant above. */
+static const char names[] =
+ "UTF-7//\0"
+ "\0";
+
+static uint32_t
+shift_character(enum variant const var)
+{
+ if (var == UTF7)
+ return '+';
+ else
+ abort();
+}
static int
between(uint32_t const ch,
@@ -38,37 +56,43 @@ between(uint32_t const ch,
}
/* The set of "direct characters":
+ FOR UTF-7
A-Z a-z 0-9 ' ( ) , - . / : ? space tab lf cr
*/
static int
-isdirect (uint32_t ch)
+isdirect (uint32_t ch, enum variant var)
{
- return (between(ch, 'A', 'Z')
- || between(ch, 'a', 'z')
- || between(ch, '0', '9')
- || ch == '\'' || ch == '(' || ch == ')'
- || between(ch, ',', '/')
- || ch == ':' || ch == '?'
- || ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r');
+ if (var == UTF7)
+ return (between(ch, 'A', 'Z')
+ || between(ch, 'a', 'z')
+ || between(ch, '0', '9')
+ || ch == '\'' || ch == '(' || ch == ')'
+ || between(ch, ',', '/')
+ || ch == ':' || ch == '?'
+ || ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r');
+ abort();
}
/* The set of "direct and optional direct characters":
+ (UTF-7 only)
A-Z a-z 0-9 ' ( ) , - . / : ? space tab lf cr
! " # $ % & * ; < = > @ [ ] ^ _ ` { | }
*/
-
static int
-isxdirect (uint32_t ch)
+isxdirect (uint32_t ch, enum variant var)
{
- return (ch == '\t'
- || ch == '\n'
- || ch == '\r'
- || (between(ch, ' ','}')
- && ch != '+' && ch != '\\')
- );
+ return(isdirect(ch, var)
+ || (var == UTF7 &&
+ (between(ch, '!', '&')
+ || ch == '*'
+ || between(ch, ';', '@')
+ || (between(ch, '[', '`') && ch != '\\')
+ || between(ch, '{', '}'))
+ )
+ );
}
@@ -91,7 +115,7 @@ needs_explicit_shift (uint32_t ch)
/* Converts a value in the range 0..63 to a base64 encoded char. */
static unsigned char
-base64 (unsigned int i)
+base64 (unsigned int i, enum variant var)
{
if (i < 26)
return i + 'A';
@@ -101,7 +125,7 @@ base64 (unsigned int i)
return i - 52 + '0';
else if (i == 62)
return '+';
- else if (i == 63)
+ else if (i == 63 && var == UTF7)
return '/';
else
abort ();
@@ -109,9 +133,8 @@ base64 (unsigned int i)
/* Definitions used in the body of the `gconv' function. */
-#define CHARSET_NAME "UTF-7//"
-#define DEFINE_INIT 1
-#define DEFINE_FINI 1
+#define DEFINE_INIT 0
+#define DEFINE_FINI 0
#define FROM_LOOP from_utf7_loop
#define TO_LOOP to_utf7_loop
#define MIN_NEEDED_FROM 1
@@ -119,11 +142,27 @@ base64 (unsigned int i)
#define MIN_NEEDED_TO 4
#define MAX_NEEDED_TO 4
#define ONE_DIRECTION 0
+#define FROM_DIRECTION (dir == from_utf7)
#define PREPARE_LOOP \
mbstate_t saved_state; \
- mbstate_t *statep = data->__statep;
-#define EXTRA_LOOP_ARGS , statep
+ mbstate_t *statep = data->__statep; \
+ enum direction dir = ((struct utf7_data *) step->__data)->dir; \
+ enum variant var = ((struct utf7_data *) step->__data)->var;
+#define EXTRA_LOOP_ARGS , statep, var
+
+
+enum direction
+{
+ illegal_dir,
+ from_utf7,
+ to_utf7
+};
+struct utf7_data
+{
+ enum direction dir;
+ enum variant var;
+};
/* Since we might have to reset input pointer we must be able to save
and restore the state. */
@@ -133,6 +172,72 @@ base64 (unsigned int i)
else \
*statep = saved_state
+extern int gconv_init (struct __gconv_step *step);
+int
+gconv_init (struct __gconv_step *step)
+{
+ /* Determine which direction. */
+ struct utf7_data *new_data;
+ enum direction dir = illegal_dir;
+
+ enum variant var = 0;
+ for (const char *name = names; *name != '\0';
+ name = __rawmemchr (name, '\0') + 1)
+ {
+ if (__strcasecmp (step->__from_name, name) == 0)
+ {
+ dir = from_utf7;
+ break;
+ }
+ else if (__strcasecmp (step->__to_name, name) == 0)
+ {
+ dir = to_utf7;
+ break;
+ }
+ ++var;
+ }
+
+ if (__builtin_expect (dir, from_utf7) != illegal_dir)
+ {
+ new_data = malloc (sizeof (*new_data));
+ if (new_data == NULL)
+ return __GCONV_NOMEM;
+
+ new_data->dir = dir;
+ new_data->var = var;
+ step->__data = new_data;
+
+ if (dir == from_utf7)
+ {
+ step->__min_needed_from = MIN_NEEDED_FROM;
+ step->__max_needed_from = MAX_NEEDED_FROM;
+ step->__min_needed_to = MIN_NEEDED_TO;
+ step->__max_needed_to = MAX_NEEDED_TO;
+ }
+ else
+ {
+ step->__min_needed_from = MIN_NEEDED_TO;
+ step->__max_needed_from = MAX_NEEDED_TO;
+ step->__min_needed_to = MIN_NEEDED_FROM;
+ step->__max_needed_to = MAX_NEEDED_FROM;
+ }
+ }
+ else
+ return __GCONV_NOCONV;
+
+ step->__stateful = 1;
+
+ return __GCONV_OK;
+}
+
+extern void gconv_end (struct __gconv_step *data);
+void
+gconv_end (struct __gconv_step *data)
+{
+ free (data->__data);
+}
+
+
/* First define the conversion function from UTF-7 to UCS4.
The state is structured as follows:
@@ -160,13 +265,13 @@ base64 (unsigned int i)
if ((statep->__count >> 3) == 0) \
{ \
/* base64 encoding inactive. */ \
- if (isxdirect (ch)) \
+ if (isxdirect (ch, var)) \
{ \
inptr++; \
put32 (outptr, ch); \
outptr += 4; \
} \
- else if (__glibc_likely (ch == '+')) \
+ else if (__glibc_likely (ch == shift_character(var))) \
{ \
if (__glibc_unlikely (inptr + 2 > inend)) \
{ \
@@ -291,7 +396,7 @@ base64 (unsigned int i)
} \
}
#define LOOP_NEED_FLAGS
-#define EXTRA_LOOP_DECLS , mbstate_t *statep
+#define EXTRA_LOOP_DECLS , mbstate_t *statep, enum variant var
#include <iconv/loop.c>
@@ -322,7 +427,7 @@ base64 (unsigned int i)
if ((statep->__count & 0x18) == 0) \
{ \
/* base64 encoding inactive */ \
- if (isdirect (ch)) \
+ if (isdirect (ch, var)) \
{ \
*outptr++ = (unsigned char) ch; \
} \
@@ -330,7 +435,7 @@ base64 (unsigned int i)
{ \
size_t count; \
\
- if (ch == '+') \
+ if (ch == shift_character(var)) \
count = 2; \
else if (ch < 0x10000) \
count = 3; \
@@ -345,13 +450,13 @@ base64 (unsigned int i)
break; \
} \
\
- *outptr++ = '+'; \
- if (ch == '+') \
+ *outptr++ = shift_character(var); \
+ if (ch == shift_character(var)) \
*outptr++ = '-'; \
else if (ch < 0x10000) \
{ \
- *outptr++ = base64 (ch >> 10); \
- *outptr++ = base64 ((ch >> 4) & 0x3f); \
+ *outptr++ = base64 (ch >> 10, var); \
+ *outptr++ = base64 ((ch >> 4) & 0x3f, var); \
statep->__count = ((ch & 15) << 5) | (3 << 3); \
} \
else if (ch < 0x110000) \
@@ -360,11 +465,11 @@ base64 (unsigned int i)
uint32_t ch2 = 0xdc00 + ((ch - 0x10000) & 0x3ff); \
\
ch = (ch1 << 16) | ch2; \
- *outptr++ = base64 (ch >> 26); \
- *outptr++ = base64 ((ch >> 20) & 0x3f); \
- *outptr++ = base64 ((ch >> 14) & 0x3f); \
- *outptr++ = base64 ((ch >> 8) & 0x3f); \
- *outptr++ = base64 ((ch >> 2) & 0x3f); \
+ *outptr++ = base64 (ch >> 26, var); \
+ *outptr++ = base64 ((ch >> 20) & 0x3f, var); \
+ *outptr++ = base64 ((ch >> 14) & 0x3f, var); \
+ *outptr++ = base64 ((ch >> 8) & 0x3f, var); \
+ *outptr++ = base64 ((ch >> 2) & 0x3f, var); \
statep->__count = ((ch & 3) << 7) | (2 << 3); \
} \
else \
@@ -374,7 +479,7 @@ base64 (unsigned int i)
else \
{ \
/* base64 encoding active */ \
- if (isdirect (ch)) \
+ if (isdirect (ch, var)) \
{ \
/* deactivate base64 encoding */ \
size_t count; \
@@ -388,7 +493,7 @@ base64 (unsigned int i)
} \
\
if ((statep->__count & 0x18) >= 0x10) \
- *outptr++ = base64 ((statep->__count >> 3) & ~3); \
+ *outptr++ = base64 ((statep->__count >> 3) & ~3, var); \
if (needs_explicit_shift (ch)) \
*outptr++ = '-'; \
*outptr++ = (unsigned char) ch; \
@@ -416,22 +521,24 @@ base64 (unsigned int i)
switch ((statep->__count >> 3) & 3) \
{ \
case 1: \
- *outptr++ = base64 (ch >> 10); \
- *outptr++ = base64 ((ch >> 4) & 0x3f); \
+ *outptr++ = base64 (ch >> 10, var); \
+ *outptr++ = base64 ((ch >> 4) & 0x3f, var); \
statep->__count = ((ch & 15) << 5) | (3 << 3); \
break; \
case 2: \
*outptr++ = \
- base64 (((statep->__count >> 3) & ~3) | (ch >> 12)); \
- *outptr++ = base64 ((ch >> 6) & 0x3f); \
- *outptr++ = base64 (ch & 0x3f); \
+ base64 (((statep->__count >> 3) & ~3) | (ch >> 12), \
+ var); \
+ *outptr++ = base64 ((ch >> 6) & 0x3f, var); \
+ *outptr++ = base64 (ch & 0x3f, var); \
statep->__count = (1 << 3); \
break; \
case 3: \
*outptr++ = \
- base64 (((statep->__count >> 3) & ~3) | (ch >> 14)); \
- *outptr++ = base64 ((ch >> 8) & 0x3f); \
- *outptr++ = base64 ((ch >> 2) & 0x3f); \
+ base64 (((statep->__count >> 3) & ~3) | (ch >> 14), \
+ var); \
+ *outptr++ = base64 ((ch >> 8) & 0x3f, var); \
+ *outptr++ = base64 ((ch >> 2) & 0x3f, var); \
statep->__count = ((ch & 3) << 7) | (2 << 3); \
break; \
default: \
@@ -447,30 +554,32 @@ base64 (unsigned int i)
switch ((statep->__count >> 3) & 3) \
{ \
case 1: \
- *outptr++ = base64 (ch >> 26); \
- *outptr++ = base64 ((ch >> 20) & 0x3f); \
- *outptr++ = base64 ((ch >> 14) & 0x3f); \
- *outptr++ = base64 ((ch >> 8) & 0x3f); \
- *outptr++ = base64 ((ch >> 2) & 0x3f); \
+ *outptr++ = base64 (ch >> 26, var); \
+ *outptr++ = base64 ((ch >> 20) & 0x3f, var); \
+ *outptr++ = base64 ((ch >> 14) & 0x3f, var); \
+ *outptr++ = base64 ((ch >> 8) & 0x3f, var); \
+ *outptr++ = base64 ((ch >> 2) & 0x3f, var); \
statep->__count = ((ch & 3) << 7) | (2 << 3); \
break; \
case 2: \
*outptr++ = \
- base64 (((statep->__count >> 3) & ~3) | (ch >> 28)); \
- *outptr++ = base64 ((ch >> 22) & 0x3f); \
- *outptr++ = base64 ((ch >> 16) & 0x3f); \
- *outptr++ = base64 ((ch >> 10) & 0x3f); \
- *outptr++ = base64 ((ch >> 4) & 0x3f); \
+ base64 (((statep->__count >> 3) & ~3) | (ch >> 28), \
+ var); \
+ *outptr++ = base64 ((ch >> 22) & 0x3f, var); \
+ *outptr++ = base64 ((ch >> 16) & 0x3f, var); \
+ *outptr++ = base64 ((ch >> 10) & 0x3f, var); \
+ *outptr++ = base64 ((ch >> 4) & 0x3f, var); \
statep->__count = ((ch & 15) << 5) | (3 << 3); \
break; \
case 3: \
*outptr++ = \
- base64 (((statep->__count >> 3) & ~3) | (ch >> 30)); \
- *outptr++ = base64 ((ch >> 24) & 0x3f); \
- *outptr++ = base64 ((ch >> 18) & 0x3f); \
- *outptr++ = base64 ((ch >> 12) & 0x3f); \
- *outptr++ = base64 ((ch >> 6) & 0x3f); \
- *outptr++ = base64 (ch & 0x3f); \
+ base64 (((statep->__count >> 3) & ~3) | (ch >> 30), \
+ var); \
+ *outptr++ = base64 ((ch >> 24) & 0x3f, var); \
+ *outptr++ = base64 ((ch >> 18) & 0x3f, var); \
+ *outptr++ = base64 ((ch >> 12) & 0x3f, var); \
+ *outptr++ = base64 ((ch >> 6) & 0x3f, var); \
+ *outptr++ = base64 (ch & 0x3f, var); \
statep->__count = (1 << 3); \
break; \
default: \
@@ -486,7 +595,7 @@ base64 (unsigned int i)
inptr += 4; \
}
#define LOOP_NEED_FLAGS
-#define EXTRA_LOOP_DECLS , mbstate_t *statep
+#define EXTRA_LOOP_DECLS , mbstate_t *statep, enum variant var
#include <iconv/loop.c>
@@ -516,7 +625,7 @@ base64 (unsigned int i)
{ \
/* Write out the shift sequence. */ \
if ((state & 0x18) >= 0x10) \
- *outbuf++ = base64 ((state >> 3) & ~3); \
+ *outbuf++ = base64 ((state >> 3) & ~3, var); \
*outbuf++ = '-'; \
\
data->__statep->__count = 0; \
--
2.34.1
* Re: [PATCH v4 3/4] iconv: make utf-7.c able to use variants
2021-12-09 9:31 ` [PATCH v4 3/4] iconv: make utf-7.c able to use variants Max Gautier
@ 2022-03-07 12:34 ` Adhemerval Zanella
2022-03-12 11:07 ` Max Gautier
2022-03-20 16:42 ` [PATCH v5 " Max Gautier
1 sibling, 1 reply; 60+ messages in thread
From: Adhemerval Zanella @ 2022-03-07 12:34 UTC (permalink / raw)
To: Max Gautier, libc-alpha
On 09/12/2021 06:31, Max Gautier via Libc-alpha wrote:
> Add infrastructure in utf-7.c to handle variants. The approach comes from
> iso646.c
> The variant is defined at gconv_init time and is passed as a
> supplementary variable.
>
> Signed-off-by: Max Gautier <mg@max.gautier.name>
The patch looks OK in general. There are style issues that need to be fixed, and some
minor suggestions below.
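For readers unfamiliar with the iso646.c approach mentioned in the commit
message, here is a minimal, standalone sketch of the lookup: variant names are
stored back to back in one NUL-separated string, in the same order as the enum,
and the loop counts entries until one matches. This is an illustration, not the
patch's code: EXAMPLE_VARIANT, NO_VARIANT and lookup_variant are hypothetical,
and the public strchr and strcasecmp stand in for the glibc-internal
__rawmemchr and __strcasecmp.

```c
#include <string.h>
#include <strings.h>   /* strcasecmp */

/* Hypothetical variants; the patch itself only defines UTF7.  */
enum variant { UTF7, EXAMPLE_VARIANT, NO_VARIANT };

/* Must be in the same order as enum variant above.  Each name carries an
   explicit NUL, and an empty string marks the end of the list.  */
static const char names[] =
  "UTF-7//\0"
  "EXAMPLE-VARIANT//\0"
  "\0";

/* Return the variant whose name matches CHARSET (case-insensitively),
   or NO_VARIANT if the list is exhausted.  */
static enum variant
lookup_variant (const char *charset)
{
  int var = 0;

  for (const char *name = names; *name != '\0';
       name = strchr (name, '\0') + 1)   /* skip past this name's NUL */
    {
      if (strcasecmp (charset, name) == 0)
        return (enum variant) var;
      ++var;
    }
  return NO_VARIANT;
}
```

Because the index is derived purely from position, adding a variant means
adding one enum entry and one name in matching order, which is why the patch
carries the "same order" comment.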
> ---
> iconvdata/utf-7.c | 239 +++++++++++++++++++++++++++++++++-------------
> 1 file changed, 174 insertions(+), 65 deletions(-)
>
> diff --git a/iconvdata/utf-7.c b/iconvdata/utf-7.c
> index ac7d78141a..965d4220f1 100644
> --- a/iconvdata/utf-7.c
> +++ b/iconvdata/utf-7.c
> @@ -29,6 +29,24 @@
> #include <stdlib.h>
>
>
> +enum variant
> +{
> + UTF7,
> +};
> +
> +/* Must be in the same order as enum variant above. */
> +static const char names[] =
> + "UTF-7//\0"
> + "\0";
> +
> +static uint32_t
> +shift_character(enum variant const var)
> +{
> + if (var == UTF7)
> + return '+';
> + else
> + abort();
> +}
Please use the indentation expected in glibc [1], here and in other places as well.
[1] https://sourceware.org/glibc/wiki/Style_and_Conventions
>
> static int
> between(uint32_t const ch,
> @@ -38,37 +56,43 @@ between(uint32_t const ch,
> }
>
> /* The set of "direct characters":
> + FOR UTF-7
> A-Z a-z 0-9 ' ( ) , - . / : ? space tab lf cr
> */
>
> static int
> -isdirect (uint32_t ch)
> +isdirect (uint32_t ch, enum variant var)
> {
> - return (between(ch, 'A', 'Z')
> - || between(ch, 'a', 'z')
> - || between(ch, '0', '9')
> - || ch == '\'' || ch == '(' || ch == ')'
> - || between(ch, ',', '/')
> - || ch == ':' || ch == '?'
> - || ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r');
> + if (var == UTF7)
> + return (between(ch, 'A', 'Z')
> + || between(ch, 'a', 'z')
> + || between(ch, '0', '9')
> + || ch == '\'' || ch == '(' || ch == ')'
> + || between(ch, ',', '/')
> + || ch == ':' || ch == '?'
> + || ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r');
> + abort();
> }
>
>
> /* The set of "direct and optional direct characters":
> + (UTF-7 only)
> A-Z a-z 0-9 ' ( ) , - . / : ? space tab lf cr
> ! " # $ % & * ; < = > @ [ ] ^ _ ` { | }
> */
>
> -
> static int
> -isxdirect (uint32_t ch)
> +isxdirect (uint32_t ch, enum variant var)
> {
> - return (ch == '\t'
> - || ch == '\n'
> - || ch == '\r'
> - || (between(ch, ' ','}')
> - && ch != '+' && ch != '\\')
> - );
> + return(isdirect(ch, var)
> + || (var == UTF7 &&
> + (between(ch, '!', '&')
> + || ch == '*'
> + || between(ch, ';', '@')
> + || (between(ch, '[', '`') && ch != '\\')
> + || between(ch, '{', '}'))
> + )
> + );
> }
>
The change is OK, but maybe moving the variant check out makes it clearer:
if (var != UTF7)
return 0;
[...]
Also, since you are changing it, I think it would be better to use 'bool' as the
return type.
>
> @@ -91,7 +115,7 @@ needs_explicit_shift (uint32_t ch)
>
> /* Converts a value in the range 0..63 to a base64 encoded char. */
> static unsigned char
> -base64 (unsigned int i)
> +base64 (unsigned int i, enum variant var)
> {
> if (i < 26)
> return i + 'A';
> @@ -101,7 +125,7 @@ base64 (unsigned int i)
> return i - 52 + '0';
> else if (i == 62)
> return '+';
> - else if (i == 63)
> + else if (i == 63 && var == UTF7)
> return '/';
> else
> abort ();
> @@ -109,9 +133,8 @@ base64 (unsigned int i)
>
>
Ok.
> /* Definitions used in the body of the `gconv' function. */
> -#define CHARSET_NAME "UTF-7//"
> -#define DEFINE_INIT 1
> -#define DEFINE_FINI 1
> +#define DEFINE_INIT 0
> +#define DEFINE_FINI 0
> #define FROM_LOOP from_utf7_loop
> #define TO_LOOP to_utf7_loop
> #define MIN_NEEDED_FROM 1
> @@ -119,11 +142,27 @@ base64 (unsigned int i)
> #define MIN_NEEDED_TO 4
> #define MAX_NEEDED_TO 4
> #define ONE_DIRECTION 0
> +#define FROM_DIRECTION (dir == from_utf7)
> #define PREPARE_LOOP \
> mbstate_t saved_state; \
> - mbstate_t *statep = data->__statep;
> -#define EXTRA_LOOP_ARGS , statep
> + mbstate_t *statep = data->__statep; \
> + enum direction dir = ((struct utf7_data *) step->__data)->dir; \
> + enum direction var = ((struct utf7_data *) step->__data)->var;
> +#define EXTRA_LOOP_ARGS , statep, var
> +
> +
> +enum direction
> +{
> + illegal_dir,
> + from_utf7,
> + to_utf7
> +};
Style: use two spaces for indentation.
>
> +struct utf7_data
> +{
> + enum direction dir;
> + enum variant var;
> +};
>
> /* Since we might have to reset input pointer we must be able to save
> and restore the state. */
> @@ -133,6 +172,72 @@ base64 (unsigned int i)
> else \
> *statep = saved_state
>
> +extern int gconv_init (struct __gconv_step *step);
I think there is no need to add the prototype here.
> +int
> +gconv_init (struct __gconv_step *step)
> +{
> + /* Determine which direction. */
> + struct utf7_data *new_data;
> + enum direction dir = illegal_dir;
> +
> + enum variant var = 0;
> + for (const char *name = names; *name != '\0';
> + name = __rawmemchr (name, '\0') + 1)
> + {
> + if (__strcasecmp (step->__from_name, name) == 0)
> + {
> + dir = from_utf7;
> + break;
> + }
> + else if (__strcasecmp (step->__to_name, name) == 0)
> + {
> + dir = to_utf7;
> + break;
> + }
> + ++var;
> + }
> +
> + if (__builtin_expect (dir, from_utf7) != illegal_dir)
Use __glibc_likely.
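As a rough illustration of the difference: the original expression passes DIR
itself to __builtin_expect, so the hint is about the value of DIR, while
__glibc_likely puts the hint on the whole condition. The sketch below uses a
local stand-in macro (EXAMPLE_LIKELY, hypothetical) so that it builds outside
glibc; direction_is_valid is likewise made up for the example.

```c
/* Local stand-in for glibc's __glibc_likely, so the sketch does not
   depend on glibc-internal headers.  */
#if defined __GNUC__
# define EXAMPLE_LIKELY(cond) __builtin_expect ((cond), 1)
#else
# define EXAMPLE_LIKELY(cond) (cond)
#endif

enum direction
{
  illegal_dir,
  from_utf7,
  to_utf7
};

/* Hypothetical helper: return 1 when DIR names a usable direction.  */
static int
direction_is_valid (enum direction dir)
{
  /* Before: if (__builtin_expect (dir, from_utf7) != illegal_dir)
     After:  the branch hint covers the comparison as a whole.  */
  if (EXAMPLE_LIKELY (dir != illegal_dir))
    return 1;
  return 0;
}
```

The result of the test is the same either way; only the hint given to the
compiler changes.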
> + {
> + new_data = malloc (sizeof (*new_data));
> + if (new_data == NULL)
> + return __GCONV_NOMEM;
> +
> + new_data->dir = dir;
> + new_data->var = var;
> + step->__data = new_data;
> +
> + if (dir == from_utf7)
> + {
> + step->__min_needed_from = MIN_NEEDED_FROM;
> + step->__max_needed_from = MAX_NEEDED_FROM;
> + step->__min_needed_to = MIN_NEEDED_TO;
> + step->__max_needed_to = MAX_NEEDED_TO;
> + }
> + else
> + {
> + step->__min_needed_from = MIN_NEEDED_TO;
> + step->__max_needed_from = MAX_NEEDED_TO;
> + step->__min_needed_to = MIN_NEEDED_FROM;
> + step->__max_needed_to = MAX_NEEDED_FROM;
> + }
> + }
> + else
> + return __GCONV_NOCONV;
> +
> + step->__stateful = 1;
> +
> + return __GCONV_OK;
> +}
> +
> +extern void gconv_end (struct __gconv_step *data);
> +void
> +gconv_end (struct __gconv_step *data)
> +{
> + free (data->__data);
> +}
> +
> +
>
> /* First define the conversion function from UTF-7 to UCS4.
> The state is structured as follows:
> @@ -160,13 +265,13 @@ base64 (unsigned int i)
> if ((statep->__count >> 3) == 0) \
> { \
> /* base64 encoding inactive. */ \
> - if (isxdirect (ch)) \
> + if (isxdirect (ch, var)) \
> { \
> inptr++; \
> put32 (outptr, ch); \
> outptr += 4; \
> } \
> - else if (__glibc_likely (ch == '+')) \
> + else if (__glibc_likely (ch == shift_character(var))) \
> { \
> if (__glibc_unlikely (inptr + 2 > inend)) \
> { \
> @@ -291,7 +396,7 @@ base64 (unsigned int i)
> } \
> }
> #define LOOP_NEED_FLAGS
> -#define EXTRA_LOOP_DECLS , mbstate_t *statep
> +#define EXTRA_LOOP_DECLS , mbstate_t *statep, enum variant var
> #include <iconv/loop.c>
>
>
> @@ -322,7 +427,7 @@ base64 (unsigned int i)
> if ((statep->__count & 0x18) == 0) \
> { \
> /* base64 encoding inactive */ \
> - if (isdirect (ch)) \
> + if (isdirect (ch, var)) \
> { \
> *outptr++ = (unsigned char) ch; \
> } \
> @@ -330,7 +435,7 @@ base64 (unsigned int i)
> { \
> size_t count; \
> \
> - if (ch == '+') \
> + if (ch == shift_character(var)) \
> count = 2; \
> else if (ch < 0x10000) \
> count = 3; \
> @@ -345,13 +450,13 @@ base64 (unsigned int i)
> break; \
> } \
> \
> - *outptr++ = '+'; \
> - if (ch == '+') \
> + *outptr++ = shift_character(var); \
> + if (ch == shift_character(var)) \
> *outptr++ = '-'; \
> else if (ch < 0x10000) \
> { \
> - *outptr++ = base64 (ch >> 10); \
> - *outptr++ = base64 ((ch >> 4) & 0x3f); \
> + *outptr++ = base64 (ch >> 10, var); \
> + *outptr++ = base64 ((ch >> 4) & 0x3f, var); \
> statep->__count = ((ch & 15) << 5) | (3 << 3); \
> } \
> else if (ch < 0x110000) \
> @@ -360,11 +465,11 @@ base64 (unsigned int i)
> uint32_t ch2 = 0xdc00 + ((ch - 0x10000) & 0x3ff); \
> \
> ch = (ch1 << 16) | ch2; \
> - *outptr++ = base64 (ch >> 26); \
> - *outptr++ = base64 ((ch >> 20) & 0x3f); \
> - *outptr++ = base64 ((ch >> 14) & 0x3f); \
> - *outptr++ = base64 ((ch >> 8) & 0x3f); \
> - *outptr++ = base64 ((ch >> 2) & 0x3f); \
> + *outptr++ = base64 (ch >> 26, var); \
> + *outptr++ = base64 ((ch >> 20) & 0x3f, var); \
> + *outptr++ = base64 ((ch >> 14) & 0x3f, var); \
> + *outptr++ = base64 ((ch >> 8) & 0x3f, var); \
> + *outptr++ = base64 ((ch >> 2) & 0x3f, var); \
> statep->__count = ((ch & 3) << 7) | (2 << 3); \
> } \
> else \
> @@ -374,7 +479,7 @@ base64 (unsigned int i)
> else \
> { \
> /* base64 encoding active */ \
> - if (isdirect (ch)) \
> + if (isdirect (ch, var)) \
> { \
> /* deactivate base64 encoding */ \
> size_t count; \
> @@ -388,7 +493,7 @@ base64 (unsigned int i)
> } \
> \
> if ((statep->__count & 0x18) >= 0x10) \
> - *outptr++ = base64 ((statep->__count >> 3) & ~3); \
> + *outptr++ = base64 ((statep->__count >> 3) & ~3, var); \
> if (needs_explicit_shift (ch)) \
> *outptr++ = '-'; \
> *outptr++ = (unsigned char) ch; \
> @@ -416,22 +521,24 @@ base64 (unsigned int i)
> switch ((statep->__count >> 3) & 3) \
> { \
> case 1: \
> - *outptr++ = base64 (ch >> 10); \
> - *outptr++ = base64 ((ch >> 4) & 0x3f); \
> + *outptr++ = base64 (ch >> 10, var); \
> + *outptr++ = base64 ((ch >> 4) & 0x3f, var); \
> statep->__count = ((ch & 15) << 5) | (3 << 3); \
> break; \
> case 2: \
> *outptr++ = \
> - base64 (((statep->__count >> 3) & ~3) | (ch >> 12)); \
> - *outptr++ = base64 ((ch >> 6) & 0x3f); \
> - *outptr++ = base64 (ch & 0x3f); \
> + base64 (((statep->__count >> 3) & ~3) | (ch >> 12), \
> + var); \
> + *outptr++ = base64 ((ch >> 6) & 0x3f, var); \
> + *outptr++ = base64 (ch & 0x3f, var); \
> statep->__count = (1 << 3); \
> break; \
> case 3: \
> *outptr++ = \
> - base64 (((statep->__count >> 3) & ~3) | (ch >> 14)); \
> - *outptr++ = base64 ((ch >> 8) & 0x3f); \
> - *outptr++ = base64 ((ch >> 2) & 0x3f); \
> + base64 (((statep->__count >> 3) & ~3) | (ch >> 14), \
> + var); \
> + *outptr++ = base64 ((ch >> 8) & 0x3f, var); \
> + *outptr++ = base64 ((ch >> 2) & 0x3f, var); \
> statep->__count = ((ch & 3) << 7) | (2 << 3); \
> break; \
> default: \
> @@ -447,30 +554,32 @@ base64 (unsigned int i)
> switch ((statep->__count >> 3) & 3) \
> { \
> case 1: \
> - *outptr++ = base64 (ch >> 26); \
> - *outptr++ = base64 ((ch >> 20) & 0x3f); \
> - *outptr++ = base64 ((ch >> 14) & 0x3f); \
> - *outptr++ = base64 ((ch >> 8) & 0x3f); \
> - *outptr++ = base64 ((ch >> 2) & 0x3f); \
> + *outptr++ = base64 (ch >> 26, var); \
> + *outptr++ = base64 ((ch >> 20) & 0x3f, var); \
> + *outptr++ = base64 ((ch >> 14) & 0x3f, var); \
> + *outptr++ = base64 ((ch >> 8) & 0x3f, var); \
> + *outptr++ = base64 ((ch >> 2) & 0x3f, var); \
> statep->__count = ((ch & 3) << 7) | (2 << 3); \
> break; \
> case 2: \
> *outptr++ = \
> - base64 (((statep->__count >> 3) & ~3) | (ch >> 28)); \
> - *outptr++ = base64 ((ch >> 22) & 0x3f); \
> - *outptr++ = base64 ((ch >> 16) & 0x3f); \
> - *outptr++ = base64 ((ch >> 10) & 0x3f); \
> - *outptr++ = base64 ((ch >> 4) & 0x3f); \
> + base64 (((statep->__count >> 3) & ~3) | (ch >> 28), \
> + var); \
> + *outptr++ = base64 ((ch >> 22) & 0x3f, var); \
> + *outptr++ = base64 ((ch >> 16) & 0x3f, var); \
> + *outptr++ = base64 ((ch >> 10) & 0x3f, var); \
> + *outptr++ = base64 ((ch >> 4) & 0x3f, var); \
> statep->__count = ((ch & 15) << 5) | (3 << 3); \
> break; \
> case 3: \
> *outptr++ = \
> - base64 (((statep->__count >> 3) & ~3) | (ch >> 30)); \
> - *outptr++ = base64 ((ch >> 24) & 0x3f); \
> - *outptr++ = base64 ((ch >> 18) & 0x3f); \
> - *outptr++ = base64 ((ch >> 12) & 0x3f); \
> - *outptr++ = base64 ((ch >> 6) & 0x3f); \
> - *outptr++ = base64 (ch & 0x3f); \
> + base64 (((statep->__count >> 3) & ~3) | (ch >> 30), \
> + var); \
> + *outptr++ = base64 ((ch >> 24) & 0x3f, var); \
> + *outptr++ = base64 ((ch >> 18) & 0x3f, var); \
> + *outptr++ = base64 ((ch >> 12) & 0x3f, var); \
> + *outptr++ = base64 ((ch >> 6) & 0x3f, var); \
> + *outptr++ = base64 (ch & 0x3f, var); \
> statep->__count = (1 << 3); \
> break; \
> default: \
> @@ -486,7 +595,7 @@ base64 (unsigned int i)
> inptr += 4; \
> }
Ok
> #define LOOP_NEED_FLAGS
> -#define EXTRA_LOOP_DECLS , mbstate_t *statep
> +#define EXTRA_LOOP_DECLS , mbstate_t *statep, enum variant var
> #include <iconv/loop.c>
>
>
> @@ -516,7 +625,7 @@ base64 (unsigned int i)
> { \
> /* Write out the shift sequence. */ \
> if ((state & 0x18) >= 0x10) \
> - *outbuf++ = base64 ((state >> 3) & ~3); \
> + *outbuf++ = base64 ((state >> 3) & ~3, var); \
> *outbuf++ = '-'; \
> \
> data->__statep->__count = 0; \
Ok.
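To make the bit shuffling in the to_utf7 hunks easier to follow, here is a
small standalone sketch that encodes a single BMP code point and closes the
base64 run immediately. It reuses the shift, base64 and state expressions from
the patch, without the variant argument. The helper encode_one_bmp and its
buffer contract are hypothetical and exist only for this example; the
lowercase branch of base64 is not visible in the quoted hunks and is filled in
from the standard base64 alphabet.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Converts a value in the range 0..63 to a base64 encoded char.  */
static unsigned char
base64 (unsigned int i)
{
  if (i < 26)
    return i + 'A';
  else if (i < 52)
    return i - 26 + 'a';
  else if (i < 62)
    return i - 52 + '0';
  else if (i == 62)
    return '+';
  else if (i == 63)
    return '/';
  else
    abort ();
}

/* Hypothetical helper: encode one code point below 0x10000 that is not a
   direct character.  OUT needs room for 6 bytes (5 characters plus the
   terminating NUL).  Returns the number of characters written.  */
static size_t
encode_one_bmp (uint32_t ch, char *out)
{
  char *p = out;
  int state;

  *p++ = '+';                           /* shift into base64 */
  *p++ = base64 (ch >> 10);             /* top 6 bits */
  *p++ = base64 ((ch >> 4) & 0x3f);     /* next 6 bits */
  state = ((ch & 15) << 5) | (3 << 3);  /* 4 bits left pending */

  /* Flush the pending bits, then write the closing '-'.  */
  if ((state & 0x18) >= 0x10)
    *p++ = base64 ((state >> 3) & ~3);
  *p++ = '-';
  *p = '\0';
  return p - out;
}
```

For U+00E9 the three sextets are 0, 14 and 36, giving "+AOk-"; for U+20AC
they are 8, 10 and 48, giving "+IKw-".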
* Re: [PATCH v4 3/4] iconv: make utf-7.c able to use variants
2022-03-07 12:34 ` Adhemerval Zanella
@ 2022-03-12 11:07 ` Max Gautier
2022-03-14 12:17 ` Adhemerval Zanella
0 siblings, 1 reply; 60+ messages in thread
From: Max Gautier @ 2022-03-12 11:07 UTC (permalink / raw)
To: Adhemerval Zanella; +Cc: libc-alpha
On Mon, Mar 07, 2022 at 09:34:47AM -0300, Adhemerval Zanella wrote:
> > static int
> > -isxdirect (uint32_t ch)
> > +isxdirect (uint32_t ch, enum variant var)
> > {
> > - return (ch == '\t'
> > - || ch == '\n'
> > - || ch == '\r'
> > - || (between(ch, ' ','}')
> > - && ch != '+' && ch != '\\')
> > - );
> > + return(isdirect(ch, var)
> > + || (var == UTF7 &&
> > + (between(ch, '!', '&')
> > + || ch == '*'
> > + || between(ch, ';', '@')
> > + || (between(ch, '[', '`') && ch != '\\')
> > + || between(ch, '{', '}'))
> > + )
> > + );
> > }
> >
>
> The change is ok, but maybe adding the variant out makes it more clear:
>
> if (var != UTF7)
> return 0;
> [...]
>
Something like this?
if (isdirect(ch, var))
  return true;
if (var != UTF7)
  return false;
return (between(ch, '!', '&')
        || ch == '*'
        || between(ch, ';', '@')
        || (between(ch, '[', '`') && ch != '\\')
        || between(ch, '{', '}'));
I'd prefer the single expression form, but that works too.
> Also I think since you change it, it would be better to use 'bool' as
> return type.
Ok. I was not sure whether it was OK to use the bool type.
Thanks for the review.
--
Max Gautier
* Re: [PATCH v4 3/4] iconv: make utf-7.c able to use variants
2022-03-12 11:07 ` Max Gautier
@ 2022-03-14 12:17 ` Adhemerval Zanella
0 siblings, 0 replies; 60+ messages in thread
From: Adhemerval Zanella @ 2022-03-14 12:17 UTC (permalink / raw)
To: Max Gautier, libc-alpha
On 12/03/2022 08:07, Max Gautier wrote:
> On Mon, Mar 07, 2022 at 09:34:47AM -0300, Adhemerval Zanella wrote:
>>> static int
>>> -isxdirect (uint32_t ch)
>>> +isxdirect (uint32_t ch, enum variant var)
>>> {
>>> - return (ch == '\t'
>>> - || ch == '\n'
>>> - || ch == '\r'
>>> - || (between(ch, ' ','}')
>>> - && ch != '+' && ch != '\\')
>>> - );
>>> + return(isdirect(ch, var)
>>> + || (var == UTF7 &&
>>> + (between(ch, '!', '&')
>>> + || ch == '*'
>>> + || between(ch, ';', '@')
>>> + || (between(ch, '[', '`') && ch != '\\')
>>> + || between(ch, '{', '}'))
>>> + )
>>> + );
>>> }
>>>
>>
>> The change is ok, but maybe adding the variant out makes it more clear:
>>
>> if (var != UTF7)
>> return 0;
>> [...]
>>
> something like this ?
>
> if (isdirect(ch, var))
> return true;
> if (var != UTF7)
> return false;
> return (between(ch, '!', '&')
> || ch == '*'
> || between(ch, ';', '@')
> || (between(ch, '[', '`') && ch != '\\')
> || between(ch, '{', '}'))
> );
>
Yes, it is slightly better for readability (don't forget the space before '(';
also, the extra parentheses in the return are not required).
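Putting the points from this subthread together (early returns, 'bool' return
type, a space before '(' and no outer parentheses in the return), a standalone
sketch could look like the following. It assumes a single-variant enum, and
the body of between () is an assumption, since the patch only shows its
signature.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

enum variant
{
  UTF7,
};

/* Assumed body: inclusive range check.  */
static bool
between (uint32_t ch, uint32_t lower_bound, uint32_t upper_bound)
{
  return ch >= lower_bound && ch <= upper_bound;
}

/* The set of "direct characters" for UTF-7:
   A-Z a-z 0-9 ' ( ) , - . / : ? space tab lf cr  */
static bool
isdirect (uint32_t ch, enum variant var)
{
  if (var == UTF7)
    return (between (ch, 'A', 'Z')
            || between (ch, 'a', 'z')
            || between (ch, '0', '9')
            || ch == '\'' || ch == '(' || ch == ')'
            || between (ch, ',', '/')
            || ch == ':' || ch == '?'
            || ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r');
  abort ();
}

/* Direct characters plus, for UTF-7 only, the "optional direct" ones:
   ! " # $ % & * ; < = > @ [ ] ^ _ ` { | }  */
static bool
isxdirect (uint32_t ch, enum variant var)
{
  if (isdirect (ch, var))
    return true;
  if (var != UTF7)
    return false;
  return between (ch, '!', '&')
         || ch == '*'
         || between (ch, ';', '@')
         || (between (ch, '[', '`') && ch != '\\')
         || between (ch, '{', '}');
}
```

With this shape, '+' and '\\' fall through every range and are rejected, as in
the original single-expression version.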
> I'd prefer the single expression form, but that works too.
>
>> Also I think since you change it, it would be better to use 'bool' as
>> return type.
>
> Ok. I was not sure whether it was ok to use bool type or not.
We build internally with gnu11, so we can use most C11 facilities (there
are some spots, like atomics, that we are still migrating and that require extra
care so as not to call libatomic).
>
> Thanks for the review.
>
* [PATCH v5 3/4] iconv: make utf-7.c able to use variants
2021-12-09 9:31 ` [PATCH v4 3/4] iconv: make utf-7.c able to use variants Max Gautier
2022-03-07 12:34 ` Adhemerval Zanella
@ 2022-03-20 16:42 ` Max Gautier
2022-03-21 12:24 ` Adhemerval Zanella
1 sibling, 1 reply; 60+ messages in thread
From: Max Gautier @ 2022-03-20 16:42 UTC (permalink / raw)
To: libc-alpha; +Cc: Max Gautier
Add infrastructure in utf-7.c to handle variants. The approach comes from
iso646.c.
The variant is determined at gconv_init time and is passed to the
conversion loops as an extra argument.
Signed-off-by: Max Gautier <mg@max.gautier.name>
---
iconvdata/utf-7.c | 230 ++++++++++++++++++++++++++++++++++------------
1 file changed, 170 insertions(+), 60 deletions(-)
diff --git a/iconvdata/utf-7.c b/iconvdata/utf-7.c
index 15f3669ac8..b639d8ff3e 100644
--- a/iconvdata/utf-7.c
+++ b/iconvdata/utf-7.c
@@ -29,6 +29,24 @@
#include <stdlib.h>
+enum variant
+{
+ UTF7,
+};
+
+/* Must be in the same order as enum variant above. */
+static const char names[] =
+ "UTF-7//\0"
+ "\0";
+
+static uint32_t
+shift_character (enum variant const var)
+{
+ if (var == UTF7)
+ return '+';
+ else
+ abort ();
+}
static bool
between (uint32_t const ch,
@@ -38,23 +56,27 @@ between (uint32_t const ch,
}
/* The set of "direct characters":
+ FOR UTF-7
A-Z a-z 0-9 ' ( ) , - . / : ? space tab lf cr
*/
static bool
isdirect (uint32_t ch, enum variant var)
{
- return (between (ch, 'A', 'Z')
- || between (ch, 'a', 'z')
- || between (ch, '0', '9')
- || ch == '\'' || ch == '(' || ch == ')'
- || between (ch, ',', '/')
- || ch == ':' || ch == '?'
- || ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r');
+ if (var == UTF7)
+ return (between (ch, 'A', 'Z')
+ || between (ch, 'a', 'z')
+ || between (ch, '0', '9')
+ || ch == '\'' || ch == '(' || ch == ')'
+ || between (ch, ',', '/')
+ || ch == ':' || ch == '?'
+ || ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r');
+ abort ();
}
/* The set of "direct and optional direct characters":
+ (UTF-7 only)
A-Z a-z 0-9 ' ( ) , - . / : ? space tab lf cr
! " # $ % & * ; < = > @ [ ] ^ _ ` { | }
*/
@@ -62,10 +84,15 @@ isdirect (uint32_t ch, enum variant var)
static bool
isxdirect (uint32_t ch, enum variant var)
{
- return (ch == '\t'
- || ch == '\n'
- || ch == '\r'
- || (between (ch, ' ', '}') && ch != '+' && ch != '\\'));
+ if (isdirect (ch, var))
+ return true;
+ if (var != UTF7)
+ return false;
+ return between (ch, '!', '&')
+ || ch == '*'
+ || between (ch, ';', '@')
+ || (between (ch, '[', '`') && ch != '\\')
+ || between (ch, '{', '}');
}
@@ -85,7 +112,7 @@ needs_explicit_shift (uint32_t ch)
/* Converts a value in the range 0..63 to a base64 encoded char. */
static unsigned char
-base64 (unsigned int i)
+base64 (unsigned int i, enum variant var)
{
if (i < 26)
return i + 'A';
@@ -95,7 +122,7 @@ base64 (unsigned int i)
return i - 52 + '0';
else if (i == 62)
return '+';
- else if (i == 63)
+ else if (i == 63 && var == UTF7)
return '/';
else
abort ();
@@ -103,9 +130,8 @@ base64 (unsigned int i)
/* Definitions used in the body of the `gconv' function. */
-#define CHARSET_NAME "UTF-7//"
-#define DEFINE_INIT 1
-#define DEFINE_FINI 1
+#define DEFINE_INIT 0
+#define DEFINE_FINI 0
#define FROM_LOOP from_utf7_loop
#define TO_LOOP to_utf7_loop
#define MIN_NEEDED_FROM 1
@@ -113,11 +139,27 @@ base64 (unsigned int i)
#define MIN_NEEDED_TO 4
#define MAX_NEEDED_TO 4
#define ONE_DIRECTION 0
+#define FROM_DIRECTION (dir == from_utf7)
#define PREPARE_LOOP \
mbstate_t saved_state; \
- mbstate_t *statep = data->__statep;
-#define EXTRA_LOOP_ARGS , statep
+ mbstate_t *statep = data->__statep; \
+ enum direction dir = ((struct utf7_data *) step->__data)->dir; \
+ enum variant var = ((struct utf7_data *) step->__data)->var;
+#define EXTRA_LOOP_ARGS , statep, var
+
+enum direction
+{
+ illegal_dir,
+ from_utf7,
+ to_utf7
+};
+
+struct utf7_data
+{
+ enum direction dir;
+ enum variant var;
+};
/* Since we might have to reset input pointer we must be able to save
and restore the state. */
@@ -127,6 +169,70 @@ base64 (unsigned int i)
else \
*statep = saved_state
+int
+gconv_init (struct __gconv_step *step)
+{
+ /* Determine which direction. */
+ struct utf7_data *new_data;
+ enum direction dir = illegal_dir;
+
+ enum variant var = 0;
+ for (const char *name = names; *name != '\0';
+ name = __rawmemchr (name, '\0') + 1)
+ {
+ if (__strcasecmp (step->__from_name, name) == 0)
+ {
+ dir = from_utf7;
+ break;
+ }
+ else if (__strcasecmp (step->__to_name, name) == 0)
+ {
+ dir = to_utf7;
+ break;
+ }
+ ++var;
+ }
+
+ if (__glibc_likely (dir != illegal_dir))
+ {
+ new_data = malloc (sizeof (*new_data));
+ if (new_data == NULL)
+ return __GCONV_NOMEM;
+
+ new_data->dir = dir;
+ new_data->var = var;
+ step->__data = new_data;
+
+ if (dir == from_utf7)
+ {
+ step->__min_needed_from = MIN_NEEDED_FROM;
+ step->__max_needed_from = MAX_NEEDED_FROM;
+ step->__min_needed_to = MIN_NEEDED_TO;
+ step->__max_needed_to = MAX_NEEDED_TO;
+ }
+ else
+ {
+ step->__min_needed_from = MIN_NEEDED_TO;
+ step->__max_needed_from = MAX_NEEDED_TO;
+ step->__min_needed_to = MIN_NEEDED_FROM;
+ step->__max_needed_to = MAX_NEEDED_FROM;
+ }
+ }
+ else
+ return __GCONV_NOCONV;
+
+ step->__stateful = 1;
+
+ return __GCONV_OK;
+}
+
+void
+gconv_end (struct __gconv_step *data)
+{
+ free (data->__data);
+}
+
+
/* First define the conversion function from UTF-7 to UCS4.
The state is structured as follows:
@@ -154,13 +260,13 @@ base64 (unsigned int i)
if ((statep->__count >> 3) == 0) \
{ \
/* base64 encoding inactive. */ \
- if (isxdirect (ch)) \
+ if (isxdirect (ch, var)) \
{ \
inptr++; \
put32 (outptr, ch); \
outptr += 4; \
} \
- else if (__glibc_likely (ch == '+')) \
+ else if (__glibc_likely (ch == shift_character (var))) \
{ \
if (__glibc_unlikely (inptr + 2 > inend)) \
{ \
@@ -285,7 +391,7 @@ base64 (unsigned int i)
} \
}
#define LOOP_NEED_FLAGS
-#define EXTRA_LOOP_DECLS , mbstate_t *statep
+#define EXTRA_LOOP_DECLS , mbstate_t *statep, enum variant var
#include <iconv/loop.c>
@@ -316,7 +422,7 @@ base64 (unsigned int i)
if ((statep->__count & 0x18) == 0) \
{ \
/* base64 encoding inactive */ \
- if (isdirect (ch)) \
+ if (isdirect (ch, var)) \
{ \
*outptr++ = (unsigned char) ch; \
} \
@@ -324,7 +430,7 @@ base64 (unsigned int i)
{ \
size_t count; \
\
- if (ch == '+') \
+ if (ch == shift_character (var)) \
count = 2; \
else if (ch < 0x10000) \
count = 3; \
@@ -339,13 +445,13 @@ base64 (unsigned int i)
break; \
} \
\
- *outptr++ = '+'; \
- if (ch == '+') \
+ *outptr++ = shift_character (var); \
+ if (ch == shift_character (var)) \
*outptr++ = '-'; \
else if (ch < 0x10000) \
{ \
- *outptr++ = base64 (ch >> 10); \
- *outptr++ = base64 ((ch >> 4) & 0x3f); \
+ *outptr++ = base64 (ch >> 10, var); \
+ *outptr++ = base64 ((ch >> 4) & 0x3f, var); \
statep->__count = ((ch & 15) << 5) | (3 << 3); \
} \
else if (ch < 0x110000) \
@@ -354,11 +460,11 @@ base64 (unsigned int i)
uint32_t ch2 = 0xdc00 + ((ch - 0x10000) & 0x3ff); \
\
ch = (ch1 << 16) | ch2; \
- *outptr++ = base64 (ch >> 26); \
- *outptr++ = base64 ((ch >> 20) & 0x3f); \
- *outptr++ = base64 ((ch >> 14) & 0x3f); \
- *outptr++ = base64 ((ch >> 8) & 0x3f); \
- *outptr++ = base64 ((ch >> 2) & 0x3f); \
+ *outptr++ = base64 (ch >> 26, var); \
+ *outptr++ = base64 ((ch >> 20) & 0x3f, var); \
+ *outptr++ = base64 ((ch >> 14) & 0x3f, var); \
+ *outptr++ = base64 ((ch >> 8) & 0x3f, var); \
+ *outptr++ = base64 ((ch >> 2) & 0x3f, var); \
statep->__count = ((ch & 3) << 7) | (2 << 3); \
} \
else \
@@ -368,7 +474,7 @@ base64 (unsigned int i)
else \
{ \
/* base64 encoding active */ \
- if (isdirect (ch)) \
+ if (isdirect (ch, var)) \
{ \
/* deactivate base64 encoding */ \
size_t count; \
@@ -382,7 +488,7 @@ base64 (unsigned int i)
} \
\
if ((statep->__count & 0x18) >= 0x10) \
- *outptr++ = base64 ((statep->__count >> 3) & ~3); \
+ *outptr++ = base64 ((statep->__count >> 3) & ~3, var); \
if (needs_explicit_shift (ch)) \
*outptr++ = '-'; \
*outptr++ = (unsigned char) ch; \
@@ -410,22 +516,24 @@ base64 (unsigned int i)
switch ((statep->__count >> 3) & 3) \
{ \
case 1: \
- *outptr++ = base64 (ch >> 10); \
- *outptr++ = base64 ((ch >> 4) & 0x3f); \
+ *outptr++ = base64 (ch >> 10, var); \
+ *outptr++ = base64 ((ch >> 4) & 0x3f, var); \
statep->__count = ((ch & 15) << 5) | (3 << 3); \
break; \
case 2: \
*outptr++ = \
- base64 (((statep->__count >> 3) & ~3) | (ch >> 12)); \
- *outptr++ = base64 ((ch >> 6) & 0x3f); \
- *outptr++ = base64 (ch & 0x3f); \
+ base64 (((statep->__count >> 3) & ~3) | (ch >> 12), \
+ var); \
+ *outptr++ = base64 ((ch >> 6) & 0x3f, var); \
+ *outptr++ = base64 (ch & 0x3f, var); \
statep->__count = (1 << 3); \
break; \
case 3: \
*outptr++ = \
- base64 (((statep->__count >> 3) & ~3) | (ch >> 14)); \
- *outptr++ = base64 ((ch >> 8) & 0x3f); \
- *outptr++ = base64 ((ch >> 2) & 0x3f); \
+ base64 (((statep->__count >> 3) & ~3) | (ch >> 14), \
+ var); \
+ *outptr++ = base64 ((ch >> 8) & 0x3f, var); \
+ *outptr++ = base64 ((ch >> 2) & 0x3f, var); \
statep->__count = ((ch & 3) << 7) | (2 << 3); \
break; \
default: \
@@ -441,30 +549,32 @@ base64 (unsigned int i)
switch ((statep->__count >> 3) & 3) \
{ \
case 1: \
- *outptr++ = base64 (ch >> 26); \
- *outptr++ = base64 ((ch >> 20) & 0x3f); \
- *outptr++ = base64 ((ch >> 14) & 0x3f); \
- *outptr++ = base64 ((ch >> 8) & 0x3f); \
- *outptr++ = base64 ((ch >> 2) & 0x3f); \
+ *outptr++ = base64 (ch >> 26, var); \
+ *outptr++ = base64 ((ch >> 20) & 0x3f, var); \
+ *outptr++ = base64 ((ch >> 14) & 0x3f, var); \
+ *outptr++ = base64 ((ch >> 8) & 0x3f, var); \
+ *outptr++ = base64 ((ch >> 2) & 0x3f, var); \
statep->__count = ((ch & 3) << 7) | (2 << 3); \
break; \
case 2: \
*outptr++ = \
- base64 (((statep->__count >> 3) & ~3) | (ch >> 28)); \
- *outptr++ = base64 ((ch >> 22) & 0x3f); \
- *outptr++ = base64 ((ch >> 16) & 0x3f); \
- *outptr++ = base64 ((ch >> 10) & 0x3f); \
- *outptr++ = base64 ((ch >> 4) & 0x3f); \
+ base64 (((statep->__count >> 3) & ~3) | (ch >> 28), \
+ var); \
+ *outptr++ = base64 ((ch >> 22) & 0x3f, var); \
+ *outptr++ = base64 ((ch >> 16) & 0x3f, var); \
+ *outptr++ = base64 ((ch >> 10) & 0x3f, var); \
+ *outptr++ = base64 ((ch >> 4) & 0x3f, var); \
statep->__count = ((ch & 15) << 5) | (3 << 3); \
break; \
case 3: \
*outptr++ = \
- base64 (((statep->__count >> 3) & ~3) | (ch >> 30)); \
- *outptr++ = base64 ((ch >> 24) & 0x3f); \
- *outptr++ = base64 ((ch >> 18) & 0x3f); \
- *outptr++ = base64 ((ch >> 12) & 0x3f); \
- *outptr++ = base64 ((ch >> 6) & 0x3f); \
- *outptr++ = base64 (ch & 0x3f); \
+ base64 (((statep->__count >> 3) & ~3) | (ch >> 30), \
+ var); \
+ *outptr++ = base64 ((ch >> 24) & 0x3f, var); \
+ *outptr++ = base64 ((ch >> 18) & 0x3f, var); \
+ *outptr++ = base64 ((ch >> 12) & 0x3f, var); \
+ *outptr++ = base64 ((ch >> 6) & 0x3f, var); \
+ *outptr++ = base64 (ch & 0x3f, var); \
statep->__count = (1 << 3); \
break; \
default: \
@@ -480,7 +590,7 @@ base64 (unsigned int i)
inptr += 4; \
}
#define LOOP_NEED_FLAGS
-#define EXTRA_LOOP_DECLS , mbstate_t *statep
+#define EXTRA_LOOP_DECLS , mbstate_t *statep, enum variant var
#include <iconv/loop.c>
@@ -510,7 +620,7 @@ base64 (unsigned int i)
{ \
/* Write out the shift sequence. */ \
if ((state & 0x18) >= 0x10) \
- *outbuf++ = base64 ((state >> 3) & ~3); \
+ *outbuf++ = base64 ((state >> 3) & ~3, var); \
*outbuf++ = '-'; \
\
data->__statep->__count = 0; \
--
2.35.1
* Re: [PATCH v5 3/4] iconv: make utf-7.c able to use variants
2022-03-20 16:42 ` [PATCH v5 " Max Gautier
@ 2022-03-21 12:24 ` Adhemerval Zanella
0 siblings, 0 replies; 60+ messages in thread
From: Adhemerval Zanella @ 2022-03-21 12:24 UTC (permalink / raw)
To: Max Gautier, libc-alpha
On 20/03/2022 13:42, Max Gautier via Libc-alpha wrote:
> Add infrastructure in utf-7.c to handle variants. The approach comes from
> iso646.c
> The variant is defined at gconv_init time and is passed as a
> supplementary variable.
>
> Signed-off-by: Max Gautier <mg@max.gautier.name>
Patch looks ok, although it should be refactored to add the 'enum variant'
argument to isdirect and isxdirect here instead of relying on the previous
patch (to keep the patch consistent).
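
For readers following the variant lookup described in the commit message, here is a minimal standalone sketch of the same idea: walk a list of NUL-separated names and count entries to derive the enum value. It is not the glibc code. It assumes the portable strcasecmp and strchr in place of the internal __strcasecmp and __rawmemchr, the second variant name is taken from patch 4/4 of this series, and lookup_variant is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

enum variant
{
  UTF7,
  UTF_7_IMAP
};

/* Must be in the same order as enum variant above; the trailing empty
   string marks the end of the list.  */
static const char names[] =
  "UTF-7//\0"
  "UTF-7-IMAP//\0"
  "\0";

/* Hypothetical helper: return the variant index matching CHARSET
   (compared case-insensitively), or -1 if it is not in the list.  */
static int
lookup_variant (const char *charset)
{
  assert (charset != NULL);

  int var = 0;
  for (const char *name = names; *name != '\0';
       name = strchr (name, '\0') + 1)
    {
      if (strcasecmp (charset, name) == 0)
        return var;
      ++var;
    }
  return -1;
}
```

Because the index is derived purely from the position in names, adding a variant means adding one enum value and one string in the same position, which is why the comment about ordering matters.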
> ---
> iconvdata/utf-7.c | 230 ++++++++++++++++++++++++++++++++++------------
> 1 file changed, 170 insertions(+), 60 deletions(-)
>
> diff --git a/iconvdata/utf-7.c b/iconvdata/utf-7.c
> index 15f3669ac8..b639d8ff3e 100644
> --- a/iconvdata/utf-7.c
> +++ b/iconvdata/utf-7.c
> @@ -29,6 +29,24 @@
> #include <stdlib.h>
>
>
> +enum variant
> +{
> + UTF7,
> +};
> +
> +/* Must be in the same order as enum variant above. */
> +static const char names[] =
> + "UTF-7//\0"
> + "\0";
> +
> +static uint32_t
> +shift_character (enum variant const var)
> +{
> + if (var == UTF7)
> + return '+';
> + else
> + abort ();
> +}
>
> static bool
> between (uint32_t const ch,
> @@ -38,23 +56,27 @@ between (uint32_t const ch,
> }
>
> /* The set of "direct characters":
> + FOR UTF-7
> A-Z a-z 0-9 ' ( ) , - . / : ? space tab lf cr
> */
>
> static bool
> isdirect (uint32_t ch, enum variant var)
> {
> - return (between (ch, 'A', 'Z')
> - || between (ch, 'a', 'z')
> - || between (ch, '0', '9')
> - || ch == '\'' || ch == '(' || ch == ')'
> - || between (ch, ',', '/')
> - || ch == ':' || ch == '?'
> - || ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r');
> + if (var == UTF7)
> + return (between (ch, 'A', 'Z')
> + || between (ch, 'a', 'z')
> + || between (ch, '0', '9')
> + || ch == '\'' || ch == '(' || ch == ')'
> + || between (ch, ',', '/')
> + || ch == ':' || ch == '?'
> + || ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r');
> + abort ();
> }
>
>
> /* The set of "direct and optional direct characters":
> + (UTF-7 only)
> A-Z a-z 0-9 ' ( ) , - . / : ? space tab lf cr
> ! " # $ % & * ; < = > @ [ ] ^ _ ` { | }
> */
> @@ -62,10 +84,15 @@ isdirect (uint32_t ch, enum variant var)
> static bool
> isxdirect (uint32_t ch, enum variant var)
> {
> - return (ch == '\t'
> - || ch == '\n'
> - || ch == '\r'
> - || (between (ch, ' ', '}') && ch != '+' && ch != '\\'));
> + if (isdirect (ch, var))
> + return true;
> + if (var != UTF7)
> + return false;
> + return between (ch, '!', '&')
> + || ch == '*'
> + || between (ch, ';', '@')
> + || (between (ch, '[', '`') && ch != '\\')
> + || between (ch, '{', '}');
> }
>
>
> @@ -85,7 +112,7 @@ needs_explicit_shift (uint32_t ch)
>
> /* Converts a value in the range 0..63 to a base64 encoded char. */
> static unsigned char
> -base64 (unsigned int i)
> +base64 (unsigned int i, enum variant var)
> {
> if (i < 26)
> return i + 'A';
> @@ -95,7 +122,7 @@ base64 (unsigned int i)
> return i - 52 + '0';
> else if (i == 62)
> return '+';
> - else if (i == 63)
> + else if (i == 63 && var == UTF7)
> return '/';
> else
> abort ();
> @@ -103,9 +130,8 @@ base64 (unsigned int i)
>
>
> /* Definitions used in the body of the `gconv' function. */
> -#define CHARSET_NAME "UTF-7//"
> -#define DEFINE_INIT 1
> -#define DEFINE_FINI 1
> +#define DEFINE_INIT 0
> +#define DEFINE_FINI 0
> #define FROM_LOOP from_utf7_loop
> #define TO_LOOP to_utf7_loop
> #define MIN_NEEDED_FROM 1
> @@ -113,11 +139,27 @@ base64 (unsigned int i)
> #define MIN_NEEDED_TO 4
> #define MAX_NEEDED_TO 4
> #define ONE_DIRECTION 0
> +#define FROM_DIRECTION (dir == from_utf7)
> #define PREPARE_LOOP \
> mbstate_t saved_state; \
> - mbstate_t *statep = data->__statep;
> -#define EXTRA_LOOP_ARGS , statep
> + mbstate_t *statep = data->__statep; \
> + enum direction dir = ((struct utf7_data *) step->__data)->dir; \
> + enum variant var = ((struct utf7_data *) step->__data)->var;
> +#define EXTRA_LOOP_ARGS , statep, var
> +
>
> +enum direction
> +{
> + illegal_dir,
> + from_utf7,
> + to_utf7
> +};
> +
> +struct utf7_data
> +{
> + enum direction dir;
> + enum variant var;
> +};
>
> /* Since we might have to reset input pointer we must be able to save
> and restore the state. */
> @@ -127,6 +169,70 @@ base64 (unsigned int i)
> else \
> *statep = saved_state
>
> +int
> +gconv_init (struct __gconv_step *step)
> +{
> + /* Determine which direction. */
> + struct utf7_data *new_data;
> + enum direction dir = illegal_dir;
> +
> + enum variant var = 0;
> + for (const char *name = names; *name != '\0';
> + name = __rawmemchr (name, '\0') + 1)
> + {
> + if (__strcasecmp (step->__from_name, name) == 0)
> + {
> + dir = from_utf7;
> + break;
> + }
> + else if (__strcasecmp (step->__to_name, name) == 0)
> + {
> + dir = to_utf7;
> + break;
> + }
> + ++var;
> + }
> +
> + if (__glibc_likely (dir != illegal_dir))
> + {
> + new_data = malloc (sizeof (*new_data));
> + if (new_data == NULL)
> + return __GCONV_NOMEM;
> +
> + new_data->dir = dir;
> + new_data->var = var;
> + step->__data = new_data;
> +
> + if (dir == from_utf7)
> + {
> + step->__min_needed_from = MIN_NEEDED_FROM;
> + step->__max_needed_from = MAX_NEEDED_FROM;
> + step->__min_needed_to = MIN_NEEDED_TO;
> + step->__max_needed_to = MAX_NEEDED_TO;
> + }
> + else
> + {
> + step->__min_needed_from = MIN_NEEDED_TO;
> + step->__max_needed_from = MAX_NEEDED_TO;
> + step->__min_needed_to = MIN_NEEDED_FROM;
> + step->__max_needed_to = MAX_NEEDED_FROM;
> + }
> + }
> + else
> + return __GCONV_NOCONV;
> +
> + step->__stateful = 1;
> +
> + return __GCONV_OK;
> +}
> +
> +void
> +gconv_end (struct __gconv_step *data)
> +{
> + free (data->__data);
> +}
> +
> +
>
> /* First define the conversion function from UTF-7 to UCS4.
> The state is structured as follows:
> @@ -154,13 +260,13 @@ base64 (unsigned int i)
> if ((statep->__count >> 3) == 0) \
> { \
> /* base64 encoding inactive. */ \
> - if (isxdirect (ch)) \
> + if (isxdirect (ch, var)) \
> { \
> inptr++; \
> put32 (outptr, ch); \
> outptr += 4; \
> } \
> - else if (__glibc_likely (ch == '+')) \
> + else if (__glibc_likely (ch == shift_character (var))) \
> { \
> if (__glibc_unlikely (inptr + 2 > inend)) \
> { \
> @@ -285,7 +391,7 @@ base64 (unsigned int i)
> } \
> }
> #define LOOP_NEED_FLAGS
> -#define EXTRA_LOOP_DECLS , mbstate_t *statep
> +#define EXTRA_LOOP_DECLS , mbstate_t *statep, enum variant var
> #include <iconv/loop.c>
>
>
> @@ -316,7 +422,7 @@ base64 (unsigned int i)
> if ((statep->__count & 0x18) == 0) \
> { \
> /* base64 encoding inactive */ \
> - if (isdirect (ch)) \
> + if (isdirect (ch, var)) \
> { \
> *outptr++ = (unsigned char) ch; \
> } \
> @@ -324,7 +430,7 @@ base64 (unsigned int i)
> { \
> size_t count; \
> \
> - if (ch == '+') \
> + if (ch == shift_character (var)) \
> count = 2; \
> else if (ch < 0x10000) \
> count = 3; \
> @@ -339,13 +445,13 @@ base64 (unsigned int i)
> break; \
> } \
> \
> - *outptr++ = '+'; \
> - if (ch == '+') \
> + *outptr++ = shift_character (var); \
> + if (ch == shift_character (var)) \
> *outptr++ = '-'; \
> else if (ch < 0x10000) \
> { \
> - *outptr++ = base64 (ch >> 10); \
> - *outptr++ = base64 ((ch >> 4) & 0x3f); \
> + *outptr++ = base64 (ch >> 10, var); \
> + *outptr++ = base64 ((ch >> 4) & 0x3f, var); \
> statep->__count = ((ch & 15) << 5) | (3 << 3); \
> } \
> else if (ch < 0x110000) \
> @@ -354,11 +460,11 @@ base64 (unsigned int i)
> uint32_t ch2 = 0xdc00 + ((ch - 0x10000) & 0x3ff); \
> \
> ch = (ch1 << 16) | ch2; \
> - *outptr++ = base64 (ch >> 26); \
> - *outptr++ = base64 ((ch >> 20) & 0x3f); \
> - *outptr++ = base64 ((ch >> 14) & 0x3f); \
> - *outptr++ = base64 ((ch >> 8) & 0x3f); \
> - *outptr++ = base64 ((ch >> 2) & 0x3f); \
> + *outptr++ = base64 (ch >> 26, var); \
> + *outptr++ = base64 ((ch >> 20) & 0x3f, var); \
> + *outptr++ = base64 ((ch >> 14) & 0x3f, var); \
> + *outptr++ = base64 ((ch >> 8) & 0x3f, var); \
> + *outptr++ = base64 ((ch >> 2) & 0x3f, var); \
> statep->__count = ((ch & 3) << 7) | (2 << 3); \
> } \
> else \
> @@ -368,7 +474,7 @@ base64 (unsigned int i)
> else \
> { \
> /* base64 encoding active */ \
> - if (isdirect (ch)) \
> + if (isdirect (ch, var)) \
> { \
> /* deactivate base64 encoding */ \
> size_t count; \
> @@ -382,7 +488,7 @@ base64 (unsigned int i)
> } \
> \
> if ((statep->__count & 0x18) >= 0x10) \
> - *outptr++ = base64 ((statep->__count >> 3) & ~3); \
> + *outptr++ = base64 ((statep->__count >> 3) & ~3, var); \
> if (needs_explicit_shift (ch)) \
> *outptr++ = '-'; \
> *outptr++ = (unsigned char) ch; \
> @@ -410,22 +516,24 @@ base64 (unsigned int i)
> switch ((statep->__count >> 3) & 3) \
> { \
> case 1: \
> - *outptr++ = base64 (ch >> 10); \
> - *outptr++ = base64 ((ch >> 4) & 0x3f); \
> + *outptr++ = base64 (ch >> 10, var); \
> + *outptr++ = base64 ((ch >> 4) & 0x3f, var); \
> statep->__count = ((ch & 15) << 5) | (3 << 3); \
> break; \
> case 2: \
> *outptr++ = \
> - base64 (((statep->__count >> 3) & ~3) | (ch >> 12)); \
> - *outptr++ = base64 ((ch >> 6) & 0x3f); \
> - *outptr++ = base64 (ch & 0x3f); \
> + base64 (((statep->__count >> 3) & ~3) | (ch >> 12), \
> + var); \
> + *outptr++ = base64 ((ch >> 6) & 0x3f, var); \
> + *outptr++ = base64 (ch & 0x3f, var); \
> statep->__count = (1 << 3); \
> break; \
> case 3: \
> *outptr++ = \
> - base64 (((statep->__count >> 3) & ~3) | (ch >> 14)); \
> - *outptr++ = base64 ((ch >> 8) & 0x3f); \
> - *outptr++ = base64 ((ch >> 2) & 0x3f); \
> + base64 (((statep->__count >> 3) & ~3) | (ch >> 14), \
> + var); \
> + *outptr++ = base64 ((ch >> 8) & 0x3f, var); \
> + *outptr++ = base64 ((ch >> 2) & 0x3f, var); \
> statep->__count = ((ch & 3) << 7) | (2 << 3); \
> break; \
> default: \
> @@ -441,30 +549,32 @@ base64 (unsigned int i)
> switch ((statep->__count >> 3) & 3) \
> { \
> case 1: \
> - *outptr++ = base64 (ch >> 26); \
> - *outptr++ = base64 ((ch >> 20) & 0x3f); \
> - *outptr++ = base64 ((ch >> 14) & 0x3f); \
> - *outptr++ = base64 ((ch >> 8) & 0x3f); \
> - *outptr++ = base64 ((ch >> 2) & 0x3f); \
> + *outptr++ = base64 (ch >> 26, var); \
> + *outptr++ = base64 ((ch >> 20) & 0x3f, var); \
> + *outptr++ = base64 ((ch >> 14) & 0x3f, var); \
> + *outptr++ = base64 ((ch >> 8) & 0x3f, var); \
> + *outptr++ = base64 ((ch >> 2) & 0x3f, var); \
> statep->__count = ((ch & 3) << 7) | (2 << 3); \
> break; \
> case 2: \
> *outptr++ = \
> - base64 (((statep->__count >> 3) & ~3) | (ch >> 28)); \
> - *outptr++ = base64 ((ch >> 22) & 0x3f); \
> - *outptr++ = base64 ((ch >> 16) & 0x3f); \
> - *outptr++ = base64 ((ch >> 10) & 0x3f); \
> - *outptr++ = base64 ((ch >> 4) & 0x3f); \
> + base64 (((statep->__count >> 3) & ~3) | (ch >> 28), \
> + var); \
> + *outptr++ = base64 ((ch >> 22) & 0x3f, var); \
> + *outptr++ = base64 ((ch >> 16) & 0x3f, var); \
> + *outptr++ = base64 ((ch >> 10) & 0x3f, var); \
> + *outptr++ = base64 ((ch >> 4) & 0x3f, var); \
> statep->__count = ((ch & 15) << 5) | (3 << 3); \
> break; \
> case 3: \
> *outptr++ = \
> - base64 (((statep->__count >> 3) & ~3) | (ch >> 30)); \
> - *outptr++ = base64 ((ch >> 24) & 0x3f); \
> - *outptr++ = base64 ((ch >> 18) & 0x3f); \
> - *outptr++ = base64 ((ch >> 12) & 0x3f); \
> - *outptr++ = base64 ((ch >> 6) & 0x3f); \
> - *outptr++ = base64 (ch & 0x3f); \
> + base64 (((statep->__count >> 3) & ~3) | (ch >> 30), \
> + var); \
> + *outptr++ = base64 ((ch >> 24) & 0x3f, var); \
> + *outptr++ = base64 ((ch >> 18) & 0x3f, var); \
> + *outptr++ = base64 ((ch >> 12) & 0x3f, var); \
> + *outptr++ = base64 ((ch >> 6) & 0x3f, var); \
> + *outptr++ = base64 (ch & 0x3f, var); \
> statep->__count = (1 << 3); \
> break; \
> default: \
> @@ -480,7 +590,7 @@ base64 (unsigned int i)
> inptr += 4; \
> }
> #define LOOP_NEED_FLAGS
> -#define EXTRA_LOOP_DECLS , mbstate_t *statep
> +#define EXTRA_LOOP_DECLS , mbstate_t *statep, enum variant var
> #include <iconv/loop.c>
>
>
> @@ -510,7 +620,7 @@ base64 (unsigned int i)
> { \
> /* Write out the shift sequence. */ \
> if ((state & 0x18) >= 0x10) \
> - *outbuf++ = base64 ((state >> 3) & ~3); \
> + *outbuf++ = base64 ((state >> 3) & ~3, var); \
> *outbuf++ = '-'; \
> \
> data->__statep->__count = 0; \
* [PATCH v4 4/4] iconv: Add UTF-7-IMAP variant in utf-7.c
2021-12-09 9:31 ` [PATCH v4 0/4] iconv: Add support for UTF-7-IMAP Max Gautier
` (2 preceding siblings ...)
2021-12-09 9:31 ` [PATCH v4 3/4] iconv: make utf-7.c able to use variants Max Gautier
@ 2021-12-09 9:31 ` Max Gautier
2022-03-07 12:46 ` Adhemerval Zanella
2022-03-20 16:43 ` [PATCH v5 " Max Gautier
2021-12-17 13:15 ` [PATCH v4 0/4] iconv: Add support for UTF-7-IMAP Max Gautier
` (2 subsequent siblings)
6 siblings, 2 replies; 60+ messages in thread
From: Max Gautier @ 2021-12-09 9:31 UTC (permalink / raw)
To: libc-alpha; +Cc: Max Gautier
UTF-7-IMAP differs from UTF-7 in the following ways (see RFC 3501[1]
for reference):
- The shift character is '&' instead of '+'
- There are no "optional direct characters" and the "direct characters"
  set is different
- ',' replaces '/' in the Modified Base64 alphabet
- There is no implicit shift back to US-ASCII from BASE64; all BASE64
  sequences MUST be terminated with '-'
[1]: https://datatracker.ietf.org/doc/html/rfc3501#section-5.1.3
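
The differences listed above can be illustrated with a small standalone encoder. This is only a sketch under stated assumptions, not the code from this patch: it takes UTF-16 code units as input (so surrogate pairs are passed through as given), imap_utf7_encode and mb64 are hypothetical names, and it always shifts back as soon as a printable ASCII character follows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Modified Base64 alphabet: ',' takes the place of '/'.  */
static const char mb64[] =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+,";

/* Hypothetical helper: encode N UTF-16 code units from IN into OUT as
   NUL-terminated UTF-7-IMAP.  Returns the length written, or
   (size_t) -1 if OUT (OUTSIZE bytes) is too small.  */
static size_t
imap_utf7_encode (const uint16_t *in, size_t n, char *out, size_t outsize)
{
  size_t o = 0;
  uint32_t bits = 0;   /* Pending bits, right-aligned.  */
  int nbits = 0;       /* Number of pending bits (always < 6).  */
  int shifted = 0;     /* Nonzero while inside a '&'...'-' sequence.  */

  assert (in != NULL && out != NULL && outsize > 0);

#define PUT(c) \
  do { if (o + 1 >= outsize) return (size_t) -1; out[o++] = (char) (c); } \
  while (0)

  for (size_t i = 0; i <= n; i++)
    {
      int at_end = (i == n);
      uint16_t u = at_end ? 0 : in[i];
      /* Printable ASCII is direct; '&' needs the extra '-' below.  */
      int direct = !at_end && u >= 0x20 && u <= 0x7e;

      if (shifted && (at_end || direct))
        {
          /* Flush leftover bits, then terminate explicitly with '-'.  */
          if (nbits > 0)
            PUT (mb64[(bits << (6 - nbits)) & 0x3f]);
          PUT ('-');
          shifted = 0;
          bits = 0;
          nbits = 0;
        }
      if (at_end)
        break;

      if (direct)
        {
          PUT (u);
          if (u == '&')
            PUT ('-');          /* A literal '&' is written as "&-".  */
        }
      else
        {
          if (!shifted)
            {
              PUT ('&');        /* Shift character is '&', not '+'.  */
              shifted = 1;
            }
          bits = (bits << 16) | u;
          nbits += 16;
          while (nbits >= 6)
            {
              nbits -= 6;
              PUT (mb64[(bits >> nbits) & 0x3f]);
            }
          bits &= (1u << nbits) - 1u;
        }
    }
#undef PUT

  out[o] = '\0';
  return o;
}
```

Feeding it the code units of "Français" yields "Fran&AOc-ais", which matches the corresponding fragment of the testdata/UTF-7-IMAP file added by this patch.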
Signed-off-by: Max Gautier <mg@max.gautier.name>
---
iconvdata/TESTS | 1 +
iconvdata/gconv-modules | 4 ++++
iconvdata/testdata/UTF-7-IMAP | 1 +
iconvdata/testdata/UTF-7-IMAP..UTF8 | 32 +++++++++++++++++++++++++++++
iconvdata/utf-7.c | 28 ++++++++++++++++++++-----
5 files changed, 61 insertions(+), 5 deletions(-)
create mode 100644 iconvdata/testdata/UTF-7-IMAP
create mode 100644 iconvdata/testdata/UTF-7-IMAP..UTF8
diff --git a/iconvdata/TESTS b/iconvdata/TESTS
index a0157c3350..3cc043c21b 100644
--- a/iconvdata/TESTS
+++ b/iconvdata/TESTS
@@ -94,6 +94,7 @@ EUC-TW EUC-TW Y UTF8
GBK GBK Y UTF8
BIG5HKSCS BIG5HKSCS Y UTF8
UTF-7 UTF-7 N UTF8
+UTF-7-IMAP UTF-7-IMAP N UTF8
IBM856 IBM856 N UTF8
IBM922 IBM922 Y UTF8
IBM930 IBM930 N UTF8
diff --git a/iconvdata/gconv-modules b/iconvdata/gconv-modules
index 4acbba062f..d120699394 100644
--- a/iconvdata/gconv-modules
+++ b/iconvdata/gconv-modules
@@ -113,3 +113,7 @@ module INTERNAL UTF-32BE// UTF-32 1
alias UTF7// UTF-7//
module UTF-7// INTERNAL UTF-7 1
module INTERNAL UTF-7// UTF-7 1
+
+# from to module cost
+module UTF-7-IMAP// INTERNAL UTF-7 1
+module INTERNAL UTF-7-IMAP// UTF-7 1
diff --git a/iconvdata/testdata/UTF-7-IMAP b/iconvdata/testdata/UTF-7-IMAP
new file mode 100644
index 0000000000..6b5dada63c
--- /dev/null
+++ b/iconvdata/testdata/UTF-7-IMAP
@@ -0,0 +1 @@
+&EqASGxItEps- Amharic&AAoBDQ-esky Czech&AAo-Dansk Danish&AAo-English English&AAo-Suomi Finnish&AAo-Fran&AOc-ais French&AAo-Deutsch German&AAoDlQO7A7sDtwO9A7kDugOs- Greek&AAoF4gXRBegF2QXq- Hebrew&AAo-Italiano Italian&AAo-Norsk Norwegian&AAoEIARDBEEEQQQ6BDgEOQ- Russian&AAo-Espa&APE-ol Spanish&AAo-Svenska Swedish&AAoOIA4yDikOMg5EDhcOIg- Thai&AAo-T&APw-rk&AOc-e Turkish&AAo-Ti&Hr8-ng Vi&Hsc-t Vietnamese&AApl5Wcsip4- Japanese&AApOLWWH- Chinese&AArVXK4A- Korean&AAoACg-// Checking for correct handling of shift characters ('&-', '-') after base64 sequences&AArVXK4A-&-&AArVXK4A--&AAoACg-// Checking for correct handling of litteral '&-' and '-'&AAo----&-&--&AAoACg-// The last line of this file is missing the end-of-line terminator&AAo-// on purpose, in order to test that the conversion empties the bit buffer&AAo-// and shifts back to the initial state at the end of the conversion.&AAo-A&ImIDkQ-
\ No newline at end of file
diff --git a/iconvdata/testdata/UTF-7-IMAP..UTF8 b/iconvdata/testdata/UTF-7-IMAP..UTF8
new file mode 100644
index 0000000000..8b9add3670
--- /dev/null
+++ b/iconvdata/testdata/UTF-7-IMAP..UTF8
@@ -0,0 +1,32 @@
+አማርኛ Amharic
+česky Czech
+Dansk Danish
+English English
+Suomi Finnish
+Français French
+Deutsch German
+Ελληνικά Greek
+עברית Hebrew
+Italiano Italian
+Norsk Norwegian
+Русский Russian
+Español Spanish
+Svenska Swedish
+ภาษาไทย Thai
+Türkçe Turkish
+Tiếng Việt Vietnamese
+日本語 Japanese
+中文 Chinese
+한글 Korean
+
+// Checking for correct handling of shift characters ('&', '-') after base64 sequences
+한글&
+한글-
+
+// Checking for correct handling of litteral '&' and '-'
+---&&-
+
+// The last line of this file is missing the end-of-line terminator
+// on purpose, in order to test that the conversion empties the bit buffer
+// and shifts back to the initial state at the end of the conversion.
+A≢Α
\ No newline at end of file
diff --git a/iconvdata/utf-7.c b/iconvdata/utf-7.c
index 965d4220f1..553636e324 100644
--- a/iconvdata/utf-7.c
+++ b/iconvdata/utf-7.c
@@ -32,11 +32,13 @@
enum variant
{
UTF7,
+ UTF_7_IMAP
};
/* Must be in the same order as enum variant above. */
static const char names[] =
"UTF-7//\0"
+ "UTF-7-IMAP//\0"
"\0";
static uint32_t
@@ -44,6 +46,8 @@ shift_character(enum variant const var)
{
if (var == UTF7)
return '+';
+ else if (var == UTF_7_IMAP)
+ return '&';
else
abort();
}
@@ -58,6 +62,9 @@ between(uint32_t const ch,
/* The set of "direct characters":
FOR UTF-7
A-Z a-z 0-9 ' ( ) , - . / : ? space tab lf cr
+ FOR UTF-7-IMAP
+ A-Z a-z 0-9 ' ( ) , - . / : ? space
+ ! " # $ % + * ; < = > @ [ \ ] ^ _ ` { | } ~
*/
static int
@@ -71,6 +78,8 @@ isdirect (uint32_t ch, enum variant var)
|| between(ch, ',', '/')
|| ch == ':' || ch == '?'
|| ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r');
+ else if (var == UTF_7_IMAP)
+ return (ch != '&' && between(ch, ' ', '~'));
abort();
}
@@ -127,6 +136,8 @@ base64 (unsigned int i, enum variant var)
return '+';
else if (i == 63 && var == UTF7)
return '/';
+ else if (i == 63 && var == UTF_7_IMAP)
+ return ',';
else
abort ();
}
@@ -313,7 +324,8 @@ gconv_end (struct __gconv_step *data)
i = ch - '0' + 52; \
else if (ch == '+') \
i = 62; \
- else if (ch == '/') \
+ else if ((var == UTF7 && ch == '/') \
+ || (var == UTF_7_IMAP && ch == ',')) \
i = 63; \
else \
{ \
@@ -321,8 +333,10 @@ gconv_end (struct __gconv_step *data)
\
/* If accumulated data is nonzero, the input is invalid. */ \
/* Also, partial UTF-16 characters are invalid. */ \
+ /* In IMAP variant, must be terminated by '-'. */ \
if (__builtin_expect (statep->__value.__wch != 0, 0) \
- || __builtin_expect ((statep->__count >> 3) <= 26, 0)) \
+ || __builtin_expect ((statep->__count >> 3) <= 26, 0) \
+ || __builtin_expect (var == UTF_7_IMAP && ch != '-', 0)) \
{ \
STANDARD_FROM_LOOP_ERR_HANDLER ((statep->__count = 0, 1)); \
} \
@@ -479,13 +493,15 @@ gconv_end (struct __gconv_step *data)
else \
{ \
/* base64 encoding active */ \
- if (isdirect (ch, var)) \
+ if ((var == UTF_7_IMAP && ch == '&') || isdirect (ch, var)) \
{ \
/* deactivate base64 encoding */ \
size_t count; \
\
count = ((statep->__count & 0x18) >= 0x10) \
- + needs_explicit_shift (ch) + 1; \
+ + (var == UTF_7_IMAP || needs_explicit_shift (ch)) \
+ + (var == UTF_7_IMAP && ch == '&') \
+ + 1; \
if (__glibc_unlikely (outptr + count > outend)) \
{ \
result = __GCONV_FULL_OUTPUT; \
@@ -494,9 +510,11 @@ gconv_end (struct __gconv_step *data)
\
if ((statep->__count & 0x18) >= 0x10) \
*outptr++ = base64 ((statep->__count >> 3) & ~3, var); \
- if (needs_explicit_shift (ch)) \
+ if (var == UTF_7_IMAP || needs_explicit_shift (ch)) \
*outptr++ = '-'; \
*outptr++ = (unsigned char) ch; \
+ if (var == UTF_7_IMAP && ch == '&') \
+ *outptr++ = '-'; \
statep->__count = 0; \
} \
else \
--
2.34.1
* Re: [PATCH v4 4/4] iconv: Add UTF-7-IMAP variant in utf-7.c
2021-12-09 9:31 ` [PATCH v4 4/4] iconv: Add UTF-7-IMAP variant in utf-7.c Max Gautier
@ 2022-03-07 12:46 ` Adhemerval Zanella
2022-03-20 16:43 ` [PATCH v5 " Max Gautier
1 sibling, 0 replies; 60+ messages in thread
From: Adhemerval Zanella @ 2022-03-07 12:46 UTC (permalink / raw)
To: Max Gautier, libc-alpha
On 09/12/2021 06:31, Max Gautier via Libc-alpha wrote:
> UTF-7-IMAP differs from UTF-7 in the following ways (see RFC 3501[1]
> for reference):
>
> - The shift character is '&' instead of '+'
> - There are no "optional direct characters" and the "direct characters"
>   set is different
> - ',' replaces '/' in the Modified Base64 alphabet
> - There is no implicit shift back to US-ASCII from BASE64; all BASE64
>   sequences MUST be terminated with '-'
>
> [1]: https://datatracker.ietf.org/doc/html/rfc3501#section-5.1.3
>
> Signed-off-by: Max Gautier <mg@max.gautier.name>
Patch looks ok, apart from some minor style issues (as in the other patches).
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
> ---
> iconvdata/TESTS | 1 +
> iconvdata/gconv-modules | 4 ++++
> iconvdata/testdata/UTF-7-IMAP | 1 +
> iconvdata/testdata/UTF-7-IMAP..UTF8 | 32 +++++++++++++++++++++++++++++
> iconvdata/utf-7.c | 28 ++++++++++++++++++++-----
> 5 files changed, 61 insertions(+), 5 deletions(-)
> create mode 100644 iconvdata/testdata/UTF-7-IMAP
> create mode 100644 iconvdata/testdata/UTF-7-IMAP..UTF8
>
> diff --git a/iconvdata/TESTS b/iconvdata/TESTS
> index a0157c3350..3cc043c21b 100644
> --- a/iconvdata/TESTS
> +++ b/iconvdata/TESTS
> @@ -94,6 +94,7 @@ EUC-TW EUC-TW Y UTF8
> GBK GBK Y UTF8
> BIG5HKSCS BIG5HKSCS Y UTF8
> UTF-7 UTF-7 N UTF8
> +UTF-7-IMAP UTF-7-IMAP N UTF8
> IBM856 IBM856 N UTF8
> IBM922 IBM922 Y UTF8
> IBM930 IBM930 N UTF8
Ok.
> diff --git a/iconvdata/gconv-modules b/iconvdata/gconv-modules
> index 4acbba062f..d120699394 100644
> --- a/iconvdata/gconv-modules
> +++ b/iconvdata/gconv-modules
> @@ -113,3 +113,7 @@ module INTERNAL UTF-32BE// UTF-32 1
> alias UTF7// UTF-7//
> module UTF-7// INTERNAL UTF-7 1
> module INTERNAL UTF-7// UTF-7 1
> +
> +# from to module cost
> +module UTF-7-IMAP// INTERNAL UTF-7 1
> +module INTERNAL UTF-7-IMAP// UTF-7 1
Ok.
> diff --git a/iconvdata/testdata/UTF-7-IMAP b/iconvdata/testdata/UTF-7-IMAP
> new file mode 100644
> index 0000000000..6b5dada63c
> --- /dev/null
> +++ b/iconvdata/testdata/UTF-7-IMAP
> @@ -0,0 +1 @@
> +&EqASGxItEps- Amharic&AAoBDQ-esky Czech&AAo-Dansk Danish&AAo-English English&AAo-Suomi Finnish&AAo-Fran&AOc-ais French&AAo-Deutsch German&AAoDlQO7A7sDtwO9A7kDugOs- Greek&AAoF4gXRBegF2QXq- Hebrew&AAo-Italiano Italian&AAo-Norsk Norwegian&AAoEIARDBEEEQQQ6BDgEOQ- Russian&AAo-Espa&APE-ol Spanish&AAo-Svenska Swedish&AAoOIA4yDikOMg5EDhcOIg- Thai&AAo-T&APw-rk&AOc-e Turkish&AAo-Ti&Hr8-ng Vi&Hsc-t Vietnamese&AApl5Wcsip4- Japanese&AApOLWWH- Chinese&AArVXK4A- Korean&AAoACg-// Checking for correct handling of shift characters ('&-', '-') after base64 sequences&AArVXK4A-&-&AArVXK4A--&AAoACg-// Checking for correct handling of litteral '&-' and '-'&AAo----&-&--&AAoACg-// The last line of this file is missing the end-of-line terminator&AAo-// on purpose, in order to test that the conversion empties the bit buffer&AAo-// and shifts back to the initial state at the end of the conversion.&AAo-A&ImIDkQ-
> \ No newline at end of file
Ok.
> diff --git a/iconvdata/testdata/UTF-7-IMAP..UTF8 b/iconvdata/testdata/UTF-7-IMAP..UTF8
> new file mode 100644
> index 0000000000..8b9add3670
> --- /dev/null
> +++ b/iconvdata/testdata/UTF-7-IMAP..UTF8
> @@ -0,0 +1,32 @@
> +አማርኛ Amharic
> +česky Czech
> +Dansk Danish
> +English English
> +Suomi Finnish
> +Français French
> +Deutsch German
> +Ελληνικά Greek
> +עברית Hebrew
> +Italiano Italian
> +Norsk Norwegian
> +Русский Russian
> +Español Spanish
> +Svenska Swedish
> +ภาษาไทย Thai
> +Türkçe Turkish
> +Tiếng Việt Vietnamese
> +日本語 Japanese
> +中文 Chinese
> +한글 Korean
> +
> +// Checking for correct handling of shift characters ('&', '-') after base64 sequences
> +한글&
> +한글-
> +
> +// Checking for correct handling of litteral '&' and '-'
> +---&&-
> +
> +// The last line of this file is missing the end-of-line terminator
> +// on purpose, in order to test that the conversion empties the bit buffer
> +// and shifts back to the initial state at the end of the conversion.
> +A≢Α
> \ No newline at end of file
Ok.
> diff --git a/iconvdata/utf-7.c b/iconvdata/utf-7.c
> index 965d4220f1..553636e324 100644
> --- a/iconvdata/utf-7.c
> +++ b/iconvdata/utf-7.c
> @@ -32,11 +32,13 @@
> enum variant
> {
> UTF7,
> + UTF_7_IMAP
> };
>
> /* Must be in the same order as enum variant above. */
> static const char names[] =
> "UTF-7//\0"
> + "UTF-7-IMAP//\0"
> "\0";
>
> static uint32_t
> @@ -44,6 +46,8 @@ shift_character(enum variant const var)
> {
> if (var == UTF7)
> return '+';
> + else if (var == UTF_7_IMAP)
> + return '&';
> else
> abort();
> }
> @@ -58,6 +62,9 @@ between(uint32_t const ch,
> /* The set of "direct characters":
> FOR UTF-7
> A-Z a-z 0-9 ' ( ) , - . / : ? space tab lf cr
> + FOR UTF-7-IMAP
> + A-Z a-z 0-9 ' ( ) , - . / : ? space
> + ! " # $ % + * ; < = > @ [ \ ] ^ _ ` { | } ~
> */
>
> static int
> @@ -71,6 +78,8 @@ isdirect (uint32_t ch, enum variant var)
> || between(ch, ',', '/')
> || ch == ':' || ch == '?'
> || ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r');
> + else if (var == UTF_7_IMAP)
> + return (ch != '&' && between(ch, ' ', '~'));
> abort();
> }
>
Some style issues as before.
> @@ -127,6 +136,8 @@ base64 (unsigned int i, enum variant var)
> return '+';
> else if (i == 63 && var == UTF7)
> return '/';
> + else if (i == 63 && var == UTF_7_IMAP)
> + return ',';
> else
> abort ();
> }
> @@ -313,7 +324,8 @@ gconv_end (struct __gconv_step *data)
> i = ch - '0' + 52; \
> else if (ch == '+') \
> i = 62; \
> - else if (ch == '/') \
> + else if ((var == UTF7 && ch == '/') \
> + || (var == UTF_7_IMAP && ch == ',')) \
> i = 63; \
> else \
> { \
> @@ -321,8 +333,10 @@ gconv_end (struct __gconv_step *data)
> \
> /* If accumulated data is nonzero, the input is invalid. */ \
> /* Also, partial UTF-16 characters are invalid. */ \
> + /* In IMAP variant, must be terminated by '-'. */ \
> if (__builtin_expect (statep->__value.__wch != 0, 0) \
> - || __builtin_expect ((statep->__count >> 3) <= 26, 0)) \
> + || __builtin_expect ((statep->__count >> 3) <= 26, 0) \
> + || __builtin_expect (var == UTF_7_IMAP && ch != '-', 0)) \
Use __glibc_unlikely here instead of __builtin_expect (..., 0).
> { \
> STANDARD_FROM_LOOP_ERR_HANDLER ((statep->__count = 0, 1)); \
> } \
> @@ -479,13 +493,15 @@ gconv_end (struct __gconv_step *data)
> else \
> { \
> /* base64 encoding active */ \
> - if (isdirect (ch, var)) \
> + if ((var == UTF_7_IMAP && ch == '&') || isdirect (ch, var)) \
> { \
> /* deactivate base64 encoding */ \
> size_t count; \
> \
> count = ((statep->__count & 0x18) >= 0x10) \
> - + needs_explicit_shift (ch) + 1; \
> + + (var == UTF_7_IMAP || needs_explicit_shift (ch)) \
> + + (var == UTF_7_IMAP && ch == '&') \
> + + 1; \
> if (__glibc_unlikely (outptr + count > outend)) \
> { \
> result = __GCONV_FULL_OUTPUT; \
> @@ -494,9 +510,11 @@ gconv_end (struct __gconv_step *data)
> \
> if ((statep->__count & 0x18) >= 0x10) \
> *outptr++ = base64 ((statep->__count >> 3) & ~3, var); \
> - if (needs_explicit_shift (ch)) \
> + if (var == UTF_7_IMAP || needs_explicit_shift (ch)) \
> *outptr++ = '-'; \
> *outptr++ = (unsigned char) ch; \
> + if (var == UTF_7_IMAP && ch == '&') \
> + *outptr++ = '-'; \
> statep->__count = 0; \
> } \
> else \
* [PATCH v5 4/4] iconv: Add UTF-7-IMAP variant in utf-7.c
2021-12-09 9:31 ` [PATCH v4 4/4] iconv: Add UTF-7-IMAP variant in utf-7.c Max Gautier
2022-03-07 12:46 ` Adhemerval Zanella
@ 2022-03-20 16:43 ` Max Gautier
2022-03-21 12:24 ` Adhemerval Zanella
1 sibling, 1 reply; 60+ messages in thread
From: Max Gautier @ 2022-03-20 16:43 UTC (permalink / raw)
To: libc-alpha; +Cc: Max Gautier
UTF-7-IMAP differs from UTF-7 in the following ways (see RFC 3501[1]
for reference):
- The shift character is '&' instead of '+'
- There are no "optional direct characters" and the "direct characters"
set is different
- There is no implicit shift back to US-ASCII from BASE64, all BASE64
sequences MUST be terminated with '-'
[1]: https://datatracker.ietf.org/doc/html/rfc3501#section-5.1.3
Signed-off-by: Max Gautier <mg@max.gautier.name>
---
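Note for readers (not part of the commit): the three differences listed
above can be illustrated with a small stand-alone encoder. This is only a
minimal sketch under stated assumptions, not the code in utf-7.c. The
names imap_b64, imap_isdirect, imap_close_base64 and imap_utf7_encode are
hypothetical, the input is assumed to be already-valid UTF-16 code units,
and the caller is assumed to provide a worst-case sized buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Modified BASE64 alphabet of RFC 3501: ',' takes the place of '/'.  */
static const char imap_b64[] =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+,";

/* Direct characters of UTF-7-IMAP: printable US-ASCII except '&'.  */
static int
imap_isdirect (uint16_t ch)
{
  return ch != '&' && ch >= ' ' && ch <= '~';
}

/* Flush the NBITS pending low bits of BITS (zero-padded to six bits) and
   write the mandatory '-' terminator.  Returns the bytes written.  */
static size_t
imap_close_base64 (char *out, uint32_t bits, int nbits)
{
  size_t o = 0;
  if (nbits > 0)
    out[o++] = imap_b64[(bits << (6 - nbits)) & 0x3f];
  out[o++] = '-';
  return o;
}

/* Hypothetical helper: encode N UTF-16 code units from IN into OUT as a
   NUL-terminated UTF-7-IMAP string.  Returns the length written, or
   (size_t) -1 if OUTSIZE is below the worst case of 5 * N + 1 bytes.  */
static size_t
imap_utf7_encode (const uint16_t *in, size_t n, char *out, size_t outsize)
{
  size_t o = 0;
  uint32_t bits = 0;   /* only the low NBITS bits are still pending */
  int nbits = 0;
  int in_base64 = 0;

  if (n > (SIZE_MAX - 1) / 5 || outsize < 5 * n + 1)
    return (size_t) -1;

  for (size_t i = 0; i < n; i++)
    {
      uint16_t ch = in[i];

      if (ch == '&' || imap_isdirect (ch))
        {
          if (in_base64)
            {
              /* No implicit shift back: always terminate with '-'.  */
              o += imap_close_base64 (out + o, bits, nbits);
              bits = 0;
              nbits = 0;
              in_base64 = 0;
            }
          out[o++] = (char) ch;
          if (ch == '&')
            out[o++] = '-';   /* a literal '&' is written as "&-" */
        }
      else
        {
          if (!in_base64)
            {
              out[o++] = '&';   /* shift character is '&', not '+' */
              in_base64 = 1;
            }
          bits = (bits << 16) | ch;
          nbits += 16;
          while (nbits >= 6)
            {
              nbits -= 6;
              out[o++] = imap_b64[(bits >> nbits) & 0x3f];
            }
        }
    }

  if (in_base64)
    o += imap_close_base64 (out + o, bits, nbits);
  out[o] = '\0';
  return o;
}
```

With this sketch, the code units of "Français" should encode to
"Fran&AOc-ais", and { 0x000A, 0xD55C, 0xAE00, '&' } to "&AArVXK4A-&-",
which are the same byte sequences that appear in the
iconvdata/testdata/UTF-7-IMAP file added by this patch.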
iconvdata/TESTS | 1 +
iconvdata/gconv-modules | 4 ++++
iconvdata/testdata/UTF-7-IMAP | 1 +
iconvdata/testdata/UTF-7-IMAP..UTF8 | 32 +++++++++++++++++++++++++++++
iconvdata/utf-7.c | 30 +++++++++++++++++++++------
5 files changed, 62 insertions(+), 6 deletions(-)
create mode 100644 iconvdata/testdata/UTF-7-IMAP
create mode 100644 iconvdata/testdata/UTF-7-IMAP..UTF8
diff --git a/iconvdata/TESTS b/iconvdata/TESTS
index a0157c3350..3cc043c21b 100644
--- a/iconvdata/TESTS
+++ b/iconvdata/TESTS
@@ -94,6 +94,7 @@ EUC-TW EUC-TW Y UTF8
GBK GBK Y UTF8
BIG5HKSCS BIG5HKSCS Y UTF8
UTF-7 UTF-7 N UTF8
+UTF-7-IMAP UTF-7-IMAP N UTF8
IBM856 IBM856 N UTF8
IBM922 IBM922 Y UTF8
IBM930 IBM930 N UTF8
diff --git a/iconvdata/gconv-modules b/iconvdata/gconv-modules
index 4acbba062f..d120699394 100644
--- a/iconvdata/gconv-modules
+++ b/iconvdata/gconv-modules
@@ -113,3 +113,7 @@ module INTERNAL UTF-32BE// UTF-32 1
alias UTF7// UTF-7//
module UTF-7// INTERNAL UTF-7 1
module INTERNAL UTF-7// UTF-7 1
+
+# from to module cost
+module UTF-7-IMAP// INTERNAL UTF-7 1
+module INTERNAL UTF-7-IMAP// UTF-7 1
diff --git a/iconvdata/testdata/UTF-7-IMAP b/iconvdata/testdata/UTF-7-IMAP
new file mode 100644
index 0000000000..6b5dada63c
--- /dev/null
+++ b/iconvdata/testdata/UTF-7-IMAP
@@ -0,0 +1 @@
+&EqASGxItEps- Amharic&AAoBDQ-esky Czech&AAo-Dansk Danish&AAo-English English&AAo-Suomi Finnish&AAo-Fran&AOc-ais French&AAo-Deutsch German&AAoDlQO7A7sDtwO9A7kDugOs- Greek&AAoF4gXRBegF2QXq- Hebrew&AAo-Italiano Italian&AAo-Norsk Norwegian&AAoEIARDBEEEQQQ6BDgEOQ- Russian&AAo-Espa&APE-ol Spanish&AAo-Svenska Swedish&AAoOIA4yDikOMg5EDhcOIg- Thai&AAo-T&APw-rk&AOc-e Turkish&AAo-Ti&Hr8-ng Vi&Hsc-t Vietnamese&AApl5Wcsip4- Japanese&AApOLWWH- Chinese&AArVXK4A- Korean&AAoACg-// Checking for correct handling of shift characters ('&-', '-') after base64 sequences&AArVXK4A-&-&AArVXK4A--&AAoACg-// Checking for correct handling of litteral '&-' and '-'&AAo----&-&--&AAoACg-// The last line of this file is missing the end-of-line terminator&AAo-// on purpose, in order to test that the conversion empties the bit buffer&AAo-// and shifts back to the initial state at the end of the conversion.&AAo-A&ImIDkQ-
\ No newline at end of file
diff --git a/iconvdata/testdata/UTF-7-IMAP..UTF8 b/iconvdata/testdata/UTF-7-IMAP..UTF8
new file mode 100644
index 0000000000..8b9add3670
--- /dev/null
+++ b/iconvdata/testdata/UTF-7-IMAP..UTF8
@@ -0,0 +1,32 @@
+አማርኛ Amharic
+česky Czech
+Dansk Danish
+English English
+Suomi Finnish
+Français French
+Deutsch German
+Ελληνικά Greek
+עברית Hebrew
+Italiano Italian
+Norsk Norwegian
+Русский Russian
+Español Spanish
+Svenska Swedish
+ภาษาไทย Thai
+Türkçe Turkish
+Tiếng Việt Vietnamese
+日本語 Japanese
+中文 Chinese
+한글 Korean
+
+// Checking for correct handling of shift characters ('&', '-') after base64 sequences
+한글&
+한글-
+
+// Checking for correct handling of litteral '&' and '-'
+---&&-
+
+// The last line of this file is missing the end-of-line terminator
+// on purpose, in order to test that the conversion empties the bit buffer
+// and shifts back to the initial state at the end of the conversion.
+A≢Α
\ No newline at end of file
diff --git a/iconvdata/utf-7.c b/iconvdata/utf-7.c
index b639d8ff3e..5c2e17e50c 100644
--- a/iconvdata/utf-7.c
+++ b/iconvdata/utf-7.c
@@ -32,11 +32,13 @@
enum variant
{
UTF7,
+ UTF_7_IMAP
};
/* Must be in the same order as enum variant above. */
static const char names[] =
"UTF-7//\0"
+ "UTF-7-IMAP//\0"
"\0";
static uint32_t
@@ -44,6 +46,8 @@ shift_character (enum variant const var)
{
if (var == UTF7)
return '+';
+ else if (var == UTF_7_IMAP)
+ return '&';
else
abort ();
}
@@ -58,6 +62,9 @@ between (uint32_t const ch,
/* The set of "direct characters":
FOR UTF-7
A-Z a-z 0-9 ' ( ) , - . / : ? space tab lf cr
+ FOR UTF-7-IMAP
+ A-Z a-z 0-9 ' ( ) , - . / : ? space
+ ! " # $ % + * ; < = > @ [ \ ] ^ _ ` { | } ~
*/
static bool
@@ -71,6 +78,8 @@ isdirect (uint32_t ch, enum variant var)
|| between (ch, ',', '/')
|| ch == ':' || ch == '?'
|| ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r');
+ else if (var == UTF_7_IMAP)
+ return (ch != '&' && between (ch, ' ', '~'));
abort ();
}
@@ -124,6 +133,8 @@ base64 (unsigned int i, enum variant var)
return '+';
else if (i == 63 && var == UTF7)
return '/';
+ else if (i == 63 && var == UTF_7_IMAP)
+ return ',';
else
abort ();
}
@@ -308,7 +319,8 @@ gconv_end (struct __gconv_step *data)
i = ch - '0' + 52; \
else if (ch == '+') \
i = 62; \
- else if (ch == '/') \
+ else if ((var == UTF7 && ch == '/') \
+ || (var == UTF_7_IMAP && ch == ',')) \
i = 63; \
else \
{ \
@@ -316,8 +328,10 @@ gconv_end (struct __gconv_step *data)
\
/* If accumulated data is nonzero, the input is invalid. */ \
/* Also, partial UTF-16 characters are invalid. */ \
- if (__builtin_expect (statep->__value.__wch != 0, 0) \
- || __builtin_expect ((statep->__count >> 3) <= 26, 0)) \
+ /* In IMAP variant, must be terminated by '-'. */ \
+ if (__glibc_unlikely (statep->__value.__wch != 0) \
+ || __glibc_unlikely ((statep->__count >> 3) <= 26) \
+ || __glibc_unlikely (var == UTF_7_IMAP && ch != '-')) \
{ \
STANDARD_FROM_LOOP_ERR_HANDLER ((statep->__count = 0, 1)); \
} \
@@ -474,13 +488,15 @@ gconv_end (struct __gconv_step *data)
else \
{ \
/* base64 encoding active */ \
- if (isdirect (ch, var)) \
+ if ((var == UTF_7_IMAP && ch == '&') || isdirect (ch, var)) \
{ \
/* deactivate base64 encoding */ \
size_t count; \
\
count = ((statep->__count & 0x18) >= 0x10) \
- + needs_explicit_shift (ch) + 1; \
+ + (var == UTF_7_IMAP || needs_explicit_shift (ch)) \
+ + (var == UTF_7_IMAP && ch == '&') \
+ + 1; \
if (__glibc_unlikely (outptr + count > outend)) \
{ \
result = __GCONV_FULL_OUTPUT; \
@@ -489,9 +505,11 @@ gconv_end (struct __gconv_step *data)
\
if ((statep->__count & 0x18) >= 0x10) \
*outptr++ = base64 ((statep->__count >> 3) & ~3, var); \
- if (needs_explicit_shift (ch)) \
+ if (var == UTF_7_IMAP || needs_explicit_shift (ch)) \
*outptr++ = '-'; \
*outptr++ = (unsigned char) ch; \
+ if (var == UTF_7_IMAP && ch == '&') \
+ *outptr++ = '-'; \
statep->__count = 0; \
} \
else \
--
2.35.1
* Re: [PATCH v5 4/4] iconv: Add UTF-7-IMAP variant in utf-7.c
2022-03-20 16:43 ` [PATCH v5 " Max Gautier
@ 2022-03-21 12:24 ` Adhemerval Zanella
0 siblings, 0 replies; 60+ messages in thread
From: Adhemerval Zanella @ 2022-03-21 12:24 UTC (permalink / raw)
To: Max Gautier, libc-alpha
On 20/03/2022 13:43, Max Gautier via Libc-alpha wrote:
> UTF-7-IMAP differs from UTF-7 in the following ways (see RFC 3501[1]
> for reference):
>
> - The shift character is '&' instead of '+'
> - There are no "optional direct characters" and the "direct characters"
> set is different
> - There is no implicit shift back to US-ASCII from BASE64, all BASE64
> sequences MUST be terminated with '-'
>
> [1]: https://datatracker.ietf.org/doc/html/rfc3501#section-5.1.3
>
> Signed-off-by: Max Gautier <mg@max.gautier.name>
LGTM, thanks.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
> ---
> iconvdata/TESTS | 1 +
> iconvdata/gconv-modules | 4 ++++
> iconvdata/testdata/UTF-7-IMAP | 1 +
> iconvdata/testdata/UTF-7-IMAP..UTF8 | 32 +++++++++++++++++++++++++++++
> iconvdata/utf-7.c | 30 +++++++++++++++++++++------
> 5 files changed, 62 insertions(+), 6 deletions(-)
> create mode 100644 iconvdata/testdata/UTF-7-IMAP
> create mode 100644 iconvdata/testdata/UTF-7-IMAP..UTF8
>
> diff --git a/iconvdata/TESTS b/iconvdata/TESTS
> index a0157c3350..3cc043c21b 100644
> --- a/iconvdata/TESTS
> +++ b/iconvdata/TESTS
> @@ -94,6 +94,7 @@ EUC-TW EUC-TW Y UTF8
> GBK GBK Y UTF8
> BIG5HKSCS BIG5HKSCS Y UTF8
> UTF-7 UTF-7 N UTF8
> +UTF-7-IMAP UTF-7-IMAP N UTF8
> IBM856 IBM856 N UTF8
> IBM922 IBM922 Y UTF8
> IBM930 IBM930 N UTF8
> diff --git a/iconvdata/gconv-modules b/iconvdata/gconv-modules
> index 4acbba062f..d120699394 100644
> --- a/iconvdata/gconv-modules
> +++ b/iconvdata/gconv-modules
> @@ -113,3 +113,7 @@ module INTERNAL UTF-32BE// UTF-32 1
> alias UTF7// UTF-7//
> module UTF-7// INTERNAL UTF-7 1
> module INTERNAL UTF-7// UTF-7 1
> +
> +# from to module cost
> +module UTF-7-IMAP// INTERNAL UTF-7 1
> +module INTERNAL UTF-7-IMAP// UTF-7 1
> diff --git a/iconvdata/testdata/UTF-7-IMAP b/iconvdata/testdata/UTF-7-IMAP
> new file mode 100644
> index 0000000000..6b5dada63c
> --- /dev/null
> +++ b/iconvdata/testdata/UTF-7-IMAP
> @@ -0,0 +1 @@
> +&EqASGxItEps- Amharic&AAoBDQ-esky Czech&AAo-Dansk Danish&AAo-English English&AAo-Suomi Finnish&AAo-Fran&AOc-ais French&AAo-Deutsch German&AAoDlQO7A7sDtwO9A7kDugOs- Greek&AAoF4gXRBegF2QXq- Hebrew&AAo-Italiano Italian&AAo-Norsk Norwegian&AAoEIARDBEEEQQQ6BDgEOQ- Russian&AAo-Espa&APE-ol Spanish&AAo-Svenska Swedish&AAoOIA4yDikOMg5EDhcOIg- Thai&AAo-T&APw-rk&AOc-e Turkish&AAo-Ti&Hr8-ng Vi&Hsc-t Vietnamese&AApl5Wcsip4- Japanese&AApOLWWH- Chinese&AArVXK4A- Korean&AAoACg-// Checking for correct handling of shift characters ('&-', '-') after base64 sequences&AArVXK4A-&-&AArVXK4A--&AAoACg-// Checking for correct handling of litteral '&-' and '-'&AAo----&-&--&AAoACg-// The last line of this file is missing the end-of-line terminator&AAo-// on purpose, in order to test that the conversion empties the bit buffer&AAo-// and shifts back to the initial state at the end of the conversion.&AAo-A&ImIDkQ-
> \ No newline at end of file
> diff --git a/iconvdata/testdata/UTF-7-IMAP..UTF8 b/iconvdata/testdata/UTF-7-IMAP..UTF8
> new file mode 100644
> index 0000000000..8b9add3670
> --- /dev/null
> +++ b/iconvdata/testdata/UTF-7-IMAP..UTF8
> @@ -0,0 +1,32 @@
> +አማርኛ Amharic
> +česky Czech
> +Dansk Danish
> +English English
> +Suomi Finnish
> +Français French
> +Deutsch German
> +Ελληνικά Greek
> +עברית Hebrew
> +Italiano Italian
> +Norsk Norwegian
> +Русский Russian
> +Español Spanish
> +Svenska Swedish
> +ภาษาไทย Thai
> +Türkçe Turkish
> +Tiếng Việt Vietnamese
> +日本語 Japanese
> +中文 Chinese
> +한글 Korean
> +
> +// Checking for correct handling of shift characters ('&', '-') after base64 sequences
> +한글&
> +한글-
> +
> +// Checking for correct handling of litteral '&' and '-'
> +---&&-
> +
> +// The last line of this file is missing the end-of-line terminator
> +// on purpose, in order to test that the conversion empties the bit buffer
> +// and shifts back to the initial state at the end of the conversion.
> +A≢Α
> \ No newline at end of file
> diff --git a/iconvdata/utf-7.c b/iconvdata/utf-7.c
> index b639d8ff3e..5c2e17e50c 100644
> --- a/iconvdata/utf-7.c
> +++ b/iconvdata/utf-7.c
> @@ -32,11 +32,13 @@
> enum variant
> {
> UTF7,
> + UTF_7_IMAP
> };
>
> /* Must be in the same order as enum variant above. */
> static const char names[] =
> "UTF-7//\0"
> + "UTF-7-IMAP//\0"
> "\0";
>
> static uint32_t
> @@ -44,6 +46,8 @@ shift_character (enum variant const var)
> {
> if (var == UTF7)
> return '+';
> + else if (var == UTF_7_IMAP)
> + return '&';
> else
> abort ();
> }
> @@ -58,6 +62,9 @@ between (uint32_t const ch,
> /* The set of "direct characters":
> FOR UTF-7
> A-Z a-z 0-9 ' ( ) , - . / : ? space tab lf cr
> + FOR UTF-7-IMAP
> + A-Z a-z 0-9 ' ( ) , - . / : ? space
> + ! " # $ % + * ; < = > @ [ \ ] ^ _ ` { | } ~
> */
>
> static bool
> @@ -71,6 +78,8 @@ isdirect (uint32_t ch, enum variant var)
> || between (ch, ',', '/')
> || ch == ':' || ch == '?'
> || ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r');
> + else if (var == UTF_7_IMAP)
> + return (ch != '&' && between (ch, ' ', '~'));
> abort ();
> }
>
> @@ -124,6 +133,8 @@ base64 (unsigned int i, enum variant var)
> return '+';
> else if (i == 63 && var == UTF7)
> return '/';
> + else if (i == 63 && var == UTF_7_IMAP)
> + return ',';
> else
> abort ();
> }
> @@ -308,7 +319,8 @@ gconv_end (struct __gconv_step *data)
> i = ch - '0' + 52; \
> else if (ch == '+') \
> i = 62; \
> - else if (ch == '/') \
> + else if ((var == UTF7 && ch == '/') \
> + || (var == UTF_7_IMAP && ch == ',')) \
> i = 63; \
> else \
> { \
> @@ -316,8 +328,10 @@ gconv_end (struct __gconv_step *data)
> \
> /* If accumulated data is nonzero, the input is invalid. */ \
> /* Also, partial UTF-16 characters are invalid. */ \
> - if (__builtin_expect (statep->__value.__wch != 0, 0) \
> - || __builtin_expect ((statep->__count >> 3) <= 26, 0)) \
> + /* In IMAP variant, must be terminated by '-'. */ \
> + if (__glibc_unlikely (statep->__value.__wch != 0) \
> + || __glibc_unlikely ((statep->__count >> 3) <= 26) \
> + || __glibc_unlikely (var == UTF_7_IMAP && ch != '-')) \
> { \
> STANDARD_FROM_LOOP_ERR_HANDLER ((statep->__count = 0, 1)); \
> } \
> @@ -474,13 +488,15 @@ gconv_end (struct __gconv_step *data)
> else \
> { \
> /* base64 encoding active */ \
> - if (isdirect (ch, var)) \
> + if ((var == UTF_7_IMAP && ch == '&') || isdirect (ch, var)) \
> { \
> /* deactivate base64 encoding */ \
> size_t count; \
> \
> count = ((statep->__count & 0x18) >= 0x10) \
> - + needs_explicit_shift (ch) + 1; \
> + + (var == UTF_7_IMAP || needs_explicit_shift (ch)) \
> + + (var == UTF_7_IMAP && ch == '&') \
> + + 1; \
> if (__glibc_unlikely (outptr + count > outend)) \
> { \
> result = __GCONV_FULL_OUTPUT; \
> @@ -489,9 +505,11 @@ gconv_end (struct __gconv_step *data)
> \
> if ((statep->__count & 0x18) >= 0x10) \
> *outptr++ = base64 ((statep->__count >> 3) & ~3, var); \
> - if (needs_explicit_shift (ch)) \
> + if (var == UTF_7_IMAP || needs_explicit_shift (ch)) \
> *outptr++ = '-'; \
> *outptr++ = (unsigned char) ch; \
> + if (var == UTF_7_IMAP && ch == '&') \
> + *outptr++ = '-'; \
> statep->__count = 0; \
> } \
> else \
* Re: [PATCH v4 0/4] iconv: Add support for UTF-7-IMAP
2021-12-09 9:31 ` [PATCH v4 0/4] iconv: Add support for UTF-7-IMAP Max Gautier
` (3 preceding siblings ...)
2021-12-09 9:31 ` [PATCH v4 4/4] iconv: Add UTF-7-IMAP variant in utf-7.c Max Gautier
@ 2021-12-17 13:15 ` Max Gautier
2022-01-24 14:19 ` Adhemerval Zanella
2022-01-17 14:07 ` Max Gautier
2022-01-24 9:17 ` Max Gautier
6 siblings, 1 reply; 60+ messages in thread
From: Max Gautier @ 2021-12-17 13:15 UTC (permalink / raw)
To: libc-alpha; +Cc: mg
Hi,
The contribution checklist on the wiki says to keep pinging weekly, so,
doing that.
Cheers
--
Max Gautier
* Re: [PATCH v4 0/4] iconv: Add support for UTF-7-IMAP
2021-12-17 13:15 ` [PATCH v4 0/4] iconv: Add support for UTF-7-IMAP Max Gautier
@ 2022-01-24 14:19 ` Adhemerval Zanella
2022-02-10 13:16 ` Max Gautier
0 siblings, 1 reply; 60+ messages in thread
From: Adhemerval Zanella @ 2022-01-24 14:19 UTC (permalink / raw)
To: Max Gautier, libc-alpha
On 17/12/2021 10:15, Max Gautier via Libc-alpha wrote:
> Hi,
>
> The contribution checklist on the wiki says to keep pinging weekly, so,
> doing that.
>
> Cheers
>
Thanks for your patience. I think it is too late for 2.35, but I want to get
back to this for 2.36.
* Re: [PATCH v4 0/4] iconv: Add support for UTF-7-IMAP
2022-01-24 14:19 ` Adhemerval Zanella
@ 2022-02-10 13:16 ` Max Gautier
2022-02-10 13:17 ` Adhemerval Zanella
0 siblings, 1 reply; 60+ messages in thread
From: Max Gautier @ 2022-02-10 13:16 UTC (permalink / raw)
To: Adhemerval Zanella; +Cc: libc-alpha
On Mon, Jan 24, 2022 at 11:19:46AM -0300, Adhemerval Zanella wrote:
> ...
> I think it is too late for 2.35, but I want to get back to this for 2.36.
Since 2.35 has sailed, any chance of tackling this sometime soon?
Cheers
--
Max Gautier
* Re: [PATCH v4 0/4] iconv: Add support for UTF-7-IMAP
2022-02-10 13:16 ` Max Gautier
@ 2022-02-10 13:17 ` Adhemerval Zanella
2022-03-04 8:53 ` Max Gautier
0 siblings, 1 reply; 60+ messages in thread
From: Adhemerval Zanella @ 2022-02-10 13:17 UTC (permalink / raw)
To: Max Gautier, libc-alpha
On 10/02/2022 10:16, Max Gautier wrote:
> On Mon, Jan 24, 2022 at 11:19:46AM -0300, Adhemerval Zanella wrote:
>> ...
>> I think it is too late for 2.35, but I want to get back to this for 2.36.
>
> Since 2.35 has sailed, any chance of tackling this sometime soon?
>
> Cheers
>
Thanks for reminding me, I will try to spare some time to check on this.
* Re: [PATCH v4 0/4] iconv: Add support for UTF-7-IMAP
2021-12-09 9:31 ` [PATCH v4 0/4] iconv: Add support for UTF-7-IMAP Max Gautier
` (4 preceding siblings ...)
2021-12-17 13:15 ` [PATCH v4 0/4] iconv: Add support for UTF-7-IMAP Max Gautier
@ 2022-01-17 14:07 ` Max Gautier
2022-01-24 9:17 ` Max Gautier
6 siblings, 0 replies; 60+ messages in thread
From: Max Gautier @ 2022-01-17 14:07 UTC (permalink / raw)
To: libc-alpha; +Cc: Max Gautier
Still pinging.
--
Max Gautier
* Re: [PATCH v4 0/4] iconv: Add support for UTF-7-IMAP
2021-12-09 9:31 ` [PATCH v4 0/4] iconv: Add support for UTF-7-IMAP Max Gautier
` (5 preceding siblings ...)
2022-01-17 14:07 ` Max Gautier
@ 2022-01-24 9:17 ` Max Gautier
6 siblings, 0 replies; 60+ messages in thread
From: Max Gautier @ 2022-01-24 9:17 UTC (permalink / raw)
To: libc-alpha
Pinging the patch.
--
Max Gautier