From: Adhemerval Zanella
To: Max Gautier, libc-alpha@sourceware.org
Date: Mon, 7 Mar 2022 09:14:37 -0300
Subject: Re: [PATCH v4 2/4] iconv: Better mapping to RFC for UTF-7
In-Reply-To: <20211209093152.313872-3-mg@max.gautier.name>

On 09/12/2021 06:31, Max Gautier via Libc-alpha wrote:
> - Direct use of characters instead of arcane arrays
> - isxbase64 is not the Modified BASE64 alphabet, but the characters who
>   needs to trigger an explicit shift back to US-ASCII. Make that clearer
>
> Signed-off-by: Max Gautier

LGTM with style fixes below.
Reviewed-by: Adhemerval Zanella

> ---
>  iconvdata/utf-7.c | 56 +++++++++++++++++++++++++++--------------------
>  1 file changed, 32 insertions(+), 24 deletions(-)
>
> diff --git a/iconvdata/utf-7.c b/iconvdata/utf-7.c
> index 9ba0974959..ac7d78141a 100644
> --- a/iconvdata/utf-7.c
> +++ b/iconvdata/utf-7.c
> @@ -30,20 +30,27 @@
>
>
>
> +static int
> +between(uint32_t const ch,

Space before '(', here and in the other usages below. Also, 'const' does not
change much here.

> +        uint32_t const lower_bound, uint32_t const upper_bound)
> +{
> +  return (ch >= lower_bound && ch <= upper_bound);
> +}
> +
>  /* The set of "direct characters":
>     A-Z a-z 0-9 ' ( ) , - . / : ? space tab lf cr
>  */
>
> -static const unsigned char direct_tab[128 / 8] =
> -  {
> -    0x00, 0x26, 0x00, 0x00, 0x81, 0xf3, 0xff, 0x87,
> -    0xfe, 0xff, 0xff, 0x07, 0xfe, 0xff, 0xff, 0x07
> -  };
> -
>  static int
>  isdirect (uint32_t ch)
>  {
> -  return (ch < 128 && ((direct_tab[ch >> 3] >> (ch & 7)) & 1));
> +  return (between(ch, 'A', 'Z')

Ok, it is indeed clear.

> +          || between(ch, 'a', 'z')
> +          || between(ch, '0', '9')
> +          || ch == '\'' || ch == '(' || ch == ')'
> +          || between(ch, ',', '/')
> +          || ch == ':' || ch == '?'
> +          || ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r');
>  }
>
>
> @@ -52,33 +59,33 @@ isdirect (uint32_t ch)
>     ! " # $ % & * ; < = > @ [ ] ^ _ ` { | }
>  */
>
> -static const unsigned char xdirect_tab[128 / 8] =
> -  {
> -    0x00, 0x26, 0x00, 0x00, 0xff, 0xf7, 0xff, 0xff,
> -    0xff, 0xff, 0xff, 0xef, 0xff, 0xff, 0xff, 0x3f
> -  };
>
>  static int
>  isxdirect (uint32_t ch)
>  {
> -  return (ch < 128 && ((xdirect_tab[ch >> 3] >> (ch & 7)) & 1));
> +  return (ch == '\t'
> +          || ch == '\n'
> +          || ch == '\r'
> +          || (between(ch, ' ','}')
> +              && ch != '+' && ch != '\\')
> +  );
>  }
>
>

Ok.

> -/* The set of "extended base64 characters":
> +/* Characters which needs to trigger an explicit shift back to US-ASCII (UTF-7
> +   only): Modified base64 + '-' (shift back character)
>     A-Z a-z 0-9 + / -
>  */
>
> -static const unsigned char xbase64_tab[128 / 8] =
> -  {
> -    0x00, 0x00, 0x00, 0x00, 0x00, 0xa8, 0xff, 0x03,
> -    0xfe, 0xff, 0xff, 0x07, 0xfe, 0xff, 0xff, 0x07
> -  };
> -
>  static int
> -isxbase64 (uint32_t ch)
> +needs_explicit_shift (uint32_t ch)
>  {
> -  return (ch < 128 && ((xbase64_tab[ch >> 3] >> (ch & 7)) & 1));
> +  return (between(ch, 'A', 'Z')
> +          || between(ch, 'a', 'z')
> +          || between(ch, '/', '9')
> +          || ch == '+'
> +          || ch == '-'
> +  );
>  }
>
>

Ok.

> @@ -372,7 +379,8 @@ base64 (unsigned int i)
>      /* deactivate base64 encoding */ \
>      size_t count; \
>      \
> -    count = ((statep->__count & 0x18) >= 0x10) + isxbase64 (ch) + 1; \
> +    count = ((statep->__count & 0x18) >= 0x10) \
> +      + needs_explicit_shift (ch) + 1; \
>      if (__glibc_unlikely (outptr + count > outend)) \
>        { \
>          result = __GCONV_FULL_OUTPUT; \
> @@ -381,7 +389,7 @@ base64 (unsigned int i)
>      \
>      if ((statep->__count & 0x18) >= 0x10) \
>        *outptr++ = base64 ((statep->__count >> 3) & ~3); \
> -    if (isxbase64 (ch)) \
> +    if (needs_explicit_shift (ch)) \
>        *outptr++ = '-'; \
>      *outptr++ = (unsigned char) ch; \
>      statep->__count = 0; \

Ok, it just changes the function name.
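
One way to double-check that the character-based predicates accept exactly
the same code points as the removed bitmap tables is a brute-force
comparison. The sketch below is not part of the patch. The tables and the
predicate bodies are copied from the diff (with the space-before-'(' style
fix applied), while old_lookup, the new_* names, count_mismatches and
check_equivalence are hypothetical helpers introduced only for this
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Bitmap tables copied from the lines the patch removes.  */
static const unsigned char direct_tab[128 / 8] = {
  0x00, 0x26, 0x00, 0x00, 0x81, 0xf3, 0xff, 0x87,
  0xfe, 0xff, 0xff, 0x07, 0xfe, 0xff, 0xff, 0x07
};
static const unsigned char xdirect_tab[128 / 8] = {
  0x00, 0x26, 0x00, 0x00, 0xff, 0xf7, 0xff, 0xff,
  0xff, 0xff, 0xff, 0xef, 0xff, 0xff, 0xff, 0x3f
};
static const unsigned char xbase64_tab[128 / 8] = {
  0x00, 0x00, 0x00, 0x00, 0x00, 0xa8, 0xff, 0x03,
  0xfe, 0xff, 0xff, 0x07, 0xfe, 0xff, 0xff, 0x07
};

/* Hypothetical helper: the old lookup, parameterized by table.  */
static int
old_lookup (const unsigned char *tab, uint32_t ch)
{
  return ch < 128 && ((tab[ch >> 3] >> (ch & 7)) & 1);
}

static int
between (uint32_t ch, uint32_t lower_bound, uint32_t upper_bound)
{
  return ch >= lower_bound && ch <= upper_bound;
}

/* The three predicates as written in the patch (renamed new_* here).  */
static int
new_isdirect (uint32_t ch)
{
  return (between (ch, 'A', 'Z') || between (ch, 'a', 'z')
          || between (ch, '0', '9')
          || ch == '\'' || ch == '(' || ch == ')'
          || between (ch, ',', '/')
          || ch == ':' || ch == '?'
          || ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r');
}

static int
new_isxdirect (uint32_t ch)
{
  return (ch == '\t' || ch == '\n' || ch == '\r'
          || (between (ch, ' ', '}') && ch != '+' && ch != '\\'));
}

static int
new_needs_explicit_shift (uint32_t ch)
{
  return (between (ch, 'A', 'Z') || between (ch, 'a', 'z')
          || between (ch, '/', '9') || ch == '+' || ch == '-');
}

/* Count code points where an old table and a new predicate disagree.
   The !! normalizes both sides to 0 or 1 before comparing.  */
int
count_mismatches (void)
{
  int mismatches = 0;
  for (uint32_t ch = 0; ch <= 0x10FFFF; ch++)
    {
      mismatches += !!old_lookup (direct_tab, ch) != !!new_isdirect (ch);
      mismatches += !!old_lookup (xdirect_tab, ch) != !!new_isxdirect (ch);
      mismatches += (!!old_lookup (xbase64_tab, ch)
                     != !!new_needs_explicit_shift (ch));
    }
  return mismatches;
}

/* Aborts if any code point is classified differently.  */
void
check_equivalence (void)
{
  assert (count_mismatches () == 0);
}
```

Calling check_equivalence () from a small test main is enough to exercise
it. The check assumes an ASCII execution character set, which the range
comparisons in the patch also rely on.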