From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <56099b45-e0de-17ac-7cbb-de7d4cec27dc@linaro.org>
Date: Mon, 21 Mar 2022 08:53:09 -0300
Subject: Re: [PATCH v5 2/4] iconv: Better mapping to RFC for UTF-7
To: Max Gautier, libc-alpha@sourceware.org
References: <87blcw9ptq.fsf@oldenburg.str.redhat.com> <20211209093152.313872-1-mg@max.gautier.name> <20211209093152.313872-3-mg@max.gautier.name>
From: Adhemerval Zanella
Content-Type: text/plain; charset=UTF-8

On 20/03/2022 13:41, Max Gautier via Libc-alpha wrote:
> - Direct use of characters instead of arcane arrays
> - isxbase64 is not the Modified BASE64 alphabet, but the characters who
>   needs to trigger an explicit shift back to US-ASCII. Make that clearer
>
> Signed-off-by: Max Gautier

LGTM, thanks.

Reviewed-by: Adhemerval Zanella

> ---
>  iconvdata/utf-7.c | 64 ++++++++++++++++++++++++-----------------------
>  1 file changed, 33 insertions(+), 31 deletions(-)
>
> diff --git a/iconvdata/utf-7.c b/iconvdata/utf-7.c
> index 9ba0974959..15f3669ac8 100644
> --- a/iconvdata/utf-7.c
> +++ b/iconvdata/utf-7.c
> @@ -30,20 +30,27 @@
>
>
>
> +static bool
> +between (uint32_t const ch,
> +         uint32_t const lower_bound, uint32_t const upper_bound)
> +{
> +  return (ch >= lower_bound && ch <= upper_bound);
> +}
> +
>  /* The set of "direct characters":
>     A-Z a-z 0-9 ' ( ) , - . / : ? space tab lf cr
>  */
>
> -static const unsigned char direct_tab[128 / 8] =
> -  {
> -    0x00, 0x26, 0x00, 0x00, 0x81, 0xf3, 0xff, 0x87,
> -    0xfe, 0xff, 0xff, 0x07, 0xfe, 0xff, 0xff, 0x07
> -  };
> -
> -static int
> -isdirect (uint32_t ch)
> +static bool
> +isdirect (uint32_t ch, enum variant var)
>  {
> -  return (ch < 128 && ((direct_tab[ch >> 3] >> (ch & 7)) & 1));
> +  return (between (ch, 'A', 'Z')
> +          || between (ch, 'a', 'z')
> +          || between (ch, '0', '9')
> +          || ch == '\'' || ch == '(' || ch == ')'
> +          || between (ch, ',', '/')
> +          || ch == ':' || ch == '?'
> +          || ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r');
>  }
>
>
> @@ -52,33 +59,27 @@ isdirect (uint32_t ch)
>     ! " # $ % & * ; < = > @ [ ] ^ _ ` { | }
>  */
>
> -static const unsigned char xdirect_tab[128 / 8] =
> -  {
> -    0x00, 0x26, 0x00, 0x00, 0xff, 0xf7, 0xff, 0xff,
> -    0xff, 0xff, 0xff, 0xef, 0xff, 0xff, 0xff, 0x3f
> -  };
> -
> -static int
> -isxdirect (uint32_t ch)
> +static bool
> +isxdirect (uint32_t ch, enum variant var)
>  {
> -  return (ch < 128 && ((xdirect_tab[ch >> 3] >> (ch & 7)) & 1));
> +  return (ch == '\t'
> +          || ch == '\n'
> +          || ch == '\r'
> +          || (between (ch, ' ', '}') && ch != '+' && ch != '\\'));
>  }
>
>
> -/* The set of "extended base64 characters":
> +/* Characters which needs to trigger an explicit shift back to US-ASCII (UTF-7
> +   only): Modified base64 + '-' (shift back character)
>     A-Z a-z 0-9 + / -
>  */
>
> -static const unsigned char xbase64_tab[128 / 8] =
> -  {
> -    0x00, 0x00, 0x00, 0x00, 0x00, 0xa8, 0xff, 0x03,
> -    0xfe, 0xff, 0xff, 0x07, 0xfe, 0xff, 0xff, 0x07
> -  };
> -
> -static int
> -isxbase64 (uint32_t ch)
> +static bool
> +needs_explicit_shift (uint32_t ch)
>  {
> -  return (ch < 128 && ((xbase64_tab[ch >> 3] >> (ch & 7)) & 1));
> +  return (between (ch, 'A', 'Z')
> +          || between (ch, 'a', 'z')
> +          || between (ch, '/', '9') || ch == '+' || ch == '-');
>  }
>
>
> @@ -252,7 +253,7 @@ base64 (unsigned int i)
>             indeed form a Low Surrogate.  */ \
>          uint32_t wc2 = wch & 0xffff; \
>          \
> -        if (! __builtin_expect (wc2 >= 0xdc00 && wc2 < 0xe000, 1)) \
> +        if (! __glibc_likely (wc2 >= 0xdc00 && wc2 < 0xe000)) \
>            { \
>              STANDARD_FROM_LOOP_ERR_HANDLER ((statep->__count = 0, 1));\
>            } \
> @@ -372,7 +373,8 @@ base64 (unsigned int i)
>          /* deactivate base64 encoding */ \
>          size_t count; \
>          \
> -        count = ((statep->__count & 0x18) >= 0x10) + isxbase64 (ch) + 1; \
> +        count = ((statep->__count & 0x18) >= 0x10) \
> +          + needs_explicit_shift (ch) + 1; \
>          if (__glibc_unlikely (outptr + count > outend)) \
>            { \
>              result = __GCONV_FULL_OUTPUT; \
> @@ -381,7 +383,7 @@ base64 (unsigned int i)
>          \
>          if ((statep->__count & 0x18) >= 0x10) \
>            *outptr++ = base64 ((statep->__count >> 3) & ~3); \
> -        if (isxbase64 (ch)) \
> +        if (needs_explicit_shift (ch)) \
>            *outptr++ = '-'; \
>          *outptr++ = (unsigned char) ch; \
>          statep->__count = 0; \
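
For anyone who wants to double-check that the rewrite keeps the character sets unchanged, here is a minimal standalone sketch (not part of the patch) that compares the removed bitmap tables against the new range-based predicates for every value below 256. The `new_` prefixes, the `in_table` helper and `count_mismatches` are names made up for this illustration, the unused `enum variant var` parameter is dropped, and an ASCII-compatible execution character set is assumed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bitmap tables copied from the '-' lines of the quoted patch.  */
static const unsigned char direct_tab[128 / 8] =
  {
    0x00, 0x26, 0x00, 0x00, 0x81, 0xf3, 0xff, 0x87,
    0xfe, 0xff, 0xff, 0x07, 0xfe, 0xff, 0xff, 0x07
  };
static const unsigned char xdirect_tab[128 / 8] =
  {
    0x00, 0x26, 0x00, 0x00, 0xff, 0xf7, 0xff, 0xff,
    0xff, 0xff, 0xff, 0xef, 0xff, 0xff, 0xff, 0x3f
  };
static const unsigned char xbase64_tab[128 / 8] =
  {
    0x00, 0x00, 0x00, 0x00, 0x00, 0xa8, 0xff, 0x03,
    0xfe, 0xff, 0xff, 0x07, 0xfe, 0xff, 0xff, 0x07
  };

/* Hypothetical helper: the lookup the old is* functions performed.  */
static bool
in_table (const unsigned char *tab, uint32_t ch)
{
  return ch < 128 && ((tab[ch >> 3] >> (ch & 7)) & 1);
}

static bool
between (uint32_t ch, uint32_t lower_bound, uint32_t upper_bound)
{
  return ch >= lower_bound && ch <= upper_bound;
}

/* Range-based predicates copied from the '+' lines (ASCII assumed).  */
static bool
new_isdirect (uint32_t ch)
{
  return (between (ch, 'A', 'Z') || between (ch, 'a', 'z')
          || between (ch, '0', '9')
          || ch == '\'' || ch == '(' || ch == ')'
          || between (ch, ',', '/')
          || ch == ':' || ch == '?'
          || ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r');
}

static bool
new_isxdirect (uint32_t ch)
{
  return (ch == '\t' || ch == '\n' || ch == '\r'
          || (between (ch, ' ', '}') && ch != '+' && ch != '\\'));
}

static bool
new_needs_explicit_shift (uint32_t ch)
{
  return (between (ch, 'A', 'Z') || between (ch, 'a', 'z')
          || between (ch, '/', '9') || ch == '+' || ch == '-');
}

/* Count the values in [0, 256) for which an old table lookup and the
   corresponding new predicate disagree.  */
int
count_mismatches (void)
{
  int mismatches = 0;
  for (uint32_t ch = 0; ch < 256; ch++)
    {
      if (in_table (direct_tab, ch) != new_isdirect (ch))
        mismatches++;
      if (in_table (xdirect_tab, ch) != new_isxdirect (ch))
        mismatches++;
      if (in_table (xbase64_tab, ch) != new_needs_explicit_shift (ch))
        mismatches++;
    }
  return mismatches;
}
```

Calling `count_mismatches ()` from a small test driver is expected to return 0 if the three predicates describe exactly the same sets as the tables they replace.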