Subject: Re: [PATCH v4 1/3] gconv: Correct Big5-HKSCS conversion to preserve all state bits. [BZ #25744]
From: Adhemerval Zanella
In-Reply-To: <20220630125215.6052-2-tom@honermann.net>
Date: Mon, 4 Jul 2022 15:16:29 -0300
Cc: libc-alpha@sourceware.org
Message-Id: <8B30C512-E275-4D60-A124-C17C716B359E@linaro.org>
References: <20220630125215.6052-1-tom@honermann.net> <20220630125215.6052-2-tom@honermann.net>
To: Tom Honermann

> On 30 Jun 2022, at 09:52, Tom Honermann via Libc-alpha wrote:
>
> This patch corrects the Big5-HKSCS converter to preserve the lowest 3 bits of
> the mbstate_t __count data member when the converter encounters an incomplete
> multibyte character.
>
> This fixes BZ #25744.

LGTM, thanks.

Reviewed-by: Adhemerval Zanella

> ---
>  iconvdata/big5hkscs.c                     | 16 +++---
>  iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c | 65 +++++++++++++++++++++++
>  2 files changed, 73 insertions(+), 8 deletions(-)
>
> diff --git a/iconvdata/big5hkscs.c b/iconvdata/big5hkscs.c
> index a28b18a5ec..d12389b2e3 100644
> --- a/iconvdata/big5hkscs.c
> +++ b/iconvdata/big5hkscs.c
> @@ -17769,7 +17769,7 @@ static struct
>     the output state to the initial state.  This has to be done during the
>     flushing.
>     */
>  #define EMIT_SHIFT_TO_INIT \
> -  if (data->__statep->__count != 0) \
> +  if ((data->__statep->__count >> 3) != 0) \
>      { \
>        if (FROM_DIRECTION) \
>          { \
> @@ -17778,7 +17778,7 @@ static struct
>                /* Write out the last character.  */ \
>                *((uint32_t *) outbuf) = data->__statep->__count >> 3; \
>                outbuf += sizeof (uint32_t); \
> -              data->__statep->__count = 0; \
> +              data->__statep->__count &= 7; \
>              } \
>            else \
>              /* We don't have enough room in the output buffer.  */ \
> @@ -17792,7 +17792,7 @@ static struct
>                uint32_t lasttwo = data->__statep->__count >> 3; \
>                *outbuf++ = (lasttwo >> 8) & 0xff; \
>                *outbuf++ = lasttwo & 0xff; \
> -              data->__statep->__count = 0; \
> +              data->__statep->__count &= 7; \
>              } \
>            else \
>              /* We don't have enough room in the output buffer.  */ \
> @@ -17878,7 +17878,7 @@ static struct
>              \
>              /* Otherwise store only the first character now, and \
>                 put the second one into the queue.  */ \
> -            *statep = ch2 << 3; \
> +            *statep = (ch2 << 3) | (*statep & 7); \
>              /* Tell the caller why we terminate the loop.  */ \
>              result = __GCONV_FULL_OUTPUT; \
>              break; \
> @@ -17895,7 +17895,7 @@ static struct
>            } \
>          else \
>            /* Clear the queue and proceed to output the saved character.  */ \
> -          *statep = 0; \
> +          *statep &= 7; \
>          \
>          put32 (outptr, ch); \
>          outptr += 4; \
> @@ -17946,7 +17946,7 @@ static struct
>                } \
>              *outptr++ = (ch >> 8) & 0xff; \
>              *outptr++ = ch & 0xff; \
> -            *statep = 0; \
> +            *statep &= 7; \
>              inptr += 4; \
>              continue; \
>              \
> @@ -17959,7 +17959,7 @@ static struct
>                } \
>              *outptr++ = (lasttwo >> 8) & 0xff; \
>              *outptr++ = lasttwo & 0xff; \
> -            *statep = 0; \
> +            *statep &= 7; \
>              continue; \
>            } \
>          \
> @@ -17996,7 +17996,7 @@ static struct
>              /* Check for possible combining character.
>              */ \
>              if (__glibc_unlikely (ch == 0xca || ch == 0xea)) \
>                { \
> -                *statep = ((cp[0] << 8) | cp[1]) << 3; \
> +                *statep = (((cp[0] << 8) | cp[1]) << 3) | (*statep & 7); \
>                  inptr += 4; \
>                  continue; \
>                } \
> diff --git a/iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c b/iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c
> index 9601b6c1d9..e1472dc2e2 100644
> --- a/iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c
> +++ b/iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c
> @@ -128,6 +128,71 @@ check_conversion (struct testdata test)
>        printf ("error: Result of third conversion was wrong.\n");
>        err++;
>      }
> +
> +  /* Now perform the same test as above consuming one byte at a time.  */
> +  mbs = test.input;
> +  memset (&st, 0, sizeof (st));
> +
> +  /* Consume the first byte; expect an incomplete multibyte character.  */
> +  ret = mbrtowc (&wc, mbs, 1, &st);
> +  if (ret != -2)
> +    {
> +      printf ("error: First byte conversion returned %zd.\n", ret);
> +      err++;
> +    }
> +  /* Advance past the first consumed byte.  */
> +  mbs += 1;
> +  /* Consume the second byte; expect the first wchar_t.  */
> +  ret = mbrtowc (&wc, mbs, 1, &st);
> +  if (ret != 1)
> +    {
> +      printf ("error: Second byte conversion returned %zd.\n", ret);
> +      err++;
> +    }
> +  /* Advance past the second consumed byte.  */
> +  mbs += 1;
> +  if (wc != test.expected[0])
> +    {
> +      printf ("error: Result of first wchar_t conversion was wrong.\n");
> +      err++;
> +    }
> +  /* Consume no bytes; expect the second wchar_t.  */
> +  ret = mbrtowc (&wc, mbs, 1, &st);
> +  if (ret != 0)
> +    {
> +      printf ("error: First attempt of third byte conversion returned %zd.\n", ret);
> +      err++;
> +    }
> +  /* Do not advance past the third byte.
> +     */
> +  mbs += 0;
> +  if (wc != test.expected[1])
> +    {
> +      printf ("error: Result of second wchar_t conversion was wrong.\n");
> +      err++;
> +    }
> +  /* After the second wchar_t conversion, the converter should be in
> +     the initial state since the two input BIG5-HKSCS bytes have been
> +     consumed and the two wchar_t's have been output.  */
> +  if (mbsinit (&st) == 0)
> +    {
> +      printf ("error: Converter not in initial state.\n");
> +      err++;
> +    }
> +  /* Consume the third byte; expect the third wchar_t.  */
> +  ret = mbrtowc (&wc, mbs, 1, &st);
> +  if (ret != 1)
> +    {
> +      printf ("error: Third byte conversion returned %zd.\n", ret);
> +      err++;
> +    }
> +  /* Advance past the third consumed byte.  */
> +  mbs += 1;
> +  if (wc != test.expected[2])
> +    {
> +      printf ("error: Result of third wchar_t conversion was wrong.\n");
> +      err++;
> +    }
> +
>    /* Return 0 if we saw no errors.  */
>    return err;
>  }
> -- 
> 2.32.0
>