From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTPS id 943AD3857C75 for ; Thu, 4 Nov 2021 14:10:25 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 943AD3857C75 Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-140-_0h2PRBSP-WCL7tapf9Teg-1; Thu, 04 Nov 2021 10:10:22 -0400 X-MC-Unique: _0h2PRBSP-WCL7tapf9Teg-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 3A85A1006AA4; Thu, 4 Nov 2021 14:10:21 +0000 (UTC) Received: from oldenburg.str.redhat.com (unknown [10.39.192.72]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 5E3E960854; Thu, 4 Nov 2021 14:10:20 +0000 (UTC) From: Florian Weimer To: =?utf-8?B?0J3QuNC60LjRgtCwINCf0L7Qv9C+0LIgdmlhIExpYmMtYWxwaGE=?= Cc: =?utf-8?B?0J3QuNC60LjRgtCwINCf0L7Qv9C+0LI=?= Subject: Re: [PATCH] gconv: Do not emit spurious NUL character in ISO-2022-JP-3 (bug 28524) References: Date: Thu, 04 Nov 2021 15:10:18 +0100 In-Reply-To: (=?utf-8?B?ItCd0LjQutC40YLQsCDQn9C+0L/QvtCy?= via Libc-alpha"'s message of "Tue, 2 Nov 2021 14:06:58 +0500") Message-ID: <87ee7wf1v9.fsf@oldenburg.str.redhat.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux) MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spam-Status: No, score=-11.1 required=5.0 tests=BAYES_00, BODY_8BITS, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_LOW, RCVD_IN_MSPIKE_H2, SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 04 Nov 2021 14:10:27 -0000 * =D0=9D=D0=B8=D0=BA=D0=B8=D1=82=D0=B0 =D0=9F=D0=BE=D0=BF=D0=BE=D0=B2 via L= ibc-alpha: > To eliminate this issue, these steps are taken: > * Restore original condition > '(data->__statep->__count & ~7) !=3D ASCII_set'. > It is necessary since bits 0-2 may contain > number of buffered input characters. Right, I missed that use of __count by the framework. 8-/ > * Check that queued character is not NUL. > Similar step is taken for main conversion loop. And since EMIT_SHIFT_TO_INIT runs before the loop, it's necessary to mirror that logic there. > diff --git a/iconvdata/bug-iconv15.c b/iconvdata/bug-iconv15.c > new file mode 100644 > index 0000000000..4037e131ff > + /* First call to iconv should alter internal state. > + Now, JISX0201_Kana_set is selected and > + state value !=3D ASCII_set */ Sorry, GNU style says that comments should end with =E2=80=9D. */=E2=80=9D= (period and two spaces). > + TEST_VERIFY (iconv (cd, &inbuf, &inleft, &outbuf, &outleft) !=3D (size= _t) -1); Can we add an additional test here? I think no bytes have been added to the output at this point, right? > + /* Second call shall emit spurious NUL character in unpatched glibc. *= / > + /* No characters are expected to be produced. */ Missing space after =E2=80=9D.=E2=80=9D (see above). > diff --git a/iconvdata/iso-2022-jp-3.c b/iconvdata/iso-2022-jp-3.c > index 70b28ace7f..5e66d686f1 100644 > --- a/iconvdata/iso-2022-jp-3.c > +++ b/iconvdata/iso-2022-jp-3.c > @@ -79,20 +79,31 @@ enum > the output state to the initial state. This has to be done during th= e > flushing. */ > #define EMIT_SHIFT_TO_INIT \ > - if (data->__statep->__count !=3D ASCII_set)=09=09=09 \ > + if ((data->__statep->__count & ~7) !=3D ASCII_set)=09=09=09 \ > {=09=09=09=09=09=09=09=09=09 \ > if (FROM_DIRECTION)=09=09=09=09=09=09 \ > =09{=09=09=09=09=09=09=09=09 \ > -=09 if (__glibc_likely (outbuf + 4 <=3D outend))=09=09=09 \ > +=09 uint32_t ch =3D data->__statep->__count >> 6;=09=09=09 \ > +=09=09=09=09=09=09=09=09=09 \ > +=09 if (__glibc_unlikely (ch !=3D 0))=09=09=09=09 \ > =09 {=09=09=09=09=09=09=09=09 \ > -=09 /* Write out the last character. */=09=09=09 \ > -=09 *((uint32_t *) outbuf) =3D data->__statep->__count >> 6;=09 = \ > -=09 outbuf +=3D sizeof (uint32_t);=09=09=09=09 \ > -=09 data->__statep->__count =3D ASCII_set;=09=09=09\ > +=09 if (__glibc_likely (outbuf + 4 <=3D outend))=09=09 \ > +=09=09{=09=09=09=09=09=09=09 \ > +=09=09 /* Write out the last character. */=09=09=09 \ > +=09=09 put32u (outbuf, ch);=09=09=09=09=09 \ > +=09=09 outbuf +=3D 4;=09=09=09=09=09=09 \ > +=09=09 data->__statep->__count &=3D 7;=09=09=09=09 \ > +=09=09 data->__statep->__count |=3D ASCII_set;=09=09=09 \ > +=09=09}=09=09=09=09=09=09=09 \ > +=09 else=09=09=09=09=09=09=09 \ > +=09=09/* We don't have enough room in the output buffer. */=09 \ > +=09=09status =3D __GCONV_FULL_OUTPUT;=09=09=09=09 \ > =09 }=09=09=09=09=09=09=09=09 \ > =09 else=09=09=09=09=09=09=09=09 \ > -=09 /* We don't have enough room in the output buffer. */=09 \ > -=09 status =3D __GCONV_FULL_OUTPUT;=09=09=09=09 \ > +=09 {=09=09=09=09=09=09=09=09 \ > +=09 data->__statep->__count &=3D 7;=09=09=09=09 \ > +=09 data->__statep->__count |=3D ASCII_set;=09=09=09 \ > +=09 }=09=09=09=09=09=09=09=09 \ > =09}=09=09=09=09=09=09=09=09 \ > else=09=09=09=09=09=09=09=09 \ > =09{=09=09=09=09=09=09=09=09 \ The actual code change looks okay to me. Thanks, Florian