From: Carlos O'Donell
Organization: Red Hat
To: Florian Weimer, libc-alpha@sourceware.org
Subject: Re: [PATCH 2/5] locale: Fix signed char bug in lr_getc
Date: Mon, 4 Jul 2022 15:54:13 -0400
Message-ID: <5c1393d8-cc69-e77b-a674-c00460cbeece@redhat.com>
In-Reply-To: <619cade7e73dc33184bf4247b739d54cd9d7d8b3.1652994079.git.fweimer@redhat.com>
References: <619cade7e73dc33184bf4247b739d54cd9d7d8b3.1652994079.git.fweimer@redhat.com>

On 5/19/22 17:06, Florian Weimer via Libc-alpha wrote:
> The array lr->buf contains characters, which can be signed.  A 0xff
> byte in the input could be incorrectly reported as EOF.  More
> importantly, get_string in linereader.c converts a signed input byte
> to a Unicode code point using ADDWC ((uint32_t) ch), under the
> assumption that this decodes the ISO-8859-1 input encoding.
> If char is signed, this does not give the correct result.  This means
> that ISO-8859-1 input files for localedef are not actually supported,
> contrary to the comment in get_string.  This is a happy accident because
> we can therefore change the file encoding to UTF-8 without impacting
> backwards compatibility.

LGTM.

Reviewed-by: Carlos O'Donell
Tested-by: Carlos O'Donell

>
> While at it, remove the \32 check for MS-DOS end-of-file character (^Z).

OK. We don't need this, files should have the correct EOF.

> ---
>  locale/programs/linereader.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/locale/programs/linereader.h b/locale/programs/linereader.h
> index 0fb10ec833..653a71d2d1 100644
> --- a/locale/programs/linereader.h
> +++ b/locale/programs/linereader.h
> @@ -134,7 +134,7 @@ lr_getc (struct linereader *lr)
>        return EOF;
>      }
>
> -  return lr->buf[lr->idx] == '\32' ? EOF : lr->buf[lr->idx++];
> +  return lr->buf[lr->idx++] & 0xff;

OK. Agreed, this should not be sign extended. It's a byte in the buffer not EOF.

With the original MS-DOS checking it might have been *needed* to return -1.

>  }
>
>

--
Cheers,
Carlos.