From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 7 Oct 2021 15:23:38 +0200
From: Jakub Jelinek
To: Jason Merrill, Andreas Krebbel
Cc: "Joseph S.
Myers", gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] c++: Add testcase for C++23 P2316R2 - consistent character literal encoding [PR102615]
Message-ID: <20211007132338.GV304296@tucnak>
Reply-To: Jakub Jelinek
References: <20211007130049.GT304296@tucnak> <364eac8d-92d1-eadf-ad8e-565712f463fe@redhat.com>
MIME-Version: 1.0
In-Reply-To: <364eac8d-92d1-eadf-ad8e-565712f463fe@redhat.com>
Content-Type: text/plain; charset=us-ascii
List-Id: Gcc-patches mailing list

On Thu, Oct 07, 2021 at 09:12:15AM -0400, Jason Merrill wrote:
> > And another thing, if HOST_CHARSET == HOST_CHARSET_EBCDIC, how does the libcpp/lex.c
> >   static const cppchar_t utf8_signifier = 0xC0;
> > ...
> >   if (*buffer->cur >= utf8_signifier)
> >     {
> >       if (_cpp_valid_utf8 (pfile, &buffer->cur, buffer->rlimit, 1 + !first,
> >                            state, &s))
> >         return true;
> >     }
> > work?  Because in UTF-EBCDIC, >= 0xC0 isn't the right test for start of
> > multi-byte character, it is more complicated and seems _cpp_valid_utf8
> > assumes UTF-8 as the host charset.
>
> Are there any supported platforms that use UTF-EBCDIC?

I have no idea.
From the libcpp/charset.c code, seems there is no built-in conversion for
UTF-EBCDIC, the only internally supported conversions are
  { "UTF-8/UTF-32LE", convert_utf8_utf32, (iconv_t)0 },
  { "UTF-8/UTF-32BE", convert_utf8_utf32, (iconv_t)1 },
  { "UTF-8/UTF-16LE", convert_utf8_utf16, (iconv_t)0 },
  { "UTF-8/UTF-16BE", convert_utf8_utf16, (iconv_t)1 },
  { "UTF-32LE/UTF-8", convert_utf32_utf8, (iconv_t)0 },
  { "UTF-32BE/UTF-8", convert_utf32_utf8, (iconv_t)1 },
  { "UTF-16LE/UTF-8", convert_utf16_utf8, (iconv_t)0 },
  { "UTF-16BE/UTF-8", convert_utf16_utf8, (iconv_t)1 },
and identity, so unless the C library iconv supports conversion to
UTF-EBCDIC, the only case that could be supported is when
-finput-charset= is also UTF-EBCDIC.  E.g. glibc iconv doesn't support that.
Never used z/VM nor OS/390 which I think are the only possible hosts that
could have UTF-EBCDIC.
CCing Andreas if he knows more...

> > --- gcc/testsuite/g++.dg/cpp23/charlit-encoding1.C.jj 2021-10-07 14:34:35.182132411 +0200
> > +++ gcc/testsuite/g++.dg/cpp23/charlit-encoding1.C 2021-10-07 14:34:02.902583774 +0200
> > @@ -0,0 +1,33 @@
> > +// PR c++/102615 - P2316R2 - Consistent character literal encoding
> > +// { dg-do compile }
>
> Doesn't this need to run?  OK with that change.

Thanks for catching that, fixed, retested and committed.

        Jakub
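
For illustration of the ">= 0xC0" concern quoted above: in UTF-8, bytes
0x80..0xBF are continuation bytes, so a byte at or above 0xC0 can only begin
a multi-byte sequence (or be invalid).  In EBCDIC code pages the letter 'A'
is 0xC1, so the same comparison would fire on a plain letter.  A minimal
sketch, not taken from libcpp: utf8_signifier mirrors the quoted constant,
while at_or_above_signifier, is_utf8_continuation, EBCDIC_UPPER_A and
check_examples are hypothetical names introduced only for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Same threshold value as the constant quoted from libcpp/lex.c.  */
static const unsigned char utf8_signifier = 0xC0;

/* Hypothetical helper: the byte-level comparison from the quoted snippet.  */
static bool
at_or_above_signifier (unsigned char byte)
{
  return byte >= utf8_signifier;
}

/* Hypothetical helper: UTF-8 continuation bytes have the form 10xxxxxx.  */
static bool
is_utf8_continuation (unsigned char byte)
{
  return (byte & 0xC0) == 0x80;
}

/* Assumption for illustration: 'A' as encoded in EBCDIC code pages.  */
enum { EBCDIC_UPPER_A = 0xC1 };

static void
check_examples (void)
{
  /* ASCII 'A' (0x41) is below the threshold, so it is not flagged.  */
  assert (!at_or_above_signifier (0x41));

  /* U+00E9 is 0xC3 0xA9 in UTF-8: lead byte flagged, trail byte not.  */
  assert (at_or_above_signifier (0xC3));
  assert (is_utf8_continuation (0xA9));
  assert (!at_or_above_signifier (0xA9));

  /* EBCDIC 'A' is a single-byte letter, yet the comparison flags it.  */
  assert (at_or_above_signifier (EBCDIC_UPPER_A));
}
```

Calling check_examples () from a main function runs the assertions.  The
sketch only covers the threshold comparison; it says nothing about how
UTF-EBCDIC marks the start of a multi-byte character.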