Date: Thu, 1 Sep 2022 13:14:54 +0200
From: Jakub Jelinek
To: Jason Merrill
Cc: Joseph Myers, gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] c++, v2: Implement C++23 P2071R2 - Named universal character escapes [PR106648]

On Wed, Aug 31, 2022 at 12:14:22PM -0400, Jason Merrill wrote:
> On 8/31/22 11:07, Jakub Jelinek wrote:
> > On Wed, Aug 31, 2022 at 10:52:49AM -0400, Jason Merrill wrote:
> > > It could be more explicit, but I think we can assume that from the existing
> > > wording; it says it designates the named character.  If there is no such
> > > character, that cannot be satisfied, so it must be ill-formed.
> >
> > Ok.
> >
> > > > So, we could reject the int h case above and accept silently the others?
> > >
> > > Why not warn on the others?
> >
> > We were always silent for the cases like \u123X or \U12345X.
> > Do you think we should emit some warnings (but never pedwarns/errors in that
> > case) that it is universal character name like but not completely?
>
> I think that would be helpful, at least for \u{ and \N{.

Ok.

> > Given what you said above, I think that is what we want for the last 2
> > for C++23, the question is if it is ok also for C++20/C17 etc. and whether
> > it should depend on -pedantic or -pedantic-errors or GNU vs. ISO mode
> > or not in that case.  We could handle those 2 also differently, just
> > warn instead of error for the \N{ABC} case if not in C++23 mode when
> > identifier_pos.
>
> That sounds right.
>
> > Here is an incremental version of the patch which will make valid
> > \u{123} and \N{LATIN SMALL LETTER A WITH ACUTE} an extension in GNU
> > modes before C++23 and split it as separate tokens in ISO modes.
>
> Looks good.

Here is a patch which implements that.

I just wonder if we shouldn't have some warning option that would cover
these warnings; currently one needs to use -w to disable them.

Apparently clang uses the -Wunicode option to cover these, but unfortunately
they don't bother to document it (nor almost any other warning option),
so it is unclear what else exactly it covers.  Another question is how
we should document that option for GCC...

2022-09-01  Jakub Jelinek

	* charset.cc (_cpp_valid_ucn): In possible identifier contexts, don't
	handle \u{ or \N{ specially in -std=c* modes except -std=c++2{3,b}.
	In possible identifier contexts, don't emit an error and punt
	if \N isn't followed by {, or if \N{} surrounds some lower case
	letters or _.  In possible identifier contexts when not C++23, don't
	emit an error but warning about unknown character names and treat as
	separate tokens.  When treating as separate tokens \u{ or \N{, emit
	warnings.

	* c-c++-common/cpp/delimited-escape-seq-4.c: New test.
	* c-c++-common/cpp/delimited-escape-seq-5.c: New test.
	* c-c++-common/cpp/named-universal-char-escape-5.c: New test.
	* c-c++-common/cpp/named-universal-char-escape-6.c: New test.
	* g++.dg/cpp23/named-universal-char-escape1.C: New test.
	* g++.dg/cpp23/named-universal-char-escape2.C: New test.

--- libcpp/charset.cc.jj 2022-09-01 09:47:24.146886929 +0200
+++ libcpp/charset.cc 2022-09-01 12:52:28.424034208 +0200
@@ -1448,7 +1448,11 @@ _cpp_valid_ucn (cpp_reader *pfile, const
   if (str[-1] == 'u')
     {
       length = 4;
-      if (str < limit && *str == '{')
+      if (str < limit
+          && *str == '{'
+          && (!identifier_pos
+              || CPP_OPTION (pfile, delimited_escape_seqs)
+              || !CPP_OPTION (pfile, std)))
         {
           str++;
           /* Magic value to indicate no digits seen.  */
@@ -1462,8 +1466,22 @@ _cpp_valid_ucn (cpp_reader *pfile, const
   else if (str[-1] == 'N')
     {
       length = 4;
+      if (identifier_pos
+          && !CPP_OPTION (pfile, delimited_escape_seqs)
+          && CPP_OPTION (pfile, std))
+        {
+          *cp = 0;
+          return false;
+        }
       if (str == limit || *str != '{')
-        cpp_error (pfile, CPP_DL_ERROR, "'\\N' not followed by '{'");
+        {
+          if (identifier_pos)
+            {
+              *cp = 0;
+              return false;
+            }
+          cpp_error (pfile, CPP_DL_ERROR, "'\\N' not followed by '{'");
+        }
       else
         {
           str++;
@@ -1489,8 +1507,16 @@ _cpp_valid_ucn (cpp_reader *pfile, const
 
           if (str < limit && *str == '}')
             {
-              if (name == str && identifier_pos)
+              if (identifier_pos && (name == str || !strict))
                 {
+                  if (name == str)
+                    cpp_warning (pfile, CPP_W_NONE,
+                                 "empty named universal character escape "
+                                 "sequence; treating it as separate tokens");
+                  else
+                    cpp_warning (pfile, CPP_W_NONE,
+                                 "incomplete named universal character escape "
+                                 "sequence; treating it as separate tokens");
                   *cp = 0;
                   return false;
                 }
@@ -1515,27 +1541,48 @@ _cpp_valid_ucn (cpp_reader *pfile, const
                                     uname2c_tree, NULL);
                   if (result == (cppchar_t) -1)
                     {
-                      cpp_error (pfile, CPP_DL_ERROR,
-                                 "\\N{%.*s} is not a valid universal "
-                                 "character", (int) (str - name), name);
+                      bool ret = true;
+                      if (identifier_pos
+                          && !CPP_OPTION (pfile, delimited_escape_seqs))
+                        ret = cpp_warning (pfile, CPP_W_NONE,
+                                           "\\N{%.*s} is not a valid "
+                                           "universal character; treating it "
+                                           "as separate tokens",
+                                           (int) (str - name), name);
+                      else
+                        cpp_error (pfile, CPP_DL_ERROR,
+                                   "\\N{%.*s} is not a valid universal "
+                                   "character", (int) (str - name), name);
 
                       /* Try to do a loose name lookup according to
                          Unicode loose matching rule UAX44-LM2.  */
                       char canon_name[uname2c_max_name_len + 1];
                       result = _cpp_uname2c_uax44_lm2 ((const char *) name,
                                                        str - name, canon_name);
-                      if (result != (cppchar_t) -1)
+                      if (result != (cppchar_t) -1 && ret)
                         cpp_error (pfile, CPP_DL_NOTE,
                                    "did you mean \\N{%s}?", canon_name);
                       else
-                        result = 0x40;
+                        result = 0xC0;
+                      if (identifier_pos
+                          && !CPP_OPTION (pfile, delimited_escape_seqs))
+                        {
+                          *cp = 0;
+                          return false;
+                        }
                     }
                 }
               str++;
               extend_char_range (char_range, loc_reader);
             }
           else if (identifier_pos)
-            length = 1;
+            {
+              cpp_warning (pfile, CPP_W_NONE,
+                           "incomplete named universal character escape "
+                           "sequence; treating it as separate tokens");
+              *cp = 0;
+              return false;
+            }
           else
             {
               cpp_error (pfile, CPP_DL_ERROR,
@@ -1584,12 +1631,17 @@ _cpp_valid_ucn (cpp_reader *pfile, const
         }
       while (--length);
 
-      if (delimited
-          && str < limit
-          && *str == '}'
-          && (length != 32 || !identifier_pos))
+      if (delimited && str < limit && *str == '}')
         {
-          if (length == 32)
+          if (length == 32 && identifier_pos)
+            {
+              cpp_warning (pfile, CPP_W_NONE,
+                           "empty delimited escape sequence; "
+                           "treating it as separate tokens");
+              *cp = 0;
+              return false;
+            }
+          else if (length == 32)
             cpp_error (pfile, CPP_DL_ERROR,
                        "empty delimited escape sequence");
           else if (!CPP_OPTION (pfile, delimited_escape_seqs)
@@ -1607,6 +1659,10 @@ _cpp_valid_ucn (cpp_reader *pfile, const
          error message in that case.  */
       if (length && identifier_pos)
         {
+          if (delimited)
+            cpp_warning (pfile, CPP_W_NONE,
+                         "incomplete delimited escape sequence; "
+                         "treating it as separate tokens");
           *cp = 0;
           return false;
         }
--- gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-4.c.jj 2022-09-01 11:46:26.951102634 +0200
+++ gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-4.c 2022-09-01 12:50:17.351810090 +0200
@@ -0,0 +1,13 @@
+/* P2290R3 - Delimited escape sequences */
+/* { dg-do compile } */
+/* { dg-require-effective-target wchar } */
+/* { dg-options "-std=gnu99 -Wno-c++-compat" { target c } } */
+/* { dg-options "-std=gnu++20" { target c++ } } */
+
+#define z(x) 0
+#define a z(
+int b = a\u{});		/* { dg-warning "empty delimited escape sequence; treating it as separate tokens" } */
+int c = a\u{);		/* { dg-warning "incomplete delimited escape sequence; treating it as separate tokens" } */
+int d = a\u{12XYZ});	/* { dg-warning "incomplete delimited escape sequence; treating it as separate tokens" } */
+int e = a\u123);
+int f = a\U1234567);
--- gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-5.c.jj 2022-09-01 11:46:26.951102634 +0200
+++ gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-5.c 2022-09-01 12:57:50.830665924 +0200
@@ -0,0 +1,13 @@
+/* P2290R3 - Delimited escape sequences */
+/* { dg-do compile } */
+/* { dg-require-effective-target wchar } */
+/* { dg-options "-std=c17 -Wno-c++-compat" { target c } } */
+/* { dg-options "-std=c++23" { target c++ } } */
+
+#define z(x) 0
+#define a z(
+int b = a\u{});		/* { dg-warning "empty delimited escape sequence; treating it as separate tokens" "" { target c++23 } } */
+int c = a\u{);		/* { dg-warning "incomplete delimited escape sequence; treating it as separate tokens" "" { target c++23 } } */
+int d = a\u{12XYZ});	/* { dg-warning "incomplete delimited escape sequence; treating it as separate tokens" "" { target c++23 } } */
+int e = a\u123);
+int f = a\U1234567);
--- gcc/testsuite/c-c++-common/cpp/named-universal-char-escape-5.c.jj 2022-09-01 11:46:26.951102634 +0200
+++ gcc/testsuite/c-c++-common/cpp/named-universal-char-escape-5.c 2022-09-01 12:48:01.753647302 +0200
@@ -0,0 +1,15 @@
+/* P2071R2 - Named universal character escapes */
+/* { dg-do compile } */
+/* { dg-require-effective-target wchar } */
+/* { dg-options "-std=gnu99 -Wno-c++-compat" { target c } } */
+/* { dg-options "-std=gnu++20" { target c++ } } */
+
+#define z(x) 0
+#define a z(
+int b = a\N{});				/* { dg-warning "empty named universal character escape sequence; treating it as separate tokens" } */
+int c = a\N{);				/* { dg-warning "incomplete named universal character escape sequence; treating it as separate tokens" } */
+int d = a\N);
+int e = a\NARG);
+int f = a\N{abc});			/* { dg-warning "incomplete named universal character escape sequence; treating it as separate tokens" } */
+int g = a\N{ABC.123});			/* { dg-warning "incomplete named universal character escape sequence; treating it as separate tokens" } */
+int h = a\N{NON-EXISTENT CHAR});	/* { dg-warning "is not a valid universal character" } */
--- gcc/testsuite/c-c++-common/cpp/named-universal-char-escape-6.c.jj 2022-09-01 11:46:26.951102634 +0200
+++ gcc/testsuite/c-c++-common/cpp/named-universal-char-escape-6.c 2022-09-01 11:46:26.951102634 +0200
@@ -0,0 +1,16 @@
+/* P2071R2 - Named universal character escapes */
+/* { dg-do compile } */
+/* { dg-require-effective-target wchar } */
+/* { dg-options "-std=c17 -Wno-c++-compat" { target c } } */
+/* { dg-options "-std=c++20" { target c++ } } */
+
+#define z(x) 0
+#define a z(
+int b = a\N{});
+int c = a\N{);
+int d = a\N);
+int e = a\NARG);
+int f = a\N{abc});
+int g = a\N{ABC.123});
+int h = a\N{NON-EXISTENT CHAR});	/* { dg-bogus "is not a valid universal character" } */
+int i = a\N{LATIN SMALL CHARACTER A WITH ACUTE});
--- gcc/testsuite/g++.dg/cpp23/named-universal-char-escape1.C.jj 2022-09-01 11:46:26.951102634 +0200
+++ gcc/testsuite/g++.dg/cpp23/named-universal-char-escape1.C 2022-09-01 12:56:48.031516792 +0200
@@ -0,0 +1,14 @@
+// P2071R2 - Named universal character escapes
+// { dg-do compile }
+// { dg-require-effective-target wchar }
+
+#define z(x) 0
+#define a z(
+int b = a\N{});				// { dg-warning "empty named universal character escape sequence; treating it as separate tokens" "" { target c++23 } }
+int c = a\N{);				// { dg-warning "incomplete named universal character escape sequence; treating it as separate tokens" "" { target c++23 } }
+int d = a\N);
+int e = a\NARG);
+int f = a\N{abc});			// { dg-warning "incomplete named universal character escape sequence; treating it as separate tokens" "" { target c++23 } }
+int g = a\N{ABC.123});			// { dg-warning "incomplete named universal character escape sequence; treating it as separate tokens" "" { target c++23 } }
+int h = a\N{NON-EXISTENT CHAR});	// { dg-error "is not a valid universal character" "" { target c++23 } }
+					// { dg-error "was not declared in this scope" "" { target c++23 } .-1 }
--- gcc/testsuite/g++.dg/cpp23/named-universal-char-escape2.C.jj 2022-09-01 12:54:47.436150733 +0200
+++ gcc/testsuite/g++.dg/cpp23/named-universal-char-escape2.C 2022-09-01 12:59:22.512428644 +0200
@@ -0,0 +1,16 @@
+// P2071R2 - Named universal character escapes
+// { dg-do compile }
+// { dg-require-effective-target wchar }
+// { dg-options "" }
+
+#define z(x) 0
+#define a z(
+int b = a\N{});				// { dg-warning "empty named universal character escape sequence; treating it as separate tokens" }
+int c = a\N{);				// { dg-warning "incomplete named universal character escape sequence; treating it as separate tokens" }
+int d = a\N);
+int e = a\NARG);
+int f = a\N{abc});			// { dg-warning "incomplete named universal character escape sequence; treating it as separate tokens" }
+int g = a\N{ABC.123});			// { dg-warning "incomplete named universal character escape sequence; treating it as separate tokens" }
+int h = a\N{NON-EXISTENT CHAR});	// { dg-error "is not a valid universal character" "" { target c++23 } }
+					// { dg-error "was not declared in this scope" "" { target c++23 } .-1 }
+					// { dg-warning "is not a valid universal character; treating it as separate tokens" "" { target c++20_down } .-2 }

	Jakub
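
For readers who want the identifier-context rules from the ChangeLog entry in one place, here is a rough standalone model of how the patch appears to treat `\N...` when it follows an identifier. It is only a sketch, not libcpp code: the enum, the function name `classify_in_identifier` and its parameters are hypothetical. `iso_mode` stands in for `CPP_OPTION (pfile, std)`, `cxx23` for `CPP_OPTION (pfile, delimited_escape_seqs)`, and `known` replaces the real character-name lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome categories for this sketch; not libcpp names.  */
enum outcome { ACCEPT, SILENT_SPLIT, WARN_SPLIT, HARD_ERROR };

/* Simplified model of "\N" handling in a possible identifier context.
   REST is the text that follows "\N".  ISO_MODE models
   CPP_OPTION (pfile, std), CXX23 models
   CPP_OPTION (pfile, delimited_escape_seqs), and KNOWN is an assumed
   stand-in for whether the name lookup would succeed.  */
static enum outcome
classify_in_identifier (const char *rest, bool iso_mode, bool cxx23,
                        bool known)
{
  assert (rest != NULL);

  /* ISO modes before C++23: not treated specially, no diagnostic.  */
  if (!cxx23 && iso_mode)
    return SILENT_SPLIT;

  /* "\N" not followed by "{": punt without a diagnostic.  */
  if (rest[0] != '{')
    return SILENT_SPLIT;

  const char *name = rest + 1;
  const char *p = name;
  bool strict = true;

  /* Scan name characters: upper case, digits, space and '-' are strict;
     lower case and '_' make the name loose; anything else stops the scan.  */
  for (; *p != '\0'; p++)
    {
      char c = *p;
      if ((c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
          || c == ' ' || c == '-')
        continue;
      if ((c >= 'a' && c <= 'z') || c == '_')
        {
          strict = false;
          continue;
        }
      break;
    }

  /* No closing brace: "incomplete" warning, separate tokens.  */
  if (*p != '}')
    return WARN_SPLIT;

  /* Empty or loose name: warning, separate tokens.  */
  if (p == name || !strict)
    return WARN_SPLIT;

  if (known)
    return ACCEPT;

  /* Unknown name: error in C++23, warning plus separate tokens otherwise.  */
  return cxx23 ? HARD_ERROR : WARN_SPLIT;
}
```

Under these assumptions, `classify_in_identifier ("{abc})", false, false, false)` models `a\N{abc})` in `-std=gnu++20` and yields `WARN_SPLIT`, matching the dg-warning expected in named-universal-char-escape-5.c, while any input with `iso_mode` set and `cxx23` clear yields `SILENT_SPLIT`, matching named-universal-char-escape-6.c.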