From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by sourceware.org (Postfix) with ESMTPS id 1AE7C38560B9 for ; Sat, 3 Sep 2022 10:50:35 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 1AE7C38560B9 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=redhat.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1662202234; h=from:from:reply-to:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:resent-to:resent-from:resent-message-id: in-reply-to:in-reply-to:references:references; bh=iLqGRPqQ0dtAyT6VGtvhsv4axqCFy/4KiWqEjctpv5M=; b=LjSNv/uf6auGk7TzzwBLoz4rgqQ0gIXEBxgalpFaNwCjT4tw9O+9W0kzc/Iieh0I3WknFq xnfMMEl62LTo/fpEOoTUxX+MRXmK4HX/DWwLhC7+48YH3we9BUhnAU/l6YIf+y/gMQX+wX RiLN9HTh/xA8cbih4zlJgvbliuQK5eg= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-665-xeSKEKsvNsalHCA-utrvQg-1; Sat, 03 Sep 2022 06:50:33 -0400 X-MC-Unique: xeSKEKsvNsalHCA-utrvQg-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.rdu2.redhat.com [10.11.54.3]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 20F8029AA3BD; Sat, 3 Sep 2022 10:50:33 +0000 (UTC) Received: from tucnak.zalov.cz (unknown [10.39.192.41]) by smtp.corp.redhat.com (Postfix) with ESMTPS id B73BA1121314; Sat, 3 Sep 2022 10:50:32 +0000 (UTC) Received: from tucnak.zalov.cz (localhost [127.0.0.1]) by tucnak.zalov.cz (8.17.1/8.17.1) with ESMTPS id 283AoUHo654473 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NOT); Sat, 3 Sep 2022 12:50:30 +0200 Received: (from jakub@localhost) by tucnak.zalov.cz (8.17.1/8.17.1/Submit) id 283AoTTk654472; Sat, 3 Sep 2022 12:50:29 +0200 Resent-From: Jakub Jelinek Resent-Date: Sat, 3 Sep 2022 12:50:29 +0200 Resent-Message-ID: Resent-To: Jason Merrill , Joseph Myers , gcc-patches@gcc.gnu.org Date: Sat, 3 Sep 2022 12:29:52 +0200 From: Jakub Jelinek To: Jason Merrill Cc: Joseph Myers , gcc-patches@gcc.gnu.org Subject: [PATCH] libcpp, v3: Named universal character escapes and delimited escape sequence tweaks Message-ID: Reply-To: Jakub Jelinek References: <5da578e7-9c43-99ea-15c1-aefc641a0654@redhat.com> <37250e6c-80f9-2b93-a381-c1c9b869c04d@redhat.com> MIME-Version: 1.0 In-Reply-To: <37250e6c-80f9-2b93-a381-c1c9b869c04d@redhat.com> X-Scanned-By: MIMEDefang 2.78 on 10.11.54.3 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=us-ascii Content-Disposition: inline X-Spam-Status: No, score=-4.5 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_LOW,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Thu, Sep 01, 2022 at 03:00:28PM -0400, Jason Merrill wrote: > We might as well use the same flag name, and document it to mean what it > currently means for GCC. Ok, following patch introduces -Wunicode (on by default). > It looks like this is handling \N{abc}, for which "incomplete" seems like > the wrong description; it's complete, just wrong, and the diagnostic doesn't > help correct it. And also will emit the is not a valid universal character with did you mean if it matches loosely, otherwise will use the not terminated with } after ... wording. Ok if it passes bootstrap/regtest? 2022-09-03 Jakub Jelinek libcpp/ * include/cpplib.h (struct cpp_options): Add cpp_warn_unicode member. (enum cpp_warning_reason): Add CPP_W_UNICODE. * init.cc (cpp_create_reader): Initialize cpp_warn_unicode. * charset.cc (_cpp_valid_ucn): In possible identifier contexts, don't handle \u{ or \N{ specially in -std=c* modes except -std=c++2{3,b}. In possible identifier contexts, don't emit an error and punt if \N isn't followed by {, or if \N{} surrounds some lower case letters or _. In possible identifier contexts when not C++23, don't emit an error but warning about unknown character names and treat as separate tokens. When treating as separate tokens \u{ or \N{, emit warnings. gcc/ * doc/invoke.texi (-Wno-unicode): Document. gcc/c-family/ * c.opt (Winvalid-utf8): Use ObjC instead of objC. Remove " in comments" from description. (Wunicode): New option. gcc/testsuite/ * c-c++-common/cpp/delimited-escape-seq-4.c: New test. * c-c++-common/cpp/delimited-escape-seq-5.c: New test. * c-c++-common/cpp/delimited-escape-seq-6.c: New test. * c-c++-common/cpp/delimited-escape-seq-7.c: New test. * c-c++-common/cpp/named-universal-char-escape-5.c: New test. * c-c++-common/cpp/named-universal-char-escape-6.c: New test. * c-c++-common/cpp/named-universal-char-escape-7.c: New test. * g++.dg/cpp23/named-universal-char-escape1.C: New test. * g++.dg/cpp23/named-universal-char-escape2.C: New test. --- libcpp/include/cpplib.h.jj 2022-09-03 09:35:41.465984642 +0200 +++ libcpp/include/cpplib.h 2022-09-03 11:30:57.250677870 +0200 @@ -565,6 +565,10 @@ struct cpp_options 2 if it should be a pedwarn. */ unsigned char cpp_warn_invalid_utf8; + /* True if libcpp should warn about invalid forms of delimited or named + escape sequences. */ + bool cpp_warn_unicode; + /* True if -finput-charset= option has been used explicitly. */ bool cpp_input_charset_explicit; @@ -675,7 +679,8 @@ enum cpp_warning_reason { CPP_W_CXX20_COMPAT, CPP_W_EXPANSION_TO_DEFINED, CPP_W_BIDIRECTIONAL, - CPP_W_INVALID_UTF8 + CPP_W_INVALID_UTF8, + CPP_W_UNICODE }; /* Callback for header lookup for HEADER, which is the name of a --- libcpp/init.cc.jj 2022-09-01 09:47:23.729892618 +0200 +++ libcpp/init.cc 2022-09-03 11:19:10.954452329 +0200 @@ -228,6 +228,7 @@ cpp_create_reader (enum c_lang lang, cpp CPP_OPTION (pfile, warn_date_time) = 0; CPP_OPTION (pfile, cpp_warn_bidirectional) = bidirectional_unpaired; CPP_OPTION (pfile, cpp_warn_invalid_utf8) = 0; + CPP_OPTION (pfile, cpp_warn_unicode) = 1; CPP_OPTION (pfile, cpp_input_charset_explicit) = 0; /* Default CPP arithmetic to something sensible for the host for the --- libcpp/charset.cc.jj 2022-09-01 14:19:47.462235851 +0200 +++ libcpp/charset.cc 2022-09-03 11:26:14.858585905 +0200 @@ -1448,7 +1448,11 @@ _cpp_valid_ucn (cpp_reader *pfile, const if (str[-1] == 'u') { length = 4; - if (str < limit && *str == '{') + if (str < limit + && *str == '{' + && (!identifier_pos + || CPP_OPTION (pfile, delimited_escape_seqs) + || !CPP_OPTION (pfile, std))) { str++; /* Magic value to indicate no digits seen. */ @@ -1462,8 +1466,22 @@ _cpp_valid_ucn (cpp_reader *pfile, const else if (str[-1] == 'N') { length = 4; + if (identifier_pos + && !CPP_OPTION (pfile, delimited_escape_seqs) + && CPP_OPTION (pfile, std)) + { + *cp = 0; + return false; + } if (str == limit || *str != '{') - cpp_error (pfile, CPP_DL_ERROR, "'\\N' not followed by '{'"); + { + if (identifier_pos) + { + *cp = 0; + return false; + } + cpp_error (pfile, CPP_DL_ERROR, "'\\N' not followed by '{'"); + } else { str++; @@ -1472,6 +1490,7 @@ _cpp_valid_ucn (cpp_reader *pfile, const length = 0; const uchar *name = str; bool strict = true; + const uchar *strict_end = name; do { @@ -1481,7 +1500,11 @@ _cpp_valid_ucn (cpp_reader *pfile, const if (!ISIDNUM (c) && c != ' ' && c != '-') break; if (ISLOWER (c) || c == '_') - strict = false; + { + if (strict) + strict_end = str; + strict = false; + } str++; extend_char_range (char_range, loc_reader); } @@ -1489,8 +1512,35 @@ _cpp_valid_ucn (cpp_reader *pfile, const if (str < limit && *str == '}') { - if (name == str && identifier_pos) + if (identifier_pos && (name == str || !strict)) { + if (name == str) + cpp_warning (pfile, CPP_W_UNICODE, + "empty named universal character escape " + "sequence; treating it as separate tokens"); + else + { + char canon_name[uname2c_max_name_len + 1]; + result = _cpp_uname2c_uax44_lm2 ((const char *) name, + str - name, canon_name); + if (result == (cppchar_t) -1) + cpp_warning (pfile, CPP_W_UNICODE, + "'\\N{' not terminated with '}' after " + "%.*s; treating it as separate tokens", + (int) (strict_end - base), base); + else + { + bool ret + = cpp_warning (pfile, CPP_W_UNICODE, + "\\N{%.*s} is not a valid " + "universal character; treating it " + "as separate tokens", + (int) (str - name), name); + if (ret) + cpp_error (pfile, CPP_DL_NOTE, + "did you mean \\N{%s}?", canon_name); + } + } *cp = 0; return false; } @@ -1515,27 +1565,49 @@ _cpp_valid_ucn (cpp_reader *pfile, const uname2c_tree, NULL); if (result == (cppchar_t) -1) { - cpp_error (pfile, CPP_DL_ERROR, - "\\N{%.*s} is not a valid universal " - "character", (int) (str - name), name); + bool ret = true; + if (identifier_pos + && !CPP_OPTION (pfile, delimited_escape_seqs)) + ret = cpp_warning (pfile, CPP_W_UNICODE, + "\\N{%.*s} is not a valid " + "universal character; treating it " + "as separate tokens", + (int) (str - name), name); + else + cpp_error (pfile, CPP_DL_ERROR, + "\\N{%.*s} is not a valid universal " + "character", (int) (str - name), name); /* Try to do a loose name lookup according to Unicode loose matching rule UAX44-LM2. */ char canon_name[uname2c_max_name_len + 1]; result = _cpp_uname2c_uax44_lm2 ((const char *) name, str - name, canon_name); - if (result != (cppchar_t) -1) + if (result != (cppchar_t) -1 && ret) cpp_error (pfile, CPP_DL_NOTE, "did you mean \\N{%s}?", canon_name); else - result = 0x40; + result = 0xC0; + if (identifier_pos + && !CPP_OPTION (pfile, delimited_escape_seqs)) + { + *cp = 0; + return false; + } } } str++; extend_char_range (char_range, loc_reader); } else if (identifier_pos) - length = 1; + { + cpp_warning (pfile, CPP_W_UNICODE, + "'\\N{' not terminated with '}' after %.*s; " + "treating it as separate tokens", + (int) (str - base), base); + *cp = 0; + return false; + } else { cpp_error (pfile, CPP_DL_ERROR, @@ -1584,12 +1656,17 @@ _cpp_valid_ucn (cpp_reader *pfile, const } while (--length); - if (delimited - && str < limit - && *str == '}' - && (length != 32 || !identifier_pos)) + if (delimited && str < limit && *str == '}') { - if (length == 32) + if (length == 32 && identifier_pos) + { + cpp_warning (pfile, CPP_W_UNICODE, + "empty delimited escape sequence; " + "treating it as separate tokens"); + *cp = 0; + return false; + } + else if (length == 32) cpp_error (pfile, CPP_DL_ERROR, "empty delimited escape sequence"); else if (!CPP_OPTION (pfile, delimited_escape_seqs) @@ -1607,6 +1684,11 @@ _cpp_valid_ucn (cpp_reader *pfile, const error message in that case. */ if (length && identifier_pos) { + if (delimited) + cpp_warning (pfile, CPP_W_UNICODE, + "'\\u{' not terminated with '}' after %.*s; " + "treating it as separate tokens", + (int) (str - base), base); *cp = 0; return false; } --- gcc/doc/invoke.texi.jj 2022-09-03 09:35:40.966991672 +0200 +++ gcc/doc/invoke.texi 2022-09-03 11:39:03.875914845 +0200 @@ -365,7 +365,7 @@ Objective-C and Objective-C++ Dialects}. -Winfinite-recursion @gol -Winit-self -Winline -Wno-int-conversion -Wint-in-bool-context @gol -Wno-int-to-pointer-cast -Wno-invalid-memory-model @gol --Winvalid-pch -Winvalid-utf8 -Wjump-misses-init @gol +-Winvalid-pch -Winvalid-utf8 -Wno-unicode -Wjump-misses-init @gol -Wlarger-than=@var{byte-size} -Wlogical-not-parentheses -Wlogical-op @gol -Wlong-long -Wno-lto-type-mismatch -Wmain -Wmaybe-uninitialized @gol -Wmemset-elt-size -Wmemset-transposed-args @gol @@ -9577,6 +9577,12 @@ Warn if an invalid UTF-8 character is fo This warning is on by default for C++23 if @option{-finput-charset=UTF-8} is used and turned into error with @option{-pedantic-errors}. +@item -Wno-unicode +@opindex Wunicode +@opindex Wno-unicode +Don't diagnose invalid forms of delimited or named escape sequences which are +treated as separate tokens. @option{Wunicode} is enabled by default. + @item -Wlong-long @opindex Wlong-long @opindex Wno-long-long --- gcc/c-family/c.opt.jj 2022-09-03 09:35:40.206002393 +0200 +++ gcc/c-family/c.opt 2022-09-03 11:17:04.529201926 +0200 @@ -822,8 +822,8 @@ C ObjC C++ ObjC++ CPP(warn_invalid_pch) Warn about PCH files that are found but not used. Winvalid-utf8 -C objC C++ ObjC++ CPP(cpp_warn_invalid_utf8) CppReason(CPP_W_INVALID_UTF8) Var(warn_invalid_utf8) Init(0) Warning -Warn about invalid UTF-8 characters in comments. +C ObjC C++ ObjC++ CPP(cpp_warn_invalid_utf8) CppReason(CPP_W_INVALID_UTF8) Var(warn_invalid_utf8) Init(0) Warning +Warn about invalid UTF-8 characters. Wjump-misses-init C ObjC Var(warn_jump_misses_init) Warning LangEnabledby(C ObjC,Wc++-compat) @@ -1345,6 +1345,10 @@ Wundef C ObjC C++ ObjC++ CPP(warn_undef) CppReason(CPP_W_UNDEF) Var(cpp_warn_undef) Init(0) Warning Warn if an undefined macro is used in an #if directive. +Wunicode +C ObjC C++ ObjC++ CPP(cpp_warn_unicode) CppReason(CPP_W_UNICODE) Var(warn_unicode) Init(1) Warning +Warn about invalid forms of delimited or named escape sequences. + Wuninitialized C ObjC C++ ObjC++ LTO LangEnabledBy(C ObjC C++ ObjC++ LTO,Wall) ; --- gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-4.c.jj 2022-09-03 11:13:37.570068845 +0200 +++ gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-4.c 2022-09-03 11:56:52.818054420 +0200 @@ -0,0 +1,13 @@ +/* P2290R3 - Delimited escape sequences */ +/* { dg-do compile } */ +/* { dg-require-effective-target wchar } */ +/* { dg-options "-std=gnu99 -Wno-c++-compat" { target c } } */ +/* { dg-options "-std=gnu++20" { target c++ } } */ + +#define z(x) 0 +#define a z( +int b = a\u{}); /* { dg-warning "empty delimited escape sequence; treating it as separate tokens" } */ +int c = a\u{); /* { dg-warning "'\\\\u\\\{' not terminated with '\\\}' after \\\\u\\\{; treating it as separate tokens" } */ +int d = a\u{12XYZ}); /* { dg-warning "'\\\\u\\\{' not terminated with '\\\}' after \\\\u\\\{12; treating it as separate tokens" } */ +int e = a\u123); +int f = a\U1234567); --- gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-5.c.jj 2022-09-03 11:13:37.570068845 +0200 +++ gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-5.c 2022-09-03 12:01:35.618124647 +0200 @@ -0,0 +1,13 @@ +/* P2290R3 - Delimited escape sequences */ +/* { dg-do compile } */ +/* { dg-require-effective-target wchar } */ +/* { dg-options "-std=c17 -Wno-c++-compat" { target c } } */ +/* { dg-options "-std=c++23" { target c++ } } */ + +#define z(x) 0 +#define a z( +int b = a\u{}); /* { dg-warning "empty delimited escape sequence; treating it as separate tokens" "" { target c++23 } } */ +int c = a\u{); /* { dg-warning "'\\\\u\\\{' not terminated with '\\\}' after \\\\u\\\{; treating it as separate tokens" "" { target c++23 } } */ +int d = a\u{12XYZ}); /* { dg-warning "'\\\\u\\\{' not terminated with '\\\}' after \\\\u\\\{12; treating it as separate tokens" "" { target c++23 } } */ +int e = a\u123); +int f = a\U1234567); --- gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-6.c.jj 2022-09-03 11:59:36.573778876 +0200 +++ gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-6.c 2022-09-03 11:59:55.808511591 +0200 @@ -0,0 +1,13 @@ +/* P2290R3 - Delimited escape sequences */ +/* { dg-do compile } */ +/* { dg-require-effective-target wchar } */ +/* { dg-options "-std=gnu99 -Wno-c++-compat -Wno-unicode" { target c } } */ +/* { dg-options "-std=gnu++20 -Wno-unicode" { target c++ } } */ + +#define z(x) 0 +#define a z( +int b = a\u{}); /* { dg-bogus "empty delimited escape sequence; treating it as separate tokens" } */ +int c = a\u{); /* { dg-bogus "'\\\\u\\\{' not terminated with '\\\}' after \\\\u\\\{; treating it as separate tokens" } */ +int d = a\u{12XYZ}); /* { dg-bogus "'\\\\u\\\{' not terminated with '\\\}' after \\\\u\\\{12; treating it as separate tokens" } */ +int e = a\u123); +int f = a\U1234567); --- gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-7.c.jj 2022-09-03 12:01:48.958939255 +0200 +++ gcc/testsuite/c-c++-common/cpp/delimited-escape-seq-7.c 2022-09-03 12:02:16.765552854 +0200 @@ -0,0 +1,13 @@ +/* P2290R3 - Delimited escape sequences */ +/* { dg-do compile } */ +/* { dg-require-effective-target wchar } */ +/* { dg-options "-std=c17 -Wno-c++-compat -Wno-unicode" { target c } } */ +/* { dg-options "-std=c++23 -Wno-unicode" { target c++ } } */ + +#define z(x) 0 +#define a z( +int b = a\u{}); /* { dg-bogus "empty delimited escape sequence; treating it as separate tokens" } */ +int c = a\u{); /* { dg-bogus "'\\\\u\\\{' not terminated with '\\\}' after \\\\u\\\{; treating it as separate tokens" } */ +int d = a\u{12XYZ}); /* { dg-bogus "'\\\\u\\\{' not terminated with '\\\}' after \\\\u\\\{12; treating it as separate tokens" } */ +int e = a\u123); +int f = a\U1234567); --- gcc/testsuite/c-c++-common/cpp/named-universal-char-escape-5.c.jj 2022-09-03 11:13:37.570068845 +0200 +++ gcc/testsuite/c-c++-common/cpp/named-universal-char-escape-5.c 2022-09-03 12:12:29.596042747 +0200 @@ -0,0 +1,17 @@ +/* P2071R2 - Named universal character escapes */ +/* { dg-do compile } */ +/* { dg-require-effective-target wchar } */ +/* { dg-options "-std=gnu99 -Wno-c++-compat" { target c } } */ +/* { dg-options "-std=gnu++20" { target c++ } } */ + +#define z(x) 0 +#define a z( +int b = a\N{}); /* { dg-warning "empty named universal character escape sequence; treating it as separate tokens" } */ +int c = a\N{); /* { dg-warning "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{; treating it as separate tokens" } */ +int d = a\N); +int e = a\NARG); +int f = a\N{abc}); /* { dg-warning "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{; treating it as separate tokens" } */ +int g = a\N{ABC.123}); /* { dg-warning "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{ABC; treating it as separate tokens" } */ +int h = a\N{NON-EXISTENT CHAR}); /* { dg-warning "\\\\N\\\{NON-EXISTENT CHAR\\\} is not a valid universal character; treating it as separate tokens" } */ +int i = a\N{Latin_Small_Letter_A_With_Acute}); /* { dg-warning "\\\\N\\\{Latin_Small_Letter_A_With_Acute\\\} is not a valid universal character; treating it as separate tokens" } */ + /* { dg-message "did you mean \\\\N\\\{LATIN SMALL LETTER A WITH ACUTE\\\}\\?" "" { target *-*-* } .-1 } */ --- gcc/testsuite/c-c++-common/cpp/named-universal-char-escape-6.c.jj 2022-09-03 11:13:37.570068845 +0200 +++ gcc/testsuite/c-c++-common/cpp/named-universal-char-escape-6.c 2022-09-03 11:44:34.558316155 +0200 @@ -0,0 +1,17 @@ +/* P2071R2 - Named universal character escapes */ +/* { dg-do compile } */ +/* { dg-require-effective-target wchar } */ +/* { dg-options "-std=c17 -Wno-c++-compat" { target c } } */ +/* { dg-options "-std=c++20" { target c++ } } */ + +#define z(x) 0 +#define a z( +int b = a\N{}); +int c = a\N{); +int d = a\N); +int e = a\NARG); +int f = a\N{abc}); +int g = a\N{ABC.123}); +int h = a\N{NON-EXISTENT CHAR}); /* { dg-bogus "is not a valid universal character" } */ +int i = a\N{Latin_Small_Letter_A_With_Acute}); +int j = a\N{LATIN SMALL LETTER A WITH ACUTE}); --- gcc/testsuite/c-c++-common/cpp/named-universal-char-escape-7.c.jj 2022-09-03 12:18:31.296022384 +0200 +++ gcc/testsuite/c-c++-common/cpp/named-universal-char-escape-7.c 2022-09-03 12:19:00.956610699 +0200 @@ -0,0 +1,17 @@ +/* P2071R2 - Named universal character escapes */ +/* { dg-do compile } */ +/* { dg-require-effective-target wchar } */ +/* { dg-options "-std=gnu99 -Wno-c++-compat -Wno-unicode" { target c } } */ +/* { dg-options "-std=gnu++20 -Wno-unicode" { target c++ } } */ + +#define z(x) 0 +#define a z( +int b = a\N{}); /* { dg-bogus "empty named universal character escape sequence; treating it as separate tokens" } */ +int c = a\N{); /* { dg-bogus "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{; treating it as separate tokens" } */ +int d = a\N); +int e = a\NARG); +int f = a\N{abc}); /* { dg-bogus "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{; treating it as separate tokens" } */ +int g = a\N{ABC.123}); /* { dg-bogus "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{ABC; treating it as separate tokens" } */ +int h = a\N{NON-EXISTENT CHAR}); /* { dg-bogus "\\\\N\\\{NON-EXISTENT CHAR\\\} is not a valid universal character; treating it as separate tokens" } */ +int i = a\N{Latin_Small_Letter_A_With_Acute}); /* { dg-bogus "\\\\N\\\{Latin_Small_Letter_A_With_Acute\\\} is not a valid universal character; treating it as separate tokens" } */ + /* { dg-bogus "did you mean \\\\N\\\{LATIN SMALL LETTER A WITH ACUTE\\\}\\?" "" { target *-*-* } .-1 } */ --- gcc/testsuite/g++.dg/cpp23/named-universal-char-escape1.C.jj 2022-09-03 11:13:37.571068831 +0200 +++ gcc/testsuite/g++.dg/cpp23/named-universal-char-escape1.C 2022-09-03 12:16:49.010442096 +0200 @@ -0,0 +1,16 @@ +// P2071R2 - Named universal character escapes +// { dg-do compile } +// { dg-require-effective-target wchar } + +#define z(x) 0 +#define a z( +int b = a\N{}); // { dg-warning "empty named universal character escape sequence; treating it as separate tokens" "" { target c++23 } } +int c = a\N{); // { dg-warning "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{; treating it as separate tokens" "" { target c++23 } } +int d = a\N); +int e = a\NARG); +int f = a\N{abc}); // { dg-warning "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{; treating it as separate tokens" "" { target c++23 } } +int g = a\N{ABC.123}); // { dg-warning "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{ABC; treating it as separate tokens" "" { target c++23 } } +int h = a\N{NON-EXISTENT CHAR}); // { dg-error "is not a valid universal character" "" { target c++23 } } + // { dg-error "was not declared in this scope" "" { target c++23 } .-1 } +int i = a\N{Latin_Small_Letter_A_With_Acute}); // { dg-warning "\\\\N\\\{Latin_Small_Letter_A_With_Acute\\\} is not a valid universal character; treating it as separate tokens" "" { target c++23 } } + // { dg-message "did you mean \\\\N\\\{LATIN SMALL LETTER A WITH ACUTE\\\}\\?" "" { target c++23 } .-1 } --- gcc/testsuite/g++.dg/cpp23/named-universal-char-escape2.C.jj 2022-09-03 11:13:37.571068831 +0200 +++ gcc/testsuite/g++.dg/cpp23/named-universal-char-escape2.C 2022-09-03 12:18:03.567407252 +0200 @@ -0,0 +1,18 @@ +// P2071R2 - Named universal character escapes +// { dg-do compile } +// { dg-require-effective-target wchar } +// { dg-options "" } + +#define z(x) 0 +#define a z( +int b = a\N{}); // { dg-warning "empty named universal character escape sequence; treating it as separate tokens" } +int c = a\N{); // { dg-warning "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{; treating it as separate tokens" } +int d = a\N); +int e = a\NARG); +int f = a\N{abc}); // { dg-warning "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{; treating it as separate tokens" } +int g = a\N{ABC.123}); // { dg-warning "'\\\\N\\\{' not terminated with '\\\}' after \\\\N\\\{ABC; treating it as separate tokens" } +int h = a\N{NON-EXISTENT CHAR}); // { dg-error "is not a valid universal character" "" { target c++23 } } + // { dg-error "was not declared in this scope" "" { target c++23 } .-1 } + // { dg-warning "\\\\N\\\{NON-EXISTENT CHAR\\\} is not a valid universal character; treating it as separate tokens" "" { target c++20_down } .-2 } +int i = a\N{Latin_Small_Letter_A_With_Acute}); // { dg-warning "\\\\N\\\{Latin_Small_Letter_A_With_Acute\\\} is not a valid universal character; treating it as separate tokens" } + // { dg-message "did you mean \\\\N\\\{LATIN SMALL LETTER A WITH ACUTE\\\}\\?" "" { target *-*-* } .-1 } Jakub