Date: Thu, 1 Sep 2022 22:23:26 +0200
From: Jakub Jelinek
To: Jason Merrill
Cc: Joseph Myers, gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] c++, v2: Implement C++23 P2071R2 - Named universal character escapes [PR106648]
In-Reply-To: <37250e6c-80f9-2b93-a381-c1c9b869c04d@redhat.com>

On Thu, Sep 01, 2022 at 03:00:28PM -0400, Jason Merrill wrote:
> > Apparently clang uses -Wunicode option to cover these, but unfortunately
> > they don't bother to document it (nor almost any other warning option),
> > so it is unclear what else exactly it covers. Plus a question is how
> > we should document that option for GCC...
> 
> We might as well use the same flag name, and document it to mean what it
> currently means for GCC.

OK, will work on that tomorrow.
> > @@ -1489,8 +1507,16 @@ _cpp_valid_ucn (cpp_reader *pfile, const
> >  	  if (str < limit && *str == '}')
> >  	    {
> > -	      if (name == str && identifier_pos)
> > +	      if (identifier_pos && (name == str || !strict))
> >  		{
> > +		  if (name == str)
> > +		    cpp_warning (pfile, CPP_W_NONE,
> > +				 "empty named universal character escape "
> > +				 "sequence; treating it as separate tokens");
> > +		  else
> > +		    cpp_warning (pfile, CPP_W_NONE,
> > +				 "incomplete named universal character escape "
> > +				 "sequence; treating it as separate tokens");
> 
> It looks like this is handling \N{abc}, for which "incomplete" seems like
> the wrong description; it's complete, just wrong, and the diagnostic doesn't
> help correct it.

The point is to make it consistent with the \N{X.1} handling. The grammar
is clear that only upper case letters, digits, space and hyphen can appear
between \N{ and }, so IMHO both of those cases should be handled the same.

The !strict case is when the name contains at least one lower case letter
or underscore but otherwise only letters, digits, space, hyphen and
underscore. There we find the terminating } and, inside string/character
literals, want to offer suggestions based on the UAX44-LM2 loose matching
algorithm. For X.1 in literals, however, we don't even look for the }; we
just emit the
	  cpp_error (pfile, CPP_DL_ERROR,
		     "'\\N{' not terminated with '}' after %.*s",
		     (int) (str - base), base);
diagnostic, which prints the text scanned up to and including the X.

In the identifier_pos case, the !strict and *str != '}' cases are treated
as separate tokens for the same reason: not because the name is invalid,
but because it contains characters that are not allowed there. So perhaps
for the identifier_pos !strict and *str != '}' cases we could emit a
warning with the same wording as above (changing the loop so that, if
identifier_pos, it just breaks on the first lower case letter or _ instead
of setting strict = true). Or we could emit such a warning plus a note
clarifying that only upper case letters, digits, space or hyphen are
allowed there?

	Jakub
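A minimal C sketch of the classification being discussed, assuming, as the message states, that only A-Z, 0-9, space and hyphen are valid between \N{ and }. Everything in it (the enum, classify_named_ucn and the helpers) is hypothetical and is not libcpp code; it only decides which bucket the text after \N{ falls into and does no character-name lookup or UAX44-LM2 matching.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical buckets for the text that follows "\N{".  These names are
   illustrative only; they are not libcpp's.  */
enum named_ucn_kind
{
  NAMED_UCN_EMPTY,        /* '}' immediately follows "\N{".  */
  NAMED_UCN_STRICT,       /* Only A-Z, 0-9, space and hyphen, then '}'.  */
  NAMED_UCN_LOOSE,        /* Also at least one a-z or '_', then '}'.  */
  NAMED_UCN_UNTERMINATED  /* Stopped at another character (e.g. '.') or
                             at end of input before any '}'.  */
};

/* Characters the message says may appear between \N{ and }.  */
bool
strict_name_char_p (char c)
{
  return ((c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
          || c == ' ' || c == '-');
}

/* Extra characters tolerated in the !strict case.  */
bool
loose_name_char_p (char c)
{
  return (c >= 'a' && c <= 'z') || c == '_';
}

/* STR points just past "\N{"; LIMIT is the end of the input.  */
enum named_ucn_kind
classify_named_ucn (const char *str, const char *limit)
{
  const char *name = str;
  bool strict = true;

  for (; str < limit; ++str)
    {
      if (strict_name_char_p (*str))
        continue;
      if (loose_name_char_p (*str))
        {
          strict = false;
          continue;
        }
      break;  /* First character outside both sets, '}' included.  */
    }

  /* As with \N{X.1}: no further search for a '}' past this point.  */
  if (str == limit || *str != '}')
    return NAMED_UCN_UNTERMINATED;
  if (str == name)
    return NAMED_UCN_EMPTY;
  return strict ? NAMED_UCN_STRICT : NAMED_UCN_LOOSE;
}

/* Convenience wrapper for NUL-terminated example strings.  */
enum named_ucn_kind
classify_named_ucn_str (const char *s)
{
  return classify_named_ucn (s, s + strlen (s));
}

/* In an identifier, per the message, only the strict kind is a candidate
   for a single UCN; the other three kinds are split into separate tokens.  */
bool
separate_tokens_in_identifier_p (enum named_ucn_kind kind)
{
  return kind != NAMED_UCN_STRICT;
}

/* Examples, each written as the text that follows "\N{".  */
void
named_ucn_examples (void)
{
  assert (classify_named_ucn_str ("}") == NAMED_UCN_EMPTY);
  assert (classify_named_ucn_str ("LATIN SMALL LETTER A}") == NAMED_UCN_STRICT);
  assert (classify_named_ucn_str ("abc}") == NAMED_UCN_LOOSE);
  assert (classify_named_ucn_str ("X.1}") == NAMED_UCN_UNTERMINATED);
  assert (separate_tokens_in_identifier_p (NAMED_UCN_LOOSE));
}
```

In this framing \N{abc} (LOOSE) and \N{X.1} (UNTERMINATED) land in different buckets, yet in an identifier both are split into separate tokens for the same reason, which is the consistency argument for giving them matching diagnostics.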