From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTPS id 0F28D385843E for ; Wed, 17 Aug 2022 21:19:23 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 0F28D385843E Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-626--H_HEo1PM4GNbInjksGGpg-1; Wed, 17 Aug 2022 17:19:19 -0400 X-MC-Unique: -H_HEo1PM4GNbInjksGGpg-1 Received: from smtp.corp.redhat.com (int-mx09.intmail.prod.int.rdu2.redhat.com [10.11.54.9]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 7DB4C1C051A6; Wed, 17 Aug 2022 21:19:19 +0000 (UTC) Received: from tucnak.zalov.cz (unknown [10.39.192.41]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 263A8492C3B; Wed, 17 Aug 2022 21:19:19 +0000 (UTC) Received: from tucnak.zalov.cz (localhost [127.0.0.1]) by tucnak.zalov.cz (8.17.1/8.17.1) with ESMTPS id 27HLJGAm3135993 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NOT); Wed, 17 Aug 2022 23:19:16 +0200 Received: (from jakub@localhost) by tucnak.zalov.cz (8.17.1/8.17.1/Submit) id 27HLJFKF3135992; Wed, 17 Aug 2022 23:19:15 +0200 Date: Wed, 17 Aug 2022 23:19:14 +0200 From: Jakub Jelinek To: Jason Merrill Cc: Marek Polacek , "Joseph S. Myers" , gcc-patches@gcc.gnu.org Subject: Re: [PATCH] libcpp: Implement C++23 P2290R3 - Delimited escape sequences [PR106645] Message-ID: Reply-To: Jakub Jelinek References: MIME-Version: 1.0 In-Reply-To: X-Scanned-By: MIMEDefang 2.85 on 10.11.54.9 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=us-ascii Content-Disposition: inline X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 17 Aug 2022 21:19:24 -0000 On Wed, Aug 17, 2022 at 04:47:19PM -0400, Jason Merrill via Gcc-patches wrote: > > + length = 32; > > /* Magic value to indicate no digits seen. */ Indeed, will add the comment. > > + delimited = true; > > + if (loc_reader) > > + char_range->m_finish = loc_reader->get_next ().m_finish; > > + } > > + } > > else if (str[-1] == 'U') > > length = 8; > > else > > @@ -1107,6 +1118,8 @@ _cpp_valid_ucn (cpp_reader *pfile, const > > result = 0; > > do > > { > > + if (str == limit) > > + break; > > c = *str; > > if (!ISXDIGIT (c)) > > break; > > @@ -1116,9 +1129,41 @@ _cpp_valid_ucn (cpp_reader *pfile, const > > gcc_assert (char_range); > > char_range->m_finish = loc_reader->get_next ().m_finish; > > } > > + if (delimited) > > + { > > + if (!result) > > + /* Accept arbitrary number of leading zeros. */ > > + length = 16; > > + else if (length == 8) > > + { > > + /* Make sure we detect overflows. */ > > + result |= 0x8000000; > > + ++length; > > + } > > 16 above so that this case happens after we read 8 digits after leading > zeroes? Another magic value less than the no digits seen one and > 8, so that it can count 8 digits with the first non-zero one after which to or in the overflow flag. The intent is not to break the loop if there are further digits, just that there will be overflow. Another option would be those overflow |= n ^ (n << 4 >> 4); tests that convert_hex does and just making sure length is never decremented (except we need a way to distinguish between \u{} and at least one digit). > > + if (loc_reader) > > + char_range->m_finish = loc_reader->get_next ().m_finish; > > Here and in other functions, the pattern of increment the input pointer and > update m_finish seems like it should be a macro? Perhaps or inline function. Before my patch, there are 5 such ifs (some with char_range.m_finish and others char_range->m_finish), the patch adds another 7 such spots. > > @@ -2119,15 +2255,23 @@ _cpp_interpret_identifier (cpp_reader *p > > cppchar_t value = 0; > > size_t bufleft = len - (bufp - buf); > > int rval; > > + bool delimited = false; > > idp += 2; > > + if (length == 4 && id[idp] == '{') > > + { > > + delimited = true; > > + idp++; > > + } > > while (length && idp < len && ISXDIGIT (id[idp])) > > { > > value = (value << 4) + hex_value (id[idp]); > > idp++; > > - length--; > > + if (!delimited) > > + length--; > > } > > - idp--; > > + if (!delimited) > > + idp--; > > Don't we need to check that the first non-xdigit is a }? The comments and my understanding of the code say that we first check what is a valid identifier and the above is only called on a valid identifier. So, if it would be delimited \u{ not terminated with }, then it would fail forms_identifier_p and wouldn't be included in the range. Thus e.g. the ISXDIGIT (id[id]) test is probably not needed unless delimited is true because we've checked earlier that it has 4 or 8 hex digits. But sure, if you want a id[idp] == '}' test or assertion, it can be added. Jakub