Message-ID: <720d71cc-0883-f8e4-2321-2e2594cf93aa@redhat.com>
Date: Wed, 17 Aug 2022 22:22:03 -0400
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
 Thunderbird/91.12.0
Subject: Re: [PATCH] libcpp: Implement C++23 P2290R3 - Delimited escape
 sequences [PR106645]
To: Jakub Jelinek <jakub@redhat.com>
Cc: Marek Polacek <polacek@redhat.com>,
 "Joseph S. Myers" <joseph@codesourcery.com>, gcc-patches@gcc.gnu.org
References: <YvyWH6JZYXjkPO49@tucnak>
 <d6c2553d-8ed3-b0e3-6d1d-0445103f1a0a@redhat.com> <Yv1bUmjit6zH+Jw0@tucnak>
From: Jason Merrill <jason@redhat.com>
In-Reply-To: <Yv1bUmjit6zH+Jw0@tucnak>
Content-Language: en-US
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit

On 8/17/22 14:19, Jakub Jelinek wrote:
> On Wed, Aug 17, 2022 at 04:47:19PM -0400, Jason Merrill via Gcc-patches wrote:
>>> +	  length = 32;
>>
>> /* Magic value to indicate no digits seen.  */
> 
> Indeed, will add the comment.
> 
>>> +	  delimited = true;
>>> +	  if (loc_reader)
>>> +	    char_range->m_finish = loc_reader->get_next ().m_finish;
>>> +	}
>>> +    }
>>>      else if (str[-1] == 'U')
>>>        length = 8;
>>>      else
>>> @@ -1107,6 +1118,8 @@ _cpp_valid_ucn (cpp_reader *pfile, const
>>>      result = 0;
>>>      do
>>>        {
>>> +      if (str == limit)
>>> +	break;
>>>          c = *str;
>>>          if (!ISXDIGIT (c))
>>>    	break;
>>> @@ -1116,9 +1129,41 @@ _cpp_valid_ucn (cpp_reader *pfile, const
>>>    	  gcc_assert (char_range);
>>>    	  char_range->m_finish = loc_reader->get_next ().m_finish;
>>>    	}
>>> +      if (delimited)
>>> +	{
>>> +	  if (!result)
>>> +	    /* Accept arbitrary number of leading zeros.  */
>>> +	    length = 16;
>>> +	  else if (length == 8)
>>> +	    {
>>> +	      /* Make sure we detect overflows.  */
>>> +	      result |= 0x8000000;
>>> +	      ++length;
>>> +	    }
>>
>> 16 above so that this case happens after we read 8 digits after leading
>> zeroes?
> 
> It's another magic value, smaller than the no-digits-seen one and
> greater than 8, so that it can count 8 digits starting with the first
> non-zero one, after which the overflow flag is ORed in.  The intent is
> not to break the loop if there are further digits, just to ensure that
> there will be an overflow.
> Another option would be the overflow |= n ^ (n << 4 >> 4); tests that
> convert_hex does, together with making sure length is never decremented
> (except that we need a way to distinguish \u{} from at least one digit).

This way is fine; it could just use more comments.
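
For comparison, here is a standalone sketch of the two approaches discussed above. It assumes plain char input and a std::uint32_t accumulator in place of libcpp's cppchar_t, and hex_digit_value, parse_delimited_hex and parse_hex_with_overflow_flag are hypothetical names, not libcpp functions. The first variant mirrors the magic-value scheme in the patch; the second accumulates an overflow flag the way Jakub describes convert_hex doing it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: value of one hex digit, or -1 if C is not one.
static int
hex_digit_value (char c)
{
  if (c >= '0' && c <= '9')
    return c - '0';
  if (c >= 'a' && c <= 'f')
    return c - 'a' + 10;
  if (c >= 'A' && c <= 'F')
    return c - 'A' + 10;
  return -1;
}

// Variant 1 (magic values, as in the patch): LENGTH starts at 32, meaning
// "no digits seen".  While RESULT is still zero (leading zeros and the first
// non-zero digit) LENGTH is reset to 16, so it reaches 8 exactly when eight
// significant digits have been read.  Every further digit then ORs in bit 27,
// which the following shift moves to bit 31, so the result is guaranteed to
// exceed 0x10FFFF; ++LENGTH keeps the loop from ever ending on the count.
static std::size_t
parse_delimited_hex (const char *str, const char *limit,
		     std::uint32_t *result_out, bool *no_digits)
{
  assert (str <= limit);
  const char *start = str;
  std::uint32_t result = 0;
  unsigned length = 32;

  do
    {
      if (str == limit)
	break;
      int d = hex_digit_value (*str);
      if (d < 0)
	break;
      str++;
      if (!result)
	length = 16;
      else if (length == 8)
	{
	  result |= 0x8000000;
	  ++length;
	}
      result = (result << 4) + d;
    }
  while (--length);

  *result_out = result;
  *no_digits = (length == 32);  // Only untouched if no digit was consumed.
  return static_cast<std::size_t> (str - start);
}

// Variant 2 (convert_hex style): before each shift, record whether shifting
// left by 4 would drop any set bits.  The returned digit count is what
// distinguishes \u{} from an escape with at least one digit.
static std::size_t
parse_hex_with_overflow_flag (const char *str, const char *limit,
			      std::uint32_t *result_out, bool *overflow)
{
  const char *start = str;
  std::uint32_t n = 0;
  bool ovf = false;
  for (; str != limit && hex_digit_value (*str) >= 0; str++)
    {
      ovf |= (n ^ (std::uint32_t) (n << 4) >> 4) != 0;
      n = (n << 4) + hex_digit_value (*str);
    }
  *result_out = n;
  *overflow = ovf;
  return static_cast<std::size_t> (str - start);
}
```

With the input "100000000}" both variants report an out-of-range value; without the bit-27 trick, the first variant would wrap to 0 after the ninth digit and silently accept the escape.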

>>> +      if (loc_reader)
>>> +	char_range->m_finish = loc_reader->get_next ().m_finish;
>>
>> Here and in other functions, the pattern of incrementing the input pointer
>> and updating m_finish seems like it should be a macro?
> 
> Perhaps, or an inline function.  Before my patch there were 5 such ifs
> (some with char_range.m_finish and others with char_range->m_finish);
> the patch adds another 7 such spots.

Either way is fine.
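
One possible shape for the inline-function option, sketched with hypothetical stand-ins rather than libcpp's real types: source_range here holds plain int locations, and mock_location_reader, extend_char_range and consume_chars are illustrative names. The null check stays inside the helper, so call sites that do not track locations can pass a null reader.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a source range; locations are plain ints.
struct source_range
{
  int m_start;
  int m_finish;
};

// Hypothetical stand-in for the location reader: hands out one range per
// consumed input character.
class mock_location_reader
{
public:
  explicit mock_location_reader (std::vector<source_range> ranges)
    : m_ranges (std::move (ranges)), m_next (0) {}
  source_range get_next () { return m_ranges.at (m_next++); }

private:
  std::vector<source_range> m_ranges;
  std::size_t m_next;
};

// The repeated "if (loc_reader) range->m_finish = ..." pattern as a single
// inline helper; a null reader means locations are not being tracked.
static inline void
extend_char_range (source_range *range, mock_location_reader *loc_reader)
{
  if (loc_reader)
    range->m_finish = loc_reader->get_next ().m_finish;
}

// Example caller: advance over [STR, LIMIT) one character at a time,
// keeping the range in step with the input pointer.
static source_range
consume_chars (const char *str, const char *limit,
	       mock_location_reader *loc_reader)
{
  source_range range = {0, 0};
  while (str < limit)
    {
      ++str;
      extend_char_range (&range, loc_reader);
    }
  return range;
}
```

Whether the pointer increment should also move into the helper is a separate choice; the sketch leaves it at the call site because not every spot advances the input the same way.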

>>> @@ -2119,15 +2255,23 @@ _cpp_interpret_identifier (cpp_reader *p
>>>    	cppchar_t value = 0;
>>>    	size_t bufleft = len - (bufp - buf);
>>>    	int rval;
>>> +	bool delimited = false;
>>>    	idp += 2;
>>> +	if (length == 4 && id[idp] == '{')
>>> +	  {
>>> +	    delimited = true;
>>> +	    idp++;
>>> +	  }
>>>    	while (length && idp < len && ISXDIGIT (id[idp]))
>>>    	  {
>>>    	    value = (value << 4) + hex_value (id[idp]);
>>>    	    idp++;
>>> -	    length--;
>>> +	    if (!delimited)
>>> +	      length--;
>>>    	  }
>>> -	idp--;
>>> +	if (!delimited)
>>> +	  idp--;
>>
>> Don't we need to check that the first non-xdigit is a }?
> 
> According to the comments and my understanding of the code, we first
> check what forms a valid identifier, and the above is only called on
> a valid identifier.  So a delimited \u{ not terminated with } would
> fail forms_identifier_p and wouldn't be included in the range.
> Thus, e.g., the ISXDIGIT (id[idp]) test is probably only needed when
> delimited is true, because we've already checked that a non-delimited
> escape has 4 or 8 hex digits.
> But sure, if you want an id[idp] == '}' test or assertion, it can be
> added.

OK, a comment mentioning this should be sufficient.
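
A minimal standalone sketch of the identifier-decoding step with the closing-brace expectation written down as an assertion. It assumes, as described above, that the identifier has already been validated; decode_ucn and hex_val are hypothetical names, and the loop only loosely follows the quoted hunk rather than reproducing _cpp_interpret_identifier.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>

// Hypothetical helper mirroring hex_value (); only called on hex digits.
static unsigned
hex_val (unsigned char c)
{
  if (c >= '0' && c <= '9')
    return c - '0';
  if (c >= 'a' && c <= 'f')
    return c - 'a' + 10;
  return c - 'A' + 10;
}

// Decode one UCN whose backslash is at ID[*IDP].  The identifier is assumed
// to be valid already, so malformed input is only asserted against.  On
// return *IDP is on the last character of the escape: the closing brace for
// \u{...}, otherwise the last hex digit.
static std::uint32_t
decode_ucn (const unsigned char *id, std::size_t len, std::size_t *idp)
{
  std::size_t i = *idp;
  unsigned length = id[i + 1] == 'u' ? 4 : 8;
  bool delimited = false;
  std::uint32_t value = 0;

  i += 2;
  if (length == 4 && i < len && id[i] == '{')
    {
      delimited = true;
      i++;
    }
  while (length && i < len && std::isxdigit (id[i]))
    {
      value = (value << 4) + hex_val (id[i]);
      i++;
      if (!delimited)
	length--;
    }
  if (delimited)
    {
      // Validation guarantees the first non-xdigit is '}'; check it anyway.
      assert (i < len && id[i] == '}');
    }
  else
    i--;  // Step back onto the last hex digit.
  *idp = i;
  return value;
}
```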

Jason