From: Jason Merrill
To: Jakub Jelinek
Cc: Marek Polacek, "Joseph S. Myers", gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] libcpp: Implement C++23 P2290R3 - Delimited escape sequences [PR106645]
Date: Wed, 17 Aug 2022 22:22:03 -0400
Message-ID: <720d71cc-0883-f8e4-2321-2e2594cf93aa@redhat.com>

On 8/17/22 14:19, Jakub Jelinek wrote:
> On Wed, Aug 17, 2022 at 04:47:19PM -0400, Jason Merrill via Gcc-patches wrote:
>>> +          length = 32;
>>
>> /* Magic value to indicate no digits seen.  */
>
> Indeed, will add the comment.
>
>>> +          delimited = true;
>>> +          if (loc_reader)
>>> +            char_range->m_finish = loc_reader->get_next ().m_finish;
>>> +        }
>>> +    }
>>>    else if (str[-1] == 'U')
>>>      length = 8;
>>>    else
>>> @@ -1107,6 +1118,8 @@ _cpp_valid_ucn (cpp_reader *pfile, const
>>>    result = 0;
>>>    do
>>>      {
>>> +      if (str == limit)
>>> +        break;
>>>        c = *str;
>>>        if (!ISXDIGIT (c))
>>>          break;
>>> @@ -1116,9 +1129,41 @@ _cpp_valid_ucn (cpp_reader *pfile, const
>>>            gcc_assert (char_range);
>>>            char_range->m_finish = loc_reader->get_next ().m_finish;
>>>          }
>>> +      if (delimited)
>>> +        {
>>> +          if (!result)
>>> +            /* Accept arbitrary number of leading zeros.  */
>>> +            length = 16;
>>> +          else if (length == 8)
>>> +            {
>>> +              /* Make sure we detect overflows.  */
>>> +              result |= 0x8000000;
>>> +              ++length;
>>> +            }
>>
>> 16 above so that this case happens after we read 8 digits after leading
>> zeroes?
>
> Another magic value less than the no digits seen one and >8,
> so that it can count 8 digits with the first non-zero one after
> which to or in the overflow flag.  The intent is not to break the loop
> if there are further digits, just that there will be overflow.
> Another option would be those overflow |= n ^ (n << 4 >> 4);
> tests that convert_hex does and just making sure length is never decremented
> (except we need a way to distinguish between \u{} and at least one digit).

This way is fine, could just use more comment.

>>> +          if (loc_reader)
>>> +            char_range->m_finish = loc_reader->get_next ().m_finish;
>>
>> Here and in other functions, the pattern of increment the input pointer and
>> update m_finish seems like it should be a macro?
>
> Perhaps or inline function.  Before my patch, there are 5 such ifs
> (some with char_range.m_finish and others char_range->m_finish),
> the patch adds another 7 such spots.

Either way is fine.
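
For reference, the alternative mentioned above (the overflow |= n ^ (n << 4 >> 4) test that convert_hex does) can be sketched in isolation. This is only a minimal standalone sketch, not libcpp code: parse_delimited_hex, hex_digit_value and the delim_status enum are hypothetical names, and it assumes a 32-bit accumulator and an input pointer that starts just after the opening brace. A separate seen_digit flag stands in for the magic length values and distinguishes \u{} from at least one digit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes for this sketch; libcpp reports these
   conditions through its own diagnostics instead.  */
enum delim_status
{
  DELIM_OK,
  DELIM_EMPTY,        /* "\u{}": no digits before the brace.  */
  DELIM_UNTERMINATED, /* Ran out of input or hit a non-'}' character.  */
  DELIM_OVERFLOW      /* Value does not fit in 32 bits.  */
};

/* Return the value of hex digit C, or -1 if C is not a hex digit.  */
static int
hex_digit_value (char c)
{
  if (c >= '0' && c <= '9')
    return c - '0';
  if (c >= 'a' && c <= 'f')
    return c - 'a' + 10;
  if (c >= 'A' && c <= 'F')
    return c - 'A' + 10;
  return -1;
}

/* Parse "HEX...}" in [STR, LIMIT), i.e. the text following "\u{".
   On DELIM_OK the parsed value is stored in *VALUE.  */
static enum delim_status
parse_delimited_hex (const char *str, const char *limit, uint32_t *value)
{
  uint32_t n = 0;
  uint32_t overflow = 0;
  bool seen_digit = false;

  assert (str != NULL && limit != NULL && value != NULL);

  for (; str < limit; str++)
    {
      int d = hex_digit_value (*str);
      if (d < 0)
        break;
      /* Nonzero exactly when one of the top four bits of N is set,
         i.e. when the shift below would lose information.  Keep
         consuming digits; only remember that overflow happened.  */
      overflow |= n ^ ((uint32_t) (n << 4) >> 4);
      n = (uint32_t) (n << 4) | (uint32_t) d;
      seen_digit = true;
    }

  if (str == limit || *str != '}')
    return DELIM_UNTERMINATED;
  if (!seen_digit)
    return DELIM_EMPTY;
  if (overflow)
    return DELIM_OVERFLOW;
  *value = n;
  return DELIM_OK;
}
```

In this shape leading zeros need no special case, since they never set the top four bits of the accumulator, and further digits after an overflow do not break the loop. It does not check that the value is a valid code point; that would still be a separate test.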

>>> @@ -2119,15 +2255,23 @@ _cpp_interpret_identifier (cpp_reader *p
>>>          cppchar_t value = 0;
>>>          size_t bufleft = len - (bufp - buf);
>>>          int rval;
>>> +        bool delimited = false;
>>>          idp += 2;
>>> +        if (length == 4 && id[idp] == '{')
>>> +          {
>>> +            delimited = true;
>>> +            idp++;
>>> +          }
>>>          while (length && idp < len && ISXDIGIT (id[idp]))
>>>            {
>>>              value = (value << 4) + hex_value (id[idp]);
>>>              idp++;
>>> -            length--;
>>> +            if (!delimited)
>>> +              length--;
>>>            }
>>> -        idp--;
>>> +        if (!delimited)
>>> +          idp--;
>>
>> Don't we need to check that the first non-xdigit is a }?
>
> The comments and my understanding of the code say that we first
> check what is a valid identifier and the above is only called on
> a valid identifier.  So, if it would be delimited \u{ not terminated
> with }, then it would fail forms_identifier_p and wouldn't be included
> in the range.  Thus e.g. the ISXDIGIT (id[idp]) test is probably not needed
> unless delimited is true because we've checked earlier that it has 4 or 8
> hex digits.
> But sure, if you want a id[idp] == '}' test or assertion, it can be
> added.

OK, a comment mentioning this should be sufficient.

Jason