From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-lf1-x12a.google.com (mail-lf1-x12a.google.com [IPv6:2a00:1450:4864:20::12a]) by sourceware.org (Postfix) with ESMTPS id CF2133854153 for ; Fri, 10 Feb 2023 16:58:16 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org CF2133854153 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com Received: by mail-lf1-x12a.google.com with SMTP id cf42so9271026lfb.1 for ; Fri, 10 Feb 2023 08:58:16 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=ZUknCCtdn0+7OC0DxRi/9plrPpTEfdtAlT3NZl8MFIA=; b=fUJbObdsZdCXdrKVtmCvxrmqA7gw4mrKEnkOoCYBspv6vkpPKq65ndxNpKTAiu7NeE 44JKeuNO7PaDLdz8cISjIdq/Lih3jXlTH+O1A4BsJWfc5z7OlQbyp+OiQ8OOrYuqY2Vl SP85+4EPotpqDqbquE7yy3wBQUkwtFvEiIlZmvDx1/nsZEeIbGVL1rhLb3bwqD8AoiRb aOEMD6EP0OOUXiA33GIXmTfUTtKG7IufRgG1j9KAgZYWgErMCQBiNzj6TQZ4dMJo/cGN KbJcjWaUlIM1wdkpRosPDxTqW4T2QJQFQmOia+fTYOlGmvgK1p+BOGMKE3T2wlIXqDDY KB8w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=ZUknCCtdn0+7OC0DxRi/9plrPpTEfdtAlT3NZl8MFIA=; b=WIrB+GFpODsrz1/0hMoNY6/SJ1RTLYDTdQkA2GQ4g6IDTvyDS44ii9zz9kiKXayUPX fyWQbo1s2a18/16QqjaoAIyVrg2T1sHCmHhK6AJP8bA5BXN8SIsUh6gfnq1GcTLvwpUI TSFd9Hr92BZLiFPyv1vdiJ0JrEB2IjQXdZ67kP5LefNP7lYb2ci1WxddZZz6RSpbJ27s d7F6tztDdF/fZxzrR5lIwvAsCNBkPnZqv7BoOlAJyRxAXcOLf0i2qhgdmGcOF3rwBlM3 gCcx9NZQrYvezcNjb8R/OtRLSSyyx7lwZBq+lBRXpQG3tTM8i2M8wslS5VgfuX6dfDRI yinQ== X-Gm-Message-State: AO0yUKX/T9ptq78K+sf5n69Up7lnfMsibnD+ByO59BYdvIyhIs3qNkAO vP5Fgbyfqvy2ZCHF69LoaYwknGHayKLcRwcCbyw= X-Google-Smtp-Source: AK7set9Th7zuP5DHdOvgZ2Xke8ePwXxa0ZaIdHA75FDeBX2hGBUgqpdjuzCQcq0ryjPW7aveZo3DikLRvLKXy8MJxlk= X-Received: by 2002:ac2:4859:0:b0:4d1:5c2c:553e with SMTP id 25-20020ac24859000000b004d15c2c553emr2981625lfy.275.1676048295195; Fri, 10 Feb 2023 08:58:15 -0800 (PST) MIME-Version: 1.0 References: <20220614212649.GA58025@ldh-imac.local> <20220615190616.GA70682@ldh-imac.local> <20220926222725.GA19652@ldh-imac.local> In-Reply-To: From: Lewis Hyatt Date: Fri, 10 Feb 2023 11:58:03 -0500 Message-ID: Subject: Re: Ping^3: [PATCH] libcpp: Handle extended characters in user-defined literal suffix [PR103902] To: Jakub Jelinek Cc: Jason Merrill , Nathan Sidwell , gcc-patches@gcc.gnu.org Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-3029.4 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Fri, Feb 10, 2023 at 11:30 AM Jakub Jelinek wrote: > > On Mon, Sep 26, 2022 at 06:27:25PM -0400, Lewis Hyatt via Gcc-patches wrote: > > May I please ping this patch again? Joseph suggested that it would be best if > > a C++ maintainer has a look at it. This is one of just a few places left where > > we don't handle UTF-8 properly in libcpp, it would be really nice to get them > > fixed up if there is time to review this patch. Thanks! > > CCing them. > > Just some nits from me, but I agree C++ maintainers are the best reviewers > for this. Thanks so much for looking it over, I really appreciate it. I'll be sure to incorporate all your feedback along with those from the full review. Is this for stage 1 at this point BTW? One note, the patch as-is doesn't quite apply to master branch nowadays, it just needs a small tweak since warn_about_normalization() has acquired a new argument in the meantime. If it's helpful I can resend it with this addressed, as well as the rest of your comments? Finally one comment here: > > + if (const auto sr = scan_cur_identifier (pfile)) > > + { > > + /* If a string format macro, say from inttypes.h, is placed touching > > + a string literal it could be parsed as a C++11 user-defined > > + string literal thus breaking the program. User-defined literals > > + outside of namespace std must start with a single underscore, so > > + assume anything of that form really is a UDL suffix. We don't > > + need to worry about UDLs defined inside namespace std because > > + their names are reserved, so cannot be used as macro names in > > + valid programs. */ > > + if ((suffix_begin[0] != '_' || suffix_begin[1] == '_') > > + && cpp_macro_p (sr.node)) > > What is the advantage of dropping is_macro_not_literal_suffix and > hand-inlining it in two different spots? > Couldn't even the actual warning be moved into an inline function? The is_macro() function was doing two jobs, first lexing the identifier and looking it up in the hash table, and then calling cpp_macro_p(). This was a bit duplicative because the identifier was then immediately lexed again after the check. Since lexing it became more complicated with UTF-8 support, I changed it not to duplicate that effort and instead scan_cur_identifer() does the job once. With that done, all that's left for is_macro() to do is just the one line check so I got rid of it. However, I agree that the check about suffix_begin is not really trivial and so factoring this out into one place instead of two makes sense. I'll try to move the whole warning into its own function in the next iteration. > Otherwise it looks reasonable to me, but I'd still prefer Jason or Nathan > to review this. > > Jakub > Thanks again. -Lewis