Date: Thu, 20 Oct 2022 14:22:03 -0400
Subject: Re: [PATCH RESEND 1/1] p1689r5: initial support
To: Ben Boeckel
Cc: Ben Boeckel, gcc-patches@gcc.gnu.org, nathan@acm.org, fortran@gcc.gnu.org, gcc@gcc.gnu.org, brad.king@kitware.com
References: <20221004151200.1275636-1-ben.boeckel@kitware.com> <20221004151200.1275636-2-ben.boeckel@kitware.com> <78b88b1d-b328-a140-3a27-d33a3d96f3b9@redhat.com> <49eb01df-8da3-c5fb-f2c2-864c6c7dc227@redhat.com>
From: Jason Merrill

On 10/20/22 13:31, Ben Boeckel wrote:
> On Thu, Oct 20, 2022 at 11:39:25 -0400, Jason Merrill wrote:
>> Oops, I was thinking this was in gcc as well.  In libcpp there's
>> _cpp_valid_utf8 (which calls one_utf8_to_cppchar).
>
> This routine has a lot more logic (including UCN decoding) and the
> `one_utf8_to_cppchar` also supports out-of-bounds codepoints above
> `0x10FFFF`.

The latter seems like a bug to be fixed; presumably it hasn't been
updated since the range of codepoints was restricted.

This sort of thing is why I'd like to minimize the number of separate
implementations of UTF-8 parsing.

Jason
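
For illustration only, here is a minimal sketch of the kind of range check
under discussion. It is not the libcpp code and not part of the patch:
`decode_utf8` is a hypothetical standalone helper, and its signature and
return convention are assumptions made for the example. It decodes a single
sequence and rejects anything above `0x10FFFF` (the limit set by RFC 3629),
along with overlong forms and surrogates.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, NOT part of libcpp: decode one UTF-8 sequence
   from BUF (LEN bytes available).  On success, store the code point in
   *CP and return the number of bytes consumed.  Return 0 for any
   malformed, overlong, surrogate, or out-of-range sequence.  */
static size_t
decode_utf8 (const unsigned char *buf, size_t len, uint32_t *cp)
{
  /* Smallest code point that legitimately needs N bytes, indexed by N.  */
  static const uint32_t min_for_len[5] = { 0, 0, 0x80, 0x800, 0x10000 };
  size_t n, i;
  uint32_t c;

  if (len == 0)
    return 0;

  if (buf[0] < 0x80)
    {
      *cp = buf[0];
      return 1;
    }
  else if ((buf[0] & 0xE0) == 0xC0)
    {
      n = 2;
      c = buf[0] & 0x1F;
    }
  else if ((buf[0] & 0xF0) == 0xE0)
    {
      n = 3;
      c = buf[0] & 0x0F;
    }
  else if ((buf[0] & 0xF8) == 0xF0)
    {
      n = 4;
      c = buf[0] & 0x07;
    }
  else
    return 0;  /* Stray continuation byte or obsolete 5/6-byte lead.  */

  if (len < n)
    return 0;  /* Truncated sequence.  */

  for (i = 1; i < n; i++)
    {
      if ((buf[i] & 0xC0) != 0x80)
        return 0;  /* Not a continuation byte.  */
      c = (c << 6) | (buf[i] & 0x3F);
    }

  if (c < min_for_len[n])
    return 0;  /* Overlong encoding.  */
  if (c >= 0xD800 && c <= 0xDFFF)
    return 0;  /* UTF-16 surrogate, not a valid scalar value.  */
  if (c > 0x10FFFF)
    return 0;  /* Beyond the restricted Unicode range.  */

  *cp = c;
  return n;
}
```

With this sketch, the four-byte sequence `F4 8F BF BF` decodes to `0x10FFFF`,
while `F4 90 80 80` (which would be `0x110000`) is rejected. Keeping such a
check in one shared routine is the point of having fewer separate
implementations.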