From: Jason Merrill
To: Ben Boeckel
Cc: Ben Boeckel, gcc-patches@gcc.gnu.org, nathan@acm.org, fortran@gcc.gnu.org, gcc@gcc.gnu.org, brad.king@kitware.com
Date: Thu, 20 Oct 2022 14:22:03 -0400
Subject: Re: [PATCH RESEND 1/1] p1689r5: initial support
References: <20221004151200.1275636-1-ben.boeckel@kitware.com> <20221004151200.1275636-2-ben.boeckel@kitware.com> <78b88b1d-b328-a140-3a27-d33a3d96f3b9@redhat.com> <49eb01df-8da3-c5fb-f2c2-864c6c7dc227@redhat.com>

On 10/20/22 13:31, Ben Boeckel wrote:
> On Thu, Oct 20, 2022 at 11:39:25 -0400, Jason Merrill wrote:
>> Oops, I was thinking this was in gcc as well.  In libcpp there's
>> _cpp_valid_utf8 (which calls one_utf8_to_cppchar).
>
> This routine has a lot more logic (including UCN decoding) and the
> `one_utf8_to_cppchar` also supports out-of-bounds codepoints above
> `0x10FFFF`.

The latter seems like a bug to be fixed; presumably it hasn't been
updated since the range of codepoints was restricted.  This sort of
thing is why I'd like to minimize the number of separate implementations
of UTF-8 parsing.

Jason
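
For illustration only, here is a minimal standalone sketch of the kind of range check being discussed: a decoder that rejects codepoints above `0x10FFFF`, along with overlong forms and surrogates.  This is not libcpp's code and does not reflect how `one_utf8_to_cppchar` is written; the name `decode_utf8_strict` and its interface are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of libcpp.  Decode one UTF-8 sequence
   from BUF, of which LEN bytes are available.  On success, store the
   codepoint in *OUT and return the number of bytes consumed.  Return 0
   for truncated, malformed, overlong, surrogate, or out-of-range
   input.  */
static size_t
decode_utf8_strict (const unsigned char *buf, size_t len, uint32_t *out)
{
  assert (buf != NULL && out != NULL);

  if (len == 0)
    return 0;

  unsigned char c = buf[0];
  size_t n;        /* Total length of the sequence in bytes.  */
  uint32_t cp;     /* Codepoint accumulated so far.  */
  uint32_t min;    /* Smallest codepoint that needs N bytes.  */

  if (c < 0x80)
    {
      *out = c;
      return 1;
    }
  else if ((c & 0xE0) == 0xC0)
    { n = 2; cp = c & 0x1F; min = 0x80; }
  else if ((c & 0xF0) == 0xE0)
    { n = 3; cp = c & 0x0F; min = 0x800; }
  else if ((c & 0xF8) == 0xF0)
    { n = 4; cp = c & 0x07; min = 0x10000; }
  else
    return 0;  /* Stray continuation byte, or a 5- or 6-byte lead.  */

  if (len < n)
    return 0;  /* Truncated sequence.  */

  for (size_t i = 1; i < n; i++)
    {
      if ((buf[i] & 0xC0) != 0x80)
        return 0;  /* Not a continuation byte.  */
      cp = (cp << 6) | (buf[i] & 0x3F);
    }

  if (cp < min)
    return 0;  /* Overlong encoding.  */
  if (cp >= 0xD800 && cp <= 0xDFFF)
    return 0;  /* UTF-16 surrogate, not a valid scalar value.  */
  if (cp > 0x10FFFF)
    return 0;  /* Beyond the restricted codepoint range.  */

  *out = cp;
  return n;
}
```

With this sketch, the bytes `F4 8F BF BF` decode to `0x10FFFF`, while `F4 90 80 80` (which would be `0x110000`) is rejected.  Keeping a check like this in a single shared routine is one way to avoid the drift between separate implementations mentioned above.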