From: Ben Boeckel
To: gcc-patches@gcc.gnu.org
Cc: Ben Boeckel, jason@redhat.com, nathan@acm.org, fortran@gcc.gnu.org, gcc@gcc.gnu.org, brad.king@kitware.com
Subject: [PATCH v6 1/4] libcpp: reject codepoints above 0x10FFFF
Date: Tue, 6 Jun 2023 16:50:22 -0400
Message-Id: <20230606205025.3164738-2-ben.boeckel@kitware.com>
In-Reply-To: <20230606205025.3164738-1-ben.boeckel@kitware.com>
References: <20230606205025.3164738-1-ben.boeckel@kitware.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Unicode does not support such values because they cannot be represented
in UTF-16.

libcpp/

	* charset.cc (cpp_valid_utf8_p): Reject encodings of codepoints
	above 0x10FFFF.  UTF-16 cannot represent such codepoints, so
	Unicode as a whole rejects them.
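As a worked illustration of why 0x10FFFF is the ceiling (a sketch only; `encode_utf16` is a hypothetical helper, not part of libcpp): UTF-16 encodes anything above the Basic Multilingual Plane as a surrogate pair that carries 20 payload bits after subtracting 0x10000, so the largest representable value is 0x10000 + 0xFFFFF = 0x10FFFF.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper for illustration: encode one codepoint as UTF-16
// code units, or return std::nullopt if UTF-16 cannot represent it.
std::optional<std::vector<std::uint16_t>> encode_utf16(std::uint32_t cp)
{
  // Surrogate values are reserved for pairs, not standalone codepoints.
  if (cp >= 0xD800 && cp <= 0xDFFF)
    return std::nullopt;
  // The Basic Multilingual Plane fits in a single code unit.
  if (cp <= 0xFFFF)
    return std::vector<std::uint16_t>{static_cast<std::uint16_t>(cp)};
  // After subtracting 0x10000, a surrogate pair carries exactly 20 bits
  // (10 per unit), so nothing above 0x10000 + 0xFFFFF = 0x10FFFF fits.
  if (cp > 0x10FFFF)
    return std::nullopt;
  std::uint32_t v = cp - 0x10000;
  assert(v <= 0xFFFFF);  // 20 payload bits
  auto high = static_cast<std::uint16_t>(0xD800 + (v >> 10));   // top 10 bits
  auto low = static_cast<std::uint16_t>(0xDC00 + (v & 0x3FF));  // bottom 10 bits
  return std::vector<std::uint16_t>{high, low};
}
```

For example, `encode_utf16(0x10FFFF)` produces the pair 0xDBFF 0xDFFF, the largest possible surrogate pair, while `encode_utf16(0x110000)` has no encoding at all.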
Signed-off-by: Ben Boeckel
---
 libcpp/charset.cc | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/libcpp/charset.cc b/libcpp/charset.cc
index d7f323b2cd5..3b34d804cf1 100644
--- a/libcpp/charset.cc
+++ b/libcpp/charset.cc
@@ -1886,6 +1886,13 @@ cpp_valid_utf8_p (const char *buffer, size_t num_bytes)
       int err = one_utf8_to_cppchar (&iter, &bytesleft, &cp);
       if (err)
 	return false;
+
+      /* Additionally, Unicode declares that all codepoints above 0010FFFF are
+	 invalid because they cannot be represented in UTF-16.
+
+	 Reject such values.  */
+      if (cp > 0x10FFFF)
+	return false;
     }
   /* No problems encountered.  */
   return true;
-- 
2.40.1
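A self-contained sketch of the same decode-then-range-check idea outside libcpp follows. `decode_one` and `valid_utf8` are hypothetical stand-ins, not libcpp's `one_utf8_to_cppchar` or `cpp_valid_utf8_p`. The sketch's decoder deliberately accepts any structurally well-formed sequence of up to four bytes, so it can yield values up to 0x1FFFFF (F4 90 80 80 decodes to 0x110000), and the caller then applies the Unicode range check. U+10FFFF itself is a valid codepoint, so only values strictly greater than 0x10FFFF are rejected.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical decoder for illustration only.  It accepts any
// structurally well-formed 1- to 4-byte sequence (lead bytes up to 0xF7)
// and leaves the Unicode range checks to the caller.
// Returns the number of bytes consumed, or 0 on a malformed sequence.
static std::size_t decode_one(const unsigned char *p, std::size_t n,
                              std::uint32_t *out)
{
  assert(n > 0);  // caller guarantees at least one byte
  unsigned char b0 = p[0];
  std::size_t len;
  std::uint32_t cp;
  if (b0 < 0x80)
    {
      *out = b0;
      return 1;
    }
  else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
  else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
  else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
  else
    return 0;  // stray continuation byte or 5/6-byte lead byte
  if (n < len)
    return 0;  // truncated sequence
  for (std::size_t i = 1; i < len; ++i)
    {
      if ((p[i] & 0xC0) != 0x80)
        return 0;  // expected a continuation byte
      cp = (cp << 6) | (p[i] & 0x3F);
    }
  // Reject overlong encodings: a shorter sequence could encode cp.
  static const std::uint32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
  if (cp < min_for_len[len])
    return 0;
  *out = cp;
  return len;
}

bool valid_utf8(const char *buffer, std::size_t num_bytes)
{
  const unsigned char *p = reinterpret_cast<const unsigned char *>(buffer);
  while (num_bytes > 0)
    {
      std::uint32_t cp;
      std::size_t used = decode_one(p, num_bytes, &cp);
      if (used == 0)
        return false;
      // Surrogates are not scalar values and may not appear in UTF-8.
      if (cp >= 0xD800 && cp <= 0xDFFF)
        return false;
      // Codepoints above 0x10FFFF cannot be represented in UTF-16, so
      // Unicode excludes them.  U+10FFFF itself remains valid.
      if (cp > 0x10FFFF)
        return false;
      p += used;
      num_bytes -= used;
    }
  return true;
}
```

With this sketch, `valid_utf8("\xF4\x8F\xBF\xBF", 4)` (U+10FFFF) returns true and `valid_utf8("\xF4\x90\x80\x80", 4)` (U+110000) returns false.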