From: Jonathan Wakely
To: libstdc++@gcc.gnu.org, gcc-patches@gcc.gnu.org
Cc: Ewan Higgs
Subject: [committed] libstdc++: Add "ASCII" as an alias for std::text_encoding::id::ASCII
Date: Wed, 31 Jan 2024 09:50:50 +0000
Message-ID: <20240131095245.1915153-1-jwakely@redhat.com>

SG16 (Unicode and Text Study Group) and LWG are overwhelmingly in favour
of adding this alias, so let's not wait for the issue to get voted into
the working draft.

Tested aarch64-linux. Pushed to trunk.

-- >8 --

As noted in LWG 4043, "ASCII" is not an alias for any known registered
character encoding, so std::text_encoding("ASCII").mib() == id::other.
Add the alias "ASCII" to the implementation-defined superset of aliases
for that encoding.

libstdc++-v3/ChangeLog:

	* include/bits/text_encoding-data.h: Regenerate.
	* scripts/gen_text_encoding_data.py: Add extra_aliases dict
	containing "ASCII".
	* testsuite/std/text_encoding/cons.cc: Check "ascii" is known.

Co-authored-by: Ewan Higgs
Signed-off-by: Ewan Higgs
---
 .../include/bits/text_encoding-data.h         |  3 ++-
 .../scripts/gen_text_encoding_data.py         | 24 ++++++++++++++++++-
 .../testsuite/std/text_encoding/cons.cc       |  5 ++++
 3 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/bits/text_encoding-data.h b/libstdc++-v3/include/bits/text_encoding-data.h
index 7ac2e9dc3d9..5041e738d21 100644
--- a/libstdc++-v3/include/bits/text_encoding-data.h
+++ b/libstdc++-v3/include/bits/text_encoding-data.h
@@ -14,6 +14,7 @@
  { 3, "IBM367" },
  { 3, "cp367" },
  { 3, "csASCII" },
+ { 3, "ASCII" }, // libstdc++ extension
  { 4, "ISO_8859-1:1987" },
  { 4, "iso-ir-100" },
  { 4, "ISO_8859-1" },
@@ -417,7 +418,7 @@
  { 104, "csISO2022CN" },
  { 105, "ISO-2022-CN-EXT" },
  { 105, "csISO2022CNEXT" },
-#define _GLIBCXX_TEXT_ENCODING_UTF8_OFFSET 413
+#define _GLIBCXX_TEXT_ENCODING_UTF8_OFFSET 414
  { 106, "UTF-8" },
  { 106, "csUTF8" },
  { 109, "ISO-8859-13" },
diff --git a/libstdc++-v3/scripts/gen_text_encoding_data.py b/libstdc++-v3/scripts/gen_text_encoding_data.py
index 2d6f3e4077a..f0ebb42d8c2 100755
--- a/libstdc++-v3/scripts/gen_text_encoding_data.py
+++ b/libstdc++-v3/scripts/gen_text_encoding_data.py
@@ -36,6 +36,18 @@ print("#ifndef _GLIBCXX_GET_ENCODING_DATA")
 print('# error "This is not a public header, do not include it directly"')
 print("#endif\n")
 
+# We need to generate a list of initializers of the form { mib, alias }, e.g.,
+# { 3, "US-ASCII" },
+# { 3, "ISO646-US" },
+# { 3, "csASCII" },
+# { 4, "ISO_8859-1:1987" },
+# { 4, "latin1" },
+# The initializers must be sorted by the mib value. The first entry for
+# a given mib must be the primary name for the encoding. Any aliases for
+# the encoding come after the primary name.
+# We also define a macro _GLIBCXX_TEXT_ENCODING_UTF8_OFFSET which is the
+# offset into the list of the mib=106, alias="UTF-8" entry. This is used
+# to optimize the common case, so we don't need to search for "UTF-8".
 
 charsets = {}
 with open(sys.argv[1], newline='') as f:
@@ -52,10 +64,15 @@ with open(sys.argv[1], newline='') as f:
             aliases.remove(name)
         charsets[mib] = [name] + aliases
 
-# Remove "NATS-DANO" and "NATS-DANO-ADD"
+# Remove "NATS-DANO" and "NATS-DANO-ADD" as specified by the C++ standard.
 charsets.pop(33, None)
 charsets.pop(34, None)
 
+# This is not an official IANA alias, but we include it in the
+# implementation-defined superset of aliases for US-ASCII.
+# See also LWG 4043.
+extra_aliases = {3: ["ASCII"]}
+
 count = 0
 for mib in sorted(charsets.keys()):
     names = charsets[mib]
@@ -64,6 +81,11 @@ for mib in sorted(charsets.keys()):
     for name in names:
         print(' {{ {:4}, "{}" }},'.format(mib, name))
     count += len(names)
+    if mib in extra_aliases:
+        names = extra_aliases[mib]
+        for name in names:
+            print(' {{ {:4}, "{}" }}, // libstdc++ extension'.format(mib, name))
+        count += len(names)
 
 # gives an error if this macro is left defined.
 # Do this last, so that the generated output is not usable unless we reach here.
diff --git a/libstdc++-v3/testsuite/std/text_encoding/cons.cc b/libstdc++-v3/testsuite/std/text_encoding/cons.cc
index b9d93641de4..8fcc2ec8c3b 100644
--- a/libstdc++-v3/testsuite/std/text_encoding/cons.cc
+++ b/libstdc++-v3/testsuite/std/text_encoding/cons.cc
@@ -53,6 +53,11 @@ test_construct_by_name()
   VERIFY( e4.name() == s );
   VERIFY( ! e4.aliases().empty() );
   VERIFY( e4.aliases().front() == "US-ASCII"sv ); // primary name
+
+  s = "ascii";
+  std::text_encoding e5(s);
+  VERIFY( e5.mib() == std::text_encoding::ASCII );
+  VERIFY( e5.name() == s );
 }
 
 constexpr void
-- 
2.43.0
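
For anyone wondering why the regenerated header bumps _GLIBCXX_TEXT_ENCODING_UTF8_OFFSET from 413 to 414, here is a minimal standalone sketch of the generator loop. It is not the real script. The generate() helper and the three-entry charsets table are hypothetical, and recording the offset by checking for a primary name of "UTF-8" is an assumption made for illustration. The real script reads the IANA registry CSV named on the command line.

```python
def generate(charsets, extra_aliases):
    """Return (initializer_lines, utf8_offset) for a {mib: [primary, alias, ...]} map.

    Hypothetical helper for illustration only; it collects lines in a list
    instead of printing them, so the result can be inspected.
    """
    lines = []
    utf8_offset = None
    count = 0
    for mib in sorted(charsets):
        names = charsets[mib]
        if names[0] == "UTF-8":
            # Assumption: the offset is the index of the primary UTF-8 entry.
            utf8_offset = count
        for name in names:
            lines.append('{{ {:4}, "{}" }},'.format(mib, name))
        count += len(names)
        # Extension aliases are emitted after the registered names for the
        # same mib, so the primary name stays first and the list stays sorted.
        extras = extra_aliases.get(mib, [])
        for name in extras:
            lines.append('{{ {:4}, "{}" }}, // libstdc++ extension'.format(mib, name))
        count += len(extras)
    return lines, utf8_offset


# Toy, heavily truncated data (not the real registry contents).
charsets = {
    106: ["UTF-8", "csUTF8"],
    3: ["US-ASCII", "csASCII"],
    4: ["ISO_8859-1:1987", "latin1"],
}

before, offset_before = generate(charsets, {})
after, offset_after = generate(charsets, {3: ["ASCII"]})
print(offset_before, offset_after)
```

With this toy table the offset moves from 4 to 5: one extra entry is emitted for mib 3, which sorts before mib 106, so every later index shifts by one. The same reasoning accounts for the one-step change in the regenerated header.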