From: Jonathan Wakely
To: libstdc++@gcc.gnu.org, gcc-patches@gcc.gnu.org
Subject: [PATCH v2] libstdc++: Implement C++26 std::text_encoding [PR113318]
Date: Sat, 13 Jan 2024 12:44:01 +0000
Message-ID: <20240113124834.1296437-1-jwakely@redhat.com>
In-Reply-To: <20240112224145.1090544-1-jwakely@redhat.com>
References: <20240112224145.1090544-1-jwakely@redhat.com>

Patch v2, with more tests.

This fixes a bug in lookup by name where if an alias was matched then the aliases() view wouldn't contain the primary name or other aliases earlier in the table.
It also optimizes text_encoding::environment_is so it constructs an encoding by ID and then only matches the environment's encoding against that object's aliases, not every string in the table. There's also a fast path for text_encoding("UTF-8") so we don't search the table for that case.

More tests still needed...

-- >8 --

libstdc++-v3/ChangeLog:

	PR libstdc++/113318
	* acinclude.m4 (GLIBCXX_CHECK_TEXT_ENCODING): Define.
	* config.h.in: Regenerate.
	* configure: Regenerate.
	* configure.ac: Use GLIBCXX_CHECK_TEXT_ENCODING.
	* include/Makefile.am: Add new headers.
	* include/Makefile.in: Regenerate.
	* include/bits/locale_classes.h (locale::encoding): Declare new
	member function.
	* include/bits/unicode.h (__charset_alias_match): New function.
	* include/bits/text_encoding-data.h: New file.
	* include/bits/version.def (text_encoding): Define.
	* include/bits/version.h: Regenerate.
	* include/std/text_encoding: New file.
	* src/c++26/Makefile.am: Add text_encoding.cc.
	* src/c++26/Makefile.in: Regenerate.
	* src/c++26/text_encoding.cc: New file.
	* python/libstdcxx/v6/printers.py (StdTextEncodingPrinter): New
	printer.
	* scripts/gen_text_encoding_data.py: New file.
	* testsuite/ext/unicode/charset_alias_match.cc: New test.
	* testsuite/std/text_encoding/cons.cc: New test.
	* testsuite/std/text_encoding/requirements.cc: New test.
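For anyone following the lookup-by-name fix described above, here is a minimal sketch of the idea. It is not the code from this patch. `Row`, `rows`, `fold`, `loose_equal` and `alias_range` are hypothetical names, the table is a tiny excerpt, and the comparison is a simplification (ignore case and non-alphanumeric characters) rather than the full UTS #22 rule in `__charset_alias_match`, which also ignores leading zeros in numbers. It assumes an ASCII-compatible execution character set and that rows with the same ID are adjacent, as in the generated data.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string_view>
#include <utility>

// Hypothetical miniature of the generated table: rows with the same ID are
// adjacent and the primary name comes first in each group.
struct Row { int id; std::string_view name; };

inline constexpr std::array<Row, 6> rows{{
  {3, "US-ASCII"}, {3, "ANSI_X3.4-1968"}, {3, "csASCII"},
  {106, "UTF-8"}, {106, "csUTF8"},
  {1015, "UTF-16"}
}};

// Lowercase ASCII letters, keep digits, map everything else to '\0'.
constexpr char fold(char c)
{
  if (c >= 'A' && c <= 'Z')
    return static_cast<char>(c - 'A' + 'a');
  if ((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9'))
    return c;
  return '\0';
}

// Simplified comparison: skip characters that fold to '\0', compare the rest.
constexpr bool loose_equal(std::string_view a, std::string_view b)
{
  std::size_t i = 0, j = 0;
  while (true)
    {
      while (i < a.size() && fold(a[i]) == '\0') ++i;
      while (j < b.size() && fold(b[j]) == '\0') ++j;
      if (i == a.size() || j == b.size())
        return i == a.size() && j == b.size();
      if (fold(a[i]) != fold(b[j]))
        return false;
      ++i;
      ++j;
    }
}

// Half-open index range [first, last) of every row that shares the ID of the
// row matching `name`, or {0, 0} if no row matches. Widening backwards from
// the hit is what keeps the primary name and earlier aliases in the range.
constexpr std::pair<std::size_t, std::size_t>
alias_range(std::string_view name)
{
  for (std::size_t hit = 0; hit < rows.size(); ++hit)
    {
      if (!loose_equal(rows[hit].name, name))
        continue;
      std::size_t first = hit, last = hit + 1;
      while (first > 0 && rows[first - 1].id == rows[hit].id)
        --first;
      while (last < rows.size() && rows[last].id == rows[hit].id)
        ++last;
      return {first, last};
    }
  return {0, 0};
}
```

With this excerpt, `alias_range("csascii")` covers rows 0 to 2, so a view built from that range starts at `US-ASCII` rather than at the alias that happened to match.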
---
 libstdc++-v3/acinclude.m4                     |  28 +
 libstdc++-v3/config.h.in                      |   3 +
 libstdc++-v3/configure                        |  54 ++
 libstdc++-v3/configure.ac                     |   3 +
 libstdc++-v3/include/Makefile.am              |   2 +
 libstdc++-v3/include/Makefile.in              |   2 +
 libstdc++-v3/include/bits/locale_classes.h    |  14 +
 .../include/bits/text_encoding-data.h         | 902 ++++++++++++++++++
 libstdc++-v3/include/bits/unicode.h           | 157 ++-
 libstdc++-v3/include/bits/version.def         |  10 +
 libstdc++-v3/include/bits/version.h           |  13 +-
 libstdc++-v3/include/std/text_encoding        | 704 ++++++++++++++
 libstdc++-v3/python/libstdcxx/v6/printers.py  |  17 +
 .../scripts/gen_text_encoding_data.py         |  70 ++
 libstdc++-v3/src/c++26/Makefile.am            |   2 +-
 libstdc++-v3/src/c++26/Makefile.in            |   4 +-
 libstdc++-v3/src/c++26/text_encoding.cc       |  91 ++
 .../ext/unicode/charset_alias_match.cc        |  18 +
 .../testsuite/std/text_encoding/cons.cc       | 106 ++
 .../std/text_encoding/requirements.cc         |  31 +
 20 files changed, 2226 insertions(+), 5 deletions(-)
 create mode 100644 libstdc++-v3/include/bits/text_encoding-data.h
 create mode 100644 libstdc++-v3/include/std/text_encoding
 create mode 100755 libstdc++-v3/scripts/gen_text_encoding_data.py
 create mode 100644 libstdc++-v3/src/c++26/text_encoding.cc
 create mode 100644 libstdc++-v3/testsuite/ext/unicode/charset_alias_match.cc
 create mode 100644 libstdc++-v3/testsuite/std/text_encoding/cons.cc
 create mode 100644 libstdc++-v3/testsuite/std/text_encoding/requirements.cc

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index aa2cc4af52b..f9ba7ef744b 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -5821,6 +5821,34 @@ AC_LANG_SAVE
   AC_LANG_RESTORE
 ])
 
+dnl
+dnl Check whether the dependencies for std::text_encoding are available.
+dnl
+dnl Defines:
+dnl   _GLIBCXX_USE_NL_LANGINFO_L if nl_langinfo_l is in <langinfo.h>.
+dnl
+AC_DEFUN([GLIBCXX_CHECK_TEXT_ENCODING], [
+AC_LANG_SAVE
+  AC_LANG_CPLUSPLUS
+
+  AC_MSG_CHECKING([whether nl_langinfo_l is defined in <langinfo.h>])
+  AC_TRY_COMPILE([
+  #include <locale.h>
+  #include <langinfo.h>
+  ],[
+    locale_t loc = newlocale(LC_ALL_MASK, "", (locale_t)0);
+    const char* enc = nl_langinfo_l(CODESET, loc);
+    freelocale(loc);
+  ], [ac_nl_langinfo_l=yes], [ac_nl_langinfo_l=no])
+  AC_MSG_RESULT($ac_nl_langinfo_l)
+  if test "$ac_nl_langinfo_l" = yes; then
+    AC_DEFINE_UNQUOTED(_GLIBCXX_USE_NL_LANGINFO_L, 1,
+      [Define if nl_langinfo_l should be used for std::text_encoding.])
+  fi
+
+  AC_LANG_RESTORE
+])
+
 # Macros from the top-level gcc directory.
 m4_include([../config/gc++filt.m4])
 m4_include([../config/tls.m4])
diff --git a/libstdc++-v3/configure.ac b/libstdc++-v3/configure.ac
index c8b36333019..c68cac4f345 100644
--- a/libstdc++-v3/configure.ac
+++ b/libstdc++-v3/configure.ac
@@ -557,6 +557,9 @@ GLIBCXX_CHECK_INIT_PRIORITY
 # For __basic_file::native_handle()
 GLIBCXX_CHECK_FILEBUF_NATIVE_HANDLES
 
+# For std::text_encoding
+GLIBCXX_CHECK_TEXT_ENCODING
+
 # Define documentation rules conditionally.
 
# See if makeinfo has been installed and is modern enough diff --git a/libstdc++-v3/include/Makefile.am b/libstdc++-v3/include/Makefile.am index f9957c9de73..c1204fa5818 100644 --- a/libstdc++-v3/include/Makefile.am +++ b/libstdc++-v3/include/Makefile.am @@ -105,6 +105,7 @@ std_headers = \ ${std_srcdir}/streambuf \ ${std_srcdir}/string \ ${std_srcdir}/system_error \ + ${std_srcdir}/text_encoding \ ${std_srcdir}/thread \ ${std_srcdir}/unordered_map \ ${std_srcdir}/unordered_set \ @@ -160,6 +161,7 @@ bits_freestanding = \ ${bits_srcdir}/stl_raw_storage_iter.h \ ${bits_srcdir}/stl_relops.h \ ${bits_srcdir}/stl_uninitialized.h \ + ${bits_srcdir}/text_encoding-data.h \ ${bits_srcdir}/version.h \ ${bits_srcdir}/string_view.tcc \ ${bits_srcdir}/unicode.h \ diff --git a/libstdc++-v3/include/bits/locale_classes.h b/libstdc++-v3/include/bits/locale_classes.h index 621f2a29f50..a2e94217006 100644 --- a/libstdc++-v3/include/bits/locale_classes.h +++ b/libstdc++-v3/include/bits/locale_classes.h @@ -40,6 +40,10 @@ #include #include +#ifdef __glibcxx_text_encoding +#include +#endif + namespace std _GLIBCXX_VISIBILITY(default) { _GLIBCXX_BEGIN_NAMESPACE_VERSION @@ -248,6 +252,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION string name() const; +#ifdef __glibcxx_text_encoding +# if __CHAR_BIT__ == 8 + text_encoding + encoding() const; +# else + text_encoding + encoding() const = delete; +# endif +#endif + /** * @brief Locale equality. * diff --git a/libstdc++-v3/include/bits/text_encoding-data.h b/libstdc++-v3/include/bits/text_encoding-data.h new file mode 100644 index 00000000000..7ac2e9dc3d9 --- /dev/null +++ b/libstdc++-v3/include/bits/text_encoding-data.h @@ -0,0 +1,902 @@ +// Generated by gen_text_encoding_data.py, do not edit. 
+ +#ifndef _GLIBCXX_GET_ENCODING_DATA +# error "This is not a public header, do not include it directly" +#endif + + { 3, "US-ASCII" }, + { 3, "iso-ir-6" }, + { 3, "ANSI_X3.4-1968" }, + { 3, "ANSI_X3.4-1986" }, + { 3, "ISO_646.irv:1991" }, + { 3, "ISO646-US" }, + { 3, "us" }, + { 3, "IBM367" }, + { 3, "cp367" }, + { 3, "csASCII" }, + { 4, "ISO_8859-1:1987" }, + { 4, "iso-ir-100" }, + { 4, "ISO_8859-1" }, + { 4, "ISO-8859-1" }, + { 4, "latin1" }, + { 4, "l1" }, + { 4, "IBM819" }, + { 4, "CP819" }, + { 4, "csISOLatin1" }, + { 5, "ISO_8859-2:1987" }, + { 5, "iso-ir-101" }, + { 5, "ISO_8859-2" }, + { 5, "ISO-8859-2" }, + { 5, "latin2" }, + { 5, "l2" }, + { 5, "csISOLatin2" }, + { 6, "ISO_8859-3:1988" }, + { 6, "iso-ir-109" }, + { 6, "ISO_8859-3" }, + { 6, "ISO-8859-3" }, + { 6, "latin3" }, + { 6, "l3" }, + { 6, "csISOLatin3" }, + { 7, "ISO_8859-4:1988" }, + { 7, "iso-ir-110" }, + { 7, "ISO_8859-4" }, + { 7, "ISO-8859-4" }, + { 7, "latin4" }, + { 7, "l4" }, + { 7, "csISOLatin4" }, + { 8, "ISO_8859-5:1988" }, + { 8, "iso-ir-144" }, + { 8, "ISO_8859-5" }, + { 8, "ISO-8859-5" }, + { 8, "cyrillic" }, + { 8, "csISOLatinCyrillic" }, + { 9, "ISO_8859-6:1987" }, + { 9, "iso-ir-127" }, + { 9, "ISO_8859-6" }, + { 9, "ISO-8859-6" }, + { 9, "ECMA-114" }, + { 9, "ASMO-708" }, + { 9, "arabic" }, + { 9, "csISOLatinArabic" }, + { 10, "ISO_8859-7:1987" }, + { 10, "iso-ir-126" }, + { 10, "ISO_8859-7" }, + { 10, "ISO-8859-7" }, + { 10, "ELOT_928" }, + { 10, "ECMA-118" }, + { 10, "greek" }, + { 10, "greek8" }, + { 10, "csISOLatinGreek" }, + { 11, "ISO_8859-8:1988" }, + { 11, "iso-ir-138" }, + { 11, "ISO_8859-8" }, + { 11, "ISO-8859-8" }, + { 11, "hebrew" }, + { 11, "csISOLatinHebrew" }, + { 12, "ISO_8859-9:1989" }, + { 12, "iso-ir-148" }, + { 12, "ISO_8859-9" }, + { 12, "ISO-8859-9" }, + { 12, "latin5" }, + { 12, "l5" }, + { 12, "csISOLatin5" }, + { 13, "ISO-8859-10" }, + { 13, "iso-ir-157" }, + { 13, "l6" }, + { 13, "ISO_8859-10:1992" }, + { 13, "csISOLatin6" }, + { 13, "latin6" }, + { 
14, "ISO_6937-2-add" }, + { 14, "iso-ir-142" }, + { 14, "csISOTextComm" }, + { 15, "JIS_X0201" }, + { 15, "X0201" }, + { 15, "csHalfWidthKatakana" }, + { 16, "JIS_Encoding" }, + { 16, "csJISEncoding" }, + { 17, "Shift_JIS" }, + { 17, "MS_Kanji" }, + { 17, "csShiftJIS" }, + { 18, "Extended_UNIX_Code_Packed_Format_for_Japanese" }, + { 18, "csEUCPkdFmtJapanese" }, + { 18, "EUC-JP" }, + { 19, "Extended_UNIX_Code_Fixed_Width_for_Japanese" }, + { 19, "csEUCFixWidJapanese" }, + { 20, "BS_4730" }, + { 20, "iso-ir-4" }, + { 20, "ISO646-GB" }, + { 20, "gb" }, + { 20, "uk" }, + { 20, "csISO4UnitedKingdom" }, + { 21, "SEN_850200_C" }, + { 21, "iso-ir-11" }, + { 21, "ISO646-SE2" }, + { 21, "se2" }, + { 21, "csISO11SwedishForNames" }, + { 22, "IT" }, + { 22, "iso-ir-15" }, + { 22, "ISO646-IT" }, + { 22, "csISO15Italian" }, + { 23, "ES" }, + { 23, "iso-ir-17" }, + { 23, "ISO646-ES" }, + { 23, "csISO17Spanish" }, + { 24, "DIN_66003" }, + { 24, "iso-ir-21" }, + { 24, "de" }, + { 24, "ISO646-DE" }, + { 24, "csISO21German" }, + { 25, "NS_4551-1" }, + { 25, "iso-ir-60" }, + { 25, "ISO646-NO" }, + { 25, "no" }, + { 25, "csISO60DanishNorwegian" }, + { 25, "csISO60Norwegian1" }, + { 26, "NF_Z_62-010" }, + { 26, "iso-ir-69" }, + { 26, "ISO646-FR" }, + { 26, "fr" }, + { 26, "csISO69French" }, + { 27, "ISO-10646-UTF-1" }, + { 27, "csISO10646UTF1" }, + { 28, "ISO_646.basic:1983" }, + { 28, "ref" }, + { 28, "csISO646basic1983" }, + { 29, "INVARIANT" }, + { 29, "csINVARIANT" }, + { 30, "ISO_646.irv:1983" }, + { 30, "iso-ir-2" }, + { 30, "irv" }, + { 30, "csISO2IntlRefVersion" }, + { 31, "NATS-SEFI" }, + { 31, "iso-ir-8-1" }, + { 31, "csNATSSEFI" }, + { 32, "NATS-SEFI-ADD" }, + { 32, "iso-ir-8-2" }, + { 32, "csNATSSEFIADD" }, + { 35, "SEN_850200_B" }, + { 35, "iso-ir-10" }, + { 35, "FI" }, + { 35, "ISO646-FI" }, + { 35, "ISO646-SE" }, + { 35, "se" }, + { 35, "csISO10Swedish" }, + { 36, "KS_C_5601-1987" }, + { 36, "iso-ir-149" }, + { 36, "KS_C_5601-1989" }, + { 36, "KSC_5601" }, + { 36, "korean" 
}, + { 36, "csKSC56011987" }, + { 37, "ISO-2022-KR" }, + { 37, "csISO2022KR" }, + { 38, "EUC-KR" }, + { 38, "csEUCKR" }, + { 39, "ISO-2022-JP" }, + { 39, "csISO2022JP" }, + { 40, "ISO-2022-JP-2" }, + { 40, "csISO2022JP2" }, + { 41, "JIS_C6220-1969-jp" }, + { 41, "JIS_C6220-1969" }, + { 41, "iso-ir-13" }, + { 41, "katakana" }, + { 41, "x0201-7" }, + { 41, "csISO13JISC6220jp" }, + { 42, "JIS_C6220-1969-ro" }, + { 42, "iso-ir-14" }, + { 42, "jp" }, + { 42, "ISO646-JP" }, + { 42, "csISO14JISC6220ro" }, + { 43, "PT" }, + { 43, "iso-ir-16" }, + { 43, "ISO646-PT" }, + { 43, "csISO16Portuguese" }, + { 44, "greek7-old" }, + { 44, "iso-ir-18" }, + { 44, "csISO18Greek7Old" }, + { 45, "latin-greek" }, + { 45, "iso-ir-19" }, + { 45, "csISO19LatinGreek" }, + { 46, "NF_Z_62-010_(1973)" }, + { 46, "iso-ir-25" }, + { 46, "ISO646-FR1" }, + { 46, "csISO25French" }, + { 47, "Latin-greek-1" }, + { 47, "iso-ir-27" }, + { 47, "csISO27LatinGreek1" }, + { 48, "ISO_5427" }, + { 48, "iso-ir-37" }, + { 48, "csISO5427Cyrillic" }, + { 49, "JIS_C6226-1978" }, + { 49, "iso-ir-42" }, + { 49, "csISO42JISC62261978" }, + { 50, "BS_viewdata" }, + { 50, "iso-ir-47" }, + { 50, "csISO47BSViewdata" }, + { 51, "INIS" }, + { 51, "iso-ir-49" }, + { 51, "csISO49INIS" }, + { 52, "INIS-8" }, + { 52, "iso-ir-50" }, + { 52, "csISO50INIS8" }, + { 53, "INIS-cyrillic" }, + { 53, "iso-ir-51" }, + { 53, "csISO51INISCyrillic" }, + { 54, "ISO_5427:1981" }, + { 54, "iso-ir-54" }, + { 54, "ISO5427Cyrillic1981" }, + { 54, "csISO54271981" }, + { 55, "ISO_5428:1980" }, + { 55, "iso-ir-55" }, + { 55, "csISO5428Greek" }, + { 56, "GB_1988-80" }, + { 56, "iso-ir-57" }, + { 56, "cn" }, + { 56, "ISO646-CN" }, + { 56, "csISO57GB1988" }, + { 57, "GB_2312-80" }, + { 57, "iso-ir-58" }, + { 57, "chinese" }, + { 57, "csISO58GB231280" }, + { 58, "NS_4551-2" }, + { 58, "ISO646-NO2" }, + { 58, "iso-ir-61" }, + { 58, "no2" }, + { 58, "csISO61Norwegian2" }, + { 59, "videotex-suppl" }, + { 59, "iso-ir-70" }, + { 59, "csISO70VideotexSupp1" }, 
+ { 60, "PT2" }, + { 60, "iso-ir-84" }, + { 60, "ISO646-PT2" }, + { 60, "csISO84Portuguese2" }, + { 61, "ES2" }, + { 61, "iso-ir-85" }, + { 61, "ISO646-ES2" }, + { 61, "csISO85Spanish2" }, + { 62, "MSZ_7795.3" }, + { 62, "iso-ir-86" }, + { 62, "ISO646-HU" }, + { 62, "hu" }, + { 62, "csISO86Hungarian" }, + { 63, "JIS_C6226-1983" }, + { 63, "iso-ir-87" }, + { 63, "x0208" }, + { 63, "JIS_X0208-1983" }, + { 63, "csISO87JISX0208" }, + { 64, "greek7" }, + { 64, "iso-ir-88" }, + { 64, "csISO88Greek7" }, + { 65, "ASMO_449" }, + { 65, "ISO_9036" }, + { 65, "arabic7" }, + { 65, "iso-ir-89" }, + { 65, "csISO89ASMO449" }, + { 66, "iso-ir-90" }, + { 66, "csISO90" }, + { 67, "JIS_C6229-1984-a" }, + { 67, "iso-ir-91" }, + { 67, "jp-ocr-a" }, + { 67, "csISO91JISC62291984a" }, + { 68, "JIS_C6229-1984-b" }, + { 68, "iso-ir-92" }, + { 68, "ISO646-JP-OCR-B" }, + { 68, "jp-ocr-b" }, + { 68, "csISO92JISC62991984b" }, + { 69, "JIS_C6229-1984-b-add" }, + { 69, "iso-ir-93" }, + { 69, "jp-ocr-b-add" }, + { 69, "csISO93JIS62291984badd" }, + { 70, "JIS_C6229-1984-hand" }, + { 70, "iso-ir-94" }, + { 70, "jp-ocr-hand" }, + { 70, "csISO94JIS62291984hand" }, + { 71, "JIS_C6229-1984-hand-add" }, + { 71, "iso-ir-95" }, + { 71, "jp-ocr-hand-add" }, + { 71, "csISO95JIS62291984handadd" }, + { 72, "JIS_C6229-1984-kana" }, + { 72, "iso-ir-96" }, + { 72, "csISO96JISC62291984kana" }, + { 73, "ISO_2033-1983" }, + { 73, "iso-ir-98" }, + { 73, "e13b" }, + { 73, "csISO2033" }, + { 74, "ANSI_X3.110-1983" }, + { 74, "iso-ir-99" }, + { 74, "CSA_T500-1983" }, + { 74, "NAPLPS" }, + { 74, "csISO99NAPLPS" }, + { 75, "T.61-7bit" }, + { 75, "iso-ir-102" }, + { 75, "csISO102T617bit" }, + { 76, "T.61-8bit" }, + { 76, "T.61" }, + { 76, "iso-ir-103" }, + { 76, "csISO103T618bit" }, + { 77, "ECMA-cyrillic" }, + { 77, "iso-ir-111" }, + { 77, "KOI8-E" }, + { 77, "csISO111ECMACyrillic" }, + { 78, "CSA_Z243.4-1985-1" }, + { 78, "iso-ir-121" }, + { 78, "ISO646-CA" }, + { 78, "csa7-1" }, + { 78, "csa71" }, + { 78, "ca" }, + { 78, 
"csISO121Canadian1" }, + { 79, "CSA_Z243.4-1985-2" }, + { 79, "iso-ir-122" }, + { 79, "ISO646-CA2" }, + { 79, "csa7-2" }, + { 79, "csa72" }, + { 79, "csISO122Canadian2" }, + { 80, "CSA_Z243.4-1985-gr" }, + { 80, "iso-ir-123" }, + { 80, "csISO123CSAZ24341985gr" }, + { 81, "ISO_8859-6-E" }, + { 81, "csISO88596E" }, + { 81, "ISO-8859-6-E" }, + { 82, "ISO_8859-6-I" }, + { 82, "csISO88596I" }, + { 82, "ISO-8859-6-I" }, + { 83, "T.101-G2" }, + { 83, "iso-ir-128" }, + { 83, "csISO128T101G2" }, + { 84, "ISO_8859-8-E" }, + { 84, "csISO88598E" }, + { 84, "ISO-8859-8-E" }, + { 85, "ISO_8859-8-I" }, + { 85, "csISO88598I" }, + { 85, "ISO-8859-8-I" }, + { 86, "CSN_369103" }, + { 86, "iso-ir-139" }, + { 86, "csISO139CSN369103" }, + { 87, "JUS_I.B1.002" }, + { 87, "iso-ir-141" }, + { 87, "ISO646-YU" }, + { 87, "js" }, + { 87, "yu" }, + { 87, "csISO141JUSIB1002" }, + { 88, "IEC_P27-1" }, + { 88, "iso-ir-143" }, + { 88, "csISO143IECP271" }, + { 89, "JUS_I.B1.003-serb" }, + { 89, "iso-ir-146" }, + { 89, "serbian" }, + { 89, "csISO146Serbian" }, + { 90, "JUS_I.B1.003-mac" }, + { 90, "macedonian" }, + { 90, "iso-ir-147" }, + { 90, "csISO147Macedonian" }, + { 91, "greek-ccitt" }, + { 91, "iso-ir-150" }, + { 91, "csISO150" }, + { 91, "csISO150GreekCCITT" }, + { 92, "NC_NC00-10:81" }, + { 92, "cuba" }, + { 92, "iso-ir-151" }, + { 92, "ISO646-CU" }, + { 92, "csISO151Cuba" }, + { 93, "ISO_6937-2-25" }, + { 93, "iso-ir-152" }, + { 93, "csISO6937Add" }, + { 94, "GOST_19768-74" }, + { 94, "ST_SEV_358-88" }, + { 94, "iso-ir-153" }, + { 94, "csISO153GOST1976874" }, + { 95, "ISO_8859-supp" }, + { 95, "iso-ir-154" }, + { 95, "latin1-2-5" }, + { 95, "csISO8859Supp" }, + { 96, "ISO_10367-box" }, + { 96, "iso-ir-155" }, + { 96, "csISO10367Box" }, + { 97, "latin-lap" }, + { 97, "lap" }, + { 97, "iso-ir-158" }, + { 97, "csISO158Lap" }, + { 98, "JIS_X0212-1990" }, + { 98, "x0212" }, + { 98, "iso-ir-159" }, + { 98, "csISO159JISX02121990" }, + { 99, "DS_2089" }, + { 99, "DS2089" }, + { 99, "ISO646-DK" }, 
+ { 99, "dk" }, + { 99, "csISO646Danish" }, + { 100, "us-dk" }, + { 100, "csUSDK" }, + { 101, "dk-us" }, + { 101, "csDKUS" }, + { 102, "KSC5636" }, + { 102, "ISO646-KR" }, + { 102, "csKSC5636" }, + { 103, "UNICODE-1-1-UTF-7" }, + { 103, "csUnicode11UTF7" }, + { 104, "ISO-2022-CN" }, + { 104, "csISO2022CN" }, + { 105, "ISO-2022-CN-EXT" }, + { 105, "csISO2022CNEXT" }, +#define _GLIBCXX_TEXT_ENCODING_UTF8_OFFSET 413 + { 106, "UTF-8" }, + { 106, "csUTF8" }, + { 109, "ISO-8859-13" }, + { 109, "csISO885913" }, + { 110, "ISO-8859-14" }, + { 110, "iso-ir-199" }, + { 110, "ISO_8859-14:1998" }, + { 110, "ISO_8859-14" }, + { 110, "latin8" }, + { 110, "iso-celtic" }, + { 110, "l8" }, + { 110, "csISO885914" }, + { 111, "ISO-8859-15" }, + { 111, "ISO_8859-15" }, + { 111, "Latin-9" }, + { 111, "csISO885915" }, + { 112, "ISO-8859-16" }, + { 112, "iso-ir-226" }, + { 112, "ISO_8859-16:2001" }, + { 112, "ISO_8859-16" }, + { 112, "latin10" }, + { 112, "l10" }, + { 112, "csISO885916" }, + { 113, "GBK" }, + { 113, "CP936" }, + { 113, "MS936" }, + { 113, "windows-936" }, + { 113, "csGBK" }, + { 114, "GB18030" }, + { 114, "csGB18030" }, + { 115, "OSD_EBCDIC_DF04_15" }, + { 115, "csOSDEBCDICDF0415" }, + { 116, "OSD_EBCDIC_DF03_IRV" }, + { 116, "csOSDEBCDICDF03IRV" }, + { 117, "OSD_EBCDIC_DF04_1" }, + { 117, "csOSDEBCDICDF041" }, + { 118, "ISO-11548-1" }, + { 118, "ISO_11548-1" }, + { 118, "ISO_TR_11548-1" }, + { 118, "csISO115481" }, + { 119, "KZ-1048" }, + { 119, "STRK1048-2002" }, + { 119, "RK1048" }, + { 119, "csKZ1048" }, + { 1000, "ISO-10646-UCS-2" }, + { 1000, "csUnicode" }, + { 1001, "ISO-10646-UCS-4" }, + { 1001, "csUCS4" }, + { 1002, "ISO-10646-UCS-Basic" }, + { 1002, "csUnicodeASCII" }, + { 1003, "ISO-10646-Unicode-Latin1" }, + { 1003, "csUnicodeLatin1" }, + { 1003, "ISO-10646" }, + { 1004, "ISO-10646-J-1" }, + { 1004, "csUnicodeJapanese" }, + { 1005, "ISO-Unicode-IBM-1261" }, + { 1005, "csUnicodeIBM1261" }, + { 1006, "ISO-Unicode-IBM-1268" }, + { 1006, "csUnicodeIBM1268" }, + { 
1007, "ISO-Unicode-IBM-1276" }, + { 1007, "csUnicodeIBM1276" }, + { 1008, "ISO-Unicode-IBM-1264" }, + { 1008, "csUnicodeIBM1264" }, + { 1009, "ISO-Unicode-IBM-1265" }, + { 1009, "csUnicodeIBM1265" }, + { 1010, "UNICODE-1-1" }, + { 1010, "csUnicode11" }, + { 1011, "SCSU" }, + { 1011, "csSCSU" }, + { 1012, "UTF-7" }, + { 1012, "csUTF7" }, + { 1013, "UTF-16BE" }, + { 1013, "csUTF16BE" }, + { 1014, "UTF-16LE" }, + { 1014, "csUTF16LE" }, + { 1015, "UTF-16" }, + { 1015, "csUTF16" }, + { 1016, "CESU-8" }, + { 1016, "csCESU8" }, + { 1016, "csCESU-8" }, + { 1017, "UTF-32" }, + { 1017, "csUTF32" }, + { 1018, "UTF-32BE" }, + { 1018, "csUTF32BE" }, + { 1019, "UTF-32LE" }, + { 1019, "csUTF32LE" }, + { 1020, "BOCU-1" }, + { 1020, "csBOCU1" }, + { 1020, "csBOCU-1" }, + { 1021, "UTF-7-IMAP" }, + { 1021, "csUTF7IMAP" }, + { 2000, "ISO-8859-1-Windows-3.0-Latin-1" }, + { 2000, "csWindows30Latin1" }, + { 2001, "ISO-8859-1-Windows-3.1-Latin-1" }, + { 2001, "csWindows31Latin1" }, + { 2002, "ISO-8859-2-Windows-Latin-2" }, + { 2002, "csWindows31Latin2" }, + { 2003, "ISO-8859-9-Windows-Latin-5" }, + { 2003, "csWindows31Latin5" }, + { 2004, "hp-roman8" }, + { 2004, "roman8" }, + { 2004, "r8" }, + { 2004, "csHPRoman8" }, + { 2005, "Adobe-Standard-Encoding" }, + { 2005, "csAdobeStandardEncoding" }, + { 2006, "Ventura-US" }, + { 2006, "csVenturaUS" }, + { 2007, "Ventura-International" }, + { 2007, "csVenturaInternational" }, + { 2008, "DEC-MCS" }, + { 2008, "dec" }, + { 2008, "csDECMCS" }, + { 2009, "IBM850" }, + { 2009, "cp850" }, + { 2009, "850" }, + { 2009, "csPC850Multilingual" }, + { 2010, "IBM852" }, + { 2010, "cp852" }, + { 2010, "852" }, + { 2010, "csPCp852" }, + { 2011, "IBM437" }, + { 2011, "cp437" }, + { 2011, "437" }, + { 2011, "csPC8CodePage437" }, + { 2012, "PC8-Danish-Norwegian" }, + { 2012, "csPC8DanishNorwegian" }, + { 2013, "IBM862" }, + { 2013, "cp862" }, + { 2013, "862" }, + { 2013, "csPC862LatinHebrew" }, + { 2014, "PC8-Turkish" }, + { 2014, "csPC8Turkish" }, + { 2015, 
"IBM-Symbols" }, + { 2015, "csIBMSymbols" }, + { 2016, "IBM-Thai" }, + { 2016, "csIBMThai" }, + { 2017, "HP-Legal" }, + { 2017, "csHPLegal" }, + { 2018, "HP-Pi-font" }, + { 2018, "csHPPiFont" }, + { 2019, "HP-Math8" }, + { 2019, "csHPMath8" }, + { 2020, "Adobe-Symbol-Encoding" }, + { 2020, "csHPPSMath" }, + { 2021, "HP-DeskTop" }, + { 2021, "csHPDesktop" }, + { 2022, "Ventura-Math" }, + { 2022, "csVenturaMath" }, + { 2023, "Microsoft-Publishing" }, + { 2023, "csMicrosoftPublishing" }, + { 2024, "Windows-31J" }, + { 2024, "csWindows31J" }, + { 2025, "GB2312" }, + { 2025, "csGB2312" }, + { 2026, "Big5" }, + { 2026, "csBig5" }, + { 2027, "macintosh" }, + { 2027, "mac" }, + { 2027, "csMacintosh" }, + { 2028, "IBM037" }, + { 2028, "cp037" }, + { 2028, "ebcdic-cp-us" }, + { 2028, "ebcdic-cp-ca" }, + { 2028, "ebcdic-cp-wt" }, + { 2028, "ebcdic-cp-nl" }, + { 2028, "csIBM037" }, + { 2029, "IBM038" }, + { 2029, "EBCDIC-INT" }, + { 2029, "cp038" }, + { 2029, "csIBM038" }, + { 2030, "IBM273" }, + { 2030, "CP273" }, + { 2030, "csIBM273" }, + { 2031, "IBM274" }, + { 2031, "EBCDIC-BE" }, + { 2031, "CP274" }, + { 2031, "csIBM274" }, + { 2032, "IBM275" }, + { 2032, "EBCDIC-BR" }, + { 2032, "cp275" }, + { 2032, "csIBM275" }, + { 2033, "IBM277" }, + { 2033, "EBCDIC-CP-DK" }, + { 2033, "EBCDIC-CP-NO" }, + { 2033, "csIBM277" }, + { 2034, "IBM278" }, + { 2034, "CP278" }, + { 2034, "ebcdic-cp-fi" }, + { 2034, "ebcdic-cp-se" }, + { 2034, "csIBM278" }, + { 2035, "IBM280" }, + { 2035, "CP280" }, + { 2035, "ebcdic-cp-it" }, + { 2035, "csIBM280" }, + { 2036, "IBM281" }, + { 2036, "EBCDIC-JP-E" }, + { 2036, "cp281" }, + { 2036, "csIBM281" }, + { 2037, "IBM284" }, + { 2037, "CP284" }, + { 2037, "ebcdic-cp-es" }, + { 2037, "csIBM284" }, + { 2038, "IBM285" }, + { 2038, "CP285" }, + { 2038, "ebcdic-cp-gb" }, + { 2038, "csIBM285" }, + { 2039, "IBM290" }, + { 2039, "cp290" }, + { 2039, "EBCDIC-JP-kana" }, + { 2039, "csIBM290" }, + { 2040, "IBM297" }, + { 2040, "cp297" }, + { 2040, "ebcdic-cp-fr" }, 
+ { 2040, "csIBM297" }, + { 2041, "IBM420" }, + { 2041, "cp420" }, + { 2041, "ebcdic-cp-ar1" }, + { 2041, "csIBM420" }, + { 2042, "IBM423" }, + { 2042, "cp423" }, + { 2042, "ebcdic-cp-gr" }, + { 2042, "csIBM423" }, + { 2043, "IBM424" }, + { 2043, "cp424" }, + { 2043, "ebcdic-cp-he" }, + { 2043, "csIBM424" }, + { 2044, "IBM500" }, + { 2044, "CP500" }, + { 2044, "ebcdic-cp-be" }, + { 2044, "ebcdic-cp-ch" }, + { 2044, "csIBM500" }, + { 2045, "IBM851" }, + { 2045, "cp851" }, + { 2045, "851" }, + { 2045, "csIBM851" }, + { 2046, "IBM855" }, + { 2046, "cp855" }, + { 2046, "855" }, + { 2046, "csIBM855" }, + { 2047, "IBM857" }, + { 2047, "cp857" }, + { 2047, "857" }, + { 2047, "csIBM857" }, + { 2048, "IBM860" }, + { 2048, "cp860" }, + { 2048, "860" }, + { 2048, "csIBM860" }, + { 2049, "IBM861" }, + { 2049, "cp861" }, + { 2049, "861" }, + { 2049, "cp-is" }, + { 2049, "csIBM861" }, + { 2050, "IBM863" }, + { 2050, "cp863" }, + { 2050, "863" }, + { 2050, "csIBM863" }, + { 2051, "IBM864" }, + { 2051, "cp864" }, + { 2051, "csIBM864" }, + { 2052, "IBM865" }, + { 2052, "cp865" }, + { 2052, "865" }, + { 2052, "csIBM865" }, + { 2053, "IBM868" }, + { 2053, "CP868" }, + { 2053, "cp-ar" }, + { 2053, "csIBM868" }, + { 2054, "IBM869" }, + { 2054, "cp869" }, + { 2054, "869" }, + { 2054, "cp-gr" }, + { 2054, "csIBM869" }, + { 2055, "IBM870" }, + { 2055, "CP870" }, + { 2055, "ebcdic-cp-roece" }, + { 2055, "ebcdic-cp-yu" }, + { 2055, "csIBM870" }, + { 2056, "IBM871" }, + { 2056, "CP871" }, + { 2056, "ebcdic-cp-is" }, + { 2056, "csIBM871" }, + { 2057, "IBM880" }, + { 2057, "cp880" }, + { 2057, "EBCDIC-Cyrillic" }, + { 2057, "csIBM880" }, + { 2058, "IBM891" }, + { 2058, "cp891" }, + { 2058, "csIBM891" }, + { 2059, "IBM903" }, + { 2059, "cp903" }, + { 2059, "csIBM903" }, + { 2060, "IBM904" }, + { 2060, "cp904" }, + { 2060, "904" }, + { 2060, "csIBBM904" }, + { 2061, "IBM905" }, + { 2061, "CP905" }, + { 2061, "ebcdic-cp-tr" }, + { 2061, "csIBM905" }, + { 2062, "IBM918" }, + { 2062, "CP918" }, + { 
2062, "ebcdic-cp-ar2" }, + { 2062, "csIBM918" }, + { 2063, "IBM1026" }, + { 2063, "CP1026" }, + { 2063, "csIBM1026" }, + { 2064, "EBCDIC-AT-DE" }, + { 2064, "csIBMEBCDICATDE" }, + { 2065, "EBCDIC-AT-DE-A" }, + { 2065, "csEBCDICATDEA" }, + { 2066, "EBCDIC-CA-FR" }, + { 2066, "csEBCDICCAFR" }, + { 2067, "EBCDIC-DK-NO" }, + { 2067, "csEBCDICDKNO" }, + { 2068, "EBCDIC-DK-NO-A" }, + { 2068, "csEBCDICDKNOA" }, + { 2069, "EBCDIC-FI-SE" }, + { 2069, "csEBCDICFISE" }, + { 2070, "EBCDIC-FI-SE-A" }, + { 2070, "csEBCDICFISEA" }, + { 2071, "EBCDIC-FR" }, + { 2071, "csEBCDICFR" }, + { 2072, "EBCDIC-IT" }, + { 2072, "csEBCDICIT" }, + { 2073, "EBCDIC-PT" }, + { 2073, "csEBCDICPT" }, + { 2074, "EBCDIC-ES" }, + { 2074, "csEBCDICES" }, + { 2075, "EBCDIC-ES-A" }, + { 2075, "csEBCDICESA" }, + { 2076, "EBCDIC-ES-S" }, + { 2076, "csEBCDICESS" }, + { 2077, "EBCDIC-UK" }, + { 2077, "csEBCDICUK" }, + { 2078, "EBCDIC-US" }, + { 2078, "csEBCDICUS" }, + { 2079, "UNKNOWN-8BIT" }, + { 2079, "csUnknown8BiT" }, + { 2080, "MNEMONIC" }, + { 2080, "csMnemonic" }, + { 2081, "MNEM" }, + { 2081, "csMnem" }, + { 2082, "VISCII" }, + { 2082, "csVISCII" }, + { 2083, "VIQR" }, + { 2083, "csVIQR" }, + { 2084, "KOI8-R" }, + { 2084, "csKOI8R" }, + { 2085, "HZ-GB-2312" }, + { 2086, "IBM866" }, + { 2086, "cp866" }, + { 2086, "866" }, + { 2086, "csIBM866" }, + { 2087, "IBM775" }, + { 2087, "cp775" }, + { 2087, "csPC775Baltic" }, + { 2088, "KOI8-U" }, + { 2088, "csKOI8U" }, + { 2089, "IBM00858" }, + { 2089, "CCSID00858" }, + { 2089, "CP00858" }, + { 2089, "PC-Multilingual-850+euro" }, + { 2089, "csIBM00858" }, + { 2090, "IBM00924" }, + { 2090, "CCSID00924" }, + { 2090, "CP00924" }, + { 2090, "ebcdic-Latin9--euro" }, + { 2090, "csIBM00924" }, + { 2091, "IBM01140" }, + { 2091, "CCSID01140" }, + { 2091, "CP01140" }, + { 2091, "ebcdic-us-37+euro" }, + { 2091, "csIBM01140" }, + { 2092, "IBM01141" }, + { 2092, "CCSID01141" }, + { 2092, "CP01141" }, + { 2092, "ebcdic-de-273+euro" }, + { 2092, "csIBM01141" }, + { 2093, 
"IBM01142" }, + { 2093, "CCSID01142" }, + { 2093, "CP01142" }, + { 2093, "ebcdic-dk-277+euro" }, + { 2093, "ebcdic-no-277+euro" }, + { 2093, "csIBM01142" }, + { 2094, "IBM01143" }, + { 2094, "CCSID01143" }, + { 2094, "CP01143" }, + { 2094, "ebcdic-fi-278+euro" }, + { 2094, "ebcdic-se-278+euro" }, + { 2094, "csIBM01143" }, + { 2095, "IBM01144" }, + { 2095, "CCSID01144" }, + { 2095, "CP01144" }, + { 2095, "ebcdic-it-280+euro" }, + { 2095, "csIBM01144" }, + { 2096, "IBM01145" }, + { 2096, "CCSID01145" }, + { 2096, "CP01145" }, + { 2096, "ebcdic-es-284+euro" }, + { 2096, "csIBM01145" }, + { 2097, "IBM01146" }, + { 2097, "CCSID01146" }, + { 2097, "CP01146" }, + { 2097, "ebcdic-gb-285+euro" }, + { 2097, "csIBM01146" }, + { 2098, "IBM01147" }, + { 2098, "CCSID01147" }, + { 2098, "CP01147" }, + { 2098, "ebcdic-fr-297+euro" }, + { 2098, "csIBM01147" }, + { 2099, "IBM01148" }, + { 2099, "CCSID01148" }, + { 2099, "CP01148" }, + { 2099, "ebcdic-international-500+euro" }, + { 2099, "csIBM01148" }, + { 2100, "IBM01149" }, + { 2100, "CCSID01149" }, + { 2100, "CP01149" }, + { 2100, "ebcdic-is-871+euro" }, + { 2100, "csIBM01149" }, + { 2101, "Big5-HKSCS" }, + { 2101, "csBig5HKSCS" }, + { 2102, "IBM1047" }, + { 2102, "IBM-1047" }, + { 2102, "csIBM1047" }, + { 2103, "PTCP154" }, + { 2103, "csPTCP154" }, + { 2103, "PT154" }, + { 2103, "CP154" }, + { 2103, "Cyrillic-Asian" }, + { 2104, "Amiga-1251" }, + { 2104, "Ami1251" }, + { 2104, "Amiga1251" }, + { 2104, "Ami-1251" }, + { 2104, "csAmiga1251" }, + { 2104, "(Aliases" }, + { 2104, "are" }, + { 2104, "provided" }, + { 2104, "for" }, + { 2104, "historical" }, + { 2104, "reasons" }, + { 2104, "and" }, + { 2104, "should" }, + { 2104, "not" }, + { 2104, "be" }, + { 2104, "used)" }, + { 2104, "[Malyshev]" }, + { 2105, "KOI7-switched" }, + { 2105, "csKOI7switched" }, + { 2106, "BRF" }, + { 2106, "csBRF" }, + { 2107, "TSCII" }, + { 2107, "csTSCII" }, + { 2108, "CP51932" }, + { 2108, "csCP51932" }, + { 2109, "windows-874" }, + { 2109, 
"cswindows874" }, + { 2250, "windows-1250" }, + { 2250, "cswindows1250" }, + { 2251, "windows-1251" }, + { 2251, "cswindows1251" }, + { 2252, "windows-1252" }, + { 2252, "cswindows1252" }, + { 2253, "windows-1253" }, + { 2253, "cswindows1253" }, + { 2254, "windows-1254" }, + { 2254, "cswindows1254" }, + { 2255, "windows-1255" }, + { 2255, "cswindows1255" }, + { 2256, "windows-1256" }, + { 2256, "cswindows1256" }, + { 2257, "windows-1257" }, + { 2257, "cswindows1257" }, + { 2258, "windows-1258" }, + { 2258, "cswindows1258" }, + { 2259, "TIS-620" }, + { 2259, "csTIS620" }, + { 2259, "ISO-8859-11" }, + { 2260, "CP50220" }, + { 2260, "csCP50220" }, + +#undef _GLIBCXX_GET_ENCODING_DATA diff --git a/libstdc++-v3/include/bits/unicode.h b/libstdc++-v3/include/bits/unicode.h index f1b2b359bdf..f531faa04d0 100644 --- a/libstdc++-v3/include/bits/unicode.h +++ b/libstdc++-v3/include/bits/unicode.h @@ -986,7 +986,7 @@ inline namespace __v15_1_0 return __n; } - template + template consteval bool __literal_encoding_is_unicode() { @@ -1056,6 +1056,161 @@ inline namespace __v15_1_0 __literal_encoding_is_utf8() { return __literal_encoding_is_unicode(); } + // https://www.unicode.org/reports/tr22/tr22-8.html#Charset_Alias_Matching + constexpr bool + __charset_alias_match(string_view __a, string_view __b) + { + auto __map = [](char __c, bool& __num) { + switch (__c) + { + case '0': + return __num ? 
__c : '\0'; + case '1': + case '2': + case '3': + case '4': + case '5': + case '6': + case '7': + case '8': + case '9': + __num = true; + return __c; + case 'A': + __c = 'a'; + break; + case 'B': + __c = 'b'; + break; + case 'C': + __c = 'c'; + break; + case 'D': + __c = 'd'; + break; + case 'E': + __c = 'e'; + break; + case 'F': + __c = 'f'; + break; + case 'G': + __c = 'g'; + break; + case 'H': + __c = 'h'; + break; + case 'I': + __c = 'i'; + break; + case 'J': + __c = 'j'; + break; + case 'K': + __c = 'k'; + break; + case 'L': + __c = 'l'; + break; + case 'M': + __c = 'm'; + break; + case 'N': + __c = 'n'; + break; + case 'O': + __c = 'o'; + break; + case 'P': + __c = 'p'; + break; + case 'Q': + __c = 'q'; + break; + case 'R': + __c = 'r'; + break; + case 'S': + __c = 's'; + break; + case 'T': + __c = 't'; + break; + case 'U': + __c = 'u'; + break; + case 'V': + __c = 'v'; + break; + case 'W': + __c = 'w'; + break; + case 'X': + __c = 'x'; + break; + case 'Y': + __c = 'y'; + break; + case 'Z': + __c = 'z'; + break; + case 'a': + case 'b': + case 'c': + case 'd': + case 'e': + case 'f': + case 'g': + case 'h': + case 'i': + case 'j': + case 'k': + case 'l': + case 'm': + case 'n': + case 'o': + case 'p': + case 'q': + case 'r': + case 's': + case 't': + case 'u': + case 'v': + case 'w': + case 'x': + case 'y': + case 'z': + break; + default: + __c = '\0'; + } + __num = false; + return __c; + }; + + auto __ptr_a = __a.begin(), __end_a = __a.end(); + auto __ptr_b = __b.begin(), __end_b = __b.end(); + bool __num_a = false, __num_b = false; + + while (true) + { + char __chr_a, __chr_b; + while (__ptr_a != __end_a && (__chr_a = __map(*__ptr_a, __num_a)) == 0) + ++__ptr_a; + while (__ptr_b != __end_b && (__chr_b = __map(*__ptr_b, __num_b)) == 0) + ++__ptr_b; + if (__ptr_a == __end_a) + return __ptr_b == __end_b; + if (__ptr_b == __end_b) + return false; + else if (__chr_a != __chr_b) + return false; // Found non-matching characters. 
+ ++__ptr_a; + ++__ptr_b; + } + return true; + } + } // namespace __unicode _GLIBCXX_END_NAMESPACE_VERSION diff --git a/libstdc++-v3/include/bits/version.def b/libstdc++-v3/include/bits/version.def index 5de88f955aa..91d882edaa5 100644 --- a/libstdc++-v3/include/bits/version.def +++ b/libstdc++-v3/include/bits/version.def @@ -1756,6 +1756,16 @@ ftms = { }; }; +ftms = { + name = text_encoding; + values = { + v = 202306; + cxxmin = 26; + hosted = yes; + extra_cond = "_GLIBCXX_USE_NL_LANGINFO_L"; + }; +}; + ftms = { name = to_string; values = { diff --git a/libstdc++-v3/include/bits/version.h b/libstdc++-v3/include/bits/version.h index 00bb7a8cbd1..8436822a141 100644 --- a/libstdc++-v3/include/bits/version.h +++ b/libstdc++-v3/include/bits/version.h @@ -2142,6 +2142,17 @@ #undef __glibcxx_want_saturation_arithmetic // from version.def line 1760 +#if !defined(__cpp_lib_text_encoding) +# if (__cplusplus > 202302L) && _GLIBCXX_HOSTED && (_GLIBCXX_USE_NL_LANGINFO_L) +# define __glibcxx_text_encoding 202306L +# if defined(__glibcxx_want_all) || defined(__glibcxx_want_text_encoding) +# define __cpp_lib_text_encoding 202306L +# endif +# endif +#endif /* !defined(__cpp_lib_text_encoding) && defined(__glibcxx_want_text_encoding) */ +#undef __glibcxx_want_text_encoding + +// from version.def line 1770 #if !defined(__cpp_lib_to_string) # if (__cplusplus > 202302L) && _GLIBCXX_HOSTED && (__glibcxx_to_chars) # define __glibcxx_to_string 202306L @@ -2152,7 +2163,7 @@ #endif /* !defined(__cpp_lib_to_string) && defined(__glibcxx_want_to_string) */ #undef __glibcxx_want_to_string -// from version.def line 1770 +// from version.def line 1780 #if !defined(__cpp_lib_generator) # if (__cplusplus >= 202100L) && (__glibcxx_coroutine) # define __glibcxx_generator 202207L diff --git a/libstdc++-v3/include/std/text_encoding b/libstdc++-v3/include/std/text_encoding new file mode 100644 index 00000000000..df8a09c5810 --- /dev/null +++ b/libstdc++-v3/include/std/text_encoding @@ -0,0 +1,704 @@ 
+// -*- C++ -*- + +// Copyright The GNU Toolchain Authors. +// +// This file is part of the GNU ISO C++ Library. This library is free +// software; you can redistribute it and/or modify it under the +// terms of the GNU General Public License as published by the +// Free Software Foundation; either version 3, or (at your option) +// any later version. + +// This library is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. + +// Under Section 7 of GPL version 3, you are granted additional +// permissions described in the GCC Runtime Library Exception, version +// 3.1, as published by the Free Software Foundation. + +// You should have received a copy of the GNU General Public License and +// a copy of the GCC Runtime Library Exception along with this program; +// see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +// . + +/** @file include/text_encoding + * This is a Standard C++ Library header. + */ + +#ifndef _GLIBCXX_TEXT_ENCODING +#define _GLIBCXX_TEXT_ENCODING + +#pragma GCC system_header + +#include + +#define __glibcxx_want_text_encoding +#include + +#ifdef __cpp_lib_text_encoding +#include +#include +#include // hash +#include // view_interface +#include // __charset_alias_match +#include // __int_traits + +namespace std _GLIBCXX_VISIBILITY(default) +{ +_GLIBCXX_BEGIN_NAMESPACE_VERSION + + /** + * @brief An interface for accessing the IANA Character Sets registry. 
+ * @ingroup locales + * @since C++26 + */ + struct text_encoding + { + private: + struct _Rep + { + using id = __INT_LEAST32_TYPE__; + id _M_id; + const char* _M_name; + + friend constexpr bool + operator<(const _Rep& __r, id __m) noexcept + { return __r._M_id < __m; } + + friend constexpr bool + operator==(const _Rep& __r, string_view __name) noexcept + { return __r._M_name == __name; } + }; + + public: + static constexpr size_t max_name_length = 63; + + enum class id : _Rep::id + { + other = 1, + unknown = 2, + ASCII = 3, + ISOLatin1 = 4, + ISOLatin2 = 5, + ISOLatin3 = 6, + ISOLatin4 = 7, + ISOLatinCyrillic = 8, + ISOLatinArabic = 9, + ISOLatinGreek = 10, + ISOLatinHebrew = 11, + ISOLatin5 = 12, + ISOLatin6 = 13, + ISOTextComm = 14, + HalfWidthKatakana = 15, + JISEncoding = 16, + ShiftJIS = 17, + EUCPkdFmtJapanese = 18, + EUCFixWidJapanese = 19, + ISO4UnitedKingdom = 20, + ISO11SwedishForNames = 21, + ISO15Italian = 22, + ISO17Spanish = 23, + ISO21German = 24, + ISO60DanishNorwegian = 25, + ISO69French = 26, + ISO10646UTF1 = 27, + ISO646basic1983 = 28, + INVARIANT = 29, + ISO2IntlRefVersion = 30, + NATSSEFI = 31, + NATSSEFIADD = 32, + ISO10Swedish = 35, + KSC56011987 = 36, + ISO2022KR = 37, + EUCKR = 38, + ISO2022JP = 39, + ISO2022JP2 = 40, + ISO13JISC6220jp = 41, + ISO14JISC6220ro = 42, + ISO16Portuguese = 43, + ISO18Greek7Old = 44, + ISO19LatinGreek = 45, + ISO25French = 46, + ISO27LatinGreek1 = 47, + ISO5427Cyrillic = 48, + ISO42JISC62261978 = 49, + ISO47BSViewdata = 50, + ISO49INIS = 51, + ISO50INIS8 = 52, + ISO51INISCyrillic = 53, + ISO54271981 = 54, + ISO5428Greek = 55, + ISO57GB1988 = 56, + ISO58GB231280 = 57, + ISO61Norwegian2 = 58, + ISO70VideotexSupp1 = 59, + ISO84Portuguese2 = 60, + ISO85Spanish2 = 61, + ISO86Hungarian = 62, + ISO87JISX0208 = 63, + ISO88Greek7 = 64, + ISO89ASMO449 = 65, + ISO90 = 66, + ISO91JISC62291984a = 67, + ISO92JISC62991984b = 68, + ISO93JIS62291984badd = 69, + ISO94JIS62291984hand = 70, + ISO95JIS62291984handadd = 71, + 
ISO96JISC62291984kana = 72, + ISO2033 = 73, + ISO99NAPLPS = 74, + ISO102T617bit = 75, + ISO103T618bit = 76, + ISO111ECMACyrillic = 77, + ISO121Canadian1 = 78, + ISO122Canadian2 = 79, + ISO123CSAZ24341985gr = 80, + ISO88596E = 81, + ISO88596I = 82, + ISO128T101G2 = 83, + ISO88598E = 84, + ISO88598I = 85, + ISO139CSN369103 = 86, + ISO141JUSIB1002 = 87, + ISO143IECP271 = 88, + ISO146Serbian = 89, + ISO147Macedonian = 90, + ISO150 = 91, + ISO151Cuba = 92, + ISO6937Add = 93, + ISO153GOST1976874 = 94, + ISO8859Supp = 95, + ISO10367Box = 96, + ISO158Lap = 97, + ISO159JISX02121990 = 98, + ISO646Danish = 99, + USDK = 100, + DKUS = 101, + KSC5636 = 102, + Unicode11UTF7 = 103, + ISO2022CN = 104, + ISO2022CNEXT = 105, + UTF8 = 106, + ISO885913 = 109, + ISO885914 = 110, + ISO885915 = 111, + ISO885916 = 112, + GBK = 113, + GB18030 = 114, + OSDEBCDICDF0415 = 115, + OSDEBCDICDF03IRV = 116, + OSDEBCDICDF041 = 117, + ISO115481 = 118, + KZ1048 = 119, + UCS2 = 1000, + UCS4 = 1001, + UnicodeASCII = 1002, + UnicodeLatin1 = 1003, + UnicodeJapanese = 1004, + UnicodeIBM1261 = 1005, + UnicodeIBM1268 = 1006, + UnicodeIBM1276 = 1007, + UnicodeIBM1264 = 1008, + UnicodeIBM1265 = 1009, + Unicode11 = 1010, + SCSU = 1011, + UTF7 = 1012, + UTF16BE = 1013, + UTF16LE = 1014, + UTF16 = 1015, + CESU8 = 1016, + UTF32 = 1017, + UTF32BE = 1018, + UTF32LE = 1019, + BOCU1 = 1020, + UTF7IMAP = 1021, + Windows30Latin1 = 2000, + Windows31Latin1 = 2001, + Windows31Latin2 = 2002, + Windows31Latin5 = 2003, + HPRoman8 = 2004, + AdobeStandardEncoding = 2005, + VenturaUS = 2006, + VenturaInternational = 2007, + DECMCS = 2008, + PC850Multilingual = 2009, + PC8DanishNorwegian = 2012, + PC862LatinHebrew = 2013, + PC8Turkish = 2014, + IBMSymbols = 2015, + IBMThai = 2016, + HPLegal = 2017, + HPPiFont = 2018, + HPMath8 = 2019, + HPPSMath = 2020, + HPDesktop = 2021, + VenturaMath = 2022, + MicrosoftPublishing = 2023, + Windows31J = 2024, + GB2312 = 2025, + Big5 = 2026, + Macintosh = 2027, + IBM037 = 2028, + IBM038 = 2029, 
+ IBM273 = 2030, + IBM274 = 2031, + IBM275 = 2032, + IBM277 = 2033, + IBM278 = 2034, + IBM280 = 2035, + IBM281 = 2036, + IBM284 = 2037, + IBM285 = 2038, + IBM290 = 2039, + IBM297 = 2040, + IBM420 = 2041, + IBM423 = 2042, + IBM424 = 2043, + PC8CodePage437 = 2011, + IBM500 = 2044, + IBM851 = 2045, + PCp852 = 2010, + IBM855 = 2046, + IBM857 = 2047, + IBM860 = 2048, + IBM861 = 2049, + IBM863 = 2050, + IBM864 = 2051, + IBM865 = 2052, + IBM868 = 2053, + IBM869 = 2054, + IBM870 = 2055, + IBM871 = 2056, + IBM880 = 2057, + IBM891 = 2058, + IBM903 = 2059, + IBM904 = 2060, + IBM905 = 2061, + IBM918 = 2062, + IBM1026 = 2063, + IBMEBCDICATDE = 2064, + EBCDICATDEA = 2065, + EBCDICCAFR = 2066, + EBCDICDKNO = 2067, + EBCDICDKNOA = 2068, + EBCDICFISE = 2069, + EBCDICFISEA = 2070, + EBCDICFR = 2071, + EBCDICIT = 2072, + EBCDICPT = 2073, + EBCDICES = 2074, + EBCDICESA = 2075, + EBCDICESS = 2076, + EBCDICUK = 2077, + EBCDICUS = 2078, + Unknown8BiT = 2079, + Mnemonic = 2080, + Mnem = 2081, + VISCII = 2082, + VIQR = 2083, + KOI8R = 2084, + HZGB2312 = 2085, + IBM866 = 2086, + PC775Baltic = 2087, + KOI8U = 2088, + IBM00858 = 2089, + IBM00924 = 2090, + IBM01140 = 2091, + IBM01141 = 2092, + IBM01142 = 2093, + IBM01143 = 2094, + IBM01144 = 2095, + IBM01145 = 2096, + IBM01146 = 2097, + IBM01147 = 2098, + IBM01148 = 2099, + IBM01149 = 2100, + Big5HKSCS = 2101, + IBM1047 = 2102, + PTCP154 = 2103, + Amiga1251 = 2104, + KOI7switched = 2105, + BRF = 2106, + TSCII = 2107, + CP51932 = 2108, + windows874 = 2109, + windows1250 = 2250, + windows1251 = 2251, + windows1252 = 2252, + windows1253 = 2253, + windows1254 = 2254, + windows1255 = 2255, + windows1256 = 2256, + windows1257 = 2257, + windows1258 = 2258, + TIS620 = 2259, + CP50220 = 2260 + }; + using enum id; + + constexpr text_encoding() = default; + + constexpr explicit + text_encoding(string_view __enc) noexcept + : _M_rep(_S_find_name(__enc)) + { + __enc.copy(_M_name, max_name_length); + } + + // @pre i has the value of one of the enumerators 
of id. + constexpr + text_encoding(id __i) noexcept + : _M_rep(_S_find_id(__i)) + { + if (string_view __name(_M_rep->_M_name); !__name.empty()) + __name.copy(_M_name, max_name_length); + } + + constexpr id mib() const noexcept { return id(_M_rep->_M_id); } + + constexpr const char* name() const noexcept { return _M_name; } + + struct aliases_view : ranges::view_interface + { + private: + class _Iterator; + struct _Sentinel { }; + + public: + constexpr _Iterator begin() const noexcept { return _Iterator(_M_begin); } + constexpr _Sentinel end() const noexcept { return _Sentinel{}; } + + private: + friend struct text_encoding; + + constexpr explicit aliases_view(const _Rep* __r) : _M_begin(__r) { } + + class _Iterator + { + public: + using value_type = const char*; + using reference = const char*; + using difference_type = int; + + constexpr _Iterator() = default; + constexpr value_type operator*() const; + constexpr _Iterator& operator++(); + constexpr _Iterator& operator--(); + constexpr _Iterator operator++(int); + constexpr _Iterator operator--(int); + constexpr value_type operator[](difference_type) const; + constexpr _Iterator& operator+=(difference_type); + constexpr _Iterator& operator-=(difference_type); + constexpr difference_type operator-(const _Iterator&) const; + constexpr bool operator==(const _Iterator&) const = default; + constexpr bool operator==(_Sentinel) const noexcept; + constexpr strong_ordering operator<=>(const _Iterator&) const; + + friend _Iterator + operator+(_Iterator __i, difference_type __n) + { + __i += __n; + return __i; + } + + friend _Iterator + operator+(difference_type __n, _Iterator __i) + { + __i += __n; + return __i; + } + + friend _Iterator + operator-(_Iterator __i, difference_type __n) + { + __i -= __n; + return __i; + } + + private: + friend class text_encoding; + + constexpr explicit + _Iterator(const _Rep* __r) noexcept + : _M_rep(__r), _M_id(__r ? 
__r->_M_id : 0) + { } + + constexpr bool _M_dereferenceable() const noexcept; + static constexpr difference_type _S_neg(difference_type) noexcept; + + const _Rep* _M_rep = nullptr; + _Rep::id _M_id = 0; + }; + + const _Rep* _M_begin = nullptr; + }; + + constexpr aliases_view + aliases() const noexcept + { + return _M_rep->_M_name[0] ? aliases_view(_M_rep) : aliases_view{nullptr}; + } + + friend constexpr bool + operator==(const text_encoding& __a, + const text_encoding& __b) noexcept + { + if (__a.mib() == id::other && __b.mib() == id::other) [[unlikely]] + return _S_comp(__a._M_name, __b._M_name); + else + return __a.mib() == __b.mib(); + } + + friend constexpr bool + operator==(const text_encoding& __encoding, id __i) noexcept + { return __encoding.mib() == __i; } + +#if __CHAR_BIT__ == 8 + static consteval text_encoding + literal() noexcept + { +#ifdef __GNUC_EXECUTION_CHARSET_NAME + return text_encoding(__GNUC_EXECUTION_CHARSET_NAME); +#elif defined __clang_literal_encoding__ + return text_encoding(__clang_literal_encoding__); +#else + return text_encoding(); +#endif + } + + static text_encoding + environment(); + + template + static bool + environment_is() + { return text_encoding(_Id)._M_is_environment(); } +#else + static text_encoding literal() = delete; + static text_encoding environment() = delete; + template static bool environment_is() = delete; +#endif + + private: + const _Rep* _M_rep = _S_reps + 1; // id::unknown + char _M_name[max_name_length + 1] = {0}; + + bool + _M_is_environment() const; + + static inline constexpr _Rep _S_reps[] = { + { 1, "" }, { 2, "" }, +#define _GLIBCXX_GET_ENCODING_DATA +#include +#ifdef _GLIBCXX_GET_ENCODING_DATA +# error "Invalid text_encoding data" +#endif + { 9999, nullptr }, // sentinel + }; + + static constexpr bool + _S_comp(string_view __a, string_view __b) + { return __unicode::__charset_alias_match(__a, __b); } + + static constexpr const _Rep* + _S_find_name(string_view __name) noexcept + { +#ifdef 
_GLIBCXX_TEXT_ENCODING_UTF8_OFFSET + // Optimize the common UTF-8 case to avoid a linear search through all + // strings in the table using the _S_comp function. + if (__name == "UTF-8") + return _S_reps + 2 + _GLIBCXX_TEXT_ENCODING_UTF8_OFFSET; +#endif + + // The first two array elements (other and unknown) don't have names. + // The last element is a sentinel that can never match anything. + const auto __first = _S_reps + 2, __end = std::end(_S_reps) - 1; + for (auto __r = __first; __r != __end; ++__r) + if (_S_comp(__r->_M_name, __name)) + { + // Might have matched an alias. Find the first entry for this ID. + const auto __id = __r->_M_id; + while (__r[-1]._M_id == __id) + --__r; + return __r; + } + return _S_reps; // id::other + } + + static constexpr const _Rep* + _S_find_id(id __id) noexcept + { + const auto __i = (_Rep::id)__id; + const auto __r = std::lower_bound(_S_reps, std::end(_S_reps) - 1, __i); + if (__r->_M_id == __i) [[likely]] + return __r; + else + { + // Preconditions: i has the value of one of the enumerators of id. 
+ __glibcxx_assert(__r->_M_id == __i); + return _S_reps + 1; // id::unknown + } + } + }; + + template<> + struct hash + { + size_t + operator()(const text_encoding& __enc) const noexcept + { return std::hash()(__enc.mib()); } + }; + + constexpr auto + text_encoding::aliases_view:: + _Iterator::operator*() const + -> value_type + { + if (_M_dereferenceable()) [[likely]] + return _M_rep->_M_name; + else + { + __glibcxx_assert(_M_dereferenceable()); + return ""; + } + } + + constexpr auto + text_encoding::aliases_view:: + _Iterator::operator++() + -> _Iterator& + { + if (_M_dereferenceable()) [[likely]] + ++_M_rep; + else + { + __glibcxx_assert(_M_dereferenceable()); + *this = _Iterator{}; + } + return *this; + } + + constexpr auto + text_encoding::aliases_view:: + _Iterator::operator--() + -> _Iterator& + { + const bool __decrementable = _M_rep != nullptr && _M_rep[-1]._M_id == _M_id; + if (__decrementable) [[likely]] + --_M_rep; + else + { + __glibcxx_assert(__decrementable); + *this = _Iterator{}; + } + return *this; + } + + constexpr auto + text_encoding::aliases_view:: + _Iterator::operator++(int) + -> _Iterator + { + auto __it = *this; + ++*this; + return __it; + } + + constexpr auto + text_encoding::aliases_view:: + _Iterator::operator--(int) + -> _Iterator + { + auto __it = *this; + --*this; + return __it; + } + + constexpr auto + text_encoding::aliases_view:: + _Iterator::operator[](difference_type __n) const + -> value_type + { return *(*this + __n); } + + constexpr auto + text_encoding::aliases_view:: + _Iterator::operator+=(difference_type __n) + -> _Iterator& + { + if (_M_rep != nullptr) + { + if ((__n > 0 && __n < (std::end(_S_reps) - _M_rep)) + || (__n < 0 && __n > (_S_reps - _M_rep))) + { + if (_M_rep[__n]._M_id == _M_id) + _M_rep += __n; + else + *this = _Iterator{}; + } + else if (__n != 0) + *this = _Iterator{}; + } + __glibcxx_assert(_M_rep != nullptr); + return *this; + } + + constexpr auto + text_encoding::aliases_view:: + 
_Iterator::operator-=(difference_type __n) + -> _Iterator& + { return operator+=(_S_neg(__n)); } + + constexpr auto + text_encoding::aliases_view:: + _Iterator::operator-(const _Iterator& __i) const noexcept + -> difference_type + { + if (_M_id == __i._M_id) + return _M_rep - __i._M_rep; + __glibcxx_assert(_M_id == __i._M_id); + return __gnu_cxx::__int_traits::__max; + } + + constexpr bool + text_encoding::aliases_view:: + _Iterator::operator==(_Sentinel) const noexcept + { return !_M_dereferenceable(); } + + constexpr strong_ordering + text_encoding::aliases_view:: + _Iterator::operator<=>(const _Iterator& __i) const + { + __glibcxx_assert(_M_id == __i._M_id); + return _M_rep <=> __i._M_rep; + } + + constexpr bool + text_encoding::aliases_view:: + _Iterator::_M_dereferenceable() const noexcept + { return _M_rep != nullptr && _M_rep->_M_id == _M_id; } + + constexpr auto + text_encoding::aliases_view:: + _Iterator::_S_neg(difference_type __n) noexcept + -> difference_type + { + using _Traits = __gnu_cxx::__int_traits; + if (__n == _Traits::__min) [[unlikely]] + return _Traits::__max; + return -__n; + } + +namespace ranges +{ + // Opt-in to borrowed_range concept + template<> + inline constexpr bool + enable_borrowed_range = true; +} + +_GLIBCXX_END_NAMESPACE_VERSION +} // namespace std + +#endif // __cpp_lib_text_encoding +#endif // _GLIBCXX_TEXT_ENCODING diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py b/libstdc++-v3/python/libstdcxx/v6/printers.py index 032a7aa58a2..a6c2ed4599f 100644 --- a/libstdc++-v3/python/libstdcxx/v6/printers.py +++ b/libstdc++-v3/python/libstdcxx/v6/printers.py @@ -2324,6 +2324,21 @@ class StdIntegralConstantPrinter(printer_base): typename = strip_versioned_namespace(self._typename) return "{}<{}, {}>".format(typename, value_type, value) +class StdTextEncodingPrinter(printer_base): + """Print a std::text_encoding.""" + + def __init__(self, typename, val): + self._val = val + self._typename = typename + + def to_string(self): + rep 
= self._val['_M_rep'].dereference() + if rep['_M_id'] == 1: + return self._val['_M_name'] + if rep['_M_id'] == 2: + return 'unknown' + return rep['_M_name'] + # A "regular expression" printer which conforms to the # "SubPrettyPrinter" protocol from gdb.printing. class RxPrinter(object): @@ -2807,6 +2822,8 @@ def build_libstdcxx_dictionary(): libstdcxx_printer.add_version('std::', 'integral_constant', StdIntegralConstantPrinter) + libstdcxx_printer.add_version('std::', 'text_encoding', + StdTextEncodingPrinter) if hasattr(gdb.Value, 'dynamic_type'): libstdcxx_printer.add_version('std::', 'error_code', diff --git a/libstdc++-v3/scripts/gen_text_encoding_data.py b/libstdc++-v3/scripts/gen_text_encoding_data.py new file mode 100755 index 00000000000..2d6f3e4077a --- /dev/null +++ b/libstdc++-v3/scripts/gen_text_encoding_data.py @@ -0,0 +1,70 @@ +#!/usr/bin/env python3 +# +# Script to generate tables for libstdc++ std::text_encoding. +# +# This file is part of GCC. +# +# GCC is free software; you can redistribute it and/or modify it under +# the terms of the GNU General Public License as published by the Free +# Software Foundation; either version 3, or (at your option) any later +# version. +# +# GCC is distributed in the hope that it will be useful, but WITHOUT ANY +# WARRANTY; without even the implied warranty of MERCHANTABILITY or +# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +# for more details. +# +# You should have received a copy of the GNU General Public License +# along with GCC; see the file COPYING3. If not see +# . 
+ +# To update the libstdc++ static data, download +# the latest: +# https://www.iana.org/assignments/character-sets/character-sets-1.csv +# Then run this script and save the output to +# include/bits/text_encoding-data.h + +import sys +import csv + +if len(sys.argv) != 2: + print("Usage: %s " % sys.argv[0], file=sys.stderr) + sys.exit(1) + +print("// Generated by gen_text_encoding_data.py, do not edit.\n") +print("#ifndef _GLIBCXX_GET_ENCODING_DATA") +print('# error "This is not a public header, do not include it directly"') +print("#endif\n") + + +charsets = {} +with open(sys.argv[1], newline='') as f: + reader = csv.reader(f) + next(reader) # skip header row + for row in reader: + mib = int(row[2]) + if mib in charsets: + raise ValueError("Multiple rows for mibEnum={}".format(mib)) + name = row[1] + aliases = row[5].partition("(")[0].split() # drop any trailing "(...)" note, e.g. on Amiga-1251 + # Ensure primary name comes first + if name in aliases: + aliases.remove(name) + charsets[mib] = [name] + aliases + +# Remove "NATS-DANO" and "NATS-DANO-ADD" +charsets.pop(33, None) +charsets.pop(34, None) + +count = 0 +for mib in sorted(charsets.keys()): + names = charsets[mib] + if names[0] == "UTF-8": + print("#define _GLIBCXX_TEXT_ENCODING_UTF8_OFFSET {}".format(count)) + for name in names: + print(' {{ {:4}, "{}" }},'.format(mib, name)) + count += len(names) + +# The header that includes this data gives an error if this macro is left defined. +# Do this last, so that the generated output is not usable unless we reach here. 
+print("\n#undef _GLIBCXX_GET_ENCODING_DATA") diff --git a/libstdc++-v3/src/c++26/Makefile.am b/libstdc++-v3/src/c++26/Makefile.am index 8e9ec78fe0d..201999a3577 100644 --- a/libstdc++-v3/src/c++26/Makefile.am +++ b/libstdc++-v3/src/c++26/Makefile.am @@ -35,7 +35,7 @@ else inst_sources = endif -sources = debugging.cc +sources = debugging.cc text_encoding.cc vpath % $(top_srcdir)/src/c++26 diff --git a/libstdc++-v3/src/c++26/text_encoding.cc b/libstdc++-v3/src/c++26/text_encoding.cc new file mode 100644 index 00000000000..9a7df07db29 --- /dev/null +++ b/libstdc++-v3/src/c++26/text_encoding.cc @@ -0,0 +1,91 @@ +// Definitions for -*- C++ -*- + +// Copyright The GNU Toolchain Authors. +// +// This file is part of the GNU ISO C++ Library. This library is free +// software; you can redistribute it and/or modify it under the +// terms of the GNU General Public License as published by the +// Free Software Foundation; either version 3, or (at your option) +// any later version. + +// This library is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. + +// Under Section 7 of GPL version 3, you are granted additional +// permissions described in the GCC Runtime Library Exception, version +// 3.1, as published by the Free Software Foundation. + +// You should have received a copy of the GNU General Public License and +// a copy of the GCC Runtime Library Exception along with this program; +// see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +// . 
+ +#include +#include + +#ifdef _GLIBCXX_USE_NL_LANGINFO_L +#include +#include + +#if __CHAR_BIT__ == 8 +namespace std +{ +_GLIBCXX_BEGIN_NAMESPACE_VERSION + +text_encoding +__locale_encoding(const char* name) +{ + text_encoding enc; + if (locale_t loc = ::newlocale(LC_ALL_MASK, name, (locale_t)0)) + { + if (const char* codeset = ::nl_langinfo_l(CODESET, loc)) + { + string_view s(codeset); + if (s.size() < text_encoding::max_name_length) + enc = text_encoding(s); + } + ::freelocale(loc); + } + return enc; +} + +_GLIBCXX_END_NAMESPACE_VERSION +} // namespace std + +std::text_encoding +std::text_encoding::environment() +{ + return std::__locale_encoding(""); +} + +bool +std::text_encoding::_M_is_environment() const +{ + bool matched = false; + if (locale_t loc = ::newlocale(LC_ALL_MASK, "", (locale_t)0)) + { + if (const char* codeset = ::nl_langinfo_l(CODESET, loc)) + { + string_view sv(codeset); + for (auto alias : aliases()) + if (__unicode::__charset_alias_match(alias, sv)) + { + matched = true; + break; + } + } + ::freelocale(loc); + } + return matched; +} + +std::text_encoding +std::locale::encoding() const +{ + return std::__locale_encoding(name().c_str()); +} +#endif // CHAR_BIT == 8 + +#endif // _GLIBCXX_USE_NL_LANGINFO_L diff --git a/libstdc++-v3/testsuite/ext/unicode/charset_alias_match.cc b/libstdc++-v3/testsuite/ext/unicode/charset_alias_match.cc new file mode 100644 index 00000000000..f6272ae998b --- /dev/null +++ b/libstdc++-v3/testsuite/ext/unicode/charset_alias_match.cc @@ -0,0 +1,18 @@ +// { dg-do compile { target c++20 } } +#include + +using std::__unicode::__charset_alias_match; +static_assert( __charset_alias_match("UTF-8", "utf8") == true ); +static_assert( __charset_alias_match("UTF-8", "u.t.f-008") == true ); +static_assert( __charset_alias_match("UTF-8", "utf-80") == false ); +static_assert( __charset_alias_match("UTF-8", "ut8") == false ); + +static_assert( __charset_alias_match("iso8859_1", "ISO-8859-1") == true ); + +static_assert( 
__charset_alias_match("", "") == true ); +static_assert( __charset_alias_match("", ".") == true ); +static_assert( __charset_alias_match("--", "...") == true ); +static_assert( __charset_alias_match("--a", "a...") == true ); +static_assert( __charset_alias_match("--a010", "a..10.") == true ); +static_assert( __charset_alias_match("--a010", "a..1.0") == false ); +static_assert( __charset_alias_match("aaaa", "000.00.0a0a)0aa...") == true ); diff --git a/libstdc++-v3/testsuite/std/text_encoding/cons.cc b/libstdc++-v3/testsuite/std/text_encoding/cons.cc new file mode 100644 index 00000000000..2d84d3bb394 --- /dev/null +++ b/libstdc++-v3/testsuite/std/text_encoding/cons.cc @@ -0,0 +1,106 @@ +// { dg-do run { target c++26 } } + +#include +#include +#include + +using namespace std::string_view_literals; + +constexpr void +test_default_construct() +{ + std::text_encoding e0; + VERIFY( e0.mib() == std::text_encoding::unknown ); + VERIFY( e0.name()[0] == '\0' ); // P2862R1 name() should never return null + VERIFY( e0.aliases().empty() ); +} + +constexpr void +test_construct_by_name() +{ + std::string_view s; + std::text_encoding e0(s); + VERIFY( e0.mib() == std::text_encoding::other ); + VERIFY( e0.name() == s ); + VERIFY( e0.aliases().empty() ); + + s = "not a real encoding"; + std::text_encoding e1(s); + VERIFY( e1.mib() == std::text_encoding::other ); + VERIFY( e1.name() == s ); + VERIFY( e1.aliases().empty() ); + + VERIFY( e1 != e0 ); + VERIFY( e1 == e0.mib() ); + + s = "utf8"; + std::text_encoding e2(s); + VERIFY( e2.mib() == std::text_encoding::UTF8 ); + VERIFY( e2.name() == s ); + VERIFY( ! e2.aliases().empty() ); + VERIFY( e2.aliases().front() == "UTF-8"sv ); + + s = "latin1"; + std::text_encoding e3(s); + VERIFY( e3.mib() == std::text_encoding::ISOLatin1 ); + VERIFY( e3.name() == s ); + VERIFY( ! 
e3.aliases().empty() ); + VERIFY( e3.aliases().front() == "ISO_8859-1:1987"sv ); // primary name +} + +constexpr void +test_construct_by_id() +{ + std::text_encoding e0(std::text_encoding::other); + VERIFY( e0.mib() == std::text_encoding::other ); + VERIFY( e0.name() == ""sv ); + VERIFY( e0.aliases().empty() ); + + std::text_encoding e1(std::text_encoding::unknown); + VERIFY( e1.mib() == std::text_encoding::unknown ); + VERIFY( e1.name() == ""sv ); + VERIFY( e1.aliases().empty() ); + + std::text_encoding e2(std::text_encoding::UTF8); + VERIFY( e2.mib() == std::text_encoding::UTF8 ); + VERIFY( e2.name() == "UTF-8"sv ); + VERIFY( ! e2.aliases().empty() ); + VERIFY( e2.aliases().front() == std::string_view(e2.name()) ); + bool found = false; + for (auto alias : e2.aliases()) + if (alias == "csUTF8"sv) + { + found = true; + break; + } + VERIFY( found ); +} + +constexpr void +test_copy_construct() +{ + std::text_encoding e0; + std::text_encoding e1 = e0; + VERIFY( e1 == e0 ); + + std::text_encoding e2(std::text_encoding::UTF8); + auto e3 = e2; + VERIFY( e3 == e2 ); + + e1 = e3; + VERIFY( e1 == e2 ); +} + +int main() +{ + auto run_tests = [] { + test_default_construct(); + test_construct_by_name(); + test_construct_by_id(); + test_copy_construct(); + return true; + }; + + run_tests(); + static_assert( run_tests() ); +} diff --git a/libstdc++-v3/testsuite/std/text_encoding/requirements.cc b/libstdc++-v3/testsuite/std/text_encoding/requirements.cc new file mode 100644 index 00000000000..d62d93dcda4 --- /dev/null +++ b/libstdc++-v3/testsuite/std/text_encoding/requirements.cc @@ -0,0 +1,31 @@ +// { dg-do compile { target c++26 } } +// { dg-add-options no_pch } + +#include +#ifndef __cpp_lib_text_encoding +# error "Feature-test macro for text_encoding missing in " +#elif __cpp_lib_text_encoding != 202306L +# error "Feature-test macro for text_encoding has wrong value in " +#endif + +#undef __cpp_lib_text_encoding +#include +#ifndef __cpp_lib_text_encoding +# error "Feature-test 
macro for text_encoding missing in " +#elif __cpp_lib_text_encoding != 202306L +# error "Feature-test macro for text_encoding has wrong value in " +#endif + +#include +#include +static_assert( std::is_trivially_copyable_v ); + +using aliases_view = std::text_encoding::aliases_view; +static_assert( std::copyable ); +static_assert( std::ranges::view ); +static_assert( std::ranges::random_access_range ); +static_assert( std::ranges::borrowed_range ); +static_assert( std::same_as, + const char*> ); +static_assert( std::same_as, + const char*> ); -- 2.43.0
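
For anyone who wants to experiment with the alias matching rule outside libstdc++, here is a minimal standalone sketch of the comparison that __charset_alias_match performs in the patch above (the UTS #22 rule: ignore case, ignore non-alphanumeric characters, and ignore zeros that do not follow a digit). The names normalize and alias_match are illustrative only and are not part of the patch. The patch uses a switch over character literals; this sketch instead assumes an ASCII-compatible execution character set so that it can use range comparisons. It needs C++17 or later.

```cpp
#include <cstddef>
#include <string_view>

// Illustrative helper, not part of the patch.
// Returns the normalized character, or '\0' if the character is ignored.
// in_number is true while we are inside a digit run that has already
// produced a kept digit; a letter or an ignored character resets it.
constexpr char
normalize(char c, bool& in_number)
{
  if (c == '0')
    return in_number ? c : '\0';  // a zero is kept only right after a digit
  if (c >= '1' && c <= '9')
    {
      in_number = true;
      return c;
    }
  in_number = false;
  if (c >= 'A' && c <= 'Z')       // assumption: ASCII-compatible charset
    return static_cast<char>(c - 'A' + 'a');
  if (c >= 'a' && c <= 'z')
    return c;
  return '\0';                    // punctuation, spaces, etc. are skipped
}

// Illustrative equivalent of the comparison loop in the patch.
constexpr bool
alias_match(std::string_view a, std::string_view b)
{
  std::size_t i = 0, j = 0;
  bool num_a = false, num_b = false;
  while (true)
    {
      char ca = '\0', cb = '\0';
      // Advance each cursor to its next significant character.
      // Each input character is passed to normalize exactly once.
      while (i < a.size() && (ca = normalize(a[i], num_a)) == '\0')
        ++i;
      while (j < b.size() && (cb = normalize(b[j], num_b)) == '\0')
        ++j;
      if (i == a.size() || j == b.size())
        return i == a.size() && j == b.size();  // both must be exhausted
      if (ca != cb)
        return false;
      ++i;
      ++j;
    }
}

// Compile-time checks mirroring some of the cases in charset_alias_match.cc.
static_assert( alias_match("UTF-8", "utf8") );
static_assert( alias_match("UTF-8", "u.t.f-008") );
static_assert( ! alias_match("UTF-8", "utf-80") );
static_assert( alias_match("iso8859_1", "ISO-8859-1") );
```

Since a punctuation character resets the digit state, "a..1.0" normalizes to "a1" while "a..10." normalizes to "a10", which is why the testsuite expects only the latter to match "--a010".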