public inbox for gcc-bugs@sourceware.org
* [Bug libstdc++/113318] New: [C++23] Implement P1185R12, Naming text encodings to demystify them
From: redi at gcc dot gnu.org @ 2024-01-10 23:14 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113318

            Bug ID: 113318
           Summary: [C++23] Implement P1185R12, Naming text encodings to
                    demystify them
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: redi at gcc dot gnu.org
            Blocks: 106749
  Target Milestone: ---

https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p1885r12.pdf

Fun with nl_langinfo_l(CODESET, newlocale(LC_ALL_MASK, "", (locale_t)0))


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106749
[Bug 106749] Implement C++23 library features

* [Bug libstdc++/113318] [C++23] Implement P1185R12, Naming text encodings to demystify them
From: redi at gcc dot gnu.org @ 2024-01-12 23:03 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113318

Jonathan Wakely <redi at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |redi at gcc dot gnu.org
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2024-01-12

--- Comment #1 from Jonathan Wakely <redi at gcc dot gnu.org> ---
Patch posted:
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642887.html

* [Bug libstdc++/113318] [C++23] Implement P1185R12, Naming text encodings to demystify them
From: redi at gcc dot gnu.org @ 2024-01-12 23:03 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113318

--- Comment #2 from Jonathan Wakely <redi at gcc dot gnu.org> ---

After sending that I realised that text_encoding::environment_is<id> can be
optimized like so:

    template<id _Id>
      static bool
      environment_is()
      { return text_encoding(_Id)._M_is_environment(); }

where _M_is_environment() is a non-inline member function defined in the compiled library:

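bool
std::text_encoding::_M_is_environment() const
{
  bool matched = false;
  // Create a locale object from the environment (LC_ALL, LC_*, LANG).
  if (locale_t loc = ::newlocale(LC_ALL_MASK, "", (locale_t)0))
    {
      // Query that locale's codeset and compare it only against the
      // aliases of this encoding's ID.
      if (const char* codeset = ::nl_langinfo_l(CODESET, loc))
        matched = ranges::contains(aliases(), string_view(codeset));
      ::freelocale(loc);
    }
  return matched;
}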

That way we do a fast binary search for the ID in the static array, and only do
the slower string comparisons with the actual aliases of the specified id,
instead of searching the entire array doing string comparisons.

* [Bug libstdc++/113318] [C++23] Implement P1185R12, Naming text encodings to demystify them
From: redi at gcc dot gnu.org @ 2024-01-12 23:50 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113318

--- Comment #3 from Jonathan Wakely <redi at gcc dot gnu.org> ---
Oh, I need to filter out NATS-DANO and NATS-DANO-ADD from the generated file.

* [Bug libstdc++/113318] [C++23] Implement P1185R12, Naming text encodings to demystify them
From: redi at gcc dot gnu.org @ 2024-01-13  0:38 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113318

Jonathan Wakely <redi at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |14.0

* [Bug libstdc++/113318] [C++23] Implement P1185R12, Naming text encodings to demystify them
From: redi at gcc dot gnu.org @ 2024-01-13  0:55 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113318

--- Comment #4 from Jonathan Wakely <redi at gcc dot gnu.org> ---
The static array can be compiled for 16-bit targets like msp430-elf, although
it's probably a bad idea to use it if you are memory-constrained.

* [Bug libstdc++/113318] [C++23] Implement P1185R12, Naming text encodings to demystify them
From: redi at gcc dot gnu.org @ 2024-01-13 12:49 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113318

--- Comment #5 from Jonathan Wakely <redi at gcc dot gnu.org> ---
V2 https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642908.html

* [Bug libstdc++/113318] [C++23] Implement P1185R12, Naming text encodings to demystify them
From: cvs-commit at gcc dot gnu.org @ 2024-01-17 12:11 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113318

--- Comment #6 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:df0a668b784556fe4317317d58961652d93d53de

commit r14-8182-gdf0a668b784556fe4317317d58961652d93d53de
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Mon Jan 15 15:42:50 2024 +0000

    libstdc++: Implement C++26 std::text_encoding (P1885R12) [PR113318]

    This is another C++26 change, approved in Varna 2023. We require a new
    static array of data that is extracted from the IANA Character Sets
    database. A new Python script to generate a header from the IANA CSV
    file is added.

    The text_encoding class is basically just a pointer to an {ID,name} pair
    in the static array. The aliases view is also just the same pointer (or
    empty), and the view's iterator moves forwards and backwards in the
    array while the array elements have the same ID (or to one element
    further, for a past-the-end iterator).

    Because those iterators refer to a global array that never goes out of
    scope, there's no reason they should ever produce undefined behaviour
    or indeterminate values.  They should either have well-defined
    behaviour, or abort. The overhead of ensuring those properties is pretty
    low, so it seems worth it.

    This means that an aliases_view iterator should never be able to access
    out-of-bounds. A non-value-initialized iterator always points to an
    element of the static array even when not dereferenceable (the array has
    unreachable entries at the start and end, which means that even a
    past-the-end iterator for the last encoding in the array still points to
    valid memory).  Dereferencing an iterator can always return a valid
    array element, or "" for a non-dereferenceable iterator (but doing so
    will abort when assertions are enabled).  In the language being proposed
    for C++26, dereferencing an invalid iterator erroneously returns "".
    Attempting to increment/decrement past the last/first element in the
    view is erroneously a no-op, so aborts when assertions are enabled, and
    doesn't change value otherwise.

    Similarly, constructing a std::text_encoding with an invalid id (one
    that doesn't have the value of an enumerator) erroneously behaves the
    same as constructing with id::unknown, or aborts with assertions
    enabled.

    libstdc++-v3/ChangeLog:

            PR libstdc++/113318
            * acinclude.m4 (GLIBCXX_CONFIGURE): Add c++26 directory.
            (GLIBCXX_CHECK_TEXT_ENCODING): Define.
            * config.h.in: Regenerate.
            * configure: Regenerate.
            * configure.ac: Use GLIBCXX_CHECK_TEXT_ENCODING.
            * include/Makefile.am: Add new headers.
            * include/Makefile.in: Regenerate.
            * include/bits/locale_classes.h (locale::encoding): Declare new
            member function.
            * include/bits/unicode.h (__charset_alias_match): New function.
            * include/bits/text_encoding-data.h: New file.
            * include/bits/version.def (text_encoding): Define.
            * include/bits/version.h: Regenerate.
            * include/std/text_encoding: New file.
            * src/Makefile.am: Add new subdirectory.
            * src/Makefile.in: Regenerate.
            * src/c++26/Makefile.am: New file.
            * src/c++26/Makefile.in: New file.
            * src/c++26/text_encoding.cc: New file.
            * src/experimental/Makefile.am: Include c++26 convenience
            library.
            * src/experimental/Makefile.in: Regenerate.
            * python/libstdcxx/v6/printers.py (StdTextEncodingPrinter): New
            printer.
            * scripts/gen_text_encoding_data.py: New file.
            * testsuite/22_locale/locale/encoding.cc: New test.
            * testsuite/ext/unicode/charset_alias_match.cc: New test.
            * testsuite/std/text_encoding/cons.cc: New test.
            * testsuite/std/text_encoding/members.cc: New test.
            * testsuite/std/text_encoding/requirements.cc: New test.

    Reviewed-by: Ulrich Drepper <drepper.fsp@gmail.com>
    Reviewed-by: Patrick Palka <ppalka@redhat.com>
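
The bounds-safe aliases iterator described in the commit message can be illustrated with a small sketch, assuming a table laid out as described there: entries sorted by ID, with one unreachable sentinel entry at each end. entry, table, alias_iterator, aliases_begin and aliases_end are hypothetical names and this is not the code in <text_encoding>. Checked operations use assert, so they abort in a normal build and fall back to a defined "erroneous" result (an empty name, or a no-op) when compiled with -DNDEBUG.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical {ID, name} table (IDs are IANA MIBenum values). The first and
// last entries are sentinels owned by no encoding, so stepping one element
// outside any run of aliases still lands on valid memory.
struct entry { int id; std::string_view name; };

constexpr entry table[] = {
  {-1, ""},                                  // leading sentinel
  {3, "US-ASCII"}, {3, "ANSI_X3.4-1968"},
  {106, "UTF-8"}, {106, "csUTF8"},
  {-1, ""},                                  // trailing sentinel
};

// Iterates over the aliases of one ID; it only ever points at table elements.
class alias_iterator {
public:
  alias_iterator(const entry* p, int id) : p_(p), id_(id) {}

  // A non-dereferenceable iterator yields "" (aborts if asserts are enabled).
  std::string_view operator*() const {
    assert(p_->id == id_ && "dereferencing past-the-end alias iterator");
    return p_->id == id_ ? p_->name : std::string_view{};
  }

  // Incrementing past the last alias is a checked no-op.
  alias_iterator& operator++() {
    assert(p_->id == id_ && "incrementing past-the-end alias iterator");
    if (p_->id == id_) ++p_;
    return *this;
  }

  // Decrementing before the first alias is a checked no-op.
  alias_iterator& operator--() {
    assert(p_[-1].id == id_ && "decrementing begin alias iterator");
    if (p_[-1].id == id_) --p_;
    return *this;
  }

  bool operator==(const alias_iterator& o) const { return p_ == o.p_; }
  bool operator!=(const alias_iterator& o) const { return p_ != o.p_; }

private:
  const entry* p_;  // always points into `table`
  int id_;          // the ID whose aliases this iterator walks
};

// `first` must be the index of the first entry of a run (not a sentinel).
inline alias_iterator aliases_begin(std::size_t first) {
  return {&table[first], table[first].id};
}

inline alias_iterator aliases_end(std::size_t first) {
  const entry* p = &table[first];
  while (p->id == table[first].id) ++p;  // stops at the next run or a sentinel
  return {p, table[first].id};
}
```

Built with -DNDEBUG, dereferencing the end iterator of the US-ASCII run yields an empty string_view instead of the first UTF-8 alias it physically points at, and incrementing it leaves it unchanged.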

* [Bug libstdc++/113318] [C++23] Implement P1185R12, Naming text encodings to demystify them
From: redi at gcc dot gnu.org @ 2024-01-17 16:51 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113318

Jonathan Wakely <redi at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED

--- Comment #7 from Jonathan Wakely <redi at gcc dot gnu.org> ---
Implemented for GCC 14

* [Bug libstdc++/113318] [C++26] Implement P1885R12, Naming text encodings to demystify them
From: redi at gcc dot gnu.org @ 2024-02-02 13:51 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113318

--- Comment #8 from Jonathan Wakely <redi at gcc dot gnu.org> ---
I have a patch to add partial std::text_encoding support on Windows, using
GetACP() and _MSVC_EXECUTION_CHARACTER_SET to query the system codepage and the
execution charset codepage, and then mapping from Windows codepage identifiers
to IANA mib values.
