public inbox for gcc-bugs@sourceware.org
* [Bug libstdc++/113318] New: [C++23] Implement P1885R12, Naming text encodings to demystify them
From: redi at gcc dot gnu.org @ 2024-01-10 23:14 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113318
Bug ID: 113318
Summary: [C++23] Implement P1885R12, Naming text encodings to
demystify them
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libstdc++
Assignee: unassigned at gcc dot gnu.org
Reporter: redi at gcc dot gnu.org
Blocks: 106749
Target Milestone: ---
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p1885r12.pdf
Fun with nl_langinfo_l(CODESET, newlocale(LC_ALL_MASK, "", (locale_t)0))
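For reference, that environment query can be spelled out as a small POSIX function. This is a minimal sketch, not the library's code: codeset_of is a hypothetical name, CODESET is the nl_item constant from <langinfo.h>, and newlocale takes a category mask, a locale name ("" selects whatever LANG/LC_ALL/LC_* name) and a base locale.

```cpp
#include <cassert>
#include <langinfo.h>  // nl_langinfo_l, CODESET
#include <locale.h>    // newlocale, freelocale, LC_ALL_MASK, locale_t
#include <string>

// Hypothetical helper: returns the CODESET of the named locale, where ""
// means the locale selected by the environment, or "" if the locale
// cannot be created (e.g. LANG names a locale that is not installed).
std::string codeset_of(const char* locale_name = "")
{
  std::string result;
  if (locale_t loc = ::newlocale(LC_ALL_MASK, locale_name, (locale_t)0))
    {
      if (const char* codeset = ::nl_langinfo_l(CODESET, loc))
        result = codeset;  // copy before the locale that owns the string is freed
      ::freelocale(loc);
    }
  return result;
}
```

With glibc, codeset_of("C") gives "ANSI_X3.4-1968", which is one of the IANA aliases of US-ASCII rather than its primary name; that is why the result has to be matched against the alias list and not just the primary name.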
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106749
[Bug 106749] Implement C++23 library features
* [Bug libstdc++/113318] [C++23] Implement P1885R12, Naming text encodings to demystify them
From: redi at gcc dot gnu.org @ 2024-01-12 23:03 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113318
Jonathan Wakely <redi at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Assignee|unassigned at gcc dot gnu.org |redi at gcc dot gnu.org
Ever confirmed|0 |1
Status|UNCONFIRMED |ASSIGNED
Last reconfirmed| |2024-01-12
--- Comment #1 from Jonathan Wakely <redi at gcc dot gnu.org> ---
Patch posted:
https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642887.html
* [Bug libstdc++/113318] [C++23] Implement P1885R12, Naming text encodings to demystify them
From: redi at gcc dot gnu.org @ 2024-01-12 23:03 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113318
--- Comment #2 from Jonathan Wakely <redi at gcc dot gnu.org> ---
After sending that I realised that text_encoding::environment_is<id> can be
optimized like so:
  template<id _Id>
    static bool
    environment_is()
    { return text_encoding(_Id)._M_is_environment(); }
Where that calls an extern function in the library:
  bool
  std::text_encoding::_M_is_environment() const
  {
    bool matched = false;
    // Query the codeset of the locale named by the environment (LANG, LC_ALL, ...).
    if (locale_t loc = ::newlocale(LC_ALL_MASK, "", (locale_t)0))
      {
        if (const char* codeset = ::nl_langinfo_l(CODESET, loc))
          // Compare only against the aliases of this encoding's id.
          matched = ranges::contains(aliases(), string_view(codeset));
        ::freelocale(loc);
      }
    return matched;
  }
That way we do a fast binary search for the ID in the static array, and only do
the slower string comparisons with the actual aliases of the specified id,
instead of searching the entire array doing string comparisons.
* [Bug libstdc++/113318] [C++23] Implement P1885R12, Naming text encodings to demystify them
From: redi at gcc dot gnu.org @ 2024-01-12 23:50 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113318
--- Comment #3 from Jonathan Wakely <redi at gcc dot gnu.org> ---
Oh, I need to filter out NATS-DANO and NATS-DANO-ADD from the generated file.
* [Bug libstdc++/113318] [C++23] Implement P1885R12, Naming text encodings to demystify them
From: redi at gcc dot gnu.org @ 2024-01-13 0:38 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113318
Jonathan Wakely <redi at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Target Milestone|--- |14.0
* [Bug libstdc++/113318] [C++23] Implement P1885R12, Naming text encodings to demystify them
From: redi at gcc dot gnu.org @ 2024-01-13 0:55 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113318
--- Comment #4 from Jonathan Wakely <redi at gcc dot gnu.org> ---
The static array can be compiled for 16-bit targets like msp430-elf, although
it's probably a bad idea to use it if you are memory-constrained.
* [Bug libstdc++/113318] [C++23] Implement P1885R12, Naming text encodings to demystify them
From: redi at gcc dot gnu.org @ 2024-01-13 12:49 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113318
--- Comment #5 from Jonathan Wakely <redi at gcc dot gnu.org> ---
V2 https://gcc.gnu.org/pipermail/gcc-patches/2024-January/642908.html
* [Bug libstdc++/113318] [C++23] Implement P1885R12, Naming text encodings to demystify them
From: cvs-commit at gcc dot gnu.org @ 2024-01-17 12:11 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113318
--- Comment #6 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:
https://gcc.gnu.org/g:df0a668b784556fe4317317d58961652d93d53de
commit r14-8182-gdf0a668b784556fe4317317d58961652d93d53de
Author: Jonathan Wakely <jwakely@redhat.com>
Date: Mon Jan 15 15:42:50 2024 +0000
libstdc++: Implement C++26 std::text_encoding (P1885R12) [PR113318]
This is another C++26 change, approved in Varna 2023. We require a new
static array of data that is extracted from the IANA Character Sets
database. A new Python script to generate a header from the IANA CSV
file is added.
The text_encoding class is basically just a pointer to an {ID,name} pair
in the static array. The aliases view is also just the same pointer (or
empty), and the view's iterator moves forwards and backwards in the
array while the array elements have the same ID (or to one element
further, for a past-the-end iterator).
Because those iterators refer to a global array that never goes out of
scope, there's no reason they should ever produce undefined behaviour
or indeterminate values. They should either have well-defined
behaviour, or abort. The overhead of ensuring those properties is pretty
low, so it seems worth it.
This means that an aliases_view iterator should never be able to access
out-of-bounds. A non-value-initialized iterator always points to an
element of the static array even when not dereferenceable (the array has
unreachable entries at the start and end, which means that even a
past-the-end iterator for the last encoding in the array still points to
valid memory). Dereferencing an iterator can always return a valid
array element, or "" for a non-dereferenceable iterator (but doing so
will abort when assertions are enabled). In the language being proposed
for C++26, dereferencing an invalid iterator erroneously returns "".
Attempting to increment/decrement past the last/first element in the
view is erroneously a no-op, so aborts when assertions are enabled, and
doesn't change value otherwise.
Similarly, constructing a std::text_encoding with an invalid id (one
that doesn't have the value of an enumerator) erroneously behaves the
same as constructing with id::unknown, or aborts with assertions
enabled.
libstdc++-v3/ChangeLog:
PR libstdc++/113318
* acinclude.m4 (GLIBCXX_CONFIGURE): Add c++26 directory.
(GLIBCXX_CHECK_TEXT_ENCODING): Define.
* config.h.in: Regenerate.
* configure: Regenerate.
* configure.ac: Use GLIBCXX_CHECK_TEXT_ENCODING.
* include/Makefile.am: Add new headers.
* include/Makefile.in: Regenerate.
* include/bits/locale_classes.h (locale::encoding): Declare new
member function.
* include/bits/unicode.h (__charset_alias_match): New function.
* include/bits/text_encoding-data.h: New file.
* include/bits/version.def (text_encoding): Define.
* include/bits/version.h: Regenerate.
* include/std/text_encoding: New file.
* src/Makefile.am: Add new subdirectory.
* src/Makefile.in: Regenerate.
* src/c++26/Makefile.am: New file.
* src/c++26/Makefile.in: New file.
* src/c++26/text_encoding.cc: New file.
* src/experimental/Makefile.am: Include c++26 convenience
library.
* src/experimental/Makefile.in: Regenerate.
* python/libstdcxx/v6/printers.py (StdTextEncodingPrinter): New
printer.
* scripts/gen_text_encoding_data.py: New file.
* testsuite/22_locale/locale/encoding.cc: New test.
* testsuite/ext/unicode/charset_alias_match.cc: New test.
* testsuite/std/text_encoding/cons.cc: New test.
* testsuite/std/text_encoding/members.cc: New test.
* testsuite/std/text_encoding/requirements.cc: New test.
Reviewed-by: Ulrich Drepper <drepper.fsp@gmail.com>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
* [Bug libstdc++/113318] [C++23] Implement P1885R12, Naming text encodings to demystify them
From: redi at gcc dot gnu.org @ 2024-01-17 16:51 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113318
Jonathan Wakely <redi at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |FIXED
Status|ASSIGNED |RESOLVED
--- Comment #7 from Jonathan Wakely <redi at gcc dot gnu.org> ---
Implemented for GCC 14
* [Bug libstdc++/113318] [C++26] Implement P1885R12, Naming text encodings to demystify them
From: redi at gcc dot gnu.org @ 2024-02-02 13:51 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113318
--- Comment #8 from Jonathan Wakely <redi at gcc dot gnu.org> ---
I have a patch to add partial std::text_encoding support on Windows, using
GetACP() and _MSVC_EXECUTION_CHARACTER_SET to query the system codepage and the
execution charset codepage, and then mapping from Windows codepage identifiers
to IANA mib values.