* [PATCH 0/3]: C N2653 char8_t implementation
@ 2021-06-07 2:31 Tom Honermann
2021-06-07 21:03 ` Joseph Myers
0 siblings, 1 reply; 5+ messages in thread
From: Tom Honermann @ 2021-06-07 2:31 UTC (permalink / raw)
To: gcc-patches
This series of patches implements the core language features for the
WG14 N2653 [1] proposal to provide char8_t support in C. These changes
are intended to align char8_t support in C with the support provided in
C++20 via WG21 P0482R6 [2].
These changes do not impact default gcc behavior. The existing
-fchar8_t option is extended to C compilation to enable the N2653
changes, and -fno-char8_t is extended to explicitly disable them. N2653
has not yet been accepted by WG14, so no changes are made to handling of
the C2X language dialect.
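A hedged illustration of the change at stake: under C17 rules a u8 string literal has type char[], while under N2653 (matching C++20's P0482R6) its elements have type char8_t, which N2653 specifies as a typedef for unsigned char. The sketch below uses C11 `_Generic` to report which rule the compiler applies. The macro and helper names are hypothetical. The result depends on the compiler version and -std mode (and, under this proposed patch, on -fchar8_t).

```c
#include <assert.h>
#include <string.h>

/* Name the element type of a string literal. _Generic applies lvalue
   conversion, so (s)[0] is matched as plain char or unsigned char. */
#define ELEMENT_TYPE_NAME(s) _Generic((s)[0],  \
    char: "char",                              \
    unsigned char: "unsigned char",            \
    default: "other")

/* Hypothetical helper: "char" under C17 rules for u8"" literals,
   "unsigned char" (i.e. char8_t) under N2653 rules. */
const char *u8_literal_element_type(void)
{
    return ELEMENT_TYPE_NAME(u8"x");
}

/* Returns 1 if u8"" literals follow the N2653 char8_t rules, else 0. */
int u8_literals_are_char8_t(void)
{
    const char *t = u8_literal_element_type();

    /* A C11-or-later compiler should yield one of these two names. */
    assert(strcmp(t, "char") == 0 || strcmp(t, "unsigned char") == 0);
    return strcmp(t, "unsigned char") == 0;
}
```

Compiled with -std=c17, u8_literal_element_type() would be expected to return "char". Whether a C2x mode returns "unsigned char" depends on whether the GCC in use implements N2653.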
Patch 1: Language support
Patch 2: New tests
Patch 3: Documentation updates
Tom.
[1]: WG14 N2653
"char8_t: A type for UTF-8 characters and strings (Revision 1)"
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2653.htm
[2]: WG21 P0482R6
"char8_t: A type for UTF-8 characters and strings (Revision 6)"
https://wg21.link/p0482r6
* Re: [PATCH 0/3]: C N2653 char8_t implementation
From: Joseph Myers @ 2021-06-07 21:03 UTC (permalink / raw)
To: Tom Honermann; +Cc: gcc-patches
On Sun, 6 Jun 2021, Tom Honermann via Gcc-patches wrote:
> These changes do not impact default gcc behavior. The existing -fchar8_t
> option is extended to C compilation to enable the N2653 changes, and
> -fno-char8_t is extended to explicitly disable them. N2653 has not yet been
> accepted by WG14, so no changes are made to handling of the C2X language
> dialect.
Why is that option needed? Normally I'd expect features to be enabled or
disabled based on the selected language version, rather than having
separate options to adjust the configuration for one very specific feature
in a language version. Adding extra language dialects not corresponding
to any standard version but to some peculiar mix of versions (such as C17
with a changed type for u8"", or C2X with a changed type for u8'') needs a
strong reason for those language dialects to be useful (for example, the
-fgnu89-inline option was justified by widespread use of GNU-style extern
inline in headers).
I think the whole patch series would best wait until after the proposal
has been considered by a WG14 meeting, in addition to not increasing the
number of language dialects supported.
--
Joseph S. Myers
joseph@codesourcery.com
* Re: [PATCH 0/3]: C N2653 char8_t implementation
From: Tom Honermann @ 2021-06-11 15:42 UTC (permalink / raw)
To: Joseph Myers; +Cc: gcc-patches
On 6/7/21 5:03 PM, Joseph Myers wrote:
> On Sun, 6 Jun 2021, Tom Honermann via Gcc-patches wrote:
>
>> These changes do not impact default gcc behavior. The existing -fchar8_t
>> option is extended to C compilation to enable the N2653 changes, and
>> -fno-char8_t is extended to explicitly disable them. N2653 has not yet been
>> accepted by WG14, so no changes are made to handling of the C2X language
>> dialect.
> Why is that option needed? Normally I'd expect features to be enabled or
> disabled based on the selected language version, rather than having
> separate options to adjust the configuration for one very specific feature
> in a language version. Adding extra language dialects not corresponding
> to any standard version but to some peculiar mix of versions (such as C17
> with a changed type for u8"", or C2X with a changed type for u8'') needs a
> strong reason for those language dialects to be useful (for example, the
> -fgnu89-inline option was justified by widespread use of GNU-style extern
> inline in headers).
The option is needed because it impacts core language backward
compatibility (for both C and C++, the type of u8 string literals; for
C++, the type of u8 character literals and the new char8_t fundamental
type).
The ability to opt in to or opt out of the feature eases migration by
enabling source code compatibility. C and C++ standards are not
published at the same cadence. A project that targets C++20 and C17 may
therefore need either to opt out of char8_t support on the C++ side
(already possible via -fno-char8_t) or to opt in to char8_t support on
the C side until such time as the targets change to C++20(+) and
C23(+), assuming WG14 approval at some point.
>
> I think the whole patch series would best wait until after the proposal
> has been considered by a WG14 meeting, in addition to not increasing the
> number of language dialects supported.
As an opt-in feature, this is useful to gain implementation and
deployment experience for WG14.
It would be appropriate to document this as an experimental feature
pending WG14 approval. If WG14 declines it or approves it with
different behavior, the feature can then be removed or changed.
The option could also be introduced as -fexperimental-char8_t if that
eases concerns, though I do not favor that approach due to misalignment
with the existing option for C++.
Tom.
* Re: [PATCH 0/3]: C N2653 char8_t implementation
From: Joseph Myers @ 2021-06-11 17:27 UTC (permalink / raw)
To: Tom Honermann; +Cc: gcc-patches
On Fri, 11 Jun 2021, Tom Honermann via Gcc-patches wrote:
> The option is needed because it impacts core language backward compatibility
> (for both C and C++, the type of u8 string literals; for C++, the type of u8
> character literals and the new char8_t fundamental type).
Lots of new features in new standard versions can affect backward
compatibility. We generally bundle all of those up into a single -std
option rather than having an explosion of different language variants with
different features enabled or disabled. I don't think this feature, for
C, reaches the threshold that would justify having a separate option to
control it, especially given that people can use -Wno-pointer-sign or
pointer casts or their own local char8_t typedef as an intermediate step
if they want code using u8"" strings to work for both old and new standard
versions.
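One possible reading of the intermediate step described above, sketched under assumptions: a project keeps its own char8_t stand-in (here the hypothetical utf8_char, using unsigned char as N2653 specifies) and converts u8"" literals with an explicit pointer cast. The same source then compiles cleanly whether the literal's elements are char (C17) or unsigned char (N2653/C2x). Without the cast, passing a C17 u8"" literal to a const unsigned char * parameter would draw GCC's -Wpointer-sign warning (enabled by -Wall). The names utf8_char, U8S, and utf8_code_points are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical local stand-in for char8_t, using unsigned char as N2653
   does, so it matches u8"" literals under the new rules. */
typedef unsigned char utf8_char;

/* Explicit pointer cast: valid whether u8"" has type char[] (C17) or
   unsigned char[] (N2653), with no signedness warning either way. */
#define U8S(lit) ((const utf8_char *)(lit))

/* Counts UTF-8 code points by skipping continuation bytes (10xxxxxx).
   Assumes well-formed, NUL-terminated UTF-8 input. */
size_t utf8_code_points(const utf8_char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != 0; ++s)
        if ((*s & 0xC0) != 0x80)
            ++n;
    return n;
}

/* "h\u00e9llo" is "héllo": 6 UTF-8 bytes, 5 code points. */
size_t demo_code_points(void)
{
    return utf8_code_points(U8S(u8"h\u00e9llo"));
}
```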
I don't think u8"" strings are widely used in C library headers in a way
where the choice of type matters. (Use of a feature in library headers is
a key thing that can justify options such as -fgnu89-inline, because it
means the choice of language version is no longer fully under control of a
single project.)
The only feature proposed for C2x that I think is likely to have
significant compatibility implications in practice for a lot of code is
making bool, true and false into keywords. I still don't think a separate
option makes sense there. (If that feature is accepted for C2x, what
would be useful is for people to do distribution rebuilds with -std=gnu2x
as the default to find and fix code that breaks, in advance of the default
actually changing in GCC. But the workaround for not-yet-fixed code would
be -std=gnu11, not a separate option for that one feature.)
> > I think the whole patch series would best wait until after the proposal
> > has been considered by a WG14 meeting, in addition to not increasing the
> > number of language dialects supported.
>
> As an opt-in feature, this is useful to gain implementation and deployment
> experience for WG14.
I think this feature is one of the cases where experience in C++ is
sufficiently relevant for C (although there are certainly cases of other
language features where the languages are sufficiently different that
using C++ experience like that can be problematic).
E.g. we didn't need -fdigit-separators for C before digit separators were
added to C2x, and we don't need -fno-digit-separators now they are in C2x
(the feature is just enabled or disabled based on the language version),
although that's one of many features that do affect compatibility in
corner cases.
--
Joseph S. Myers
joseph@codesourcery.com
* Re: [PATCH 0/3]: C N2653 char8_t implementation
From: Tom Honermann @ 2021-06-13 15:35 UTC (permalink / raw)
To: Joseph Myers; +Cc: gcc-patches
On 6/11/21 1:27 PM, Joseph Myers wrote:
> On Fri, 11 Jun 2021, Tom Honermann via Gcc-patches wrote:
>
>> The option is needed because it impacts core language backward compatibility
>> (for both C and C++, the type of u8 string literals; for C++, the type of u8
>> character literals and the new char8_t fundamental type).
> Lots of new features in new standard versions can affect backward
> compatibility. We generally bundle all of those up into a single -std
> option rather than having an explosion of different language variants with
> different features enabled or disabled. I don't think this feature, for
> C, reaches the threshold that would justify having a separate option to
> control it, especially given that people can use -Wno-pointer-sign or
> pointer casts or their own local char8_t typedef as an intermediate step
> if they want code using u8"" strings to work for both old and new standard
> versions.
Ok, I'm happy to defer to your experience. My perspective is likely
biased by the C++20 changes being more disruptive for that language.
>
> I don't think u8"" strings are widely used in C library headers in a way
> where the choice of type matters. (Use of a feature in library headers is
> a key thing that can justify options such as -fgnu89-inline, because it
> means the choice of language version is no longer fully under control of a
> single project.)
That aligns with my expectations.
>
> The only feature proposed for C2x that I think is likely to have
> significant compatibility implications in practice for a lot of code is
> making bool, true and false into keywords. I still don't think a separate
> option makes sense there. (If that feature is accepted for C2x, what
> would be useful is for people to do distribution rebuilds with -std=gnu2x
> as the default to find and fix code that breaks, in advance of the default
> actually changing in GCC. But the workaround for not-yet-fixed code would
> be -std=gnu11, not a separate option for that one feature.)
Ok, that comparison is helpful.
>
>>> I think the whole patch series would best wait until after the proposal
>>> has been considered by a WG14 meeting, in addition to not increasing the
>>> number of language dialects supported.
>> As an opt-in feature, this is useful to gain implementation and deployment
>> experience for WG14.
> I think this feature is one of the cases where experience in C++ is
> sufficiently relevant for C (although there are certainly cases of other
> language features where the languages are sufficiently different that
> using C++ experience like that can be problematic).
>
> E.g. we didn't need -fdigit-separators for C before digit separators were
> added to C2x, and we don't need -fno-digit-separators now they are in C2x
> (the feature is just enabled or disabled based on the language version),
> although that's one of many features that do affect compatibility in
> corner cases.
Got it, thanks again, that comparison is helpful.
Per this and prior messages, I'll revise the gcc patch series as follows
(I'll likewise revise the glibc changes, but will detail that in the
corresponding glibc mailing list thread).
1. Remove the proposed use of -fchar8_t and -fno-char8_t for C code.
2. Remove the updated documentation for the -fchar8_t option since it
won't be applicable to C code.
3. Remove the _CHAR8_T_SOURCE macro.
4. Enable the change of u8 string literal type based on -std=[gnu|c]2x
(by setting flag_char8_t if flag_isoc2x is set).
5. Condition the declarations of atomic_char8_t and
__GCC_ATOMIC_CHAR8_T_LOCK_FREE on _GNU_SOURCE or _ISOC2X_SOURCE.
6. Remove the char8 data member from cpp_options that I had added and
forgot to remove.
7. Revise the tests and rename them for consistency with other C2x tests.
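The revised plan (item 4 in particular) follows the approach discussed in this thread: features are derived from the selected language version rather than toggled individually. The toy model below sketches that idea. Only the names flag_isoc2x and flag_char8_t come from the plan; the enum, struct, and function are hypothetical and do not reflect how GCC's option handling is actually structured. Treating -std=c2x and -std=gnu2x alike as DIALECT_C2X is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical dialect enumeration; -std=c2x and -std=gnu2x are both
   modelled as DIALECT_C2X here. */
enum c_dialect { DIALECT_C11, DIALECT_C17, DIALECT_C2X };

/* Hypothetical feature set; only the two flag_* names come from the plan. */
struct c_features {
    bool flag_isoc2x;       /* C2x features requested via -std */
    bool flag_char8_t;      /* u8"" literals have type char8_t[] */
    bool digit_separators;  /* C2x digit separators, e.g. 1'000 */
};

/* Derive every feature from the dialect alone: there is no separate
   per-feature option such as -fchar8_t for C. */
struct c_features features_for_dialect(enum c_dialect d)
{
    struct c_features f = { false, false, false };

    assert(d == DIALECT_C11 || d == DIALECT_C17 || d == DIALECT_C2X);
    f.flag_isoc2x = (d == DIALECT_C2X);
    /* Plan item 4: set flag_char8_t if flag_isoc2x is set. */
    f.flag_char8_t = f.flag_isoc2x;
    /* Digit separators follow the dialect in the same way. */
    f.digit_separators = f.flag_isoc2x;
    return f;
}
```

The design choice being illustrated is that a single -std switch bundles all compatibility-affecting changes, so the workaround for not-yet-fixed code is an older -std value rather than a per-feature flag.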
If I've forgotten anything, please let me know.
Thank you for the thorough review!
Tom.