public inbox for gcc-bugs@sourceware.org
* [Bug c/41041] Documentation: -fwide-exec-charset defaults to UCS-4/UCS-2, not UTF-32/UTF-16
       [not found] <bug-41041-4@http.gcc.gnu.org/bugzilla/>
@ 2022-11-03 11:04 ` redi at gcc dot gnu.org
  2022-11-03 11:04 ` redi at gcc dot gnu.org
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 11+ messages in thread
From: redi at gcc dot gnu.org @ 2022-11-03 11:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=41041

Jonathan Wakely <redi at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2022-11-03
             Status|RESOLVED                    |NEW
         Resolution|DUPLICATE                   |---
     Ever confirmed|0                           |1

--- Comment #6 from Jonathan Wakely <redi at gcc dot gnu.org> ---
I'm reopening this one, and closing 41040 as the dup, because this has all the
attachments.

Samuel, please send patches to the gcc-patches mailing list (as documented in
the contribution docs) instead of attaching them in bugzilla where they get
ignored for over a decade.


* [Bug c/41041] Documentation: -fwide-exec-charset defaults to UCS-4/UCS-2, not UTF-32/UTF-16
       [not found] <bug-41041-4@http.gcc.gnu.org/bugzilla/>
  2022-11-03 11:04 ` [Bug c/41041] Documentation: -fwide-exec-charset defaults to UCS-4/UCS-2, not UTF-32/UTF-16 redi at gcc dot gnu.org
@ 2022-11-03 11:04 ` redi at gcc dot gnu.org
  2022-11-03 11:10 ` redi at gcc dot gnu.org
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 11+ messages in thread
From: redi at gcc dot gnu.org @ 2022-11-03 11:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=41041

--- Comment #7 from Jonathan Wakely <redi at gcc dot gnu.org> ---
*** Bug 41040 has been marked as a duplicate of this bug. ***


* [Bug c/41041] Documentation: -fwide-exec-charset defaults to UCS-4/UCS-2, not UTF-32/UTF-16
       [not found] <bug-41041-4@http.gcc.gnu.org/bugzilla/>
  2022-11-03 11:04 ` [Bug c/41041] Documentation: -fwide-exec-charset defaults to UCS-4/UCS-2, not UTF-32/UTF-16 redi at gcc dot gnu.org
  2022-11-03 11:04 ` redi at gcc dot gnu.org
@ 2022-11-03 11:10 ` redi at gcc dot gnu.org
  2022-11-03 13:38 ` samuel.thibault@ens-lyon.org
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 11+ messages in thread
From: redi at gcc dot gnu.org @ 2022-11-03 11:10 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=41041

--- Comment #8 from Jonathan Wakely <redi at gcc dot gnu.org> ---
The difference with an explicit -fwide-exec-charset=UTF-32 seems to be the BOM.
It looks like the default is UTF-32LE, are you sure it's UCS4?


* [Bug c/41041] Documentation: -fwide-exec-charset defaults to UCS-4/UCS-2, not UTF-32/UTF-16
       [not found] <bug-41041-4@http.gcc.gnu.org/bugzilla/>
                   ` (2 preceding siblings ...)
  2022-11-03 11:10 ` redi at gcc dot gnu.org
@ 2022-11-03 13:38 ` samuel.thibault@ens-lyon.org
  2022-11-04 10:29 ` redi at gcc dot gnu.org
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 11+ messages in thread
From: samuel.thibault@ens-lyon.org @ 2022-11-03 13:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=41041

Samuel Thibault <samuel.thibault@ens-lyon.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |WONTFIX

--- Comment #9 from Samuel Thibault <samuel.thibault@ens-lyon.org> ---
It seems the default is indeed a UTF encoding rather than a UCS encoding:

$ LANG= gcc -fshort-wchar test.c -o test
$ LANG= gcc -fshort-wchar test.c -o test   -fwide-exec-charset=UTF-16LE 
$ LANG= gcc -fshort-wchar test.c -o test   -fwide-exec-charset=UCS-2LE 
test.c: In function `main':
test.c:7:27: error: converting to execution character set: Invalid or
incomplete multibyte or wide character
    7 |         wchar_t s[] = L"𝄞";
      |                           ^

Now there is indeed the question of the BOM. Ideally the text could mention all
of UTF-32LE, UTF-32BE, UTF-16LE, UTF-16BE, but I'm not sure it's really worth it.


* [Bug c/41041] Documentation: -fwide-exec-charset defaults to UCS-4/UCS-2, not UTF-32/UTF-16
       [not found] <bug-41041-4@http.gcc.gnu.org/bugzilla/>
                   ` (3 preceding siblings ...)
  2022-11-03 13:38 ` samuel.thibault@ens-lyon.org
@ 2022-11-04 10:29 ` redi at gcc dot gnu.org
  2022-11-04 10:53 ` redi at gcc dot gnu.org
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 11+ messages in thread
From: redi at gcc dot gnu.org @ 2022-11-04 10:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=41041

Jonathan Wakely <redi at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|WONTFIX                     |---
           Assignee|unassigned at gcc dot gnu.org      |redi at gcc dot gnu.org
             Status|RESOLVED                    |ASSIGNED

--- Comment #10 from Jonathan Wakely <redi at gcc dot gnu.org> ---
Now that we have macros exposing the execution character set, we can check it
easily:

$ gcc -E -dM -x c /dev/null | grep EXEC
#define __GNUC_WIDE_EXECUTION_CHARSET_NAME "UTF-32LE"
#define __GNUC_EXECUTION_CHARSET_NAME "UTF-8"

So the docs are misleading. I think I'll take this bug myself and try to
document it without too much verbosity.


* [Bug c/41041] Documentation: -fwide-exec-charset defaults to UCS-4/UCS-2, not UTF-32/UTF-16
       [not found] <bug-41041-4@http.gcc.gnu.org/bugzilla/>
                   ` (4 preceding siblings ...)
  2022-11-04 10:29 ` redi at gcc dot gnu.org
@ 2022-11-04 10:53 ` redi at gcc dot gnu.org
  2022-11-05 12:37 ` cvs-commit at gcc dot gnu.org
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 11+ messages in thread
From: redi at gcc dot gnu.org @ 2022-11-04 10:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=41041

--- Comment #11 from Jonathan Wakely <redi at gcc dot gnu.org> ---
Something like this:

--- a/gcc/doc/cppopts.texi
+++ b/gcc/doc/cppopts.texi
@@ -318,9 +318,10 @@ supported by the system's @code{iconv} library routine.
 @opindex fwide-exec-charset
 @cindex character set, wide execution
 Set the wide execution character set, used for wide string and
-character constants.  The default is UTF-32 or UTF-16, whichever
-corresponds to the width of @code{wchar_t}.  As with
-@option{-fexec-charset}, @var{charset} can be any encoding supported
+character constants.  The default is one of UTF-32BE, UTF-32LE, UTF-16BE,
+or UTF-16LE, whichever corresponds to the width of @code{wchar_t} and the
+big-endian or little-endian byte order being used for code generation.  As
+with @option{-fexec-charset}, @var{charset} can be any encoding supported
 by the system's @code{iconv} library routine; however, you will have
 problems with encodings that do not fit exactly in @code{wchar_t}.


* [Bug c/41041] Documentation: -fwide-exec-charset defaults to UCS-4/UCS-2, not UTF-32/UTF-16
       [not found] <bug-41041-4@http.gcc.gnu.org/bugzilla/>
                   ` (5 preceding siblings ...)
  2022-11-04 10:53 ` redi at gcc dot gnu.org
@ 2022-11-05 12:37 ` cvs-commit at gcc dot gnu.org
  2022-11-05 12:38 ` cvs-commit at gcc dot gnu.org
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 11+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-11-05 12:37 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=41041

--- Comment #12 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jonathan Wakely <redi@gcc.gnu.org>:

https://gcc.gnu.org/g:e50ea3a42f058c14ee29327d5277ab0435e3d36b

commit r13-3694-ge50ea3a42f058c14ee29327d5277ab0435e3d36b
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Fri Nov 4 12:10:32 2022 +0000

    doc: Document correct -fwide-exec-charset defaults [PR41041]

    As shown in the PR, the default is not UTF-32 but rather UTF-32BE or
    UTF-32LE, avoiding the need for a byte order mark in literals.

    gcc/ChangeLog:

            PR c/41041
            * doc/cppopts.texi: Document -fwide-exec-charset defaults
            correctly.


* [Bug c/41041] Documentation: -fwide-exec-charset defaults to UCS-4/UCS-2, not UTF-32/UTF-16
       [not found] <bug-41041-4@http.gcc.gnu.org/bugzilla/>
                   ` (6 preceding siblings ...)
  2022-11-05 12:37 ` cvs-commit at gcc dot gnu.org
@ 2022-11-05 12:38 ` cvs-commit at gcc dot gnu.org
  2022-11-05 12:38 ` cvs-commit at gcc dot gnu.org
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 11+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-11-05 12:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=41041

--- Comment #13 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-12 branch has been updated by Jonathan Wakely
<redi@gcc.gnu.org>:

https://gcc.gnu.org/g:1342c7f46e6e3f8f29d7971531a0af18cd8429bc

commit r12-8893-g1342c7f46e6e3f8f29d7971531a0af18cd8429bc
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Fri Nov 4 12:10:32 2022 +0000

    doc: Document correct -fwide-exec-charset defaults [PR41041]

    As shown in the PR, the default is not UTF-32 but rather UTF-32BE or
    UTF-32LE, avoiding the need for a byte order mark in literals.

    gcc/ChangeLog:

            PR c/41041
            * doc/cppopts.texi: Document -fwide-exec-charset defaults
            correctly.

    (cherry picked from commit e50ea3a42f058c14ee29327d5277ab0435e3d36b)


* [Bug c/41041] Documentation: -fwide-exec-charset defaults to UCS-4/UCS-2, not UTF-32/UTF-16
       [not found] <bug-41041-4@http.gcc.gnu.org/bugzilla/>
                   ` (7 preceding siblings ...)
  2022-11-05 12:38 ` cvs-commit at gcc dot gnu.org
@ 2022-11-05 12:38 ` cvs-commit at gcc dot gnu.org
  2022-11-05 12:45 ` cvs-commit at gcc dot gnu.org
  2022-11-05 12:45 ` redi at gcc dot gnu.org
  10 siblings, 0 replies; 11+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-11-05 12:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=41041

--- Comment #14 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-11 branch has been updated by Jonathan Wakely
<redi@gcc.gnu.org>:

https://gcc.gnu.org/g:ae31f6acb2cf9d43a265f42c12f95e4687ac1fa4

commit r11-10365-gae31f6acb2cf9d43a265f42c12f95e4687ac1fa4
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Fri Nov 4 12:10:32 2022 +0000

    doc: Document correct -fwide-exec-charset defaults [PR41041]

    As shown in the PR, the default is not UTF-32 but rather UTF-32BE or
    UTF-32LE, avoiding the need for a byte order mark in literals.

    gcc/ChangeLog:

            PR c/41041
            * doc/cppopts.texi: Document -fwide-exec-charset defaults
            correctly.

    (cherry picked from commit e50ea3a42f058c14ee29327d5277ab0435e3d36b)


* [Bug c/41041] Documentation: -fwide-exec-charset defaults to UCS-4/UCS-2, not UTF-32/UTF-16
       [not found] <bug-41041-4@http.gcc.gnu.org/bugzilla/>
                   ` (8 preceding siblings ...)
  2022-11-05 12:38 ` cvs-commit at gcc dot gnu.org
@ 2022-11-05 12:45 ` cvs-commit at gcc dot gnu.org
  2022-11-05 12:45 ` redi at gcc dot gnu.org
  10 siblings, 0 replies; 11+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-11-05 12:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=41041

--- Comment #15 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-10 branch has been updated by Jonathan Wakely
<redi@gcc.gnu.org>:

https://gcc.gnu.org/g:87b0935ed43d971a6eeebca963fb673628f138dd

commit r10-11071-g87b0935ed43d971a6eeebca963fb673628f138dd
Author: Jonathan Wakely <jwakely@redhat.com>
Date:   Fri Nov 4 12:10:32 2022 +0000

    doc: Document correct -fwide-exec-charset defaults [PR41041]

    As shown in the PR, the default is not UTF-32 but rather UTF-32BE or
    UTF-32LE, avoiding the need for a byte order mark in literals.

    gcc/ChangeLog:

            PR c/41041
            * doc/cppopts.texi: Document -fwide-exec-charset defaults
            correctly.

    (cherry picked from commit e50ea3a42f058c14ee29327d5277ab0435e3d36b)


* [Bug c/41041] Documentation: -fwide-exec-charset defaults to UCS-4/UCS-2, not UTF-32/UTF-16
       [not found] <bug-41041-4@http.gcc.gnu.org/bugzilla/>
                   ` (9 preceding siblings ...)
  2022-11-05 12:45 ` cvs-commit at gcc dot gnu.org
@ 2022-11-05 12:45 ` redi at gcc dot gnu.org
  10 siblings, 0 replies; 11+ messages in thread
From: redi at gcc dot gnu.org @ 2022-11-05 12:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=41041

Jonathan Wakely <redi at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
   Target Milestone|---                         |10.5
         Resolution|---                         |FIXED

--- Comment #16 from Jonathan Wakely <redi at gcc dot gnu.org> ---
Docs fixed for 10.5, 11.4 and 12.3


