public inbox for gcc-bugs@sourceware.org
From: "neil at daikokuya dot co dot uk" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug c/35908] Dubious charset conversions
Date: Sat, 12 Apr 2008 04:40:00 -0000
Message-ID: <20080412044008.15231.qmail@sourceware.org>
In-Reply-To: <bug-35908-6@http.gcc.gnu.org/bugzilla/>
------- Comment #2 from neil at daikokuya dot co dot uk 2008-04-12 04:40 -------
Subject: Re: Dubious charset conversions
joseph at codesourcery dot com wrote:-
> > GCC accepts the following with -ansi -pedantic -Wall without diagnostics
> >
> > #include <stdlib.h>
> > wchar_t z[] = L"a" "\xff";
> >
> > GCC claims a default execution charset of UTF-8; presumably the default
> > execution wide character set is UTF-32. But "\xff" is a two-character narrow
> > execution character set string literal, with characters \xff \0, which is
> > invalid UTF-8 and so cannot be converted in a meaningful way to the execution
> > character set (whatever it is).
> >
> > I would expect the above code to be rejected, or at least diagnosed.
>
> Accepting it as equivalent to L"a\xff" (generating a wide character L'a'
> followed by one with value 0xff) seems in accordance with the principles
> of N951, the relevant ones of which are implemented in GCC.
>
> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n951.htm
> http://gcc.gnu.org/ml/gcc-patches/2003-07/msg00532.html
Ah, I'd forgotten about that. That document does make much more
sense, thanks. However I think there are at least two things wrong
in "Principle 7"; I've mailed Clive about those. [The single byte
requirement cannot be fulfilled for Latin source charset to UTF-8
target, for example, and UCNs are escape sequences that typically
cannot be encoded as a single byte].
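To illustrate the single-byte point, here is a minimal sketch. It assumes "Latin" means ISO 8859-1, and the helper name latin1_to_utf8 is made up for this example; it is not anything in GCC. Every source byte at or above 0x80 needs two bytes in a UTF-8 target, so a one-byte result cannot be guaranteed.

```c
#include <stddef.h>

/* Hypothetical helper (not GCC code): encode one ISO 8859-1 byte as UTF-8.
   Assumes the "Latin" source charset is ISO 8859-1, whose byte values map
   directly to code points U+0000..U+00FF.
   Writes 1 or 2 bytes to out and returns how many were written. */
static size_t latin1_to_utf8(unsigned char c, unsigned char out[2])
{
    if (c < 0x80) {
        /* ASCII range: identical single byte in UTF-8. */
        out[0] = c;
        return 1;
    }
    /* 0x80..0xFF: two-byte form 110xxxxx 10xxxxxx. */
    out[0] = (unsigned char)(0xC0 | (c >> 6));
    out[1] = (unsigned char)(0x80 | (c & 0x3F));
    return 2;
}
```

For example, the source byte 0xE9 (e-acute in ISO 8859-1) comes out as the two bytes 0xC3 0xA9.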
GCC should perhaps consider not creating invalid UTF-8 (i.e. no 5- or
6-byte forms, and no encodings of \ufffe, \uffff, etc.).
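A rough sketch of what such a check could look like follows. The function name utf8_encode_strict is invented for illustration and this is not how GCC is implemented. Rejecting \ufffe and \uffff follows the suggestion above; rejecting surrogates as part of "etc." is an assumption on my part.

```c
#include <stddef.h>

/* Hypothetical strict encoder (illustration only, not GCC code).
   Returns the number of bytes written to out (1..4), or 0 if the
   code point is rejected. */
static size_t utf8_encode_strict(unsigned long cp, unsigned char out[4])
{
    /* Beyond Unicode. This also excludes every value that would need a
       5- or 6-byte form, since those start at 0x200000. */
    if (cp > 0x10FFFFUL)
        return 0;
    /* Surrogates: rejected here by assumption. */
    if (cp >= 0xD800UL && cp <= 0xDFFFUL)
        return 0;
    /* The two code points named in the message. */
    if (cp == 0xFFFEUL || cp == 0xFFFFUL)
        return 0;

    if (cp < 0x80UL) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800UL) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000UL) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

A caller would treat a return value of 0 as the cue to issue a diagnostic instead of emitting bytes.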
Please feel free to close this report.
Neil.
--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35908
Thread overview: 4+ messages
2008-04-11 15:21 [Bug c/35908] New: " neil at gcc dot gnu dot org
2008-04-11 16:59 ` [Bug c/35908] " joseph at codesourcery dot com
2008-04-12 4:40 ` neil at daikokuya dot co dot uk [this message]
2009-03-30 1:03 ` jsm28 at gcc dot gnu dot org