From: me22 <me22.ca@gmail.com>
To: "Jim Cobban" <jcobban@magma.ca>
Cc: GCC-help <gcc-help@gcc.gnu.org>
Subject: Re: UTF-8, UTF-16 and UTF-32
Date: Wed, 27 Aug 2008 13:29:00 -0000
Message-ID: <fa28b9250808261124s53988606l83dc8bba04b8dec6@mail.gmail.com>
Message-ID: <20080827132900.i4ENsQaqFeCgD56QAZytas8RflQ_8Vt-ycTc7gV1ugQ@z>
In-Reply-To: <48B44687.2040106@magma.ca>
On Tue, Aug 26, 2008 at 14:08, Jim Cobban <jcobban@magma.ca> wrote:
> The definition of a wchar_t string or std::wstring, even if a wchar_t is 16
> bits in size, is not the same thing as UTF-16. A wchar_t string or
> std::wstring, as defined by the C, C++, and POSIX standards, contains ONE
> wchar_t value for each displayed glyph. Alternatively the value of strlen()
> for a wchar_t string is the same as the number of glyphs in the displayed
> representation of the string.
>
One wchar_t value for each code point, not for each glyph -- a single
glyph can be formed from multiple code points (combining characters
and ligatures, for example).
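
A minimal sketch of that distinction, assuming a C++11 compiler. The
names precomposed, combining and wide_length are made up for this
illustration. Both strings would normally display as the single glyph
"é", yet they hold different numbers of wchar_t values:

```cpp
#include <cstddef>
#include <cwchar>
#include <string>

// "é" as one precomposed code point, U+00E9.
const std::wstring precomposed = L"\u00E9";

// "é" as 'e' followed by U+0301 COMBINING ACUTE ACCENT:
// two code points that a renderer draws as one glyph.
const std::wstring combining = L"e\u0301";

// Hypothetical helper: the length of a wide string as the C library
// sees it. std::wcslen counts wchar_t elements, not displayed glyphs.
std::size_t wide_length(const std::wstring& s) {
    return std::wcslen(s.c_str());
}
```

Here wide_length(precomposed) is 1 and wide_length(combining) is 2, so
the element count cannot be relied on as a glyph count.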
> In these standards the size of a wchar_t is not explicitly defined except
> that it must be large enough to represent every text "character". It is
> critical to understand that a wchar_t string, as defined by these standards,
> is not the same thing as a UTF-16 string, even if a wchar_t is 16 bits in
> size. UTF-16 may use up to THREE 16-bit words to represent a single glyph,
> although I believe that almost all symbols actually used by living languages
> can be represented in a single word in UTF-16. I have not worked with
> Visual C++ recently precisely because it accepts a non-portable language.
> The last time I used it the M$ library was standards compliant, with the
> understanding that its definition of wchar_t as a 16-bit word meant the
> library could not support some languages. If the implementation of the
> wchar_t strings in the Visual C++ library has been changed to implement
> UTF-16 internally, then in my opinion it is not compliant with the POSIX, C,
> and C++ standards.
>
The outdated encoding that supports only code points 0x0000 through
0xFFFF, one 16-bit unit each, is called UCS-2. (See
http://en.wikipedia.org/wiki/UTF-16)
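
To make the difference concrete, here is a rough sketch of how UTF-16
handles what UCS-2 cannot. The helper names to_utf16 and fits_in_ucs2
are hypothetical, and the code assumes its input is a valid code point
(at most 0x10FFFF and not itself a surrogate); it does no validation.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: true if the code point is in the range
// 0x0000 through 0xFFFF that UCS-2 can represent.
bool fits_in_ucs2(std::uint32_t cp) {
    return cp <= 0xFFFF;
}

// Hypothetical helper: encode one code point as UTF-16 code units.
// Assumes cp is valid; surrogates and values above 0x10FFFF are
// not rejected here.
std::vector<std::uint16_t> to_utf16(std::uint32_t cp) {
    if (fits_in_ucs2(cp)) {
        // Same 16-bit value as in UCS-2.
        return { static_cast<std::uint16_t>(cp) };
    }
    // Above 0xFFFF: split the 20-bit offset into a surrogate pair.
    cp -= 0x10000;
    return { static_cast<std::uint16_t>(0xD800 + (cp >> 10)),     // high surrogate
             static_cast<std::uint16_t>(0xDC00 + (cp & 0x3FF)) }; // low surrogate
}
```

For example, to_utf16(0x1D11E) (MUSICAL SYMBOL G CLEF) yields the two
units 0xD834 0xDD1E, while to_utf16(0x00E9) yields the single unit
0x00E9.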
~ Scott