public inbox for gcc@gcc.gnu.org
* Old UTF16 patch
@ 2007-11-01 23:31 Elena Zannoni
  2007-11-02  0:08 ` Joseph S. Myers
  0 siblings, 1 reply; 3+ messages in thread
From: Elena Zannoni @ 2007-11-01 23:31 UTC (permalink / raw)
  To: gcc; +Cc: Tom Tromey

Hi,
Does anybody know if this patch ever got merged into GCC, or if UTF-16
is currently supported?
ftp://ftp.sap.com/pub/i18N/utf16/ugcc-3.2/README

Tom, I saw you replied to this thread, so maybe you know about this:
http://mail.nl.linux.org/linux-utf8/2001-07/msg00064.html

I believe the patch was originally from SUSE.  My understanding is that
it hasn't been integrated yet.  If it hasn't been merged, I'll do some
more digging and see if somebody from Oracle can integrate it.

thanks
elena


* Re: Old UTF16 patch
  2007-11-01 23:31 Old UTF16 patch Elena Zannoni
@ 2007-11-02  0:08 ` Joseph S. Myers
  2007-11-06 20:53   ` Lawrence Crowl
  0 siblings, 1 reply; 3+ messages in thread
From: Joseph S. Myers @ 2007-11-02  0:08 UTC (permalink / raw)
  To: Elena Zannoni; +Cc: gcc, Tom Tromey

I haven't followed any developments relating to TR19769 in WG14 after its 
publication in detail; has WG14 yet given an answer on what should be done 
with u'C' where C represents a single character that requires a surrogate 
pair to represent in UTF-16 (to name one noted place where the TR 
underspecifies things)?

I don't think there's much worthwhile in those old patches.  Start with 
the ISO TR text, produce testcases that cover everything there and the 
desired semantics for everything the TR leaves unspecified or 
underspecified, and only once the testcases are settled work out an 
implementation for the agreed semantics.

A TR is not a standard, so for C this must be disabled in all strict 
conformance modes (note that it affects the rules for lexing and so 
changes the semantics of conforming programs); likewise for C++98.  The 
C++0x draft includes the notation from TR19769, so the feature should be 
enabled by default in C++0x (and so far as the C TR is compatible with 
C++0x, both should be followed in both C and C++ when the feature is 
enabled).
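
To illustrate the lexing point, here is a minimal sketch, assuming a
compiler in C++11 mode or later. The macro `u` and the function name
are hypothetical and appear nowhere in the TR or in GCC. Without the
new notation, `u'a'` is the identifier `u` followed by `'a'`, so the
macro would expand; with it, `u'a'` is a single token.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical macro, for illustration only. In C90/C99 or C++98 the
// sequence u'a' lexes as the identifier `u` followed by 'a', so this
// macro would expand and leave a plain 'a'.
#define u

// With the TR 19769 / C++0x notation enabled, u'a' is one token, the
// macro is never expanded, and the literal has type char16_t.
inline bool prefix_is_single_token() {
    return std::is_same<decltype(u'a'), char16_t>::value;
}

// Drop the macro again so it cannot interfere with later code.
#undef u
```

A program that was conforming before the TR can therefore change
meaning once the notation is enabled, which is the reason for keeping
it off in strict conformance modes.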

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: Old UTF16 patch
  2007-11-02  0:08 ` Joseph S. Myers
@ 2007-11-06 20:53   ` Lawrence Crowl
  0 siblings, 0 replies; 3+ messages in thread
From: Lawrence Crowl @ 2007-11-06 20:53 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: Elena Zannoni, gcc, Tom Tromey

On 11/1/07, Joseph S. Myers <joseph@codesourcery.com> wrote:
> I haven't followed any developments relating to TR19769 in WG14
> after its publication in detail; has WG14 yet given an answer
> on what should be done with u'C' where C represents a single
> character that requires a surrogate pair to represent in UTF-16
> (to name one noted place where the TR underspecifies things)?

Pending such an answer, I think gcc should make such characters
ill-formed.  The text in the C TR is "The corresponding character
constant is denoted by u'c-char-sequence' and has the type char16_t."
Given that surrogate pairs are unrepresentable in that type, I
conclude that the intent was to make character literals requiring
surrogates ill-formed.  The C++0x draft also makes such characters
ill-formed.  Furthermore, making them ill-formed will be upward
compatible should the C committee choose some other interpretation.
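
As a rough sketch of the check involved (not GCC's implementation; the
names `needs_surrogate_pair`, `SurrogatePair` and `to_surrogate_pair`
are hypothetical), a code point above U+FFFF cannot fit in one 16-bit
code unit and would need two:

```cpp
#include <cassert>
#include <cstdint>

// True if the code point lies above the Basic Multilingual Plane and so
// needs two UTF-16 code units (a surrogate pair).
constexpr bool needs_surrogate_pair(char32_t cp) {
    return cp > 0xFFFF && cp <= 0x10FFFF;
}

// Hypothetical helper type holding the two code units.
struct SurrogatePair {
    char16_t high;
    char16_t low;
};

// Standard UTF-16 encoding of a supplementary-plane code point.
// Assumes needs_surrogate_pair(cp) is true; no validation is done here.
constexpr SurrogatePair to_surrogate_pair(char32_t cp) {
    return SurrogatePair{
        static_cast<char16_t>(0xD800 + ((cp - 0x10000) >> 10)),
        static_cast<char16_t>(0xDC00 + ((cp - 0x10000) & 0x3FF))};
}
```

For example, U+1D11E encodes as the pair 0xD834 0xDD1E, so a literal
u'C' holding that character has no single char16_t value, and under
the reading above it would be diagnosed rather than truncated.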

> A TR is not a standard, so for C this must be disabled in all strict
> conformance modes (note that it affects the rules for lexing and so
> changes the semantics of conforming programs); likewise for C++98.
> The C++0x draft includes the notation from TR19769, so the feature
> should be enabled by default in C++0x (and so far as the C TR is
> compatible with C++0x, both should be followed in both C and C++
> when the feature is enabled).

Note that char16_t and char32_t are typedefs in C but primitive types
in C++, just like wchar_t.
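
A small sketch of what that difference means in practice, assuming a
C++11 or later compiler (the overloaded function `kind` is
hypothetical). Because the types are distinct in C++, they can be
told apart by overload resolution; with the C typedefs the two
declarations below would name the same type.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// In C++ these are built-in types, distinct from every integer type,
// even though they share size and representation with one of them.
static_assert(!std::is_same<char16_t, std::uint_least16_t>::value,
              "char16_t is a distinct type in C++");
static_assert(!std::is_same<char32_t, std::uint_least32_t>::value,
              "char32_t is a distinct type in C++");

// Overloading on both is therefore allowed in C++.
inline const char* kind(std::uint_least16_t) { return "uint_least16_t"; }
inline const char* kind(char16_t)            { return "char16_t"; }
```

Calling kind(u'a') selects the char16_t overload, while an argument of
type std::uint_least16_t selects the other one.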

-- 
Lawrence Crowl
