* Fwd: [[Gnome-bindings] Strings and bindings]
From: Ariel Rios @ 2000-04-15 21:33 UTC
To: guile, guile-gtk
Any thoughts?
Ariel
* Re: Fwd: [[Gnome-bindings] Strings and bindings]
From: Per Bothner @ 2000-04-16 12:45 UTC
To: Ariel Rios; +Cc: Owen Taylor, guile, guile-gtk
(Feel free to forward this appropriately.)
Owen Taylor <otaylor@redhat.com> writes:
> The Unicode standard is currently only using 16-bit characters,
> all common characters for living languages are planned to be
> included in the 16-bit space, and many systems do use 16-bit
> characters. (Windows, Java, Python)
>
> However, there will soon be some character sets defined
> outside of the 16-bit "Basic Multilingual Plane", and allowing
> 32-bit characters is, IMO, nicer than confining oneself to
> an almost-full character space.
Using 16 bits should not be a problem. Unicode has support for
"surrogates", an extension mechanism that encodes 20 bits using a
pair of 16-bit Unicode characters. That 20-bit space is *far* from
full - as far as I know, it is still officially empty (though
proposals have been made for rare scripts and symbols).
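
To make the surrogate arithmetic concrete, here is a minimal sketch in
Python. The function names are hypothetical, chosen only for
illustration; the constants are the standard UTF-16 surrogate ranges
(0xD800-0xDBFF for the high half, 0xDC00-0xDFFF for the low half).

```python
def to_surrogates(code_point):
    """Split a code point above the 16-bit range into two 16-bit units."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point does not need a surrogate pair")
    value = code_point - 0x10000           # the 20-bit payload
    high = 0xD800 | (value >> 10)          # top 10 bits
    low = 0xDC00 | (value & 0x3FF)         # bottom 10 bits
    return high, low


def from_surrogates(high, low):
    """Recombine a high/low surrogate pair into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
```

For example, to_surrogates(0x10000) gives (0xD800, 0xDC00), and
from_surrogates undoes the split for any value in the 20-bit space.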
> - Create an STL-string-like wrapper for a utf8 string. The
> problem here is that you don't get O(1) random access, which
> will no doubt disturb some of the people reading this.
But there is almost nothing useful you can do with strings that
requires O(1) random access using a character index, at least once
you're already dealing with non-trivial character sets. What you
sometimes need is efficient access to a position in the string, but
that can be a "magic cookie" represented using a byte offset.
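
As a rough illustration of the "magic cookie" idea, the sketch below
treats a position in a UTF-8 buffer as a plain byte offset and steps
from one character to the next without ever computing a character
index. It is written in Python, the helper names are hypothetical, and
it assumes the buffer is valid UTF-8 and that the offset always sits on
the first byte of a character.

```python
def next_offset(buf, pos):
    """Return the byte offset of the character following the one at pos."""
    pos += 1
    # Continuation bytes in UTF-8 always look like 10xxxxxx; skip them.
    while pos < len(buf) and (buf[pos] & 0xC0) == 0x80:
        pos += 1
    return pos


def char_at(buf, pos):
    """Decode the single character that starts at byte offset pos."""
    return buf[pos:next_offset(buf, pos)].decode("utf-8")


def offsets(buf):
    """Yield the byte offset of every character, front to back."""
    pos = 0
    while pos < len(buf):
        yield pos
        pos = next_offset(buf, pos)
```

A saved offset can be handed back later to char_at in constant time,
which is the efficient positional access described above; only
converting a character count to an offset requires a scan.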
So using UTF8 is perfectly reasonable. Using 16-bit Unicode
with surrogates is also perfectly reasonable. Using arrays
of 32-bit wide characters does not make sense to me (though
I know that glibc maintainer Ulrich Drepper feels strongly
otherwise).
--
--Per Bothner
per@bothner.com http://www.bothner.com/~per/