public inbox for guile-gtk@sourceware.org
From: Per Bothner <per@bothner.com>
To: Ariel Rios <jarios@usa.net>
Cc: Owen Taylor <otaylor@redhat.com>,
	guile@sourceware.cygnus.com, guile-gtk@sourceware.cygnus.com
Subject: Re: Fwd: [[Gnome-bindings] Strings and bindings]
Date: Sun, 16 Apr 2000 12:45:00 -0000
Message-ID: <m23dol3kaa.fsf@kelso.bothner.com>
In-Reply-To: <20000416043318.14641.qmail@nwcst293.netaddress.usa.net>

(Feel free to forward this appropriately.)

Owen Taylor <otaylor@redhat.com> writes:
> The Unicode standard is currently only using 16-bit characters,
> all common characters for living languages are planned to be
> included in the 16-bit space, and many systems do use 16-bit
> characters. (Windows, Java, Python)
>
> However, there will soon be some character sets defined
> outside of the 16-bit "Basic Multilingual Plane", and allowing
> 32-bit characters, is, IMO, nicer than confining oneself to
> an almost-full character space. 

Using 16 bits should not be a problem.  Unicode supports
"surrogates", an extension mechanism that encodes 20 extra bits
(about a million additional code points) as a pair of 16-bit Unicode
characters.  That 20-bit space is *far* from full - as far as I know,
it is still officially empty (though proposals have been made for rare
scripts and symbols).
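
To make the surrogate arithmetic concrete, here is a minimal sketch in
Python.  The function names are made up for illustration; the constants
are the standard surrogate ranges (0xD800-0xDBFF for the first unit,
0xDC00-0xDFFF for the second).

```python
def to_surrogates(cp):
    """Split a code point in U+10000..U+10FFFF into a (high, low) pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is outside the surrogate-encoded range")
    v = cp - 0x10000             # the 20-bit value to be encoded
    high = 0xD800 | (v >> 10)    # top 10 bits go into the high surrogate
    low = 0xDC00 | (v & 0x3FF)   # bottom 10 bits go into the low surrogate
    return high, low


def from_surrogates(high, low):
    """Combine a (high, low) surrogate pair back into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + (((high - 0xD800) << 10) | (low - 0xDC00))
```

Each half carries 10 bits, so the two units together cover the 20-bit
space, and neither half can be mistaken for an ordinary 16-bit character.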

>  - Create an STL-string-like wrapper for a utf8 string. The
>    problem here is that you don't get O(1) random access, which
>    will no doubt disturb some of the people reading this.

But there is almost nothing useful you can do with strings that
requires O(1) random access using a character index, at least once
you're already dealing with non-trivial character sets.  What you
sometimes need is efficient access to a position in the string, but
that can be a "magic cookie" represented using a byte offset.
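
As a rough sketch of the "magic cookie" idea, assuming the string is
held as valid UTF-8 bytes: a position is just a byte offset that always
points at the start of a character, and stepping to the next character
means skipping continuation bytes (those of the form 10xxxxxx).  The
helper names below are hypothetical.

```python
def next_offset(buf, off):
    """Return the byte offset of the character after the one at off."""
    off += 1
    # Continuation bytes in UTF-8 always have the bit pattern 10xxxxxx.
    while off < len(buf) and (buf[off] & 0xC0) == 0x80:
        off += 1
    return off


def char_at(buf, off):
    """Decode the single character that starts at byte offset off."""
    return buf[off:next_offset(buf, off)].decode("utf-8")


def offsets(buf):
    """Yield the byte offset of every character start in buf."""
    off = 0
    while off < len(buf):
        yield off
        off = next_offset(buf, off)
```

A saved offset gives O(1) access to that position later, which is
usually what is wanted; only "the Nth character" requires a scan.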

So using UTF8 is perfectly reasonable.  Using 16-bit Unicode
with surrogates is also perfectly reasonable.  Using arrays
of 32-bit wide characters does not make sense to me (though
I know that glibc maintainer Ulrich Drepper feels strongly
otherwise).
-- 
	--Per Bothner
per@bothner.com   http://www.bothner.com/~per/
