public inbox for cygwin@cygwin.com
From: Thomas Wolff <towo@towo.net>
To: cygwin@cygwin.com
Subject: Re: Unicode width data inconsistent/outdated
Date: Mon, 07 Aug 2017 19:31:00 -0000
Message-ID: <9f7a8d16-6ebc-52ff-15ae-b1a52d23986b@towo.net>
In-Reply-To: <401b6d26-35cb-3026-afde-6bd5d09b2d71@SystematicSw.ab.ca>

Hi Brian,

On 07.08.2017 at 21:07, Brian Inglis wrote:
> ...
> Implementation considerations for handling the Unicode tables described in
> 	http://www.unicode.org/versions/Unicode10.0.0/ch05.pdf
> and implemented in
> 	https://www.strchr.com/multi-stage_tables
>
> ICU icu4[cj] uses a folded trie of the properties, where the unique property
> combinations are indexed, strings of those indices are generated for fixed size
> groups of character codes, unique values of those strings are then indexed, and
> those indices assigned to each character code group. The result is a multi-level
> indexing operation that returns the required property combination for each
> character.
>
> https://slidegur.com/doc/4172411/folded-trie--efficient-data-structure-for-all-of-unicode
>
> The FOX Toolkit uses a similar approach, splitting the 21 bit character code
> into 7 bit groups, with two higher levels of 7 bit indices, and more tweaks to
> eliminate redundancy.
>
> ftp://ftp.fox-toolkit.org/pub/FOX_Unicode_Tables.pdf
>
Thanks for the interesting links; I'll check them out.
But such multi-level tables don't really help unless there is a documented
procedure for updating them (that is only available for the lowest level,
not for the code-embedded levels).
Also, as I've demonstrated, my more straightforward and more efficient
approach even uses less total space than the multi-level approach if
packed table entries are used.
Thomas
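
For readers unfamiliar with the multi-level tables discussed in the quoted
text, here is a minimal sketch of the idea: a 21-bit code point is split into
three 7-bit groups, and identical 128-entry blocks are stored only once at
each level. This is not the actual ICU or FOX Toolkit code. The names
(`sample_width`, `intern`, `build_tables`, `lookup_width`), the capacity
limits, and the three wide ranges are assumptions made for illustration; a
real table would be generated from the full Unicode data files.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BITS  7
#define BLOCK (1u << BITS)   /* 128 entries per block */
#define MASK  (BLOCK - 1u)
#define MAX_LEAVES 64        /* capacity limits chosen for this sketch only */
#define MAX_MIDS   16

static uint8_t leaves[MAX_LEAVES][BLOCK]; /* lowest level: property values */
static uint8_t mids[MAX_MIDS][BLOCK];     /* middle level: indices into leaves */
static uint8_t top[BLOCK];                /* top level: indices into mids */
static unsigned n_leaves, n_mids;

/* Hypothetical property source: a tiny sample of wide ranges, not real data. */
static uint8_t sample_width(uint32_t cp)
{
    if (cp >= 0x1100 && cp <= 0x115F) return 2; /* Hangul Jamo initials */
    if (cp >= 0x4E00 && cp <= 0x9FFF) return 2; /* CJK Unified Ideographs */
    if (cp >= 0xAC00 && cp <= 0xD7A3) return 2; /* Hangul Syllables */
    return 1;
}

/* Return the index of blk in pool, appending it only if it is not there yet. */
static unsigned intern(uint8_t pool[][BLOCK], unsigned *count, unsigned cap,
                       const uint8_t *blk)
{
    for (unsigned i = 0; i < *count; i++)
        if (memcmp(pool[i], blk, BLOCK) == 0)
            return i;
    assert(*count < cap);
    memcpy(pool[*count], blk, BLOCK);
    return (*count)++;
}

/* Regenerate all three levels from the property source. */
static void build_tables(void)
{
    n_leaves = n_mids = 0;
    for (uint32_t hi = 0; hi < BLOCK; hi++) {
        uint8_t mid[BLOCK];
        for (uint32_t mi = 0; mi < BLOCK; mi++) {
            uint8_t leaf[BLOCK];
            for (uint32_t lo = 0; lo < BLOCK; lo++)
                leaf[lo] = sample_width((hi << 14) | (mi << 7) | lo);
            mid[mi] = (uint8_t)intern(leaves, &n_leaves, MAX_LEAVES, leaf);
        }
        top[hi] = (uint8_t)intern(mids, &n_mids, MAX_MIDS, mid);
    }
}

/* Three dependent array accesses, one per 7-bit group of the code point. */
static uint8_t lookup_width(uint32_t cp)
{
    if (cp > 0x10FFFF)
        return 1; /* outside the Unicode range; default chosen for this sketch */
    return leaves[mids[top[cp >> 14]][(cp >> 7) & MASK]][cp & MASK];
}
```

After calling `build_tables()` once, `lookup_width(0x4E2D)` returns the value
the property source assigned to U+4E2D. Note that all three levels come out of
the same generation step; if only the lowest level can be regenerated, the
upper index levels go stale whenever the data changes.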


