Subject: Re: Unicode width data inconsistent/outdated
From: Thomas Wolff
To: cygwin@cygwin.com
Date: Mon, 07 Aug 2017 19:31:00 -0000
Message-ID: <9f7a8d16-6ebc-52ff-15ae-b1a52d23986b@towo.net>
In-Reply-To: <401b6d26-35cb-3026-afde-6bd5d09b2d71@SystematicSw.ab.ca>

Hi Brian,

On 07.08.2017 at 21:07, Brian Inglis wrote:
> ...
> Implementation considerations for handling the Unicode tables described in
> http://www.unicode.org/versions/Unicode10.0.0/ch05.pdf
> and implemented in
> https://www.strchr.com/multi-stage_tables
>
> ICU icu4[cj] uses a folded trie of the properties, where the unique property
> combinations are indexed, strings of those indices are generated for fixed size
> groups of character codes, unique values of those strings are then indexed, and
> those indices assigned to each character code group. The result is a multi-level
> indexing operation that returns the required property combination for each
> character.
>
> https://slidegur.com/doc/4172411/folded-trie--efficient-data-structure-for-all-of-unicode
>
> The FOX Toolkit uses a similar approach, splitting the 21 bit character code
> into 7 bit groups, with two higher levels of 7 bit indices, and more tweaks to
> eliminate redundancy.
>
> ftp://ftp.fox-toolkit.org/pub/FOX_Unicode_Tables.pdf
>
Thanks for the interesting links, I'll check them out.
But such multi-level tables don't really help without a defined procedure for updating them (one is only available for the lowest level, not for the code-embedded levels).
Also, as I've demonstrated, my more straightforward and more efficient approach will even use less total space than the multi-level approach if packed table entries are used.

Thomas
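
For readers following the thread, here is a minimal sketch of the two lookup styles discussed above. It is not code from newlib, ICU, or FOX. It assumes a flat table of sorted ranges searched by binary search as one plausible reading of a "more straightforward" table (the message does not spell out the actual layout), and it builds a two-stage table with 7-bit blocks from that same range table. The width ranges are a tiny illustrative subset rather than the full Unicode data, and all names (RANGES, width_by_ranges, build_two_stage, width_by_table) are hypothetical.

```python
from bisect import bisect_right

# Tiny illustrative subset of width ranges (NOT the full Unicode data):
# (first code point, last code point, column width)
RANGES = [
    (0x0300, 0x036F, 0),  # Combining Diacritical Marks
    (0x4E00, 0x9FFF, 2),  # CJK Unified Ideographs
    (0xAC00, 0xD7A3, 2),  # Hangul Syllables
]
DEFAULT_WIDTH = 1
BLOCK_BITS = 7                # 128 code points per block (assumed block size)
BLOCK_SIZE = 1 << BLOCK_BITS
MAX_CP = 0x10FFFF

_STARTS = [first for first, _, _ in RANGES]


def width_by_ranges(cp):
    """Flat approach: binary search over sorted, non-overlapping ranges."""
    i = bisect_right(_STARTS, cp) - 1
    if i >= 0 and cp <= RANGES[i][1]:
        return RANGES[i][2]
    return DEFAULT_WIDTH


def build_two_stage(prop):
    """Multi-stage approach: derive both stages from a property function.

    stage2 holds each distinct 128-entry block once;
    stage1 maps a block number to its index in stage2.
    """
    stage1, stage2, seen = [], [], {}
    for base in range(0, MAX_CP + 1, BLOCK_SIZE):
        block = tuple(prop(cp) for cp in range(base, base + BLOCK_SIZE))
        index = seen.get(block)
        if index is None:
            index = len(stage2)
            seen[block] = index
            stage2.append(block)
        stage1.append(index)
    return stage1, stage2


def width_by_table(stage1, stage2, cp):
    """Two array accesses: high bits pick the block, low bits the entry."""
    return stage2[stage1[cp >> BLOCK_BITS]][cp & (BLOCK_SIZE - 1)]


if __name__ == "__main__":
    stage1, stage2 = build_two_stage(width_by_ranges)
    print("range entries:", len(RANGES))
    print("stage1 entries:", len(stage1))
    print("unique stage2 blocks:", len(stage2))
    for cp in (0x41, 0x0301, 0x4E2D, 0xD7A4):
        print(hex(cp), width_by_ranges(cp), width_by_table(stage1, stage2, cp))
```

In this sketch the staged tables are regenerated mechanically from the range table, so only the range table needs editing when the data changes. That is the kind of update procedure the message says is missing for hand-embedded upper levels.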