Unicode update of width and other character properties

* Unicode update of width and other character properties
@ 2017-08-06  5:36 Thomas Wolff
  2017-08-07 10:31 ` Corinna Vinschen
  2017-12-02 11:25 ` Ping: " Thomas Wolff
  0 siblings, 2 replies; 15+ messages in thread
From: Thomas Wolff @ 2017-08-06  5:36 UTC (permalink / raw)
  To: newlib

Hi,
this is a proposal to update wcwidth and the character properties 
functions isw*/towupper/towlower to Unicode 10.0, as discussed in the 
mail thread https://cygwin.com/ml/cygwin/2017-07/msg00366.html,
as well as to simplify automatic generation of respective tables for an 
easier update step.
Table size is moderate (using ranges for character properties) but there 
is still an option to reduce the two big tables in size.
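
For illustration only (this is not code from the patch): a range-based
property table can be searched with a binary search over sorted,
non-overlapping intervals. The names interval, wide_sample, in_ranges and
sample_width are hypothetical, and the three ranges are just a small
excerpt, not the generated data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical interval type: an inclusive range of code points. */
struct interval {
    uint32_t first;
    uint32_t last;
};

/* Illustrative excerpt only; must be sorted and non-overlapping. */
static const struct interval wide_sample[] = {
    { 0x1100, 0x115F },   /* Hangul Jamo */
    { 0x2E80, 0x303E },   /* CJK block range, stopping before U+303F */
    { 0xAC00, 0xD7A3 },   /* Hangul Syllables */
};

/* Binary search: returns 1 if ucs lies in one of the n intervals. */
static int in_ranges(uint32_t ucs, const struct interval *table, size_t n)
{
    size_t lo = 0, hi = n;

    assert(table != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (ucs > table[mid].last)
            lo = mid + 1;          /* continue in the upper half */
        else if (ucs < table[mid].first)
            hi = mid;              /* continue in the lower half */
        else
            return 1;              /* first <= ucs <= last */
    }
    return 0;
}

/* Simplified width lookup: 2 for the sample wide ranges, else 1. */
static int sample_width(uint32_t ucs)
{
    size_t n = sizeof wide_sample / sizeof wide_sample[0];
    return in_ranges(ucs, wide_sample, n) ? 2 : 1;
}
```

With ranges, the table grows with the number of intervals rather than
the number of characters, and each lookup costs O(log n) comparisons.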

The patch can be retrieved from http://towo.net/cygwin/charprops10.zip .

The Makefile.widthdata does not yet distinguish the two subdirectories
(libc/string, libc/ctype) as it comes from a common development directory.

There is a test program that compares the isw*/tow* functions
between the current and the patched implementation.

I also provide a log of deviations of the new approach from the current 
implementation (which is based on Unicode 5.2 data), to compare and check.
If there are any disputable cases, I will of course consider them.

My main aim was actually to get the wcwidth data updated, for which the 
benefit of the change is more obvious.

Thanks
Thomas

* Re: Unicode update of width and other character properties
  2017-08-06  5:36 Unicode update of width and other character properties Thomas Wolff
@ 2017-08-07 10:31 ` Corinna Vinschen
  2017-08-07 19:18   ` Thomas Wolff
  2017-12-02 11:25 ` Ping: " Thomas Wolff
  1 sibling, 1 reply; 15+ messages in thread
From: Corinna Vinschen @ 2017-08-07 10:31 UTC (permalink / raw)
  To: newlib

[-- Attachment #1: Type: text/plain, Size: 1024 bytes --]

On Aug  6 07:36, Thomas Wolff wrote:
> Hi,
> this is a proposal to update wcwidth and the character properties functions
> isw*/towupper/towlower to Unicode 10.0, as discussed in the mail thread
> https://cygwin.com/ml/cygwin/2017-07/msg00366.html,
> as well as to simplify automatic generation of respective tables for an
> easier update step.
> Table size is moderate (using ranges for character properties) but there is
> still an option to reduce the two big tables in size.

As per the aforementioned discussion the table sizes are at least
twice as big, so this should be done with all due caution towards
the goals of smaller targets.

> The patch can be retrieved from http://towo.net/cygwin/charprops10.zip .

That's not how it works.  Please create a git patch series and post
it here.

There's probably also a bit more to discuss before changing how this
works since it affects all targets using wide char functions.


Thanks,
Corinna

-- 
Corinna Vinschen
Cygwin Maintainer
Red Hat

* Re: Unicode update of width and other character properties
  2017-08-07 10:31 ` Corinna Vinschen
@ 2017-08-07 19:18   ` Thomas Wolff
  2017-08-08  8:30     ` Corinna Vinschen
  0 siblings, 1 reply; 15+ messages in thread
From: Thomas Wolff @ 2017-08-07 19:18 UTC (permalink / raw)
  To: newlib

On 07.08.2017 at 12:30, Corinna Vinschen wrote:
> On Aug  6 07:36, Thomas Wolff wrote:
>> Hi,
>> this is a proposal to update wcwidth and the character properties functions
>> isw*/towupper/towlower to Unicode 10.0, as discussed in the mail thread
>> https://cygwin.com/ml/cygwin/2017-07/msg00366.html,
>> as well as to simplify automatic generation of respective tables for an
>> easier update step.
>> Table size is moderate (using ranges for character properties) but there is
>> still an option to reduce the two big tables in size.
> As per the aforementioned discussion the table sizes are at least
> twice as big, so this should be done with all due caution towards
> the goals of smaller targets.
If I'm going to implement the packed versions, they will be even smaller 
than the current tables.

>> The patch can be retrieved from http://towo.net/cygwin/charprops10.zip .
> That's not how it works.  Please create a git patch series and post it here.
Is there a howto available, please? What is the git URL, and how do I 
produce the desired patch format/series?
And would the patch then be posted here by email?

Thomas

* Re: Unicode update of width and other character properties
  2017-08-07 19:18   ` Thomas Wolff
@ 2017-08-08  8:30     ` Corinna Vinschen
  2017-08-17 11:03       ` Thomas Wolff
  0 siblings, 1 reply; 15+ messages in thread
From: Corinna Vinschen @ 2017-08-08  8:30 UTC (permalink / raw)
  To: newlib

[-- Attachment #1: Type: text/plain, Size: 1566 bytes --]

On Aug  7 21:18, Thomas Wolff wrote:
> On 07.08.2017 at 12:30, Corinna Vinschen wrote:
> > On Aug  6 07:36, Thomas Wolff wrote:
> > > Hi,
> > > this is a proposal to update wcwidth and the character properties functions
> > > isw*/towupper/towlower to Unicode 10.0, as discussed in the mail thread
> > > https://cygwin.com/ml/cygwin/2017-07/msg00366.html,
> > > as well as to simplify automatic generation of respective tables for an
> > > easier update step.
> > > Table size is moderate (using ranges for character properties) but there is
> > > still an option to reduce the two big tables in size.
> > As per the aforementioned discussion the table sizes are at least
> > twice as big, so this should be done with all due caution towards
> > the goals of smaller targets.
> If I'm going to implement the packed versions, they will be even smaller
> than the current tables.
> 
> > > The patch can be retrieved from http://towo.net/cygwin/charprops10.zip .
> > That's not how it works.  Please create a git patch series and post it here.
> Any howto available, please? What's the git URL,

  https://cygwin.com/git.html

> how to produce the desired patch format/series.

Just as with any other git-based project:

  $ git checkout -b my-stuff
  [hack, hack, hack]
  $ git commit [in useful chunks]
  $ git format-patch -X (X == number of commits)

> And then the patch would be included here by email?

Yes:

$ git send-email --to="newlib@sourceware.org"


Thanks,
Corinna

-- 
Corinna Vinschen
Cygwin Maintainer
Red Hat

* Re: Unicode update of width and other character properties
  2017-08-08  8:30     ` Corinna Vinschen
@ 2017-08-17 11:03       ` Thomas Wolff
  2017-12-03 14:07         ` Corinna Vinschen
  0 siblings, 1 reply; 15+ messages in thread
From: Thomas Wolff @ 2017-08-17 11:03 UTC (permalink / raw)
  To: newlib

[-- Attachment #1: Type: text/plain, Size: 2396 bytes --]

On 08.08.2017 at 10:30, Corinna Vinschen wrote:
> On Aug  7 21:18, Thomas Wolff wrote:
>> On 07.08.2017 at 12:30, Corinna Vinschen wrote:
>>> On Aug  6 07:36, Thomas Wolff wrote:
>>>> Hi,
>>>> this is a proposal to update wcwidth and the character properties functions
>>>> isw*/towupper/towlower to Unicode 10.0, as discussed in the mail thread
>>>> https://cygwin.com/ml/cygwin/2017-07/msg00366.html,
>>>> as well as to simplify automatic generation of respective tables for an
>>>> easier update step.
>>>> Table size is moderate (using ranges for character properties) but there is
>>>> still an option to reduce the two big tables in size.
>>> As per the aforementioned discussion the table sizes are at least
>>> twice as big, so this should be done with all due caution towards
>>> the goals of smaller targets.
>> If I'm going to implement the packed versions, they will be even smaller
>> than the current tables.
>>
>> ...
>> how to produce the desired patch format/series.
> Just as with any other git-based project:
>
>    $ git checkout -b my-stuff
>    [hack, hack, hack]
>    $ git commit [in useful chunks]
>    $ git format-patch -X (X == number of commits)
>
>> And then the patch would be included here by email?
> Yes:
>
> $ git send-email --to="newlib@sourceware.org"
I'm attaching my patches here for assessment.
I have revised table handling further, using gcc bit struct packing. The 
two big tables have a total size of 14340 bytes now, for Unicode 10.0.
I have fixed locale handling in the isw* and tow* functions, but I've 
not yet changed JP conversion. Unfortunately, the routines from 
newlib/iconvdata are not as straightforward to employ as I 
thought, because they work on multi-byte representations.
Also, the mapping of ctype charsets (JIS, SJIS, EUC-JP) to the subsets 
handled in iconvdata (JIS-201/208/212) is a little obscure.
Likewise obscure is the relation between newlib/iconvdata and 
newlib/libc/iconv.
To be on the safe side, I'm leaving the actual jp2uc conversion 
untouched for now, and I've just added a dummy back-conversion uc2jp 
with a #warning. If the #warning is ignored or removed, the non-Cygwin 
build should work as before, fixing just locale handling.
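
To illustrate what gcc bit struct packing can look like (a sketch under
my own assumptions, not the layout used in the patches): a code point
needs at most 21 bits, so a range entry can share a 32-bit word with a
short length and a small property code. The names packed_range,
make_range, range_last and range_contains and the field widths are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical entry layout: 21 + 8 + 3 = 32 bits in total. */
struct packed_range {
    uint32_t first : 21;  /* start code point (Unicode fits in 21 bits) */
    uint32_t extra : 8;   /* number of code points following first */
    uint32_t prop  : 3;   /* small property code; meaning is illustrative */
} __attribute__((packed));

/* Build an entry for the inclusive range first..last. */
static struct packed_range make_range(uint32_t first, uint32_t last,
                                      unsigned prop)
{
    struct packed_range r;

    /* The 8-bit length field limits a single entry to 256 code points. */
    assert(last >= first && last - first <= 255);
    assert(first <= 0x1FFFFF && prop <= 7);
    r.first = first;
    r.extra = last - first;
    r.prop = prop;
    return r;
}

/* Last code point covered by the entry. */
static uint32_t range_last(struct packed_range r)
{
    return (uint32_t) r.first + (uint32_t) r.extra;
}

/* Returns 1 if ucs falls inside the entry's range. */
static int range_contains(struct packed_range r, uint32_t ucs)
{
    return ucs >= (uint32_t) r.first && ucs <= range_last(r);
}
```

In this sketch a longer range would have to be split into several
entries; that trade-off is one reason the real field widths may differ.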

I'm attaching the wcwidth part here; all patches are available at 
http://towo.net/cygwin/Unicode_and_locale_tweaks.zip (they don't fit 
within the mailbox size limit).
Thomas


[-- Attachment #2: 0001-creation-of-width-data-supporting-Unicode-updates.patch --]
[-- Type: text/plain, Size: 26840 bytes --]

From 9c5d6b1adcf949269e3fceeaf31203921745d2c9 Mon Sep 17 00:00:00 2001
From: mintty <mintty@users.noreply.github.com>
Date: Mon, 14 Aug 2017 21:59:25 +0200
Subject: [PATCH 1/4] creation of width data, supporting Unicode updates

---
 newlib/libc/string/Makefile.widthdata |  47 +++
 newlib/libc/string/mkwide             |  49 +++
 newlib/libc/string/mkwidthA           |  20 +
 newlib/libc/string/uniset             | 678 ++++++++++++++++++++++++++++++++++
 4 files changed, 794 insertions(+)
 create mode 100644 newlib/libc/string/Makefile.widthdata
 create mode 100755 newlib/libc/string/mkwide
 create mode 100755 newlib/libc/string/mkwidthA
 create mode 100755 newlib/libc/string/uniset

diff --git a/newlib/libc/string/Makefile.widthdata b/newlib/libc/string/Makefile.widthdata
new file mode 100644
index 0000000..14adab5
--- /dev/null
+++ b/newlib/libc/string/Makefile.widthdata
@@ -0,0 +1,47 @@
+#############################################################################
+# generate Unicode width data for newlib/libc/string/wcwidth.c
+
+
+#############################################################################
+# table sets to be generated
+
+widthdata=combining.t ambiguous.t wide.t
+
+widthdata:	$(widthdata)
+
+
+#############################################################################
+# tools and data
+
+#WGET=wget -N -t 1 --timeout=55
+WGET=curl -R -O --connect-timeout 55
+WGET+=-z $@
+
+%.txt:
+	ln -s /usr/share/unicode/ucd/$@ . || $(WGET) http://unicode.org/Public/UNIDATA/$@
+
+uniset.tar.gz:
+	$(WGET) http://www.cl.cam.ac.uk/~mgk25/download/uniset.tar.gz
+
+uniset:	uniset.tar.gz
+	gzip -dc uniset.tar.gz | tar xvf - uniset
+
+
+#############################################################################
+# width data for libc/string/wcwidth.c
+
+combining.t:	uniset UnicodeData.txt Blocks.txt
+	PATH="${PATH}:." uniset +cat=Me +cat=Mn +cat=Cf -00AD +1160-11FF +200B +D7B0-D7C6 +D7CB-D7FB c > combining.t
+
+WIDTH-A:	uniset UnicodeData.txt Blocks.txt EastAsianWidth.txt
+	PATH="${PATH}:." sh ./mkwidthA
+
+ambiguous.t:	uniset WIDTH-A UnicodeData.txt Blocks.txt
+	PATH="${PATH}:." uniset +WIDTH-A -cat=Me -cat=Mn -cat=Cf c > ambiguous.t
+
+wide.t:	uniset UnicodeData.txt Blocks.txt EastAsianWidth.txt
+	PATH="${PATH}:." sh ./mkwide
+
+
+#############################################################################
+# end
diff --git a/newlib/libc/string/mkwide b/newlib/libc/string/mkwide
new file mode 100755
index 0000000..55a0bab
--- /dev/null
+++ b/newlib/libc/string/mkwide
@@ -0,0 +1,49 @@
+#! /bin/sh
+
+# generate list of wide characters, with convex closure
+
+skipcheck=false
+
+if [ ! -r EastAsianWidth.txt ]
+then	ln -s /usr/share/unicode/ucd/EastAsianWidth.txt . || exit 1
+fi
+if [ ! -r UnicodeData.txt ]
+then	ln -s /usr/share/unicode/ucd/UnicodeData.txt . || exit 1
+fi
+if [ ! -r Blocks.txt ]
+then	ln -s /usr/share/unicode/ucd/Blocks.txt . || exit 1
+fi
+
+sed -e "s,^\([^;]*\);[NAH],\1," -e t -e d EastAsianWidth.txt > wide.na
+sed -e "s,^\([^;]*\);[WF],\1," -e t -e d EastAsianWidth.txt > wide.fw
+
+PATH="$PATH:." # for uniset
+
+nrfw=`uniset +wide.fw nr | sed -e 's,.*:,,'`
+echo FW $nrfw
+nrna=`uniset +wide.na nr | sed -e 's,.*:,,'`
+echo NAH $nrna
+
+extrablocks="2E80-303E"
+
+# check all blocks
+includes () {
+	nr=`uniset +wide.$2 -$1 nr | sed -e 's,.*:,,'`
+	test $nr != $3
+}
+echo "adding compact closure of wide ranges, this may take ~10min"
+for b in $extrablocks `sed -e 's,^\([0-9A-F]*\)\.\.\([0-9A-F]*\).*,\1-\2,' -e t -e d Blocks.txt`
+do	range=$b
+	echo checking $range $* >&2
+	if includes $range fw $nrfw && ! includes $range na $nrna
+	then	echo $range
+	fi
+done > wide.blocks
+
+(
+sed -e "s,^,//," -e 1q EastAsianWidth.txt
+sed -e "s,^,//," -e 1q Blocks.txt
+uniset `sed -e 's,^,+,' wide.blocks` +wide.fw c
+) > wide.t
+
+rm -f wide.na wide.fw wide.blocks
diff --git a/newlib/libc/string/mkwidthA b/newlib/libc/string/mkwidthA
new file mode 100755
index 0000000..343ab40
--- /dev/null
+++ b/newlib/libc/string/mkwidthA
@@ -0,0 +1,20 @@
+#! /bin/sh
+
+# generate WIDTH-A file, listing Unicode characters with width property
+# Ambiguous, from EastAsianWidth.txt
+
+if [ ! -r EastAsianWidth.txt ]
+then	ln -s /usr/share/unicode/ucd/EastAsianWidth.txt . || exit 1
+fi
+if [ ! -r UnicodeData.txt ]
+then	ln -s /usr/share/unicode/ucd/UnicodeData.txt . || exit 1
+fi
+if [ ! -r Blocks.txt ]
+then	ln -s /usr/share/unicode/ucd/Blocks.txt . || exit 1
+fi
+
+sed -e "s,^\([^;]*\);A,\1," -e t -e d EastAsianWidth.txt > width-a-new
+rm -f WIDTH-A
+echo "# UAX #11: East Asian Ambiguous" > WIDTH-A
+PATH="$PATH:." uniset +width-a-new compact >> WIDTH-A
+rm -f width-a-new
diff --git a/newlib/libc/string/uniset b/newlib/libc/string/uniset
new file mode 100755
index 0000000..415e219
--- /dev/null
+++ b/newlib/libc/string/uniset
@@ -0,0 +1,678 @@
+#!/usr/bin/perl
+# Uniset -- Unicode subset manager -- Markus Kuhn
+# http://www.cl.cam.ac.uk/~mgk25/download/uniset.tar.gz
+# $Id: uniset,v 1.18 2004-04-10 21:19:39+01 mgk25 Exp mgk25 $
+
+require 5.008;
+use open ':utf8';
+
+binmode(STDOUT, ":utf8");
+binmode(STDIN, ":utf8");
+
+my (%name, %invname, %category, %comment);
+
+print <<End if $#ARGV < 0;
+Uniset -- Unicode subset manager -- Markus Kuhn
+
+Uniset allows to merge and subtract Unicode subsets. It can output and
+analyse the resulting set in various formats.
+
+The following commands can be supplied to uniset on the command line:
+
+Commands to define a set of characters:
+
+  + filename   add the character set described in the file to the set
+  - filename   remove the character set described in the file from the set
+  +: filename  add the characters in the UTF-8 file to the set
+  -: filename  remove the characters in the UTF-8 file from the set
+  +xxxx..yyyy  add the range to the set (xxxx and yyyy are hex numbers)
+  -xxxx..yyyy  remove the range from the set (xxxx and yyyy are hex numbers)
+  +cat=Xx      add all Unicode characters with category code Xx
+  -cat=Xx      remove all Unicode characters with category code Xx
+  -cat!=Xx     remove all Unicode characters without category code Xx
+  clean        remove any elements that do not appear in the Unicode database
+  unknown      remove any elements that do appear in the Unicode database
+
+Command to output descriptions of the constructed set of characters:
+
+  table        write a full table with one line per character
+  compact      output the set in compact MES format
+  c            output the set as C interval array
+  nr           output the number of characters
+  sources      output a table that shows the number of characters contributed
+               by the various combinations of input sets added with +.
+  utf8-list    output a list of all characters encoded in UTF-8
+
+Commands to tailor the following output commands:
+
+  html         write HTML tables instead of plain text
+  ucs          add the unicode character itself to the table (UTF-8 in
+               plain table, numeric character reference in HTML)
+
+Formats of character set input files read by the + and - command:
+
+Empty lines, white space at the start and end of the line and any
+comment text following a \# are ignored. The following formats are
+recognized
+
+xx yyyy             xx is the hex code in an 8-bit character set and yyyy
+                    is the corresponding Unicode value. Both can optionally
+                    be prefixed by 0x. This is the format used in the
+                    files on <ftp://ftp.unicode.org/Public/MAPPINGS/>.
+
+yyyy                yyyy (optionally prefixed with 0x) is a Unicode character
+                    belonging to the specified subset.
+
+yyyy-yyyy           a range of Unicode characters belonging to
+yyyy..yyyy          the specified subset.
+
+xx yy yy yy-yy yy   xx denotes a row (high-byte) and the yy specify
+                    corresponding low bytes or with a hyphen also ranges of
+                    low bytes in the Unicode values that belong to this
+                    subset. This is also the format that is generated by
+                    the compact command.
+End
+exit 1 if $#ARGV < 0;
+
+
+# Subroutine to identify whether the ISO 10646/Unicode character code
+# ucs belongs into the East Asian Wide (W) or East Asian FullWidth
+# (F) category as defined in Unicode Technical Report #11.
+
+sub iswide ($) {
+    my $ucs = shift(@_);
+
+    return ($ucs >= 0x1100 &&
+	    ($ucs <= 0x115f ||                     # Hangul Jamo
+	     $ucs == 0x2329 || $ucs == 0x232a ||
+	     ($ucs >= 0x2e80 && $ucs <= 0xa4cf &&
+	      $ucs != 0x303f) ||                   # CJK .. Yi
+	     ($ucs >= 0xac00 && $ucs <= 0xd7a3) || # Hangul Syllables
+	     ($ucs >= 0xf900 && $ucs <= 0xfaff) || # CJK Comp. Ideographs
+	     ($ucs >= 0xfe30 && $ucs <= 0xfe6f) || # CJK Comp. Forms
+	     ($ucs >= 0xff00 && $ucs <= 0xff60) || # Fullwidth Forms
+	     ($ucs >= 0xffe0 && $ucs <= 0xffe6) ||
+	     ($ucs >= 0x20000 && $ucs <= 0x2fffd) ||
+	     ($ucs >= 0x30000 && $ucs <= 0x3fffd)));
+}
+
+# Return the Unicode name that belongs to a given character code
+
+# Jamo short names, see Unicode 3.0, table 4-4, page 86
+
+my @lname = ('G', 'GG', 'N', 'D', 'DD', 'R', 'M', 'B', 'BB', 'S', 'SS', '',
+	     'J', 'JJ', 'C', 'K', 'T', 'P', 'H'); # 1100..1112
+my @vname = ('A', 'AE', 'YA', 'YAE', 'EO', 'E', 'YEO', 'YE', 'O',
+	     'WA', 'WAE', 'OE', 'YO', 'U', 'WEO', 'WE', 'WI', 'YU',
+	     'EU', 'YI', 'I'); # 1161..1175
+my @tname = ('G', 'GG', 'GS', 'N', 'NJ', 'NH', 'D', 'L', 'LG', 'LM',
+	     'LB', 'LS', 'LT', 'LP', 'LH', 'M', 'B', 'BS', 'S', 'SS',
+	     'NG', 'J', 'C', 'K', 'T', 'P', 'H'); # 11a8..11c2
+
+sub name {
+    my $ucs = shift(@_);
+    
+    # The intervals used here reflect Unicode Version 3.2
+    if (($ucs >=  0x3400 && $ucs <=  0x4db5) ||
+	($ucs >=  0x4e00 && $ucs <=  0x9fa5) ||
+	($ucs >= 0x20000 && $ucs <= 0x2a6d6)) {
+	return "CJK UNIFIED IDEOGRAPH-" . sprintf("%04X", $ucs);
+    }
+    
+    if ($ucs >= 0xac00 && $ucs <= 0xd7a3) {
+	my $s = $ucs - 0xac00;
+	my $l = 0x1100 + int($s / (21 * 28));
+	my $v = 0x1161 + int(($s % (21 * 28)) / 28);
+	my $t = 0x11a7 + $s % 28;
+	return "HANGUL SYLLABLE " . 
+	    ($lname[int($s / (21 * 28))] .
+	     $vname[int(($s % (21 * 28)) / 28)] .
+	     $tname[$s % 28 - 1]);
+    }
+    
+    return $name{$ucs};
+}
+
+sub is_unicode {
+    my $ucs = shift(@_);
+
+    # The intervals used here reflect Unicode Version 3.2
+    if (($ucs >=  0x3400 && $ucs <=  0x4db5) ||
+	($ucs >=  0x4e00 && $ucs <=  0x9fa5) ||
+	($ucs >=  0xac00 && $ucs <=  0xd7a3) ||
+	($ucs >= 0x20000 && $ucs <= 0x2a6d6)) {
+	return 1;
+    }
+    
+    return exists $name{$ucs};
+}
+
+
+my $html = 0;
+my $image = 0;
+my $adducs = 0;
+my $unicodedata = "UnicodeData.txt";
+my $blockdata = "Blocks.txt";
+my $datadir = "$ENV{HOME}/local/lib/ucs";
+
+# read list of all Unicode names
+if (!open(UDATA, $unicodedata) && !open(UDATA, "$datadir/$unicodedata")) {
+    die ("Can't open Unicode database '$unicodedata':\n$!\n\n" .
+	 "Please make sure that you have downloaded the file\n" .
+	 "ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt\n");
+}
+while (<UDATA>) {
+    if (/^([0-9,A-F]{4,8});([^;]*);([^;]*);([^;]*);([^;]*);([^;]*);([^;]*);([^;]*);([^;]*);([^;]*);([^;]*);([^;]*);([^;]*);([^;]*);([^;]*)$/) {
+	next if $2 ne '<control>' && substr($2, 0, 1) eq '<';
+	$ucs = hex($1);
+        $name{$ucs} = $2;
+	$invname{$2} = $ucs;
+	$category{$ucs} = $3;
+        $comment{$ucs} = $12;
+    } else {
+        die("Syntax error in line '$_' in file '$unicodedata'");
+    }
+}
+close(UDATA);
+
+# read list of all Unicode blocks
+if (!open(UDATA, $blockdata) && !open(UDATA, "$datadir/$blockdata")) {
+    die ("Can't open Unicode blockname list '$blockdata':\n$!\n\n" .
+	 "Please make sure that you have downloaded the file\n" .
+	 "ftp://ftp.unicode.org/Public/UNIDATA/Blocks.txt\n");
+}
+my $blocks = 0;
+my (@blockstart, @blockend, @blockname);
+while (<UDATA>) {
+    if (/^\s*([0-9,A-F]{4,8})\s*\.\.\s*([0-9,A-F]{4,8})\s*;\s*(.*)$/) {
+        $blockstart[$blocks] = hex($1);
+	$blockend  [$blocks] = hex($2);
+        $blockname [$blocks] = $3;
+	$blocks++;
+    } elsif (/^\s*\#/ || /^\s*$/) {
+	# ignore comments and empty lines
+    } else {
+        die("Syntax error in line '$_' in file '$blockdata'");
+    }
+}
+close(UDATA);
+if ($blockend[$blocks-1] < 0x110000) {
+    $blockstart[$blocks] = 0x110000;
+    $blockend  [$blocks] = 0x7FFFFFFF;
+    $blockname [$blocks] = "Beyond Plane 16";
+    $blocks++;
+}
+
+# process command line arguments
+while ($_ = shift(@ARGV)) {
+    if (/^html$/) {
+	$html = 1;
+    } elsif (/^ucs$/) {
+	$adducs = 1;
+    } elsif (/^img$/) {
+	$html = 1;
+	$image = 1;
+    } elsif (/^template$/) {
+	$template = shift(@ARGV);
+	open(TEMPLATE, $template) || die("Can't open template file '$template': '$!'");
+	while (<TEMPLATE>) {
+	    if (/^\#\s*include\s+\"([^\"]*)\"\s*$/) {
+		open(INCLUDE, $1) || die("Can't open template include file '$1': '$!'");
+		while (<INCLUDE>) {
+		    print $_;
+		}
+		close(INCLUDE);
+	    } elsif (/^\#\s*quote\s+\"([^\"]*)\"\s*$/) {
+		open(INCLUDE, $1) || die("Can't open template include file '$1': '$!'");
+		while (<INCLUDE>) {
+		    s/&/&amp;/g;
+		    s/</&lt;/g;
+		    print $_;
+		}
+		close(INCLUDE);
+	    } else {
+		print $_;
+	    }
+	}
+	close(TEMPLATE);
+    } elsif (/^\+cat=(.+)$/) {
+	# add characters with given category
+	$cat = $1;
+	for $i (keys(%category)) {
+	    $used{$i} = "[${cat}]" if $category{$i} eq $cat;
+	}
+    } elsif (/^\-cat=(.+)$/) {
+	# remove characters with given category
+	$cat = $1;
+	for $i (keys(%category)) {
+	    delete $used{$i} if $category{$i} eq $cat;
+	}
+    } elsif (/^\-cat!=(.+)$/) {
+	# remove characters without given category
+	$cat = $1;
+	for $i (keys(%category)) {
+	    delete $used{$i} unless $category{$i} eq $cat;
+	}
+    } elsif (/^([+-]):(.*)/) {
+	$remove = $1 eq "-";
+	$setfile = $2;
+	$setfile = shift(@ARGV) if $setfile eq "";
+	push(@SETS, $setfile);
+	open(SET, $setfile) || die("Can't open set file '$setfile': '$!'");
+	$setname = $setfile;
+	while (<SET>) {
+	    while ($_) {
+		$i = ord($_);
+		$used{$i} .= "[${setname}]" unless $remove;
+		delete $used{$i} if $remove;
+		$_ = substr($_, 1);
+	    }
+	}
+	close SET;
+    } elsif (/^([+-])(.*)/) {
+	$remove = $1 eq "-";
+	$setfile = $2;
+	$setfile = "$setfile..$setfile" if $setfile =~ /^([0-9A-Fa-f]{4,8})$/;
+	if ($setfile =~ /^([0-9A-Fa-f]{4,8})(-|\.\.)([0-9A-Fa-f]{4,8})$/) {
+	    # handle intervall specification on command line
+	    $first = hex($1);
+	    $last = hex($3);
+	    for ($i = $first; $i <= $last; $i++) {
+		$used{$i} .= "[ARG]" unless $remove;
+		delete $used{$i} if $remove;
+	    }
+	    next;
+	}
+	$setfile = shift(@ARGV) if $setfile eq "";
+	push(@SETS, $setfile);
+	open(SET, $setfile) || die("Can't open set file '$setfile': '$!'");
+	$cedf = ($setfile =~ /cedf/); # detect Kosta Kosti's trans CEDF format by path name
+	$setname = $setfile;
+	$setname =~ s/([^.\[\]]*)\..*/$1/;
+	while (<SET>) {
+	    if (/^<code_set_name>/) {
+		# handle ISO 15897 (POSIX registry) charset mapping format
+		undef $comment_char;
+		undef $escape_char;
+		while (<SET>) {
+		    if ($comment_char && /^$comment_char/) {
+			# remove comments
+			$_ = $`;
+		    }
+		    next if (/^\032?\s*$/);                                             # skip empty lines
+		    if (/^<comment_char> (\S)$/) {
+			$comment_char = $1;
+		    } elsif (/^<escape_char> (\S)$/) {
+			$escape_char = $1;
+		    } elsif (/^(END )?CHARMAP$/) {
+			#ignore
+		    } elsif (/^<.*>\s*\/x([0-9A-F]{2})\s*<U([0-9A-F]{4,8})>/) {
+			$used{hex($2)} .= "[${setname}{$1}]" unless $remove;
+			delete $used{hex($2)} if $remove;
+		    } else {
+			die("Syntax error in line $. in file '$setfile':\n'$_'\n");
+		    }
+		}
+		next;
+	    } elsif (/^STARTFONT /) {
+		# handle X11 BDF file
+		while (<SET>) {
+		    if (/^ENCODING\s+([0-9]+)/) { 
+			$used{$1} .= "[${setname}]" unless $remove;
+			delete $used{$1} if $remove;
+		    }
+		}
+		next;
+	    }
+	    tr/a-z/A-Z/;           # make input uppercase
+	    if ($cedf) {
+		if ($. > 4) {
+		    if (/^([0-9A-F]{2})\t.?\t(.*)$/) {
+			# handle Kosta Kosti's trans CEDF format
+			next if (hex($1) < 32 || (hex($1) > 0x7e && hex($1) < 0xa0));
+			$ucs = $invname{$2};
+			die "unknown ISO 10646 name '$2' in '$setfile' line $..\n" if ! $ucs;
+			$used{$ucs} .= "[${setname}{$1}]" unless $remove;
+			delete $used{$ucs} if $remove;
+		    } else {
+			die("Syntax error in line $. in CEDF file '$setfile':\n'$_'\n");
+		    }
+		}
+		next;
+	    }
+	    if (/^\s*(0X|U\+|U-)?([0-9A-F]{2})\s+\#\s*UNDEFINED\s*$/) {
+		# ignore ftp.unicode.org mapping file lines with #UNDEFINED
+		next;
+	    }
+	    s/^([^\#]*)\#.*$/$1/;  # remove comments
+	    next if (/^\032?\s*$/);     # skip empty lines
+	    if (/^\s*(0X)?([0-9A-F-]{2})\s+(0X|U\+|U-)?([0-9A-F]{4,8})\s*$/) {
+		# handle entry from a ftp.unicode.org mapping file
+		$used{hex($4)} .= "[${setname}{$2}]" unless $remove;
+		delete $used{hex($4)} if $remove;
+	    } elsif (/^\s*(0X|U\+|U-)?([0-9A-F]{4,8})(\s*-\s*|\s*\.\.\s*|\s+)(0X|U\+|U-)?([0-9A-F]{4,8})\s*$/) {
+		# handle interval specification
+		$first = hex($2);
+		$last = hex($5);
+		for ($i = $first; $i <= $last; $i++) {
+		    $used{$i} .= "[${setname}]" unless $remove;
+		    delete $used{$i} if $remove;
+		}
+	    } elsif (/^\s*([0-9A-F]{2,6})(\s+[0-9A-F]{2},?|\s+[0-9A-F]{2}-[0-9A-F]{2},?)+/) {
+		# handle lines from P10 MES draft
+		$row = $1;
+		$cols = $_;
+		$cols =~ s/^\s*([0-9A-F]{2,6})\s*(.*)\s*$/$2/;
+		$cols =~ tr/,//d;
+		@cols = split(/\s+/, $cols);
+		for (@cols) {
+		    if (/^(..)$/) {
+			$first = hex("$row$1");
+			$last  = $first;
+		    } elsif (/^(..)-(..)$/) {
+			$first = hex("$row$1");
+			$last  = hex("$row$2");
+		    } else {
+			die ("this should never happen '$_'");
+		    }
+		    for ($i = $first; $i <= $last; $i++) {
+			$used{$i} .= "[${setname}]" unless $remove;
+			delete $used{$i} if $remove;
+		    }
+		}
+	    } elsif (/^\s*(0X|U\+|U-)?([0-9A-F]{4,8})\s*/) {
+		# handle single character
+		$used{hex($2)} .= "[${setname}]" unless $remove;
+		delete $used{hex($2)} if $remove;
+	    } else {
+		die("Syntax error in line $. in file '$setfile':\n'$_'\n") unless /^\s*(\#.*)?$/;
+	    }
+	}
+	close SET;
+    } elsif (/^loadimages$/ || /^loadbigimages$/) {
+	if (/^loadimages$/) {
+	    $prefix = "Small.Glyphs";
+	} else {
+	    $prefix = "Glyphs";
+	}
+	$total = 0;
+	for $i (keys(%used)) {
+	    next if ($name{$i} eq "<control>");
+	    $total++;
+	}
+	$count = 0;
+	$| = 1;
+	for $i (sort({$a <=> $b} keys(%used))) {
+	    next if ($name{$i} eq "<control>");
+	    $count++;
+	    $j = sprintf("%04X", $i);
+	    $j =~ /(..)(..)/;
+	    $gif = "http://charts.unicode.org/Unicode.charts/$prefix/$1/U$j.gif";
+	    print("\r$count/$total: $gif");
+	    system("mkdir -p $prefix/$1; cd $prefix/$1; webcopy -u -s $gif &");
+	    select(undef, undef, undef, 0.2);
+	}
+	print("\n");
+	exit 0;
+    } elsif (/^giftable/) {
+	# form a table of glyphs (requires pbmtools installed)
+	$count = 0;
+	for $i (keys(%used)) {
+	    $count++ unless $name{$i} eq "<control>";
+	}
+	$width = int(sqrt($count/sqrt(2)) + 0.5);
+	$width = $1 if /^giftable([0-9]+)$/;
+	system("rm -f tmp-*.pnm table.pnm~ table.pnm");
+	$col = 0;
+	$row = 0;
+	for $i (sort({$a <=> $b} keys(%used))) {
+	    next if ($name{$i} eq "<control>");
+	    $j = sprintf("%04X", $i);
+	    $j =~ /(..)(..)/;
+	    $gif = "Small.Glyphs/$1/U$j.gif";
+	    $pnm = sprintf("tmp-%02x.pnm", $col);
+	    $fallback = "Small.Glyphs/FF/UFFFD.gif";
+	    system("giftopnm $gif >$pnm || { rm $pnm ; giftopnm $fallback >$pnm ; }");
+	    if (++$col == $width) {
+		system("pnmcat -lr tmp-*.pnm | cat >tmp-row.pnm");
+		if ($row == 0) {
+		    system("mv tmp-row.pnm table.pnm");
+		} else {
+		    system("mv table.pnm table.pnm~; pnmcat -tb table.pnm~ tmp-row.pnm >table.pnm");
+		}
+		$row++;
+		$col = 0;
+		system("rm -f tmp-*.pnm table.pnm~");
+	    }
+	}
+	if ($col > 0) {
+	    system("pnmcat -lr tmp-*.pnm | cat >tmp-row.pnm");
+	    if ($row == 0) {
+		system("mv tmp-row.pnm table.pnm");
+	    } else {
+		system("mv table.pnm table.pnm~; pnmcat -tb -jleft -black table.pnm~ tmp-row.pnm >table.pnm");
+	    }
+	}
+	system("rm -f table.gif ; ppmtogif table.pnm > table.gif");
+	system("rm -f tmp-*.pnm table.pnm~ table.pnm");
+    } elsif (/^table$/) {
+	# go through all used names to print full table
+	print "<TABLE border=2>\n" if $html;
+	for $i (sort({$a <=> $b} keys(%used))) {
+	    next if ($name{$i} eq "<control>");
+	    if ($html) {
+		$sources = $used{$i};
+		$sources =~ s/\]\[/, /g;
+		$sources =~ s/^\[//g;
+		$sources =~ s/\]$//g;
+		$sources =~ s/\{(..)\}/<SUB>$1<\/SUB>/g;
+		$j = sprintf("%04X", $i);
+		$j =~ /(..)(..)/;
+		$gif = "Small.Glyphs/$1/U$j.gif";
+		print "<TR>";
+		print "<TD><img width=32 height=32 src=\"$gif\">" if $image;
+		printf("<TD>&#%d;", $i) if $adducs;
+		print "<TD><SAMP>$j</SAMP><TD><SAMP>" . name($i);
+		print " ($comment{$i})" if $comment{$i};
+		print "</SAMP><TD><SMALL>$sources</SMALL>\n";
+	    } else {
+		printf("%04X \# ", $i);
+		print pack("U", $i) . " " if $adducs;
+		print name($i) ."\n";
+	    }
+	}
+	print "</TABLE>\n" if $html;
+    } elsif (/^imgblock$/) {
+	$width = 16;
+	$width = $1 if /giftable([0-9]+)/;
+	$col = 0;
+	$subline = "";
+	print "\n<P><TABLE cellspacing=0 cellpadding=0>";
+	for $i (sort({$a <=> $b} keys(%used))) {
+	    print "<TR>" if $col == 0;
+	    $j = sprintf("%04X", $i);
+	    $j =~ /(..)(..)/;
+	    $gif = "Small.Glyphs/$1/U$j.gif";
+	    $alt = name($i);
+	    print "<TD><img width=32 height=32 src=\"$gif\" alt=\"$alt\">";
+	    $subline .= "<TD><SMALL><SAMP>$j</SAMP></SMALL>";
+	    if (++$col == $width) {
+		print "<TR align=center>$subline";
+		$col = 0;
+		$subline = "";
+	    }
+	}
+	print "<TR align=center>$subline" if ($col > 0);
+	print "</TABLE>\n";
+    } elsif (/^sources$/) {
+	# count how many characters are attributed to the various source set combinations
+	print "<P>Number of occurences of source character set combinations:\n<TABLE border=2>" if $html;
+	for $i (keys(%used)) {
+	    next if ($name{$i} eq "<control>");
+	    $sources = $used{$i};
+	    $sources =~ s/\]\[/, /g;
+	    $sources =~ s/^\[//g;
+	    $sources =~ s/\]$//g;
+	    $sources =~ s/\{(..)\}//g;
+	    $contribs{$sources} += 1;
+	}
+	for $j (keys(%contribs)) {
+	    print "<TR><TD>$contribs{$j}<TD>$j\n" if $html;
+	}
+	print "</TABLE>\n" if $html;
+    } elsif (/^compact$/) {
+	# print compact table in P10 MES format
+	print "<P>Compact representation of this character set:\n<TABLE border=2>" if $html;
+	print "<TR><TD><B>Rows</B><TD><B>Positions (Cells)</B>" if $html;
+	print "\n# Plane 00\n# Rows\tPositions (Cells)\n" unless $html;
+	$current_row = '';
+	$start_col = '';
+	$last_col = '';
+	for $i (sort({$a <=> $b} keys(%used))) {
+	    next if ($name{$i} eq "<control>");
+	    $row = sprintf("%02X", $i >> 8);
+	    $col = sprintf("%02X", $i & 0xff);
+	    if ($row ne $current_row) {
+		if (($last_col ne '') and ($last_col ne $start_col)) {
+		    print "-$last_col";
+		    print "</SAMP>" if $html;
+		}
+		print "<TR><TD><SAMP>$row</SAMP><TD><SAMP>" if $html;
+		print "\n  $row\t" unless $html;
+		$len = 0;
+		$current_row = $row;
+		$start_col = '';
+	    }
+	    if ($start_col eq '') {
+		print "$col";
+		$len += 2;
+		$start_col = $col;
+		$last_col = $col;
+	    } elsif (hex($col) == hex($last_col) + 1) {
+		$last_col = $col;
+	    } else {
+		if ($last_col ne $start_col) {
+		    print "-$last_col";
+		    $len += 3;
+		}
+		if ($len > 60 && !$html) {
+		    print "\n  $row\t";
+		    $len = 0;
+		};
+		print " " if $len;
+		print "$col";
+		$len += 2 + !! $len;
+		$start_col = $col;
+		$last_col = $col;
+	    }
+	}
+	if (($last_col ne '') and ($last_col ne $start_col)) {
+	    print "-$last_col";
+	    print "</SAMP>" if $html;
+	}
+	print "\n" if ($current_row ne '');
+	print "</TABLE>\n" if $html;
+	print "\n";
+    } elsif (/^c$/) {
+	# print table as C interval array
+	print "{";
+	$last_i = '';
+	$columns = 3;
+	$col = $columns;
+	for $i (sort({$a <=> $b} keys(%used))) {
+	    next if ($name{$i} eq "<control>");
+	    if ($last_i eq '') {
+		if (++$col > $columns) { $col = 1; print "\n "; }
+		printf(" { 0x%04X, ", $i);
+		$last_i = $i;
+	    } elsif ($i == $last_i + 1) {
+		$last_i = $i;
+	    } else {
+		printf("0x%04X },", $last_i);
+		if (++$col > $columns) { $col = 1; print "\n "; }
+		printf(" { 0x%04X, ", $i);
+		$last_i = $i;
+	    }
+	}
+	if ($last_i ne '') {
+	    printf("0x%04X }", $last_i);
+	}
+	print "\n};\n";
+    } elsif (/^utf8-list$/) {
+	$col = 0;
+	$block = 0;
+	$last = -1;
+	for $i (sort({$a <=> $b} keys(%used))) {
+	    next if ($name{$i} eq "<control>");
+	    while ($blockend[$block] < $i && $block < $blocks - 1) {
+		$block++;
+	    }
+	    if ($last <= $blockend[$block-1] &&
+		$i < $blockstart[$block]) {
+		print "\n" if ($col);
+		printf "\nFree block (U+%04X-U+%04X):\n\n",
+		    $blockend[$block-1] + 1, $blockstart[$block] - 1;
+		$col = 0;
+	    }
+	    if ($last < $blockstart[$block] && $i >= $blockstart[$block]) {
+		print "\n" if ($col);
+		printf "\n$blockname[$block] (U+%04X-U+%04X):\n\n",
+		$blockstart[$block], $blockend[$block];
+		$col = 0;
+	    }
+	    if ($category{$i} eq 'Mn') {
+		# prefix non-spacing character with U+25CC DOTTED CIRCLE
+		print "\x{25CC}";
+	    } elsif ($category{$i} eq 'Me') {
+		# prefix enclosing non-spacing character with space
+		print " ";
+	    }
+	    print pack("U", $i);
+	    $col += 1 + iswide($i);
+	    if ($col >= 64) {
+		print "\n";
+		$col = 0;
+	    }
+	    $last = $i;
+	}
+	print "\n" if ($col);
+    } elsif (/^collections$/) {
+	$block = 0;
+	$last = -1;
+	for $i (sort({$a <=> $b} keys(%used))) {
+	    next if ($name{$i} eq "<control>");
+	    while ($blockend[$block] < $i && $block < $blocks - 1) {
+		$block++;
+	    }
+	    if ($last < $blockstart[$block] && $i >= $blockstart[$block]) {
+		print $blockname[$block],
+		  " " x (40 - length($blockname[$block]));
+		printf "%04X-%04X\n",
+		  $blockstart[$block], $blockend[$block];
+	    }
+	    $last = $i;
+	}
+    } elsif (/^nr$/) {
+	print "<P>" if $html;
+	print "# " unless $html;
+	print "Number of characters in above table: ";
+	$count = 0;
+	for $i (keys(%used)) {
+	    $count++ unless $name{$i} eq "<control>";
+	}
+	print $count;
+	print "\n";
+    } elsif (/^clean$/) {
+	# remove characters from set that are not in $unicodedata
+	for $i (keys(%used)) {
+	    delete $used{$i} unless is_unicode($i);
+	}
+    } elsif (/^unknown$/) {
+	# remove characters from set that are in $unicodedata
+	for $i (keys(%used)) {
+	    delete $used{$i} if is_unicode($i);
+	}
+    } else {
+	die("Unknown command line command '$_'");
+    };
+}
-- 
2.13.2


[-- Attachment #3: 0002-generated-width-data-included-in-repository-because-.patch --]
[-- Type: text/plain, Size: 22167 bytes --]

From 00c7da38274b433f952a87732e58f2e22fc5229e Mon Sep 17 00:00:00 2001
From: mintty <mintty@users.noreply.github.com>
Date: Mon, 14 Aug 2017 22:00:44 +0200
Subject: [PATCH 2/4] generated width data, included in repository because of
 long creation time

---
 newlib/libc/string/WIDTH-A     | 569 +++++++++++++++++++++++++++++++++++++++++
 newlib/libc/string/ambiguous.t |  61 +++++
 newlib/libc/string/combining.t | 107 ++++++++
 newlib/libc/string/wide.t      |  33 +++
 4 files changed, 770 insertions(+)
 create mode 100644 newlib/libc/string/WIDTH-A
 create mode 100644 newlib/libc/string/ambiguous.t
 create mode 100644 newlib/libc/string/combining.t
 create mode 100644 newlib/libc/string/wide.t
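
The .t files added here are sorted lists of closed, non-overlapping
intervals, which is what the generator's "c" command emits by merging
runs of consecutive code points. A minimal sketch of that merging step
in C, assuming sorted, unique input; collapse_runs and this layout of
struct interval are illustrative names only, not code from this patch:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative layout; the real struct in wcwidth.c may differ. */
struct interval { unsigned long first, last; };

/* Collapse a sorted array of unique code points into closed intervals.
   'out' must have room for n entries (worst case: no two points adjacent).
   Returns the number of intervals written. */
static size_t collapse_runs(const unsigned long *cp, size_t n,
                            struct interval *out)
{
  size_t count = 0;
  assert(cp != NULL && out != NULL);
  for (size_t i = 0; i < n; i++) {
    if (count > 0 && cp[i] == out[count - 1].last + 1) {
      /* extends the current run */
      out[count - 1].last = cp[i];
    } else {
      /* gap: start a new interval */
      out[count].first = cp[i];
      out[count].last = cp[i];
      count++;
    }
  }
  return count;
}
```

Fed the code points 00A1 00A4 00A7 00A8 00AA, this yields four
intervals, matching the first entries of ambiguous.t.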

diff --git a/newlib/libc/string/WIDTH-A b/newlib/libc/string/WIDTH-A
new file mode 100644
index 0000000..51e8f23
--- /dev/null
+++ b/newlib/libc/string/WIDTH-A
@@ -0,0 +1,569 @@
+# UAX #11: East Asian Ambiguous
+
+# Plane 00
+# Rows	Positions (Cells)
+
+  00	A1 A4 A7-A8 AA AD-AE B0-B4 B6-BA BC-BF C6 D0 D7-D8 DE-E1 E6 E8-EA
+  00	EC-ED F0 F2-F3 F7-FA FC FE
+  01	01 11 13 1B 26-27 2B 31-33 38 3F-42 44 48-4B 4D 52-53 66-67 6B
+  01	CE D0 D2 D4 D6 D8 DA DC
+  02	51 61 C4 C7 C9-CB CD D0 D8-DB DD DF
+  03	00-6F 91-A1 A3-A9 B1-C1 C3-C9
+  04	01 10-4F 51
+  20	10 13-16 18-19 1C-1D 20-22 24-27 30 32-33 35 3B 3E 74 7F 81-84
+  20	AC
+  21	03 05 09 13 16 21-22 26 2B 53-54 5B-5E 60-6B 70-79 89 90-99 B8-B9
+  21	D2 D4 E7
+  22	00 02-03 07-08 0B 0F 11 15 1A 1D-20 23 25 27-2C 2E 34-37 3C-3D
+  22	48 4C 52 60-61 64-67 6A-6B 6E-6F 82-83 86-87 95 99 A5 BF
+  23	12
+  24	60-E9 EB-FF
+  25	00-4B 50-73 80-8F 92-95 A0-A1 A3-A9 B2-B3 B6-B7 BC-BD C0-C1 C6-C8
+  25	CB CE-D1 E2-E5 EF
+  26	05-06 09 0E-0F 1C 1E 40 42 60-61 63-65 67-6A 6C-6D 6F 9E-9F BF
+  26	C6-CD CF-D3 D5-E1 E3 E8-E9 EB-F1 F4 F6-F9 FB-FC FE-FF
+  27	3D 76-7F
+  2B	56-59
+  32	48-4F
+  E0	00-FF
+  E1	00-FF
+  E2	00-FF
+  E3	00-FF
+  E4	00-FF
+  E5	00-FF
+  E6	00-FF
+  E7	00-FF
+  E8	00-FF
+  E9	00-FF
+  EA	00-FF
+  EB	00-FF
+  EC	00-FF
+  ED	00-FF
+  EE	00-FF
+  EF	00-FF
+  F0	00-FF
+  F1	00-FF
+  F2	00-FF
+  F3	00-FF
+  F4	00-FF
+  F5	00-FF
+  F6	00-FF
+  F7	00-FF
+  F8	00-FF
+  FE	00-0F
+  FF	FD
+  1F1	00-0A 10-2D 30-69 70-8D 8F-90 9B-AC
+  E01	00-EF
+  F00	00-FF
+  F01	00-FF
+  F02	00-FF
+  F03	00-FF
+  F04	00-FF
+  F05	00-FF
+  F06	00-FF
+  F07	00-FF
+  F08	00-FF
+  F09	00-FF
+  F0A	00-FF
+  F0B	00-FF
+  F0C	00-FF
+  F0D	00-FF
+  F0E	00-FF
+  F0F	00-FF
+  F10	00-FF
+  F11	00-FF
+  F12	00-FF
+  F13	00-FF
+  F14	00-FF
+  F15	00-FF
+  F16	00-FF
+  F17	00-FF
+  F18	00-FF
+  F19	00-FF
+  F1A	00-FF
+  F1B	00-FF
+  F1C	00-FF
+  F1D	00-FF
+  F1E	00-FF
+  F1F	00-FF
+  F20	00-FF
+  F21	00-FF
+  F22	00-FF
+  F23	00-FF
+  F24	00-FF
+  F25	00-FF
+  F26	00-FF
+  F27	00-FF
+  F28	00-FF
+  F29	00-FF
+  F2A	00-FF
+  F2B	00-FF
+  F2C	00-FF
+  F2D	00-FF
+  F2E	00-FF
+  F2F	00-FF
+  F30	00-FF
+  F31	00-FF
+  F32	00-FF
+  F33	00-FF
+  F34	00-FF
+  F35	00-FF
+  F36	00-FF
+  F37	00-FF
+  F38	00-FF
+  F39	00-FF
+  F3A	00-FF
+  F3B	00-FF
+  F3C	00-FF
+  F3D	00-FF
+  F3E	00-FF
+  F3F	00-FF
+  F40	00-FF
+  F41	00-FF
+  F42	00-FF
+  F43	00-FF
+  F44	00-FF
+  F45	00-FF
+  F46	00-FF
+  F47	00-FF
+  F48	00-FF
+  F49	00-FF
+  F4A	00-FF
+  F4B	00-FF
+  F4C	00-FF
+  F4D	00-FF
+  F4E	00-FF
+  F4F	00-FF
+  F50	00-FF
+  F51	00-FF
+  F52	00-FF
+  F53	00-FF
+  F54	00-FF
+  F55	00-FF
+  F56	00-FF
+  F57	00-FF
+  F58	00-FF
+  F59	00-FF
+  F5A	00-FF
+  F5B	00-FF
+  F5C	00-FF
+  F5D	00-FF
+  F5E	00-FF
+  F5F	00-FF
+  F60	00-FF
+  F61	00-FF
+  F62	00-FF
+  F63	00-FF
+  F64	00-FF
+  F65	00-FF
+  F66	00-FF
+  F67	00-FF
+  F68	00-FF
+  F69	00-FF
+  F6A	00-FF
+  F6B	00-FF
+  F6C	00-FF
+  F6D	00-FF
+  F6E	00-FF
+  F6F	00-FF
+  F70	00-FF
+  F71	00-FF
+  F72	00-FF
+  F73	00-FF
+  F74	00-FF
+  F75	00-FF
+  F76	00-FF
+  F77	00-FF
+  F78	00-FF
+  F79	00-FF
+  F7A	00-FF
+  F7B	00-FF
+  F7C	00-FF
+  F7D	00-FF
+  F7E	00-FF
+  F7F	00-FF
+  F80	00-FF
+  F81	00-FF
+  F82	00-FF
+  F83	00-FF
+  F84	00-FF
+  F85	00-FF
+  F86	00-FF
+  F87	00-FF
+  F88	00-FF
+  F89	00-FF
+  F8A	00-FF
+  F8B	00-FF
+  F8C	00-FF
+  F8D	00-FF
+  F8E	00-FF
+  F8F	00-FF
+  F90	00-FF
+  F91	00-FF
+  F92	00-FF
+  F93	00-FF
+  F94	00-FF
+  F95	00-FF
+  F96	00-FF
+  F97	00-FF
+  F98	00-FF
+  F99	00-FF
+  F9A	00-FF
+  F9B	00-FF
+  F9C	00-FF
+  F9D	00-FF
+  F9E	00-FF
+  F9F	00-FF
+  FA0	00-FF
+  FA1	00-FF
+  FA2	00-FF
+  FA3	00-FF
+  FA4	00-FF
+  FA5	00-FF
+  FA6	00-FF
+  FA7	00-FF
+  FA8	00-FF
+  FA9	00-FF
+  FAA	00-FF
+  FAB	00-FF
+  FAC	00-FF
+  FAD	00-FF
+  FAE	00-FF
+  FAF	00-FF
+  FB0	00-FF
+  FB1	00-FF
+  FB2	00-FF
+  FB3	00-FF
+  FB4	00-FF
+  FB5	00-FF
+  FB6	00-FF
+  FB7	00-FF
+  FB8	00-FF
+  FB9	00-FF
+  FBA	00-FF
+  FBB	00-FF
+  FBC	00-FF
+  FBD	00-FF
+  FBE	00-FF
+  FBF	00-FF
+  FC0	00-FF
+  FC1	00-FF
+  FC2	00-FF
+  FC3	00-FF
+  FC4	00-FF
+  FC5	00-FF
+  FC6	00-FF
+  FC7	00-FF
+  FC8	00-FF
+  FC9	00-FF
+  FCA	00-FF
+  FCB	00-FF
+  FCC	00-FF
+  FCD	00-FF
+  FCE	00-FF
+  FCF	00-FF
+  FD0	00-FF
+  FD1	00-FF
+  FD2	00-FF
+  FD3	00-FF
+  FD4	00-FF
+  FD5	00-FF
+  FD6	00-FF
+  FD7	00-FF
+  FD8	00-FF
+  FD9	00-FF
+  FDA	00-FF
+  FDB	00-FF
+  FDC	00-FF
+  FDD	00-FF
+  FDE	00-FF
+  FDF	00-FF
+  FE0	00-FF
+  FE1	00-FF
+  FE2	00-FF
+  FE3	00-FF
+  FE4	00-FF
+  FE5	00-FF
+  FE6	00-FF
+  FE7	00-FF
+  FE8	00-FF
+  FE9	00-FF
+  FEA	00-FF
+  FEB	00-FF
+  FEC	00-FF
+  FED	00-FF
+  FEE	00-FF
+  FEF	00-FF
+  FF0	00-FF
+  FF1	00-FF
+  FF2	00-FF
+  FF3	00-FF
+  FF4	00-FF
+  FF5	00-FF
+  FF6	00-FF
+  FF7	00-FF
+  FF8	00-FF
+  FF9	00-FF
+  FFA	00-FF
+  FFB	00-FF
+  FFC	00-FF
+  FFD	00-FF
+  FFE	00-FF
+  FFF	00-FD
+  1000	00-FF
+  1001	00-FF
+  1002	00-FF
+  1003	00-FF
+  1004	00-FF
+  1005	00-FF
+  1006	00-FF
+  1007	00-FF
+  1008	00-FF
+  1009	00-FF
+  100A	00-FF
+  100B	00-FF
+  100C	00-FF
+  100D	00-FF
+  100E	00-FF
+  100F	00-FF
+  1010	00-FF
+  1011	00-FF
+  1012	00-FF
+  1013	00-FF
+  1014	00-FF
+  1015	00-FF
+  1016	00-FF
+  1017	00-FF
+  1018	00-FF
+  1019	00-FF
+  101A	00-FF
+  101B	00-FF
+  101C	00-FF
+  101D	00-FF
+  101E	00-FF
+  101F	00-FF
+  1020	00-FF
+  1021	00-FF
+  1022	00-FF
+  1023	00-FF
+  1024	00-FF
+  1025	00-FF
+  1026	00-FF
+  1027	00-FF
+  1028	00-FF
+  1029	00-FF
+  102A	00-FF
+  102B	00-FF
+  102C	00-FF
+  102D	00-FF
+  102E	00-FF
+  102F	00-FF
+  1030	00-FF
+  1031	00-FF
+  1032	00-FF
+  1033	00-FF
+  1034	00-FF
+  1035	00-FF
+  1036	00-FF
+  1037	00-FF
+  1038	00-FF
+  1039	00-FF
+  103A	00-FF
+  103B	00-FF
+  103C	00-FF
+  103D	00-FF
+  103E	00-FF
+  103F	00-FF
+  1040	00-FF
+  1041	00-FF
+  1042	00-FF
+  1043	00-FF
+  1044	00-FF
+  1045	00-FF
+  1046	00-FF
+  1047	00-FF
+  1048	00-FF
+  1049	00-FF
+  104A	00-FF
+  104B	00-FF
+  104C	00-FF
+  104D	00-FF
+  104E	00-FF
+  104F	00-FF
+  1050	00-FF
+  1051	00-FF
+  1052	00-FF
+  1053	00-FF
+  1054	00-FF
+  1055	00-FF
+  1056	00-FF
+  1057	00-FF
+  1058	00-FF
+  1059	00-FF
+  105A	00-FF
+  105B	00-FF
+  105C	00-FF
+  105D	00-FF
+  105E	00-FF
+  105F	00-FF
+  1060	00-FF
+  1061	00-FF
+  1062	00-FF
+  1063	00-FF
+  1064	00-FF
+  1065	00-FF
+  1066	00-FF
+  1067	00-FF
+  1068	00-FF
+  1069	00-FF
+  106A	00-FF
+  106B	00-FF
+  106C	00-FF
+  106D	00-FF
+  106E	00-FF
+  106F	00-FF
+  1070	00-FF
+  1071	00-FF
+  1072	00-FF
+  1073	00-FF
+  1074	00-FF
+  1075	00-FF
+  1076	00-FF
+  1077	00-FF
+  1078	00-FF
+  1079	00-FF
+  107A	00-FF
+  107B	00-FF
+  107C	00-FF
+  107D	00-FF
+  107E	00-FF
+  107F	00-FF
+  1080	00-FF
+  1081	00-FF
+  1082	00-FF
+  1083	00-FF
+  1084	00-FF
+  1085	00-FF
+  1086	00-FF
+  1087	00-FF
+  1088	00-FF
+  1089	00-FF
+  108A	00-FF
+  108B	00-FF
+  108C	00-FF
+  108D	00-FF
+  108E	00-FF
+  108F	00-FF
+  1090	00-FF
+  1091	00-FF
+  1092	00-FF
+  1093	00-FF
+  1094	00-FF
+  1095	00-FF
+  1096	00-FF
+  1097	00-FF
+  1098	00-FF
+  1099	00-FF
+  109A	00-FF
+  109B	00-FF
+  109C	00-FF
+  109D	00-FF
+  109E	00-FF
+  109F	00-FF
+  10A0	00-FF
+  10A1	00-FF
+  10A2	00-FF
+  10A3	00-FF
+  10A4	00-FF
+  10A5	00-FF
+  10A6	00-FF
+  10A7	00-FF
+  10A8	00-FF
+  10A9	00-FF
+  10AA	00-FF
+  10AB	00-FF
+  10AC	00-FF
+  10AD	00-FF
+  10AE	00-FF
+  10AF	00-FF
+  10B0	00-FF
+  10B1	00-FF
+  10B2	00-FF
+  10B3	00-FF
+  10B4	00-FF
+  10B5	00-FF
+  10B6	00-FF
+  10B7	00-FF
+  10B8	00-FF
+  10B9	00-FF
+  10BA	00-FF
+  10BB	00-FF
+  10BC	00-FF
+  10BD	00-FF
+  10BE	00-FF
+  10BF	00-FF
+  10C0	00-FF
+  10C1	00-FF
+  10C2	00-FF
+  10C3	00-FF
+  10C4	00-FF
+  10C5	00-FF
+  10C6	00-FF
+  10C7	00-FF
+  10C8	00-FF
+  10C9	00-FF
+  10CA	00-FF
+  10CB	00-FF
+  10CC	00-FF
+  10CD	00-FF
+  10CE	00-FF
+  10CF	00-FF
+  10D0	00-FF
+  10D1	00-FF
+  10D2	00-FF
+  10D3	00-FF
+  10D4	00-FF
+  10D5	00-FF
+  10D6	00-FF
+  10D7	00-FF
+  10D8	00-FF
+  10D9	00-FF
+  10DA	00-FF
+  10DB	00-FF
+  10DC	00-FF
+  10DD	00-FF
+  10DE	00-FF
+  10DF	00-FF
+  10E0	00-FF
+  10E1	00-FF
+  10E2	00-FF
+  10E3	00-FF
+  10E4	00-FF
+  10E5	00-FF
+  10E6	00-FF
+  10E7	00-FF
+  10E8	00-FF
+  10E9	00-FF
+  10EA	00-FF
+  10EB	00-FF
+  10EC	00-FF
+  10ED	00-FF
+  10EE	00-FF
+  10EF	00-FF
+  10F0	00-FF
+  10F1	00-FF
+  10F2	00-FF
+  10F3	00-FF
+  10F4	00-FF
+  10F5	00-FF
+  10F6	00-FF
+  10F7	00-FF
+  10F8	00-FF
+  10F9	00-FF
+  10FA	00-FF
+  10FB	00-FF
+  10FC	00-FF
+  10FD	00-FF
+  10FE	00-FF
+  10FF	00-FD
+
diff --git a/newlib/libc/string/ambiguous.t b/newlib/libc/string/ambiguous.t
new file mode 100644
index 0000000..f8b7842
--- /dev/null
+++ b/newlib/libc/string/ambiguous.t
@@ -0,0 +1,61 @@
+{
+  { 0x00A1, 0x00A1 }, { 0x00A4, 0x00A4 }, { 0x00A7, 0x00A8 },
+  { 0x00AA, 0x00AA }, { 0x00AE, 0x00AE }, { 0x00B0, 0x00B4 },
+  { 0x00B6, 0x00BA }, { 0x00BC, 0x00BF }, { 0x00C6, 0x00C6 },
+  { 0x00D0, 0x00D0 }, { 0x00D7, 0x00D8 }, { 0x00DE, 0x00E1 },
+  { 0x00E6, 0x00E6 }, { 0x00E8, 0x00EA }, { 0x00EC, 0x00ED },
+  { 0x00F0, 0x00F0 }, { 0x00F2, 0x00F3 }, { 0x00F7, 0x00FA },
+  { 0x00FC, 0x00FC }, { 0x00FE, 0x00FE }, { 0x0101, 0x0101 },
+  { 0x0111, 0x0111 }, { 0x0113, 0x0113 }, { 0x011B, 0x011B },
+  { 0x0126, 0x0127 }, { 0x012B, 0x012B }, { 0x0131, 0x0133 },
+  { 0x0138, 0x0138 }, { 0x013F, 0x0142 }, { 0x0144, 0x0144 },
+  { 0x0148, 0x014B }, { 0x014D, 0x014D }, { 0x0152, 0x0153 },
+  { 0x0166, 0x0167 }, { 0x016B, 0x016B }, { 0x01CE, 0x01CE },
+  { 0x01D0, 0x01D0 }, { 0x01D2, 0x01D2 }, { 0x01D4, 0x01D4 },
+  { 0x01D6, 0x01D6 }, { 0x01D8, 0x01D8 }, { 0x01DA, 0x01DA },
+  { 0x01DC, 0x01DC }, { 0x0251, 0x0251 }, { 0x0261, 0x0261 },
+  { 0x02C4, 0x02C4 }, { 0x02C7, 0x02C7 }, { 0x02C9, 0x02CB },
+  { 0x02CD, 0x02CD }, { 0x02D0, 0x02D0 }, { 0x02D8, 0x02DB },
+  { 0x02DD, 0x02DD }, { 0x02DF, 0x02DF }, { 0x0391, 0x03A1 },
+  { 0x03A3, 0x03A9 }, { 0x03B1, 0x03C1 }, { 0x03C3, 0x03C9 },
+  { 0x0401, 0x0401 }, { 0x0410, 0x044F }, { 0x0451, 0x0451 },
+  { 0x2010, 0x2010 }, { 0x2013, 0x2016 }, { 0x2018, 0x2019 },
+  { 0x201C, 0x201D }, { 0x2020, 0x2022 }, { 0x2024, 0x2027 },
+  { 0x2030, 0x2030 }, { 0x2032, 0x2033 }, { 0x2035, 0x2035 },
+  { 0x203B, 0x203B }, { 0x203E, 0x203E }, { 0x2074, 0x2074 },
+  { 0x207F, 0x207F }, { 0x2081, 0x2084 }, { 0x20AC, 0x20AC },
+  { 0x2103, 0x2103 }, { 0x2105, 0x2105 }, { 0x2109, 0x2109 },
+  { 0x2113, 0x2113 }, { 0x2116, 0x2116 }, { 0x2121, 0x2122 },
+  { 0x2126, 0x2126 }, { 0x212B, 0x212B }, { 0x2153, 0x2154 },
+  { 0x215B, 0x215E }, { 0x2160, 0x216B }, { 0x2170, 0x2179 },
+  { 0x2189, 0x2189 }, { 0x2190, 0x2199 }, { 0x21B8, 0x21B9 },
+  { 0x21D2, 0x21D2 }, { 0x21D4, 0x21D4 }, { 0x21E7, 0x21E7 },
+  { 0x2200, 0x2200 }, { 0x2202, 0x2203 }, { 0x2207, 0x2208 },
+  { 0x220B, 0x220B }, { 0x220F, 0x220F }, { 0x2211, 0x2211 },
+  { 0x2215, 0x2215 }, { 0x221A, 0x221A }, { 0x221D, 0x2220 },
+  { 0x2223, 0x2223 }, { 0x2225, 0x2225 }, { 0x2227, 0x222C },
+  { 0x222E, 0x222E }, { 0x2234, 0x2237 }, { 0x223C, 0x223D },
+  { 0x2248, 0x2248 }, { 0x224C, 0x224C }, { 0x2252, 0x2252 },
+  { 0x2260, 0x2261 }, { 0x2264, 0x2267 }, { 0x226A, 0x226B },
+  { 0x226E, 0x226F }, { 0x2282, 0x2283 }, { 0x2286, 0x2287 },
+  { 0x2295, 0x2295 }, { 0x2299, 0x2299 }, { 0x22A5, 0x22A5 },
+  { 0x22BF, 0x22BF }, { 0x2312, 0x2312 }, { 0x2460, 0x24E9 },
+  { 0x24EB, 0x254B }, { 0x2550, 0x2573 }, { 0x2580, 0x258F },
+  { 0x2592, 0x2595 }, { 0x25A0, 0x25A1 }, { 0x25A3, 0x25A9 },
+  { 0x25B2, 0x25B3 }, { 0x25B6, 0x25B7 }, { 0x25BC, 0x25BD },
+  { 0x25C0, 0x25C1 }, { 0x25C6, 0x25C8 }, { 0x25CB, 0x25CB },
+  { 0x25CE, 0x25D1 }, { 0x25E2, 0x25E5 }, { 0x25EF, 0x25EF },
+  { 0x2605, 0x2606 }, { 0x2609, 0x2609 }, { 0x260E, 0x260F },
+  { 0x261C, 0x261C }, { 0x261E, 0x261E }, { 0x2640, 0x2640 },
+  { 0x2642, 0x2642 }, { 0x2660, 0x2661 }, { 0x2663, 0x2665 },
+  { 0x2667, 0x266A }, { 0x266C, 0x266D }, { 0x266F, 0x266F },
+  { 0x269E, 0x269F }, { 0x26BF, 0x26BF }, { 0x26C6, 0x26CD },
+  { 0x26CF, 0x26D3 }, { 0x26D5, 0x26E1 }, { 0x26E3, 0x26E3 },
+  { 0x26E8, 0x26E9 }, { 0x26EB, 0x26F1 }, { 0x26F4, 0x26F4 },
+  { 0x26F6, 0x26F9 }, { 0x26FB, 0x26FC }, { 0x26FE, 0x26FF },
+  { 0x273D, 0x273D }, { 0x2776, 0x277F }, { 0x2B56, 0x2B59 },
+  { 0x3248, 0x324F }, { 0xE000, 0xF8FF }, { 0xFFFD, 0xFFFD },
+  { 0x1F100, 0x1F10A }, { 0x1F110, 0x1F12D }, { 0x1F130, 0x1F169 },
+  { 0x1F170, 0x1F18D }, { 0x1F18F, 0x1F190 }, { 0x1F19B, 0x1F1AC },
+  { 0xF0000, 0xFFFFD }, { 0x100000, 0x10FFFD }
+};
diff --git a/newlib/libc/string/combining.t b/newlib/libc/string/combining.t
new file mode 100644
index 0000000..629d8f8
--- /dev/null
+++ b/newlib/libc/string/combining.t
@@ -0,0 +1,107 @@
+{
+  { 0x0300, 0x036F }, { 0x0483, 0x0489 }, { 0x0591, 0x05BD },
+  { 0x05BF, 0x05BF }, { 0x05C1, 0x05C2 }, { 0x05C4, 0x05C5 },
+  { 0x05C7, 0x05C7 }, { 0x0600, 0x0605 }, { 0x0610, 0x061A },
+  { 0x061C, 0x061C }, { 0x064B, 0x065F }, { 0x0670, 0x0670 },
+  { 0x06D6, 0x06DD }, { 0x06DF, 0x06E4 }, { 0x06E7, 0x06E8 },
+  { 0x06EA, 0x06ED }, { 0x070F, 0x070F }, { 0x0711, 0x0711 },
+  { 0x0730, 0x074A }, { 0x07A6, 0x07B0 }, { 0x07EB, 0x07F3 },
+  { 0x0816, 0x0819 }, { 0x081B, 0x0823 }, { 0x0825, 0x0827 },
+  { 0x0829, 0x082D }, { 0x0859, 0x085B }, { 0x08D4, 0x0902 },
+  { 0x093A, 0x093A }, { 0x093C, 0x093C }, { 0x0941, 0x0948 },
+  { 0x094D, 0x094D }, { 0x0951, 0x0957 }, { 0x0962, 0x0963 },
+  { 0x0981, 0x0981 }, { 0x09BC, 0x09BC }, { 0x09C1, 0x09C4 },
+  { 0x09CD, 0x09CD }, { 0x09E2, 0x09E3 }, { 0x0A01, 0x0A02 },
+  { 0x0A3C, 0x0A3C }, { 0x0A41, 0x0A42 }, { 0x0A47, 0x0A48 },
+  { 0x0A4B, 0x0A4D }, { 0x0A51, 0x0A51 }, { 0x0A70, 0x0A71 },
+  { 0x0A75, 0x0A75 }, { 0x0A81, 0x0A82 }, { 0x0ABC, 0x0ABC },
+  { 0x0AC1, 0x0AC5 }, { 0x0AC7, 0x0AC8 }, { 0x0ACD, 0x0ACD },
+  { 0x0AE2, 0x0AE3 }, { 0x0AFA, 0x0AFF }, { 0x0B01, 0x0B01 },
+  { 0x0B3C, 0x0B3C }, { 0x0B3F, 0x0B3F }, { 0x0B41, 0x0B44 },
+  { 0x0B4D, 0x0B4D }, { 0x0B56, 0x0B56 }, { 0x0B62, 0x0B63 },
+  { 0x0B82, 0x0B82 }, { 0x0BC0, 0x0BC0 }, { 0x0BCD, 0x0BCD },
+  { 0x0C00, 0x0C00 }, { 0x0C3E, 0x0C40 }, { 0x0C46, 0x0C48 },
+  { 0x0C4A, 0x0C4D }, { 0x0C55, 0x0C56 }, { 0x0C62, 0x0C63 },
+  { 0x0C81, 0x0C81 }, { 0x0CBC, 0x0CBC }, { 0x0CBF, 0x0CBF },
+  { 0x0CC6, 0x0CC6 }, { 0x0CCC, 0x0CCD }, { 0x0CE2, 0x0CE3 },
+  { 0x0D00, 0x0D01 }, { 0x0D3B, 0x0D3C }, { 0x0D41, 0x0D44 },
+  { 0x0D4D, 0x0D4D }, { 0x0D62, 0x0D63 }, { 0x0DCA, 0x0DCA },
+  { 0x0DD2, 0x0DD4 }, { 0x0DD6, 0x0DD6 }, { 0x0E31, 0x0E31 },
+  { 0x0E34, 0x0E3A }, { 0x0E47, 0x0E4E }, { 0x0EB1, 0x0EB1 },
+  { 0x0EB4, 0x0EB9 }, { 0x0EBB, 0x0EBC }, { 0x0EC8, 0x0ECD },
+  { 0x0F18, 0x0F19 }, { 0x0F35, 0x0F35 }, { 0x0F37, 0x0F37 },
+  { 0x0F39, 0x0F39 }, { 0x0F71, 0x0F7E }, { 0x0F80, 0x0F84 },
+  { 0x0F86, 0x0F87 }, { 0x0F8D, 0x0F97 }, { 0x0F99, 0x0FBC },
+  { 0x0FC6, 0x0FC6 }, { 0x102D, 0x1030 }, { 0x1032, 0x1037 },
+  { 0x1039, 0x103A }, { 0x103D, 0x103E }, { 0x1058, 0x1059 },
+  { 0x105E, 0x1060 }, { 0x1071, 0x1074 }, { 0x1082, 0x1082 },
+  { 0x1085, 0x1086 }, { 0x108D, 0x108D }, { 0x109D, 0x109D },
+  { 0x1160, 0x11FF }, { 0x135D, 0x135F }, { 0x1712, 0x1714 },
+  { 0x1732, 0x1734 }, { 0x1752, 0x1753 }, { 0x1772, 0x1773 },
+  { 0x17B4, 0x17B5 }, { 0x17B7, 0x17BD }, { 0x17C6, 0x17C6 },
+  { 0x17C9, 0x17D3 }, { 0x17DD, 0x17DD }, { 0x180B, 0x180E },
+  { 0x1885, 0x1886 }, { 0x18A9, 0x18A9 }, { 0x1920, 0x1922 },
+  { 0x1927, 0x1928 }, { 0x1932, 0x1932 }, { 0x1939, 0x193B },
+  { 0x1A17, 0x1A18 }, { 0x1A1B, 0x1A1B }, { 0x1A56, 0x1A56 },
+  { 0x1A58, 0x1A5E }, { 0x1A60, 0x1A60 }, { 0x1A62, 0x1A62 },
+  { 0x1A65, 0x1A6C }, { 0x1A73, 0x1A7C }, { 0x1A7F, 0x1A7F },
+  { 0x1AB0, 0x1ABE }, { 0x1B00, 0x1B03 }, { 0x1B34, 0x1B34 },
+  { 0x1B36, 0x1B3A }, { 0x1B3C, 0x1B3C }, { 0x1B42, 0x1B42 },
+  { 0x1B6B, 0x1B73 }, { 0x1B80, 0x1B81 }, { 0x1BA2, 0x1BA5 },
+  { 0x1BA8, 0x1BA9 }, { 0x1BAB, 0x1BAD }, { 0x1BE6, 0x1BE6 },
+  { 0x1BE8, 0x1BE9 }, { 0x1BED, 0x1BED }, { 0x1BEF, 0x1BF1 },
+  { 0x1C2C, 0x1C33 }, { 0x1C36, 0x1C37 }, { 0x1CD0, 0x1CD2 },
+  { 0x1CD4, 0x1CE0 }, { 0x1CE2, 0x1CE8 }, { 0x1CED, 0x1CED },
+  { 0x1CF4, 0x1CF4 }, { 0x1CF8, 0x1CF9 }, { 0x1DC0, 0x1DF9 },
+  { 0x1DFB, 0x1DFF }, { 0x200B, 0x200F }, { 0x202A, 0x202E },
+  { 0x2060, 0x2064 }, { 0x2066, 0x206F }, { 0x20D0, 0x20F0 },
+  { 0x2CEF, 0x2CF1 }, { 0x2D7F, 0x2D7F }, { 0x2DE0, 0x2DFF },
+  { 0x302A, 0x302D }, { 0x3099, 0x309A }, { 0xA66F, 0xA672 },
+  { 0xA674, 0xA67D }, { 0xA69E, 0xA69F }, { 0xA6F0, 0xA6F1 },
+  { 0xA802, 0xA802 }, { 0xA806, 0xA806 }, { 0xA80B, 0xA80B },
+  { 0xA825, 0xA826 }, { 0xA8C4, 0xA8C5 }, { 0xA8E0, 0xA8F1 },
+  { 0xA926, 0xA92D }, { 0xA947, 0xA951 }, { 0xA980, 0xA982 },
+  { 0xA9B3, 0xA9B3 }, { 0xA9B6, 0xA9B9 }, { 0xA9BC, 0xA9BC },
+  { 0xA9E5, 0xA9E5 }, { 0xAA29, 0xAA2E }, { 0xAA31, 0xAA32 },
+  { 0xAA35, 0xAA36 }, { 0xAA43, 0xAA43 }, { 0xAA4C, 0xAA4C },
+  { 0xAA7C, 0xAA7C }, { 0xAAB0, 0xAAB0 }, { 0xAAB2, 0xAAB4 },
+  { 0xAAB7, 0xAAB8 }, { 0xAABE, 0xAABF }, { 0xAAC1, 0xAAC1 },
+  { 0xAAEC, 0xAAED }, { 0xAAF6, 0xAAF6 }, { 0xABE5, 0xABE5 },
+  { 0xABE8, 0xABE8 }, { 0xABED, 0xABED }, { 0xD7B0, 0xD7C6 },
+  { 0xD7CB, 0xD7FB }, { 0xFB1E, 0xFB1E }, { 0xFE00, 0xFE0F },
+  { 0xFE20, 0xFE2F }, { 0xFEFF, 0xFEFF }, { 0xFFF9, 0xFFFB },
+  { 0x101FD, 0x101FD }, { 0x102E0, 0x102E0 }, { 0x10376, 0x1037A },
+  { 0x10A01, 0x10A03 }, { 0x10A05, 0x10A06 }, { 0x10A0C, 0x10A0F },
+  { 0x10A38, 0x10A3A }, { 0x10A3F, 0x10A3F }, { 0x10AE5, 0x10AE6 },
+  { 0x11001, 0x11001 }, { 0x11038, 0x11046 }, { 0x1107F, 0x11081 },
+  { 0x110B3, 0x110B6 }, { 0x110B9, 0x110BA }, { 0x110BD, 0x110BD },
+  { 0x11100, 0x11102 }, { 0x11127, 0x1112B }, { 0x1112D, 0x11134 },
+  { 0x11173, 0x11173 }, { 0x11180, 0x11181 }, { 0x111B6, 0x111BE },
+  { 0x111CA, 0x111CC }, { 0x1122F, 0x11231 }, { 0x11234, 0x11234 },
+  { 0x11236, 0x11237 }, { 0x1123E, 0x1123E }, { 0x112DF, 0x112DF },
+  { 0x112E3, 0x112EA }, { 0x11300, 0x11301 }, { 0x1133C, 0x1133C },
+  { 0x11340, 0x11340 }, { 0x11366, 0x1136C }, { 0x11370, 0x11374 },
+  { 0x11438, 0x1143F }, { 0x11442, 0x11444 }, { 0x11446, 0x11446 },
+  { 0x114B3, 0x114B8 }, { 0x114BA, 0x114BA }, { 0x114BF, 0x114C0 },
+  { 0x114C2, 0x114C3 }, { 0x115B2, 0x115B5 }, { 0x115BC, 0x115BD },
+  { 0x115BF, 0x115C0 }, { 0x115DC, 0x115DD }, { 0x11633, 0x1163A },
+  { 0x1163D, 0x1163D }, { 0x1163F, 0x11640 }, { 0x116AB, 0x116AB },
+  { 0x116AD, 0x116AD }, { 0x116B0, 0x116B5 }, { 0x116B7, 0x116B7 },
+  { 0x1171D, 0x1171F }, { 0x11722, 0x11725 }, { 0x11727, 0x1172B },
+  { 0x11A01, 0x11A06 }, { 0x11A09, 0x11A0A }, { 0x11A33, 0x11A38 },
+  { 0x11A3B, 0x11A3E }, { 0x11A47, 0x11A47 }, { 0x11A51, 0x11A56 },
+  { 0x11A59, 0x11A5B }, { 0x11A8A, 0x11A96 }, { 0x11A98, 0x11A99 },
+  { 0x11C30, 0x11C36 }, { 0x11C38, 0x11C3D }, { 0x11C3F, 0x11C3F },
+  { 0x11C92, 0x11CA7 }, { 0x11CAA, 0x11CB0 }, { 0x11CB2, 0x11CB3 },
+  { 0x11CB5, 0x11CB6 }, { 0x11D31, 0x11D36 }, { 0x11D3A, 0x11D3A },
+  { 0x11D3C, 0x11D3D }, { 0x11D3F, 0x11D45 }, { 0x11D47, 0x11D47 },
+  { 0x16AF0, 0x16AF4 }, { 0x16B30, 0x16B36 }, { 0x16F8F, 0x16F92 },
+  { 0x1BC9D, 0x1BC9E }, { 0x1BCA0, 0x1BCA3 }, { 0x1D167, 0x1D169 },
+  { 0x1D173, 0x1D182 }, { 0x1D185, 0x1D18B }, { 0x1D1AA, 0x1D1AD },
+  { 0x1D242, 0x1D244 }, { 0x1DA00, 0x1DA36 }, { 0x1DA3B, 0x1DA6C },
+  { 0x1DA75, 0x1DA75 }, { 0x1DA84, 0x1DA84 }, { 0x1DA9B, 0x1DA9F },
+  { 0x1DAA1, 0x1DAAF }, { 0x1E000, 0x1E006 }, { 0x1E008, 0x1E018 },
+  { 0x1E01B, 0x1E021 }, { 0x1E023, 0x1E024 }, { 0x1E026, 0x1E02A },
+  { 0x1E8D0, 0x1E8D6 }, { 0x1E944, 0x1E94A }, { 0xE0001, 0xE0001 },
+  { 0xE0020, 0xE007F }, { 0xE0100, 0xE01EF }
+};
diff --git a/newlib/libc/string/wide.t b/newlib/libc/string/wide.t
new file mode 100644
index 0000000..8d0e243
--- /dev/null
+++ b/newlib/libc/string/wide.t
@@ -0,0 +1,33 @@
+//# EastAsianWidth-10.0.0.txt
+//# Blocks-10.0.0.txt
+{
+  { 0x1100, 0x115F }, { 0x231A, 0x231B }, { 0x2329, 0x232A },
+  { 0x23E9, 0x23EC }, { 0x23F0, 0x23F0 }, { 0x23F3, 0x23F3 },
+  { 0x25FD, 0x25FE }, { 0x2614, 0x2615 }, { 0x2648, 0x2653 },
+  { 0x267F, 0x267F }, { 0x2693, 0x2693 }, { 0x26A1, 0x26A1 },
+  { 0x26AA, 0x26AB }, { 0x26BD, 0x26BE }, { 0x26C4, 0x26C5 },
+  { 0x26CE, 0x26CE }, { 0x26D4, 0x26D4 }, { 0x26EA, 0x26EA },
+  { 0x26F2, 0x26F3 }, { 0x26F5, 0x26F5 }, { 0x26FA, 0x26FA },
+  { 0x26FD, 0x26FD }, { 0x2705, 0x2705 }, { 0x270A, 0x270B },
+  { 0x2728, 0x2728 }, { 0x274C, 0x274C }, { 0x274E, 0x274E },
+  { 0x2753, 0x2755 }, { 0x2757, 0x2757 }, { 0x2795, 0x2797 },
+  { 0x27B0, 0x27B0 }, { 0x27BF, 0x27BF }, { 0x2B1B, 0x2B1C },
+  { 0x2B50, 0x2B50 }, { 0x2B55, 0x2B55 }, { 0x2E80, 0x303E },
+  { 0x3040, 0x321E }, { 0x3220, 0x3247 }, { 0x3250, 0x32FE },
+  { 0x3300, 0x4DBF }, { 0x4E00, 0xA4CF }, { 0xA960, 0xA97F },
+  { 0xAC00, 0xD7AF }, { 0xF900, 0xFAFF }, { 0xFE10, 0xFE1F },
+  { 0xFE30, 0xFE6F }, { 0xFF01, 0xFF60 }, { 0xFFE0, 0xFFE6 },
+  { 0x16FE0, 0x18AFF }, { 0x1B000, 0x1B12F }, { 0x1B170, 0x1B2FF },
+  { 0x1F004, 0x1F004 }, { 0x1F0CF, 0x1F0CF }, { 0x1F18E, 0x1F18E },
+  { 0x1F191, 0x1F19A }, { 0x1F200, 0x1F320 }, { 0x1F32D, 0x1F335 },
+  { 0x1F337, 0x1F37C }, { 0x1F37E, 0x1F393 }, { 0x1F3A0, 0x1F3CA },
+  { 0x1F3CF, 0x1F3D3 }, { 0x1F3E0, 0x1F3F0 }, { 0x1F3F4, 0x1F3F4 },
+  { 0x1F3F8, 0x1F43E }, { 0x1F440, 0x1F440 }, { 0x1F442, 0x1F4FC },
+  { 0x1F4FF, 0x1F53D }, { 0x1F54B, 0x1F54E }, { 0x1F550, 0x1F567 },
+  { 0x1F57A, 0x1F57A }, { 0x1F595, 0x1F596 }, { 0x1F5A4, 0x1F5A4 },
+  { 0x1F5FB, 0x1F64F }, { 0x1F680, 0x1F6C5 }, { 0x1F6CC, 0x1F6CC },
+  { 0x1F6D0, 0x1F6D2 }, { 0x1F6EB, 0x1F6EC }, { 0x1F6F4, 0x1F6F8 },
+  { 0x1F910, 0x1F93E }, { 0x1F940, 0x1F94C }, { 0x1F950, 0x1F96B },
+  { 0x1F980, 0x1F997 }, { 0x1F9C0, 0x1F9C0 }, { 0x1F9D0, 0x1F9E6 },
+  { 0x20000, 0x2FFFD }, { 0x30000, 0x3FFFD }
+};
-- 
2.13.2


[-- Attachment #4: 0003-use-generated-width-data.patch --]
[-- Type: text/plain, Size: 9789 bytes --]

From 5d73691295b0013d78c1ce7c7ab0b0be0549d754 Mon Sep 17 00:00:00 2001
From: mintty <mintty@users.noreply.github.com>
Date: Mon, 14 Aug 2017 22:01:01 +0200
Subject: [PATCH 3/4] use generated width data

---
 newlib/libc/string/wcwidth.c | 146 +++++++------------------------------------
 1 file changed, 22 insertions(+), 124 deletions(-)
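
With the tables moved into generated .t files, the lookup side can
remain a binary search over sorted intervals. A rough sketch of that
idea, under stated assumptions: in_intervals, sketch_width and the two
excerpt tables are hypothetical names (the entries are copied from
combining.t and wide.t), and the sketch deliberately leaves out control
characters and the locale-dependent treatment of ambiguous-width
characters, so it is not the function this patch modifies:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative layout; the real struct in wcwidth.c may differ. */
struct interval { unsigned long first, last; };

/* Binary search in a sorted table of non-overlapping closed intervals.
   Returns 1 if ucs lies inside some interval, 0 otherwise. */
static int in_intervals(unsigned long ucs, const struct interval *table,
                        size_t n)
{
  size_t lo = 0, hi = n;                /* search window is [lo, hi) */
  assert(table != NULL || n == 0);
  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    if (ucs > table[mid].last)
      lo = mid + 1;
    else if (ucs < table[mid].first)
      hi = mid;
    else
      return 1;
  }
  return 0;
}

/* Small excerpts only; the generated files are much longer. */
static const struct interval combining_excerpt[] = {
  { 0x0300, 0x036F }, { 0x0483, 0x0489 }, { 0x0591, 0x05BD }
};
static const struct interval wide_excerpt[] = {
  { 0x1100, 0x115F }, { 0x231A, 0x231B }, { 0x4E00, 0xA4CF }
};

#define COUNT(t) (sizeof(t) / sizeof((t)[0]))

/* Simplified width: 0 for combining, 2 for wide, 1 otherwise. */
static int sketch_width(unsigned long ucs)
{
  if (in_intervals(ucs, combining_excerpt, COUNT(combining_excerpt)))
    return 0;
  if (in_intervals(ucs, wide_excerpt, COUNT(wide_excerpt)))
    return 2;
  return 1;
}
```

Because the tables are sorted and non-overlapping, each lookup needs
only a logarithmic number of comparisons, so growing the tables for a
newer Unicode version costs little at run time.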

diff --git a/newlib/libc/string/wcwidth.c b/newlib/libc/string/wcwidth.c
index ac5c47f..73c036a 100644
--- a/newlib/libc/string/wcwidth.c
+++ b/newlib/libc/string/wcwidth.c
@@ -7,18 +7,18 @@ INDEX
 
 ANSI_SYNOPSIS
 	#include <wchar.h>
-	int wcwidth(const wchar_t <[wc]>);
+	int wcwidth(const wint_t <[wc]>);
 
 TRAD_SYNOPSIS
 	#include <wchar.h>
 	int wcwidth(<[wc]>)
-	wchar_t *<[wc]>;
+	wint_t *<[wc]>;
 
 DESCRIPTION
 	The <<wcwidth>> function shall determine the number of column
 	positions required for the wide character <[wc]>. The application
 	shall ensure that the value of <[wc]> is a character representable
-	as a wchar_t, and is a wide-character code corresponding to a
+	as a wint_t, and is a wide-character code corresponding to a
 	valid character in the current locale.
 
 RETURNS
@@ -174,112 +174,18 @@ _DEFUN (__wcwidth, (ucs),
 #ifdef _MB_CAPABLE
   /* sorted list of non-overlapping intervals of East Asian Ambiguous
    * characters, generated by "uniset +WIDTH-A -cat=Me -cat=Mn -cat=Cf c" */
-  static const struct interval ambiguous[] = {
-    { 0x00A1, 0x00A1 }, { 0x00A4, 0x00A4 }, { 0x00A7, 0x00A8 },
-    { 0x00AA, 0x00AA }, { 0x00AE, 0x00AE }, { 0x00B0, 0x00B4 },
-    { 0x00B6, 0x00BA }, { 0x00BC, 0x00BF }, { 0x00C6, 0x00C6 },
-    { 0x00D0, 0x00D0 }, { 0x00D7, 0x00D8 }, { 0x00DE, 0x00E1 },
-    { 0x00E6, 0x00E6 }, { 0x00E8, 0x00EA }, { 0x00EC, 0x00ED },
-    { 0x00F0, 0x00F0 }, { 0x00F2, 0x00F3 }, { 0x00F7, 0x00FA },
-    { 0x00FC, 0x00FC }, { 0x00FE, 0x00FE }, { 0x0101, 0x0101 },
-    { 0x0111, 0x0111 }, { 0x0113, 0x0113 }, { 0x011B, 0x011B },
-    { 0x0126, 0x0127 }, { 0x012B, 0x012B }, { 0x0131, 0x0133 },
-    { 0x0138, 0x0138 }, { 0x013F, 0x0142 }, { 0x0144, 0x0144 },
-    { 0x0148, 0x014B }, { 0x014D, 0x014D }, { 0x0152, 0x0153 },
-    { 0x0166, 0x0167 }, { 0x016B, 0x016B }, { 0x01CE, 0x01CE },
-    { 0x01D0, 0x01D0 }, { 0x01D2, 0x01D2 }, { 0x01D4, 0x01D4 },
-    { 0x01D6, 0x01D6 }, { 0x01D8, 0x01D8 }, { 0x01DA, 0x01DA },
-    { 0x01DC, 0x01DC }, { 0x0251, 0x0251 }, { 0x0261, 0x0261 },
-    { 0x02C4, 0x02C4 }, { 0x02C7, 0x02C7 }, { 0x02C9, 0x02CB },
-    { 0x02CD, 0x02CD }, { 0x02D0, 0x02D0 }, { 0x02D8, 0x02DB },
-    { 0x02DD, 0x02DD }, { 0x02DF, 0x02DF }, { 0x0391, 0x03A1 },
-    { 0x03A3, 0x03A9 }, { 0x03B1, 0x03C1 }, { 0x03C3, 0x03C9 },
-    { 0x0401, 0x0401 }, { 0x0410, 0x044F }, { 0x0451, 0x0451 },
-    { 0x2010, 0x2010 }, { 0x2013, 0x2016 }, { 0x2018, 0x2019 },
-    { 0x201C, 0x201D }, { 0x2020, 0x2022 }, { 0x2024, 0x2027 },
-    { 0x2030, 0x2030 }, { 0x2032, 0x2033 }, { 0x2035, 0x2035 },
-    { 0x203B, 0x203B }, { 0x203E, 0x203E }, { 0x2074, 0x2074 },
-    { 0x207F, 0x207F }, { 0x2081, 0x2084 }, { 0x20AC, 0x20AC },
-    { 0x2103, 0x2103 }, { 0x2105, 0x2105 }, { 0x2109, 0x2109 },
-    { 0x2113, 0x2113 }, { 0x2116, 0x2116 }, { 0x2121, 0x2122 },
-    { 0x2126, 0x2126 }, { 0x212B, 0x212B }, { 0x2153, 0x2154 },
-    { 0x215B, 0x215E }, { 0x2160, 0x216B }, { 0x2170, 0x2179 },
-    { 0x2190, 0x2199 }, { 0x21B8, 0x21B9 }, { 0x21D2, 0x21D2 },
-    { 0x21D4, 0x21D4 }, { 0x21E7, 0x21E7 }, { 0x2200, 0x2200 },
-    { 0x2202, 0x2203 }, { 0x2207, 0x2208 }, { 0x220B, 0x220B },
-    { 0x220F, 0x220F }, { 0x2211, 0x2211 }, { 0x2215, 0x2215 },
-    { 0x221A, 0x221A }, { 0x221D, 0x2220 }, { 0x2223, 0x2223 },
-    { 0x2225, 0x2225 }, { 0x2227, 0x222C }, { 0x222E, 0x222E },
-    { 0x2234, 0x2237 }, { 0x223C, 0x223D }, { 0x2248, 0x2248 },
-    { 0x224C, 0x224C }, { 0x2252, 0x2252 }, { 0x2260, 0x2261 },
-    { 0x2264, 0x2267 }, { 0x226A, 0x226B }, { 0x226E, 0x226F },
-    { 0x2282, 0x2283 }, { 0x2286, 0x2287 }, { 0x2295, 0x2295 },
-    { 0x2299, 0x2299 }, { 0x22A5, 0x22A5 }, { 0x22BF, 0x22BF },
-    { 0x2312, 0x2312 }, { 0x2460, 0x24E9 }, { 0x24EB, 0x254B },
-    { 0x2550, 0x2573 }, { 0x2580, 0x258F }, { 0x2592, 0x2595 },
-    { 0x25A0, 0x25A1 }, { 0x25A3, 0x25A9 }, { 0x25B2, 0x25B3 },
-    { 0x25B6, 0x25B7 }, { 0x25BC, 0x25BD }, { 0x25C0, 0x25C1 },
-    { 0x25C6, 0x25C8 }, { 0x25CB, 0x25CB }, { 0x25CE, 0x25D1 },
-    { 0x25E2, 0x25E5 }, { 0x25EF, 0x25EF }, { 0x2605, 0x2606 },
-    { 0x2609, 0x2609 }, { 0x260E, 0x260F }, { 0x2614, 0x2615 },
-    { 0x261C, 0x261C }, { 0x261E, 0x261E }, { 0x2640, 0x2640 },
-    { 0x2642, 0x2642 }, { 0x2660, 0x2661 }, { 0x2663, 0x2665 },
-    { 0x2667, 0x266A }, { 0x266C, 0x266D }, { 0x266F, 0x266F },
-    { 0x273D, 0x273D }, { 0x2776, 0x277F }, { 0xE000, 0xF8FF },
-    { 0xFFFD, 0xFFFD }, { 0xF0000, 0xFFFFD }, { 0x100000, 0x10FFFD }
-  };
+  static const struct interval ambiguous[] =
+#include "ambiguous.t"
+
   /* sorted list of non-overlapping intervals of non-spacing characters */
-  /* generated by "uniset +cat=Me +cat=Mn +cat=Cf -00AD +1160-11FF +200B c" */
-  static const struct interval combining[] = {
-    { 0x0300, 0x036F }, { 0x0483, 0x0486 }, { 0x0488, 0x0489 },
-    { 0x0591, 0x05BD }, { 0x05BF, 0x05BF }, { 0x05C1, 0x05C2 },
-    { 0x05C4, 0x05C5 }, { 0x05C7, 0x05C7 }, { 0x0600, 0x0603 },
-    { 0x0610, 0x0615 }, { 0x064B, 0x065E }, { 0x0670, 0x0670 },
-    { 0x06D6, 0x06E4 }, { 0x06E7, 0x06E8 }, { 0x06EA, 0x06ED },
-    { 0x070F, 0x070F }, { 0x0711, 0x0711 }, { 0x0730, 0x074A },
-    { 0x07A6, 0x07B0 }, { 0x07EB, 0x07F3 }, { 0x0901, 0x0902 },
-    { 0x093C, 0x093C }, { 0x0941, 0x0948 }, { 0x094D, 0x094D },
-    { 0x0951, 0x0954 }, { 0x0962, 0x0963 }, { 0x0981, 0x0981 },
-    { 0x09BC, 0x09BC }, { 0x09C1, 0x09C4 }, { 0x09CD, 0x09CD },
-    { 0x09E2, 0x09E3 }, { 0x0A01, 0x0A02 }, { 0x0A3C, 0x0A3C },
-    { 0x0A41, 0x0A42 }, { 0x0A47, 0x0A48 }, { 0x0A4B, 0x0A4D },
-    { 0x0A70, 0x0A71 }, { 0x0A81, 0x0A82 }, { 0x0ABC, 0x0ABC },
-    { 0x0AC1, 0x0AC5 }, { 0x0AC7, 0x0AC8 }, { 0x0ACD, 0x0ACD },
-    { 0x0AE2, 0x0AE3 }, { 0x0B01, 0x0B01 }, { 0x0B3C, 0x0B3C },
-    { 0x0B3F, 0x0B3F }, { 0x0B41, 0x0B43 }, { 0x0B4D, 0x0B4D },
-    { 0x0B56, 0x0B56 }, { 0x0B82, 0x0B82 }, { 0x0BC0, 0x0BC0 },
-    { 0x0BCD, 0x0BCD }, { 0x0C3E, 0x0C40 }, { 0x0C46, 0x0C48 },
-    { 0x0C4A, 0x0C4D }, { 0x0C55, 0x0C56 }, { 0x0CBC, 0x0CBC },
-    { 0x0CBF, 0x0CBF }, { 0x0CC6, 0x0CC6 }, { 0x0CCC, 0x0CCD },
-    { 0x0CE2, 0x0CE3 }, { 0x0D41, 0x0D43 }, { 0x0D4D, 0x0D4D },
-    { 0x0DCA, 0x0DCA }, { 0x0DD2, 0x0DD4 }, { 0x0DD6, 0x0DD6 },
-    { 0x0E31, 0x0E31 }, { 0x0E34, 0x0E3A }, { 0x0E47, 0x0E4E },
-    { 0x0EB1, 0x0EB1 }, { 0x0EB4, 0x0EB9 }, { 0x0EBB, 0x0EBC },
-    { 0x0EC8, 0x0ECD }, { 0x0F18, 0x0F19 }, { 0x0F35, 0x0F35 },
-    { 0x0F37, 0x0F37 }, { 0x0F39, 0x0F39 }, { 0x0F71, 0x0F7E },
-    { 0x0F80, 0x0F84 }, { 0x0F86, 0x0F87 }, { 0x0F90, 0x0F97 },
-    { 0x0F99, 0x0FBC }, { 0x0FC6, 0x0FC6 }, { 0x102D, 0x1030 },
-    { 0x1032, 0x1032 }, { 0x1036, 0x1037 }, { 0x1039, 0x1039 },
-    { 0x1058, 0x1059 }, { 0x1160, 0x11FF }, { 0x135F, 0x135F },
-    { 0x1712, 0x1714 }, { 0x1732, 0x1734 }, { 0x1752, 0x1753 },
-    { 0x1772, 0x1773 }, { 0x17B4, 0x17B5 }, { 0x17B7, 0x17BD },
-    { 0x17C6, 0x17C6 }, { 0x17C9, 0x17D3 }, { 0x17DD, 0x17DD },
-    { 0x180B, 0x180D }, { 0x18A9, 0x18A9 }, { 0x1920, 0x1922 },
-    { 0x1927, 0x1928 }, { 0x1932, 0x1932 }, { 0x1939, 0x193B },
-    { 0x1A17, 0x1A18 }, { 0x1B00, 0x1B03 }, { 0x1B34, 0x1B34 },
-    { 0x1B36, 0x1B3A }, { 0x1B3C, 0x1B3C }, { 0x1B42, 0x1B42 },
-    { 0x1B6B, 0x1B73 }, { 0x1DC0, 0x1DCA }, { 0x1DFE, 0x1DFF },
-    { 0x200B, 0x200F }, { 0x202A, 0x202E }, { 0x2060, 0x2063 },
-    { 0x206A, 0x206F }, { 0x20D0, 0x20EF }, { 0x302A, 0x302F },
-    { 0x3099, 0x309A }, { 0xA806, 0xA806 }, { 0xA80B, 0xA80B },
-    { 0xA825, 0xA826 }, { 0xFB1E, 0xFB1E }, { 0xFE00, 0xFE0F },
-    { 0xFE20, 0xFE23 }, { 0xFEFF, 0xFEFF }, { 0xFFF9, 0xFFFB },
-    { 0x10A01, 0x10A03 }, { 0x10A05, 0x10A06 }, { 0x10A0C, 0x10A0F },
-    { 0x10A38, 0x10A3A }, { 0x10A3F, 0x10A3F }, { 0x1D167, 0x1D169 },
-    { 0x1D173, 0x1D182 }, { 0x1D185, 0x1D18B }, { 0x1D1AA, 0x1D1AD },
-    { 0x1D242, 0x1D244 }, { 0xE0001, 0xE0001 }, { 0xE0020, 0xE007F },
-    { 0xE0100, 0xE01EF }
-  };
+  static const struct interval combining[] =
+#include "combining.t"
+
+  /* sorted list of non-overlapping intervals of wide characters,
+     ranges extended to Blocks where possible
+   */
+  static const struct interval wide[] =
+#include "wide.t"
 
   /* Test for NUL character */
   if (ucs == 0)
@@ -310,20 +216,12 @@ _DEFUN (__wcwidth, (ucs),
 
   /* if we arrive here, ucs is not a combining or C0/C1 control character */
 
-  return 1 + 
-    (ucs >= 0x1100 &&
-     (ucs <= 0x115f ||                    /* Hangul Jamo init. consonants */
-      ucs == 0x2329 || ucs == 0x232a ||
-      (ucs >= 0x2e80 && ucs <= 0xa4cf &&
-       ucs != 0x303f) ||                  /* CJK ... Yi */
-      (ucs >= 0xac00 && ucs <= 0xd7a3) || /* Hangul Syllables */
-      (ucs >= 0xf900 && ucs <= 0xfaff) || /* CJK Compatibility Ideographs */
-      (ucs >= 0xfe10 && ucs <= 0xfe19) || /* Vertical forms */
-      (ucs >= 0xfe30 && ucs <= 0xfe6f) || /* CJK Compatibility Forms */
-      (ucs >= 0xff00 && ucs <= 0xff60) || /* Fullwidth Forms */
-      (ucs >= 0xffe0 && ucs <= 0xffe6) ||
-      (ucs >= 0x20000 && ucs <= 0x2fffd) ||
-      (ucs >= 0x30000 && ucs <= 0x3fffd)));
+  /* binary search in table of wide character codes */
+  if (bisearch(ucs, wide,
+	       sizeof(wide) / sizeof(struct interval) - 1))
+    return 2;
+  else
+    return 1;
 #else /* !_MB_CAPABLE */
   if (iswprint (ucs))
     return 1;
@@ -333,9 +231,9 @@ _DEFUN (__wcwidth, (ucs),
 #endif /* _MB_CAPABLE */
 }
 
-int     
+int
 _DEFUN (wcwidth, (wc),
-	_CONST wchar_t wc)
+	_CONST wint_t wc)
 { 
   wint_t wi = wc;
 
-- 
2.13.2
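
The patch above replaces the hard-coded range checks with a binary
search over a generated interval table.  The following is a minimal,
self-contained sketch of that lookup, not the newlib code itself: the
names demo_wide, demo_bisearch and demo_width are hypothetical, and the
table is a small hand-written sample derived from the ranges of the
removed expression, not the contents of the generated wide.t.

```c
#include <stddef.h>
#include <stdint.h>

/* One closed range [first, last] of code points. */
struct interval
{
  uint32_t first;
  uint32_t last;
};

/* Hypothetical sample table: sorted, non-overlapping.  The real patch
   pulls its table in with #include "wide.t". */
static const struct interval demo_wide[] = {
  { 0x1100, 0x115F },   /* Hangul Jamo initial consonants */
  { 0x2329, 0x232A },
  { 0x2E80, 0x303E },   /* CJK ... (0x303F excluded) */
  { 0x3040, 0xA4CF },   /* ... Yi */
  { 0xAC00, 0xD7A3 },   /* Hangul Syllables */
  { 0xF900, 0xFAFF },   /* CJK Compatibility Ideographs */
  { 0xFF00, 0xFF60 },   /* Fullwidth Forms */
  { 0x20000, 0x2FFFD }
};

/* Return 1 if ucs lies in one of the intervals table[0..max], else 0. */
static int
demo_bisearch (uint32_t ucs, const struct interval *table, int max)
{
  int min = 0;

  if (ucs < table[0].first || ucs > table[max].last)
    return 0;
  while (max >= min)
    {
      int mid = min + (max - min) / 2;

      if (ucs > table[mid].last)
        min = mid + 1;
      else if (ucs < table[mid].first)
        max = mid - 1;
      else
        return 1;
    }
  return 0;
}

/* Width of a printable, non-combining character: 2 if wide, else 1. */
static int
demo_width (uint32_t ucs)
{
  int max = (int) (sizeof (demo_wide) / sizeof (struct interval)) - 1;

  return demo_bisearch (ucs, demo_wide, max) ? 2 : 1;
}
```

With this sample table, demo_width (0x4E00) yields 2 and
demo_width (0x303F) yields 1, which mirrors the `ucs != 0x303f'
exception in the removed code.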


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Ping: Unicode update of width and other character properties
  2017-08-06  5:36 Unicode update of width and other character properties Thomas Wolff
  2017-08-07 10:31 ` Corinna Vinschen
@ 2017-12-02 11:25 ` Thomas Wolff
  1 sibling, 0 replies; 15+ messages in thread
From: Thomas Wolff @ 2017-12-02 11:25 UTC (permalink / raw)
  To: newlib

Hi,
this is a reminder of my patch for wcwidth Unicode consistency, as 
requested.
Thomas

-------- Forwarded Message --------
Subject: 	Unicode update of width and other character properties
Date: 	Sun, 6 Aug 2017 07:36:10 +0200
From: 	Thomas Wolff <towo@towo.net>
To: 	newlib@sourceware.org

Hi,
this is a proposal to update wcwidth and the character properties
functions isw*/towupper/towlower to Unicode 10.0, as discussed in the
mail thread https://cygwin.com/ml/cygwin/2017-07/msg00366.html,
as well as to simplify automatic generation of respective tables for an
easier update step.
Table size is moderate (using ranges for character properties) but there
is still an option to reduce the two big tables in size.

The patch can be retrieved from http://towo.net/cygwin/charprops10.zip .

The Makefile.widthdata does not yet distinguish the two subdirectories
(libc/string, libc/ctype) as it comes from a common development directory.

There is a test program with which the isw*/tow* functions of the
current and patched implementations can be compared.

I also provide a log of deviations of the new approach to the current
implementation, based on Unicode 5.2 data, to compare and check.
If there are any disputable cases, I would consider that of course.

My main aim was actually to get the wcwidth data updated, for which the
change is more obviously clear.

Thanks
Thomas

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Unicode update of width and other character properties
  2017-08-17 11:03       ` Thomas Wolff
@ 2017-12-03 14:07         ` Corinna Vinschen
  2017-12-03 17:31           ` Thomas Wolff
  0 siblings, 1 reply; 15+ messages in thread
From: Corinna Vinschen @ 2017-12-03 14:07 UTC (permalink / raw)
  To: newlib

[-- Attachment #1: Type: text/plain, Size: 1874 bytes --]

Sorry for the late reply, I forgot this patch.

On Aug 17 07:53, Thomas Wolff wrote:
> [...]
> I'm attaching my patches here for assessment.
> I have revised table handling further, using gcc bit struct packing. The two
> big tables have a total size of 14340 bytes now, for Unicode 10.0.
> I have fixed locale handling in the isw* and tow* functions, but I've not
> yet changed JP conversion. Unfortunately, the routines from newlib/iconvdata
> are not as straight-forward to be employed as I thought, because they work on
> multi-byte representations.
> Also the mapping of ctype charsets (JIS, SJIS, EUC-JP) to the subsets
> handled in iconvdata (JIS-201/208/212) is a little bit obscure.
> Likewise obscure is the relation between newlib/iconvdata and
> newlib/libc/iconv.

This is really old stuff.  I wonder if anybody is still using it with
Unicode around for a long time...
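
For readers unfamiliar with the "gcc bit struct packing" mentioned in
the quoted text, here is a minimal sketch of the idea.  It is not the
code from the patch: the struct names, field names and bit widths are
assumptions chosen for illustration, and __attribute__ ((packed)) is a
GCC extension (also accepted by Clang).  Packing lets a range entry use
only as many bytes as its bit-fields need, which is how a range table
can stay small.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical range entry without packing: the 17-bit field does not
   fit into the first 32-bit unit, so a second unit is used. */
struct entry_plain
{
  unsigned int first : 21;  /* first code point of the range */
  unsigned int diff : 8;    /* last code point minus first */
  unsigned int mode : 2;    /* kind of mapping (unused in this sketch) */
  signed int delta : 17;    /* offset to the converted code point */
};

/* Same fields, packed: 48 bits are stored without padding. */
struct entry_packed
{
  unsigned int first : 21;
  unsigned int diff : 8;
  unsigned int mode : 2;
  signed int delta : 17;
} __attribute__ ((packed));

/* Illustrative one-entry table: 'A'..'Z' map to 'a'..'z' (+32). */
static const struct entry_packed demo_caseconv[] = {
  { 0x0041, 25, 0, 32 }
};

/* Return the lower-case mapping of c according to the demo table,
   or c itself if no range matches. */
static uint32_t
demo_tolower (uint32_t c)
{
  size_t n = sizeof (demo_caseconv) / sizeof (demo_caseconv[0]);
  size_t i;

  for (i = 0; i < n; i++)
    {
      uint32_t first = (uint32_t) demo_caseconv[i].first;
      uint32_t last = first + (uint32_t) demo_caseconv[i].diff;

      if (c >= first && c <= last)
        return (uint32_t) ((int32_t) c + demo_caseconv[i].delta);
    }
  return c;
}
```

A real table would be sorted and searched by bisection, not scanned
linearly; the linear loop only keeps the sketch short.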

> To be on the safe side, I’m leaving the actual jp2uc conversion untouched
> for now, and I’ve just added a dummy back-conversion uc2jp with a #warning.
> If the #warning is ignored or removed, the non-Cygwin build should work as
> before, fixing just locale handling.
> 
> I'm attaching the wcwidth part here, all patches are available at
> http://towo.net/cygwin/Unicode_and_locale_tweaks.zip (don't fit in the
> mailbox size limit).

So why don't you use git send-email (ideally with a cover letter, see
`git format-patch --cover-letter') instead of attaching the patches to a
single email?  This is the correct way of sending patch series and it
gets you around the size limit.

The below patches are missing a patch, last one is patch 3/4.

Patches 2 and 3 are ok, afaics, but as for patch 1, why did you create
an extra Makefile?  This should be merged into string/Makefile.am.


Corinna

-- 
Corinna Vinschen
Cygwin Maintainer
Red Hat

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Unicode update of width and other character properties
  2017-12-03 14:07         ` Corinna Vinschen
@ 2017-12-03 17:31           ` Thomas Wolff
  2017-12-03 17:33             ` Jon Turney
                               ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Thomas Wolff @ 2017-12-03 17:31 UTC (permalink / raw)
  To: newlib


> Sorry for the late reply, I forgot this patch.
>
> On Aug 17 07:53, Thomas Wolff wrote:
>> [...]
>> I'm attaching my patches here for assessment.
>> I have revised table handling further, using gcc bit struct packing. The two
>> big tables have a total size of 14340 bytes now, for Unicode 10.0.
>> I have fixed locale handling in the isw* and tow* functions, but I've not
>> yet changed JP conversion. Unfortunately, the routines from newlib/iconvdata
>> are not as straight-forward to be employed as I thought, because they work on
>> multi-byte representations.
>> Also the mapping of ctype charsets (JIS, SJIS, EUC-JP) to the subsets
>> handled in iconvdata (JIS-201/208/212) is a little bit obscure.
>> Likewise obscure is the relation between newlib/iconvdata and
>> newlib/libc/iconv.
> This is really old stuff.  I wonder if anybody is still using it with
> Unicode around for a long time...
>
>> To be on the safe side, I’m leaving the actual jp2uc conversion untouched
>> for now, and I’ve just added a dummy back-conversion uc2jp with a #warning.
>> If the #warning is ignored or removed, the non-Cygwin build should work as
>> before, fixing just locale handling.
>>
>> I'm attaching the wcwidth part here, all patches are available at
>> http://towo.net/cygwin/Unicode_and_locale_tweaks.zip (don't fit in the
>> mailbox size limit).
> So why don't you use git send-email (ideally with a cover letter, see
> `git format-patch --cover-letter') instead of attaching the patches to a
> single email?  This is the correct way of sending patch series and it
> gets you around the size limit.
Because of:
LC_ALL=C git send-email
git: 'send-email' is not a git command. See 'git --help'.

Are there any working instructions for newlib contributions to be found 
anywhere?

> The below patches are missing a patch, last one is patch 3/4.
>
> Patches 2 and 3 are ok, afaics, but as for patch 1, why did you create
> an extra Makefile?  This should be merged into string/Makefile.am.
I was awaiting feedback before doing further integration work. The 
Makefile includes test stuff and also the table generation. Would the 
generation be invoked every time or rather called manually?

Thomas


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Unicode update of width and other character properties
  2017-12-03 17:31           ` Thomas Wolff
@ 2017-12-03 17:33             ` Jon Turney
  2017-12-04  7:32             ` Brian Inglis
  2017-12-04  9:05             ` Corinna Vinschen
  2 siblings, 0 replies; 15+ messages in thread
From: Jon Turney @ 2017-12-03 17:33 UTC (permalink / raw)
  To: Thomas Wolff, newlib

On 03/12/2017 14:07, Thomas Wolff wrote:
>> On Aug 17 07:53, Thomas Wolff wrote:

>>> I'm attaching the wcwidth part here, all patches are available at
>>> http://towo.net/cygwin/Unicode_and_locale_tweaks.zip (don't fit in the
>>> mailbox size limit).
>> So why don't you use git send-email (ideally with a cover letter, see
>> `git format-patch --cover-letter') instead of attaching the patches to a
>> single email?  This is the correct way of sending patch series and it
>> gets you around the size limit.
> Because of:
> LC_ALL=C git send-email
> git: 'send-email' is not a git command. See 'git --help'.
> 
> Are there any working instructions for newlib contributions to be found 
> anywhere?

Due to extra deps, git-send-email is usually packaged separately.

On Cygwin, the package name is 'git-email' ('Email tools for Git')

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Unicode update of width and other character properties
  2017-12-03 17:31           ` Thomas Wolff
  2017-12-03 17:33             ` Jon Turney
@ 2017-12-04  7:32             ` Brian Inglis
  2017-12-04  9:05             ` Corinna Vinschen
  2 siblings, 0 replies; 15+ messages in thread
From: Brian Inglis @ 2017-12-04  7:32 UTC (permalink / raw)
  To: newlib

On 2017-12-03 07:07, Thomas Wolff wrote:
>> On Aug 17 07:53, Thomas Wolff wrote:
>> So why don't you use git send-email (ideally with a cover letter, see `git
>> format-patch --cover-letter') instead of attaching the patches to a single
>> email?  This is the correct way of sending patch series and it gets you
>> around the size limit.
> Because of:
> LC_ALL=C git send-email
> git: 'send-email' is not a git command. See 'git --help'.

You need to install git-email, and if you run X you might also want git-gui.

Ensure $GIT_EDITOR|$VISUAL|$EDITOR stays in foreground so that you can edit
commit messages, emails, interactive rebases, merges, etc. e.g.:
	git config --global core.editor 'gvim -f'

You should not need LC_ALL=C in most places these days, except to get sort,
join, and uniq to play well together.

> Are there any working instructions for newlib contributions to be found
> anywhere?

Everyone assumes you are comfortable with git and understand its model.
Advice I git:

cd  .../repo

git checkout master
git pull
git checkout -b BRANCH

$VISUAL FILE
git add FILE
git commit FILE
...

git format-patch -o PATH/ --stat --cover-letter -#commits
$VISUAL PATH/0000-*cover-letter.patch
git send-email --compose PATH/000?-*.patch

Update branch to the latest upstream master:

git checkout master
git pull
git rebase [-i] master BRANCH

...

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Unicode update of width and other character properties
  2017-12-03 17:31           ` Thomas Wolff
  2017-12-03 17:33             ` Jon Turney
  2017-12-04  7:32             ` Brian Inglis
@ 2017-12-04  9:05             ` Corinna Vinschen
  2018-02-25 17:14               ` Thomas Wolff
  2 siblings, 1 reply; 15+ messages in thread
From: Corinna Vinschen @ 2017-12-04  9:05 UTC (permalink / raw)
  To: newlib

[-- Attachment #1: Type: text/plain, Size: 2717 bytes --]

On Dec  3 15:07, Thomas Wolff wrote:
> > On Aug 17 07:53, Thomas Wolff wrote:
> > > [...]
> > > I have fixed locale handling in the isw* and tow* functions, but I've not
> > > yet changed JP conversion. Unfortunately, the routines from newlib/iconvdata
> > > are not as straight-forward to be employed as I thought, because they work on
> > > multi-byte representations.
> > > Also the mapping of ctype charsets (JIS, SJIS, EUC-JP) to the subsets
> > > handled in iconvdata (JIS-201/208/212) is a little bit obscure.
> > > Likewise obscure is the relation between newlib/iconvdata and
> > > newlib/libc/iconv.
> > This is really old stuff.  I wonder if anybody is still using it with
> > Unicode around for a long time...

I forgot to mention, I think your approach to keep this is the best
one for now so as not to break anything for small targets.

> > > To be on the safe side, I’m leaving the actual jp2uc conversion untouched
> > > for now, and I’ve just added a dummy back-conversion uc2jp with a #warning.
> > > If the #warning is ignored or removed, the non-Cygwin build should work as
> > > before, fixing just locale handling.
> > > 
> > > I'm attaching the wcwidth part here, all patches are available at
> > > http://towo.net/cygwin/Unicode_and_locale_tweaks.zip (don't fit in the
> > > mailbox size limit).
> > So why don't you use git send-email (ideally with a cover letter, see
> > `git format-patch --cover-letter') instead of attaching the patches to a
> > single email?  This is the correct way of sending patch series and it
> > gets you around the size limit.
> Because of:
> LC_ALL=C git send-email
> git: 'send-email' is not a git command. See 'git --help'.
> 
> Are there any working instructions for newlib contributions to be found
> anywhere?

Jon and Brian answered that.

> > The below patches are missing a patch, last one is patch 3/4.
> > 
> > Patches 2 and 3 are ok, afaics, but as for patch 1, why did you create
> > an extra Makefile?  This should be merged into string/Makefile.am.
> I was awaiting feedback before doing further integration work. The Makefile
> includes test stuff and also the table generation. Would the generation be
> invoked every time or rather called manually?

Keeping generated files in the repo is frowned upon these days, but
we've been doing this for a pretty long time already and don't want
everybody to have to go through these mostly awkward steps.  So, yeah,
keeping the tables in the repo and manually calling the generation
targets sounds right to me.  Only maintainers (or interested parties)
need to do this once in a while.


Thanks,
Corinna

-- 
Corinna Vinschen
Cygwin Maintainer
Red Hat

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Unicode update of width and other character properties
  2017-12-04  9:05             ` Corinna Vinschen
@ 2018-02-25 17:14               ` Thomas Wolff
  2018-02-26 17:20                 ` Corinna Vinschen
  0 siblings, 1 reply; 15+ messages in thread
From: Thomas Wolff @ 2018-02-25 17:14 UTC (permalink / raw)
  To: newlib

I have finally revamped, manually rebased, and repackaged my Unicode 
data patches, which I'll send in a separate mail.
However, as I don't have a command-line sendmail set up (and apparently 
it's not as easy as it used to be),
I'll send zip archives which contain git-patch files.
There are two patches:
libc/string: wcwidth using generated width data, with data generated 
from Unicode 10.0
libc/ctype: isw* and tow* functions using generated case conversion and 
character class data, with Unicode 10.0 data
For both, a generation script and a Makefile (Makefile.widthdata / 
Makefile.chardata) are included. As these are meant to be used in the 
source directory, not the binary target directory, when a future Unicode 
update comes along, they are not related to the other Makefiles.
In ctype/, there is one new source (categories.c) which should be 
compiled separately, but although I tried to include it in Makefile.am,
I could not get the build process to compile it. So the current solution 
is to include it from one of the other sources (the one that also 
maintains the case conversion table).
Thomas

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Unicode update of width and other character properties
  2018-02-25 17:14               ` Thomas Wolff
@ 2018-02-26 17:20                 ` Corinna Vinschen
  2018-02-26 20:02                   ` Thomas Wolff
  0 siblings, 1 reply; 15+ messages in thread
From: Corinna Vinschen @ 2018-02-26 17:20 UTC (permalink / raw)
  To: Thomas Wolff; +Cc: newlib


[-- Attachment #1.1: Type: text/plain, Size: 2158 bytes --]

On Feb 25 18:14, Thomas Wolff wrote:
> I have finally revamped, manually rebased, and repackaged my Unicode data
> patches which I'll send in separate mail.
> However, as I don't have a command-line sendmail set up (and apparently it's
> not as easy as it used to be),
> I'll send zip archives which contain git-patch files.

No, sorry, but no.  It's not that tricky to send standard git patch
series, we're all doing this.  If your MUA doesn't fit, use another MUA
or *attach* the patches, one per mail.

> There are two patches:
> libc/string: wcwidth using generated width data, with data generated from
> Unicode 10.0
> libc/ctype: isw* and tow* functions using generated case conversion and
> character class data, with Unicode 10.0 data
> For both, generation script and a Makefile.widthdata / Makefile.chardata is
> included. As these are to be used in the source directory,
> not the binary target directory, in case of future Unicode update, they are
> not related to the other Makefiles.

Eh, what?  If you read back, I had no problems with your patches 2 and
3, only with patch 1 adding new makefiles.  So the only thing I actually
asked for was to integrate the creation of the generated tables into
Makefile.am and now you're telling me this is not what you changed...?

> In ctype/, there is one new source (categories.c) which should be compiled
> separately but although I tried to include it in Makefile.am,
> I could not get the build process to compile it. So the current solution is
> to include it from one of the other sources (the one that also maintains the
> case conversion table).

That's a workaround, not a solution.  When you change Makefile.am you
have to regenerate Makefile.in, obviously.

However, since regenerating Makefile.in for newlib is (unfortunately,
for historical reasons) non-obvious, you can just go ahead and manually
add categories.* to Makefile.in where it belongs, kind of like the
attached.  A later regeneration run by one of the maintainers will fix
the formatting so that's nothing to worry about.


Corinna

-- 
Corinna Vinschen
Cygwin Maintainer
Red Hat

[-- Attachment #1.2: x --]
[-- Type: text/plain, Size: 2902 bytes --]

diff --git a/newlib/libc/ctype/Makefile.am b/newlib/libc/ctype/Makefile.am
index 898693571bd1..fa6a70d3a1bf 100644
--- a/newlib/libc/ctype/Makefile.am
+++ b/newlib/libc/ctype/Makefile.am
@@ -24,6 +24,7 @@ if ELIX_LEVEL_1
 ELIX_SOURCES =
 else
 ELIX_SOURCES = \
+	categories.c	\
 	isalnum_l.c	\
 	isalpha_l.c	\
 	isascii.c 	\
diff --git a/newlib/libc/ctype/Makefile.in b/newlib/libc/ctype/Makefile.in
index 2b2331767a0f..9932a9494b09 100644
--- a/newlib/libc/ctype/Makefile.in
+++ b/newlib/libc/ctype/Makefile.in
@@ -79,7 +79,8 @@ am__objects_1 = lib_a-ctype_.$(OBJEXT) lib_a-isalnum.$(OBJEXT) \
 	lib_a-ispunct.$(OBJEXT) lib_a-isspace.$(OBJEXT) \
 	lib_a-isxdigit.$(OBJEXT) lib_a-tolower.$(OBJEXT) \
 	lib_a-toupper.$(OBJEXT)
-@ELIX_LEVEL_1_FALSE@am__objects_2 = lib_a-isalnum_l.$(OBJEXT) \
+@ELIX_LEVEL_1_FALSE@am__objects_2 = lib_a-categories.$(OBJEXT) \
+@ELIX_LEVEL_1_FALSE@	lib_a-isalnum_l.$(OBJEXT) \
 @ELIX_LEVEL_1_FALSE@	lib_a-isalpha_l.$(OBJEXT) \
 @ELIX_LEVEL_1_FALSE@	lib_a-isascii.$(OBJEXT) \
 @ELIX_LEVEL_1_FALSE@	lib_a-isascii_l.$(OBJEXT) \
@@ -142,7 +143,7 @@ libctype_la_LIBADD =
 am__objects_3 = ctype_.lo isalnum.lo isalpha.lo iscntrl.lo isdigit.lo \
 	islower.lo isupper.lo isprint.lo ispunct.lo isspace.lo \
 	isxdigit.lo tolower.lo toupper.lo
-@ELIX_LEVEL_1_FALSE@am__objects_4 = isalnum_l.lo isalpha_l.lo \
+@ELIX_LEVEL_1_FALSE@am__objects_4 = categories.lo isalnum_l.lo isalpha_l.lo \
 @ELIX_LEVEL_1_FALSE@	isascii.lo isascii_l.lo isblank.lo \
 @ELIX_LEVEL_1_FALSE@	isblank_l.lo iscntrl_l.lo isdigit_l.lo \
 @ELIX_LEVEL_1_FALSE@	islower_l.lo isupper_l.lo isprint_l.lo \
@@ -351,6 +352,7 @@ GENERAL_SOURCES = \
 	toupper.c
 
 @ELIX_LEVEL_1_FALSE@ELIX_SOURCES = \
+@ELIX_LEVEL_1_FALSE@	categories.c	\
 @ELIX_LEVEL_1_FALSE@	isalnum_l.c	\
 @ELIX_LEVEL_1_FALSE@	isalpha_l.c	\
 @ELIX_LEVEL_1_FALSE@	isascii.c 	\
@@ -609,6 +611,12 @@ lib_a-toupper.o: toupper.c
 lib_a-toupper.obj: toupper.c
 	$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(lib_a_CFLAGS) $(CFLAGS) -c -o lib_a-toupper.obj `if test -f 'toupper.c'; then $(CYGPATH_W) 'toupper.c'; else $(CYGPATH_W) '$(srcdir)/toupper.c'; fi`
 
+lib_a-categories.o: categories.c
+	$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(lib_a_CFLAGS) $(CFLAGS) -c -o lib_a-categories.o `test -f 'categories.c' || echo '$(srcdir)/'`categories.c
+
+lib_a-categories.obj: categories.c
+	$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(lib_a_CFLAGS) $(CFLAGS) -c -o lib_a-categories.obj `if test -f 'categories.c'; then $(CYGPATH_W) 'categories.c'; else $(CYGPATH_W) '$(srcdir)/categories.c'; fi`
+
 lib_a-isalnum_l.o: isalnum_l.c
 	$(CC) $(DEFS) $(DEFAULT_INCLUDES) $(INCLUDES) $(AM_CPPFLAGS) $(CPPFLAGS) $(lib_a_CFLAGS) $(CFLAGS) -c -o lib_a-isalnum_l.o `test -f 'isalnum_l.c' || echo '$(srcdir)/'`isalnum_l.c
 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Unicode update of width and other character properties
  2018-02-26 17:20                 ` Corinna Vinschen
@ 2018-02-26 20:02                   ` Thomas Wolff
  2018-02-26 20:25                     ` Hans-Bernhard Bröker
  0 siblings, 1 reply; 15+ messages in thread
From: Thomas Wolff @ 2018-02-26 20:02 UTC (permalink / raw)
  To: newlib

Am 26.02.2018 um 18:20 schrieb Corinna Vinschen:
> On Feb 25 18:14, Thomas Wolff wrote:
>> I have finally revamped, manually rebased, and repackaged my Unicode data
>> patches which I'll send in separate mail.
>> ...
> ...
> or *attach* the patches, one per mail.
That will do, thanks.

>> There are two patches:
>> libc/string: wcwidth using generated width data, with data generated from
>> Unicode 10.0
>> libc/ctype: isw* and tow* functions using generated case conversion and
>> character class data, with Unicode 10.0 data
>> For both, generation script and a Makefile.widthdata / Makefile.chardata is
>> included. As these are to be used in the source directory,
>> not the binary target directory, in case of future Unicode update, they are
>> not related to the other Makefiles.
> Eh, what?  If you read back, I had no problems with your patches 2 and
> 3, only with patch 1 adding new makefiles.  So the only thing I actually
> asked for was to integrate the creation of the generated tables into
> Makefile.am and now you're telling me this is not what you changed...?
First I added an include of the generation makefile to Makefile.am, 
then it occurred to me that the generated makefile resides in the target 
hierarchy while the generation should probably be invoked in the source 
directory, so I removed it again.
I'm not sure about the best or preferred invocation interface for such a 
step. Maybe I should just provide the generation scripts (mk*) and leave 
it up to you to integrate them into the Makefile.am.

>> In ctype/, there is one new source (categories.c) which should be compiled
>> separately but although I tried to include it in Makefile.am,
>> I could not get the build process to compile it. So the current solution is
>> to include it from one of the other sources (the one that also maintains the
>> case conversion table).
> That's a workaround, not a solution.  When you change Makefile.am you
> have to regenerate Makefile.in, obviously.
>
> However, since regenerating Makefile.in for newlib is (unfortunately,
> for historical reasons) non-obvious, you can just go ahead and manually
> add categories.* to Makefile.in where it belongs, kind of like the
> attached.
Thanks for the patch.

I'll resubmit the wcwidth patch soon, maybe you can tell me how you'd 
like the data generation to be invoked, or I can submit it just with the 
script. I'll submit the ctype patch later; all works fine on my Windows 
10 system but there is some obscure trouble on a Windows 7 system which 
I'd like to check out first.
And there's also a locale patch which I presented in a separate mail.

Thomas

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Unicode update of width and other character properties
  2018-02-26 20:02                   ` Thomas Wolff
@ 2018-02-26 20:25                     ` Hans-Bernhard Bröker
  0 siblings, 0 replies; 15+ messages in thread
From: Hans-Bernhard Bröker @ 2018-02-26 20:25 UTC (permalink / raw)
  To: newlib

Am 26.02.2018 um 21:02 schrieb Thomas Wolff:

> First I added an include to the generation makefile into Makefile.am, 
> then it occurred to me that the generated makefile resides in the target 
> hierarchy while the generation should probably be invoked in the source 
> directory, so I removed it again.

Which directory the tool is to be invoked in is irrelevant for this. 
Anything you put into a Makefile.am will automatically go into 
Makefile.in and Makefile, anyway.  Well, assuming you actually run the 
autotools, that is.

If you need to run things from inside the source tree, you're supposed to 
apply something like

	cd $(srcdir) &&
	cd $(top_srcdir)/some/where &&

to them just like, e.g. the auto-generated production rules for 
Makefile.in and configure do it.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2018-02-26 20:25 UTC | newest]

Thread overview: 15+ messages
2017-08-06  5:36 Unicode update of width and other character properties Thomas Wolff
2017-08-07 10:31 ` Corinna Vinschen
2017-08-07 19:18   ` Thomas Wolff
2017-08-08  8:30     ` Corinna Vinschen
2017-08-17 11:03       ` Thomas Wolff
2017-12-03 14:07         ` Corinna Vinschen
2017-12-03 17:31           ` Thomas Wolff
2017-12-03 17:33             ` Jon Turney
2017-12-04  7:32             ` Brian Inglis
2017-12-04  9:05             ` Corinna Vinschen
2018-02-25 17:14               ` Thomas Wolff
2018-02-26 17:20                 ` Corinna Vinschen
2018-02-26 20:02                   ` Thomas Wolff
2018-02-26 20:25                     ` Hans-Bernhard Bröker
2017-12-02 11:25 ` Ping: " Thomas Wolff
