From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <libc-locales-return-3489-listarch-libc-locales=sources.redhat.com@sourceware.org>
Received: (qmail 15169 invoked by alias); 3 Dec 2014 08:46:11 -0000
Mailing-List: contact libc-locales-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <libc-locales.sourceware.org>
List-Subscribe: <mailto:libc-locales-subscribe@sourceware.org>
List-Post: <mailto:libc-locales@sourceware.org>
List-Help: <mailto:libc-locales-help@sourceware.org>, <http://sourceware.org/lists.html#faqs>
Sender: libc-locales-owner@sourceware.org
Received: (qmail 7421 invoked by uid 48); 3 Dec 2014 07:17:28 -0000
From: "maiku.fabian at gmail dot com" <sourceware-bugzilla@sourceware.org>
To: libc-locales@sourceware.org
Subject: [Bug localedata/17588] Update UTF-8 charmap and width to Unicode
 7.0.0
Date: Wed, 03 Dec 2014 08:46:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: glibc
X-Bugzilla-Component: localedata
X-Bugzilla-Version: unspecified
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: normal
X-Bugzilla-Who: maiku.fabian at gmail dot com
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: pravin.d.s at gmail dot com
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: security-
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-17588-716-dKO1Dayrme@http.sourceware.org/bugzilla/>
In-Reply-To: <bug-17588-716@http.sourceware.org/bugzilla/>
References: <bug-17588-716@http.sourceware.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://sourceware.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2014-q4/txt/msg00076.txt.bz2

https://sourceware.org/bugzilla/show_bug.cgi?id=17588

--- Comment #9 from Mike FABIAN <maiku.fabian at gmail dot com> ---
I built glibc with the patch from comment #8.

It produces some FAILs in “make check”:

    FAIL: localedata/cs_CZ.UTF-8/LC_CTYPE
    ... similar FAILs ...

Shortly after starting “make check”, one sees:

    ./charmaps/UTF-8:42734: unknown character `U00009FCD'
    ... similar messages ...

All the above problems are caused by ranges of reserved code points
which are listed in EastAsianWidth.txt like this:

    9FCD..9FFF;W     # Cn    [51] <reserved-9FCD>..<reserved-9FFF>

and these code points are not in UnicodeData.txt.

Therefore, they are not generated into the CHARMAP section
of glibc’s UTF-8 file, and generating them into the WIDTH section
of that file causes the above problems.

This can be fixed by not generating reserved code points into
the WIDTH section, i.e. by ignoring the reserved code points
mentioned in EastAsianWidth.txt. Patch for utf8-gen.py:

diff --git a/utf8-gen.py b/utf8-gen.py
index 57875b6..20b68bb 100755
--- a/utf8-gen.py
+++ b/utf8-gen.py
@@ -218,6 +218,8 @@ if __name__ == "__main__":
         write_comments(outfile, 1)
         elines = []
         for line in easta_file.readlines():
+                if re.match(r'.*<reserved-.+>\.\.<reserved-.+>.*', line):
+                        continue
                 if re.match(r'^[^;]*;[WF]', line):
                         elines.append(line.strip())
         process_width(outfile, flines, elines)

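For illustration only, the filtering done by the patch can be sketched as a
standalone function. The names wide_lines, RESERVED_RANGE and WIDE_OR_FULL
are hypothetical and do not exist in utf8-gen.py. Only the 9FCD..9FFF sample
line is taken from this comment; the other two sample lines are written in
the EastAsianWidth.txt style to exercise the filter.

```python
import re

# Hypothetical helper, not part of utf8-gen.py: it mirrors the two checks
# the patched loop applies to each line of EastAsianWidth.txt.
RESERVED_RANGE = re.compile(r'<reserved-.+>\.\.<reserved-.+>')
WIDE_OR_FULL = re.compile(r'^[^;]*;[WF]')


def wide_lines(lines):
    """Return the stripped W/F lines, skipping ranges of reserved code points."""
    result = []
    for line in lines:
        # Reserved ranges are absent from UnicodeData.txt, so they have no
        # CHARMAP entry and must not end up in the WIDTH section.
        if RESERVED_RANGE.search(line):
            continue
        if WIDE_OR_FULL.match(line):
            result.append(line.strip())
    return result


# Illustrative input in the EastAsianWidth.txt format.
sample = [
    "4E00..9FCC;W     # Lo [20941] CJK UNIFIED IDEOGRAPH-4E00..CJK UNIFIED IDEOGRAPH-9FCC\n",
    "9FCD..9FFF;W     # Cn    [51] <reserved-9FCD>..<reserved-9FFF>\n",
    "0041..005A;Na    # Lu    [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z\n",
]

for kept in wide_lines(sample):
    print(kept)
```

Like the patch, this sketch only skips reserved ranges. A line naming a
single reserved code point would not match the pattern and would still be
kept if it is marked W or F.
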
-- 
You are receiving this mail because:
You are on the CC list for the bug.