public inbox for newlib@sourceware.org
From: Brian Inglis <Brian.Inglis@SystematicSw.ab.ca>
To: newlib@sourceware.org
Subject: Re: broken check in wchar testsuite
Date: Tue, 03 Sep 2019 20:55:00 -0000
Message-ID: <e165fef0-1b39-227e-a55f-32edbe245409@SystematicSw.ab.ca>
In-Reply-To: <CAHL7psEVh4iR05KWW8PM-x4yDsSAyQ1C_QFkH6RC3uKc=cTnHg@mail.gmail.com>

On 2019-09-03 10:00, Giacomo Tesio wrote:
> Apparently the check at newlib/testsuite/newlib.wctype/twctype.c:40
> is wrong since the Unicode character 0x0967 is a number, not a letter.
> See https://www.fileformat.info/info/unicode/char/0967/index.htm

$ grep '^0967' /usr/share/unicode/ucd/UnicodeData.txt
0967;DEVANAGARI DIGIT ONE;Nd;0;L;;1;1;1;N;;;;;
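
Each UnicodeData.txt record is a semicolon-separated list of 15 fields: field 0 is the code point in hex, field 1 the character name and field 2 the general category, where Nd means decimal digit and Ll/Lo are letter categories. A minimal C sketch of reading the first and third fields from a line like the one above; the struct and function names are hypothetical, and it assumes well-formed input:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record holding the two UnicodeData.txt fields used here. */
struct ucd_record {
    unsigned long code;   /* field 0: code point, hex */
    char category[3];     /* field 2: general category, e.g. "Nd" or "Lo" */
};

/* Parse one UnicodeData.txt line; returns 0 on success, -1 if malformed. */
static int ucd_parse_line(const char *line, struct ucd_record *rec)
{
    assert(line != NULL && rec != NULL);

    char *end;
    rec->code = strtoul(line, &end, 16);
    if (end == line || *end != ';')
        return -1;

    /* Skip field 1 (the character name) to reach field 2. */
    const char *semi = strchr(end + 1, ';');
    if (semi == NULL || strlen(semi + 1) < 3 || semi[3] != ';')
        return -1;

    /* The general category is always exactly two characters. */
    memcpy(rec->category, semi + 1, 2);
    rec->category[2] = '\0';
    return 0;
}
```

Fed the 0967 record, this yields code 0x967 with category "Nd", i.e. a decimal digit rather than a letter.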

Those have been there as digits since Unicode 1.0 in 1991:
http://www.unicode.org/versions/Unicode1.0.0/ch04.pdf

> The nearest letter I've found is 0x0961, see
> https://www.fileformat.info/info/unicode/char/0961/index.htm

$ egrep '^1?096.;[^;]+LETTER[^;]+;L' /usr/share/unicode/ucd/UnicodeData.txt
0960;DEVANAGARI LETTER VOCALIC RR;Lo;0;L;;;;;N;;;;;
0961;DEVANAGARI LETTER VOCALIC LL;Lo;0;L;;;;;N;;;;;

Perhaps 0x0967 was intended to be one of the letters 0x0[0124578F]67 or 0x10[348AB]67:

$ egrep '^1?0.67;[^;]+LETTER[^;]+;L' /usr/share/unicode/ucd/UnicodeData.txt
0067;LATIN SMALL LETTER G;Ll;0;L;;;;;N;;;0047;;0047
0167;LATIN SMALL LETTER T WITH STROKE;Ll;0;L;;;;;N;LATIN SMALL LETTER T BAR;;0166;;0166
0267;LATIN SMALL LETTER HENG WITH HOOK;Ll;0;L;;;;;N;LATIN SMALL LETTER HENG HOOK;;;;
0467;CYRILLIC SMALL LETTER LITTLE YUS;Ll;0;L;;;;;N;;;0466;;0466
0567;ARMENIAN SMALL LETTER EH;Ll;0;L;;;;;N;;;0537;;0537
0767;ARABIC LETTER NOON WITH TWO DOTS BELOW;Lo;0;AL;;;;;N;;;;;
0867;SYRIAC LETTER MALAYALAM RA;Lo;0;AL;;;;;N;;;;;
0F67;TIBETAN LETTER HA;Lo;0;L;;;;;N;;;;;
10367;OLD PERMIC LETTER YRY;Lo;0;L;;;;;N;;;;;
10467;SHAVIAN LETTER EGG;Lo;0;L;;;;;N;;;;;
10867;PALMYRENE LETTER HETH;Lo;0;R;;;;;N;;;;;
10A67;OLD SOUTH ARABIAN LETTER RESH;Lo;0;R;;;;;N;;;;;
10B67;INSCRIPTIONAL PAHLAVI LETTER HETH;Lo;0;R;;;;;N;;;;;

We could just pick the first, U+0067 LATIN SMALL LETTER G, for a patch:

--- a/twctype.c 2019-03-23 20:44:45.229950600 -0600
+++ b/twctype.c 2019-09-03 14:10:30.440326700 -0600
@@ -37,7 +37,7 @@ int main()
   else
     {
       setlocale (LC_CTYPE, "C-UTF-8");
-      CHECK (iswalpha(0x0967));
+      CHECK (iswalpha(0x0067));
       CHECK (!iswalpha(0x128e));
       CHECK (iswalnum(0x1d7ce));
       CHECK (!iswalnum(0x1d800));
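
The patched check can be exercised outside the testsuite with a small C sketch. It assumes the C library accepts one of the listed locale names ("C-UTF-8" as used by twctype.c, or the "C.UTF-8" spelling); both helper names are hypothetical and not part of newlib:

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <wctype.h>

/* Try to select a UTF-8 LC_CTYPE; returns the accepted name or NULL. */
static const char *select_utf8_ctype(void)
{
    static const char *const names[] = { "C-UTF-8", "C.UTF-8" };

    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
        if (setlocale(LC_CTYPE, names[i]) != NULL)
            return names[i];
    return NULL;
}

/* Nonzero when wc is alphabetic in the current LC_CTYPE, i.e. when
   CHECK (iswalpha (wc)) would pass. */
static int alpha_check_passes(wint_t wc)
{
    assert(wc != WEOF);
    return iswalpha(wc) != 0;
}
```

U+0067 is ASCII g, and ISO C requires iswalpha to be true for the lowercase letters in every locale, so the replacement check should pass even if no UTF-8 locale can be selected.
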

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.

Thread overview: 2+ messages
2019-09-03 16:00 Giacomo Tesio
2019-09-03 20:55 ` Brian Inglis [this message]
