public inbox for newlib@sourceware.org
* broken check in wchar testsuite
@ 2019-09-03 16:00 Giacomo Tesio
  2019-09-03 20:55 ` Brian Inglis
  0 siblings, 1 reply; 2+ messages in thread
From: Giacomo Tesio @ 2019-09-03 16:00 UTC
  To: newlib

Apparently the check at newlib/testsuite/newlib.wctype/twctype.c:40
is wrong, since the Unicode character 0x0967 is a digit, not a letter.
See https://www.fileformat.info/info/unicode/char/0967/index.htm

The nearest letter I've found is 0x0961, see
https://www.fileformat.info/info/unicode/char/0961/index.htm


Giacomo


* Re: broken check in wchar testsuite
  2019-09-03 16:00 broken check in wchar testsuite Giacomo Tesio
@ 2019-09-03 20:55 ` Brian Inglis
  0 siblings, 0 replies; 2+ messages in thread
From: Brian Inglis @ 2019-09-03 20:55 UTC
  To: newlib

On 2019-09-03 10:00, Giacomo Tesio wrote:
> Apparently the check at newlib/testsuite/newlib.wctype/twctype.c:40
> is wrong, since the Unicode character 0x0967 is a digit, not a letter.
> See https://www.fileformat.info/info/unicode/char/0967/index.htm

$ grep '^0967' /usr/share/unicode/ucd/UnicodeData.txt
0967;DEVANAGARI DIGIT ONE;Nd;0;L;;1;1;1;N;;;;;

Those have been there as digits since Unicode 1.0 in 1991:
http://www.unicode.org/versions/Unicode1.0.0/ch04.pdf
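
For anyone who wants to do the same lookup from C rather than grep, here is
a minimal sketch that pulls the General_Category (the third
semicolon-separated field) out of a UnicodeData.txt record. The function
names are made up for illustration and are not part of newlib or any
library; the only assumption about the file is the field layout visible in
the grep output above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the General_Category (third field) of one
   UnicodeData.txt record into out.  Returns 0 on success, -1 if the
   record is malformed or out is too small. */
int ucd_general_category(const char *record, char *out, size_t outsz)
{
    const char *p = record;

    /* Skip the first two fields: code point and character name. */
    for (int i = 0; i < 2; i++) {
        p = strchr(p, ';');
        if (p == NULL)
            return -1;
        p++;
    }

    const char *end = strchr(p, ';');
    if (end == NULL)
        return -1;

    size_t len = (size_t)(end - p);
    if (len == 0 || len + 1 > outsz)
        return -1;

    memcpy(out, p, len);
    out[len] = '\0';
    return 0;
}

/* Nonzero when the record's category is in the letter group (Lu, Ll, Lo, ...). */
int ucd_is_letter(const char *record)
{
    char gc[8];
    return ucd_general_category(record, gc, sizeof gc) == 0 && gc[0] == 'L';
}

/* Nonzero when the record's category is Nd (decimal digit). */
int ucd_is_decimal_digit(const char *record)
{
    char gc[8];
    return ucd_general_category(record, gc, sizeof gc) == 0
           && strcmp(gc, "Nd") == 0;
}
```

Fed the 0967 record shown above, ucd_is_decimal_digit() reports a digit and
ucd_is_letter() does not; the 0961 record comes out the other way round.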

> The nearest letter I've found is 0x0961, see
> https://www.fileformat.info/info/unicode/char/0961/index.htm

$ egrep '^1?096.;[^;]+LETTER[^;]+;L' /usr/share/unicode/ucd/UnicodeData.txt
0960;DEVANAGARI LETTER VOCALIC RR;Lo;0;L;;;;;N;;;;;
0961;DEVANAGARI LETTER VOCALIC LL;Lo;0;L;;;;;N;;;;;

Perhaps 0x0967 was intended to be one of the letters 0x0[0124578F]67:

$ egrep '^1?0.67;[^;]+LETTER[^;]+;L' /usr/share/unicode/ucd/UnicodeData.txt
0067;LATIN SMALL LETTER G;Ll;0;L;;;;;N;;;0047;;0047
0167;LATIN SMALL LETTER T WITH STROKE;Ll;0;L;;;;;N;LATIN SMALL LETTER T BAR;;0166;;0166
0267;LATIN SMALL LETTER HENG WITH HOOK;Ll;0;L;;;;;N;LATIN SMALL LETTER HENG HOOK;;;;
0467;CYRILLIC SMALL LETTER LITTLE YUS;Ll;0;L;;;;;N;;;0466;;0466
0567;ARMENIAN SMALL LETTER EH;Ll;0;L;;;;;N;;;0537;;0537
0767;ARABIC LETTER NOON WITH TWO DOTS BELOW;Lo;0;AL;;;;;N;;;;;
0867;SYRIAC LETTER MALAYALAM RA;Lo;0;AL;;;;;N;;;;;
0F67;TIBETAN LETTER HA;Lo;0;L;;;;;N;;;;;
10367;OLD PERMIC LETTER YRY;Lo;0;L;;;;;N;;;;;
10467;SHAVIAN LETTER EGG;Lo;0;L;;;;;N;;;;;
10867;PALMYRENE LETTER HETH;Lo;0;R;;;;;N;;;;;
10A67;OLD SOUTH ARABIAN LETTER RESH;Lo;0;R;;;;;N;;;;;
10B67;INSCRIPTIONAL PAHLAVI LETTER HETH;Lo;0;R;;;;;N;;;;;

Could just pick the first for a patch:

--- a/twctype.c 2019-03-23 20:44:45.229950600 -0600
+++ b/twctype.c 2019-09-03 14:10:30.440326700 -0600
@@ -37,7 +37,7 @@ int main()
   else
     {
       setlocale (LC_CTYPE, "C-UTF-8");
-      CHECK (iswalpha(0x0967));
+      CHECK (iswalpha(0x0067));
       CHECK (!iswalpha(0x128e));
       CHECK (iswalnum(0x1d7ce));
       CHECK (!iswalnum(0x1d800));
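
One thing to keep in mind with 0x0067 is that it is plain ASCII 'g', so the
check should hold in any locale, not only a UTF-8 one. The standalone sketch
below illustrates that under the default "C" locale. The CHECK macro here is
a hypothetical stand-in that counts failures instead of aborting, not the
testsuite's own definition, and run_ascii_wctype_checks is a made-up name.
It also assumes wchar_t values for ASCII characters equal their code points,
which holds on the usual platforms.

```c
#include <locale.h>
#include <stdio.h>
#include <wctype.h>

/* Hypothetical stand-in for the testsuite's CHECK macro: report and count
   failures instead of aborting the program. */
static int failures;
#define CHECK(expr)                                                  \
    do {                                                             \
        if (!(expr)) {                                               \
            printf("Failed %s at line %d\n", #expr, __LINE__);       \
            failures++;                                              \
        }                                                            \
    } while (0)

/* Runs a few classification checks that only involve ASCII code points
   and returns the number of failed checks. */
int run_ascii_wctype_checks(void)
{
    failures = 0;
    setlocale(LC_CTYPE, "C");

    CHECK(iswalpha(0x0067));    /* LATIN SMALL LETTER G is a letter */
    CHECK(!iswdigit(0x0067));   /* ... and not a digit */
    CHECK(!iswalpha(0x0031));   /* DIGIT ONE is not a letter */
    CHECK(iswdigit(0x0031));    /* ... but it is a digit */
    CHECK(iswalnum(0x0031));    /* ... and therefore alphanumeric */

    return failures;
}
```

If a check that actually exercises the non-ASCII tables is wanted, one of
the other candidates listed above (for example 0x0467 or 0x0567) might be a
better pick than the first.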

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

This email may be disturbing to some readers as it contains
too much technical detail. Reader discretion is advised.
