public inbox for cygwin@cygwin.com
 help / color / mirror / Atom feed
* Necessity for the assignment form
@ 2016-05-20  0:45 KOBAYASHI Shinji
  2016-05-20  1:22 ` Warren Young
  2016-05-20  5:46 ` Necessity for the assignment form David Stacey
  0 siblings, 2 replies; 14+ messages in thread
From: KOBAYASHI Shinji @ 2016-05-20  0:45 UTC (permalink / raw)
  To: cygwin


Hello,

I found a bug and I am willing to submit a patch (under
winsup/cygwin). But before doing that, I was asked from my employer to
clarify some points. https://cygwin.com/contrib.html says that the
assignment form is required "if your change is going to be a
significant one in terms of the size of your code changes". So the
questions are:

1. Who is going to determine if the change is "a significant one"?
2. In fact, my patch is just two lines of modifications, so I believe
   that it is not "a significant one". Is it okay to send such a small
   patch without the assignment form?

Thanks in advance,

KOBAYASHI Shinji.


--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Necessity for the assignment form
  2016-05-20  0:45 Necessity for the assignment form KOBAYASHI Shinji
@ 2016-05-20  1:22 ` Warren Young
  2016-05-20  4:22   ` Invalid tm_zone from localtime() when TZ is not set KOBAYASHI Shinji
  2016-05-20  5:46 ` Necessity for the assignment form David Stacey
  1 sibling, 1 reply; 14+ messages in thread
From: Warren Young @ 2016-05-20  1:22 UTC (permalink / raw)
  To: The Cygwin Mailing List

On May 19, 2016, at 6:44 PM, KOBAYASHI Shinji <koba@jp.fujitsu.com> wrote:
> 
> my patch is just two lines of modifications, so I believe
>   that it is not "a significant one". Is it okay to send such a small
>   patch without the assignment form?

How about you just give the line of code and explain what’s wrong with it?  Then someone with checkin rights can reimplement your change from your prose description.
--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Invalid tm_zone from localtime() when TZ is not set
  2016-05-20  1:22 ` Warren Young
@ 2016-05-20  4:22   ` KOBAYASHI Shinji
  2016-05-20  7:09     ` Csaba Raduly
  0 siblings, 1 reply; 14+ messages in thread
From: KOBAYASHI Shinji @ 2016-05-20  4:22 UTC (permalink / raw)
  To: cygwin

>>>>>	Warren Young  wrote:
> How about you just give the line of code and explain what’s wrong with it?  Then someone with checkin rights can reimplement your change from your prose description.

Thank you for your suggestion. So I try to describe the problem:

Current localtime() can return invalid tm_zone when used with
non-English Windows and the TZ environment variable is unset. This
leads to a failure of importing time module in Python 3:

% (unset TZ; python3 -c "import time")
Traceback (most recent call last):
  File "<string>", line 1, in <module>
SystemError: initialization of time raised unreported exception

Although most users set the TZ environment variable, tox (>= 2.0)
drops most environment variables including TZ by default when invoking
test environments. So the users of tox on non-English Windows may
suffer from this problem.

localtime() calls tzsetwall() when TZ is not set. In tzsetwall(),
the StandardName and DaylightName member values retrieved by
GetTimeZoneInformation() are checked with isupper() and copied to the
char[] buffer used as the timezone name in tzparse(). However, the
type of these member values are wchar_t and isupper() is defined only
when isascii() is true. So it may happen that the char[] buffer
contains invalid characters as a result of implicit cast from wchar_t
to char.

The return value of isupper() for non-ascii characters depends on
other data, because an out of bounds access occurs for the small
(128 + 256) table used in isupper(). I confirmed the above error on
Japanese Windows with 64-bit Cygwin 2.5.0-1 and 2.5.1-1, but had no
problem with 64-bit Cygwin 2.4.1-1 nor with 32-bit Cygwins.

So, I propose to call isascii() to assure the wchar_t fits in the
range of ASCII before calling isupper().

I have considered some other methods:

1. Using iswupper() instead of isupper().
   - Although this method is effective for Japanese environments, it
     is not assured that the character iswupper() returns true fits in
     the range of ASCII.
2. Add (char) cast to the argument of isupper().
   - This method assures that the copied characters are uppercase
     only. However, it may be different from original characters due
     to casting.

So I think that adding isascii() calls is better than these methods.

Regards,

KOBAYASHI Shinji.

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Necessity for the assignment form
  2016-05-20  0:45 Necessity for the assignment form KOBAYASHI Shinji
  2016-05-20  1:22 ` Warren Young
@ 2016-05-20  5:46 ` David Stacey
  2016-05-20 11:36   ` KOBAYASHI Shinji
  1 sibling, 1 reply; 14+ messages in thread
From: David Stacey @ 2016-05-20  5:46 UTC (permalink / raw)
  To: cygwin

On 20/05/16 01:44, KOBAYASHI Shinji wrote:
> I found a bug and I am willing to submit a patch (under
> winsup/cygwin). But before doing that, I was asked from my employer to
> clarify some points.https://cygwin.com/contrib.html  says that the
> assignment form is required "if your change is going to be a
> significant one in terms of the size of your code changes". So the
> questions are:
>
> 1. Who is going to determine if the change is "a significant one"?
> 2. In fact, my patch is just two lines of modifications, so I believe
>     that it is not "a significant one". Is it okay to send such a small
>     patch without the assignment form?


For Cygwin, the boundary between trivial and substantial is usually 
taken as ten lines. A two-line patch would probably be accepted without 
an assignment form.

Dave.


--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Invalid tm_zone from localtime() when TZ is not set
  2016-05-20  4:22   ` Invalid tm_zone from localtime() when TZ is not set KOBAYASHI Shinji
@ 2016-05-20  7:09     ` Csaba Raduly
  2016-05-20 10:16       ` KOBAYASHI Shinji
  0 siblings, 1 reply; 14+ messages in thread
From: Csaba Raduly @ 2016-05-20  7:09 UTC (permalink / raw)
  To: cygwin list

On Fri, May 20, 2016 at 6:22 AM, KOBAYASHI Shinji  wrote:
(snip)
>
> localtime() calls tzsetwall() when TZ is not set. In tzsetwall(),
> the StandardName and DaylightName member values retrieved by
> GetTimeZoneInformation() are checked with isupper() and copied to the
> char[] buffer used as the timezone name in tzparse(). However, the
> type of these member values are wchar_t and isupper() is defined only
> when isascii() is true.

If the type of those members is WCHAR[] then using isascii() /
isupper() on them is just plain wrong.
The correct function to use would be iswupper().

The line
    if (isupper(*src)) *dst++ = *src;

(where src is wchar_t* and dst is char*) assumes that the upper 8 bits
of *src are zero (or *src is -1).
If not, the behavior is at best implementation-defined (maybe even undefined).

> So it may happen that the char[] buffer
> contains invalid characters as a result of implicit cast from wchar_t
> to char.
>
> The return value of isupper() for non-ascii characters depends on
> other data, because an out of bounds access occurs for the small
> (128 + 256) table used in isupper(). I confirmed the above error on
> Japanese Windows with 64-bit Cygwin 2.5.0-1 and 2.5.1-1, but had no
> problem with 64-bit Cygwin 2.4.1-1 nor with 32-bit Cygwins.
>
> So, I propose to call isascii() to assure the wchar_t fits in the
> range of ASCII before calling isupper().
>
> I have considered some other methods:
>
> 1. Using iswupper() instead of isupper().
>    - Although this method is effective for Japanese environments, it
>      is not assured that the character iswupper() returns true fits in
>      the range of ASCII.

It is highly likely that if the argument of iswupper() does not fit
into ASCII then its result won't fit either.

> 2. Add (char) cast to the argument of isupper().
>    - This method assures that the copied characters are uppercase
>      only. However, it may be different from original characters due
>      to casting.
>



-- 
GCS a+ e++ d- C++ ULS$ L+$ !E- W++ P+++$ w++$ tv+ b++ DI D++ 5++
The Tao of math: The numbers you can count are not the real numbers.
Life is complex, with real and imaginary parts.
"Ok, it boots. Which means it must be bug-free and perfect. " -- Linus Torvalds
"People disagree with me. I just ignore them." -- Linus Torvalds

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Invalid tm_zone from localtime() when TZ is not set
  2016-05-20  7:09     ` Csaba Raduly
@ 2016-05-20 10:16       ` KOBAYASHI Shinji
  2016-05-25  9:28         ` KOBAYASHI Shinji
  0 siblings, 1 reply; 14+ messages in thread
From: KOBAYASHI Shinji @ 2016-05-20 10:16 UTC (permalink / raw)
  To: cygwin


Hi,

>>>>>	Csaba Raduly wrote:
> If the type of those members is WCHAR[] then using isascii() /
> isupper() on them is just plain wrong.
> The correct function to use would be iswupper().

I agree that isw*() functions should be used for wchar_t. But I think
just changing isupper() to iswupper() is not enough as I explain
below. If iswascii() was available, using both iswascii() and
iswupper() might be a candidate.

> The line
>     if (isupper(*src)) *dst++ = *src;
> (where src is wchar_t* and dst is char*) assumes that the upper 8 bits
> of *src are zero (or *src is -1).
> If not, the behavior is at best implementation-defined (maybe even undefined).

With Japanese Windows, the upper 8 bits of *src are not zero. So I
would like to correct the behavior. I think adding isascii() can be a
compromise.

>> 1. Using iswupper() instead of isupper().
>>    - Although this method is effective for Japanese environments, it
>>      is not assured that the character iswupper() returns true fits in
>>      the range of ASCII.
> It is highly likely that if the argument of iswupper() does not fit
> into ASCII then its result won't fit either.

I'm not sure if I understand what you are saying, but for example
iswupper(L'\uff21') returns 1 in Japanese locale.

Regards,

KOBAYASHI Shinji.

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Necessity for the assignment form
  2016-05-20  5:46 ` Necessity for the assignment form David Stacey
@ 2016-05-20 11:36   ` KOBAYASHI Shinji
  0 siblings, 0 replies; 14+ messages in thread
From: KOBAYASHI Shinji @ 2016-05-20 11:36 UTC (permalink / raw)
  To: cygwin

>>>>>	David Stacey wrote:
> For Cygwin, the boundary between trivial and substantial is usually
> taken as ten lines. A two-line patch would probably be accepted
> without an assignment form.

Thank you for the clarification. Although I started another thread
without submitting the patch, this information will be helpful next
time.

Regards,

KOBAYASHI Shinji.

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Invalid tm_zone from localtime() when TZ is not set
  2016-05-20 10:16       ` KOBAYASHI Shinji
@ 2016-05-25  9:28         ` KOBAYASHI Shinji
  2016-05-25 10:15           ` Corinna Vinschen
  0 siblings, 1 reply; 14+ messages in thread
From: KOBAYASHI Shinji @ 2016-05-25  9:28 UTC (permalink / raw)
  To: cygwin


Any other comments on this topic? Let me explain my proposal again.

The intention of the following code in tzsetwall() should be to pick
up UPPERCASE letters "in ASCII range":

if (isupper(*src)) *dst++ = *src;

NOTE: src is wchar_t *, dst is char *.

As Csaba Raduly pointed out, isw*() functions should be the first
choice if they achieve the desired behavior (select uppercase AND
ASCII). However, iswupper() does not fit for this purpose, as it
returns 1 for L'\uff21' for example. And I could not find isw*()
functions to achieve the behavior.

As the man page mentions, isupper() accepts int as its argument, but
it is defined only when isascii() is true (or EOF). Both isascii() and
isupper() accept int, not char. So it seems to me that

isascii(*src) && isupper(*src)

is fine even if *src is wchar_t, provided that:

1. Type of *src (wchar_t) <= int.
2. Code points used is an extension of ASCII.

I think these conditions are met. Because sizeof(wchar_t) is 2, and
sizeof(int) is 4 in Cygwin. The code points used here is an extension
of ASCII because the Windows function GetTimeZoneInformation() uses
UTF-16 encoding.

The second option might be to use a macro as it was before the
following commit, but I prefer using standard isascii() and isupper():

commit 49808040c5fa1390983945d38c7e8267b2b8a381
Author: Christopher Faylor
Date:   Sat Oct 30 19:22:42 2010 +0000

    * localtime.cc (is_upper): Rename to isupper and include ctype.h to pull in
    standard macro.
    (lcl_is_set): Define as an enum.
    (tzsetwall): Assign lcl_is_set to correct enum values.
    (tzset): Ditto.  Copy as much of TZ as will fit to TZ buffer.

Regards,

KOBAYASHI Shinji.

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Invalid tm_zone from localtime() when TZ is not set
  2016-05-25  9:28         ` KOBAYASHI Shinji
@ 2016-05-25 10:15           ` Corinna Vinschen
  2016-05-26 11:03             ` Hans-Bernhard Bröker
  0 siblings, 1 reply; 14+ messages in thread
From: Corinna Vinschen @ 2016-05-25 10:15 UTC (permalink / raw)
  To: cygwin

[-- Attachment #1: Type: text/plain, Size: 1560 bytes --]

On May 25 11:28, KOBAYASHI Shinji wrote:
> 
> Any other comments on this topic? Let me explain my proposal again.
> 
> The intention of the following code in tzsetwall() should be to pick
> up UPPERCASE letters "in ASCII range":
> 
> if (isupper(*src)) *dst++ = *src;
> 
> NOTE: src is wchar_t *, dst is char *.
> 
> As Csaba Raduly pointed out, isw*() functions should be the first
> choice if they achieve the desired behavior (select uppercase AND
> ASCII). However, iswupper() does not fit for this purpose, as it
> returns 1 for L'\uff21' for example. And I could not find isw*()

In that case, wouldn't it make sense to fix iswupper in the first place?
It relies on the towupper function in newlib/libc/ctype/towlower.c.

However, I don't understand the example.  0xff21 is FULLWIDTH LATIN
CAPITAL LETTER A.  This is very certainly an uppercase letter so
towlower transforms it to 0xff41, FULLWIDTH LATIN SMALL LETTER A.

However, testing this with GLibc it turns out that GLibc's towlower does not
transform the character but returns 0xff21.  Can anybody explain to me why?  

Apart from that, we can workaround all problems in tzsetwall by just
checking for 

  if (*src >= L'A' && *src <= L'Z')

So, yeah, I can do the above change to tzsetwall and it would be nice if
somebody could take a critical look over newlib's towlower.c and
toupper.c.


Thanks,
Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Invalid tm_zone from localtime() when TZ is not set
  2016-05-25 10:15           ` Corinna Vinschen
@ 2016-05-26 11:03             ` Hans-Bernhard Bröker
  2016-05-26 12:16               ` KOBAYASHI Shinji
  0 siblings, 1 reply; 14+ messages in thread
From: Hans-Bernhard Bröker @ 2016-05-26 11:03 UTC (permalink / raw)
  To: cygwin

Am 25.05.2016 um 10:44 schrieb Corinna Vinschen:
> On May 25 11:28, KOBAYASHI Shinji wrote:
>>
>> Any other comments on this topic? Let me explain my proposal again.
>>
>> The intention of the following code in tzsetwall() should be to pick
>> up UPPERCASE letters "in ASCII range":

Are you sure you're not mixing ASCII with '8-bit character' range there?

>> if (isupper(*src)) *dst++ = *src;
>>
>> NOTE: src is wchar_t *, dst is char *.
>>
>> As Csaba Raduly pointed out, isw*() functions should be the first
>> choice if they achieve the desired behavior (select uppercase AND
>> ASCII).

But it doesn't, so it's not.

>> However, iswupper() does not fit for this purpose, as it
>> returns 1 for L'\uff21' for example. And I could not find isw*()
>
> In that case, wouldn't it make sense to fix iswupper in the first place?

I don't believe it's been shown to be broken, so there's no need to fix it.

> Apart from that, we can workaround all problems in tzsetwall by just
> checking for
>
>   if (*src >= L'A' && *src <= L'Z')

While that may be possible if it really is ASCII you're looking for, 
it's perverting the whole reason <ctype.h> and <wctype.h> exist: to make 
tests like this as independent of the actual character encoding as possible.

Here's what I wrote last week, but apparently only to Csaba Raduli:

Am 20.05.2016 um 09:09 schrieb Csaba Raduly:

 > If the type of those members is WCHAR[] then using isascii() /
 > isupper() on them is just plain wrong.

Absolutely.  The argument type of isupper() and friends is 'int', not 
'unsigned char'.  But the _only_ allowed argument values are those in 
the range of unsigned char, plus EOF.  For typical systems, that means 
the allowed argument range of is*() is -1 ... 255 inclusive.  Calling 
these Standard Library functions with any other argument causes 
undefined behaviour.

That leaves three sensible ways of calling isupper() in portable code:

*) isupper(foo)  # where type of foo is unsigned char
*) isupper((unsigned char)bar) # where bar is signed char, or plain char
*) isupper(baz) # where baz was got from fgetc() or similar

All other call patterns are plain and simply wrong, or at least 
non-portable.  In particular, passing a wchar_t to any of the <ctype.h> 
function is wrong every time.

 > The correct function to use would be iswupper().

Actually, the is*upper() isn't even the actual problem here.  The whole 
idea of copying a wchar_t string into a char one, element by element, is 
most likely nonsensical.  A wchar_t cannot be assumed to just fit into a 
char, regardless whether iswupper() returned true on it or not.  E.g. 
what do we expect this to do with an upper-case Greek or Cyrillic letter?

A proper solution may have to be more like this:

     int mapped = wctob(*src);
     /* this call is safe now because of how wctob() works: */
     if (isupper(mapped)) {
        *dst++ = (unsigned char)mapped;
     }

 >> So, I propose to call isascii() to assure the wchar_t fits in the
 >> range of ASCII before calling isupper().

Calling isascii() would be wrong for the same reasons calling isupper() is.




--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Invalid tm_zone from localtime() when TZ is not set
  2016-05-26 11:03             ` Hans-Bernhard Bröker
@ 2016-05-26 12:16               ` KOBAYASHI Shinji
  2016-05-26 14:56                 ` Corinna Vinschen
  2016-05-26 17:21                 ` Hans-Bernhard Bröker
  0 siblings, 2 replies; 14+ messages in thread
From: KOBAYASHI Shinji @ 2016-05-26 12:16 UTC (permalink / raw)
  To: cygwin

On Wed, 25 May 2016 22:02:50 +0200, Hans-Bernhard Bröker wrote:
> On May 25 11:28, KOBAYASHI Shinji wrote:
>> The intention of the following code in tzsetwall() should be to pick
>> up UPPERCASE letters "in ASCII range":
> Are you sure you're not mixing ASCII with '8-bit character' range there?

I'm not sure if non-ASCII 8-bit characters are allowed as TZ
names. I just guessed the intention from the fact isupper() is used,
and from the description of isupper() man page:

- isupper is a macro which classifies ASCII integer values by table
  lookup.
- It is defined only when isascii(c) is true or c is EOF.

However, the implementation of isupper() seems to work for 8-bit
characters even if isascii() is false. Does the man page need some
changes?

> A proper solution may have to be more like this:
>     int mapped = wctob(*src);
>     /* this call is safe now because of how wctob() works: */
>     if (isupper(mapped)) {
>        *dst++ = (unsigned char)mapped;
>     }

Thank you for pointing out wctob(). I'm in favor of fixing this way.

On Wed, 25 May 2016 10:44:30 +0200, Corinna Vinschen wrote:
> However, testing this with GLibc it turns out that GLibc's towlower does not
> transform the character but returns 0xff21.  Can anybody explain to me why?  

It seems that it depends on locale. The man page (on Linux) says:

  The behavior of towlower() depends on the LC_CTYPE category of the
  current locale.

I observed that it returns 0xff41 when LANG=ja_JP.UTF-8, and 0xff21
when LANG=C.

Cygwin's implementation (and man page) does not seem to depend on
locale.

Regards,

KOBAYASHI Shinji.

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Invalid tm_zone from localtime() when TZ is not set
  2016-05-26 12:16               ` KOBAYASHI Shinji
@ 2016-05-26 14:56                 ` Corinna Vinschen
  2016-05-26 17:21                   ` KOBAYASHI Shinji
  2016-05-26 17:21                 ` Hans-Bernhard Bröker
  1 sibling, 1 reply; 14+ messages in thread
From: Corinna Vinschen @ 2016-05-26 14:56 UTC (permalink / raw)
  To: cygwin

[-- Attachment #1: Type: text/plain, Size: 2283 bytes --]

On May 26 15:54, KOBAYASHI Shinji wrote:
> On Wed, 25 May 2016 22:02:50 +0200, Hans-Bernhard Bröker wrote:
> > On May 25 11:28, KOBAYASHI Shinji wrote:
> >> The intention of the following code in tzsetwall() should be to pick
> >> up UPPERCASE letters "in ASCII range":
> > Are you sure you're not mixing ASCII with '8-bit character' range there?
> 
> I'm not sure if non-ASCII 8-bit characters are allowed as TZ
> names. I just guessed the intention from the fact isupper() is used,
> and from the description of isupper() man page:
> 
> - isupper is a macro which classifies ASCII integer values by table
>   lookup.
> - It is defined only when isascii(c) is true or c is EOF.
> 
> However, the implementation of isupper() seems to work for 8-bit
> characters even if isascii() is false. Does the man page need some
> changes?
> 
> > A proper solution may have to be more like this:
> >     int mapped = wctob(*src);
> >     /* this call is safe now because of how wctob() works: */
> >     if (isupper(mapped)) {
> >        *dst++ = (unsigned char)mapped;
> >     }
> 
> Thank you for pointing out wctob(). I'm in favor of fixing this way.

I fixed it already in the repo using explicit testing for
L'A' <= wchar <= L'Z', see
https://sourceware.org/git/gitweb.cgi?p=newlib-cygwin.git;h=030d86d04dbd6ae878ff894256c5a465890d5970

> On Wed, 25 May 2016 10:44:30 +0200, Corinna Vinschen wrote:
> > However, testing this with GLibc it turns out that GLibc's towlower does not
> > transform the character but returns 0xff21.  Can anybody explain to me why?  
> It seems that it depends on locale. The man page (on Linux) says:
> 
>   The behavior of towlower() depends on the LC_CTYPE category of the
>   current locale.
> 
> I observed that it returns 0xff41 when LANG=ja_JP.UTF-8, and 0xff21
> when LANG=C.
> 
> Cygwin's implementation (and man page) does not seem to depend on
> locale.

No, it's basically fixed code inside newlib.  I have some code prepared
to update newlib from Unicode 5.2 to Unicode 8.0, but it adds a lot of
extra code.  So many new characters...


Corinna

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Invalid tm_zone from localtime() when TZ is not set
  2016-05-26 12:16               ` KOBAYASHI Shinji
  2016-05-26 14:56                 ` Corinna Vinschen
@ 2016-05-26 17:21                 ` Hans-Bernhard Bröker
  1 sibling, 0 replies; 14+ messages in thread
From: Hans-Bernhard Bröker @ 2016-05-26 17:21 UTC (permalink / raw)
  To: KOBAYASHI Shinji; +Cc: cygwin

Am 26.05.2016 um 08:54 schrieb KOBAYASHI Shinji:
> On Wed, 25 May 2016 22:02:50 +0200, Hans-Bernhard Bröker wrote:
>> On May 25 11:28, KOBAYASHI Shinji wrote:

> - isupper is a macro which classifies ASCII integer values by table
>   lookup.
> - It is defined only when isascii(c) is true or c is EOF.

That restriction is artificial, at best.  I might go so far as to say 
it's a direct standard violation since, according to the C Standard, the 
<ctype.h> macros/functions are defined for the entire range of unsigned 
char, not just ASCII's 0 ... 127.

It's also untrue.  Once you set the locale and the console character set 
accordingly, Cygwin's isupper() does work for characters outside the 
7-bit ASCII range, too:

$ ./tisupper.exe a A b B ö Ö ß
a: isupper=0
A: isupper=1
b: isupper=0
B: isupper=1
ö: isupper=0
Ö: isupper=1
ß: isupper=0

This is with a ISO-Latin-15 mintty, and LANG set to de_DE (note: no 
UTF-8), running this program:

#include <stdio.h>
#include <ctype.h>
#include <locale.h>

int main(int argc, char **argv)
{
	int i = 0;
	
	setlocale(LC_ALL, "");
	for (i = 1; i < argc; i++) {
		printf("%c: isupper=%d\n",
		       argv[i][0],
		       isupper((unsigned char)argv[i][0])
		      );
	}
	return 0;
}

> On Wed, 25 May 2016 10:44:30 +0200, Corinna Vinschen wrote:
>> However, testing this with GLibc it turns out that GLibc's towlower does not
>> transform the character but returns 0xff21.  Can anybody explain to me why?
>
> It seems that it depends on locale. The man page (on Linux) says:
>
>   The behavior of towlower() depends on the LC_CTYPE category of the
>   current locale.

So does the implementation in Cygwin.  The result of my little test is 
influenced by locale.  E.g. in Russian locale, Ö is no longer classified 
an upper-case letter.

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Invalid tm_zone from localtime() when TZ is not set
  2016-05-26 14:56                 ` Corinna Vinschen
@ 2016-05-26 17:21                   ` KOBAYASHI Shinji
  0 siblings, 0 replies; 14+ messages in thread
From: KOBAYASHI Shinji @ 2016-05-26 17:21 UTC (permalink / raw)
  To: cygwin

On Thu, 26 May 2016 12:23:03 +0200, Corinna Vinschen wrote:
> > > A proper solution may have to be more like this:
> > >     int mapped = wctob(*src);
> > >     /* this call is safe now because of how wctob() works: */
> > >     if (isupper(mapped)) {
> > >        *dst++ = (unsigned char)mapped;
> > >     }
> > 
> > Thank you for pointing out wctob(). I'm in favor of fixing this way.
> I fixed it already in the repo using explicit testing for
> L'A' <= wchar <= L'Z', see
> https://sourceware.org/git/gitweb.cgi?p=newlib-cygwin.git;h=030d86d04dbd6ae878ff894256c5a465890d5970

Okay, thanks. I confirmed that the problem does not happen with
current master.

Best regards,

KOBAYASHI Shinji.

--
Problem reports:       http://cygwin.com/problems.html
FAQ:                   http://cygwin.com/faq/
Documentation:         http://cygwin.com/docs.html
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2016-05-26 12:16 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-20  0:45 Necessity for the assignment form KOBAYASHI Shinji
2016-05-20  1:22 ` Warren Young
2016-05-20  4:22   ` Invalid tm_zone from localtime() when TZ is not set KOBAYASHI Shinji
2016-05-20  7:09     ` Csaba Raduly
2016-05-20 10:16       ` KOBAYASHI Shinji
2016-05-25  9:28         ` KOBAYASHI Shinji
2016-05-25 10:15           ` Corinna Vinschen
2016-05-26 11:03             ` Hans-Bernhard Bröker
2016-05-26 12:16               ` KOBAYASHI Shinji
2016-05-26 14:56                 ` Corinna Vinschen
2016-05-26 17:21                   ` KOBAYASHI Shinji
2016-05-26 17:21                 ` Hans-Bernhard Bröker
2016-05-20  5:46 ` Necessity for the assignment form David Stacey
2016-05-20 11:36   ` KOBAYASHI Shinji

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).