public inbox for gcc@gcc.gnu.org
* RE: signed vs unsigned pointer warning
From: Morten Welinder @ 2004-09-22 16:43 UTC (permalink / raw)
  To: gcc; +Cc: dk


Dave Korn writes:

> Until you try indexing an array with an 8-bit high ASCII char, of course.
> Then things become radically different.  I've known buggy ctype
> implementations that have failed on this (ASCII > 127 being signed negative
> and the ctype function accidentally indexing memory space before an array
> full of ctype result flags).

[/me gathers soapbox]

I bet you have.  In fact *ALL* ctype implementations will fail.[*]
That includes glibc.

What glibc does is to *mostly* work around buggy programs that send
(explicitly or implicitly) signed characters to, say, isprint.  It does
not always work, though, so glibc really did you a disservice.  It is
really hard to get people to fix their programs.

It does not work for (signed char)-1 if EOF==-1.  It cannot work as two
different results are required for the same argument value.
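A minimal C sketch of the collision Morten describes, assuming EOF == -1 (as on glibc) and two's-complement chars: a signed char holding the byte 0xFF reaches the callee as exactly the same int as EOF. The function names are hypothetical.

```c
#include <stdio.h>   /* EOF */

/* Hypothetical helper: the int a ctype function receives when a
   signed char holding the byte 0xFF is passed without a cast. */
static int promoted_signed_ff(void)
{
    signed char sc = -1;   /* byte 0xFF on a two's-complement machine */
    return sc;             /* conversion to int keeps the value -1, just as
                              the implicit conversion at isprint(sc) would */
}

/* Nonzero if that argument is indistinguishable from EOF. */
static int collides_with_eof(void)
{
    return promoted_signed_ff() == EOF;
}
```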

Solaris does the array[arg] thing you speak about.  It isn't buggy.  The
caller is, and, IMHO the standard is.

Morten


[*] Assuming (char)EOF==EOF, which it will be with signed characters and
EOF==-1.

* RE: signed vs unsigned pointer warning
From: Paul Koning @ 2004-09-22 17:17 UTC (permalink / raw)
  To: terra; +Cc: gcc, dk

>>>>> "Morten" == Morten Welinder <terra@gnome.org> writes:

 Morten> Dave Korn writes:

 >> Until you try indexing an array with an 8-bit high ASCII char, of
 >> course.  Then things become radically different.  I've known buggy
 >> ctype implementations that have failed on this (ASCII > 127 being
 >> signed negative and the ctype function accidentally indexing
 >> memory space before an array full of ctype result flags).

 Morten> [/me gathers soapbox]

 Morten> I bet you have.  In fact *ALL* ctype implementations will
 Morten> fail.[*] That includes glibc
 (moved)
 Morten> [*] Assuming (char)EOF==EOF, which it will be with signed
 Morten> characters and EOF==-1.

EOF isn't a character.

 Morten> What glibc does is to *mostly* work around buggy programs
 Morten> that send (explicitly or implicitly) signed characters to,
 Morten> say, isprint.  It does not always work, though, so glibc
 Morten> really did you a disservice.  It is really hard to get people
 Morten> to fix their programs.

I don't understand your point.  isprint takes a char * argument.  Its
semantics don't depend on whether char is signed or unsigned.  It's
the job of the libc implementer to implement it correctly.  It's
perfectly trivial to do that.  You can do it by casts, by configure,
or by using a 384 entry array, to mention just a few.

	  paul

* RE: signed vs unsigned pointer warning
From: Dave Korn @ 2004-09-22 17:20 UTC (permalink / raw)
  To: 'Morten Welinder', gcc

> -----Original Message-----
> From: Morten Welinder [mailto:terra@gnome.org] 
> Sent: 22 September 2004 17:18

> Dave Korn writes:
> 
> > Until you try indexing an array with an 8-bit high ASCII char, of course.
> > Then things become radically different.  I've known buggy ctype
> > implementations that have failed on this (ASCII > 127 being signed negative
> > and the ctype function accidentally indexing memory space before an array
> > full of ctype result flags).
> 
> [/me gathers soapbox]
> 
> I bet you have.  In fact *ALL* ctype implementations will fail.[*]
> That includes glibc 

  You've provoked me to go and refer to the standard!

http://www.opengroup.org/onlinepubs/009695399/functions/isprint.html

------------snip!------------
int isprint(int c);

The c argument is an int, the value of which the application shall ensure is
a character representable as an unsigned char or equal to the value of the
macro EOF. If the argument has any other value, the behavior is undefined.
------------snip!------------

> It does not work for (signed char)-1 if EOF==-1.  It cannot work as two
> different results are required for the same argument value.

  Um... but the ctype argument is an integer.  If you pass EOF, you get
(int)-1 in the ctype function.  If you want to pass (signed char)-1, you
have to ensure that -1 is "a character representable as an unsigned char"
first.  Which I don't think you can, since you can't store negative numbers
in an unsigned type.  So AFAICS, the valid range of inputs to the function
is the closed range [-1,255] in integers.
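Dave's reading of the standard can be written down as a predicate. This is a hypothetical helper, not part of any libc; on a system with 8-bit chars and EOF == -1 it accepts exactly the closed range [-1, 255].

```c
#include <limits.h>  /* UCHAR_MAX */
#include <stdio.h>   /* EOF */

/* Hypothetical check: nonzero iff c is a value the standard allows
   for isprint() and the other <ctype.h> functions. */
static int ctype_arg_ok(int c)
{
    return c == EOF || (c >= 0 && c <= UCHAR_MAX);
}
```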

> What glibc does is to *mostly* work around buggy programs that send
> (explicitly or implicitly) signed characters to, say, isprint.

  The bug really is in passing a char to a function that requires an int
argument and mismanaging the promotion by not explicitly casting the char
first, isn't it?
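A short sketch of the promotion Dave means. Whether plain char is signed is implementation-defined (it is signed for GCC on x86, for instance); where it is, the byte 0xE9 promotes to a negative int unless it is first converted to unsigned char. The helper names are illustrative only.

```c
#include <ctype.h>

/* Plain char promoted directly: negative for bytes >= 0x80 when char is signed. */
static int promote_plain(char c)
{
    return c;
}

/* Converted to unsigned char first: always in 0..UCHAR_MAX. */
static int promote_via_uchar(char c)
{
    return (unsigned char)c;
}

/* The safe way to hand a plain char to a ctype function. */
static int safe_isprint(char c)
{
    return isprint((unsigned char)c);
}
```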

> Solaris does the array[arg] thing you speak about.  It isn't 
> buggy.  The caller is, and, IMHO the standard is.

  As I said, I think the standard makes it quite clear that you can pass -1
and any unsigned char (0....255) value.  It seems to me to say quite clearly
that if you have a signed char variable which is negative and you pass it to
the ctype function and allow it to be sign-extended by the implicit argument
promotion rules then you have supplied an out-of-range value to the
function.

  Anyway, I only gave this particular example as an illustration to back up
my argument that the incompatibility between signed and unsigned chars is
not theoretical but very very real and does very much occur in practice as
it is very very common for char-sized arguments to be promoted to int sized
and the two types behave significantly differently when this happens.
That's all.


    cheers, 
      DaveK
-- 
Can't think of a witty .sigline today....

* Re: signed vs unsigned pointer warning
From: Morten Welinder @ 2004-09-22 17:27 UTC (permalink / raw)
  To: pkoning; +Cc: gcc, dk


> EOF isn't a character.

No-one said it was.  It evaluates to an int.

> I don't understand your point.  isprint takes a char * argument.

NO!

It takes an int argument.  And only certain ints are valid: EOF
(== -1 everywhere, it seems) and the range of unsigned char, i.e.,
0-255.  If you wanted to make signed args valid arguments -- and
doing so would be an extension to the standards -- then isprint
would need to be able to distinguish EOF and (int)(signed char)-1.
How do you propose to do that?

> You can do it by casts, by configure, or by using a 384 entry
> array, to mention just a few.

As follows from the above, none of these methods actually
works.

Morten

* RE: signed vs unsigned pointer warning
From: Dave Korn @ 2004-09-22 17:49 UTC (permalink / raw)
  To: gcc

> -----Original Message-----
> From: Morten Welinder
> Sent: 22 September 2004 17:43

> > EOF isn't a character.
> 
> No-one said it was.  It evaluates to an int.
> 
> > I don't understand your point.  isprint takes a char * argument.
> 
> NO!
> 
> It takes an int argument.  And only certain ints are valid: EOF
> (== -1 everywhere, it seems) and the range of unsigned char, i.e.,
> 0-255.  

Heh.  Can we not just work around this problem by replacing the definition
of EOF

#define EOF -1U

;)

[Ok, now I'm just being silly!]

    cheers, 
      DaveK
-- 
Can't think of a witty .sigline today....

* Re: signed vs unsigned pointer warning
From: Andreas Schwab @ 2004-09-23  1:31 UTC (permalink / raw)
  To: Dave Korn; +Cc: 'Morten Welinder', gcc

"Dave Korn" <dk@artimi.com> writes:

> Which I don't think you can, since you can't store negative numbers
> in an unsigned type. 

Actually you can, due to the modulo behaviour of unsigned integers.
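Strictly speaking this is a value conversion rather than storing a negative number: C defines conversion of an out-of-range value to an unsigned type as reduction modulo 2^N (C99 6.3.1.3p2). A small illustration with hypothetical function names:

```c
#include <limits.h>

/* -1 converted to unsigned types: the result is reduced modulo 2^N,
   so it becomes the maximum value of the target type. */
static unsigned char minus_one_uchar(void)
{
    return (unsigned char)-1;
}

static unsigned int minus_one_uint(void)
{
    return (unsigned int)-1;
}
```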

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux AG, Maxfeldstraße 5, 90409 Nürnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

* RE: signed vs unsigned pointer warning
From: Dave Korn @ 2004-09-23 12:29 UTC (permalink / raw)
  To: 'Andreas Schwab'; +Cc: 'Morten Welinder', gcc

> -----Original Message-----
> From: Andreas Schwab 
> Sent: 22 September 2004 22:57
> To: Dave Korn
> Cc: 'Morten Welinder'; gcc
> Subject: Re: signed vs unsigned pointer warning
> 
> "Dave Korn" writes:
> 
> > Which I don't think you can, since you can't store negative numbers
> > in an unsigned type. 
> 
> Actually you can, due to the modulo behaviour of unsigned integers.
> 
> Andreas.


  Well, yes, it is physically possible, but it's a kind of type-punning, it
defies the aliasing rules, and we get into some very deeply
language-lawyerly issues here, but it's not a valid representation IIUIC and
therefore invokes undefined behaviour in many circumstances.


    cheers, 
      DaveK
-- 
Can't think of a witty .sigline today....

* Re: signed vs unsigned pointer warning
From: Joe Buck @ 2004-09-23 18:57 UTC (permalink / raw)
  To: Dave Korn; +Cc: 'Andreas Schwab', 'Morten Welinder', gcc

On Thu, Sep 23, 2004 at 12:35:22PM +0100, Dave Korn wrote:
> > "Dave Korn" writes:
> > 
> > > Which I don't think you can, since you can't store negative numbers
> > > in an unsigned type. 
> > 
> > Actually you can, due to the modulo behaviour of unsigned integers.
> > 
> > Andreas.
> 
>   Well, yes, it is physically possible, but it's a kind of type-punning, it
> defies the aliasing rules, and we get into some very deeply
> language-lawyerly issues here, but it's not a valid representation IIUIC and
> therefore invokes undefined behaviour in many circumstances.

The C aliasing rules specifically bless accessing ints as unsigneds, and,
more generally, punning accesses between any type and the unsigned version
of that type.
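Joe is referring to the "effective type" rule (C99 6.5p7), which allows an object to be accessed through the signed or unsigned type corresponding to its declared type. A sketch with a hypothetical function name, assuming a two's-complement int so that -1 reads back as UINT_MAX:

```c
#include <limits.h>

/* Reads an int object through an unsigned int lvalue. C99 6.5p7 lists
   "the signed or unsigned type corresponding to the effective type of
   the object" as a permitted access, so this is not an aliasing violation. */
static unsigned int read_int_as_unsigned(const int *p)
{
    return *(const unsigned int *)p;
}
```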

* RE: signed vs unsigned pointer warning
From: Dave Korn @ 2004-09-23 19:38 UTC (permalink / raw)
  To: 'Joe Buck'
  Cc: 'Andreas Schwab', 'Morten Welinder', gcc

> -----Original Message-----
> From: Joe Buck [mailto:Joe.Buck@synopsys.COM] 
> Sent: 23 September 2004 17:11

> On Thu, Sep 23, 2004 at 12:35:22PM +0100, Dave Korn wrote:
> > > "Dave Korn" writes:
> > > 
> > > > Which I don't think you can, since you can't store negative numbers
> > > > in an unsigned type. 
> > > 
> > > Actually you can, due to the modulo behaviour of unsigned integers.
> > > 
> > > Andreas.
> > 
> >   Well, yes, it is physically possible, but it's a kind of type-punning, it
> > defies the aliasing rules, and we get into some very deeply
> > language-lawyerly issues here, but it's not a valid representation IIUIC and
> > therefore invokes undefined behaviour in many circumstances.
> 
> The C aliasing rules specifically bless accessing ints as unsigneds, and,
> more generally, punning accesses between any type and the unsigned version
> of that type.


  Ya learn something new every day :)

  Actually, come to think of it, I stumbled across that myself just a few
weeks back, when I was trying to work round the _restrict_ bug (in a block
of code that was copying some memory to some registers) by declaring one
pointer as an int * and the other as a long * (both actually being 32-bit
ints).  I had tried to use the signed/unsigned distinction to get it to
work, but it didn't work until I actually used different underlying types.

  Anyway, it's all wandered rather far from the original point now,
which is that a warning would be a good idea, and I'm convinced enough by
that line of argument without any reference to the finer details of buggy
ctype implementations anyway!

    cheers, 
      DaveK
-- 
Can't think of a witty .sigline today....

* Re: signed vs unsigned pointer warning
From: Jamie Lokier @ 2004-09-27  2:04 UTC (permalink / raw)
  To: Dave Korn; +Cc: 'Morten Welinder', gcc

Dave Korn wrote:
>   As I said, I think the standard makes it quite clear that you can pass -1
> and any unsigned char (0....255) value.  It seems to me to say quite clearly
> that if you have a signed char variable which is negative and you pass it to
> the ctype function and allow it to be sign-extended by the implicit argument
> promotion rules then you have supplied an out-of-range value to the
> function.

This is a real typical bug.  Just recently a bug was found in
curl-library, quite a popular little library, which calls
isspace(char).  The bug was missed for a long time, as it is only
triggered with characters with the MSB set, which do not occur often
in HTTP headers.

Their fix was to change the program to use "unsigned char" strings,
rather than change all the callers of isspace() as the latter change
might not be preserved by future programmers who don't know why the
cast in "isspace((unsigned char) c)" is necessary.

I must admit that I hadn't realised it was necessary and I have
written a lot of C (but I never use the is* functions anyway, so it's
never arisen for me).
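The two fixes Jamie contrasts differ only in where the conversion to unsigned char happens. A sketch of the approach he attributes to curl, with a hypothetical function name: if the buffer is already unsigned char, every element passed to isspace() is in range without a per-call cast.

```c
#include <ctype.h>
#include <stddef.h>

/* Counts leading whitespace in a header value held as unsigned char.
   Each element promotes to an int in 0..UCHAR_MAX, which isspace() accepts. */
static size_t count_leading_spaces(const unsigned char *s)
{
    size_t n = 0;
    while (s[n] != '\0' && isspace(s[n]))
        n++;
    return n;
}
```

Callers that hold a plain char buffer still need one pointer cast where the buffer is handed over, but that single cast is harder for a later maintainer to drop unknowingly than one cast per is*() call.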

I suspect quite a lot of programmers write "isspace(c)" and
"isalnum(c)" et al using a "char" argument, not realising they have
sometimes written buggy code - and it passes testing in many cases.

This is a reason why GCC should issue a signedness warning.

>   Anyway, I only gave this particular example as an illustration to back up
> my argument that the incompatibility between signed and unsigned chars is
> not theoretical but very very real and does very much occur in practice as
> it is very very common for char-sized arguments to be promoted to int sized
> and the two types behave significantly differently when this happens.
> That's all.

I agree with you, except that the real practical problems arise from
promotion to wider types, not operations involving just chars of
various signedness.

I agree with Linus that it's common to mix "char" with "unsigned char"
in real code, and warning about calling strlen(unsigned char) would be
too much.

The balance where the warning is useful is that it should warn about
errors such as calling "isspace()" with "char" or "signed char", but
_not_ warn about calling "strlen()" and "memcpy()" with "unsigned
char", or assigning a string constant to an "unsigned char *".

-- Jamie

* Re: signed vs unsigned pointer warning
From: Nick Ing-Simmons @ 2004-10-08 13:29 UTC (permalink / raw)
  To: jamie; +Cc: gcc, 'Morten Welinder', Dave Korn

Jamie Lokier <jamie@shareable.org> writes:
>Dave Korn wrote:
>>   As I said, I think the standard makes it quite clear that you can pass -1
>> and any unsigned char (0....255) value.  It seems to me to say quite clearly
>> that if you have a signed char variable which is negative and you pass it to
>> the ctype function and allow it to be sign-extended by the implicit argument
>> promotion rules then you have supplied an out-of-range value to the
>> function.
>
>This is a real typical bug.  Just recently a bug was found in
>curl-library, quite a popular little library, which calls
>isspace(char).  The bug was missed for a long time, as it is only
>triggered with characters with the MSB set, which do not occur often
>in HTTP headers.

I am reasonably sure that on old SunOS4 systems passing signed char 
to isxxx() was normal and worked. The lookup tables were defined
in such a way that 128 or so entries before the normal table replicated 
the 2nd half.

Snag is it is hard to define such arrays in C.


* RE: signed vs unsigned pointer warning
From: Dave Korn @ 2004-10-08 13:32 UTC (permalink / raw)
  To: 'Nick Ing-Simmons', jamie; +Cc: gcc, 'Morten Welinder'

> -----Original Message-----
> From: Nick Ing-Simmons [mailto:nick@ing-simmons.net] 
> Sent: 08 October 2004 14:06

> >This is a real typical bug.  Just recently a bug was found in
> >curl-library, quite a popular little library, which calls
> >isspace(char).  The bug was missed for a long time, as it is only
> >triggered with characters with the MSB set, which do not occur often
> >in HTTP headers.
> 
> I am reasonably sure that on old SunOS4 systems passing signed char 
> to isxxx() was normal and worked. The lookup tables were defined
> in such a way that 128 or so entries before the normal table replicated 
> the 2nd half.
> 
> Snag is it is hard to define such arrays in C.

  Well, you don't, you just define a "type * const" pointer to element 128
of the real underlying array, e.g. instead of saying

unsigned char isXXX_flags_array[256];

you say

static unsigned char __isXXX_flags_array[256];
unsigned char * const isXXX_flags_array = &__isXXX_flags_array[128];


  Or at any rate, I use that idiom quite a lot.
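Dave's offset-pointer idiom extends to the 384-entry table Paul Koning mentioned. The sketch below is hypothetical (the table, its simplified Latin-1 "letter" class and the function names are illustrative, and it assumes an ASCII/Latin-1 execution character set and two's-complement ints). Slots -128..-1 replicate 128..255, as Nick describes for SunOS 4. It also shows the limitation Morten and Paul Jarc point out: -1, the usual value of EOF, necessarily lands in the same slot as the byte 0xFF.

```c
/* Storage for indices -128..255: 128 slots for negative (sign-extended)
   chars followed by 256 slots for unsigned char values. */
static unsigned char letter_storage[128 + 256];
/* Pointer to element 128, so letter_table[c] is valid for c in [-128, 255]. */
static unsigned char *const letter_table = &letter_storage[128];

/* Simplified Latin-1 letter test for u in 0..255 (excludes 0xD7 and 0xF7). */
static int latin1_letter(int u)
{
    return (u >= 'A' && u <= 'Z') || (u >= 'a' && u <= 'z') ||
           (u >= 0xC0 && u <= 0xFF && u != 0xD7 && u != 0xF7);
}

static void init_letter_table(void)
{
    for (int c = -128; c <= 255; c++)
        /* c & 0xFF maps -128..-1 onto 128..255 on two's complement. */
        letter_table[c] = (unsigned char)latin1_letter(c & 0xFF);
}

/* Accepts signed chars, unsigned chars and -1; builds the table on first use. */
static int my_isletter(int c)
{
    static int ready;
    if (!ready) {
        init_letter_table();
        ready = 1;
    }
    return letter_table[c];
}
```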


    cheers, 
      DaveK
-- 
Can't think of a witty .sigline today....

* Re: signed vs unsigned pointer warning
From: Joe Buck @ 2004-10-08 17:20 UTC (permalink / raw)
  To: Nick Ing-Simmons; +Cc: jamie, gcc, 'Morten Welinder', Dave Korn

On Fri, Oct 08, 2004 at 02:06:23PM +0100, Nick Ing-Simmons wrote:
> >This is a real typical bug.  Just recently a bug was found in
> >curl-library, quite a popular little library, which calls
> >isspace(char).  The bug was missed for a long time, as it is only
> >triggered with characters with the MSB set, which do not occur often
> >in HTTP headers.
> 
> I am reasonably sure that on old SunOS4 systems passing signed char 
> to isxxx() was normal and worked. The lookup tables were defined
> in such a way that 128 or so entries before the normal table replicated 
> the 2nd half.

Why can't an implementation define isxxx(c) to return something like
    table_lookup[(unsigned)(c)]
?



* Re: signed vs unsigned pointer warning
From: Paul Jarc @ 2004-10-08 17:28 UTC (permalink / raw)
  To: Joe Buck
  Cc: Nick Ing-Simmons, jamie, gcc, 'Morten Welinder', Dave Korn

Joe Buck <Joe.Buck@synopsys.COM> wrote:
> Why can't an implementation define isxxx(c) to return something like
>     table_lookup[(unsigned)(c)]
> ?

Assuming EOF==-1, that fails to distinguish between EOF and character
255 (as does the 384-element table in the case where a signed char is
passed).  The ctype macros are supposed to accept EOF as well as
unsigned char values.


paul

* Re: signed vs unsigned pointer warning
From: Joe Buck @ 2004-10-08 17:59 UTC (permalink / raw)
  To: Nick Ing-Simmons, jamie, gcc, 'Morten Welinder', Dave Korn

On Fri, Oct 08, 2004 at 12:26:35PM -0400, Paul Jarc wrote:
> Joe Buck <Joe.Buck@synopsys.COM> wrote:
> > Why can't an implementation define isxxx(c) to return something like
> >     table_lookup[(unsigned)(c)]
> > ?
> 
> Assuming EOF==-1, that fails to distinguish between EOF and character
> 255 (as does the 384-element table in the case where a signed char is
> passed).  The ctype macros are supposed to accept EOF as well as
> unsigned char values.

No, (unsigned)-1 does not turn into 255, it turns into a very large
number.  You cannot store EOF in a char.  Here's a bug fix:

	 table_lookup[1U+(unsigned)(c)]

Now EOF goes into slot 0.
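Joe's index expression, evaluated for a few arguments (assuming EOF == -1 and 8-bit chars). It confirms that EOF lands in slot 0, and also illustrates Jamie's objection further down the thread: a sign-extended (signed char)0xFF is the same int, so it lands there too. The function name is hypothetical.

```c
#include <stdio.h>   /* EOF */

/* Joe Buck's proposed table index: unsigned arithmetic is modulo 2^N,
   so 1U + (unsigned)EOF wraps around to 0. */
static unsigned int slot(int c)
{
    return 1U + (unsigned int)c;
}
```

Any other negative char, such as -128, still yields an index far beyond a 257-entry table, so the argument must already be in the EOF-or-unsigned-char range before the lookup.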



* RE: signed vs unsigned pointer warning
From: Dave Korn @ 2004-10-08 18:15 UTC (permalink / raw)
  To: 'Joe Buck', 'Nick Ing-Simmons',
	jamie, gcc, 'Morten Welinder'

> -----Original Message-----
> From: Joe Buck  
> Sent: 08 October 2004 18:09
> To: Nick Ing-Simmons; jamie@shareable.org; gcc@gcc.gnu.org; 
> 'Morten Welinder'; Dave Korn
> Subject: Re: signed vs unsigned pointer warning
> 
> On Fri, Oct 08, 2004 at 12:26:35PM -0400, Paul Jarc wrote:
> > Joe Buck <Joe.Buck@synopsys.COM> wrote:
> > > Why can't an implementation define isxxx(c) to return something like
> > >     table_lookup[(unsigned)(c)]
> > > ?
> > 
> > Assuming EOF==-1, that fails to distinguish between EOF and character
> > 255 (as does the 384-element table in the case where a signed char is
> > passed).  The ctype macros are supposed to accept EOF as well as
> > unsigned char values.
> 
> No, (unsigned)-1 does not turn into 255, it turns into a very large
> number.  You cannot store EOF in a char.  

  Of course, isXXXX functions do take an int argument, which can indeed
store EOF.

> Here's a bug fix:
> 
> 	 table_lookup[1U+(unsigned)(c)]
> 
> Now EOF goes into slot 0.

  Hmm, but only by virtue of integer maths overflow.  Wouldn't it be better
to just leave out the cast in this case and do the sum in signed math?

    cheers, 
      DaveK
-- 
Can't think of a witty .sigline today....

* Re: signed vs unsigned pointer warning
From: Joe Buck @ 2004-10-08 18:22 UTC (permalink / raw)
  To: Dave Korn
  Cc: 'Nick Ing-Simmons', jamie, gcc, 'Morten Welinder'

I wrote:


> > Here's a bug fix:
> > 
> > 	 table_lookup[1U+(unsigned)(c)]
> > 
> > Now EOF goes into slot 0.

On Fri, Oct 08, 2004 at 06:12:09PM +0100, Dave Korn wrote:
>   Hmm, but only by virtue of integer maths overflow.  Wouldn't it be better
> to just leave out the cast in this case and do the sum in signed math?

No.  The C standard guarantees that unsigned arithmetic obeys the rules
of arithmetic modulo 2**N, where N is the number of bits.  The overflow is
well-defined and is required to yield zero.

Signed math would mishandle the original case, where the original argument
was a signed char with 8th bit set.

* Re: signed vs unsigned pointer warning
From: Jamie Lokier @ 2004-10-08 18:24 UTC (permalink / raw)
  To: Joe Buck; +Cc: Nick Ing-Simmons, gcc, 'Morten Welinder', Dave Korn

Joe Buck wrote:
> No, (unsigned)-1 does not turn into 255, it turns into a very large
> number.  You cannot store EOF in a char.  Here's a bug fix:
> 
> 	 table_lookup[1U+(unsigned)(c)]
> 
> Now EOF goes into slot 0.

The trouble is that char 255 (or char -1 depending on your point of
view) _might_ also go into slot 0, depending on the type of c and how
it was passed.

Do the standards require anything about the result of is*(EOF), or do
the functions simply have to accept that as an argument?

In any case, code which writes:

    char c = ... ;
    ...
    if (isspace(c)) {
        ...
    }

surely deserves a warning from the compiler, just because it'll fail
on some OSes.

-- Jamie

* Re: signed vs unsigned pointer warning
From: Morten Welinder @ 2004-10-08 18:57 UTC (permalink / raw)
  To: Joe.Buck; +Cc: gcc


   Why can't an implementation define isxxx(c) to return something like
       table_lookup[(unsigned)(c)]
   ?

Because isxxx needs to work with EOF, typically -1.

And regardless of the implementation's value of EOF, you cannot cast to
"unsigned char" either because that would make EOF collide with one of
the other 256 valid inputs.

So to summarize: someone screwed up with the definition of isxxx and it
is not fixable by the implementation.  Therefore, get used to seeing
code like

     const char *foo = whatever;
     if (isspace ((unsigned char)*foo)) oink ();

no matter how ugly it is.
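Morten's idiom, wrapped in a small function for a plain char string (the function name is hypothetical). The cast is applied to each element before promotion, so bytes above 0x7F reach isspace() as 128..255 instead of as negative values.

```c
#include <ctype.h>
#include <stddef.h>

/* Counts whitespace characters in a NUL-terminated char string. */
static size_t count_spaces(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (isspace((unsigned char)*s))   /* convert before promotion to int */
            n++;
    return n;
}
```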

Morten

* Re: signed vs unsigned pointer warning
From: Paul Jarc @ 2004-10-08 19:57 UTC (permalink / raw)
  To: Joe Buck
  Cc: Nick Ing-Simmons, jamie, gcc, 'Morten Welinder', Dave Korn

Joe Buck <Joe.Buck@synopsys.COM> wrote:
> No, (unsigned)-1 does not turn into 255, it turns into a very large
> number.

But character 255, as a signed char, is also -1, and so will also
become the same large number.  The caller must cast char to unsigned
char to ensure that EOF is distinct.  *Only* the caller can make the
distinction reliably, since only the caller knows whether this
particular -1 is supposed to be EOF.
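The distinction Paul describes is visible in the usual stdio loop: getc() returns an int that is either EOF or the byte converted to unsigned char, so the caller has already separated the two cases and can pass the value straight to a ctype function. A sketch with a hypothetical function name:

```c
#include <ctype.h>
#include <stdio.h>

/* Counts whitespace bytes read from a stream. ch is an int, not a char,
   so EOF stays distinct from the byte 0xFF. */
static long count_spaces_in_stream(FILE *fp)
{
    long n = 0;
    int ch;
    while ((ch = getc(fp)) != EOF)
        if (isspace(ch))          /* ch is already in 0..UCHAR_MAX here */
            n++;
    return n;
}
```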


paul

* Re: signed vs unsigned pointer warning
From: Matthias B. @ 2004-10-08 20:59 UTC (permalink / raw)
  To: gcc

On Fri,  8 Oct 2004 13:31:53 -0400 (EDT) terra@gnome.org (Morten Welinder)
wrote:

> 
>    Why can't an implementation define isxxx(c) to return something like
>        table_lookup[(unsigned)(c)]
>    ?
> 
> Because isxxx needs to work with EOF, typically -1.

So what? On my system all the is* calls return the same thing for EOF as
they do for 255, namely 0. Is there an actual locale where any of the
is*() calls returns non-zero for 255? In any case, for the usual western
locales, your argument is invalid.

> And regardless of the implementation's value of EOF, you cannot cast to
> "unsigned char" either because that would make EOF collide with one of
> the other 256 valid inputs.
> 
> So to summarize: someone screwed up with the definition of isxxx and it
> is not fixable by the implementation.  Therefore, get used to seeing
> code like
> 
>      const char *foo = whatever;
>      if (isspace ((unsigned char)*foo)) oink ();
> 
> no matter how ugly it is.

I've grepped through a couple of sources I had lying around. I don't think
we'll have to get used to this anytime soon. Almost everyone just seems to
pass signed chars to the is* functions. It's the DE FACTO standard and
libraries/compilers better implement it. And as stated above, for the
usual western locales, you can implement it without violating the
standard, so not supporting signed char arguments would be completely
irrational.

MSB

-- 
You know that you're lonely when you start laughing at your own jokes.

* Re: signed vs unsigned pointer warning
From: Paul Koning @ 2004-10-08 22:34 UTC (permalink / raw)
  To: msb; +Cc: gcc

>>>>> "Matthias" == Matthias B <msbREMOVE-THIS@winterdrache.de> writes:

 Matthias> On Fri, 8 Oct 2004 13:31:53 -0400 (EDT) terra@gnome.org
 Matthias> (Morten Welinder) wrote:

 >> Why can't an implementation define isxxx(c) to return something
 >> like table_lookup[(unsigned)(c)] ?
 >> 
 >> Because isxxx needs to work with EOF, typically -1.

 Matthias> So what? On my system all the is* calls return the same
 Matthias> thing for EOF as they do for 255, namely 0. Is there an
 Matthias> actual locale where any of the is*() calls returns non-zero
 Matthias> for 255? In any case, for the usual western locales, your
 Matthias> argument is invalid.

Not true!  Character code 255 is a lowercase letter.  Certainly it is
in ISO-8859-1 (Western Europe) and probably in many of the other 8859
flavors. 

    paul

* Re: signed vs unsigned pointer warning
From: Andreas Schwab @ 2004-10-09  1:39 UTC (permalink / raw)
  To: Matthias B.; +Cc: gcc

"Matthias B." <msbREMOVE-THIS@winterdrache.de> writes:

> So what? On my system all the is* calls return the same thing for EOF as
> they do for 255, namely 0. Is there an actual locale where any of the
> is*() calls returns non-zero for 255? In any case, for the usual western
> locales, your argument is invalid.

For any locale that uses the Latin-1 character set isalpha(255) returns
non-zero (<U00FF> is LATIN SMALL LETTER Y WITH DIAERESIS).

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux AG, Maxfeldstraße 5, 90409 Nürnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: signed vs unsigned pointer warning
  2004-10-08 19:57             ` Paul Jarc
@ 2004-10-09  7:05               ` Jamie Lokier
  2004-10-09  8:48                 ` Paul Jarc
  0 siblings, 1 reply; 31+ messages in thread
From: Jamie Lokier @ 2004-10-09  7:05 UTC (permalink / raw)
  To: Joe Buck, Nick Ing-Simmons, gcc, 'Morten Welinder', Dave Korn

Paul Jarc wrote:
> > No, (unsigned)-1 does not turn into 255, it turns into a very large
> > number.
> 
> But character 255, as a signed char, is also -1, and so will also
> become the same large number.  The caller must cast char to unsigned
> char to ensure that EOF is distinct.  *Only* the caller can make the
> distinction reliably, since only the caller knows whether this
> particular -1 is supposed to be EOF.

Is it necessary for them to be distinct?

I didn't see any rule in "man isalpha" saying anything about EOF
other than it's accepted as an argument value.

In particular, is it ok if isalpha(EOF) == isalpha(255), whatever the
value of isalpha(255)?

Or is there a rule that is*(EOF) must return 0?

-- Jamie
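Paul Jarc's point quoted above (only the caller knows whether a given -1 means EOF) translates into a one-line habit at each call site. A minimal sketch, where the wrapper name `char_isalpha` is hypothetical:

```c
#include <ctype.h>

/* Classify a plain char. Converting to unsigned char first means byte 0xFF
   reaches isalpha() as 255; on a signed-char target it would otherwise arrive
   as -1, which isalpha() can only interpret as EOF (on typical systems). */
static int char_isalpha(char c)
{
    return isalpha((unsigned char)c);
}
```

In the default "C" locale this returns non-zero for letters and 0 for digits and for byte 0xFF, regardless of whether plain `char` is signed.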

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: signed vs unsigned pointer warning
  2004-10-09  7:05               ` Jamie Lokier
@ 2004-10-09  8:48                 ` Paul Jarc
  2004-10-11 16:34                   ` Richard Earnshaw
  0 siblings, 1 reply; 31+ messages in thread
From: Paul Jarc @ 2004-10-09  8:48 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Joe Buck, Nick Ing-Simmons, gcc, 'Morten Welinder', Dave Korn

Jamie Lokier <jamie@shareable.org> wrote:
> Or is there a rule that is*(EOF) must return 0?

Yes.  The standard specifies the result of each is* macro to be 1 iff
the argument is a certain kind of character.  EOF is not any kind of
character.


paul

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: signed vs unsigned pointer warning
  2004-10-08 22:34             ` Paul Koning
@ 2004-10-10  2:03               ` Matthias B.
  0 siblings, 0 replies; 31+ messages in thread
From: Matthias B. @ 2004-10-10  2:03 UTC (permalink / raw)
  To: gcc

On Fri, 8 Oct 2004 16:10:23 -0400 Paul Koning <pkoning@equallogic.com>
wrote:

> Not true!  Character code 255 is a lowercase letter.  Certainly it is
> in iso-8859-1 (western europe) and probably in many of the other 8859
> flavors. 

Sorry. I forgot to use setlocale() in my test program. I thought that
glibc would automatically choose the locale based on the LC_* environment
variables in that case, but that doesn't seem to be true.
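That matches the C standard: a program starts as if `setlocale(LC_ALL, "C")` had been called, and the LC_* / LANG environment variables only take effect after an explicit `setlocale(..., "")`. The sketch below prints isalpha(255) before and after adopting the environment's LC_CTYPE; the second value depends on the environment the program runs in (for example, non-zero under an ISO-8859-1 locale).

```c
#include <ctype.h>
#include <locale.h>
#include <stdio.h>

/* Report isalpha(255) in the startup "C" locale, then again after adopting
   the character-classification locale named by the environment. */
static void show_locale_effect(void)
{
    printf("C locale:   isalpha(255) = %d\n", isalpha(255) != 0);

    if (setlocale(LC_CTYPE, "") == NULL) {
        puts("environment locale not available; still using \"C\"");
        return;
    }
    printf("env locale: isalpha(255) = %d (LC_CTYPE = %s)\n",
           isalpha(255) != 0, setlocale(LC_CTYPE, NULL));
}
```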

MSB

-- 
There are only 10 types of people in this world:
Those who understand binary, and those who don't.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: signed vs unsigned pointer warning
  2004-10-08 18:57         ` Morten Welinder
  2004-10-08 20:59           ` Matthias B.
@ 2004-10-11  0:11           ` Kai Henningsen
  1 sibling, 0 replies; 31+ messages in thread
From: Kai Henningsen @ 2004-10-11  0:11 UTC (permalink / raw)
  To: gcc

terra@gnome.org (Morten Welinder)  wrote on 08.10.04 in <20041008173153.E89761422D56@darter.rentec.com>:

> And regardless of the implementation's value of EOF, you cannot cast to
> "unsigned char" either because that would make EOF collide with one of
> the other 256 valid inputs.

The usual problem is that either you already have (implicitly) cast to
char where you shouldn't have (the classic while((c=getchar())!=EOF) bug),
or else you just have EOF-less data anyway but it's in char*.

In the first case, obviously the answer is to make c an int as it should  
have been in the first place, so you can distinguish between EOF and  
(unsigned char)255 - there's a reason getchar() returns all characters as  
unsigned char values. Then no casting is needed for isXXX().
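A minimal sketch of the first case done right: the loop variable is an `int`, so byte 0xFF (255) stays distinct from EOF and can go straight to isalpha(). The function name `count_alpha` is hypothetical.

```c
#include <ctype.h>
#include <stdio.h>

/* Count alphabetic bytes in a stream and, optionally, the total byte count.
   c must be an int: getc() returns each byte as an unsigned char value
   (0..255) and EOF as a separate negative value. A plain char would turn
   0xFF into -1 on signed-char targets and stop the loop early. */
static long count_alpha(FILE *fp, long *total)
{
    long alpha = 0, n = 0;
    int c;
    while ((c = getc(fp)) != EOF) {
        n++;
        if (isalpha(c))          /* c is already 0..255: no cast needed */
            alpha++;
    }
    if (total)
        *total = n;
    return alpha;
}
```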

In the second case, there's nothing wrong with casting to unsigned char -  
in fact, that is exactly what is necessary. You can, of course, do it by  
accessing your data with an unsigned char * pointer instead. That's just a  
question of what looks better to you.
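For the second case, walking the data through an `unsigned char *` does the conversion once, at the pointer, instead of at every isXXX() call. A small sketch; `count_alpha_str` is a hypothetical name.

```c
#include <ctype.h>
#include <stddef.h>

/* Count alphabetic characters in a NUL-terminated string. Reading through
   an unsigned char pointer delivers each byte to isalpha() as 0..255. */
static size_t count_alpha_str(const char *s)
{
    size_t n = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++)
        if (isalpha(*p))
            n++;
    return n;
}
```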

And I should mention wint_t/wchar_t here just for completeness.


Really, the fundamental bug was when the first C standard failed to have  
separate unit-of-storage and character types, and allowed the fundamental  
character type to be signed (and forced it to always be one storage unit,  
thus wchar_t). There's really no good reason for characters to have signs.  
(There occasionally is, of course, for storage units.)

It's probably far too late to fix that, unfortunately.


Anyway, the facts of life are that we have character-handling interfaces  
that insist on unsigned chars (getchar(), isXXX()), and we have others  
that insist on unadorned chars (strXXX(), "..."). And thus, we need extra  
intelligence to figure out where we should or should not warn about  
signedness-of-char issues.

Let this be a lesson to language designers about how NOT to do it.

MfG Kai

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: signed vs unsigned pointer warning
  2004-10-09  8:48                 ` Paul Jarc
@ 2004-10-11 16:34                   ` Richard Earnshaw
  0 siblings, 0 replies; 31+ messages in thread
From: Richard Earnshaw @ 2004-10-11 16:34 UTC (permalink / raw)
  To: Paul Jarc
  Cc: Jamie Lokier, Joe Buck, Nick Ing-Simmons, gcc,
	'Morten Welinder',
	Dave Korn

On Fri, 2004-10-08 at 22:14, Paul Jarc wrote:
> Jamie Lokier <jamie@shareable.org> wrote:
> > Or is there a rule that is*(EOF) must return 0?
> 
> Yes.  The standard specifies the result of each is* macro to be 1 iff
> the argument is a certain kind of character.  EOF is not any kind of
> character.

Pedant time...  The value is 'nonzero (true)' according to section 7.4.1
of c99.  It doesn't have to be 1.  
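In code, this means treating the is*() result as a truth value: compare it against 0 (or normalise it with `!!`), never against 1. A minimal sketch with hypothetical helper names, also relying on Paul Jarc's rule that the result for EOF is 0:

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>   /* EOF */

/* The is*() functions return "nonzero (true)", not necessarily 1. */
static bool is_digit_char(int c)       /* c: EOF or an unsigned char value */
{
    return isdigit(c) != 0;
}

static int digit_flag(int c)
{
    return !!isdigit(c);               /* normalised to exactly 0 or 1 */
}
```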

R.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: signed vs unsigned pointer warning
  2004-09-21 22:36 ` Linus Torvalds
@ 2004-09-22 14:35   ` Dave Korn
  0 siblings, 0 replies; 31+ messages in thread
From: Dave Korn @ 2004-09-22 14:35 UTC (permalink / raw)
  To: 'Linus Torvalds', 'Richard Henderson'; +Cc: gcc

> -----Original Message-----
> From: gcc-owner On Behalf Of Linus Torvalds
> Sent: 21 September 2004 19:49


> As to the argument that signed and unsigned types are incompatible - that
> may be true in theory ("C language lawyers"), but it's certainly not true
> in practice. I'd hope that practice matters.
> 
> 		Linus


  Until you try indexing an array with an 8-bit high ASCII char, of course.
Then things become radically different.  I've known buggy ctype
implementations that have failed on this (ASCII > 127 being signed negative
and the ctype function accidentally indexing memory space before an array
full of ctype result flags).
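The same failure applies to any table indexed by a plain `char`, not just ctype. In the sketch below (the name `byte_histogram` is hypothetical), indexing with `counts[*s]` would turn byte 0xE9 into index -23 on a signed-char target and write before the array; converting to `unsigned char` keeps the index in 0..UCHAR_MAX.

```c
#include <limits.h>   /* UCHAR_MAX */
#include <stddef.h>

/* Tally how often each byte value occurs in a NUL-terminated string.
   counts must have UCHAR_MAX + 1 entries. */
static void byte_histogram(const char *s, size_t counts[UCHAR_MAX + 1])
{
    for (; *s != '\0'; s++)
        counts[(unsigned char)*s]++;   /* never negative, whatever char's signedness */
}
```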


    cheers, 
      DaveK
-- 
Can't think of a witty .sigline today....

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: signed vs unsigned pointer warning
  2004-09-21 20:52 Richard Henderson
@ 2004-09-21 22:36 ` Linus Torvalds
  2004-09-22 14:35   ` Dave Korn
  0 siblings, 1 reply; 31+ messages in thread
From: Linus Torvalds @ 2004-09-21 22:36 UTC (permalink / raw)
  To: Richard Henderson; +Cc: gcc



On Tue, 21 Sep 2004, Richard Henderson wrote:
>
> [ Forwarded from the thread beginning at
>   http://marc.theaimsgroup.com/?l=linux-sparse&m=109577992701909&w=2
> ]

Btw, please do look at that first email, and in particular about "char".

The fact is, "char" is neither clearly signed nor unsigned, so it looks to
be clearly a bug to complain about assignment either way. Also, "char"  
ends up being special because there are magic language features that are
explicitly "char", but make sense for both signed and unsigned types,
notably constant strings (but also standard-defined string functions like
"strlen()" that are defined in terms of taking a "const char *").

In other words, I'd argue _strongly_ to allow at least "char *" to be
compatible with both "unsigned char *" and "signed char *". Anything else
is just madness.
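For reference, ISO C treats `char`, `signed char` and `unsigned char` as three distinct types, so passing an `unsigned char *` where `const char *` is expected (as with strlen()) is a constraint violation that a conforming compiler must diagnose, even when the representations are identical. A sketch of the explicit-cast workaround under the current rules; `ustrlen` is a hypothetical helper, not a standard function.

```c
#include <string.h>

/* strlen() for byte buffers declared as unsigned char. The explicit cast
   makes the signedness change visible; reading the bytes through a char
   lvalue is always permitted for character types. */
static size_t ustrlen(const unsigned char *s)
{
    return strlen((const char *)s);
}
```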

Making "signed char *" be incompatible with "unsigned char *" (ie
_explicit_ signedness) sounds sane enough, but it may be that gcc can't
tell the difference internally.

Also, can we please add a flag to turn this off? I'd prefer to see
something like "-W[no-]typesign", which is what I already do in sparse (in
sparse it currently defaults to being off pending resolution of whether it
makes sense at all).

As to the argument that signed and unsigned types are incompatible - that
may be true in theory ("C language lawyers"), but it's certainly not true
in practice. I'd hope that practice matters.

		Linus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* signed vs unsigned pointer warning
@ 2004-09-21 20:52 Richard Henderson
  2004-09-21 22:36 ` Linus Torvalds
  0 siblings, 1 reply; 31+ messages in thread
From: Richard Henderson @ 2004-09-21 20:52 UTC (permalink / raw)
  To: gcc; +Cc: torvalds

[ Forwarded from the thread beginning at
  http://marc.theaimsgroup.com/?l=linux-sparse&m=109577992701909&w=2
]

On Tue, Sep 21, 2004 at 09:49:43AM -0700, Linus Torvalds wrote:
> In fact, even the "explicit sign" differences are a bit questionable. The 
> xdr4 code does something like this:
> 
> 	s64	len, start, end;
> 	...
> 	p = xdr_decode_hyper(p, &start);
> 	p = xdr_decode_hyper(p, &len);
> 	..
> 
> and both of these generate warnings, because xdr_decode_hyper() looks like
> 
> 	static inline u32 *
> 	xdr_decode_hyper(u32 *p, __u64 *valp)
> 
> but the fact is, it obviously works fine to return both u64 and s64
> values, and forcing the caller to use one over the other is just not that
> sensible.

Maybe.  Or maybe it's a bug that the caller typo'd s64 instead of u64,
and (start < end) will mistakenly compare false when end gets large.
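Richard's scenario can be reproduced in a few lines, using `<stdint.h>` types as stand-ins for the kernel's s64/u64 (an assumption of this sketch). The same 64-bit patterns compare differently depending on the signedness the caller chose; converting a value above INT64_MAX to int64_t is implementation-defined in C99, and GCC wraps it modulo 2^64.

```c
#include <stdbool.h>
#include <stdint.h>

/* [start, end) as an unsigned (u64) caller sees it. */
static bool unsigned_less(uint64_t start, uint64_t end)
{
    return start < end;
}

/* The same bit patterns as an s64 caller ends up holding them. With
   end >= 2^63 the signed copy of end is negative, so start < end fails. */
static bool signed_less(uint64_t start_bits, uint64_t end_bits)
{
    int64_t start = (int64_t)start_bits;
    int64_t end = (int64_t)end_bits;
    return start < end;
}
```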

> ... and duplicating the function to do the same thing also seems 
> totally idiotic.

I don't agree.  If signed vs unsigned really isn't important, because
xdr_decode_hyper does no range checking, yadda yadda, then

	static inline u32 *
	xdr_decode_hyper_s(u32 *p, s64 *valp)
	{
	  return xdr_decode_hyper (p, (u64 *) valp);
	}

does not seem too much to ask.
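A self-contained version of the wrapper, for experimenting outside the kernel. The typedefs map u32/u64/s64 onto `<stdint.h>` types, and the `xdr_decode_hyper()` here is a simplified stand-in (high word first) that deliberately omits the network-to-host byte-order conversion a real XDR decoder performs.

```c
#include <stdint.h>

typedef uint32_t u32;
typedef uint64_t u64;
typedef int64_t  s64;

/* Simplified stand-in: read a 64-bit "hyper" as two 32-bit words, high word
   first. Byte-order conversion is omitted in this sketch. */
static inline u32 *xdr_decode_hyper(u32 *p, u64 *valp)
{
    *valp = (u64)p[0] << 32;
    *valp |= p[1];
    return p + 2;
}

/* Signed variant: the sign change is explicit in one place rather than at
   every call site. Accessing an s64 object through a u64 lvalue is allowed
   because they are corresponding signed/unsigned types. */
static inline u32 *xdr_decode_hyper_s(u32 *p, s64 *valp)
{
    return xdr_decode_hyper(p, (u64 *)valp);
}
```

Decoding the words {0xFFFFFFFF, 0xFFFFFFFF} through the signed variant yields -1, and {0, 5} yields 5.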

> Richard, are you sure that the gcc team has thought this through wrt
> gcc-4.0, or is this just another total disaster like adding
> "-Wsign-compare" to the default flags in gcc-3.0?

I think we're on more solid ground here than -Wsign-compare, because
the types "int *" and "unsigned int *" are not compatible [c99 6.2.7].
IANAL, but we could be within our rights to reject the program entirely
[c99 6.5.16.1].

I am finding it somewhat annoying that there's no -W switch to turn it
off though, since there are three include/linux/ headers that now prevent
me from using -Werror under arch/alpha/.


r~

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2004-10-11  9:53 UTC | newest]

Thread overview: 31+ messages
2004-09-22 16:43 signed vs unsigned pointer warning Morten Welinder
2004-09-22 17:17 ` Paul Koning
2004-09-22 17:27   ` Morten Welinder
2004-09-22 17:49     ` Dave Korn
2004-09-22 17:20 ` Dave Korn
2004-09-23  1:31   ` Andreas Schwab
2004-09-23 12:29     ` Dave Korn
2004-09-23 18:57       ` Joe Buck
2004-09-23 19:38         ` Dave Korn
2004-09-27  2:04   ` Jamie Lokier
2004-10-08 13:29     ` Nick Ing-Simmons
2004-10-08 13:32       ` Dave Korn
2004-10-08 17:20       ` Joe Buck
2004-10-08 17:28         ` Paul Jarc
2004-10-08 17:59           ` Joe Buck
2004-10-08 18:15             ` Dave Korn
2004-10-08 18:22               ` Joe Buck
2004-10-08 18:24             ` Jamie Lokier
2004-10-08 19:57             ` Paul Jarc
2004-10-09  7:05               ` Jamie Lokier
2004-10-09  8:48                 ` Paul Jarc
2004-10-11 16:34                   ` Richard Earnshaw
2004-10-08 18:57         ` Morten Welinder
2004-10-08 20:59           ` Matthias B.
2004-10-08 22:34             ` Paul Koning
2004-10-10  2:03               ` Matthias B.
2004-10-09  1:39             ` Andreas Schwab
2004-10-11  0:11           ` Kai Henningsen
  -- strict thread matches above, loose matches on Subject: below --
2004-09-21 20:52 Richard Henderson
2004-09-21 22:36 ` Linus Torvalds
2004-09-22 14:35   ` Dave Korn
