public inbox for gcc-help@gcc.gnu.org
* how gcc thinks `char' as signed char or unsigned char ?
@ 2008-03-05  9:20 PRC
  2008-03-05  9:58 ` Andrew Haley
  2008-03-05 13:08 ` John Love-Jensen
  0 siblings, 2 replies; 5+ messages in thread
From: PRC @ 2008-03-05  9:20 UTC (permalink / raw)
  To: gcc-help

---------------------------------
#include <stdio.h>
int main()
{
	char a = -7;

	if( a < -9 )
		printf("a");
	else
		printf("b");

	return 0;
}
---------------------------------
sde-gcc -c a2.c
c:/a2.c: In function `main':
c:/a2.c:6: warning: comparison is always false due to limited range of data type

The reason for this warning may be that gcc treats `char' as 'unsigned char' by default.
Can I change this default by modifying some configuration file?
Or can it not be changed at all once gcc has been built?


* Re: how gcc thinks `char' as signed char or unsigned char ?
  2008-03-05  9:20 how gcc thinks `char' as signed char or unsigned char ? PRC
@ 2008-03-05  9:58 ` Andrew Haley
  2008-03-05 13:08 ` John Love-Jensen
  1 sibling, 0 replies; 5+ messages in thread
From: Andrew Haley @ 2008-03-05  9:58 UTC (permalink / raw)
  To: PRC; +Cc: gcc-help

PRC wrote:
> ---------------------------------
> #include <stdio.h>
> int main()
> {
> 	char a = -7;
>
> 	if( a < -9 )
> 		printf("a");
> 	else
> 		printf("b");
>
> 	return 0;
> }
> ---------------------------------
> sde-gcc -c a2.c
> c:/a2.c: In function `main':
> c:/a2.c:6: warning: comparison is always false due to limited range of data type
> 
> The reason for this warning may be that gcc treats `char' as 'unsigned char' by default.
> Can I change this default by modifying some configuration file?
> Or can it not be changed at all once gcc has been built?

-fsigned-char

The signedness of characters is part of a machine's ABI.  If you
really need to have unsigned chars, declare them as such.
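
A minimal sketch of the same point, assuming the goal is just to make the
comparison independent of the target's default for plain char: declare the
variable explicitly signed (compiling the original a2.c with -fsigned-char
has the same effect for that translation unit).

#include <stdio.h>

int main(void)
{
	/* explicitly signed, whatever the ABI chooses for plain char */
	signed char a = -7;

	if (a < -9)
		printf("a\n");
	else
		printf("b\n");

	return 0;
}

With signed char the range of a includes values below -9, so the comparison
is no longer always false and the warning disappears.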

Andrew.


* Re: how gcc thinks `char' as signed char or unsigned char ?
  2008-03-05  9:20 how gcc thinks `char' as signed char or unsigned char ? PRC
  2008-03-05  9:58 ` Andrew Haley
@ 2008-03-05 13:08 ` John Love-Jensen
  2008-03-05 13:32   ` Tom St Denis
  1 sibling, 1 reply; 5+ messages in thread
From: John Love-Jensen @ 2008-03-05 13:08 UTC (permalink / raw)
  To: PRC, GCC-help

Hi PRC,

In C++ (and C too), it is really best to think of char as neither signed nor unsigned.

If you need to manipulate bytes of data, take a page from the Java
programming manual and do this:

typedef char byte;
byte b = GetByte();
if ((b & 0xFF) < 200)
{
  std::cout << "byte is under 200" << std::endl;
}

The explicit (b & 0xFF) converts the byte to an int, and ensures that it is
between 0 and 255.

Also, the explicit (b & 0xFF) tells any other programmer that follows in
your footsteps that the value is ensured to be between 0 and 255.

It is safe regardless of whether char is signed or unsigned, which varies
from platform to platform.

Also, using 'byte' instead of 'char' is a way to convey, in code (rather
than in comment) that you are working with byte information and not
character information.

(The above assumes that the size of a byte is an octet.  If you are on a
platform that has a byte of a different bit-size, you may need to
accommodate accordingly.)

And on my platforms, (b & 0xFF) optimizes very well.  Apparently most CPUs
are pretty good at bit twiddling.  :-)
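
Put together, a self-contained sketch of the idiom; the GetByte body below is
a hypothetical stand-in for whatever actually produces the raw byte.

#include <iostream>

typedef char byte;

// Hypothetical byte source: 0xC8 is 200 as an octet, which may be stored
// as -56 on a platform where plain char is signed.
byte GetByte()
{
  return static_cast<byte>(0xC8);
}

int main()
{
  byte b = GetByte();

  // (b & 0xFF) promotes b to int and masks it to 0..255, so the test
  // behaves the same whether plain char is signed or unsigned here.
  if ((b & 0xFF) < 200)
  {
    std::cout << "byte is under 200" << std::endl;
  }
  else
  {
    std::cout << "byte is 200 or more" << std::endl;
  }
}

Without the mask, b < 200 would be true where char is signed (-56 < 200) and
false where it is unsigned (200 < 200), which is exactly the platform
dependence the idiom avoids.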

Some may advocate using 'unsigned char' for byte.  I used to advocate that,
too (via: typedef unsigned char byte;).  After a stint doing Java
development, I've changed my mind and now much prefer using an unspecified
'char' for byte (via: typedef char byte;), and employ the (b & 0xFF)
paradigm when/where needed.  More typing, but -- in my opinion -- much
better code clarity, better self-documenting code, and less obfuscation.

In the end, it's a matter of coding style and personal preference.  The
above is my recommendation, for your consideration.

HTH,
--Eljay


* Re: how gcc thinks `char' as signed char or unsigned char ?
  2008-03-05 13:08 ` John Love-Jensen
@ 2008-03-05 13:32   ` Tom St Denis
  2008-03-05 14:11     ` John Love-Jensen
  0 siblings, 1 reply; 5+ messages in thread
From: Tom St Denis @ 2008-03-05 13:32 UTC (permalink / raw)
  To: John Love-Jensen; +Cc: PRC, GCC-help

John Love-Jensen wrote:
> typedef char byte;
> byte b = GetByte();
> if ((b & 0xFF) < 200)
> {
>   std::cout << "byte is under 200" << std::endl;
> }
>   

Presumably GetByte() would be responsible for ensuring its range is
limited to 0..255 or -128..127, so the &0xFF is not required.

> The explicit (b & 0xFF) converts the byte to an int, and ensures that it is
> between 0 and 255.
>   
b < 200 would work just fine if GetByte() were spec'd properly.
> Also, using 'byte' instead of 'char' is a way to convey, in code (rather
> than in comment) that you are working with byte information and not
> character information.
>   
It's more apt to think of "char" as a small integer type, not a 
"character type."  You could have a platform where char and int are the 
same size for instance. 
> Some may advocate using 'unsigned char' for byte.  I used to advocate that,
> too (via: typedef unsigned char byte;).  After a stint doing Java
> development, I've changed my mind and now much prefer using an unspecified
> 'char' for byte (via: typedef char byte;), and employ the (b & 0xFF)
> paradigm when/where needed.  More typing, but -- in my opinion -- much
> better code clarity, better self-documenting code, and less obfuscation.
>   
In two's complement it doesn't really matter, unless you multiply the
type: you'd get a signed multiplication instead of an unsigned one,
which probably won't matter, but it's good to be explicit.  Shifting
also works differently: unsigned right shifts fill with zeros, while
signed right shifts may fill with zeros OR copies of the sign bit.
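
A small sketch of the shift difference; the values and printed results are
only illustrative.

#include <cstdio>

int main()
{
  unsigned char u = 0xF0;                            // 240
  signed char   s = static_cast<signed char>(0xF0);  // typically -16

  // Both operands are promoted to int before shifting.  The unsigned value
  // shifts in zeros; right-shifting a negative signed value is
  // implementation-defined (commonly arithmetic, copying the sign bit).
  std::printf("unsigned: %d\n", u >> 4);  // 15
  std::printf("signed:   %d\n", s >> 4);  // typically -1
}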

It's IMO a better idea to use the unsigned type (that's why it exists)
and write your functions so that their domains and co-domains are well
understood.  If you can't figure out what the inputs to a function
should be, it's clearly not clearly documented.  ;-)

Tom


* Re: how gcc thinks `char' as signed char or unsigned char ?
  2008-03-05 13:32   ` Tom St Denis
@ 2008-03-05 14:11     ` John Love-Jensen
  0 siblings, 0 replies; 5+ messages in thread
From: John Love-Jensen @ 2008-03-05 14:11 UTC (permalink / raw)
  To: Tom St Denis; +Cc: PRC, GCC-help

Hi Tom,

> Presumably GetByte() would be responsible for ensuring its range is
> limited to 0..255 or -128..127, so the &0xFF is not required.

GetByte returns a byte (a char), which does not specify whether the range is
0..255 or -128..127, so the &0xFF is required.

> b < 200 would work just fine if GetByte() were spec'd properly.

GetByte is spec'd properly.

> It's more apt to think of "char" as a small integer type, not a
> "character type."  You could have a platform where char and int are the
> same size for instance.

I have worked on a platform where char is 32-bit, and another platform
where char is 13-bit.  But that was quite a while ago.

It's more apt to think of char as holding character data, and of a byte as
holding a byte (not a small integer).  The (b & 0xFF) then converts the
byte into an int with a constrained range of 0..255.

> In two's complement it doesn't really matter, unless you multiply the
> type: you'd get a signed multiplication instead of an unsigned one,
> which probably won't matter, but it's good to be explicit.  Shifting
> also works differently: unsigned right shifts fill with zeros, while
> signed right shifts may fill with zeros OR copies of the sign bit.

The (b & 0xFF) will work correctly on both 1's complement and 2's
complement machines.

If the desire is to have a byte (the typedef'd identifier) represent an
octet, it will also work correctly on machines with greater than 8-bit
bytes.  (In which case it may be more appropriate to use typedef char octet;
as the identifier.)

On platforms where a byte is less than 8-bit, and the desire is for a byte
to represent an octet, the char is not sufficient to hold an octet.

On all the platforms I work on these days, a byte is 8-bit.  I don't expect
that to change any time soon.

> It's IMO a better idea to use the unsigned type (that's why it exists)
> and write your functions so that their domains and co-domains are well
> understood.  If you can't figure out what the inputs to a function
> should be, it's clearly not clearly documented.  ;-)

For a type that holds small integers, an unsigned char (or uint8_t from
<stdint.h>) is appropriate.

For a type that holds a byte, a typedef char byte; without regard to
signed-ness or unsigned-ness is appropriate.

For conversion of a byte to an unsigned char (or uint8_t) which is a small
integer, a (b & 0xFF) is appropriate.
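
To make that distinction concrete, a small sketch; the function name is just
illustrative.

#include <stdint.h>

typedef char byte;   // raw byte, signedness deliberately left unspecified

// Convert a byte to the small unsigned integer (0..255) it represents.
// The mask promotes b to int and discards any sign extension, so the
// result is the same whether plain char is signed or unsigned.
uint8_t byte_to_small_int(byte b)
{
  return static_cast<uint8_t>(b & 0xFF);
}

int main()
{
  byte b = static_cast<byte>(0xC8);
  return byte_to_small_int(b) == 200 ? 0 : 1;  // exits 0 either way
}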

IMO.  YMMV.

Sincerely,
--Eljay

