public inbox for gcc-help@gcc.gnu.org
* RE: Crazy compiler optimization
@ 2013-10-09  9:36 vijay nag
  2013-10-09  9:54 ` Jonathan Wakely
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: vijay nag @ 2013-10-09  9:36 UTC
  To: gcc-help

Hello GCC,

I'm facing a weird compiler optimization problem. Consider the code
snippet below:

#include <stdio.h>

int printChar(unsigned long cur_col, unsigned char c)
{
  char buf[256];
  char* bufp = buf;
  char cnt = sizeof(buf) - 2; /* overflow in implicit type conversion */
  unsigned long terminal_width = 500;

  while ((cur_col++ < terminal_width) && cnt) {
      *bufp++ = c;
      cnt--;
  }

  *bufp++ = '\n';
  *bufp = 0;

  printf("%c\n", buf[0]);
  return 1;
}

int main()
{
  printChar(80, '-');
  return 1;
}

While compiler optimization should guarantee that the result of
execution is the same at all optimization levels, I'm observing a
difference in the result of executing the above program at different
optimization levels. Although there is a fundamental problem with the
statement "char cnt = sizeof(buf) - 2", GCC only warns about the
overflow (and only when the -pedantic flag is used), while silently
discarding the code related to cnt, i.e. the check "&& cnt" in the
while-loop condition is dropped by the compiler at -O2.

$]gcc -g char.c -o char.c.unoptimized -m32 -O0 -Wall -Wextra -pedantic
char.c: In function ‘printChar’:
char.c:8: warning: overflow in implicit constant conversion

$]gcc -g char.c -o char.c.optimized -m32 -O2 -Wall -Wextra -pedantic
char.c: In function ‘printChar’:
char.c:8: warning: overflow in implicit constant conversion

$]./char.c.unoptimized
-
$]./char.c.optimized
-
Segmentation fault (core dumped)

Basically, the crash here is caused by the elimination of the "&& cnt"
check in the while-loop condition, which leads to a stack overrun and
thereby SIGSEGV. While the standards may say that the behaviour is
undefined when an unsigned value is stored in a signed variable, can a
language lawyer explain to me why GCC chose to eliminate the code
pertaining to cnt, treating it as dead code?

Below is the objdump -S output of optimized and unoptimized binaries.

A) Optimized Binary

int printChar(unsigned long cur_col, unsigned char c)
{
 80483b0:    55                       push   %ebp
 80483b1:    89 e5                    mov    %esp,%ebp
 80483b3:    81 ec 08 01 00 00        sub    $0x108,%esp
 80483b9:    8b 45 08                 mov    0x8(%ebp),%eax
  char buf[256];
  char* bufp = buf;
  char cnt = sizeof(buf) - 2;
  unsigned long terminal_width = 500;

  while ((cur_col++ < terminal_width) && cnt) {
 80483bc:    8d 8d 00 ff ff ff        lea    0xffffff00(%ebp),%ecx
 80483c2:    8b 55 0c                 mov    0xc(%ebp),%edx
 80483c5:    3d f3 01 00 00           cmp    $0x1f3,%eax
 80483ca:    77 18                    ja     80483e4 <printChar+0x34>
 80483cc:    83 c0 01                 add    $0x1,%eax
 80483cf:    8d 8d 00 ff ff ff        lea    0xffffff00(%ebp),%ecx
 80483d5:    83 c0 01                 add    $0x1,%eax
      *bufp++ = c;
 80483d8:    88 11                    mov    %dl,(%ecx)
 80483da:    83 c1 01                 add    $0x1,%ecx
 80483dd:    3d f5 01 00 00           cmp    $0x1f5,%eax
 80483e2:    75 f1                    jne    80483d5 <printChar+0x25>
      cnt--;
  }

  *bufp++ = '\n';
 80483e4:    c6 01 0a                 movb   $0xa,(%ecx)
  *bufp = 0;
 80483e7:    c6 41 01 00              movb   $0x0,0x1(%ecx)

  printf("%c\n", buf[0]);
 80483eb:    0f be 85 00 ff ff ff     movsbl 0xffffff00(%ebp),%eax
 80483f2:    c7 04 24 20 85 04 08     movl   $0x8048520,(%esp)
 80483f9:    89 44 24 04              mov    %eax,0x4(%esp)
 80483fd:    e8 b6 fe ff ff           call   80482b8 <printf@plt>
  return 1;
}
 8048402:    b8 01 00 00 00           mov    $0x1,%eax
 8048407:    c9                       leave
 8048408:    c3                       ret
 8048409:    8d b4 26 00 00 00 00     lea    0x0(%esi),%esi


B) Unoptimized binary

int printChar(unsigned long cur_col, unsigned char c)
{
 80483a4:    55                       push   %ebp
 80483a5:    89 e5                    mov    %esp,%ebp
 80483a7:    81 ec 28 01 00 00        sub    $0x128,%esp
 80483ad:    8b 45 0c                 mov    0xc(%ebp),%eax
 80483b0:    88 85 ec fe ff ff        mov    %al,0xfffffeec(%ebp)
  char buf[256];
  char* bufp = buf;
 80483b6:    8d 85 f4 fe ff ff        lea    0xfffffef4(%ebp),%eax
 80483bc:    89 45 f4                 mov    %eax,0xfffffff4(%ebp)
  char cnt = sizeof(buf) - 2;
 80483bf:    c6 45 fb fe              movb   $0xfe,0xfffffffb(%ebp)
  unsigned long terminal_width = 500;
 80483c3:    c7 45 fc f4 01 00 00     movl   $0x1f4,0xfffffffc(%ebp)

  while ((cur_col++ < terminal_width) && cnt) {
 80483ca:    eb 14                    jmp    80483e0 <printChar+0x3c>
      *bufp++ = c;
 80483cc:    0f b6 95 ec fe ff ff     movzbl 0xfffffeec(%ebp),%edx
 80483d3:    8b 45 f4                 mov    0xfffffff4(%ebp),%eax
 80483d6:    88 10                    mov    %dl,(%eax)
 80483d8:    83 45 f4 01              addl   $0x1,0xfffffff4(%ebp)
      cnt--;
 80483dc:    80 6d fb 01              subb   $0x1,0xfffffffb(%ebp)
 80483e0:    8b 45 08                 mov    0x8(%ebp),%eax
 80483e3:    3b 45 fc                 cmp    0xfffffffc(%ebp),%eax
 80483e6:    0f 92 c0                 setb   %al
 80483e9:    83 45 08 01              addl   $0x1,0x8(%ebp)
 80483ed:    83 f0 01                 xor    $0x1,%eax
 80483f0:    84 c0                    test   %al,%al
 80483f2:    75 06                    jne    80483fa <printChar+0x56>
 80483f4:    80 7d fb 00              cmpb   $0x0,0xfffffffb(%ebp)
 80483f8:    75 d2                    jne    80483cc <printChar+0x28>
  }

  *bufp++ = '\n';
 80483fa:    8b 45 f4                 mov    0xfffffff4(%ebp),%eax
 80483fd:    c6 00 0a                 movb   $0xa,(%eax)
 8048400:    83 45 f4 01              addl   $0x1,0xfffffff4(%ebp)
  *bufp = 0;
 8048404:    8b 45 f4                 mov    0xfffffff4(%ebp),%eax
 8048407:    c6 00 00                 movb   $0x0,(%eax)

  printf("%c\n", buf[0]);
 804840a:    0f b6 85 f4 fe ff ff     movzbl 0xfffffef4(%ebp),%eax
 8048411:    0f be c0                 movsbl %al,%eax
 8048414:    89 44 24 04              mov    %eax,0x4(%esp)
 8048418:    c7 04 24 30 85 04 08     movl   $0x8048530,(%esp)
 804841f:    e8 94 fe ff ff           call   80482b8 <printf@plt>
  return 1;
 8048424:    b8 01 00 00 00           mov    $0x1,%eax
}
 8048429:    c9                       leave
 804842a:    c3                       ret

g++ -v
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man
--infodir=/usr/share/info --enable-shared --enable-threads=posix
--enable-checking=release --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-libgcj-multifile
--enable-languages=c,c++,objc,obj-c++,java,fortran,ada
--enable-java-awt=gtk --disable-dssi --disable-plugin
--with-java-home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre
--with-cpu=generic --host=x86_64-redhat-linux
Thread model: posix
gcc version 4.1.2 20080704 (Red Hat 4.1.2-52)

* Re: Crazy compiler optimization
  2013-10-09  9:36 Crazy compiler optimization vijay nag
@ 2013-10-09  9:54 ` Jonathan Wakely
  2013-10-09 10:02   ` vijay nag
  2013-10-09 10:18 ` Nicholas Mc Guire
  2013-10-09 17:48 ` Ian Lance Taylor
  2 siblings, 1 reply; 7+ messages in thread
From: Jonathan Wakely @ 2013-10-09  9:54 UTC
  To: vijay nag; +Cc: gcc-help

On 9 October 2013 10:36, vijay nag wrote:
> Hello GCC,
>
> I'm facing a weird compiler optimization problem. Consider the code
> snippet below
>
> #include <stdio.h>
>
> int printChar(unsigned long cur_col, unsigned char c)
> {
>   char buf[256];
>   char* bufp = buf;
>   char cnt = sizeof(buf) - 2; /* overflow in implicit type conversion */
>   unsigned long terminal_width = 500;
>
>   while ((cur_col++ < terminal_width) && cnt) {
>       *bufp++ = c;
>       cnt--;
>   }


> Basically the crash here is because of elimination of the check in the
> if-clause "&& cnt" which is causing stack overrun and thereby SIGSEGV.
> While standards may say that the behaviour is
> undefined when an unsigned value is stored in a signed value,

Standards do not say that. 254 cannot be represented in a char if char
is a signed type, so it's an overflow, which is undefined behaviour.
Storing an unsigned value that doesn't overflow is OK.

> can a
> language lawyer explain to me why GCC chose to eliminate code
> pertaining to cnt considering it as dead-code ?

cnt is initialized to -2 (after an overflow) and then you decrement it
so it gets more negative.  The "&& cnt" condition will never be false,
because cnt starts non-zero and gets further from zero, so will never
reach zero.

* Re: Crazy compiler optimization
  2013-10-09  9:54 ` Jonathan Wakely
@ 2013-10-09 10:02   ` vijay nag
  2013-10-09 10:16     ` Jonathan Wakely
  2013-10-09 15:40     ` David Brown
  0 siblings, 2 replies; 7+ messages in thread
From: vijay nag @ 2013-10-09 10:02 UTC
  To: Jonathan Wakely; +Cc: gcc-help

On Wed, Oct 9, 2013 at 3:24 PM, Jonathan Wakely <jwakely.gcc@gmail.com> wrote:
> On 9 October 2013 10:36, vijay nag wrote:
>> Hello GCC,
>>
>> I'm facing a weird compiler optimization problem. Consider the code
>> snippet below
>>
>> #include <stdio.h>
>>
>> int printChar(unsigned long cur_col, unsigned char c)
>> {
>>   char buf[256];
>>   char* bufp = buf;
>>   char cnt = sizeof(buf) - 2; /* overflow in implicit type conversion */
>>   unsigned long terminal_width = 500;
>>
>>   while ((cur_col++ < terminal_width) && cnt) {
>>       *bufp++ = c;
>>       cnt--;
>>   }
>
>
>> Basically the crash here is because of elimination of the check in the
>> if-clause "&& cnt" which is causing stack overrun and thereby SIGSEGV.
>> While standards may say that the behaviour is
>> undefined when an unsigned value is stored in a signed value,
>
> Standards do not say that. 254 cannot be represented in a char if char
> is a signed type, so it's an overflow, which is undefined behaviour.
> Storing an unsigned value that doesn't overflow is OK.
>
>> can a
>> language lawyer explain to me why GCC chose to eliminate code
>> pertaining to cnt considering it as dead-code ?
>
> cnt is initialized to -2 (after an overflow) and then you decrement it
> so it gets more negative.  The "&& cnt" condition will never be false,
> because cnt starts non-zero and gets further from zero, so will never
> reach zero.

Alright, that is perfectly valid behaviour. Why does the compiler
consider it to be an unsigned type at optimization level zero? I.e. I
see a wrap-around from -128 to 128?

* Re: Crazy compiler optimization
  2013-10-09 10:02   ` vijay nag
@ 2013-10-09 10:16     ` Jonathan Wakely
  2013-10-09 15:40     ` David Brown
  1 sibling, 0 replies; 7+ messages in thread
From: Jonathan Wakely @ 2013-10-09 10:16 UTC
  To: vijay nag; +Cc: gcc-help

On 9 October 2013 11:02, vijay nag wrote:
> Alright that is perfectly valid behaviour. Why does compiler consider
> it to be a unsigned type at optimization level zero ?

It doesn't.

> i.e. I see a
> wrap around after
> -128 to 128 ?

Because undefined behaviour.

* Re: Crazy compiler optimization
  2013-10-09  9:36 Crazy compiler optimization vijay nag
  2013-10-09  9:54 ` Jonathan Wakely
@ 2013-10-09 10:18 ` Nicholas Mc Guire
  2013-10-09 17:48 ` Ian Lance Taylor
  2 siblings, 0 replies; 7+ messages in thread
From: Nicholas Mc Guire @ 2013-10-09 10:18 UTC
  To: vijay nag; +Cc: gcc-help

On Wed, 09 Oct 2013, vijay nag wrote:

> Hello GCC,
> 
> I'm facing a weird compiler optimization problem. Consider the code
> snippet below
> 
> #include <stdio.h>
> 
> int printChar(unsigned long cur_col, unsigned char c)
> {
>   char buf[256];
>   char* bufp = buf;
>   char cnt = sizeof(buf) - 2; /* overflow in implicit type conversion */
>   unsigned long terminal_width = 500;
> 
>   while ((cur_col++ < terminal_width) && cnt) {
>       *bufp++ = c;
>       cnt--;
>   }
> 
>   *bufp++ = '\n';
>   *bufp = 0;
> 
>   printf("%c\n", buf[0]);
>   return 1;
> }
> 
> int main()
> {
>   printChar(80, '-');
>   return 1;
> }
> 
> While compiler optimization should guarantee that the result of
> execution is same at all optimization levels, I'm observing difference
> in the result of execution of the above program when optimized to
> different levels. Although there is fundamental problem with the
> statement "char cnt = sizeof(buf) - 2", GCC seems to be warning(that
> too only when -pedantic flag is used) about overflow error while
> silently discarding any code related to cnt i.e. the check "&& cnt" in
> the if-clause is silently discarded by the compiler at -O2.
> 
> $]gcc -g char.c -o char.c.unoptimized -m32 -O0 -Wall -Wextra -pedantic
> char.c: In function ‘printChar’:
> char.c:8: warning: overflow in implicit constant conversion
>
This compiler optimization dependency is visible with quite a few code examples
that violate the C standard.
Integer overflow/underflow results in undefined behavior - you are in the
wild lands basically - you should not expect C-standard violations to result
in "reliable undefined" code.

See C99 Annex J.2 for details of undefined behaviors.

thx!
hofrat

* Re: Crazy compiler optimization
  2013-10-09 10:02   ` vijay nag
  2013-10-09 10:16     ` Jonathan Wakely
@ 2013-10-09 15:40     ` David Brown
  1 sibling, 0 replies; 7+ messages in thread
From: David Brown @ 2013-10-09 15:40 UTC
  To: vijay nag; +Cc: Jonathan Wakely, gcc-help

On 09/10/13 12:02, vijay nag wrote:
> On Wed, Oct 9, 2013 at 3:24 PM, Jonathan Wakely <jwakely.gcc@gmail.com> wrote:
>> On 9 October 2013 10:36, vijay nag wrote:
>>> Hello GCC,
>>>
>>> I'm facing a weird compiler optimization problem. Consider the code
>>> snippet below
>>>
>>> #include <stdio.h>
>>>
>>> int printChar(unsigned long cur_col, unsigned char c)
>>> {
>>>   char buf[256];
>>>   char* bufp = buf;
>>>   char cnt = sizeof(buf) - 2; /* overflow in implicit type conversion */
>>>   unsigned long terminal_width = 500;
>>>
>>>   while ((cur_col++ < terminal_width) && cnt) {
>>>       *bufp++ = c;
>>>       cnt--;
>>>   }
>>
>>
>>> Basically the crash here is because of elimination of the check in the
>>> if-clause "&& cnt" which is causing stack overrun and thereby SIGSEGV.
>>> While standards may say that the behaviour is
>>> undefined when an unsigned value is stored in a signed value,
>>
>> Standards do not say that. 254 cannot be represented in a char if char
>> is a signed type, so it's an overflow, which is undefined behaviour.
>> Storing an unsigned value that doesn't overflow is OK.
>>
>>> can a
>>> language lawyer explain to me why GCC chose to eliminate code
>>> pertaining to cnt considering it as dead-code ?
>>
>> cnt is initialized to -2 (after an overflow) and then you decrement it
>> so it gets more negative.  The "&& cnt" condition will never be false,
>> because cnt starts non-zero and gets further from zero, so will never
>> reach zero.
> 
> Alright that is perfectly valid behaviour. Why does compiler consider
> it to be a unsigned type at optimization level zero ? i.e. I see a
> wrap around after
> -128 to 128 ?
> 

Without optimisation, the compiler generates simpler code without trying
to save time and space.  Thus it actually generates code to do the tests
and the decrement operation, rather than spending the effort "thinking"
about whether or not they are necessary.  It turns out that on your
system (and almost all modern systems), the simplistic machine code thus
generated has the effect you were looking for.

It's easy to correct the code by picking a more appropriate type for "cnt".

Incidentally, you should never assume that plain "char" is signed or
unsigned.  It varies between platforms, and can be changed by compiler
flags.  If you mean a "signed char", call it "signed char".  Far
better, of course, is to use <stdint.h> and call it "int8_t" or
"uint8_t" as appropriate, if you really want an 8-bit integer.

* Re: Crazy compiler optimization
  2013-10-09  9:36 Crazy compiler optimization vijay nag
  2013-10-09  9:54 ` Jonathan Wakely
  2013-10-09 10:18 ` Nicholas Mc Guire
@ 2013-10-09 17:48 ` Ian Lance Taylor
  2 siblings, 0 replies; 7+ messages in thread
From: Ian Lance Taylor @ 2013-10-09 17:48 UTC
  To: vijay nag; +Cc: gcc-help

On Wed, Oct 9, 2013 at 2:36 AM, vijay nag <vijunag@gmail.com> wrote:
>
> While compiler optimization should guarantee that the result of
> execution is same at all optimization levels,

That guarantee only holds for programs that fully conform to the
language standard and avoid all cases of undefined behaviour.

Ian

Thread overview: 7+ messages
2013-10-09  9:36 Crazy compiler optimization vijay nag
2013-10-09  9:54 ` Jonathan Wakely
2013-10-09 10:02   ` vijay nag
2013-10-09 10:16     ` Jonathan Wakely
2013-10-09 15:40     ` David Brown
2013-10-09 10:18 ` Nicholas Mc Guire
2013-10-09 17:48 ` Ian Lance Taylor
