public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/111011] New: gcc-13 incorrectly decrements by 2. It's twice as fast as gcc-12 and clang!
@ 2023-08-14  9:52 adam.warner.nz at gmail dot com
  2023-08-14 11:10 ` [Bug rtl-optimization/111011] " rguenth at gcc dot gnu.org
From: adam.warner.nz at gmail dot com @ 2023-08-14  9:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111011

            Bug ID: 111011
           Summary: gcc-13 incorrectly decrements by 2. It's twice as fast
                    as gcc-12 and clang!
           Product: gcc
           Version: 13.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: adam.warner.nz at gmail dot com
  Target Milestone: ---

(Please fix my guess at the correct component for this bug report)

I'm amused by a ghost in the GCC virtual machine. I'm running this code on a
Debian Linux x86-64 desktop with these software versions:

gcc-12 (Debian 12.3.0-7) 12.3.0
gcc-13 (Debian 13.2.0-2) 13.2.0
gcc (Debian 20230718-1) 14.0.0 20230718 (experimental) [master r14-2597-g6bab2772dbc]
Debian clang version 17.0.0 (++20230128060150+75153adeda1a-1~exp1)

My CPU is locked at 2.7GHz. It should take a nice round 10 seconds to decrement
2.7x10^10 to zero if each decrement takes one clock cycle.

And indeed it used to:

$ cat countdown.c
#include <stdint.h>

int main() {
  int64_t count=27000000000;
  while (count>0) {
    __asm__ __volatile__("" : : : "memory");
    --count;
  }
  return 0;
}
$ gcc-12 -O3 countdown.c && time ./a.out 

real    0m10.029s
user    0m10.024s
sys     0m0.004s
$ clang-17 -O3 countdown.c && time ./a.out 

real    0m10.032s
user    0m10.030s
sys     0m0.000s


But now it only takes 5 seconds:
$ gcc-13 -O3 countdown.c && time ./a.out 

real    0m5.022s
user    0m5.021s
sys     0m0.001s
$ gcc-snapshot.sh -O3 countdown.c && time ./a.out 

real    0m5.023s
user    0m5.022s
sys     0m0.000s

By disassembling the machine code we can clearly see why:
$ gcc-13 -O3 countdown.c && objdump -d -m i386:x86-64:intel a.out
...
0000000000001040 <main>:
    1040:       48 b8 00 4e 53 49 06    movabs rax,0x649534e00
    1047:       00 00 00 
    104a:       66 0f 1f 44 00 00       nop    WORD PTR [rax+rax*1+0x0]
    1050:       48 83 e8 02             sub    rax,0x2
    1054:       75 fa                   jne    1050 <main+0x10>
    1056:       31 c0                   xor    eax,eax
    1058:       c3                      ret
    1059:       0f 1f 80 00 00 00 00    nop    DWORD PTR [rax+0x0]
...


* [Bug rtl-optimization/111011] gcc-13 incorrectly decrements by 2. It's twice as fast as gcc-12 and clang!
  2023-08-14  9:52 [Bug rtl-optimization/111011] New: gcc-13 incorrectly decrements by 2. It's twice as fast as gcc-12 and clang! adam.warner.nz at gmail dot com
@ 2023-08-14 11:10 ` rguenth at gcc dot gnu.org
From: rguenth at gcc dot gnu.org @ 2023-08-14 11:10 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111011

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |INVALID
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
There's nothing wrong; we unroll the loop.

> ./cc1 -quiet t.c -O3 -fopt-info
t.c:5:15: optimized: loop unrolled 1 times

Adding "# foo" to the asm text, you'll see:

.L2:
#APP
# 6 "t.c" 1
        # foo
# 0 "" 2
# 6 "t.c" 1
        # foo
# 0 "" 2
#NO_APP
        subq    $2, %rax
        jne     .L2

There's no data dependence between the asm and 'count', so the two
decrements in the unrolled body are combined into one.  You can instead use

#include <stdint.h>

int main() {
  int64_t count=27000000000;
  while (count>0) {
    __asm__ __volatile__("" : "=g" (count) : "0" (count) : "memory");
    --count;
  }
  return 0;
}

to get the desired effect.
