public inbox for gcc-bugs@sourceware.org
* [Bug target/115071] New: performance regression, x86, between gcc-14 and gcc-13 using -O3 and _Pragma("GCC unroll 4") on skylake
@ 2024-05-13 14:31 colin.king at intel dot com
  2024-05-13 14:34 ` [Bug target/115071] " colin.king at intel dot com
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: colin.king at intel dot com @ 2024-05-13 14:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115071

            Bug ID: 115071
           Summary: performance regression, x86, between gcc-14 and gcc-13
                    using -O3 and _Pragma("GCC unroll 4") on skylake
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: colin.king at intel dot com
  Target Milestone: ---

Created attachment 58191
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58191&action=edit
reproducer source code

I'm seeing a ~15% performance regression in gcc-14 compared to gcc-13, using
gcc on Ubuntu 24.04:

Versions:
gcc version 13.2.0 (Ubuntu 13.2.0-23ubuntu4) 
gcc version 14.0.1 20240412 (experimental) [master r14-9935-g67e1433a94f]
(Ubuntu 14-20240412-0ubuntu1) 

cking@skylake:~$ gcc-13 reproducer-bitonicsort.c -O2
cking@skylake:~$ ./a.out 
duration: 5.71 seconds, count = 1119566602

cking@skylake:~$ gcc-14 reproducer-bitonicsort.c -O2
cking@skylake:~$ ./a.out 
duration: 6.56 seconds, count = 1119566602

The issue originally appeared when regression testing the stress-ng bitonic
sorting stressor [1]. I've extracted the attached reproducer from the original
code.

Salient points to focus on:

1. The issue depends on the use of _Pragma("GCC unroll 4").
2. The issue also depends on the use of __attribute__((optimize("-O3"))),
applied via the OPTIMIZE3 macro in the example.
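
For readers without the attachment, the combination of the two points above
can be sketched roughly as follows. This is not the attached reproducer: the
OPTIMIZE3 definition here is an assumed stand-in, and bitonic_sort() is a
hypothetical textbook bitonic sorting network used only to show where the
attribute and the unroll pragma sit relative to the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-in for the OPTIMIZE3 macro named in the report. */
#if defined(__GNUC__) && !defined(__clang__)
#define OPTIMIZE3 __attribute__((optimize("-O3")))
#else
#define OPTIMIZE3
#endif

/*
 * Hypothetical illustration: sort n elements in ascending order with a
 * bitonic sorting network. n must be a non-zero power of two.
 */
static void OPTIMIZE3 bitonic_sort(uint32_t *data, size_t n)
{
    assert(n != 0 && (n & (n - 1)) == 0);

    for (size_t k = 2; k <= n; k <<= 1) {
        for (size_t j = k >> 1; j > 0; j >>= 1) {
            /* The pragma applies to the loop that immediately follows. */
_Pragma("GCC unroll 4")
            for (size_t i = 0; i < n; i++) {
                size_t l = i ^ j;

                if (l > i) {
                    /* Direction of this compare-exchange within block k. */
                    int ascending = ((i & k) == 0);

                    if ((ascending && data[i] > data[l]) ||
                        (!ascending && data[i] < data[l])) {
                        uint32_t tmp = data[i];
                        data[i] = data[l];
                        data[l] = tmp;
                    }
                }
            }
        }
    }
}
```

Building a caller of this function with -O2 on the command line, as in the
runs above, still compiles bitonic_sort() itself at -O3 because of the
function attribute; only the rest of the translation unit stays at -O2.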

Attached are the reproducer C source and disassembled object code. 

References:
[1] https://github.com/ColinIanKing/stress-ng/blob/master/stress-bitonicsort.c


* [Bug target/115071] performance regression, x86, between gcc-14 and gcc-13 using -O3 and _Pragma("GCC unroll 4") on skylake
  2024-05-13 14:31 [Bug target/115071] New: performance regression, x86, between gcc-14 and gcc-13 using -O3 and _Pragma("GCC unroll 4") on skylake colin.king at intel dot com
@ 2024-05-13 14:34 ` colin.king at intel dot com
  2024-05-13 14:34 ` colin.king at intel dot com
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: colin.king at intel dot com @ 2024-05-13 14:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115071

--- Comment #1 from Colin Ian King <colin.king at intel dot com> ---
Created attachment 58192
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58192&action=edit
gcc-13 disassembly


* [Bug target/115071] performance regression, x86, between gcc-14 and gcc-13 using -O3 and _Pragma("GCC unroll 4") on skylake
  2024-05-13 14:31 [Bug target/115071] New: performance regression, x86, between gcc-14 and gcc-13 using -O3 and _Pragma("GCC unroll 4") on skylake colin.king at intel dot com
  2024-05-13 14:34 ` [Bug target/115071] " colin.king at intel dot com
@ 2024-05-13 14:34 ` colin.king at intel dot com
  2024-05-15  6:38 ` haochen.jiang at intel dot com
  2024-05-16  1:49 ` [Bug target/115071] [14/15 regression] " sjames at gcc dot gnu.org
  3 siblings, 0 replies; 5+ messages in thread
From: colin.king at intel dot com @ 2024-05-13 14:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115071

--- Comment #2 from Colin Ian King <colin.king at intel dot com> ---
Created attachment 58193
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58193&action=edit
gcc-14 disassembly


* [Bug target/115071] performance regression, x86, between gcc-14 and gcc-13 using -O3 and _Pragma("GCC unroll 4") on skylake
  2024-05-13 14:31 [Bug target/115071] New: performance regression, x86, between gcc-14 and gcc-13 using -O3 and _Pragma("GCC unroll 4") on skylake colin.king at intel dot com
  2024-05-13 14:34 ` [Bug target/115071] " colin.king at intel dot com
  2024-05-13 14:34 ` colin.king at intel dot com
@ 2024-05-15  6:38 ` haochen.jiang at intel dot com
  2024-05-16  1:49 ` [Bug target/115071] [14/15 regression] " sjames at gcc dot gnu.org
  3 siblings, 0 replies; 5+ messages in thread
From: haochen.jiang at intel dot com @ 2024-05-15  6:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115071

Haochen Jiang <haochen.jiang at intel dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |haochen.jiang at intel dot com

--- Comment #3 from Haochen Jiang <haochen.jiang at intel dot com> ---
I could not reproduce the regression. On my machine the timings are:
[haochenj@shgcc101 pr115071]$ ./13.exe
duration: 7.09 seconds, count = 1119566602
[haochenj@shgcc101 pr115071]$ ./trunk.exe
duration: 4.97 seconds, count = 1119566602


* [Bug target/115071] [14/15 regression] performance regression, x86, between gcc-14 and gcc-13 using -O3 and _Pragma("GCC unroll 4") on skylake
  2024-05-13 14:31 [Bug target/115071] New: performance regression, x86, between gcc-14 and gcc-13 using -O3 and _Pragma("GCC unroll 4") on skylake colin.king at intel dot com
                   ` (2 preceding siblings ...)
  2024-05-15  6:38 ` haochen.jiang at intel dot com
@ 2024-05-16  1:49 ` sjames at gcc dot gnu.org
  3 siblings, 0 replies; 5+ messages in thread
From: sjames at gcc dot gnu.org @ 2024-05-16  1:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115071

Sam James <sjames at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |14.2

