public inbox for gcc-bugs@sourceware.org
* [Bug target/115024] New: 128 bit division performance regression, x86, between gcc-14 and gcc-13 using target clones on skylake platform
From: colin.king at intel dot com @ 2024-05-10 7:12 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115024
Bug ID: 115024
Summary: 128 bit division performance regression, x86, between
gcc-14 and gcc-13 using target clones on skylake
platform
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: colin.king at intel dot com
Target Milestone: ---
Created attachment 58158
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58158&action=edit
reproducer source code for __int128_t division regression
I'm seeing a roughly 5% performance regression with gcc-14 compared to gcc-13
on Ubuntu 24.04.
Versions:
gcc version 13.2.0 (Ubuntu 13.2.0-23ubuntu4)
gcc version 14.0.1 20240412 (experimental) [master r14-9935-g67e1433a94f]
(Ubuntu 14-20240412-0ubuntu1)
cking@skylake:~$ CFLAGS="" gcc-13 -O2 reproducer-div128.c
cking@skylake:~$ ./a.out
1650.83 div128 ops per sec
cking@skylake:~$ CFLAGS="" gcc-14 -O2 reproducer-div128.c
cking@skylake:~$ ./a.out
1567.48 div128 ops per sec
The issue originally appeared when regression testing the stress-ng cpu div128
stressor [1]. I've extracted the attached reproducer from the original code.
Salient point to focus on:
1. The issue also depends on the TARGET_CLONES macro being defined as
__attribute__((target_clones("avx,default"))); the avx target clone appears to
be a factor in reproducing this problem.
Attached are the reproducer C source and disassembled object code.
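The attachment is not reproduced inline. As a rough illustration only, the kind of code the report describes might look like the sketch below. The function name div128_checksum, its arguments and the loop body are hypothetical and are not taken from the attached reproducer or from stress-ng; only the TARGET_CLONES attribute text comes from the report. The attribute is guarded so the sketch still compiles where target_clones is unavailable.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: same attribute text as quoted in the report. The guard is an
 * addition so the sketch also builds without target_clones/ifunc support. */
#if defined(__GNUC__) && !defined(__clang__) && defined(__x86_64__) && defined(__linux__)
#define TARGET_CLONES __attribute__((target_clones("avx,default")))
#else
#define TARGET_CLONES
#endif

/* Hypothetical kernel, not the attached reproducer: performs `iterations`
 * signed 128-bit divisions and folds the quotients into a checksum so the
 * compiler cannot discard the work. */
TARGET_CLONES uint64_t div128_checksum(uint64_t seed, unsigned iterations)
{
    /* The product stays far below 2^127, so there is no signed overflow. */
    const __int128_t dividend = (__int128_t)seed * 1000000007;
    uint64_t sum = 0;

    for (unsigned i = 0; i < iterations; i++) {
        const __int128_t divisor = (__int128_t)i + 3;   /* starts at 3 */
        assert(divisor != 0);
        /* With GCC on x86-64 this is usually a call to libgcc's __divti3. */
        const __int128_t q = dividend / divisor;
        sum += (uint64_t)q ^ (uint64_t)(q >> 64);
    }
    return sum;
}
```

Timing a large number of calls to such a function with and without the attribute, under both compilers, would be one way to check whether the avx clone is what matters; the attached reproducer remains the authoritative test case.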
References: [1]
https://github.com/ColinIanKing/stress-ng/blob/master/stress-cpu.c
* [Bug target/115024] 128 bit division performance regression, x86, between gcc-14 and gcc-13 using target clones on skylake platform
From: colin.king at intel dot com @ 2024-05-10 7:16 UTC
To: gcc-bugs
--- Comment #1 from Colin Ian King <colin.king at intel dot com> ---
Created attachment 58159
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58159&action=edit
gcc-13 disassembly
* [Bug target/115024] 128 bit division performance regression, x86, between gcc-14 and gcc-13 using target clones on skylake platform
From: colin.king at intel dot com @ 2024-05-10 7:17 UTC
To: gcc-bugs
--- Comment #2 from Colin Ian King <colin.king at intel dot com> ---
Created attachment 58160
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58160&action=edit
gcc-14 disassembly
* [Bug target/115024] 128 bit division performance regression, x86, between gcc-14 and gcc-13 using target clones on skylake platform
From: colin.king at intel dot com @ 2024-05-10 7:45 UTC
To: gcc-bugs
--- Comment #3 from Colin Ian King <colin.king at intel dot com> ---
Created attachment 58161
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58161&action=edit
perf output for gcc-13 compiled code
* [Bug target/115024] 128 bit division performance regression, x86, between gcc-14 and gcc-13 using target clones on skylake platform
From: colin.king at intel dot com @ 2024-05-10 7:46 UTC
To: gcc-bugs
--- Comment #4 from Colin Ian King <colin.king at intel dot com> ---
Created attachment 58162
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58162&action=edit
perf output for gcc-14 compiled code
* [Bug target/115024] [14/15 regression] 128 bit division performance regression, x86, between gcc-14 and gcc-13 using target clones on skylake platform
From: sjames at gcc dot gnu.org @ 2024-05-16 1:49 UTC
To: gcc-bugs
Sam James <sjames at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |14.2
* [Bug target/115024] [14/15 regression] 128 bit division performance regression, x86, between gcc-14 and gcc-13 using target clones on skylake platform
From: haochen.jiang at intel dot com @ 2024-05-20 8:38 UTC
To: gcc-bugs
Haochen Jiang <haochen.jiang at intel dot com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |haochen.jiang at intel dot com
--- Comment #5 from Haochen Jiang <haochen.jiang at intel dot com> ---
In my testing, trunk shows a regression of less than 1%, if I calculated correctly:
[haochenj@shgcc101 ~]$ ./13.exe
1240.97 div128 ops per sec
[haochenj@shgcc101 ~]$ ./13.exe
1235.78 div128 ops per sec
[haochenj@shgcc101 ~]$ ./13.exe
1236.95 div128 ops per sec
[haochenj@shgcc101 ~]$ ./trunk.exe
1228.43 div128 ops per sec
[haochenj@shgcc101 ~]$ ./trunk.exe
1227.11 div128 ops per sec
[haochenj@shgcc101 ~]$ ./trunk.exe
1225.42 div128 ops per sec
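As a hedged aside, the percentage can be checked by averaging each set of runs and comparing the means. The helper names mean_ops and regression_percent below are hypothetical and are not part of the reproducer or of any tool mentioned in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Mean of n throughput samples (ops per second). */
double mean_ops(const double *samples, size_t n)
{
    assert(n > 0);
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += samples[i];
    return total / (double)n;
}

/* Relative slowdown of `candidate` versus `baseline`, in percent.
 * Positive values mean the candidate is slower. */
double regression_percent(double baseline, double candidate)
{
    assert(baseline > 0.0);
    return 100.0 * (baseline - candidate) / baseline;
}
```

Fed the three 13.exe and three trunk.exe figures above, this works out to roughly 0.9%. The single-run figures from the original report (1650.83 versus 1567.48 ops per sec) give roughly 5%.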
* [Bug target/115024] [14/15 regression] 128 bit division performance regression, x86, between gcc-14 and gcc-13 using target clones on skylake platform
From: haochen.jiang at intel dot com @ 2024-06-05 6:25 UTC
To: gcc-bugs
--- Comment #6 from Haochen Jiang <haochen.jiang at intel dot com> ---
I now have a machine that reproduces the regression.
From my data it looks like a DSB (decoded stream buffer) miss, but I don't yet
know why. More investigation is needed.