public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/113236] New: WebP benchmark is 20% slower vs. Clang on AMD Zen 4
From: aros at gmx dot com @ 2024-01-04 16:59 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113236
Bug ID: 113236
Summary: WebP benchmark is 20% slower vs. Clang on AMD Zen 4
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: aros at gmx dot com
Target Milestone: ---
According to the Phoronix Test Suite, WebP 1.2.4 is 20% slower when built with
GCC 13.2 or a GCC git snapshot than when built with Clang:
https://www.phoronix.com/review/gcc-clang-eoy2023/4
* [Bug target/113236] WebP benchmark is 20% slower vs. Clang on AMD Zen 4
From: aros at gmx dot com @ 2024-01-04 17:06 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113236
--- Comment #1 from Artem S. Tashkinov <aros at gmx dot com> ---
That is the WebP image encode test at Quality 100, highest compression.
It also applies to Intel Meteor Lake (MTL):
https://www.phoronix.com/review/intel-meteorlake-gcc-clang/3
* [Bug target/113236] WebP benchmark is 20% slower vs. Clang on AMD Zen 4
From: hubicka at gcc dot gnu.org @ 2024-01-05 21:29 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113236
Jan Hubicka <hubicka at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2024-01-05
                 CC|                            |hubicka at gcc dot gnu.org
             Status|UNCONFIRMED                 |NEW
--- Comment #2 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
On Zen 3 I get 0.75 MP/s with GCC and 0.80 MP/s with Clang, so the gap is only
6.6%, but it seems reproducible.
The profiles look comparable:
gcc:
30.96%  cwebp  libwebp.so.7.1.5  [.] GetCombinedEntropyUnre
26.19%  cwebp  libwebp.so.7.1.5  [.] VP8LHashChainFill
 3.34%  cwebp  libwebp.so.7.1.5  [.] CalculateBestCacheSize
 3.30%  cwebp  libwebp.so.7.1.5  [.] CombinedShannonEntropy
 3.21%  cwebp  libwebp.so.7.1.5  [.] CollectColorBlueTransf
clang:
34.06%  cwebp  libwebp.so.7.1.5  [.] GetCombinedEntropy
28.95%  cwebp  libwebp.so.7.1.5  [.] VP8LHashChainFill
 5.37%  cwebp  libwebp.so.7.1.5  [.] VP8LGetBackwardReferences
 4.39%  cwebp  libwebp.so.7.1.5  [.] CombinedShannonEntropy_SS
 4.28%  cwebp  libwebp.so.7.1.5  [.] CollectColorBlueTransform
In the first loop, Clang seems to if-convert while GCC does not:
0.59 │ lea kSLog2Table,%rdi
3.69 │ vmovss (%rdi,%rax,4),%xmm0
0.98 │ 6f: vcvtsi2ss %edx,%xmm2,%xmm1
0.63 │ vfnmadd213ss 0x0(%r13),%xmm0,%xmm1
38.16 │ vmovss %xmm1,0x0(%r13)
5.48 │ cmp %r12d,0xc(%r13)
0.06 │ ↓ jae 89
│ mov %r12d,0xc(%r13)
0.99 │ 89: mov 0x4(%r13),%edi
0.96 │ 8d: xor %eax,%eax
0.40 │ test %r12d,%r12d
0.60 │ setne %al
│ vcvtsd2ss %xmm0,%xmm0,%xmm1
0.02 │362: mov %r15d,%eax
0.57 │ imul %r12d,%eax
0.00 │ cmp %r12d,%r9d
0.03 │ cmovbe %r12d,%r9d
0.02 │ vmovd %eax,%xmm0
0.08 │ vpinsrd $0x1,%r15d,%xmm0,%xmm0
1.50 │ vpaddd %xmm0,%xmm4,%xmm4
1.08 │ vcvtsi2ss %r15d,%xmm5,%xmm0
0.87 │ vfnmadd231ss %xmm0,%xmm1,%xmm3
5.40 │ vmovaps %xmm3,%xmm0
0.02 │38c: xor %eax,%eax
0.16 │ cmp $0x4,%r15d
* [Bug target/113236] WebP benchmark is 20% slower vs. Clang on AMD Zen 4
From: hubicka at gcc dot gnu.org @ 2024-04-24 15:41 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113236
--- Comment #3 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
It seems this performance difference is still present on Zen 4:
https://www.phoronix.com/review/gcc14-clang18-amd-zen4/3