public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/113236] New: WebP benchmark is 20% slower vs. Clang on AMD Zen 4
From: aros at gmx dot com @ 2024-01-04 16:59 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113236

            Bug ID: 113236
           Summary: WebP benchmark is 20% slower vs. Clang on AMD Zen 4
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: aros at gmx dot com
  Target Milestone: ---

According to Phoronix Test Suite WebP 1.2.4 is 20% slower when built with GCC
13.2/GCC git snapshot vs Clang:

https://www.phoronix.com/review/gcc-clang-eoy2023/4

* [Bug target/113236] WebP benchmark is 20% slower vs. Clang on AMD Zen 4
From: aros at gmx dot com @ 2024-01-04 17:06 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113236

--- Comment #1 from Artem S. Tashkinov <aros at gmx dot com> ---
That's WebP image encode, Quality 100, highest compression.

Also applies to MTL:

https://www.phoronix.com/review/intel-meteorlake-gcc-clang/3

* [Bug target/113236] WebP benchmark is 20% slower vs. Clang on AMD Zen 4
From: hubicka at gcc dot gnu.org @ 2024-01-05 21:29 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113236

Jan Hubicka <hubicka at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2024-01-05
                 CC|                            |hubicka at gcc dot gnu.org
             Status|UNCONFIRMED                 |NEW

--- Comment #2 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
On zen3 I get 0.75MP/s for GCC and 0.80MP/s for clang, so only 6.6%, but seems
reproducible.

Profile looks comparable:

gcc:
  30.96%  cwebp  libwebp.so.7.1.5  [.] GetCombinedEntropyUnre
  26.19%  cwebp  libwebp.so.7.1.5  [.] VP8LHashChainFill
   3.34%  cwebp  libwebp.so.7.1.5  [.] CalculateBestCacheSize
   3.30%  cwebp  libwebp.so.7.1.5  [.] CombinedShannonEntropy
   3.21%  cwebp  libwebp.so.7.1.5  [.] CollectColorBlueTransf

clang:
  34.06%  cwebp  libwebp.so.7.1.5  [.] GetCombinedEntropy
  28.95%  cwebp  libwebp.so.7.1.5  [.] VP8LHashChainFill
   5.37%  cwebp  libwebp.so.7.1.5  [.] VP8LGetBackwardReferences
   4.39%  cwebp  libwebp.so.7.1.5  [.] CombinedShannonEntropy_SS
   4.28%  cwebp  libwebp.so.7.1.5  [.] CollectColorBlueTransform

In the first loop clang seems to ifconvert while GCC doesn't:

  0.59 │      lea          kSLog2Table,%rdi
  3.69 │      vmovss       (%rdi,%rax,4),%xmm0
  0.98 │ 6f:  vcvtsi2ss    %edx,%xmm2,%xmm1
  0.63 │      vfnmadd213ss 0x0(%r13),%xmm0,%xmm1
 38.16 │      vmovss       %xmm1,0x0(%r13)
  5.48 │      cmp          %r12d,0xc(%r13)
  0.06 │    ↓ jae          89
       │      mov          %r12d,0xc(%r13)
  0.99 │ 89:  mov          0x4(%r13),%edi
  0.96 │ 8d:  xor          %eax,%eax
  0.40 │      test         %r12d,%r12d
  0.60 │      setne        %al
       │      vcvtsd2ss    %xmm0,%xmm0,%xmm1
  0.02 │362:  mov          %r15d,%eax
  0.57 │      imul         %r12d,%eax
  0.00 │      cmp          %r12d,%r9d
  0.03 │      cmovbe       %r12d,%r9d
  0.02 │      vmovd        %eax,%xmm0
  0.08 │      vpinsrd      $0x1,%r15d,%xmm0,%xmm0
  1.50 │      vpaddd       %xmm0,%xmm4,%xmm4
  1.08 │      vcvtsi2ss    %r15d,%xmm5,%xmm0
  0.87 │      vfnmadd231ss %xmm0,%xmm1,%xmm3
  5.40 │      vmovaps      %xmm3,%xmm0
  0.02 │38c:  xor          %eax,%eax
  0.16 │      cmp          $0x4,%r15d
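
For readers unfamiliar with the term, the two shapes contrasted in the listing above (a `cmp`/`jae`/`mov` sequence versus a `cmp`/`cmovbe` pair) can be sketched in C roughly as follows. This is only an illustration of if-conversion on a running-maximum update, not the libwebp source: the struct `entropy_acc`, its fields, and both function names are hypothetical, chosen to loosely mirror the float accumulate and conditional store visible in the assembly. Whether a given compiler emits a branch or a conditional move for either form depends on its version, flags, and cost model.

```c
#include <stdint.h>

/* Hypothetical accumulator, loosely modeled on the loads and stores
 * in the listing; not the actual libwebp data structure. */
typedef struct {
  float entropy;
  uint32_t max_val;
} entropy_acc;

/* Branchy form: the store to max_val happens only when the test
 * succeeds, which maps naturally onto cmp + conditional jump + mov. */
static void update_branchy(entropy_acc *acc, uint32_t val,
                           float slog2, int streak) {
  acc->entropy -= slog2 * (float)streak;
  if (acc->max_val < val) {
    acc->max_val = val;
  }
}

/* If-converted form: the maximum is selected as a value and stored
 * unconditionally, which a compiler can lower to cmp + cmov. */
static void update_select(entropy_acc *acc, uint32_t val,
                          float slog2, int streak) {
  const uint32_t cur = acc->max_val;
  acc->entropy -= slog2 * (float)streak;
  acc->max_val = (cur < val) ? val : cur;
}
```

Both functions compute the same result; the difference is only in whether the update is expressed as control flow or as a data selection, which is the transformation an if-conversion pass performs automatically when it judges it profitable.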

* [Bug target/113236] WebP benchmark is 20% slower vs. Clang on AMD Zen 4
From: hubicka at gcc dot gnu.org @ 2024-04-24 15:41 UTC
To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113236

--- Comment #3 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
Seems this performance difference is still there on zen4:

https://www.phoronix.com/review/gcc14-clang18-amd-zen4/3