* [Bug rtl-optimization/113236] New: WebP benchmark is 20% slower vs. Clang on AMD Zen 4
@ 2024-01-04 16:59 aros at gmx dot com
  2024-01-04 17:06 ` [Bug target/113236] " aros at gmx dot com
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: aros at gmx dot com @ 2024-01-04 16:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113236

            Bug ID: 113236
           Summary: WebP benchmark is 20% slower vs. Clang on AMD Zen 4
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: aros at gmx dot com
  Target Milestone: ---

According to the Phoronix Test Suite, WebP 1.2.4 is 20% slower when built with
GCC 13.2 or a GCC git snapshot than when built with Clang:

https://www.phoronix.com/review/gcc-clang-eoy2023/4


* [Bug target/113236] WebP benchmark is 20% slower vs. Clang on AMD Zen 4
From: aros at gmx dot com @ 2024-01-04 17:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113236

--- Comment #1 from Artem S. Tashkinov <aros at gmx dot com> ---
That's WebP image encode, Quality 100, highest compression.

Also applies to Intel Meteor Lake (MTL):
https://www.phoronix.com/review/intel-meteorlake-gcc-clang/3


* [Bug target/113236] WebP benchmark is 20% slower vs. Clang on AMD Zen 4
From: hubicka at gcc dot gnu.org @ 2024-01-05 21:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113236

Jan Hubicka <hubicka at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2024-01-05
                 CC|                            |hubicka at gcc dot gnu.org
             Status|UNCONFIRMED                 |NEW

--- Comment #2 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
On Zen 3 I get 0.75 MP/s for GCC and 0.80 MP/s for clang, so the gap is only
about 6.6%, but it seems reproducible.

Profile looks comparable:

gcc:

  30.96%  cwebp            libwebp.so.7.1.5            [.] GetCombinedEntropyUnre
  26.19%  cwebp            libwebp.so.7.1.5            [.] VP8LHashChainFill
   3.34%  cwebp            libwebp.so.7.1.5            [.] CalculateBestCacheSize
   3.30%  cwebp            libwebp.so.7.1.5            [.] CombinedShannonEntropy
   3.21%  cwebp            libwebp.so.7.1.5            [.] CollectColorBlueTransf

clang:

  34.06%  cwebp            libwebp.so.7.1.5            [.] GetCombinedEntropy
  28.95%  cwebp            libwebp.so.7.1.5            [.] VP8LHashChainFill
   5.37%  cwebp            libwebp.so.7.1.5            [.] VP8LGetBackwardReferences
   4.39%  cwebp            libwebp.so.7.1.5            [.] CombinedShannonEntropy_SS
   4.28%  cwebp            libwebp.so.7.1.5            [.] CollectColorBlueTransform


In the first loop clang seems to if-convert while GCC doesn't. The first
listing below is GCC (a jae branch around the store), the second is clang
(cmovbe):
  0.59 │       lea          kSLog2Table,%rdi
  3.69 │       vmovss       (%rdi,%rax,4),%xmm0
  0.98 │ 6f:   vcvtsi2ss    %edx,%xmm2,%xmm1
  0.63 │       vfnmadd213ss 0x0(%r13),%xmm0,%xmm1
 38.16 │       vmovss       %xmm1,0x0(%r13)
  5.48 │       cmp          %r12d,0xc(%r13)
  0.06 │     ↓ jae          89             
       │       mov          %r12d,0xc(%r13)
  0.99 │ 89:   mov          0x4(%r13),%edi 
  0.96 │ 8d:   xor          %eax,%eax      
  0.40 │       test         %r12d,%r12d    
  0.60 │       setne        %al                                                 



       │       vcvtsd2ss    %xmm0,%xmm0,%xmm1                                   
  0.02 │362:   mov          %r15d,%eax                                          
  0.57 │       imul         %r12d,%eax                                          
  0.00 │       cmp          %r12d,%r9d                                          
  0.03 │       cmovbe       %r12d,%r9d                                          
  0.02 │       vmovd        %eax,%xmm0                                          
  0.08 │       vpinsrd      $0x1,%r15d,%xmm0,%xmm0                              
  1.50 │       vpaddd       %xmm0,%xmm4,%xmm4                                   
  1.08 │       vcvtsi2ss    %r15d,%xmm5,%xmm0                                   
  0.87 │       vfnmadd231ss %xmm0,%xmm1,%xmm3                                   
  5.40 │       vmovaps      %xmm3,%xmm0                                         
  0.02 │38c:   xor          %eax,%eax                                           
  0.16 │       cmp          $0x4,%r15d
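
For readers unfamiliar with if-conversion, the following is a minimal C sketch of the kind of source pattern that could produce the two shapes above. It is not taken from libwebp: the Stats struct, its field layout, and all function names are hypothetical, chosen only so that max_val sits at offset 0xc as in the listings. The first update function guards the store with a branch (the cmp/jae/mov shape); the second always stores a selected value, which a compiler may lower to a conditional move.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the record updated in the hot loop.
 * Field order is an assumption chosen so that max_val is at offset 12. */
typedef struct {
  float entropy;
  uint32_t sum;
  int nonzeros;
  uint32_t max_val;
} Stats;

/* Branchy form: the store to max_val only happens when v is larger. */
static void update_max_branch(Stats* s, uint32_t v) {
  if (s->max_val < v) {
    s->max_val = v;
  }
}

/* If-converted form: unconditionally store the selected value.
 * Whether this becomes a cmov is up to the compiler and target. */
static void update_max_select(Stats* s, uint32_t v) {
  const uint32_t cur = s->max_val;
  s->max_val = (cur < v) ? v : cur;
}

/* Run one of the update functions over an array and return the maximum. */
static uint32_t max_over(const uint32_t* vals, int n,
                         void (*update)(Stats*, uint32_t)) {
  Stats s = {0.0f, 0u, 0, 0u};
  for (int i = 0; i < n; ++i) {
    update(&s, vals[i]);
  }
  return s.max_val;
}
```

Both forms compute the same result. The difference is only in whether the store is conditional, which is what decides whether the hot loop carries a data-dependent branch.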


* [Bug target/113236] WebP benchmark is 20% slower vs. Clang on AMD Zen 4
From: hubicka at gcc dot gnu.org @ 2024-04-24 15:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113236

--- Comment #3 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
It seems this performance difference is still there on Zen 4:
https://www.phoronix.com/review/gcc14-clang18-amd-zen4/3
