[Bug target/113235] SMHasher SHA3-256 benchmark is almost 40% slower vs. Clang

public inbox for gcc-bugs@sourceware.org

From: "hubicka at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/113235] SMHasher SHA3-256 benchmark is almost 40% slower vs. Clang
Date: Fri, 05 Jan 2024 19:54:32 +0000
Message-ID: <bug-113235-4-e8104KIlWu@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-113235-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113235

Jan Hubicka <hubicka at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hubicka at gcc dot gnu.org

--- Comment #4 from Jan Hubicka <hubicka at gcc dot gnu.org> ---
I keep mentioning to Larabel that he should use -fno-semantic-interposition,
but he doesn't.

The profile is very simple:

 96.75%  SMHasher                                        [.] keccakf.lto_priv.0

All of the time goes to one simple loop. On Zen 3, GCC 13 with -march=native -Ofast -flto gives:

  3.85 │330:   mov    %r8,%rdi                                                  
  7.68 │       movslq (%rsi,%r9,1),%rcx                                         
  3.85 │       lea    (%rax,%rcx,8),%r10                                        
  3.86 │       mov    (%rdx,%r9,1),%ecx                                         
  3.83 │       add    $0x4,%r9                                                  
  3.86 │       mov    (%r10),%r8                                                
  7.37 │       rol    %cl,%rdi                                                  
  7.37 │       mov    %rdi,(%r10)                                               
  4.76 │       cmp    $0x60,%r9                                                 
  0.00 │     ↑ jne    330                                                       
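
For orientation, the GCC loop above looks like the rho/pi step of Keccak-f[1600]: a 4-byte index load (movslq), a 4-byte rotation-count load, a variable-count rol, and a trip count of 0x60 / 4 = 24. The following is a minimal sketch of that step, assuming the widely used compact Keccak layout; the function name keccak_rho_pi and the table names are illustrative and not confirmed against the SMHasher source.

```c
#include <stdint.h>

/* Rotation counts and lane permutation for the rho/pi step.
   Assumption: names and layout follow the common compact Keccak-f[1600]
   implementations; SMHasher's actual identifiers may differ. */
static const int keccakf_rotc[24] = {
    1,  3,  6,  10, 15, 21, 28, 36, 45, 55, 2,  14,
    27, 41, 56, 8,  25, 43, 62, 18, 39, 61, 20, 44
};
static const int keccakf_piln[24] = {
    10, 7,  11, 17, 18, 3, 5,  16, 8,  21, 24, 4,
    15, 23, 19, 13, 12, 2, 20, 14, 22, 9,  6,  1
};

/* 64-bit rotate left; the masks keep both shifts in range for any n. */
static inline uint64_t rotl64(uint64_t x, int n)
{
    return (x << (n & 63)) | (x >> ((64 - n) & 63));
}

/* Rolled form: each iteration loads an index and a rotation count from
   the tables, and depends on the lane read in the previous iteration. */
void keccak_rho_pi(uint64_t st[25])
{
    uint64_t t = st[1];
    for (int i = 0; i < 24; i++) {
        int j = keccakf_piln[i];          /* table-driven lane index   */
        uint64_t next = st[j];            /* save lane before overwrite */
        st[j] = rotl64(t, keccakf_rotc[i]);
        t = next;
    }
}
```

With a constant trip count of 24 and constant tables, fully unrolling this loop lets a compiler turn every index and rotation count into an immediate, which is consistent with the straight-line code Clang emits.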


Clang seems to unroll it:

  0.25 │ d0:   mov  -0x48(%rsp),%rdx
  0.25 │       xor  %r12,%rcx
  0.25 │       mov  %r13,%r12
  0.25 │       mov  %r13,0x10(%rsp)
  0.25 │       mov  %rax,%r13
  0.26 │       xor  %r15,%r13
  0.23 │       mov  %r11,-0x70(%rsp)
  0.25 │       mov  %r8,0x8(%rsp)
  0.25 │       mov  %r15,-0x40(%rsp)
  0.25 │       mov  %r10,%r15
  0.26 │       mov  %r10,(%rsp)
  0.26 │       mov  %r14,%r10
  0.25 │       xor  %r12,%r10
  0.26 │       xor  %rsi,%r15
  0.24 │       mov  %rbp,-0x80(%rsp)
  0.25 │       xor  %rcx,%r15
  0.26 │       mov  -0x60(%rsp),%rcx
  0.25 │       xor  -0x68(%rsp),%r15
  0.26 │       xor  %rbp,%rdx
  0.25 │       mov  -0x30(%rsp),%rbp
  0.25 │       xor  %rdx,%r13
  0.24 │       mov  -0x10(%rsp),%rdx
  0.25 │       mov  %rcx,%r12
  0.24 │       xor  %rcx,%r13
  0.25 │       mov  $0x1,%ecx
  0.25 │       xor  %r11,%rdx
  0.24 │       mov  %r8,%r11
  0.25 │       mov  -0x28(%rsp),%r8
  0.26 │       xor  -0x58(%rsp),%r8
  0.24 │       xor  %rdx,%r8
  0.26 │       mov  -0x8(%rsp),%rdx
  0.25 │       xor  %rbp,%r8
  0.26 │       xor  %r11,%rdx
  0.25 │       mov  -0x20(%rsp),%r11
  0.25 │       xor  %rdx,%r10
....


Thread overview: 14+ messages
2024-01-04 16:53 [Bug rtl-optimization/113235] New: SMHasher SHA3-256 benchmark is almost 40% slower vs. Clang on AMD Zen 4 aros at gmx dot com
2024-01-04 17:05 ` [Bug target/113235] " aros at gmx dot com
2024-01-04 17:09 ` xry111 at gcc dot gnu.org
2024-01-04 17:27 ` [Bug target/113235] SMHasher SHA3-256 benchmark is almost 40% slower vs. Clang xry111 at gcc dot gnu.org
2024-01-05 19:54 ` hubicka at gcc dot gnu.org [this message]
2024-01-05 20:26 ` [Bug target/113235] SMHasher SHA3-256 benchmark is almost 40% slower vs. Clang (not enough complete loop peeling) hubicka at gcc dot gnu.org
2024-01-05 21:03 ` hubicka at gcc dot gnu.org
2024-01-08 14:54 ` rguenth at gcc dot gnu.org
2024-01-08 14:55 ` rguenth at gcc dot gnu.org
2024-04-24 16:02 ` hubicka at gcc dot gnu.org
2024-04-24 16:41 ` dmalcolm at gcc dot gnu.org
2024-04-24 16:44 ` pinskia at gcc dot gnu.org
2024-04-24 16:47 ` pinskia at gcc dot gnu.org
2024-04-24 16:51 ` xry111 at gcc dot gnu.org
