[Bug tree-optimization/98856] [11 Regression] botan AES-128/XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/98856] [11 Regression] botan AES-128/XTS is slower by ~17% since r11-6649-g285fa338b06b804e72997c4d876ecf08a9c083af
Date: Fri, 05 Mar 2021 07:44:24 +0000	[thread overview]
Message-ID: <bug-98856-4-qoLZGifejo@http.gcc.gnu.org/bugzilla/> (raw)
In-Reply-To: <bug-98856-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98856

--- Comment #22 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Uroš Bizjak from comment #21)
> (In reply to Uroš Bizjak from comment #20)
> > (In reply to Richard Biener from comment #18)
> > > Even on Skylake it's 2 (movq) + 3 (vpinsr), so there it's 6 vs. 3.  Not
> > > sure if we should somehow do this late somehow (peephole or splitter) since
> > > it requires one more %xmm register.
> > What happens if you disparage [v]pinsrd alternatives in vec_concatv2di?
> 
> Please try this:
> 
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index db5be59f5b7..edf7b1a3074 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -16043,7 +16043,12 @@
>               (const_string "maybe_evex")
>            ]
>            (const_string "orig")))
> -   (set_attr "mode" "TI,TI,TI,TI,TI,TI,V4SF,V2SF,V2SF")])
> +   (set_attr "mode" "TI,TI,TI,TI,TI,TI,V4SF,V2SF,V2SF")
> +   (set (attr "preferred_for_speed")
> +     (cond [(eq_attr "alternative" "0,1,2,3")
> +             (symbol_ref "false")
> +          ]
> +          (symbol_ref "true")))])
>  
>  (define_insn "*vec_concatv2di_0"

That works to avoid the vpinsrq.  I guess the case of a mem operand
behaves similar to a gpr (plus the load uop), at least I don't have any
contrary evidence (but I didn't do any microbenchmarks either).

I'm not sure IRA/LRA will optimally handle the situation with register
pressure causing spilling in case it needs to reload both gpr operands.
At least for

typedef long v2di __attribute__((vector_size(16)));

v2di foo (long a, long b)
{
  return (v2di){a, b};
}

with -msse4.1 -O3 -ffixed-xmm1 -ffixed-xmm2 -ffixed-xmm3 -ffixed-xmm4
-ffixed-xmm5 -ffixed-xmm6 -ffixed-xmm7 -ffixed-xmm8 -ffixed-xmm9 -ffixed-xmm10
-ffixed-xmm11 -ffixed-xmm12 -ffixed-xmm13 -ffixed-xmm14 -ffixed-xmm15 I get
with the
patch

foo:
.LFB0:
        .cfi_startproc
        movq    %rsi, -16(%rsp)
        movq    %rdi, %xmm0
        pinsrq  $1, -16(%rsp), %xmm0
        ret

while without it's

        movq    %rdi, %xmm0
        pinsrq  $1, %rsi, %xmm0

as far as I understand LRA dumps the new attribute is a hard one, even
applying when other alternatives are worse.  In this case we choose
alt 7.  Covering also alts 7 and 8 with the optimize-for-speed attribute
causes reload fails - which is expected if there's no way for LRA to
choose alt 1.  The following seems to work for the small testcase above
but not for the important case in the benchmark (meh).

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index db5be59f5b7..e393a0d823b 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -15992,7 +15992,7 @@
          (match_operand:DI 1 "register_operand"
          "  0, 0,x ,Yv,0,Yv,0,0,v")
          (match_operand:DI 2 "nonimmediate_operand"
-         " rm,rm,rm,rm,x,Yv,x,m,m")))]
+         " !rm,!rm,!rm,!rm,x,Yv,x,!m,!m")))]
   "TARGET_SSE"
   "@
    pinsrq\t{$1, %2, %0|%0, %2, 1}

I guess the idea of this insn setup was exactly to get IRA/LRA choose
the optimal instruction sequence - otherwise exposing the reload so
late is probably suboptimal.

next prev parent reply	other threads:[~2021-03-05  7:44 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-27 14:28 [Bug tree-optimization/98856] New: " marxin at gcc dot gnu.org
2021-01-27 14:29 ` [Bug tree-optimization/98856] " marxin at gcc dot gnu.org
2021-01-27 14:44 ` rguenth at gcc dot gnu.org
2021-01-28  7:47 ` rguenth at gcc dot gnu.org
2021-01-28  8:44 ` marxin at gcc dot gnu.org
2021-01-28  9:40 ` rguenth at gcc dot gnu.org
2021-01-28 11:03 ` rguenth at gcc dot gnu.org
2021-01-28 11:19 ` rguenth at gcc dot gnu.org
2021-01-28 11:57 ` rguenth at gcc dot gnu.org
2021-02-05 10:18 ` rguenth at gcc dot gnu.org
2021-02-05 11:52 ` jakub at gcc dot gnu.org
2021-02-05 12:52 ` rguenth at gcc dot gnu.org
2021-02-05 13:43 ` jakub at gcc dot gnu.org
2021-02-05 14:36 ` jakub at gcc dot gnu.org
2021-02-05 16:29 ` jakub at gcc dot gnu.org
2021-02-05 17:55 ` jakub at gcc dot gnu.org
2021-02-05 19:48 ` jakub at gcc dot gnu.org
2021-02-08 15:14 ` jakub at gcc dot gnu.org
2021-03-04 12:14 ` rguenth at gcc dot gnu.org
2021-03-04 15:36 ` rguenth at gcc dot gnu.org
2021-03-04 16:12 ` rguenth at gcc dot gnu.org
2021-03-04 17:56 ` ubizjak at gmail dot com
2021-03-04 18:12 ` ubizjak at gmail dot com
2021-03-05  7:44 ` rguenth at gcc dot gnu.org [this message]
2021-03-05  7:46 ` rguenth at gcc dot gnu.org
2021-03-05  8:29 ` ubizjak at gmail dot com
2021-03-05 10:04 ` rguenther at suse dot de
2021-03-05 10:43 ` rguenth at gcc dot gnu.org
2021-03-05 11:56 ` ubizjak at gmail dot com
2021-03-05 12:25 ` ubizjak at gmail dot com
2021-03-05 12:27 ` rguenth at gcc dot gnu.org
2021-03-05 12:49 ` jakub at gcc dot gnu.org
2021-03-05 12:52 ` ubizjak at gmail dot com
2021-03-05 12:55 ` rguenther at suse dot de
2021-03-05 13:06 ` rguenth at gcc dot gnu.org
2021-03-05 13:08 ` ubizjak at gmail dot com
2021-03-05 14:35 ` rguenth at gcc dot gnu.org
2021-03-08 10:41 ` rguenth at gcc dot gnu.org
2021-03-08 13:20 ` rguenth at gcc dot gnu.org
2021-03-08 15:46 ` amonakov at gcc dot gnu.org
2021-04-27 11:40 ` [Bug tree-optimization/98856] [11/12 " jakub at gcc dot gnu.org
2021-05-13 10:17 ` cvs-commit at gcc dot gnu.org
2021-07-28  7:05 ` rguenth at gcc dot gnu.org
2022-01-21 13:20 ` rguenth at gcc dot gnu.org
2022-04-21  7:48 ` rguenth at gcc dot gnu.org
2023-04-17 21:43 ` [Bug tree-optimization/98856] [11/12/13/14 " lukebenes at hotmail dot com
2023-04-18  9:07 ` rguenth at gcc dot gnu.org
2023-05-29 10:04 ` jakub at gcc dot gnu.org

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bug-98856-4-qoLZGifejo@http.gcc.gnu.org/bugzilla/ \
    --to=gcc-bugzilla@gcc.gnu.org \
    --cc=gcc-bugs@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).