Re: [x86_64 PATCH] More TImode parameter passing improvements.

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Uros Bizjak <ubizjak@gmail.com>
To: Roger Sayle <roger@nextmovesoftware.com>
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [x86_64 PATCH] More TImode parameter passing improvements.
Date: Thu, 20 Jul 2023 09:49:17 +0200	[thread overview]
Message-ID: <CAFULd4ZiRNRHeCHua0hYdrcDHjR0iwXFfzKc3AMy8xFMzm4OVQ@mail.gmail.com> (raw)
In-Reply-To: <002501d9badd$f5675930$e0360b90$@nextmovesoftware.com>

On Thu, Jul 20, 2023 at 9:44 AM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> Hi Uros,
>
> > From: Uros Bizjak <ubizjak@gmail.com>
> > Sent: 20 July 2023 07:50
> >
> > On Wed, Jul 19, 2023 at 10:07 PM Roger Sayle <roger@nextmovesoftware.com>
> > wrote:
> > >
> > > This patch is the next piece of a solution to the x86_64 ABI issues in
> > > PR 88873.  This splits the *concat<mode><dwi>3_3 define_insn_and_split
> > > into two patterns, a TARGET_64BIT *concatditi3_3 and a !TARGET_64BIT
> > > *concatsidi3_3.  This allows us to add an additional alternative to
> > > the the 64-bit version, enabling the register allocator to perform
> > > this operation using SSE registers, which is implemented/split after
> > > reload using vec_concatv2di.
> > >
> > > To demonstrate the improvement, the test case from PR88873:
> > >
> > > typedef struct { double x, y; } s_t;
> > >
> > > s_t foo (s_t a, s_t b, s_t c)
> > > {
> > >   return (s_t){ __builtin_fma(a.x, b.x, c.x), __builtin_fma (a.y, b.y,
> > > c.y) }; }
> > >
> > > when compiled with -O2 -march=cascadelake, currently generates:
> > >
> > > foo:    vmovq   %xmm2, -56(%rsp)
> > >         movq    -56(%rsp), %rax
> > >         vmovq   %xmm3, -48(%rsp)
> > >         vmovq   %xmm4, -40(%rsp)
> > >         movq    -48(%rsp), %rcx
> > >         vmovq   %xmm5, -32(%rsp)
> > >         vmovq   %rax, %xmm6
> > >         movq    -40(%rsp), %rax
> > >         movq    -32(%rsp), %rsi
> > >         vpinsrq $1, %rcx, %xmm6, %xmm6
> > >         vmovq   %xmm0, -24(%rsp)
> > >         vmovq   %rax, %xmm7
> > >         vmovq   %xmm1, -16(%rsp)
> > >         vmovapd %xmm6, %xmm2
> > >         vpinsrq $1, %rsi, %xmm7, %xmm7
> > >         vfmadd132pd     -24(%rsp), %xmm7, %xmm2
> > >         vmovapd %xmm2, -56(%rsp)
> > >         vmovsd  -48(%rsp), %xmm1
> > >         vmovsd  -56(%rsp), %xmm0
> > >         ret
> > >
> > > with this change, we avoid many of the reloads via memory,
> > >
> > > foo:    vpunpcklqdq     %xmm3, %xmm2, %xmm7
> > >         vpunpcklqdq     %xmm1, %xmm0, %xmm6
> > >         vpunpcklqdq     %xmm5, %xmm4, %xmm2
> > >         vmovdqa %xmm7, -24(%rsp)
> > >         vmovdqa %xmm6, %xmm1
> > >         movq    -16(%rsp), %rax
> > >         vpinsrq $1, %rax, %xmm7, %xmm4
> > >         vmovapd %xmm4, %xmm6
> > >         vfmadd132pd     %xmm1, %xmm2, %xmm6
> > >         vmovapd %xmm6, -24(%rsp)
> > >         vmovsd  -16(%rsp), %xmm1
> > >         vmovsd  -24(%rsp), %xmm0
> > >         ret
> > >
> > >
> > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > > and make -k check, both with and without --target_board=unix{-m32}
> > > with no new failures.  Ok for mainline?
> > >
> > >
> > > 2023-07-19  Roger Sayle  <roger@nextmovesoftware.com>
> > >
> > > gcc/ChangeLog
> > >         * config/i386/i386-expand.cc (ix86_expand_move): Don't call
> > >         force_reg, to use SUBREG rather than create a new pseudo when
> > >         inserting DFmode fields into TImode with insvti_{high,low}part.
> > >         (*concat<mode><dwi>3_3): Split into two define_insn_and_split...
> > >         (*concatditi3_3): 64-bit implementation.  Provide alternative
> > >         that allows register allocation to use SSE registers that is
> > >         split into vec_concatv2di after reload.
> > >         (*concatsidi3_3): 32-bit implementation.
> > >
> > > gcc/testsuite/ChangeLog
> > >         * gcc.target/i386/pr88873.c: New test case.
> >
> > diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
> > index f9b0dc6..9c3febe 100644
> > --- a/gcc/config/i386/i386-expand.cc
> > +++ b/gcc/config/i386/i386-expand.cc
> > @@ -558,7 +558,7 @@ ix86_expand_move (machine_mode mode, rtx
> > operands[])
> >    op0 = SUBREG_REG (op0);
> >    tmp = gen_rtx_AND (TImode, copy_rtx (op0), tmp);
> >    if (mode == DFmode)
> > -    op1 = force_reg (DImode, gen_lowpart (DImode, op1));
> > +    op1 = gen_lowpart (DImode, op1);
> >
> > Please note that gen_lowpart will ICE when op1 is a SUBREG. This is the reason
> > that we need to first force a SUBREG to a register and then perform gen_lowpart,
> > and it is necessary to avoid ICE.
>
> The good news is that we know op1 is a register, as this is tested by
> "&& REG_P (op1)" on line 551.  You'll also notice that I'm not removing
> the force_reg from before the call to gen_lowpart, but removing the call
> to force_reg after the call to gen_lowpart.  When I originally wrote this,
> the hope was that placing this SUBREG in its own pseudo would help
> with register allocation/CSE.  Unfortunately, increasing the number of
> pseudos (in this case) increases compile-time (due to quadratic behaviour
> in LRA), as shown by PR rtl-optimization/110587, and keeping the DF->DI
> conversion in a SUBREG inside the insvti_{high,low}part allows the
> register allocator to see the DF->DI->TI sequence in a single pattern,
> and hence choose to keep the TI mode in SSE registers, rather than use
> a pair of reloads, to write the DF value to memory, then read it back as
> a scalar in DImode, and perhaps the same again to go the other way.

This was my only concern with the patch, with that cleared, the patch is OK.

Thanks,
Uros.

     prev parent reply	other threads:[~2023-07-20  7:49 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-19 20:07 Roger Sayle
2023-07-20  6:49 ` Uros Bizjak
2023-07-20  7:44   ` Roger Sayle
2023-07-20  7:49     ` Uros Bizjak [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAFULd4ZiRNRHeCHua0hYdrcDHjR0iwXFfzKc3AMy8xFMzm4OVQ@mail.gmail.com \
    --to=ubizjak@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=roger@nextmovesoftware.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).