public inbox for gcc-bugs@sourceware.org
* [Bug target/107548] New: STV doesn't consider vec_select
@ 2022-11-07 10:25 rguenth at gcc dot gnu.org
  2022-12-23  9:58 ` [Bug target/107548] " cvs-commit at gcc dot gnu.org
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-11-07 10:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107548

            Bug ID: 107548
           Summary: STV doesn't consider vec_select
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

typedef unsigned int v4si __attribute__((vector_size(16)));

unsigned f (v4si a, v4si b)
{
  a[0] += b[0];
  return a[0] + a[1];
}

gets optimized to

f:
.LFB0:
        .cfi_startproc
        vpextrd $1, %xmm0, %edx
        vmovd   %xmm0, %eax
        addl    %edx, %eax
        vmovd   %xmm1, %edx
        addl    %edx, %eax
        ret

with -march=znver2, and other architectures give similar code, whereas it
seems beneficial to shuffle a[1] into a'[0] and perform the adds on the
vector side, eliding two xmm->gpr moves.  STV2 sees

   19: r94:V4SI=xmm0:V4SI
      REG_DEAD xmm0:V4SI
    2: r87:V4SI=r94:V4SI
      REG_DEAD r94:V4SI
   20: r95:V4SI=xmm1:V4SI
      REG_DEAD xmm1:V4SI
    3: NOTE_INSN_DELETED
    4: NOTE_INSN_FUNCTION_BEG
    7: r90:SI=vec_select(r87:V4SI,parallel)
    8: r91:SI=vec_select(r87:V4SI,parallel)
      REG_DEAD r87:V4SI
    9: {r92:SI=r90:SI+r91:SI;clobber flags:CC;}
      REG_DEAD r91:SI
      REG_DEAD r90:SI
      REG_UNUSED flags:CC
   10: r93:SI=vec_select(r95:V4SI,parallel)
      REG_DEAD r95:V4SI
   11: {r89:SI=r92:SI+r93:SI;clobber flags:CC;}
      REG_DEAD r93:SI
      REG_DEAD r92:SI
      REG_UNUSED flags:CC
   16: ax:SI=r89:SI
      REG_DEAD r89:SI
   17: use ax:SI

but it lacks vec_select support:

Created a new instruction chain #1
Building chain #1...
  Adding insn 9 to chain #1
  Adding insn 11 into chain's #1 queue
  r90 def in insn 7 isn't convertible
  Mark r90 def in insn 7 as requiring both modes in chain #1
  r91 def in insn 8 isn't convertible
  Mark r91 def in insn 8 as requiring both modes in chain #1
  Adding insn 11 to chain #1
  r89 use in insn 16 isn't convertible
  Mark r89 def in insn 11 as requiring both modes in chain #1
  r93 def in insn 10 isn't convertible
  Mark r93 def in insn 10 as requiring both modes in chain #1
Collected chain #1...
  insns: 9, 11
  defs to convert: r89, r90, r91, r93
Computing gain for chain #1...
  Instruction conversion gain: 0
  Registers conversion cost: 24
  Total gain: -24
Chain #1 conversion is not profitable
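
For illustration, the vector-side evaluation suggested in this report (shuffle
a[1] into lane 0, then add in the vector domain) can be written out at the
source level with GCC's generic vector extensions. This is only a sketch: the
names f_scalar and f_vector are hypothetical, and the instructions a given
compiler version emits for f_vector are not guaranteed to match any listing in
this thread.

```c
typedef unsigned int v4si __attribute__((vector_size(16)));

/* Scalar form from the report: both adds operate on extracted elements. */
unsigned f_scalar (v4si a, v4si b)
{
  a[0] += b[0];
  return a[0] + a[1];
}

/* Vector-side form (hypothetical rewrite): only lane 0 of the sum is used. */
unsigned f_vector (v4si a, v4si b)
{
  v4si a1 = { a[1], a[1], a[1], a[1] };  /* move a[1] into lane 0 (broadcast) */
  v4si sum = a + a1 + b;                 /* lane 0 holds a[0] + a[1] + b[0] */
  return sum[0];                         /* one vector-to-scalar transfer */
}
```

Both functions return the same value for any inputs because unsigned addition
wraps identically per lane and per scalar; the difference is only in where the
additions are performed.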


* [Bug target/107548] STV doesn't consider vec_select
From: cvs-commit at gcc dot gnu.org @ 2022-12-23  9:58 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107548

--- Comment #1 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:

https://gcc.gnu.org/g:0b2c1369d035e92847cca81fd9f7b4e9ab9da710

commit r13-4873-g0b2c1369d035e92847cca81fd9f7b4e9ab9da710
Author: Roger Sayle <roger@nextmovesoftware.com>
Date:   Fri Dec 23 09:56:30 2022 +0000

    PR target/107548: Handle vec_select in STV on x86.

    This patch enhances x86's STV pass to handle VEC_SELECT during general
    scalar chain conversion, performing SImode scalar extraction from V4SI
    and DImode scalar extraction from V2DI in vector registers.

    The motivating test case from bugzilla is:

    typedef unsigned int v4si __attribute__((vector_size(16)));

    unsigned int f (v4si a, v4si b)
    {
      a[0] += b[0];
      return a[0] + a[1];
    }

    currently with -O2 -march=znver2 this generates:

            vpextrd $1, %xmm0, %edx
            vmovd   %xmm0, %eax
            addl    %edx, %eax
            vmovd   %xmm1, %edx
            addl    %edx, %eax
            ret

    which performs three transfers from the vector unit to the scalar unit,
    and performs the two additions there.  With this patch, we now generate:

            vmovdqa %xmm0, %xmm2
            vpshufd $85, %xmm0, %xmm0
            vpaddd  %xmm0, %xmm2, %xmm0
            vpaddd  %xmm1, %xmm0, %xmm0
            vmovd   %xmm0, %eax
            ret

    which performs the two additions in the vector unit, and then transfers
    the result to the scalar unit.  Technically the (cheap) movdqa isn't
    needed with better register allocation (or this could be cleaned up
    during peephole2), but even so this transform is still a win.

    2022-12-23  Roger Sayle  <roger@nextmovesoftware.com>

    gcc/ChangeLog
            PR target/107548
            * config/i386/i386-features.cc (scalar_chain::add_insn): The
            operands of a VEC_SELECT don't need to be added to the scalar chain.
            (general_scalar_chain::compute_convert_gain) <case VEC_SELECT>:
            Provide gains for performing STV on a VEC_SELECT.
            (general_scalar_chain::convert_insn): Convert VEC_SELECT to pshufd,
            psrldq or no-op.
            (general_scalar_to_vector_candidate_p): Handle VEC_SELECT of a
            single element from a vector register to a scalar register.

    gcc/testsuite/ChangeLog
            PR target/107548
            * gcc.target/i386/pr107548-1.c: New test V4SI case.
            * gcc.target/i386/pr107548-2.c: New test V2DI case.
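
As a reading aid for the vpshufd $85 in the new code sequence above: the
immediate 85 is 0x55, i.e. the 2-bit field 01 repeated four times, so every
result lane is taken from source lane 1. The small C model below sketches that
lane selection for 32-bit elements. The helper name pshufd_model is
hypothetical and is not part of GCC or of the patch.

```c
/* Model of pshufd lane selection: result lane i takes the source lane
   named by bits [2i+1:2i] of the 8-bit immediate. */
void pshufd_model (const unsigned src[4], unsigned dst[4], unsigned imm)
{
  for (int i = 0; i < 4; i++)
    dst[i] = src[(imm >> (2 * i)) & 3u];
}
```

With imm = 85 every dst[i] equals src[1], which places a[1] in lane 0 for the
following vpaddd. Selecting element 0 needs no shuffle at all, since the value
is already in lane 0, which is presumably the "no-op" case named in the
ChangeLog.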


* [Bug target/107548] STV doesn't consider vec_select
From: cvs-commit at gcc dot gnu.org @ 2022-12-24 22:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107548

--- Comment #2 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Roger Sayle <sayle@gcc.gnu.org>:

https://gcc.gnu.org/g:3cf6d0e1830231dd47740e66926499db600b9ae4

commit r13-4886-g3cf6d0e1830231dd47740e66926499db600b9ae4
Author: Roger Sayle <roger@nextmovesoftware.com>
Date:   Sat Dec 24 22:07:11 2022 +0000

    [Committed] Tweak new gcc.target/i386/pr107548-1.c for -march=cascadelake.

    My recently added testcases gcc.target/i386/pr107548-[12].c need to be
    tweaked slightly for -march=cascadelake.  Committed as obvious.

    2022-12-24  Roger Sayle  <roger@nextmovesoftware.com>

    gcc/testsuite/ChangeLog
            PR target/107548
            * gcc.target/i386/pr107548-1.c: Match both vmovd and movd.
            * gcc.target/i386/pr107548-2.c: Match both vpaddq and paddq.


* [Bug target/107548] STV doesn't consider vec_select
From: roger at nextmovesoftware dot com @ 2022-12-26 13:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107548

Roger Sayle <roger at nextmovesoftware dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED
                 CC|                            |roger at nextmovesoftware dot com
   Target Milestone|---                         |13.0

--- Comment #3 from Roger Sayle <roger at nextmovesoftware dot com> ---
This should now be fixed/implemented on mainline.
