public inbox for gcc-bugs@sourceware.org
* [Bug target/97194] New: optimize vector element set/extract at variable position
@ 2020-09-24 14:26 rguenth at gcc dot gnu.org
  2020-09-24 14:27 ` [Bug target/97194] " rguenth at gcc dot gnu.org
                   ` (25 more replies)
  0 siblings, 26 replies; 27+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-09-24 14:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97194

            Bug ID: 97194
           Summary: optimize vector element set/extract at variable
                    position
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

#define N 32
typedef int T;
typedef T V __attribute__((vector_size(N)));
V set (V v, int idx, T val)
{
  v[idx] = val;
  return v;
}
T get (V v, int idx)
{
  return v[idx];
}

generates with -mavx2

set:
.LFB0:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movslq  %edi, %rdi
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        andq    $-32, %rsp
        vmovdqa %ymm0, -32(%rsp)
        movl    %esi, -32(%rsp,%rdi,4)
        vmovdqa -32(%rsp), %ymm0
^^^ store forwarding fail
        leave
        .cfi_def_cfa 7, 8
        ret

get:
.LFB1:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movslq  %edi, %rdi
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        andq    $-32, %rsp
        vmovdqa %ymm0, -32(%rsp)
        movl    -32(%rsp,%rdi,4), %eax
        leave
        .cfi_def_cfa 7, 8
        ret

maybe not too bad.

Vary N and T to cover all types and vector sizes.

It should be possible to do the 'get' case via variable permutes
and the 'set' case via a splat of the value and a blend using
a mask generated from 'idx'.
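
A minimal sketch of the 'get' idea at the source level, assuming GCC's
__builtin_shuffle and a 16-byte vector (chosen here only so the example builds
without -mavx2). The name get_permute is hypothetical and not part of this
report.

```c
/* Sketch only: a 4 x int vector, smaller than the N == 32 case in the report. */
typedef int v4si __attribute__((vector_size(16)));

/* Hypothetical helper: read v[idx] through a variable permute, not a stack slot. */
int get_permute(v4si v, int idx)
{
  /* Splat the index; __builtin_shuffle takes each selector modulo the lane count. */
  v4si sel = {idx, idx, idx, idx};
  /* Every lane of the result holds v[idx], so lane 0 is a constant-index read. */
  v4si all = __builtin_shuffle(v, sel);
  return all[0];
}
```

Whether the target actually emits a single permute for this depends on the ISA
and on the element size.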


* [Bug target/97194] optimize vector element set/extract at variable position
  2020-09-24 14:26 [Bug target/97194] New: optimize vector element set/extract at variable position rguenth at gcc dot gnu.org
@ 2020-09-24 14:27 ` rguenth at gcc dot gnu.org
  2020-09-24 14:39 ` rguenth at gcc dot gnu.org
                   ` (24 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-09-24 14:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97194

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |x86_64-*-* i?86-*-*
                 CC|                            |hjl.tools at gmail dot com
           Keywords|                            |missed-optimization

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Note I googled quite a bit but didn't find anything for the 'set' case.


* [Bug target/97194] optimize vector element set/extract at variable position
  2020-09-24 14:26 [Bug target/97194] New: optimize vector element set/extract at variable position rguenth at gcc dot gnu.org
  2020-09-24 14:27 ` [Bug target/97194] " rguenth at gcc dot gnu.org
@ 2020-09-24 14:39 ` rguenth at gcc dot gnu.org
  2020-09-24 14:52 ` rguenth at gcc dot gnu.org
                   ` (23 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-09-24 14:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97194

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
So for set with T == int and N == 32 we could generate

        vmovd   %edi, %xmm1
        vpbroadcastd    %xmm1, %ymm1
        vpcmpeqd        .LC0(%rip), %ymm1, %ymm2
        vpblendvb       %ymm2, %ymm1, %ymm0, %ymm0
        ret

.LC0:
        .long   0
        .long   1
        .long   2
        .long   3
        .long   4
        .long   5
        .long   6
        .long   7

aka, with GCC generic vectors

V setg (V v, int idx, T val)
{
  V valv = (V){idx, idx, idx, idx, idx, idx, idx, idx};
  V mask = ((V){0, 1, 2, 3, 4, 5, 6, 7} == valv);
  v = (v & ~mask) | (valv & mask);
  return v;
}


There's ongoing patch iteration on the mailing list adding variable index
vec_set expanders for powerpc (and the related middle-end changes).  The
question is whether optabs can try many things or the target should have the
choice (probably better).

There may be a more efficient way to generate {0, 1, 2, 3...}.


* [Bug target/97194] optimize vector element set/extract at variable position
  2020-09-24 14:26 [Bug target/97194] New: optimize vector element set/extract at variable position rguenth at gcc dot gnu.org
  2020-09-24 14:27 ` [Bug target/97194] " rguenth at gcc dot gnu.org
  2020-09-24 14:39 ` rguenth at gcc dot gnu.org
@ 2020-09-24 14:52 ` rguenth at gcc dot gnu.org
  2020-09-25  6:47 ` rguenth at gcc dot gnu.org
                   ` (22 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-09-24 14:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97194

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #2)
> Eventually there's a more efficient way to generate {0, 1, 2, 3...}.

vpmovzx* could at least be used so that only a single
byte vector {0, ... 255 } needs to be in memory for all cases including V256QI.
That may also be a separately useful constant pool optimization ...
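
A rough source-level sketch of that idea, assuming GCC 9 or later for
__builtin_convertvector: one shared byte table is widened to whatever lane
width is needed. The names iota_bytes, iota_v4si and iota_v8hi are hypothetical,
and nothing here guarantees that the compiler selects vpmovzx*.

```c
#include <string.h>

typedef unsigned char v4qu __attribute__((vector_size(4)));
typedef unsigned char v8qu __attribute__((vector_size(8)));
typedef int           v4si __attribute__((vector_size(16)));
typedef short         v8hi __attribute__((vector_size(16)));

/* Single byte-sized index table shared by all element widths (assumption). */
static const unsigned char iota_bytes[8] = {0, 1, 2, 3, 4, 5, 6, 7};

/* Widen the first four bytes of the table to {0, 1, 2, 3} with int lanes. */
v4si iota_v4si(void)
{
  v4qu b;
  memcpy(&b, iota_bytes, sizeof b);
  return __builtin_convertvector(b, v4si);
}

/* Widen all eight bytes of the table to {0, ..., 7} with short lanes. */
v8hi iota_v8hi(void)
{
  v8qu b;
  memcpy(&b, iota_bytes, sizeof b);
  return __builtin_convertvector(b, v8hi);
}
```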


* [Bug target/97194] optimize vector element set/extract at variable position
  2020-09-24 14:26 [Bug target/97194] New: optimize vector element set/extract at variable position rguenth at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2020-09-24 14:52 ` rguenth at gcc dot gnu.org
@ 2020-09-25  6:47 ` rguenth at gcc dot gnu.org
  2020-09-27  9:27 ` crazylht at gmail dot com
                   ` (21 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-09-25  6:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97194

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #2)
> So for set with T == int and N == 32 we could generate
> 
>         vmovd   %edi, %xmm1
>         vpbroadcastd    %xmm1, %ymm1
>         vpcmpeqd        .LC0(%rip), %ymm1, %ymm2
>         vpblendvb       %ymm2, %ymm1, %ymm0, %ymm0
>         ret
> 
> .LC0:
>         .long   0
>         .long   1
>         .long   2
>         .long   3
>         .long   4
>         .long   5
>         .long   6
>         .long   7
> 
> aka, with GCC generic vectors
> 
> V setg (V v, int idx, T val)
> {
>   V valv = (V){idx, idx, idx, idx, idx, idx, idx, idx};
>   V mask = ((V){0, 1, 2, 3, 4, 5, 6, 7} == valv);
>   v = (v & ~mask) | (valv & mask);
>   return v;
> }

I botched this up; the corrected version is

V setg (V v, int idx, T val)
{
  V idxv = (V){idx, idx, idx, idx, idx, idx, idx, idx};
  V valv = (V){val, val, val, val, val, val, val, val};
  V mask = ((V){0, 1, 2, 3, 4, 5, 6, 7} == idxv);
  v = (v & ~mask) | (valv & mask);
  return v;
}

which produces

        vmovd   %edi, %xmm1
        vmovd   %esi, %xmm2
        vpbroadcastd    %xmm1, %ymm1
        vpbroadcastd    %xmm2, %ymm2
        vpcmpeqd        .LC0(%rip), %ymm1, %ymm1
        vpblendvb       %ymm1, %ymm2, %ymm0, %ymm0

with AVX2, so one more vmovd/vpbroadcastd (as expected).  With -mavx512vl
this even becomes

        vpbroadcastd    %edi, %ymm1
        vpcmpd  $0, .LC0(%rip), %ymm1, %k1
        vpbroadcastd    %esi, %ymm0{%k1}

For the extract case we really need to compute a variable permute mask,
which looks harder and possibly more expensive than the spill/load,
so the set case looks more important to tackle (tackling it may still
improve initial RTL generation by avoiding stack assignments
for locals).
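
As a sketch of how the same compare-and-blend might carry over when T is
varied, here is a float variant. It assumes 16-byte vectors (to build without
-mavx2) and GCC's bit-reinterpreting casts between same-sized vector types; the
name set_blend_f32 is hypothetical.

```c
typedef float v4sf __attribute__((vector_size(16)));
typedef int   v4si __attribute__((vector_size(16)));

/* Hypothetical float counterpart of setg: v[idx] = val without a stack slot. */
v4sf set_blend_f32(v4sf v, int idx, float val)
{
  v4si idxv = {idx, idx, idx, idx};
  v4sf valv = {val, val, val, val};
  /* The comparison yields all-ones in the selected lane, zero elsewhere. */
  v4si mask = ((v4si){0, 1, 2, 3} == idxv);
  /* Bitwise operators are not defined on float vectors, so blend the raw bits. */
  v4si bits = ((v4si)v & ~mask) | ((v4si)valv & mask);
  return (v4sf)bits;
}
```

The mask has to be an integer vector with the same lane width as T, which is
the main thing that changes between element types.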

> There's ongoing patch iteration on the ml adding variable index vec_set
> expanders for powerpc (and the related middle-end changes).  The question
> is whether optabs can try many things or the target should have the choice
> (probably better).
> 
> Eventually there's a more efficient way to generate {0, 1, 2, 3...}.


* [Bug target/97194] optimize vector element set/extract at variable position
  2020-09-24 14:26 [Bug target/97194] New: optimize vector element set/extract at variable position rguenth at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2020-09-25  6:47 ` rguenth at gcc dot gnu.org
@ 2020-09-27  9:27 ` crazylht at gmail dot com
  2020-09-28  7:01 ` rguenth at gcc dot gnu.org
                   ` (20 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: crazylht at gmail dot com @ 2020-09-27  9:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97194

--- Comment #5 from Hongtao.liu <crazylht at gmail dot com> ---

> > There's ongoing patch iteration on the ml adding variable index vec_set
> > expanders for powerpc (and the related middle-end changes).  The question
> > is whether optabs can try many things or the target should have the choice
> > (probably better).
> > 

The patch is now on trunk; will you add the expander?


* [Bug target/97194] optimize vector element set/extract at variable position
  2020-09-24 14:26 [Bug target/97194] New: optimize vector element set/extract at variable position rguenth at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2020-09-27  9:27 ` crazylht at gmail dot com
@ 2020-09-28  7:01 ` rguenth at gcc dot gnu.org
  2020-09-28  8:55 ` amonakov at gcc dot gnu.org
                   ` (19 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-09-28  7:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97194

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2020-09-28
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Hongtao.liu from comment #5)
> > > There's ongoing patch iteration on the ml adding variable index vec_set
> > > expanders for powerpc (and the related middle-end changes).  The question
> > > is whether optabs can try many things or the target should have the choice
> > > (probably better).
> > > 
> 
> The patch has been on trunk, will you add expander?

I hope x86 maintainers will beat me to it.


* [Bug target/97194] optimize vector element set/extract at variable position
  2020-09-24 14:26 [Bug target/97194] New: optimize vector element set/extract at variable position rguenth at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2020-09-28  7:01 ` rguenth at gcc dot gnu.org
@ 2020-09-28  8:55 ` amonakov at gcc dot gnu.org
  2020-09-28  9:20 ` rguenth at gcc dot gnu.org
                   ` (18 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: amonakov at gcc dot gnu.org @ 2020-09-28  8:55 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97194

Alexander Monakov <amonakov at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amonakov at gcc dot gnu.org

--- Comment #7 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
FWIW, Peter Cordes provides an overview of available approaches for extraction
depending on vector length and ISA extensions (up to AVX2, not including
AVX-512) in this StackOverflow answer:
https://stackoverflow.com/a/51414330/4755075

TL;DR: generally through store+load; possible alternatives:
 128b:
  SSSE3: pshufb          (1-byte elements)
  SSSE3: imul+add+pshufb (any element size)
  AVX: vpermilp[sd] (4 or 8-byte elements)
 256b:
  AVX2: vpermps (4-byte elements)

In all cases a (v)movd is needed to move the index to a vector register, and
potentially another (v)movd if the result is needed in a general register.

The basic store+load tactic may look worse latency-wise, but can be better
throughput-wise (especially with multiple extractions from the same vector, as
then the store needs to be done just once, as Peter mentioned).
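
To illustrate the throughput point, a small sketch of the store+load tactic
written out by hand: the vector is stored once and each extraction is then a
plain scalar load. The 16-byte vector and the name get_two_via_memory are
assumptions made for the example only, and the compiler is free to generate
something else for it.

```c
#include <string.h>

typedef int v4si __attribute__((vector_size(16)));

/* Hypothetical helper: two variable-index extracts sharing one spill. */
int get_two_via_memory(v4si v, int i, int j)
{
  int lanes[4];
  memcpy(lanes, &v, sizeof v);  /* one whole-vector store */
  return lanes[i] + lanes[j];   /* two independent scalar loads */
}
```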

Why is it important in RTL to do this without referencing the stack?


* [Bug target/97194] optimize vector element set/extract at variable position
  2020-09-24 14:26 [Bug target/97194] New: optimize vector element set/extract at variable position rguenth at gcc dot gnu.org
                   ` (6 preceding siblings ...)
  2020-09-28  8:55 ` amonakov at gcc dot gnu.org
@ 2020-09-28  9:20 ` rguenth at gcc dot gnu.org
  2020-09-28 10:45 ` amonakov at gcc dot gnu.org
                   ` (17 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-09-28  9:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97194

--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Alexander Monakov from comment #7)
> FWIW, Peter Cordes provides an overview of available approaches for
> extraction depending on vector length and ISA extensions (up to AVX2, not
> including AVX-512) in this StackOverflow answer:
> https://stackoverflow.com/a/51414330/4755075
> 
> TL;DR: generally through store+load; possible alternatives:
>  128b:
>   SSSE3: pshufb          (1-byte elements)
>   SSSE3: imul+add+pshufb (any element size)
>   AVX: vpermilp[sd] (4 or 8-byte elements)
>  256b:
>   AVX2: vpermps (4-byte elements)
> 
> In all cases a (v)movd is needed to move the index to a vector register, and
> potentially another (v)movd if the result is needed in a general register.
> 
> The basic store+load tactic may look worse latency-wise, but can be better
> throughput-wise (especially with multiple extractions from the same vector,
> as then the store needs to be done just once, as Peter mentioned).
> 
> Why in RTL it is important to do this without referencing the stack?

For extraction it isn't absolutely required to do this w/o the stack
since the spill would cover the whole vector and the reads can be
usually handled with store-forwarding in the CPUs.  So here this
can be fully based on cost.

The insert case is instead very bad here, with a whole-vector store
followed by an element store and then a whole-vector load.  This
sequence will usually cause at least additional latency or, worse,
a recovery from a failed store forwarding.

Note that currently RTL expansion forces a local vector typed variable
to the stack (instead of allocating a pseudo) when there are
variable-index accesses to it.  That might be a reason to also handle
slightly "expensive" extract cases.  But I guess later falling back
to a stack slot via a splitter or LRA will lead to worse code.


* [Bug target/97194] optimize vector element set/extract at variable position
  2020-09-24 14:26 [Bug target/97194] New: optimize vector element set/extract at variable position rguenth at gcc dot gnu.org
                   ` (7 preceding siblings ...)
  2020-09-28  9:20 ` rguenth at gcc dot gnu.org
@ 2020-09-28 10:45 ` amonakov at gcc dot gnu.org
  2020-09-28 11:10 ` rguenther at suse dot de
                   ` (16 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: amonakov at gcc dot gnu.org @ 2020-09-28 10:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97194

--- Comment #9 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #8)
> Note that currently RTL expansion forces a local vector typed variable
> to the stack (instead of allocating a pseudo) when there are
> variable-index accesses to it.  That might be a reason to also handle
> slightly "expensive" extract cases.  But I guess later falling back
> to a stack slot via a splitter or LRA will lead to worse code.

Indeed, but I struggle to see a good reason to bind the entire lifetime of a
variable to memory just because one operation requires that. Cannot GCC instead
create a fresh temporary early at RTL-expand (not split) time for each extract
operation, letting the original variable live in a pseudo, and binding only
that short-lived temporary to memory?

It can result in extra copies if the temporary needs to be loaded from memory
anyway, but I think passes like RTL CSE should be able to propagate them.


* [Bug target/97194] optimize vector element set/extract at variable position
  2020-09-24 14:26 [Bug target/97194] New: optimize vector element set/extract at variable position rguenth at gcc dot gnu.org
                   ` (8 preceding siblings ...)
  2020-09-28 10:45 ` amonakov at gcc dot gnu.org
@ 2020-09-28 11:10 ` rguenther at suse dot de
  2020-09-28 11:19 ` amonakov at gcc dot gnu.org
                   ` (15 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: rguenther at suse dot de @ 2020-09-28 11:10 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97194

--- Comment #10 from rguenther at suse dot de <rguenther at suse dot de> ---
On Mon, 28 Sep 2020, amonakov at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97194
> 
> --- Comment #9 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
> (In reply to Richard Biener from comment #8)
> > Note that currently RTL expansion forces a local vector typed variable
> > to the stack (instead of allocating a pseudo) when there are
> > variable-index accesses to it.  That might be a reason to also handle
> > slightly "expensive" extract cases.  But I guess later falling back
> > to a stack slot via a splitter or LRA will lead to worse code.
> 
> Indeed, but I struggle to see a good reason to bind the entire lifetime of a
> variable to memory just because one operation requires that. Cannot GCC instead
> create a fresh temporary early at RTL-expand (not split) time for each extract
> operation, letting the original variable live in a pseudo, and binding only
> that short-lived temporary to memory?
> 
> It can result in extra copies if the temporary needs to be loaded from memory
> anyway, but I think passes like RTL CSE should be able to propagate them.

Sure, that would be possible.  We do this kind of thing for
CONCAT (complex numbers), handling some select cases and falling
back to spilling.  But we've backed off from more general handling
because it ICEd in too many cases.

For inserts we don't want to do this since I'm quite positive that
RTL wouldn't be able to merge two spill slots when doing two
consecutive inserts.


* [Bug target/97194] optimize vector element set/extract at variable position
  2020-09-24 14:26 [Bug target/97194] New: optimize vector element set/extract at variable position rguenth at gcc dot gnu.org
                   ` (9 preceding siblings ...)
  2020-09-28 11:10 ` rguenther at suse dot de
@ 2020-09-28 11:19 ` amonakov at gcc dot gnu.org
  2020-09-28 11:42 ` rguenth at gcc dot gnu.org
                   ` (14 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: amonakov at gcc dot gnu.org @ 2020-09-28 11:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97194

--- Comment #11 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
Yeah, for inserts such a tactic would be inappropriate due to bad store
forwarding stalls anyway. As you've shown in earlier comments, inserts have a
very nice generic way to expand them (that does not touch stack).


* [Bug target/97194] optimize vector element set/extract at variable position
  2020-09-24 14:26 [Bug target/97194] New: optimize vector element set/extract at variable position rguenth at gcc dot gnu.org
                   ` (10 preceding siblings ...)
  2020-09-28 11:19 ` amonakov at gcc dot gnu.org
@ 2020-09-28 11:42 ` rguenth at gcc dot gnu.org
  2020-09-28 11:44 ` rguenth at gcc dot gnu.org
                   ` (13 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-09-28 11:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97194

--- Comment #12 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Alexander Monakov from comment #11)
> Yeah, for inserts such tactic would be inappropriate due to bad store
> forwarding stalls anyway. As you've shown in earlier comments, inserts have
> a very nice generic way to expand them (that does not touch stack).

Unfortunately it doesn't work (the CSE).  Patch:

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 1eaa1da11b9..f7b1a92dd95 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -6102,7 +6102,11 @@ discover_nonconstant_array_refs_r (tree * tp, int *walk_subtrees,
             || CONVERT_EXPR_P (t))
        t = TREE_OPERAND (t, 0);

-      if (TREE_CODE (t) == ARRAY_REF || TREE_CODE (t) == ARRAY_RANGE_REF)
+      if ((TREE_CODE (t) == ARRAY_REF
+          && !(TREE_CODE (TREE_OPERAND (t, 0)) == VIEW_CONVERT_EXPR
+               && DECL_P (TREE_OPERAND (TREE_OPERAND (t, 0), 0)))
+               && VECTOR_TYPE_P (TREE_TYPE (TREE_OPERAND (TREE_OPERAND (t, 0), 0))))
+          || TREE_CODE (t) == ARRAY_RANGE_REF)
        {
          t = get_base_address (t);
          if (t && DECL_P (t)


and for

typedef int v4si __attribute__((vector_size(16)));

int foo (v4si v, int i)
{
  v = v + v;
  return v[i] + v[2*i];
}

at -O2 we get

foo:
.LFB0:
        .cfi_startproc
        leal    (%rdi,%rdi), %edx
        paddd   %xmm0, %xmm0
        movslq  %edi, %rdi
        movslq  %edx, %rdx
        movaps  %xmm0, -24(%rsp)
        movaps  %xmm0, -40(%rsp)
        movl    -40(%rsp,%rdi,4), %eax
        addl    -24(%rsp,%rdx,4), %eax
        ret

We likely also do not get rid of the stack allocation.  Maybe it's due to the
way expand does the temporary spill, not ending its lifetime; not sure.
We're definitely not "remembering" the spill slot used for 'v' and do
not re-use it; there's no mechanism for that IIRC.

At least we don't ICE for the specific case of vectors.  We're running into

        /* If we have either an offset, a BLKmode result, or a reference
           outside the underlying object, we must force it to memory.
           Such a case can occur in Ada if we have unchecked conversion
           of an expression from a scalar type to an aggregate type or
           for an ARRAY_RANGE_REF whose type is BLKmode, or if we were
           passed a partially uninitialized object or a view-conversion
           to a larger size.  */
        must_force_mem = (offset
                          || mode1 == BLKmode
                          || (mode == BLKmode
                              && !int_mode_for_size (bitsize, 1).exists ())
                          || maybe_gt (bitpos + bitsize,
                                       GET_MODE_BITSIZE (mode2)));

where 'offset' is MULT_EXPR and we've so far expanded 'v' to
op0 = (reg/v:V4SI 88 [ v ])
and then

        /* Otherwise, if this is a constant or the object is not in memory
           and need be, put it there.  */
        else if (CONSTANT_P (op0) || (!MEM_P (op0) && must_force_mem))
          {
            memloc = assign_temp (TREE_TYPE (tem), 1, 1);
            emit_move_insn (memloc, op0);
            op0 = memloc;
            clear_mem_expr = true;
          }


* [Bug target/97194] optimize vector element set/extract at variable position
  2020-09-24 14:26 [Bug target/97194] New: optimize vector element set/extract at variable position rguenth at gcc dot gnu.org
                   ` (11 preceding siblings ...)
  2020-09-28 11:42 ` rguenth at gcc dot gnu.org
@ 2020-09-28 11:44 ` rguenth at gcc dot gnu.org
  2020-09-28 13:20 ` amonakov at gcc dot gnu.org
                   ` (12 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-09-28 11:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97194

--- Comment #13 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #12)
> (In reply to Alexander Monakov from comment #11)
> > Yeah, for inserts such tactic would be inappropriate due to bad store
> > forwarding stalls anyway. As you've shown in earlier comments, inserts have
> > a very nice generic way to expand them (that does not touch stack).
> 
> Unfortunately it doesn't work (the CSE).  Patch:
> 
> diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
> index 1eaa1da11b9..f7b1a92dd95 100644
> --- a/gcc/cfgexpand.c
> +++ b/gcc/cfgexpand.c
> @@ -6102,7 +6102,11 @@ discover_nonconstant_array_refs_r (tree * tp, int
> *walk_subtrees,
>              || CONVERT_EXPR_P (t))
>         t = TREE_OPERAND (t, 0);
>  
> -      if (TREE_CODE (t) == ARRAY_REF || TREE_CODE (t) == ARRAY_RANGE_REF)
> +      if ((TREE_CODE (t) == ARRAY_REF
> +          && !(TREE_CODE (TREE_OPERAND (t, 0)) == VIEW_CONVERT_EXPR
> +               && DECL_P (TREE_OPERAND (TREE_OPERAND (t, 0), 0)))
> +               && VECTOR_TYPE_P (TREE_TYPE (TREE_OPERAND (TREE_OPERAND (t,
> 0), 0))))
> +          || TREE_CODE (t) == ARRAY_RANGE_REF)
>         {
>           t = get_base_address (t);
>           if (t && DECL_P (t)
> 
> 
> and for
> 
> typedef int v4si __attribute__((vector_size(16)));
> 
> int foo (v4si v, int i)
> {
>   v = v + v;
>   return v[i] + v[2*i];
> }
> 
> at -O2 we get
> 
> foo:
> .LFB0:
>         .cfi_startproc
>         leal    (%rdi,%rdi), %edx
>         paddd   %xmm0, %xmm0
>         movslq  %edi, %rdi
>         movslq  %edx, %rdx
>         movaps  %xmm0, -24(%rsp)
>         movaps  %xmm0, -40(%rsp)
>         movl    -40(%rsp,%rdi,4), %eax
>         addl    -24(%rsp,%rdx,4), %eax
>         ret

and unpatched

foo:
.LFB0:
        .cfi_startproc
        leal    (%rdi,%rdi), %edx
        paddd   %xmm0, %xmm0
        movslq  %edi, %rdi
        movslq  %edx, %rdx
        movaps  %xmm0, -24(%rsp)
        movl    -24(%rsp,%rdi,4), %eax
        addl    -24(%rsp,%rdx,4), %eax
        ret

so we're able to elide the stack slot usage for the add and retain a single
slot.


* [Bug target/97194] optimize vector element set/extract at variable position
  2020-09-24 14:26 [Bug target/97194] New: optimize vector element set/extract at variable position rguenth at gcc dot gnu.org
                   ` (12 preceding siblings ...)
  2020-09-28 11:44 ` rguenth at gcc dot gnu.org
@ 2020-09-28 13:20 ` amonakov at gcc dot gnu.org
  2020-10-16  7:17 ` crazylht at gmail dot com
                   ` (11 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: amonakov at gcc dot gnu.org @ 2020-09-28 13:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97194

--- Comment #14 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
I see, there are more weaknesses than I thought. For CSE (or rather fwprop?) I
was thinking about a simpler case where the extracted-from value is loaded from
memory, but even in trivial cases RTL optimizers cannot clean it up today (so
it wouldn't get any better with separate temporaries):

#define N 16
typedef int T;
typedef T V __attribute__((vector_size(N)));
T f(V *px, long i)
{
    V x = *px;
    return x[i];
}

f:
        movdqa  (%rdi), %xmm0
        movaps  %xmm0, -24(%rsp)
        movl    -24(%rsp,%rsi,4), %eax
        ret


* [Bug target/97194] optimize vector element set/extract at variable position
  2020-09-24 14:26 [Bug target/97194] New: optimize vector element set/extract at variable position rguenth at gcc dot gnu.org
                   ` (13 preceding siblings ...)
  2020-09-28 13:20 ` amonakov at gcc dot gnu.org
@ 2020-10-16  7:17 ` crazylht at gmail dot com
  2020-10-16  7:23 ` crazylht at gmail dot com
                   ` (10 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: crazylht at gmail dot com @ 2020-10-16  7:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97194

--- Comment #15 from Hongtao.liu <crazylht at gmail dot com> ---
I'm working on adding the expander and have encountered a problem.

For V32HI, the vec_set expander with a constant index exists under
TARGET_AVX512F, but with a variable index the expander should only exist under
TARGET_AVX512BW, since vpcmpw on zmm is only available with TARGET_AVX512BW.
We need to restrict the expander to TARGET_AVX512BW, but unfortunately the
operands are not visible in the condition scope. Is there any way to handle
this?


* [Bug target/97194] optimize vector element set/extract at variable position
  2020-09-24 14:26 [Bug target/97194] New: optimize vector element set/extract at variable position rguenth at gcc dot gnu.org
                   ` (14 preceding siblings ...)
  2020-10-16  7:17 ` crazylht at gmail dot com
@ 2020-10-16  7:23 ` crazylht at gmail dot com
  2020-10-16  7:31 ` rguenth at gcc dot gnu.org
                   ` (9 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: crazylht at gmail dot com @ 2020-10-16  7:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97194

--- Comment #16 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #15)
> I'm working on add the expander, i encounter a problem.
> 
> for V32HI vec_set with constant index, the expander existed under
> TARGET_AVX512F, but for variable index, the expander should be existed under
> TARGET_AVX512BW, since vpcmpw zmm only existed in TARGET_AVX512BW, we need
> to restricted the expander under TARGET_AVX512BW, unfortunately operands is
> unvisible in condition scope, any solution to handle such issue?

Or break V32HI into V16HI and V8HI when TARGET_AVX512BW is not available.
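
A source-level sketch of what such a split could look like, not the actual
expander: the wide vector is modelled as two halves, and each half is blended
against its own lane numbers. To keep the example buildable without AVX it uses
two 8 x short halves rather than V32HI; the names v16hi_pair, blend_half and
set_split are hypothetical.

```c
typedef short v8hi __attribute__((vector_size(16)));

/* Stand-in for a wide vector that has been split into two narrower halves. */
typedef struct { v8hi lo, hi; } v16hi_pair;

/* Blend val into the lane of 'half' whose global lane number equals idx. */
static v8hi blend_half(v8hi half, v8hi lane_ids, int idx, short val)
{
  short s = (short)idx;
  v8hi idxv = {s, s, s, s, s, s, s, s};
  v8hi valv = {val, val, val, val, val, val, val, val};
  v8hi mask = (lane_ids == idxv);  /* all-ones in at most one lane */
  return (half & ~mask) | (valv & mask);
}

/* Variable-index set on the split vector; idx is assumed to be in 0..15. */
v16hi_pair set_split(v16hi_pair v, int idx, short val)
{
  v.lo = blend_half(v.lo, (v8hi){0, 1, 2, 3, 4, 5, 6, 7}, idx, val);
  v.hi = blend_half(v.hi, (v8hi){8, 9, 10, 11, 12, 13, 14, 15}, idx, val);
  return v;
}
```

Because the lane numbers of the two halves do not overlap, only the half that
contains idx is changed and no branch on idx is needed.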


* [Bug target/97194] optimize vector element set/extract at variable position
  2020-09-24 14:26 [Bug target/97194] New: optimize vector element set/extract at variable position rguenth at gcc dot gnu.org
                   ` (15 preceding siblings ...)
  2020-10-16  7:23 ` crazylht at gmail dot com
@ 2020-10-16  7:31 ` rguenth at gcc dot gnu.org
  2020-10-16  7:32 ` rguenth at gcc dot gnu.org
                   ` (8 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-10-16  7:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97194

--- Comment #17 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Hongtao.liu from comment #15)
> I'm working on adding the expander and have run into a problem.
> 
> For V32HI vec_set with a constant index, the expander exists under
> TARGET_AVX512F, but for a variable index it should exist only under
> TARGET_AVX512BW, since vpcmpw zmm is only available with TARGET_AVX512BW.
> We need to restrict the expander to TARGET_AVX512BW, but unfortunately the
> operands are not visible in the condition scope. Is there any way to
> handle this?

can_vec_set_var_idx_p checks insn_operand_matches, so I think the operand
predicate is what you should adjust, something like

 "const_int_or_reg_for_vec_set_operand"

Or do you mean you do not see the mode of the vector inside the predicate?

In that case I think you need to split the pattern?  Or think of a way
to implement byte/word insert with just AVX512F ...

* [Bug target/97194] optimize vector element set/extract at variable position
  2020-09-24 14:26 [Bug target/97194] New: optimize vector element set/extract at variable position rguenth at gcc dot gnu.org
                   ` (16 preceding siblings ...)
  2020-10-16  7:31 ` rguenth at gcc dot gnu.org
@ 2020-10-16  7:32 ` rguenth at gcc dot gnu.org
  2020-10-16  7:46 ` crazylht at gmail dot com
                   ` (7 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: rguenth at gcc dot gnu.org @ 2020-10-16  7:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97194

--- Comment #18 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Hongtao.liu from comment #16)
> (In reply to Hongtao.liu from comment #15)
> > I'm working on adding the expander and have run into a problem.
> > 
> > For V32HI vec_set with a constant index, the expander exists under
> > TARGET_AVX512F, but for a variable index it should exist only under
> > TARGET_AVX512BW, since vpcmpw zmm is only available with TARGET_AVX512BW.
> > We need to restrict the expander to TARGET_AVX512BW, but unfortunately the
> > operands are not visible in the condition scope. Is there any way to
> > handle this?
> 
> Or break V32HI into V16HI and V8HI when TARGET_AVX512BW is not available.

That sounds like a good implementation strategy.  I suppose AVX512F implies
AVX2 even when AVX512VL is not available.
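
To illustrate the splitting idea in source-level terms, here is a minimal
sketch using GCC's generic vector extensions.  It is not the code GCC
generates and not the expander itself; the names set_half and set_v32hi are
hypothetical, and the wide vector is passed by pointer only so the example
builds without any -mavx* flags.  Each 16-lane half is updated with a
broadcast, a compare against that half's lane numbers, and a blend.

```c
#include <string.h>

typedef short v32hi __attribute__((vector_size(64)));
typedef short v16hi __attribute__((vector_size(32)));

/* Hypothetical helper: write VAL into the lane of *HALF whose global lane
   number equals IDX.  FIRST_LANE is the global number of lane 0 of this
   half.  If IDX falls outside this half, *HALF is left unchanged.  */
static void set_half(v16hi *half, short first_lane, short idx, short val)
{
  const v16hi base = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
  const v16hi zero = {0};
  v16hi lane_ids = base + first_lane;   /* lane numbers covered by this half */
  v16hi idx_vec = zero + idx;           /* broadcast the index */
  v16hi val_vec = zero + val;           /* broadcast the new value */
  v16hi mask = (idx_vec == lane_ids);   /* -1 in the matching lane, else 0 */
  *half = (val_vec & mask) | (*half & ~mask);   /* blend */
}

/* Hypothetical wrapper: split the 32-lane vector into two halves, update
   both without branching on IDX, and put the halves back together.  */
static void set_v32hi(v32hi *v, int idx, short val)
{
  v16hi lo, hi;
  memcpy(&lo, v, sizeof lo);                         /* extract low half */
  memcpy(&hi, (const char *)v + sizeof lo, sizeof hi); /* extract high half */
  set_half(&lo, 0, (short)idx, val);
  set_half(&hi, 16, (short)idx, val);
  memcpy(v, &lo, sizeof lo);                         /* insert low half */
  memcpy((char *)v + sizeof lo, &hi, sizeof hi);     /* insert high half */
}
```

Calling set_v32hi (&v, 21, x) changes only lane 21 of v.  Both halves are
processed unconditionally so that no memory access depends on the index.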

* [Bug target/97194] optimize vector element set/extract at variable position
  2020-09-24 14:26 [Bug target/97194] New: optimize vector element set/extract at variable position rguenth at gcc dot gnu.org
                   ` (17 preceding siblings ...)
  2020-10-16  7:32 ` rguenth at gcc dot gnu.org
@ 2020-10-16  7:46 ` crazylht at gmail dot com
  2020-10-16  7:49 ` crazylht at gmail dot com
                   ` (6 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: crazylht at gmail dot com @ 2020-10-16  7:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97194

--- Comment #19 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Richard Biener from comment #17)
> (In reply to Hongtao.liu from comment #15)
> > I'm working on adding the expander and have run into a problem.
> > 
> > For V32HI vec_set with a constant index, the expander exists under
> > TARGET_AVX512F, but for a variable index it should exist only under
> > TARGET_AVX512BW, since vpcmpw zmm is only available with TARGET_AVX512BW.
> > We need to restrict the expander to TARGET_AVX512BW, but unfortunately the
> > operands are not visible in the condition scope. Is there any way to
> > handle this?
> 
> can_vec_set_var_idx_p checks insn_operand_matches, so I think the operand
> predicate is what you should adjust, something like
> 
>  "const_int_or_reg_for_vec_set_operand"
> 
> Or do you mean you do not see the mode of the vector inside the predicate?
> 

I need to see the vector mode inside the predicate.

> In that case I think you need to split the pattern?  Or think of a way
> to implement byte/word insert with just AVX512F ...

* [Bug target/97194] optimize vector element set/extract at variable position
  2020-09-24 14:26 [Bug target/97194] New: optimize vector element set/extract at variable position rguenth at gcc dot gnu.org
                   ` (18 preceding siblings ...)
  2020-10-16  7:46 ` crazylht at gmail dot com
@ 2020-10-16  7:49 ` crazylht at gmail dot com
  2020-10-16  8:03 ` rguenther at suse dot de
                   ` (5 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: crazylht at gmail dot com @ 2020-10-16  7:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97194

--- Comment #20 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Richard Biener from comment #18)
> (In reply to Hongtao.liu from comment #16)
> > (In reply to Hongtao.liu from comment #15)
> > > I'm working on adding the expander and have run into a problem.
> > > 
> > > For V32HI vec_set with a constant index, the expander exists under
> > > TARGET_AVX512F, but for a variable index it should exist only under
> > > TARGET_AVX512BW, since vpcmpw zmm is only available with TARGET_AVX512BW.
> > > We need to restrict the expander to TARGET_AVX512BW, but unfortunately the
> > > operands are not visible in the condition scope. Is there any way to
> > > handle this?
> > 
> > Or break V32HI into V16HI and V8HI when TARGET_AVX512BW is not available.
> 
> That sounds like a good implementation strategy.  I suppose AVX512F implies
> AVX2 even when AVX512VL is not available.

Yes, but I'm not sure about the performance impact; it may be better to
generate code as if the expander did not exist.

* [Bug target/97194] optimize vector element set/extract at variable position
  2020-09-24 14:26 [Bug target/97194] New: optimize vector element set/extract at variable position rguenth at gcc dot gnu.org
                   ` (19 preceding siblings ...)
  2020-10-16  7:49 ` crazylht at gmail dot com
@ 2020-10-16  8:03 ` rguenther at suse dot de
  2020-11-17  3:31 ` cvs-commit at gcc dot gnu.org
                   ` (4 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: rguenther at suse dot de @ 2020-10-16  8:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97194

--- Comment #21 from rguenther at suse dot de <rguenther at suse dot de> ---
On Fri, 16 Oct 2020, crazylht at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97194
> 
> --- Comment #20 from Hongtao.liu <crazylht at gmail dot com> ---
> (In reply to Richard Biener from comment #18)
> > (In reply to Hongtao.liu from comment #16)
> > > (In reply to Hongtao.liu from comment #15)
> > > > I'm working on adding the expander and have run into a problem.
> > > > 
> > > > For V32HI vec_set with a constant index, the expander exists under
> > > > TARGET_AVX512F, but for a variable index it should exist only under
> > > > TARGET_AVX512BW, since vpcmpw zmm is only available with TARGET_AVX512BW.
> > > > We need to restrict the expander to TARGET_AVX512BW, but unfortunately the
> > > > operands are not visible in the condition scope. Is there any way to
> > > > handle this?
> > > 
> > > Or break V32HI into V16HI and V8HI when TARGET_AVX512BW is not available.
> > 
> > That sounds like a good implementation strategy.  I suppose AVX512F implies
> > AVX2 even when AVX512VL is not available.
> 
> Yes, but I'm not sure about the performance impact; it may be better to
> generate code as if the expander did not exist.

I'm not so sure - the STLF penalty is quite large and the AVX2 code
should only require an extra vextract + vpcmp + vinsert (and the
extra vpcmp uses another constant pool entry).  The broadcasts
and blend only require AVX512F.
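
For readers unfamiliar with the sequence being discussed, the
broadcast / compare / blend approach can be sketched with GCC's generic
vector extensions.  This is only an illustration under stated assumptions,
not the expander's output: set_var_idx is a hypothetical name, and a
4-lane int vector is used instead of a 256-bit or 512-bit one so that it
builds without -mavx2.  The point is that the vector stays in a register,
so there is no narrow store followed by a wide load (the store-to-load
forwarding failure from the original report).

```c
typedef int v4si __attribute__((vector_size(16)));

/* Hypothetical helper: return V with lane IDX replaced by VAL, without
   going through memory.  An out-of-range IDX leaves V unchanged.  */
static v4si set_var_idx(v4si v, int idx, int val)
{
  const v4si lane_ids = {0, 1, 2, 3};     /* constant-pool entry */
  v4si idx_vec = {idx, idx, idx, idx};    /* broadcast the index */
  v4si val_vec = {val, val, val, val};    /* broadcast the new value */
  v4si mask = (idx_vec == lane_ids);      /* -1 in the selected lane, else 0 */
  return (val_vec & mask) | (v & ~mask);  /* blend old and new lanes */
}
```

For example, set_var_idx ((v4si){10, 20, 30, 40}, 2, 99) yields
{10, 20, 99, 40}.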

* [Bug target/97194] optimize vector element set/extract at variable position
  2020-09-24 14:26 [Bug target/97194] New: optimize vector element set/extract at variable position rguenth at gcc dot gnu.org
                   ` (20 preceding siblings ...)
  2020-10-16  8:03 ` rguenther at suse dot de
@ 2020-11-17  3:31 ` cvs-commit at gcc dot gnu.org
  2020-11-17  3:39 ` crazylht at gmail dot com
                   ` (3 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2020-11-17  3:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97194

--- Comment #22 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:

https://gcc.gnu.org/g:287cc750b0887e86cb309d976b17c7ee95f7ad48

commit r11-5074-g287cc750b0887e86cb309d976b17c7ee95f7ad48
Author: liuhongt <hongtao.liu@intel.com>
Date:   Mon Oct 19 16:04:39 2020 +0800

    Support variable index vec_set.

    gcc/ChangeLog:

            PR target/97194
            * config/i386/i386-expand.c (ix86_expand_vector_set_var): New
            function.
            * config/i386/i386-protos.h (ix86_expand_vector_set_var): New Decl.
            * config/i386/predicates.md (vec_setm_operand): New predicate,
            true for const_int_operand or register_operand under TARGET_AVX2.
            * config/i386/sse.md (vec_set<mode>): Support both constant
            and variable index vec_set.

    gcc/testsuite/ChangeLog:

            * gcc.target/i386/avx2-vec-set-1.c: New test.
            * gcc.target/i386/avx2-vec-set-2.c: New test.
            * gcc.target/i386/avx512bw-vec-set-1.c: New test.
            * gcc.target/i386/avx512bw-vec-set-2.c: New test.
            * gcc.target/i386/avx512f-vec-set-2.c: New test.
            * gcc.target/i386/avx512vl-vec-set-2.c: New test.

* [Bug target/97194] optimize vector element set/extract at variable position
  2020-09-24 14:26 [Bug target/97194] New: optimize vector element set/extract at variable position rguenth at gcc dot gnu.org
                   ` (21 preceding siblings ...)
  2020-11-17  3:31 ` cvs-commit at gcc dot gnu.org
@ 2020-11-17  3:39 ` crazylht at gmail dot com
  2021-06-17 13:20 ` cvs-commit at gcc dot gnu.org
                   ` (2 subsequent siblings)
  25 siblings, 0 replies; 27+ messages in thread
From: crazylht at gmail dot com @ 2020-11-17  3:39 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97194

--- Comment #23 from Hongtao.liu <crazylht at gmail dot com> ---
Fixed in GCC 11; it may need a small adjustment for the modeless operand
(the variable index), as discussed in
https://gcc.gnu.org/pipermail/gcc-patches/2020-November/559213.html

* [Bug target/97194] optimize vector element set/extract at variable position
  2020-09-24 14:26 [Bug target/97194] New: optimize vector element set/extract at variable position rguenth at gcc dot gnu.org
                   ` (22 preceding siblings ...)
  2020-11-17  3:39 ` crazylht at gmail dot com
@ 2021-06-17 13:20 ` cvs-commit at gcc dot gnu.org
  2021-07-06 17:28 ` cvs-commit at gcc dot gnu.org
  2022-01-11 12:05 ` rguenth at gcc dot gnu.org
  25 siblings, 0 replies; 27+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-06-17 13:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97194

--- Comment #24 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Uros Bizjak <uros@gcc.gnu.org>:

https://gcc.gnu.org/g:20a2c8ace0ab56c147fd995432abd5e7cf89b0e3

commit r12-1563-g20a2c8ace0ab56c147fd995432abd5e7cf89b0e3
Author: Uros Bizjak <ubizjak@gmail.com>
Date:   Thu Jun 17 15:19:12 2021 +0200

    i386: Add variable vec_set for 64bit vectors [PR97194]

    To generate sane code a SSE4.1 variable PBLENDV instruction is needed.

    2021-06-17  Uroš Bizjak  <ubizjak@gmail.com>

    gcc/
            PR target/97194
            * config/i386/i386-expand.c (expand_vector_set_var):
            Handle V2SF mode remapping.  Pass TARGET_MMX_WITH_SSE to
            ix86_expand_vector_init_duplicate.
            (ix86_expand_vector_init_duplicate): Emit insv_1 for
            QImode for !TARGET_PARTIAL_REG_STALL.
            * config/i386/predicates.md (vec_setm_mmx_operand): New predicate.
            * config/i386/mmx.md (vec_setv2sf): Use vec_setm_mmx_operand
            as operand 2 predicate.  Call ix86_expand_vector_set_var
            for non-constant index operand.
            (vec_setv2si): Ditto.
            (vec_setv4hi): Ditto.
            (vec_setv8qi): ditto.

    gcc/testsuite/

            PR target/97194
            * gcc.target/i386/sse4_1-vec-set-1.c: New test.
            * gcc.target/i386/sse4_1-vec-set-2.c: ditto.

* [Bug target/97194] optimize vector element set/extract at variable position
  2020-09-24 14:26 [Bug target/97194] New: optimize vector element set/extract at variable position rguenth at gcc dot gnu.org
                   ` (23 preceding siblings ...)
  2021-06-17 13:20 ` cvs-commit at gcc dot gnu.org
@ 2021-07-06 17:28 ` cvs-commit at gcc dot gnu.org
  2022-01-11 12:05 ` rguenth at gcc dot gnu.org
  25 siblings, 0 replies; 27+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2021-07-06 17:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97194

--- Comment #25 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Uros Bizjak <uros@gcc.gnu.org>:

https://gcc.gnu.org/g:f65878178ab05180a5937f11f8fdb755678a82ce

commit r12-2085-gf65878178ab05180a5937f11f8fdb755678a82ce
Author: Uros Bizjak <ubizjak@gmail.com>
Date:   Tue Jul 6 19:27:34 2021 +0200

    i386: Add variable vec_set for 32bit vectors [PR97194]

    To generate sane code a SSE4.1 variable PBLENDV instruction is needed.

    Also enable variable vec_set through vec_setm_operand predicate
    for TARGET_SSE4_1 instead of TARGET_AVX2.
    ix86_expand_vector_init_duplicate is able to emulate
    vpbroadcast{b,w} with pxor/pshufb.

    2021-07-06  Uroš Bizjak  <ubizjak@gmail.com>

    gcc/
            PR target/97194
            * config/i386/predicates.md (vec_setm_operand): Enable
            register_operand for TARGET_SSE4_1.
            * config/i386/mmx.md (vec_setv2hi): Use vec_setm_operand
            as operand 2 predicate.  Call ix86_expand_vector_set_var
            for non-constant index operand.
            (vec_setv4qi): Use vec_setm_mmx_operand as operand 2 predicate.
            Call ix86_expand_vector_set_var for non-constant index operand.

    gcc/testsuite/

            PR target/97194
            * gcc.target/i386/sse4_1-vec-set-1a.c: New test.
            * gcc.target/i386/sse4_1-vec-set-2a.c: Ditto.

* [Bug target/97194] optimize vector element set/extract at variable position
  2020-09-24 14:26 [Bug target/97194] New: optimize vector element set/extract at variable position rguenth at gcc dot gnu.org
                   ` (24 preceding siblings ...)
  2021-07-06 17:28 ` cvs-commit at gcc dot gnu.org
@ 2022-01-11 12:05 ` rguenth at gcc dot gnu.org
  25 siblings, 0 replies; 27+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-01-11 12:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97194

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #26 from Richard Biener <rguenth at gcc dot gnu.org> ---
Let's declare this generic bug fixed.  Specific unhandled cases on x86 should
get a new bug with a specific testcase.
