public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/96481] New: SLP fail to vectorize VEC_COND_EXPR pattern.
From: crazylht at gmail dot com @ 2020-08-05  9:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96481

            Bug ID: 96481
           Summary: SLP fail to vectorize VEC_COND_EXPR pattern.
           Product: gcc
           Version: 11.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: crazylht at gmail dot com
  Target Milestone: ---

Testcase that is not vectorized:
-----
#include <x86intrin.h>

inline unsigned opt(unsigned a, unsigned b, unsigned c, unsigned d) {
    return a > b ? c : d;
}

void opt( unsigned * __restrict dst, const unsigned *pa, const unsigned *pb,
        const unsigned *pc, const unsigned  *pd )
{

     *dst++ = opt(*pa++, *pb++, *pc++, *pd++);
     *dst++ = opt(*pa++, *pb++, *pc++, *pd++);
     *dst++ = opt(*pa++, *pb++, *pc++, *pd++);
     *dst++ = opt(*pa++, *pb++, *pc++, *pd++);
}
----


Testcase that is successfully vectorized:

----
inline unsigned opt(unsigned a, unsigned b, unsigned c, unsigned d) {
    return a > b ? c : d;
}

void opt( unsigned * __restrict dst, const unsigned *pa, const unsigned *pb,
        const unsigned *pc, const unsigned  *pd )
{
    for (int i = 0; i != 4; i++)
     *dst++ = opt(*pa++, *pb++, *pc++, *pd++);
}
----

LLVM can handle both cases;
see https://godbolt.org/z/jYoPxT
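For reference, the result one would hope SLP produces for the unrolled testcase is roughly the following hand-vectorized C++ sketch. It assumes GCC's generic vector extension in C++ mode (`vector_size` types, where a vector comparison yields a per-lane mask and `?:` with a vector condition selects lane-wise, i.e. a VEC_COND_EXPR). `v4su`, `opt_vec` and `check_opt_vec` are hypothetical names for illustration, not anything GCC generates.

```cpp
#include <cassert>
#include <cstring>

// Assumption: GCC generic vector extension, four 32-bit unsigned lanes.
typedef unsigned v4su __attribute__((vector_size(16)));

// Scalar reference, identical to opt() in the testcase.
inline unsigned opt(unsigned a, unsigned b, unsigned c, unsigned d) {
    return a > b ? c : d;
}

// Hypothetical hand-vectorized form of the four unrolled statements:
// four 16-byte loads, one unsigned vector compare, one lane-wise select,
// one 16-byte store.
void opt_vec(unsigned *__restrict dst, const unsigned *pa, const unsigned *pb,
             const unsigned *pc, const unsigned *pd)
{
    v4su a, b, c, d;
    std::memcpy(&a, pa, sizeof a);  // memcpy avoids alignment assumptions
    std::memcpy(&b, pb, sizeof b);
    std::memcpy(&c, pc, sizeof c);
    std::memcpy(&d, pd, sizeof d);
    v4su r = a > b ? c : d;         // r[i] = a[i] > b[i] ? c[i] : d[i]
    std::memcpy(dst, &r, sizeof r);
}

// Cross-checks the vector form against the scalar reference on one group of four.
void check_opt_vec(const unsigned *pa, const unsigned *pb,
                   const unsigned *pc, const unsigned *pd)
{
    unsigned out[4];
    opt_vec(out, pa, pb, pc, pd);
    for (int i = 0; i < 4; i++)
        assert(out[i] == opt(pa[i], pb[i], pc[i], pd[i]));
}
```

Loading through `memcpy` avoids assuming 16-byte alignment of the input pointers, which the testcase does not guarantee.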

* [Bug tree-optimization/96481] SLP fail to vectorize VEC_COND_EXPR pattern.
From: rguenth at gcc dot gnu.org @ 2020-08-05 11:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96481

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
   Last reconfirmed|                            |2020-08-05
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
             Blocks|                            |53947

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Yes, this is a known limitation in that for basic-block SLP we do not perform
if-conversion.  Instead the basic-block SLP code sees

  <bb 2> [local count: 1073741824]:
  _1 = *pd_17(D);
  _2 = *pc_19(D);
  _3 = *pb_20(D);
  _4 = *pa_21(D);
  if (_3 < _4)
    goto <bb 11>; [50.00%]
  else
    goto <bb 3>; [50.00%]

  <bb 11> [local count: 536870912]:
  goto <bb 4>; [100.00%]

  <bb 3> [local count: 536870913]:

  <bb 4> [local count: 1073741824]:
  # iftmp.20_23 = PHI <_1(3), _2(11)>
  *dst_22(D) = iftmp.20_23;
  _5 = MEM[(const unsigned int *)pd_17(D) + 4B];
  _6 = MEM[(const unsigned int *)pc_19(D) + 4B];
  _7 = MEM[(const unsigned int *)pb_20(D) + 4B];
  _8 = MEM[(const unsigned int *)pa_21(D) + 4B];
...

which also splits the memory access groups apart (we are slowly relaxing
another limitation, namely that basic-block SLP operates on a single basic
block at a time, but for data refs this restriction will remain).
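Because basic-block SLP does not if-convert, one source-level workaround is to do the if-conversion by hand: compute the comparison as an all-ones/all-zeros mask and select with bitwise operations, so the four statements stay in one basic block without PHIs. This is a minimal sketch of that idea, not something proposed in this thread; `opt_masked`, `opt_branchless` and `check_masked` are hypothetical names, and whether a particular GCC version then SLP-vectorizes the result is not claimed here (the middle end is also free to canonicalize the mask pattern).

```cpp
#include <cassert>

// Hypothetical manually if-converted variant of opt(): no branch, no PHI.
inline unsigned opt_masked(unsigned a, unsigned b, unsigned c, unsigned d) {
    unsigned m = 0u - static_cast<unsigned>(a > b);  // 0xFFFFFFFF if a > b, else 0
    return (c & m) | (d & ~m);                       // picks c when m is all ones
}

// Four isomorphic, adjacent statements in a single basic block.
void opt_branchless(unsigned *__restrict dst, const unsigned *pa,
                    const unsigned *pb, const unsigned *pc, const unsigned *pd)
{
    dst[0] = opt_masked(pa[0], pb[0], pc[0], pd[0]);
    dst[1] = opt_masked(pa[1], pb[1], pc[1], pd[1]);
    dst[2] = opt_masked(pa[2], pb[2], pc[2], pd[2]);
    dst[3] = opt_masked(pa[3], pb[3], pc[3], pd[3]);
}

// Cross-checks the masked select against the plain ternary for one input tuple.
void check_masked(unsigned a, unsigned b, unsigned c, unsigned d) {
    assert(opt_masked(a, b, c, d) == (a > b ? c : d));
}
```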


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations

* [Bug tree-optimization/96481] SLP fail to vectorize VEC_COND_EXPR pattern.
From: rguenth at gcc dot gnu.org @ 2020-08-05 11:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96481

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |marxin at gcc dot gnu.org

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
So in theory we could record basic-block boundaries as the DR group_id instead
and continue scanning over such "simple" diamonds (ones not involving VOPs).
That of course leaves handling the PHI nodes themselves, but I think it should
be possible to weave them into the SLP graph as special ops, i.e. as virtual
cond-exprs.
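The PHI handling suggested here can be illustrated with a toy model: each two-argument PHI at the join of a simple diamond is read as one lane `(lhs CMP rhs) ? if_true : if_false`, and a group of such lanes can share one vector COND_EXPR node if they are isomorphic. The sketch assumes GT/GE are canonicalized to LT/LE by swapping operands (comment #1's dump already shows `a > b` as `_3 < _4`). All types and functions are hypothetical and are not GCC's SLP data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy model: a PHI <if_true, if_false> controlled by
// "if (lhs CMP rhs)", viewed as a virtual cond-expr lane.
enum class Cmp { LT, GT, LE, GE, EQ, NE };

struct CondLane {
    Cmp code;
    std::string lhs, rhs;          // operands of the controlling GIMPLE_COND
    std::string if_true, if_false; // PHI args arriving via the true/false edge
};

// Canonicalize GT/GE to LT/LE by swapping the comparison operands, so that
// "a > b" and "b < a" lanes are recognized as the same operation.
CondLane canonicalize(CondLane l) {
    if (l.code == Cmp::GT) { l.code = Cmp::LT; std::swap(l.lhs, l.rhs); }
    else if (l.code == Cmp::GE) { l.code = Cmp::LE; std::swap(l.lhs, l.rhs); }
    return l;
}

// Would-be vector COND_EXPR node: per-lane operands for each child.
struct CondNode {
    Cmp code;
    std::vector<std::string> lhs, rhs, if_true, if_false;
};

// Returns true and fills 'node' if all lanes are isomorphic after
// canonicalization, i.e. they can share one vector COND_EXPR.
bool build_cond_node(const std::vector<CondLane> &lanes, CondNode &node) {
    if (lanes.empty())
        return false;
    node = CondNode{canonicalize(lanes[0]).code, {}, {}, {}, {}};
    for (std::size_t i = 0; i < lanes.size(); i++) {
        CondLane l = canonicalize(lanes[i]);
        if (l.code != node.code)
            return false;
        node.lhs.push_back(l.lhs);
        node.rhs.push_back(l.rhs);
        node.if_true.push_back(l.if_true);
        node.if_false.push_back(l.if_false);
    }
    return true;
}
```

Collecting the per-lane operands into `lhs`, `rhs`, `if_true` and `if_false` mirrors how the children of such a node would be gathered from the scalar operands.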

* [Bug tree-optimization/96481] SLP fail to vectorize VEC_COND_EXPR pattern.
From: rguenth at gcc dot gnu.org @ 2020-08-05 11:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96481

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #2)
> So in theory we could record basic-block boundaries as DR group_id instead

Note that for outer-loop vectorization we need the BB restriction, which means
we would need to compute the group_id array for loop vectorization as well.

* [Bug tree-optimization/96481] SLP fail to vectorize VEC_COND_EXPR pattern.
From: marxin at gcc dot gnu.org @ 2020-08-10  3:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96481

Martin Liška <marxin at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |marxin at gcc dot gnu.org
             Status|NEW                         |ASSIGNED

--- Comment #4 from Martin Liška <marxin at gcc dot gnu.org> ---
I can try working on that.

* [Bug tree-optimization/96481] SLP fail to vectorize VEC_COND_EXPR pattern.
From: pinskia at gcc dot gnu.org @ 2021-08-19 23:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96481

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
   Last reconfirmed|2020-08-05 00:00:00         |2021-8-19

* [Bug tree-optimization/96481] SLP fail to vectorize VEC_COND_EXPR pattern.
From: rguenth at gcc dot gnu.org @ 2021-08-20  9:18 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96481

--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
So one interesting peculiarity of this testcase is that the ifs merely select
between two scalar values and overall there is no control-flow effect.  That is,
for the issue of splitting the dataref groups, which we currently do at BB
granularity, we could avoid splitting a group when we are sure that its members
are either all executed or all not executed (which is the real intent of this
splitting).

In fact, the current dataref_group computation seems to be useful only for
splitting groups _inside_ BBs at suitable points, while the cross-BB split is
ensured by vect_analyze_data_ref_accesses.  To relax the latter we would need
to make the dataref_group computation conservative for cross-BB groups (and
compute it for outer-loop vectorization as well).  The simplest correctness
fix is to ensure that the group_id is bumped when going from one BB to the
next.
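A toy model of the two group_id policies discussed here: the conservative one bumps the id at every BB change, the relaxed one only when control equivalence changes (blocks that execute under exactly the same conditions, such as a diamond's head and join block). It assumes the control-equivalence classes are already known and ignores the in-BB split points the real computation also inserts; `DataRef` and `compute_group_ids` are hypothetical and unrelated to GCC's actual data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy data reference, listed in program order.
struct DataRef {
    int bb;          // basic block index of the statement
    int ctrl_class;  // control-equivalence class of that block
};

// Assigns a group id to each data reference. Adjacent refs may only be
// combined into one access group if their ids match.
//   relaxed == false: bump the id on every BB change (simple correctness fix).
//   relaxed == true:  bump only when the control-equivalence class changes,
//                     so refs that are all executed or all not executed
//                     can stay in one group.
std::vector<int> compute_group_ids(const std::vector<DataRef> &drs, bool relaxed) {
    std::vector<int> ids;
    int id = 0;
    for (std::size_t i = 0; i < drs.size(); i++) {
        if (i > 0) {
            bool split = relaxed ? drs[i].ctrl_class != drs[i - 1].ctrl_class
                                 : drs[i].bb != drs[i - 1].bb;
            if (split)
                id++;
        }
        ids.push_back(id);
    }
    return ids;
}
```

With refs in bb 2 and bb 4 of comment #1's dump (head and join of the same diamond, hence the same class), the conservative policy puts them in different groups while the relaxed one keeps them together; a ref inside one arm of the diamond still gets its own group under both policies.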

That would help this case up to the point where we encounter the PHIs/ifs,
which we only vectorize when they are in the same BB.

inline unsigned opt(unsigned a, unsigned b, unsigned c, unsigned d) {
    return a > b ? c : d;
}

void opt( unsigned * __restrict dst, const unsigned *pa, const unsigned *pb,
        const unsigned *pc, const unsigned  *pd )
{
  unsigned tem = opt(*pa++, *pb++, *pc++, *pd++);
  unsigned tem1 = opt(*pa++, *pb++, *pc++, *pd++);
  unsigned tem2 = opt(*pa++, *pb++, *pc++, *pd++);
  unsigned tem3 = opt(*pa++, *pb++, *pc++, *pd++);
  *dst++ = tem;
  *dst++ = tem1;
  *dst++ = tem2;
  *dst++ = tem3;
}

ends up with

  _35 = {iftmp.24_22, iftmp.24_23, iftmp.24_24, iftmp.24_25};
  vectp.30_34 = dst_26(D);
  MEM <vector(4) unsigned int> [(unsigned int *)vectp.30_34] = _35;

SLP discovery stops at the PHIs, which are spread out over several BBs (and
the loads are still spread out as well).

* [Bug tree-optimization/96481] SLP fail to vectorize VEC_COND_EXPR pattern.
From: rguenth at gcc dot gnu.org @ 2021-08-20 10:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96481

--- Comment #6 from Richard Biener <rguenth at gcc dot gnu.org> ---
double a[2];
typedef double v2df __attribute__((vector_size(16)));
void foo (v2df x, v2df y, v2df z, v2df w)
{
  double a0, a1;
  a0 = x[0] < y[0] ? z[0] : w[0];
  a1 = x[1] < y[1] ? z[1] : w[1];
  a[0] = a0;
  a[1] = a1;
}

is perhaps a testcase where only the PHI / gimple_cond parts prevent full
vectorization.

I've pushed a patch that should allow experimenting with relaxing the BB split
of the dataref groups, but I'm not sure what a good testcase would be to
motivate fixing that on its own.
