[Bug c/109088] New: GCC fail auto-vectorization

public inbox for gcc-bugs@sourceware.org

* [Bug c/109088] New: GCC fail auto-vectorization
From: juzhe.zhong at rivai dot ai @ 2023-03-10  9:24 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109088

            Bug ID: 109088
           Summary: GCC fail auto-vectorization
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: juzhe.zhong at rivai dot ai
  Target Milestone: ---

In this case, LLVM succeeds in auto-vectorizing the loop but GCC fails.

https://godbolt.org/z/1x68jo36d

* [Bug c/109088] GCC fail auto-vectorization
From: ubizjak at gmail dot com @ 2023-03-10 10:39 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109088

--- Comment #1 from Uroš Bizjak <ubizjak at gmail dot com> ---
Please read https://gcc.gnu.org/bugs/ on how to correctly report a bug.

* [Bug tree-optimization/109088] GCC fail auto-vectorization
From: rguenth at gcc dot gnu.org @ 2023-03-10 12:39 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109088

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
          Component|c                           |tree-optimization
             Blocks|                            |53947

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
I think we can do conditional reduction but maybe not for variable-length
vectors?  Otherwise I'm sure there's a duplicate bug about this.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations

* [Bug tree-optimization/109088] GCC fail auto-vectorization
From: juzhe.zhong at rivai dot ai @ 2023-03-10 13:16 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109088

--- Comment #3 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to Richard Biener from comment #2)
> I think we can do conditional reduction but maybe not for variable-length
> vectors?  Otherwise I'm sure there's a duplicate bug about this.

No, GCC still cannot vectorize it even when I specify fixed-length vectors
(-msve-vector-bits=512).

If it's a known issue, do you have any suggestions? I am willing to fix it once
I have finished the RVV auto-vectorization work.

* [Bug tree-optimization/109088] GCC fail auto-vectorization
From: pinskia at gcc dot gnu.org @ 2023-03-10 14:04 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109088

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Created attachment 54635
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54635&action=edit
testcase

* [Bug tree-optimization/109088] GCC does not always vectorize conditional reduction
From: pinskia at gcc dot gnu.org @ 2023-03-10 14:09 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109088

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2023-03-10
           Severity|normal                      |enhancement
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
            Summary|GCC fail auto-vectorization |GCC does not always
                   |                            |vectorize conditional
                   |                            |reduction

--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
GCC does vectorize:
    if (a[i] > b[i]) {
      result += a[i];
    }

Even for variable-length vectors.
We just don't vectorize it when there is an extra operation under the condition.

GCC will even vectorize with variable-length vectors:
#include <stdint.h>

uint64_t single_loop_with_if_condition(
        uint64_t * restrict a,
        uint64_t * restrict b,
        uint64_t * restrict c,
        int loop_size) {
  uint64_t result = 0;

  for (int i = 0; i < loop_size; i++) {
    c[i] = a[i] + 1;
    if (a[i] > b[i]) {
      result += c[i];
    }
  }
  return result;
}
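
For comparison, a minimal C sketch of the two loop shapes in question: a plain conditional reduction, which per this comment GCC vectorizes, and one with an extra operation under the condition (the `result += mask ? a[i] + x : 0` form analysed later in the thread). The function names and the parameter `x` are illustrative assumptions, not the original godbolt testcase. Compiling both with GCC's `-fopt-info-vec-missed` is one way to compare what the vectorizer reports.

```c
#include <assert.h>
#include <stdint.h>

/* Plain conditional reduction: the form comment #5 says GCC already
   vectorizes, even for variable-length vectors. */
uint64_t cond_reduction_plain(const uint64_t *restrict a,
                              const uint64_t *restrict b, int n) {
  uint64_t result = 0;
  for (int i = 0; i < n; i++)
    if (a[i] > b[i])
      result += a[i];
  return result;
}

/* Conditional reduction with an extra operation under the condition
   (result += mask ? a[i] + x : 0).  X is an illustrative parameter. */
uint64_t cond_reduction_extra_op(const uint64_t *restrict a,
                                 const uint64_t *restrict b, uint64_t x,
                                 int n) {
  uint64_t result = 0;
  for (int i = 0; i < n; i++)
    if (a[i] > b[i])
      result += a[i] + x;
  return result;
}
```

With a = {5, 1, 7, 3} and b = {2, 4, 6, 8}, only elements 0 and 2 pass the test, so the functions return 12 and, with x = 10, 32.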

* [Bug tree-optimization/109088] GCC does not always vectorize conditional reduction
From: juzhe.zhong at rivai dot ai @ 2023-09-26 12:14 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109088

--- Comment #6 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
After investigation:

GCC fails to vectorize a reduction with multiple conditional operations:

ifcvt dump:

# result_20 = PHI <result_9(8), 0(18)>
...
_11 = result_20 + 10;
result_17 = _4 + _11;
_23 = _4 > _7;
result_9 = _23 ? result_17 : result_20;

It's odd that GCC fails to vectorize it, since these are not complicated
statements.

In LLVM, it will vectorize them into:

vector_ssa_2 = <vector_ssa_result, 0>
...
vector_ssa_1 = vector_ssa_2 + 10;
vector_ssa_3 = vector_ssa_1 + vector_ssa_4;
mask_ssa_1 = vector_ssa_4 > vector_ssa_5;
vector_ssa_result = select <mask_ssa_1, vector_ssa_3, vector_ssa_2>

I think GCC should be able to vectorize it like LLVM:

vector_ssa_2 = <vector_ssa_result, 0>
...
vector_ssa_1 = vector_ssa_2 + 10;
vector_ssa_3 = vector_ssa_1 + vector_ssa_4;
mask_ssa_1 = vector_ssa_4 > vector_ssa_5;
vector_ssa_result = VCOND_MASK <mask_ssa_1, vector_ssa_3, vector_ssa_2>

I saw that this code disables the vectorization:
      else if (!bbs.is_empty ()
               && bb->loop_father->header == bb
               && bb->loop_father->dont_vectorize)
        {
          if (dump_enabled_p ())
            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                             "splitting region at dont-vectorize loop %d "
                             "entry at bb%d\n",
                             bb->loop_father->num, bb->index);
          split = true;
        }

I am not familiar with this code; any ideas? Thanks.

* [Bug tree-optimization/109088] GCC does not always vectorize conditional reduction
From: juzhe.zhong at rivai dot ai @ 2023-09-27  2:45 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109088

--- Comment #7 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Updated analysis:

We fail to recognize it as a reduction because the PHI result has 2 uses:

# result_20 = PHI <result_9(8), 0(18)>
...
_11 = result_20 + 10;   --------> first use
result_17 = _4 + _11;
_23 = _4 > _7;
result_9 = _23 ? result_17 : result_20;  -----> second use

It seems to be an if-conversion issue that makes the loop vectorizer fail to
vectorize it.

I have checked the LLVM implementation: their "result" SSA value always has a
single use no matter how I modify the code (for example, result += a[i] + b[i] + c[i]).

* [Bug tree-optimization/109088] GCC does not always vectorize conditional reduction
From: juzhe.zhong at rivai dot ai @ 2023-09-27  2:58 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109088

--- Comment #8 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
It's because of the order in which we emit the operations.

For the following code:

result += mask ? a[i] + x : 0;

GCC:
result_ssa_1 = PHI <result_ssa_2, 0>
...
STMT 1. tmp = a[i] + x;
STMT 2. tmp2 = tmp + result_ssa_1;
STMT 3. result_ssa_2 = mask ? tmp2 : result_ssa_1;

Here both STMT 2 and STMT 3 use 'result_ssa_1', so we end up with 2 uses of
the PHI result, and vectorization fails.

Whereas LLVM:

result_ssa_1 = PHI <result_ssa_2, 0>
...
IR 1. tmp = a[i] + x;
IR 2. tmp2 = mask ? tmp : 0;
IR 3. result_ssa_2 = tmp2 + result_ssa_1;

LLVM only has 1 use.
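
To make the difference concrete, a minimal C sketch of one iteration of each if-converted shape (the function names are hypothetical). For unsigned integers both return the same value, including on wraparound; the only difference is how many times `result` is read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* GCC's if-converted order (STMT 1-3): `result` is read by the add and
   again by the select, giving the PHI result two uses. */
uint64_t step_gcc_order(uint64_t result, uint64_t a_i, uint64_t x, bool mask) {
  uint64_t tmp = a_i + x;          /* STMT 1 */
  uint64_t tmp2 = tmp + result;    /* STMT 2: first use of result */
  return mask ? tmp2 : result;     /* STMT 3: second use of result */
}

/* LLVM's order (IR 1-3): select the addend first, then add once.
   Equivalent for unsigned integers because adding 0 is exact. */
uint64_t step_llvm_order(uint64_t result, uint64_t a_i, uint64_t x, bool mask) {
  uint64_t tmp = a_i + x;          /* IR 1 */
  uint64_t tmp2 = mask ? tmp : 0;  /* IR 2 */
  return tmp2 + result;            /* IR 3: single use of result */
}
```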

Is it reasonable to swap the order in match.pd?

* [Bug tree-optimization/109088] GCC does not always vectorize conditional reduction
From: rguenth at gcc dot gnu.org @ 2023-09-27  7:15 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109088

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rdapp at gcc dot gnu.org

--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to JuzheZhong from comment #8)
> It's because the order of the operations we are doing:
> 
> For code as follows:
> 
> result += mask ? a[i] + x : 0;
> 
> GCC:
> result_ssa_1 = PHI <result_ssa_2, 0>
> ...
> STMT 1. tmp = a[i] + x;
> STMT 2. tmp2 = tmp + result_ssa_1;
> STMT 3. result_ssa_2 = mask ? tmp2 : result_ssa_1;
> 
> Here we can see both STMT 2 and STMT 3 are using 'result_ssa_1',
> we end up with 2 uses of the PHI result. Then, we failed to vectorize.
> 
> Wheras LLVM:
> 
> result_ssa_1 = PHI <result_ssa_2, 0>
> ...
> IR 1. tmp = a[i] + x;
> IR 2. tmp2 = mask ? tmp : 0;
> IR 3. result_ssa_2 = tmp2 + result_ssa_1.

For floating point these are not equivalent (adding zero isn't a no-op).
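
A small C check of this point, assuming IEEE 754 doubles, the default round-to-nearest mode and no `-ffast-math` (function names are hypothetical): with a running value of -0.0 and a false mask, keeping the old value preserves -0.0, while adding 0.0 produces +0.0.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* GCC's order: when MASK is false the old value is kept bit-for-bit. */
double step_keep_old(double result, double addend, bool mask) {
  return mask ? addend + result : result;
}

/* Swapped order: when MASK is false, 0.0 is added to the old value,
   and (+0.0) + (-0.0) is +0.0 under round-to-nearest. */
double step_add_zero(double result, double addend, bool mask) {
  double selected = mask ? addend : 0.0;
  return selected + result;
}
```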

> LLVM only has 1 use.
> 
> Is it reasonable to swap the order in match.pd ?

If-conversion could be taught to swap this when valid (it's if-conversion that
creates the IL for conditional reductions).  IIRC Robin Dapp also has a patch
to make if-conversion emit .COND_ADD instead, which should make it even easier
to vectorize.

* [Bug tree-optimization/109088] GCC does not always vectorize conditional reduction
From: juzhe.zhong at rivai dot ai @ 2023-09-27  7:34 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109088

--- Comment #10 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to Richard Biener from comment #9)
> (In reply to JuzheZhong from comment #8)
> > It's because the order of the operations we are doing:
> > 
> > For code as follows:
> > 
> > result += mask ? a[i] + x : 0;
> > 
> > GCC:
> > result_ssa_1 = PHI <result_ssa_2, 0>
> > ...
> > STMT 1. tmp = a[i] + x;
> > STMT 2. tmp2 = tmp + result_ssa_1;
> > STMT 3. result_ssa_2 = mask ? tmp2 : result_ssa_1;
> > 
> > Here we can see both STMT 2 and STMT 3 are using 'result_ssa_1',
> > we end up with 2 uses of the PHI result. Then, we failed to vectorize.
> > 
> > Wheras LLVM:
> > 
> > result_ssa_1 = PHI <result_ssa_2, 0>
> > ...
> > IR 1. tmp = a[i] + x;
> > IR 2. tmp2 = mask ? tmp : 0;
> > IR 3. result_ssa_2 = tmp2 + result_ssa_1.
> 
> For floating point these are not equivalent (adding zero isn't a no-op).

Yes, I agree these are not equivalent for floating-point.
But they are equivalent if we specify -ffast-math.

I have double-checked LLVM: by default it also fails to vectorize conditional
floating-point reductions.

However, if we pass -ffast-math to LLVM, it generates the same if-converted IR
sequence as for integers, and vectorization succeeds.

> 
> > LLVM only has 1 use.
> > 
> > Is it reasonable to swap the order in match.pd ?
> 
> if-conversion could be teached to swap this (it's if-conversion creating
> the IL for conditional reductions) when valid.  IIRC Robin Dapp also has
> a patch to make if-conversion emit .COND_ADD instead which should make
> it even better to vectorize.

I know that patch; Robin is trying to fix the issue (in-order reduction) that I
posted.

I have confirmed that the patch can't help, since it doesn't modify the code
for this case: we still end up with multiple uses in the conditional reduction.

The reduction fails because of this check:

  /* If this isn't a nested cycle or if the nested cycle reduction value
     is used ouside of the inner loop we cannot handle uses of the reduction
     value.  */
  if (nlatch_def_loop_uses > 1 || nphi_def_loop_uses > 1)
    {
      if (dump_enabled_p ())
        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                         "reduction used in loop.\n");
      return NULL;
    }

When nphi_def_loop_uses > 1, we fail to vectorize.

I have checked the LLVM code, and I think we can extend this function:

strip_nop_cond_scalar_reduction

We should be able to strip all the statements until we reach the use of the
PHI result.

LLVM is able to handle this case:

for ()
  if (cond)
    result += a[i] + b[i] + c[i] + .... 

It handles this no matter how many variables are added in the conditional
reduction, because it keeps iterating over the statements until it reaches the
result:

result_ssa_1 = PHI <>
tmp1 = result_ssa_1 + a[i];
tmp2 = tmp1 + b[i];
tmp3 = tmp2 + c[i];
....

The walk continues until it finds result_ssa_1, which holds the reduction
variable.
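
The chain walk described above can be sketched on a toy three-address IR. Everything here (the `struct stmt` layout, value numbering, and function names) is a hypothetical model rather than GCC's or LLVM's data structures: starting from the PHI result, the walk follows single-use additions until it reaches the final reduction value, and gives up on a second use or any other operation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy IR: each statement is lhs = op0 <code> op1, and
   values are identified by small integers. */
enum op_code { OP_PLUS, OP_MULT };
struct stmt {
  int lhs;
  enum op_code code;
  int op0, op1;
};

/* Index of the single statement that reads VALUE, or -1 if VALUE has
   zero uses or more than one use. */
static int single_use(const struct stmt *s, size_t n, int value) {
  int found = -1;
  for (size_t i = 0; i < n; i++) {
    if (s[i].op0 == value || s[i].op1 == value) {
      if (found != -1)
        return -1;
      found = (int)i;
    }
  }
  return found;
}

/* Follow single-use OP_PLUS statements from PHI_RESULT until reaching
   REDUC_END.  Returns the number of statements on the chain, or -1 if
   the chain breaks (a second use, a non-PLUS statement, or a cycle). */
int reduction_chain_length(const struct stmt *s, size_t n, int phi_result,
                           int reduc_end) {
  int value = phi_result;
  int length = 0;
  while (value != reduc_end) {
    int i = single_use(s, n, value);
    if (i < 0 || s[i].code != OP_PLUS || length == (int)n)
      return -1;
    value = s[i].lhs;
    length++;
  }
  return length;
}
```

Numbering result_ssa_1 as 0, tmp1..tmp3 as 1..3, and the loads of a[i], b[i], c[i] as 10..12, the chain above has length 3; turning the middle addition into a multiplication breaks it.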

Is LLVM's approach reasonable for GCC?

If yes, I can translate LLVM code into GCC.

Thanks.

* [Bug tree-optimization/109088] GCC does not always vectorize conditional reduction
From: rguenth at gcc dot gnu.org @ 2023-09-27  9:06 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109088

--- Comment #11 from Richard Biener <rguenth at gcc dot gnu.org> ---
I don't think strip_nop_cond_scalar_reduction is the place to adjust here,
maybe it's the caller.  I don't have time to dig into the specific issue right
now but if we require scalar code adjustments then we need to perform those in
if-conversion.

But to me it looks like allowing

> > STMT 1. tmp = a[i] + x;
> > STMT 2. tmp2 = tmp + result_ssa_1;
> > STMT 3. result_ssa_2 = mask ? tmp2 : result_ssa_1;

in vect_is_simple_reduction might also be a reasonable approach.  The
use in the COND_EXPR isn't really a use - it's a conditional update.
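
This suggestion can be illustrated with a toy use counter in the spirit of the `nphi_def_loop_uses > 1` check quoted in comment #10. This is a hypothetical model, not GCC code: in strict mode every reading statement counts, while in relaxed mode a select whose else-arm is the PHI result itself counts as a conditional update rather than a use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy IR: KIND_ADD is lhs = op0 + op1 (cond unused),
   KIND_SELECT is lhs = cond ? op0 : op1. */
enum stmt_kind { KIND_ADD, KIND_SELECT };
struct stmt {
  int lhs;
  enum stmt_kind kind;
  int cond, op0, op1;
};

/* Count the statements that read VALUE.  In relaxed mode, a select that
   reads VALUE only as its else-arm is a conditional update of VALUE,
   so it is not counted as a use. */
int count_uses(const struct stmt *s, size_t n, int value, bool relaxed) {
  int uses = 0;
  for (size_t i = 0; i < n; i++) {
    bool is_select = s[i].kind == KIND_SELECT;
    bool reads = s[i].op0 == value || s[i].op1 == value
                 || (is_select && s[i].cond == value);
    if (!reads)
      continue;
    if (relaxed && is_select && s[i].op1 == value && s[i].op0 != value
        && s[i].cond != value)
      continue;
    uses++;
  }
  return uses;
}
```

Encoding STMT 1-3 from comment #8 this way gives 2 uses in strict mode and 1 in relaxed mode.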

* [Bug tree-optimization/109088] GCC does not always vectorize conditional reduction
From: juzhe.zhong at rivai dot ai @ 2023-09-27  9:27 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109088

--- Comment #12 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to Richard Biener from comment #11)
> I don't think strip_nop_cond_scalar_reduction is the place to adjust here,
> maybe it's the caller.  I don't have time to dig into the specific issue
> right now but if we require scalar code adjustments then we need to perform
> those in if-conversion.
> 
> But to me it looks like allowing
> 
> > > STMT 1. tmp = a[i] + x;
> > > STMT 2. tmp2 = tmp + result_ssa_1;
> > > STMT 3. result_ssa_2 = mask ? tmp2 : result_ssa_1;
> 
> in vect_is_simple_reduction might also be a reasonable approach.  The
> use in the COND_EXPR isn't really a use - it's a conditional update.

Thanks Richi.

Enhancing vect_is_simple_reduction in the loop vectorizer is also a good
approach. But I think it's better to recognize the scalar conditional reduction
as early as possible (in if-conversion). Clearly, the current if-conversion
fails to recognize it as a feasible conditional reduction.

I think enhancing vect_is_simple_reduction is the approach to take if it turns
out we are unlikely to be able to simplify the scalar code in if-conversion to
fit the current loop vectorizer.

I believe we will eventually have to enhance both if-conversion and the loop
vectorizer. For now, I prefer to improve if-conversion and will work on that.
Will keep you posted.

Thanks a lot!

* [Bug tree-optimization/109088] GCC does not always vectorize conditional reduction
From: juzhe.zhong at rivai dot ai @ 2023-09-27 14:11 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109088

--- Comment #13 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Hi Richi. Here is a draft approach to detect more potential conditional
reductions.

diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index a8c915913ae..c25d2038f16 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -1790,8 +1790,72 @@ is_cond_scalar_reduction (gimple *phi, gimple **reduc, tree arg_0, tree arg_1,
       std::swap (r_op1, r_op2);
       std::swap (r_nop1, r_nop2);
     }
-  else if (r_nop1 != PHI_RESULT (header_phi))
-    return false;
+  else if (r_nop1 == PHI_RESULT (header_phi))
+    ;
+  else
+    {
+      /* Analyze the statement chain of STMT so that we could teach generate
+        better if-converison code sequence.  We are trying to catch this
+        following situation:
+
+          loop-header:
+          reduc_1 = PHI <..., reduc_2>
+          ...
+          if (...)
+          tmp1 = reduc_1 + rhs1;
+          tmp2 = tmp1 + rhs2;
+          tmp3 = tmp2 + rhs3;
+          ...
+          reduc_3 = tmpN-1 + rhsN-1;
+
+          reduc_2 = PHI <reduc_1, reduc_3>
+
+          and convert to
+
+          reduc_2 = PHI <0, reduc_1>
+          tmp1 = rhs1 + rhs2;
+          tmp2 = tmp1 + rhs3;
+          tmp3 = tmp2 + rhs4;
+          ...
+          tmpN-1 = tmpN-2 + rhsN;
+          ifcvt = cond_expr ? tmpN-1 : 0
+          reduc_1 = tmpN-1 +/- ifcvt;  */
+      if (num_imm_uses (PHI_RESULT (header_phi)) != 2)
+       return false;
+      FOR_EACH_IMM_USE_FAST (use_p, imm_iter, PHI_RESULT (header_phi))
+       {
+         gimple *use_stmt = USE_STMT (use_p);
+         if (is_gimple_assign (use_stmt))
+           {
+             if (gimple_assign_rhs_code (use_stmt) != reduction_op)
+               return false;
+             if (TREE_CODE (gimple_assign_lhs (use_stmt)) != SSA_NAME)
+               return false;
+
+             bool visited_p = false;
+             while (!visited_p)
+               {
+                 use_operand_p use;
+                 if (!single_imm_use (gimple_assign_lhs (use_stmt), &use,
+                                      &use_stmt)
+                     || gimple_bb (use_stmt) != gimple_bb (stmt)
+                     || !is_gimple_assign (use_stmt)
+                     || TREE_CODE (gimple_assign_lhs (use_stmt)) != SSA_NAME
+                     || gimple_assign_rhs_code (use_stmt) != reduction_op)
+                   return false;
+
+                 if (gimple_assign_lhs (use_stmt) == gimple_assign_lhs (stmt))
+                   {
+                     r_op2 = r_op1;
+                     r_op1 = PHI_RESULT (header_phi);
+                     visited_p = true;
+                   }
+               }
+           }
+         else if (use_stmt != phi)
+           return false;
+       }
+    }


My approach performs the check as follows:

           tmp1 = reduc_1 + rhs1;
           tmp2 = tmp1 + rhs2;
           tmp3 = tmp2 + rhs3;
           ...
           reduc_3 = tmpN-1 + rhsN-1;

It iterates from "tmp1 = reduc_1 + rhs1;" until "reduc_3 = tmpN-1 + rhsN-1;".

It makes sure that each statement is a PLUS_EXPR for a sum reduction.
Does it look reasonable?

With it, the loop is vectorized successfully.

* [Bug tree-optimization/109088] GCC does not always vectorize conditional reduction
From: rguenth at gcc dot gnu.org @ 2023-10-06  9:44 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109088

--- Comment #14 from Richard Biener <rguenth at gcc dot gnu.org> ---
Sorry for the delay, but this looks exactly like Robin's transform to COND_ADD,
no?  But sure, the current code doesn't handle a reduction path through
multiple stmts, but if if-conversion converted the final add to a COND_ADD,
then it should be a matter of teaching tree-vect-loop.cc:check_reduction_path
about conditional operations?
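
A scalar C sketch of the shape described here, assuming `.COND_ADD (COND, A, B, ELSE)` computes `COND ? A + B : ELSE` (modelled by a hypothetical `cond_add` helper). Only the final addition on the reduction path is conditional; `result` still feeds both the first addition and the ELSE operand, which is the pattern check_reduction_path would need to accept. For unsigned integers it matches the plain conditional loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed scalar semantics of .COND_ADD (COND, A, B, ELSE). */
static uint64_t cond_add(bool cond, uint64_t a, uint64_t b, uint64_t else_val) {
  return cond ? a + b : else_val;
}

/* Reference: conditional reduction over several addends. */
uint64_t reduce_reference(const uint64_t *a, const uint64_t *b,
                          const uint64_t *c, const bool *mask, int n) {
  uint64_t result = 0;
  for (int i = 0; i < n; i++)
    if (mask[i])
      result += a[i] + b[i] + c[i];
  return result;
}

/* Only the final add is conditional: the reduction path
   result -> tmp1 -> tmp2 -> result runs through two plain adds and one
   cond_add whose ELSE operand is result itself. */
uint64_t reduce_final_cond_add(const uint64_t *a, const uint64_t *b,
                               const uint64_t *c, const bool *mask, int n) {
  uint64_t result = 0;
  for (int i = 0; i < n; i++) {
    uint64_t tmp1 = result + a[i];
    uint64_t tmp2 = tmp1 + b[i];
    result = cond_add(mask[i], tmp2, c[i], result);
  }
  return result;
}
```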

* [Bug tree-optimization/109088] GCC does not always vectorize conditional reduction
From: juzhe.zhong at rivai dot ai @ 2023-11-10 12:02 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109088

--- Comment #15 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Hi Richard.
I confirmed that Robin's patch doesn't help with this issue.

The root cause of this issue is that we fail to recognize the loop as a possible
vectorization of a conditional reduction, i.e. is_cond_scalar_reduction returns
false.

The following patch bootstraps and passes regression testing on x86, and also
passes on aarch64. It enhances if-conversion's recognition of conditional
reductions.

diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index a8c915913ae..2bdd3710a65 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -1784,14 +1784,119 @@ is_cond_scalar_reduction (gimple *phi, gimple **reduc, tree arg_0, tree arg_1,
   r_nop2 = strip_nop_cond_scalar_reduction (*has_nop, r_op2);

   /* Make R_OP1 to hold reduction variable.  */
+  gimple *nonphi_use_stmt = NULL;
   if (r_nop2 == PHI_RESULT (header_phi)
       && commutative_tree_code (reduction_op))
     {
       std::swap (r_op1, r_op2);
       std::swap (r_nop1, r_nop2);
     }
-  else if (r_nop1 != PHI_RESULT (header_phi))
-    return false;
+  else if (r_nop1 == PHI_RESULT (header_phi))
+    ;
+  else
+    {
+      /* Analyze the statement chain of STMT so that we could teach generate
+        better if-converison code sequence.  We are trying to catch this
+        following situation:
+
+          loop-header:
+          reduc_1 = PHI <..., reduc_2>
+          ...
+          if (...)
+          tmp1 = reduc_1 + rhs1;
+          tmp2 = tmp1 + rhs2;
+          tmp3 = tmp2 + rhs3;
+          ...
+          reduc_3 = tmpN-1 + rhsN-1;
+
+          reduc_2 = PHI <reduc_1, reduc_3>
+
+          and convert to
+
+          reduc_2 = PHI <0, reduc_1>
+          tmp1 = rhs1;
+          tmp2 = tmp1 + rhs2;
+          tmp3 = tmp2 + rhs3;
+          ...
+          reduc_3 = tmpN-1 + rhsN-1;
+          ifcvt = cond_expr ? reduc_3 : 0;
+          reduc_1 = reduc_1 +/- ifcvt;  */
+      if (num_imm_uses (PHI_RESULT (header_phi)) != 2)
+       return false;
+      if (!ANY_INTEGRAL_TYPE_P (TREE_TYPE (PHI_RESULT (phi)))
+         && !(FLOAT_TYPE_P (TREE_TYPE (PHI_RESULT (phi)))
+              && !HONOR_SIGNED_ZEROS (TREE_TYPE (PHI_RESULT (phi)))
+              && !HONOR_SIGN_DEPENDENT_ROUNDING (TREE_TYPE (PHI_RESULT (phi)))
+              && !HONOR_NANS (TREE_TYPE (PHI_RESULT (phi)))))
+       return false;
+      gimple *prev_use_stmt, *curr_use_stmt;
+      use_operand_p use;
+      FOR_EACH_IMM_USE_FAST (use_p, imm_iter, PHI_RESULT (header_phi))
+       {
+         prev_use_stmt = curr_use_stmt = USE_STMT (use_p);
+         if (is_gimple_assign (curr_use_stmt))
+           {
+             if (TREE_CODE (gimple_assign_lhs (curr_use_stmt)) != SSA_NAME)
+               return false;
+             if (*has_nop)
+               {
+                 if (!CONVERT_EXPR_CODE_P (
+                       gimple_assign_rhs_code (curr_use_stmt)))
+                   return false;
+               }
+             else
+               {
+                 if (gimple_assign_rhs_code (curr_use_stmt) != reduction_op)
+                   return false;
+               }
+
+             bool visited_p = false;
+             nonphi_use_stmt = curr_use_stmt;
+             while (!visited_p)
+               {
+                 if (!single_imm_use (gimple_assign_lhs (prev_use_stmt), &use,
+                                      &curr_use_stmt)
+                     || gimple_bb (curr_use_stmt) != gimple_bb (stmt)
+                     || !is_gimple_assign (curr_use_stmt)
+                     || TREE_CODE (gimple_assign_lhs (curr_use_stmt))
+                          != SSA_NAME
+                     || gimple_assign_rhs_code (curr_use_stmt) != reduction_op)
+                   return false;
+                 if (curr_use_stmt == stmt)
+                   {
+                     if (*has_nop)
+                       {
+                         if (!single_imm_use (gimple_assign_lhs (
+                                                nonphi_use_stmt),
+                                              &use, &curr_use_stmt))
+                           return false;
+                         r_op1 = gimple_assign_lhs (nonphi_use_stmt);
+                         r_nop1 = PHI_RESULT (header_phi);
+                         nonphi_use_stmt = curr_use_stmt;
+                       }
+                     else
+                       r_op1 = PHI_RESULT (header_phi);
+
+                     if (*has_nop)
+                       {
+                         if (!single_imm_use (gimple_assign_lhs (stmt), &use,
+                                              &curr_use_stmt))
+                           return false;
+                         r_op2 = gimple_assign_lhs (stmt);
+                         r_nop2 = gimple_assign_lhs (curr_use_stmt);
+                       }
+                     else
+                       r_op2 = gimple_assign_lhs (stmt);
+                     visited_p = true;
+                   }
+                 else
+                   prev_use_stmt = curr_use_stmt;
+               }
+           }
+         else if (curr_use_stmt != phi)
+           return false;
+       }
+    }

   if (*has_nop)
     {
@@ -1816,12 +1921,41 @@ is_cond_scalar_reduction (gimple *phi, gimple **reduc, tree arg_0, tree arg_1,
        continue;
       if (use_stmt == stmt)
        continue;
+      if (use_stmt == nonphi_use_stmt)
+       continue;
       if (gimple_code (use_stmt) != GIMPLE_PHI)
        return false;
     }

   *op0 = r_op1; *op1 = r_op2;
   *reduc = stmt;
+
+  if (nonphi_use_stmt)
+    {
+      /* Transform:
+
+       if (...)
+          tmp1 = reduc_1 + rhs1;
+          tmp2 = tmp1 + rhs2;
+          tmp3 = tmp2 + rhs3;
+
+       into:
+
+          tmp1 = rhs1 + 0;   ---> reduc_1 is replaced with '0'
+          tmp2 = tmp1 + rhs2;
+          tmp3 = tmp2 + rhs3;
+          ...
+          reduc_3 = tmpN-1 + rhsN-1;
+          ifcvt = cond_expr ? reduc_3 : 0;  */
+      gcc_assert (gimple_assign_rhs_code (nonphi_use_stmt) == reduction_op);
+      if (gimple_assign_rhs1 (nonphi_use_stmt) == r_op1)
+       gimple_assign_set_rhs1 (nonphi_use_stmt,
+                               build_zero_cst (TREE_TYPE (r_op1)));
+      else if (gimple_assign_rhs2 (nonphi_use_stmt) == r_op1)
+       gimple_assign_set_rhs2 (nonphi_use_stmt,
+                               build_zero_cst (TREE_TYPE (r_op1)));
+      update_stmt (nonphi_use_stmt);
+    }
   return true;
 }

@@ -1886,12 +2020,17 @@ convert_scalar_cond_reduction (gimple *reduc, gimple_stmt_iterator *gsi,
       gsi_remove (&stmt_it, true);
       release_defs (nop_reduc);
     }
+
   gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);

   /* Delete original reduction stmt.  */
-  stmt_it = gsi_for_stmt (reduc);
-  gsi_remove (&stmt_it, true);
-  release_defs (reduc);
+  if (op1 != gimple_assign_lhs (reduc))
+    {
+      stmt_it = gsi_for_stmt (reduc);
+      gsi_remove (&stmt_it, true);
+      release_defs (reduc);
+    }
+
   return rhs;
 }

I have fully tested it with different kinds of conditional reductions.
All of them can be vectorized.

I am not sure whether this is a feasible solution.


* [Bug tree-optimization/109088] GCC does not always vectorize conditional reduction
  2023-03-10  9:24 [Bug c/109088] New: GCC fail auto-vectorization juzhe.zhong at rivai dot ai
                   ` (14 preceding siblings ...)
  2023-11-10 12:02 ` juzhe.zhong at rivai dot ai
@ 2023-11-10 13:10 ` rguenth at gcc dot gnu.org
  2023-11-10 13:42 ` juzhe.zhong at rivai dot ai
                   ` (5 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-11-10 13:10 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109088

--- Comment #16 from Richard Biener <rguenth at gcc dot gnu.org> ---
It's not exactly clear what transform you are proposing. It looks like a
re-association, but the SSA names do not match up, and the transform only
replaces a single operand with 0!?


* [Bug tree-optimization/109088] GCC does not always vectorize conditional reduction
  2023-03-10  9:24 [Bug c/109088] New: GCC fail auto-vectorization juzhe.zhong at rivai dot ai
                   ` (15 preceding siblings ...)
  2023-11-10 13:10 ` rguenth at gcc dot gnu.org
@ 2023-11-10 13:42 ` juzhe.zhong at rivai dot ai
  2023-11-15 14:09 ` rguenth at gcc dot gnu.org
                   ` (4 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2023-11-10 13:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109088

--- Comment #17 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Sorry for the confusion and the missing information.

I am trying to transform:

reduc_1 = PHI <..., reduc_2>
...
if (...)
  tmp1 = reduc_1 + rhs1;
  tmp2 = tmp1 + rhs2;
  tmp3 = tmp2 + rhs3;
  ...
  reduc_3 = tmpN-1 + rhsN-1;

reduc_2 = PHI <reduc_1, reduc_3>

First, transform the first statement:

tmp1 = reduc_1 + rhs1;  into  tmp1 = rhs1 + 0;

It then becomes a plain copy, tmp1 = rhs1, which a later pass will
eliminate.

Second, transform the reduction PHI:

reduc_1 = PHI <..., reduc_2>

into the if-converted form:

reduc_1 = PHI <_ifc__35(8), 0(18)>

Third, transform

reduc_3 = tmpN-1 + rhsN-1;

reduc_2 = PHI <reduc_1, reduc_3>

into:

reduc_3 = tmpN-1 + rhsN-1;
_ifc__35 = .COND_ADD (condition, reduc_1, reduc_3, reduc_1);


So finally:

reduc_1 = PHI <_ifc__35(8), 0(18)>
...
tmp1 = rhs1;
tmp2 = tmp1 + rhs2;
tmp3 = tmp2 + rhs3;
...
reduc_3 = tmpN-1 + rhsN-1;
_ifc__35 = .COND_ADD (condition, reduc_1, reduc_3, reduc_1);
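A minimal scalar C sketch of the final form above, assuming a three-term chain (rhs1..rhs3) and unsigned accumulation. cond_add is a hypothetical helper modeling what .COND_ADD (cond, a, b, else) computes, namely cond ? a + b : else; none of this is GCC code, it only illustrates that the rewritten loop yields the same sum as the guarded one.

```c
#include <assert.h>

/* Hypothetical scalar model of .COND_ADD (cond, a, b, else_val):
   a + b when cond holds, otherwise else_val.  Unsigned keeps
   wraparound well defined.  */
static unsigned cond_add (int cond, unsigned a, unsigned b, unsigned else_val)
{
  return cond ? a + b : else_val;
}

/* Original shape: the whole chain reads the accumulator and is guarded.  */
static unsigned reduce_orig (const unsigned *rhs1, const unsigned *rhs2,
                             const unsigned *rhs3, const int *cond, int n)
{
  unsigned reduc = 0;
  for (int i = 0; i < n; ++i)
    if (cond[i])
      {
        unsigned tmp1 = reduc + rhs1[i];
        unsigned tmp2 = tmp1 + rhs2[i];
        reduc = tmp2 + rhs3[i];
      }
  return reduc;
}

/* Transformed shape: reduc_1 is replaced by 0 in the first statement,
   so tmp1 is a plain copy; the chain runs unconditionally and one
   conditional add merges its result into the accumulator.  */
static unsigned reduce_ifcvt (const unsigned *rhs1, const unsigned *rhs2,
                              const unsigned *rhs3, const int *cond, int n)
{
  unsigned reduc = 0;
  for (int i = 0; i < n; ++i)
    {
      unsigned tmp1 = rhs1[i] + 0;
      unsigned tmp2 = tmp1 + rhs2[i];
      unsigned reduc_3 = tmp2 + rhs3[i];
      reduc = cond_add (cond[i], reduc, reduc_3, reduc);
    }
  return reduc;
}

/* Usage: both shapes agree on a small made-up input.  */
static void example (void)
{
  const unsigned a[] = { 1, 2, 3, 4 };
  const unsigned b[] = { 10, 20, 30, 40 };
  const unsigned c[] = { 100, 200, 300, 400 };
  const int m[] = { 1, 0, 1, 0 };
  assert (reduce_orig (a, b, c, m, 4) == 444);
  assert (reduce_ifcvt (a, b, c, m, 4) == 444);
}
```

The equivalence holds for unsigned types because modular addition is associative; for other types it needs the checks discussed later in this thread.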


* [Bug tree-optimization/109088] GCC does not always vectorize conditional reduction
  2023-03-10  9:24 [Bug c/109088] New: GCC fail auto-vectorization juzhe.zhong at rivai dot ai
                   ` (16 preceding siblings ...)
  2023-11-10 13:42 ` juzhe.zhong at rivai dot ai
@ 2023-11-15 14:09 ` rguenth at gcc dot gnu.org
  2023-11-15 14:38 ` juzhe.zhong at rivai dot ai
                   ` (3 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-11-15 14:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109088

--- Comment #18 from Richard Biener <rguenth at gcc dot gnu.org> ---
Ah, so this is for

int foo (int *a, int x)
{
  int sum = 0;
  for (int i = 0; i < 1024; ++i)
    if (a[i] < 10)
      sum = sum + a[i] + x;
  return sum;
}

transforming it to

int foo (int *a, int x)
{
  int sum = 0;
  for (int i = 0; i < 1024; ++i)
    {
      int tem = a[i] + x;
      if (a[i] < 10)
        sum = sum + tem;
    }
  return sum;
}

Note that this is re-associating and thus not always valid. It also executes
statements unconditionally, which is not always valid either: for FP, spurious
exceptions might be raised.


* [Bug tree-optimization/109088] GCC does not always vectorize conditional reduction
  2023-03-10  9:24 [Bug c/109088] New: GCC fail auto-vectorization juzhe.zhong at rivai dot ai
                   ` (17 preceding siblings ...)
  2023-11-15 14:09 ` rguenth at gcc dot gnu.org
@ 2023-11-15 14:38 ` juzhe.zhong at rivai dot ai
  2023-11-15 14:42 ` rguenther at suse dot de
                   ` (2 subsequent siblings)
  21 siblings, 0 replies; 23+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2023-11-15 14:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109088

--- Comment #19 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
I have added:

+      if (!ANY_INTEGRAL_TYPE_P (TREE_TYPE (PHI_RESULT (phi)))
+         && !(FLOAT_TYPE_P (TREE_TYPE (PHI_RESULT (phi)))
+              && !HONOR_SIGNED_ZEROS (TREE_TYPE (PHI_RESULT (phi)))
+              && !HONOR_SIGN_DEPENDENT_ROUNDING (TREE_TYPE (PHI_RESULT (phi)))
+              && !HONOR_NANS (TREE_TYPE (PHI_RESULT (phi)))))
+       return false;

for floating-point. I fail to see which situation would cause FP exceptions?


* [Bug tree-optimization/109088] GCC does not always vectorize conditional reduction
  2023-03-10  9:24 [Bug c/109088] New: GCC fail auto-vectorization juzhe.zhong at rivai dot ai
                   ` (18 preceding siblings ...)
  2023-11-15 14:38 ` juzhe.zhong at rivai dot ai
@ 2023-11-15 14:42 ` rguenther at suse dot de
  2023-11-16  1:06 ` juzhe.zhong at rivai dot ai
  2023-11-16  6:50 ` rguenther at suse dot de
  21 siblings, 0 replies; 23+ messages in thread
From: rguenther at suse dot de @ 2023-11-15 14:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109088

--- Comment #20 from rguenther at suse dot de <rguenther at suse dot de> ---
On Wed, 15 Nov 2023, juzhe.zhong at rivai dot ai wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109088
> 
> --- Comment #19 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
> I have added:
> 
> +      if (!ANY_INTEGRAL_TYPE_P (TREE_TYPE (PHI_RESULT (phi)))

For TYPE_OVERFLOW_UNDEFINED you have to convert the ops to unsigned
to avoid spurious undefined overflow.
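A sketch of the unsigned approach, assuming int elements as in the foo example from comment #18; foo_spec is a hypothetical name for the speculated, re-associated loop. Because a[i] + x now runs for every element, including ones the guard would skip (e.g. a[i] == INT_MAX with x == 1), it is done in unsigned arithmetic where wraparound is defined. Converting the final sum back to int relies on GCC's documented modulo behavior for out-of-range conversions.

```c
#include <assert.h>
#include <limits.h>

/* Source loop: a[i] + x is only evaluated when a[i] < 10.  */
static int foo (const int *a, int x, int n)
{
  int sum = 0;
  for (int i = 0; i < n; ++i)
    if (a[i] < 10)
      sum = sum + a[i] + x;
  return sum;
}

/* Speculated and re-associated loop with the ops done in unsigned:
   tem may wrap for skipped elements without undefined behavior.  */
static int foo_spec (const int *a, int x, int n)
{
  unsigned sum = 0;
  for (int i = 0; i < n; ++i)
    {
      unsigned tem = (unsigned) a[i] + (unsigned) x;  /* always executed */
      if (a[i] < 10)
        sum = sum + tem;
    }
  return (int) sum;  /* modulo conversion, as GCC documents */
}

/* Usage: the guard skips INT_MAX, so foo never computes INT_MAX + 1,
   while foo_spec computes it in unsigned and still gets the same sum.  */
static void example (void)
{
  const int a[] = { 1, INT_MAX, 2, -5 };
  assert (foo (a, 1, 4) == 1);
  assert (foo_spec (a, 1, 4) == 1);
}
```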

> +         && !(FLOAT_TYPE_P (TREE_TYPE (PHI_RESULT (phi)))
> +              && !HONOR_SIGNED_ZEROS (TREE_TYPE (PHI_RESULT (phi)))
> +              && !HONOR_SIGN_DEPENDENT_ROUNDING (TREE_TYPE (PHI_RESULT (phi)))
> +              && !HONOR_NANS (TREE_TYPE (PHI_RESULT (phi)))))

You should check flag_associative_math which covers one half
and !flag_trapping_math which covers spurious FP exceptions.

> +       return false;
> 
> for floating-point. I failed to see which situation will cause FP exceptions ?

Ops with NaN cause INVALID, but there's also INEXACT which can be
set differently after re-association.


* [Bug tree-optimization/109088] GCC does not always vectorize conditional reduction
  2023-03-10  9:24 [Bug c/109088] New: GCC fail auto-vectorization juzhe.zhong at rivai dot ai
                   ` (19 preceding siblings ...)
  2023-11-15 14:42 ` rguenther at suse dot de
@ 2023-11-16  1:06 ` juzhe.zhong at rivai dot ai
  2023-11-16  6:50 ` rguenther at suse dot de
  21 siblings, 0 replies; 23+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2023-11-16  1:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109088

--- Comment #21 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Thanks Richi.

Is re-associating (while eliminating spurious exceptions) in if-conversion a
reasonable approach?

If so, I am going to revisit this PR for GCC 15 and refine the code.


* [Bug tree-optimization/109088] GCC does not always vectorize conditional reduction
  2023-03-10  9:24 [Bug c/109088] New: GCC fail auto-vectorization juzhe.zhong at rivai dot ai
                   ` (20 preceding siblings ...)
  2023-11-16  1:06 ` juzhe.zhong at rivai dot ai
@ 2023-11-16  6:50 ` rguenther at suse dot de
  21 siblings, 0 replies; 23+ messages in thread
From: rguenther at suse dot de @ 2023-11-16  6:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109088

--- Comment #22 from rguenther at suse dot de <rguenther at suse dot de> ---
On Thu, 16 Nov 2023, juzhe.zhong at rivai dot ai wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109088
> 
> --- Comment #21 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
> Thanks Richi.
> 
> Does re-associating (with eliminating exceptions) in if-convert is a reasonable
> approach ?

Yeah, I don't think we have a much better place at the moment.



Thread overview: 23+ messages
2023-03-10  9:24 [Bug c/109088] New: GCC fail auto-vectorization juzhe.zhong at rivai dot ai
2023-03-10 10:39 ` [Bug c/109088] " ubizjak at gmail dot com
2023-03-10 12:39 ` [Bug tree-optimization/109088] " rguenth at gcc dot gnu.org
2023-03-10 13:16 ` juzhe.zhong at rivai dot ai
2023-03-10 14:04 ` pinskia at gcc dot gnu.org
2023-03-10 14:09 ` [Bug tree-optimization/109088] GCC does not always vectorize conditional reduction pinskia at gcc dot gnu.org
2023-09-26 12:14 ` juzhe.zhong at rivai dot ai
2023-09-27  2:45 ` juzhe.zhong at rivai dot ai
2023-09-27  2:58 ` juzhe.zhong at rivai dot ai
2023-09-27  7:15 ` rguenth at gcc dot gnu.org
2023-09-27  7:34 ` juzhe.zhong at rivai dot ai
2023-09-27  9:06 ` rguenth at gcc dot gnu.org
2023-09-27  9:27 ` juzhe.zhong at rivai dot ai
2023-09-27 14:11 ` juzhe.zhong at rivai dot ai
2023-10-06  9:44 ` rguenth at gcc dot gnu.org
2023-11-10 12:02 ` juzhe.zhong at rivai dot ai
2023-11-10 13:10 ` rguenth at gcc dot gnu.org
2023-11-10 13:42 ` juzhe.zhong at rivai dot ai
2023-11-15 14:09 ` rguenth at gcc dot gnu.org
2023-11-15 14:38 ` juzhe.zhong at rivai dot ai
2023-11-15 14:42 ` rguenther at suse dot de
2023-11-16  1:06 ` juzhe.zhong at rivai dot ai
2023-11-16  6:50 ` rguenther at suse dot de
