public inbox for gcc-bugs@sourceware.org
* [Bug c/111401] New: Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS
@ 2023-09-13  9:31 juzhe.zhong at rivai dot ai
  2023-09-13  9:46 ` [Bug c/111401] " rguenth at gcc dot gnu.org
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2023-09-13  9:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111401

            Bug ID: 111401
           Summary: Middle-end: Missed optimization of
                    MASK_LEN_FOLD_LEFT_PLUS
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: juzhe.zhong at rivai dot ai
  Target Milestone: ---

There is a case where I think the loop vectorizer misses an optimization:

https://godbolt.org/z/x5sjdenhM

double
foo2 (double *__restrict a,
      double init,
      int *__restrict cond,
      int n)
{
    for (int i = 0; i < n; i++)
      if (cond[i])
        init += a[i];
    return init;
}

It generates the following GIMPLE IR:

  _60 = .SELECT_VL (ivtmp_58, 4);
  ...
  vect__ifc__35.14_56 = .VCOND_MASK (mask__23.10_50, vect__8.13_54, { 0.0, 0.0, 0.0, 0.0 });
  _36 = .MASK_LEN_FOLD_LEFT_PLUS (init_20, vect__ifc__35.14_56, { -1, -1, -1, -1 }, _60, 0);

The mask of the MASK_LEN_FOLD_LEFT_PLUS is the dummy all-ones mask
{-1, -1, ..., -1}.  I think we should forward the mask of the VCOND_MASK into
the MASK_LEN_FOLD_LEFT_PLUS.

Then we can eliminate the VCOND_MASK.
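For intuition, here is a scalar Python model of the two sequences (my own
illustrative sketch, not GCC code; the function names only loosely mimic the
internal functions, and signed-zero and rounding subtleties are ignored):
skipping a masked-off lane in the in-order reduction gives the same sum as
first blending in the 0.0 that the VCOND_MASK substitutes.

```python
# Hypothetical scalar model of the two GIMPLE sequences (not GCC code).

def vcond_mask(mask, a, b):
    # .VCOND_MASK: pick a[i] where mask[i] is set, else b[i].
    return [x if m else y for m, x, y in zip(mask, a, b)]

def mask_len_fold_left_plus(init, vec, mask, length):
    # .MASK_LEN_FOLD_LEFT_PLUS: in-order sum of the active lanes among
    # the first `length` lanes, accumulated into init.
    for i in range(length):
        if mask[i]:
            init += vec[i]
    return init

mask = [True, False, True, False]
a = [1.0, 2.0, 3.0, 4.0]
init = 0.5

# Current code: blend with 0.0, then reduce with an all-ones dummy mask.
blended = vcond_mask(mask, a, [0.0] * 4)
r1 = mask_len_fold_left_plus(init, blended, [True] * 4, 4)

# Proposed: forward the mask into the reduction and drop the VCOND_MASK.
r2 = mask_len_fold_left_plus(init, a, mask, 4)

assert r1 == r2 == 4.5
```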


I don't know where the optimal place to do this optimization is.

Should it be match.pd, or the loop vectorizer code?

Thanks.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [Bug c/111401] Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS
  2023-09-13  9:31 [Bug c/111401] New: Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS juzhe.zhong at rivai dot ai
@ 2023-09-13  9:46 ` rguenth at gcc dot gnu.org
  2023-09-13 16:52 ` [Bug middle-end/111401] " rdapp at gcc dot gnu.org
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-09-13  9:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111401

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2023-09-13

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
The vectorizer sees if-converted code like

  <bb 3> [local count: 955630224]:
  # init_20 = PHI <_36(8), init_12(D)(18)>
  # i_22 = PHI <i_18(8), 0(18)>
  _1 = (long unsigned int) i_22;
  _2 = _1 * 4;
  _3 = cond_15(D) + _2;
  _4 = *_3;
  _23 = _4 != 0;
  _6 = _1 * 8;
  _38 = _37 + _6;
  _7 = (double *) _38;
  _8 = .MASK_LOAD (_7, 64B, _23);
  _ifc__35 = _23 ? _8 : 0.0;
  _36 = init_20 + _ifc__35;
  i_18 = i_22 + 1;
  if (n_13(D) > i_18)

so what it produces matches up here.  There's the possibility to
modify the if-conversion handling to use a COND_ADD instead of
the COND_EXPR plus ADD; I think that would be the best thing here.
See tree-if-conv.cc:is_cond_scalar_reduction/convert_scalar_cond_reduction

I think this is also wrong code when signed zeros are involved.
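A minimal scalar demonstration of that wrong-code concern (my own sketch, not
from the bug report): with init = -0.0 and a false condition, the if-converted
form adds the substituted 0.0 and flips the sign of the accumulator.

```python
import math

def sign(x):
    # Distinguish -0.0 from +0.0, which compare equal with ==.
    return math.copysign(1.0, x)

init = -0.0
cond = False
a = 5.0

# Original loop body: add only when the condition holds.
orig = init + a if cond else init

# If-converted body: unconditionally add the blended value (0.0 when false).
ifc = init + (a if cond else 0.0)

assert orig == ifc == 0.0          # numerically equal ...
assert sign(orig) == -1.0          # ... but the original keeps -0.0
assert sign(ifc) == 1.0            # while if-conversion produced +0.0
```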


* [Bug middle-end/111401] Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS
  2023-09-13  9:31 [Bug c/111401] New: Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS juzhe.zhong at rivai dot ai
  2023-09-13  9:46 ` [Bug c/111401] " rguenth at gcc dot gnu.org
@ 2023-09-13 16:52 ` rdapp at gcc dot gnu.org
  2023-09-13 21:25 ` rdapp at gcc dot gnu.org
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: rdapp at gcc dot gnu.org @ 2023-09-13 16:52 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111401

Robin Dapp <rdapp at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rdapp at gcc dot gnu.org

--- Comment #2 from Robin Dapp <rdapp at gcc dot gnu.org> ---
I played around with this a bit.  Emitting a COND_ADD in if-convert is easy:

_ifc__35 = .COND_ADD (_23, init_20, _8, init_20);

However, during reduction handling we rely on the reduction being a gimple
assign and a binary operation, so I needed to fix some places and indices
as well as use the proper mask.

What complicates things a bit is that we assume that "init_20" (i.e. the
reduction def) occurs only once, while we have it twice in the COND_ADD.  I
just special-cased that for now.  Is this the proper thing to do?

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 23c6e8259e7..e99add3cf16 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -3672,7 +3672,7 @@ vect_analyze_loop (class loop *loop, vec_info_shared
*shared)
 static bool
 fold_left_reduction_fn (code_helper code, internal_fn *reduc_fn)
 {
-  if (code == PLUS_EXPR)
+  if (code == PLUS_EXPR || code == IFN_COND_ADD)
     {
       *reduc_fn = IFN_FOLD_LEFT_PLUS;
       return true;
@@ -4106,8 +4106,11 @@ vect_is_simple_reduction (loop_vec_info loop_info,
stmt_vec_info phi_info,
           return NULL;
         }

-      nphi_def_loop_uses++;
-      phi_use_stmt = use_stmt;
+      if (use_stmt != phi_use_stmt)
+       {
+         nphi_def_loop_uses++;
+         phi_use_stmt = use_stmt;
+       }

@@ -7440,6 +7457,9 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
       if (i == STMT_VINFO_REDUC_IDX (stmt_info))
        continue;

+      if (op.ops[i] == op.ops[STMT_VINFO_REDUC_IDX (stmt_info)])
+       continue;
+

Apart from that I think what's mainly missing is making the added code nicer. 
Going to attach a tentative patch later.
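The use-counting guard from the first hunk can be pictured with a few lines of
illustrative Python (my own model, not the vectorizer's data structures): the
reduction PHI result appears in two operand slots of the same COND_ADD, but
only one loop-use statement should be counted.

```python
# Each use of the PHI result init_20 recorded as (using_stmt, operand_slot).
# .COND_ADD (_23, init_20, _8, init_20) uses it twice in one statement.
uses = [("cond_add_stmt", 1), ("cond_add_stmt", 3)]

nphi_def_loop_uses = 0
phi_use_stmt = None
for stmt, _slot in uses:
    if stmt != phi_use_stmt:   # mirrors the patch's "use_stmt != phi_use_stmt"
        nphi_def_loop_uses += 1
        phi_use_stmt = stmt

# Both operand slots belong to the same statement: one loop use.
assert nphi_def_loop_uses == 1
```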


* [Bug middle-end/111401] Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS
  2023-09-13  9:31 [Bug c/111401] New: Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS juzhe.zhong at rivai dot ai
  2023-09-13  9:46 ` [Bug c/111401] " rguenth at gcc dot gnu.org
  2023-09-13 16:52 ` [Bug middle-end/111401] " rdapp at gcc dot gnu.org
@ 2023-09-13 21:25 ` rdapp at gcc dot gnu.org
  2023-09-14  6:46 ` rguenther at suse dot de
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: rdapp at gcc dot gnu.org @ 2023-09-13 21:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111401

--- Comment #3 from Robin Dapp <rdapp at gcc dot gnu.org> ---
Several other things came up, so I'm just going to post the latest status here
without having revised or tested it.  Going to try fixing it and testing
tomorrow.

--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -3672,7 +3672,7 @@ vect_analyze_loop (class loop *loop, vec_info_shared
*shared)
 static bool
 fold_left_reduction_fn (code_helper code, internal_fn *reduc_fn)
 {
-  if (code == PLUS_EXPR)
+  if (code == PLUS_EXPR || code == IFN_COND_ADD)
     {
       *reduc_fn = IFN_FOLD_LEFT_PLUS;
       return true;
@@ -4106,8 +4106,13 @@ vect_is_simple_reduction (loop_vec_info loop_info,
stmt_vec_info phi_info,
           return NULL;
         }

-      nphi_def_loop_uses++;
-      phi_use_stmt = use_stmt;
+      /* We might have two uses in the same instruction, only count them as
+        one. */
+      if (use_stmt != phi_use_stmt)
+       {
+         nphi_def_loop_uses++;
+         phi_use_stmt = use_stmt;
+       }
     }

   tree latch_def = PHI_ARG_DEF_FROM_EDGE (phi, loop_latch_edge (loop));
@@ -6861,7 +6866,7 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
                               gimple **vec_stmt, slp_tree slp_node,
                               gimple *reduc_def_stmt,
                               tree_code code, internal_fn reduc_fn,
-                              tree ops[3], tree vectype_in,
+                              tree *ops, int num_ops, tree vectype_in,
                               int reduc_index, vec_loop_masks *masks,
                               vec_loop_lens *lens)
 {
@@ -6883,11 +6888,24 @@ vectorize_fold_left_reduction (loop_vec_info
loop_vinfo,
     gcc_assert (known_eq (TYPE_VECTOR_SUBPARTS (vectype_out),
                          TYPE_VECTOR_SUBPARTS (vectype_in)));

-  tree op0 = ops[1 - reduc_index];
+  /* The operands either come from a binary operation or a COND_ADD operation.
+     The former is a gimple assign and the latter is a gimple call with four
+     arguments.  */
+  gcc_assert (num_ops == 2 || num_ops == 4);
+  bool is_cond_add = num_ops == 4;
+  tree op0, opmask;
+  if (!is_cond_add)
+    op0 = ops[1 - reduc_index];
+  else
+    {
+      op0 = ops[2];
+      opmask = ops[0];
+      gcc_assert (!slp_node);
+    }
   int group_size = 1;
   stmt_vec_info scalar_dest_def_info;
-  auto_vec<tree> vec_oprnds0;
+  auto_vec<tree> vec_oprnds0, vec_opmask;
   if (slp_node)
     {
       auto_vec<vec<tree> > vec_defs (2);
@@ -6903,9 +6921,18 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
       vect_get_vec_defs_for_operand (loop_vinfo, stmt_info, 1,
                                     op0, &vec_oprnds0);
       scalar_dest_def_info = stmt_info;
+      if (is_cond_add)
+       {
+         vect_get_vec_defs_for_operand (loop_vinfo, stmt_info, 1,
+                                        opmask, &vec_opmask);
+         gcc_assert (vec_opmask.length() == 1);
+       }
     }

-  tree scalar_dest = gimple_assign_lhs (scalar_dest_def_info->stmt);
+  gimple *sdef = scalar_dest_def_info->stmt;
+  tree scalar_dest = is_gimple_call (sdef)
+                      ? gimple_call_lhs (sdef)
+                      : gimple_assign_lhs (scalar_dest_def_info->stmt);
   tree scalar_type = TREE_TYPE (scalar_dest);
   tree reduc_var = gimple_phi_result (reduc_def_stmt);

@@ -6945,7 +6972,11 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
                                   i, 1);
          signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS
(loop_vinfo);
          bias = build_int_cst (intQI_type_node, biasval);
-         mask = build_minus_one_cst (truth_type_for (vectype_in));
+         /* If we have a COND_ADD take its mask.  Otherwise use {-1, ...}.  */
+         if (is_cond_add)
+           mask = vec_opmask[0];
+         else
+           mask = build_minus_one_cst (truth_type_for (vectype_in));
        }

       /* Handle MINUS by adding the negative.  */
@@ -7440,6 +7471,9 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
       if (i == STMT_VINFO_REDUC_IDX (stmt_info))
        continue;

+      if (op.ops[i] == op.ops[STMT_VINFO_REDUC_IDX (stmt_info)])
+       continue;
+
       /* There should be only one cycle def in the stmt, the one
          leading to reduc_def.  */
       if (VECTORIZABLE_CYCLE_DEF (dt))
@@ -8211,8 +8245,21 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
       vec_num = 1;
     }

-  code_helper code = canonicalize_code (op.code, op.type);
-  internal_fn cond_fn = get_conditional_internal_fn (code, op.type);
+  code_helper code (op.code);
+  internal_fn cond_fn;
+
+  if (code.is_internal_fn ())
+    {
+      internal_fn ifn = internal_fn (op.code);
+      code = canonicalize_code (conditional_internal_fn_code (ifn), op.type);
+      cond_fn = ifn;
+    }
+  else
+    {
+      code = canonicalize_code (op.code, op.type);
+      cond_fn = get_conditional_internal_fn (code, op.type);
+    }
+
   vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
   vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
   bool mask_by_cond_expr = use_mask_by_cond_expr_p (code, cond_fn,
vectype_in);
@@ -8240,8 +8287,8 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
       gcc_assert (code.is_tree_code ());
       return vectorize_fold_left_reduction
          (loop_vinfo, stmt_info, gsi, vec_stmt, slp_node, reduc_def_phi,
-          tree_code (code), reduc_fn, op.ops, vectype_in, reduc_index, masks,
-          lens);
+          tree_code (code), reduc_fn, op.ops, op.num_ops, vectype_in,
+          reduc_index, masks, lens);
     }

   bool single_defuse_cycle = STMT_VINFO_FORCE_SINGLE_CYCLE (reduc_info);


* [Bug middle-end/111401] Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS
  2023-09-13  9:31 [Bug c/111401] New: Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS juzhe.zhong at rivai dot ai
                   ` (2 preceding siblings ...)
  2023-09-13 21:25 ` rdapp at gcc dot gnu.org
@ 2023-09-14  6:46 ` rguenther at suse dot de
  2023-09-14  6:51 ` rguenther at suse dot de
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: rguenther at suse dot de @ 2023-09-14  6:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111401

--- Comment #4 from rguenther at suse dot de <rguenther at suse dot de> ---
On Wed, 13 Sep 2023, rdapp at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111401
> 
> Robin Dapp <rdapp at gcc dot gnu.org> changed:
> 
>            What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                  CC|                            |rdapp at gcc dot gnu.org
> 
> --- Comment #2 from Robin Dapp <rdapp at gcc dot gnu.org> ---
> I played around with this a bit.  Emitting a COND_ADD in if-convert is easy:
> 
> _ifc__35 = .COND_ADD (_23, init_20, _8, init_20);
> 
> However, during reduction handling we rely on the reduction being a gimple
> assign and a binary operation, so I needed to fix some places and indices
> as well as use the proper mask.
> 
> What complicates things a bit is that we assume that "init_20" (i.e. the
> reduction def) occurs once when we have it twice in the COND_ADD.  I just
> special cased that for now.  Is this the proper thing to do?

I think so - we should ignore a use in the else value when the other
use is in that same stmt.

> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 23c6e8259e7..e99add3cf16 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -3672,7 +3672,7 @@ vect_analyze_loop (class loop *loop, vec_info_shared
> *shared)
>  static bool
>  fold_left_reduction_fn (code_helper code, internal_fn *reduc_fn)
>  {
> -  if (code == PLUS_EXPR)
> +  if (code == PLUS_EXPR || code == IFN_COND_ADD)
>      {
>        *reduc_fn = IFN_FOLD_LEFT_PLUS;
>        return true;
> @@ -4106,8 +4106,11 @@ vect_is_simple_reduction (loop_vec_info loop_info,
> stmt_vec_info phi_info,
>            return NULL;
>          }
> 
> -      nphi_def_loop_uses++;
> -      phi_use_stmt = use_stmt;
> +      if (use_stmt != phi_use_stmt)
> +       {
> +         nphi_def_loop_uses++;
> +         phi_use_stmt = use_stmt;
> +       }
> 
> @@ -7440,6 +7457,9 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>        if (i == STMT_VINFO_REDUC_IDX (stmt_info))
>         continue;
> 
> +      if (op.ops[i] == op.ops[STMT_VINFO_REDUC_IDX (stmt_info)])
> +       continue;
> +
> 
> Apart from that I think what's mainly missing is making the added code nicer. 
> Going to attach a tentative patch later.
> 
>


* [Bug middle-end/111401] Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS
  2023-09-13  9:31 [Bug c/111401] New: Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS juzhe.zhong at rivai dot ai
                   ` (3 preceding siblings ...)
  2023-09-14  6:46 ` rguenther at suse dot de
@ 2023-09-14  6:51 ` rguenther at suse dot de
  2023-09-14 15:07 ` rdapp at gcc dot gnu.org
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: rguenther at suse dot de @ 2023-09-14  6:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111401

--- Comment #5 from rguenther at suse dot de <rguenther at suse dot de> ---
On Wed, 13 Sep 2023, rdapp at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111401
> 
> --- Comment #3 from Robin Dapp <rdapp at gcc dot gnu.org> ---
> Several other things came up, so I'm just going to post the latest status here
> without having revised or tested it.  Going to try fixing it and testing
> tomorrow.

I think what's important to do is make sure targets without
masking are still getting the cond-reduction code generation
(but with the signed-zero issue fixed).  Using a cond_add is
probably better than the vec_cond + add even for the not
fold-left reduction case.


* [Bug middle-end/111401] Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS
  2023-09-13  9:31 [Bug c/111401] New: Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS juzhe.zhong at rivai dot ai
                   ` (4 preceding siblings ...)
  2023-09-14  6:51 ` rguenther at suse dot de
@ 2023-09-14 15:07 ` rdapp at gcc dot gnu.org
  2023-09-15  6:42 ` rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: rdapp at gcc dot gnu.org @ 2023-09-14 15:07 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111401

--- Comment #6 from Robin Dapp <rdapp at gcc dot gnu.org> ---
Created attachment 55902
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55902&action=edit
Tentative

You're referring to the case where we have init = -0.0, the condition is false
and we end up wrongly doing -0.0 + 0.0 = 0.0?
I suppose -0.0 is the proper neutral element for PLUS (and WIDEN_SUM?) when
honoring signed zeros?  And 0.0 for MINUS?  Doesn't that also depend on the
rounding mode?

neutral_op_for_reduction could return a -0 for PLUS if we honor it for that
type.  Or is that too intrusive?

Guess I should add a test case for that as well.
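As a quick scalar sanity check (my own, independent of the patch): under the
default round-to-nearest mode, -0.0 is neutral for addition even when signed
zeros are honored, while +0.0 is not.

```python
import math

def sign(x):
    # Distinguish -0.0 from +0.0, which compare equal with ==.
    return math.copysign(1.0, x)

# x + (-0.0) == x for every x, including x == -0.0, so -0.0 is neutral ...
for x in (-0.0, 0.0, 1.5, -2.25):
    r = x + -0.0
    assert r == x and sign(r) == sign(x)

# ... whereas +0.0 flips a -0.0 accumulator to +0.0, so it is not neutral.
assert sign(-0.0 + 0.0) == 1.0
```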

Another thing is that swapping operands is not as easy with COND_ADD because
the addition would be in the else.  I'd punt for that case for now.

Next problem - might be a mistake on my side.  For avx512 we create a COND_ADD
but the respective MASK_FOLD_LEFT_PLUS is not available, causing us to create
numerous vec_extracts as a fallback that increase the cost until we don't
vectorize anymore.

Therefore I added a
vectorized_internal_fn_supported_p (IFN_FOLD_LEFT_PLUS, TREE_TYPE (lhs)).
SLP paths and ncopies != 1 are excluded as well.  Not really happy with how the
patch looks now but at least the testsuites on aarch and x86 pass.


* [Bug middle-end/111401] Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS
  2023-09-13  9:31 [Bug c/111401] New: Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS juzhe.zhong at rivai dot ai
                   ` (5 preceding siblings ...)
  2023-09-14 15:07 ` rdapp at gcc dot gnu.org
@ 2023-09-15  6:42 ` rguenth at gcc dot gnu.org
  2023-11-02 10:50 ` cvs-commit at gcc dot gnu.org
  2023-11-02 22:40 ` juzhe.zhong at rivai dot ai
  8 siblings, 0 replies; 10+ messages in thread
From: rguenth at gcc dot gnu.org @ 2023-09-15  6:42 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111401

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Robin Dapp from comment #6)
> Created attachment 55902 [details]
> Tentative
> 
> You're referring to the case where we have init = -0.0, the condition is
> false and we end up wrongly doing -0.0 + 0.0 = 0.0?
> I suppose -0.0 the proper neutral element for PLUS (and WIDEN_SUM?) when
> honoring signed zeros?  And 0.0 for MINUS?  Doesn't that also depend on the
> rounding mode?

Yes, if the rounding mode isn't known there isn't a working neutral element.

> neutral_op_for_reduction could return a -0 for PLUS if we honor it for that
> type.  Or is that too intrusive?

I suppose that could work, but we need to check that we're not using this
for the initial value.

> Guess I should add a test case for that as well.
> 
> Another thing is that swapping operands is not as easy with COND_ADD because
> the addition would be in the else.  I'd punt for that case for now.
> 
> Next problem - might be a mistake on my side.  For avx512 we create a
> COND_ADD but the respective MASK_FOLD_LEFT_PLUS is not available, causing us
> to create numerous vec_extracts as fallback that increase the cost until we
> don't vectorize anymore.

Yeah, but then a fold-left reduction wasn't necessary in the first place?
We should avoid that (it's slow even when the target supports it) when
possible.

> Therefore I added a
> vectorized_internal_fn_supported_p (IFN_FOLD_LEFT_PLUS, TREE_TYPE (lhs)).
> SLP paths and ncopies != 1 are excluded as well.  Not really happy with how
> the patch looks now but at least the testsuites on aarch and x86 pass.


* [Bug middle-end/111401] Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS
  2023-09-13  9:31 [Bug c/111401] New: Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS juzhe.zhong at rivai dot ai
                   ` (6 preceding siblings ...)
  2023-09-15  6:42 ` rguenth at gcc dot gnu.org
@ 2023-11-02 10:50 ` cvs-commit at gcc dot gnu.org
  2023-11-02 22:40 ` juzhe.zhong at rivai dot ai
  8 siblings, 0 replies; 10+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2023-11-02 10:50 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111401

--- Comment #8 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Robin Dapp <rdapp@gcc.gnu.org>:

https://gcc.gnu.org/g:01c18f58d37865d5f3bbe93e666183b54ec608c7

commit r14-5076-g01c18f58d37865d5f3bbe93e666183b54ec608c7
Author: Robin Dapp <rdapp@ventanamicro.com>
Date:   Wed Sep 13 22:19:35 2023 +0200

    ifcvt/vect: Emit COND_OP for conditional scalar reduction.

    As described in PR111401 we currently emit a COND and a PLUS expression
    for conditional reductions.  This makes it difficult to combine both
    into a masked reduction statement later.
    This patch improves that by directly emitting a COND_ADD/COND_OP during
    ifcvt and adjusting some vectorizer code to handle it.

    It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS
    is true.

    gcc/ChangeLog:

            PR middle-end/111401
            * internal-fn.cc (internal_fn_else_index): New function.
            * internal-fn.h (internal_fn_else_index): Define.
            * tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_OP
            if supported.
            (predicate_scalar_phi): Add whitespace.
            * tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_OP.
            (neutral_op_for_reduction): Return -0 for PLUS.
            (check_reduction_path): Don't count else operand in COND_OP.
            (vect_is_simple_reduction): Ditto.
            (vect_create_epilog_for_reduction): Fix whitespace.
            (vectorize_fold_left_reduction): Add COND_OP handling.
            (vectorizable_reduction): Don't count else operand in COND_OP.
            (vect_transform_reduction): Add COND_OP handling.
            * tree-vectorizer.h (neutral_op_for_reduction): Add default
            parameter.

    gcc/testsuite/ChangeLog:

            * gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test.
            * gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test.
            * gcc.target/riscv/rvv/autovec/reduc/reduc_call-2.c: Adjust.
            * gcc.target/riscv/rvv/autovec/reduc/reduc_call-4.c: Ditto.


* [Bug middle-end/111401] Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS
  2023-09-13  9:31 [Bug c/111401] New: Middle-end: Missed optimization of MASK_LEN_FOLD_LEFT_PLUS juzhe.zhong at rivai dot ai
                   ` (7 preceding siblings ...)
  2023-11-02 10:50 ` cvs-commit at gcc dot gnu.org
@ 2023-11-02 22:40 ` juzhe.zhong at rivai dot ai
  8 siblings, 0 replies; 10+ messages in thread
From: juzhe.zhong at rivai dot ai @ 2023-11-02 22:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111401

JuzheZhong <juzhe.zhong at rivai dot ai> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #9 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
Fixed

