From: Richard Biener <richard.guenther@gmail.com>
To: Feng Xue OS <fxue@os.amperecomputing.com>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH 4/8] vect: Determine input vectype for multiple lane-reducing
Date: Wed, 19 Jun 2024 15:01:19 +0200
Message-ID: <CAFiYyc1F-GLqnkex12nriVTYqXKhNm0UyFUEETf70uCt_vnnPw@mail.gmail.com>
In-Reply-To: <LV2PR01MB78393B9BE03402556710D12CF7CC2@LV2PR01MB7839.prod.exchangelabs.com>

On Sun, Jun 16, 2024 at 9:25 AM Feng Xue OS <fxue@os.amperecomputing.com> wrote:
>
> The input vectype of a reduction PHI statement must be determined before
> vect cost computation for the reduction.  Since a lane-reducing operation
> has a different input vectype from a normal one, we need to traverse all
> reduction statements to find the input vectype with the least lanes, and
> set that on the PHI statement.
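>
> As an illustration (a made-up example assuming 128-bit vectors, not a
> case taken from the patch or the testsuite), one reduction may mix
> lane-reducing operations whose input vectypes differ:
>
>   int sum = 0;
>   for (int i = 0; i < n; i++)
>     {
>       sum += (int) a8[i] * (int) b8[i];  /* DOT_PROD_EXPR, V16QI inputs */
>       sum += (int) c16[i];               /* WIDEN_SUM_EXPR, V8HI input  */
>     }
>
> Here the PHI statement should record the input vectype with the least
> lanes (V8HI in this sketch).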
>
> Thanks,
> Feng
>
> ---
> gcc/
>         * tree-vect-loop.cc (vectorizable_reduction): Determine input vectype
>         during traversal of reduction statements.
> ---
>  gcc/tree-vect-loop.cc | 72 +++++++++++++++++++++++++++++--------------
>  1 file changed, 49 insertions(+), 23 deletions(-)
>
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 0f7b125e72d..39aa5cb1197 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -7643,7 +7643,9 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>      {
>        stmt_vec_info def = loop_vinfo->lookup_def (reduc_def);
>        stmt_vec_info vdef = vect_stmt_to_vectorize (def);
> -      if (STMT_VINFO_REDUC_IDX (vdef) == -1)
> +      int reduc_idx = STMT_VINFO_REDUC_IDX (vdef);
> +
> +      if (reduc_idx == -1)
>         {
>           if (dump_enabled_p ())
>             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> @@ -7686,10 +7688,50 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>               return false;
>             }
>         }
> -      else if (!stmt_info)
> -       /* First non-conversion stmt.  */
> -       stmt_info = vdef;
> -      reduc_def = op.ops[STMT_VINFO_REDUC_IDX (vdef)];
> +      else
> +       {
> +         /* First non-conversion stmt.  */
> +         if (!stmt_info)
> +           stmt_info = vdef;
> +
> +         if (lane_reducing_op_p (op.code))
> +           {
> +             unsigned group_size = slp_node ? SLP_TREE_LANES (slp_node) : 0;
> +             tree op_type = TREE_TYPE (op.ops[0]);
> +             tree new_vectype_in = get_vectype_for_scalar_type (loop_vinfo,
> +                                                                op_type,
> +                                                                group_size);

I think doing it this way does not adhere to the vector type size constraint
with loop vectorization.  You should use vect_is_simple_use like the
original code did, as the actual vector definition determines the vector
type used.
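
Roughly what I have in mind (a sketch from memory; the exact
vect_is_simple_use overload and argument order in tree-vect-stmts.cc
may differ):

  /* Ask for the vector type of the operand's actual definition instead
     of recomputing it from the scalar type and group size.  */
  enum vect_def_type dt;
  tree op_vectype = NULL_TREE;
  if (!vect_is_simple_use (op.ops[0], loop_vinfo, &dt, &op_vectype))
    return false;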

You are always using op.ops[0] here - I think that works because the
reduction operand (reduc_idx) is the last operand of all lane-reducing
ops.  But then we should assert reduc_idx != 0 here and add a comment.
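
For reference, the operand layout of the ops lane_reducing_op_p accepts,
which is why op.ops[0] is always one of the narrow inputs and never the
accumulator:

  DOT_PROD_EXPR  <x, y, acc>   /* reduc_idx == 2 */
  SAD_EXPR       <x, y, acc>   /* reduc_idx == 2 */
  WIDEN_SUM_EXPR <x, acc>      /* reduc_idx == 1 */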

> +
> +             /* The last operand of lane-reducing operation is for
> +                reduction.  */
> +             gcc_assert (reduc_idx > 0 && reduc_idx == (int) op.num_ops - 1);
> +
> +             /* For lane-reducing operation vectorizable analysis needs the
> +                reduction PHI information */
> +             STMT_VINFO_REDUC_DEF (def) = phi_info;
> +
> +             if (!new_vectype_in)
> +               return false;
> +
> +             /* Each lane-reducing operation has its own input vectype, while
> +                reduction PHI will record the input vectype with the least
> +                lanes.  */
> +             STMT_VINFO_REDUC_VECTYPE_IN (vdef) = new_vectype_in;
> +
> +             /* To accommodate lane-reducing operations of mixed input
> +                vectypes, choose input vectype with the least lanes for the
> +                reduction PHI statement, which would result in the most
> +                ncopies for vectorized reduction results.  */
> +             if (!vectype_in
> +                 || (GET_MODE_SIZE (SCALAR_TYPE_MODE (TREE_TYPE (vectype_in)))
> +                      < GET_MODE_SIZE (SCALAR_TYPE_MODE (op_type))))
> +               vectype_in = new_vectype_in;

I know this is a fragile area, but I always wonder: since the accumulating
operand is the largest (all lane-reducing ops are widening), and that will
be equal to the type of the PHI node, how can this condition ever be true?
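
Concretely (made-up types): for a char dot-product accumulating into int,
the PHI's vector type has SImode elements while op_type is QImode, so with
vectype_in equal to the PHI's type GET_MODE_SIZE (SImode) == 4 is never
smaller than GET_MODE_SIZE (QImode) == 1 and the replacement never fires.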

ncopies is determined by the VF, so the comment is at least misleading.

> +           }
> +         else
> +           vectype_in = STMT_VINFO_VECTYPE (phi_info);

Please initialize vectype_in from phi_info before the loop (that
should never be NULL).
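
I.e. something like

  tree vectype_in = STMT_VINFO_VECTYPE (phi_info);

before the walk over the reduction chain.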

I'll note that with your patch it seems we'd initialize vectype_in to the
biggest non-accumulation vector type involved in lane-reducing ops, but the
accumulating type might still be larger.  Why, when we have multiple
lane-reducing ops, would we choose the largest input here?  I see we
eventually do

  if (slp_node)
    ncopies = 1;
  else
    ncopies = vect_get_num_copies (loop_vinfo, vectype_in);

but then IIRC we always force a single cycle def for lane-reducing ops(?).
In particular for vect_transform_reduction and SLP we rely on
SLP_TREE_NUMBER_OF_VEC_STMTS while non-SLP uses
STMT_VINFO_REDUC_VECTYPE_IN.
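
(For reference: on the non-SLP path vect_get_num_copies is essentially
VF / nunits, so with a hypothetical VF of 16 a V16QI vectype_in gives
ncopies == 1 while the V4SI PHI type gives ncopies == 4.)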

So I wonder what breaks when we set vectype_in = vector type of PHI?

Richard.

> +       }
> +
> +      reduc_def = op.ops[reduc_idx];
>        reduc_chain_length++;
>        if (!stmt_info && slp_node)
>         slp_for_stmt_info = SLP_TREE_CHILDREN (slp_for_stmt_info)[0];
> @@ -7747,6 +7789,8 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>
>    tree vectype_out = STMT_VINFO_VECTYPE (stmt_info);
>    STMT_VINFO_REDUC_VECTYPE (reduc_info) = vectype_out;
> +  STMT_VINFO_REDUC_VECTYPE_IN (reduc_info) = vectype_in;
> +
>    gimple_match_op op;
>    if (!gimple_extract_op (stmt_info->stmt, &op))
>      gcc_unreachable ();
> @@ -7831,16 +7875,6 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>           = get_vectype_for_scalar_type (loop_vinfo,
>                                          TREE_TYPE (op.ops[i]), slp_op[i]);
>
> -      /* To properly compute ncopies we are interested in the widest
> -        non-reduction input type in case we're looking at a widening
> -        accumulation that we later handle in vect_transform_reduction.  */
> -      if (lane_reducing
> -         && vectype_op[i]
> -         && (!vectype_in
> -             || (GET_MODE_SIZE (SCALAR_TYPE_MODE (TREE_TYPE (vectype_in)))
> -                 < GET_MODE_SIZE (SCALAR_TYPE_MODE (TREE_TYPE (vectype_op[i]))))))
> -       vectype_in = vectype_op[i];
> -
>        /* Record how the non-reduction-def value of COND_EXPR is defined.
>          ???  For a chain of multiple CONDs we'd have to match them up all.  */
>        if (op.code == COND_EXPR && reduc_chain_length == 1)
> @@ -7859,14 +7893,6 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>             }
>         }
>      }
> -  if (!vectype_in)
> -    vectype_in = STMT_VINFO_VECTYPE (phi_info);
> -  STMT_VINFO_REDUC_VECTYPE_IN (reduc_info) = vectype_in;
> -
> -  /* Each lane-reducing operation has its own input vectype, while reduction
> -     PHI records the input vectype with least lanes.  */
> -  if (lane_reducing)
> -    STMT_VINFO_REDUC_VECTYPE_IN (stmt_info) = vectype_in;
>
>    enum vect_reduction_type reduction_type = STMT_VINFO_REDUC_TYPE (phi_info);
>    STMT_VINFO_REDUC_TYPE (reduc_info) = reduction_type;
> --
> 2.17.1

