public inbox for gcc-patches@gcc.gnu.org
From: Richard Biener <richard.guenther@gmail.com>
To: Richard Biener <richard.guenther@gmail.com>,
	Hao Liu OS <hliu@os.amperecomputing.com>,
	 "GCC-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
	richard.sandiford@arm.com
Subject: Re: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]
Date: Wed, 26 Jul 2023 14:00:04 +0200	[thread overview]
Message-ID: <CAFiYyc25JMB2KFOSiTJictPxG=eZR_jQuiXMO5RyaCjyHhVa+Q@mail.gmail.com> (raw)
In-Reply-To: <mptjzunc77g.fsf@arm.com>

On Wed, Jul 26, 2023 at 12:12 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Richard Biener <richard.guenther@gmail.com> writes:
> > On Wed, Jul 26, 2023 at 11:14 AM Richard Sandiford
> > <richard.sandiford@arm.com> wrote:
> >>
> >> Richard Biener <richard.guenther@gmail.com> writes:
> >> > On Wed, Jul 26, 2023 at 4:02 AM Hao Liu OS via Gcc-patches
> >> > <gcc-patches@gcc.gnu.org> wrote:
> >> >>
> >> >> > When was STMT_VINFO_REDUC_DEF empty?  I just want to make sure that we're not papering over an issue elsewhere.
> >> >>
> >> >> Yes, I also wonder if this is an issue in vectorizable_reduction.  Below is the gimple of "gcc.target/aarch64/sve/cost_model_13.c":
> >> >>
> >> >>   <bb 3>:
> >> >>   # res_18 = PHI <res_15(7), 0(6)>
> >> >>   # i_20 = PHI <i_16(7), 0(6)>
> >> >>   _1 = (long unsigned int) i_20;
> >> >>   _2 = _1 * 2;
> >> >>   _3 = x_14(D) + _2;
> >> >>   _4 = *_3;
> >> >>   _5 = (unsigned short) _4;
> >> >>   res.0_6 = (unsigned short) res_18;
> >> >>   _7 = _5 + res.0_6;                             <-- The current stmt_info
> >> >>   res_15 = (short int) _7;
> >> >>   i_16 = i_20 + 1;
> >> >>   if (n_11(D) > i_16)
> >> >>     goto <bb 7>;
> >> >>   else
> >> >>     goto <bb 4>;
> >> >>
> >> >>   <bb 7>:
> >> >>   goto <bb 3>;
> >> >>
> >> >> It looks like STMT_VINFO_REDUC_DEF should be "res_18 = PHI <res_15(7), 0(6)>"?
> >> >> The status here is:
> >> >>   STMT_VINFO_REDUC_IDX (stmt_info): 1
> >> >>   STMT_VINFO_REDUC_TYPE (stmt_info): TREE_CODE_REDUCTION
> >> >>   STMT_VINFO_REDUC_VECTYPE (stmt_info): 0x0
> >> >
> >> > Not all stmts in the SSA cycle forming the reduction have
> >> > STMT_VINFO_REDUC_DEF set; at the moment only the last (latch def)
> >> > and live stmts have it.
> >>
> >> Ah, thanks.  In that case, Hao, I think we can avoid the ICE by changing:
> >>
> >>   if ((kind == scalar_stmt || kind == vector_stmt || kind == vec_to_scalar)
> >>       && vect_is_reduction (stmt_info))
> >>
> >> to:
> >>
> >>   if ((kind == scalar_stmt || kind == vector_stmt || kind == vec_to_scalar)
> >>       && STMT_VINFO_LIVE_P (stmt_info)
> >>       && vect_is_reduction (stmt_info))
> >>
> >> instead of using a null check.
> >
> > But as seen you will miss stmts that are part of the reduction then?
>
> Yeah, but the code takes the maximum over all the reductions in the loop:
>
>       /* ??? Ideally we'd do COUNT reductions in parallel, but unfortunately
>          that's not yet the case.  */
>       ops->reduction_latency = MAX (ops->reduction_latency, base * count);
>
> So as it stands, we only need to see each reduction (as opposed to each
> reduction statement) once.
>
> But we do want to know the length of each reduction...

Yeah, it really shows that the current costing infrastructure is far from
perfect here ...

We could pass you the stmt_info for the reduction PHI (aka the
info_for_reduction) once, with a special kind, vect_reduction, so that you
could walk the relevant stmts from that.  You also get a number of
add_stmt_cost calls for each reduction; that count could be a hint of the
length as well ... (but I think we always pass the 'main' stmt_info here).
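
Roughly, walking the cycle from the PHI could look like the completely
untested sketch below (loosely modeled on the reduc_chain_length walk in
vectorizable_reduction; it glosses over PHIs, calls and conversions in the
cycle, and the helper name is made up):

  /* Untested sketch: given the stmt_vec_info of the reduction PHI
     (i.e. what info_for_reduction returns), follow the SSA cycle from
     the loop-latch value back to the PHI result and count the stmts
     on the way.  */
  static unsigned int
  count_reduction_stmts (loop_vec_info loop_vinfo, stmt_vec_info reduc_phi_info)
  {
    gphi *reduc_phi = as_a<gphi *> (reduc_phi_info->stmt);
    class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);

    /* Start from the value flowing in from the loop latch ...  */
    tree reduc_def = PHI_ARG_DEF_FROM_EDGE (reduc_phi, loop_latch_edge (loop));

    unsigned int length = 0;
    /* ... and follow STMT_VINFO_REDUC_IDX back to the PHI result.  */
    while (reduc_def != PHI_RESULT (reduc_phi))
      {
        stmt_vec_info def = loop_vinfo->lookup_def (reduc_def);
        if (!def)
          break;
        def = vect_stmt_to_vectorize (def);
        int reduc_idx = STMT_VINFO_REDUC_IDX (def);
        if (reduc_idx < 0)
          break;
        /* For a plain assignment operand 0 is the lhs, so the reduction
           operand is at reduc_idx + 1; calls would need different
           handling.  */
        reduc_def = gimple_op (def->stmt, reduc_idx + 1);
        length++;
      }
    return length;
  }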

> > In principle we could set STMT_VINFO_REDUC_DEF on the other stmts
> > as well.  See the
> >
> >   while (reduc_def != PHI_RESULT (reduc_def_phi))
> >
> > loop in vectorizable_reduction.
>
> Nod.  That's where I'd got the STMT_VINFO_LIVE_P thing from.
>
> >> I see that vectorizable_reduction calculates a reduc_chain_length.
> >> Would it be OK to store that in the stmt_vec_info?  I suppose the
> >> AArch64 code should be multiplying by that as well.  (It would be a
> >> separate patch from this one though.)
> >
> > I don't think that's too relevant here (it also counts noop conversions).
>
> Bah.  I'm loath to copy that loop and just pick out the relevant
> statements though.
>
> I suppose if every statement had a STMT_VINFO_REDUC_DEF, aarch64 could
> maintain a hash map from STMT_VINFO_REDUC_DEF to total latencies, and
> then take the maximum of those total latencies.  It sounds pretty
> complex though...

Ick.  Yeah.  I think most of the costing is still nearly GIGO (garbage in,
garbage out); I hardly see any loop vectorization disqualified because of
cost (it does happen for BB SLP, though).
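
FWIW, a rough sketch of the hash-map idea above (completely untested; the
struct and member names below are made up for illustration, and in practice
the state would live in the aarch64 vector costing class):

  /* Untested sketch: accumulate a latency total per reduction, keyed on
     its STMT_VINFO_REDUC_DEF, and keep a running maximum of the totals
     instead of the current single reduction_latency value.  */
  struct reduction_latency_tracker
  {
    hash_map<stmt_vec_info, unsigned int> totals;
    unsigned int max_latency = 0;

    void
    record (stmt_vec_info stmt_info, unsigned int latency)
    {
      stmt_vec_info reduc_def = STMT_VINFO_REDUC_DEF (stmt_info);
      if (!reduc_def)
        /* Not (known to be) part of a reduction.  */
        return;
      bool existed;
      unsigned int &total = totals.get_or_insert (reduc_def, &existed);
      if (!existed)
        total = 0;
      total += latency;
      /* Totals only grow, so the maximum can be maintained incrementally.  */
      max_latency = MAX (max_latency, total);
    }
  };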

Richard.

> Thanks,
> Richard
>


Thread overview: 21+ messages
2023-07-19  4:33 Hao Liu OS
2023-07-24  1:58 ` Hao Liu OS
2023-07-24 11:10 ` Richard Sandiford
2023-07-25  9:10   ` Hao Liu OS
2023-07-25  9:44     ` Richard Sandiford
2023-07-26  2:01       ` Hao Liu OS
2023-07-26  8:47         ` Richard Biener
2023-07-26  9:14           ` Richard Sandiford
2023-07-26 10:02             ` Richard Biener
2023-07-26 10:12               ` Richard Sandiford
2023-07-26 12:00                 ` Richard Biener [this message]
2023-07-26 12:54             ` Hao Liu OS
2023-07-28 10:06               ` Hao Liu OS
2023-07-28 17:35               ` Richard Sandiford
2023-07-31  2:39                 ` Hao Liu OS
2023-07-31  9:11                   ` Richard Sandiford
2023-07-31  9:25                     ` Hao Liu OS
2023-08-01  9:43                     ` Hao Liu OS
2023-08-02  3:45                       ` Hao Liu OS
2023-08-03  9:33                         ` Hao Liu OS
2023-08-03 10:10                         ` Richard Sandiford
