From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=I8LG=DM=gmail.com=richard.guenther@sourceware.org>
Received: from mail-lj1-x232.google.com (mail-lj1-x232.google.com [IPv6:2a00:1450:4864:20::232])
	by sourceware.org (Postfix) with ESMTPS id 48E023857027
	for <gcc-patches@gcc.gnu.org>; Wed, 26 Jul 2023 12:00:53 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 48E023857027
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=gmail.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=gmail.com
Received: by mail-lj1-x232.google.com with SMTP id 38308e7fff4ca-2b93fba1f62so97567081fa.1
        for <gcc-patches@gcc.gnu.org>; Wed, 26 Jul 2023 05:00:53 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20221208; t=1690372851; x=1690977651;
        h=content-transfer-encoding:to:subject:message-id:date:from
         :in-reply-to:references:mime-version:from:to:cc:subject:date
         :message-id:reply-to;
        bh=UVYZwpe0N5TyMBzU1+52SeaQCGzkrLbyLHms1CwSRwE=;
        b=Q1Kn1NWuZ3D4sqVg0CbLiaEgidt2vFPn+BEmLUOVwfb15fd6BWo25JRvqo9R1e4tSC
         RuormGC5zgSETy8gHdamKZk0bNipJsOd+o4XZYgCWToZPaCQBa/XpD2aESykyuCzwFcg
         3/8TxarWfS7eN8pcdXrB4IEPQ8On7/uPtioGYFImirbOujrtCUYbIEwtVv+aF9HGT8PK
         RXYplUZ87IMN1z7IYewF3PHmux76jR8esDcAI7jhI8d/ENKyg2sDgXZQGNSdS16e6CbM
         CgXYFNV809RuRN0euxyuDOizdenByl0JvYz2pKVxMafAKm+18J2WpaCX4oMO7mA81h4y
         hw9w==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20221208; t=1690372851; x=1690977651;
        h=content-transfer-encoding:to:subject:message-id:date:from
         :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc
         :subject:date:message-id:reply-to;
        bh=UVYZwpe0N5TyMBzU1+52SeaQCGzkrLbyLHms1CwSRwE=;
        b=eNK/jljpTnflkCwg0sx/aJ+KvcQcCpSCBdKa8SdfotEPGXXAd/x7emoppQ8HXuB9ii
         YHcMeh7dS2/Vv9UuKFmwYGfWLp1RT4w2e3OXVo3RON6WC4NYE3ELZKUu5ETH6rpk1TQB
         5F36uZVOrjwWFVg9ol5tB1SntKS/SIPl1dpoB+IVYP1Ls9qLo+xh94mk/BmAWCTcwhHa
         zhSDoEMxYwdT2j+Qx50StcnBii6tXH01/0td9m3hwauAyc81JrxqswFgaYnC+egYhkQq
         2wz3MypzasNSXREHYJ/Plh4eCG206A8qAuziUSZeP/EOzCKsMeNeYmY5KI2w4KVsscqA
         +lHQ==
X-Gm-Message-State: ABy/qLZyWidPU+SP/Gl+PxcGm528V8G4RNT6MQzzCLru6eVxbL2Fteya
	JqF/OZTv78zpbpKt89GtGBHSrv6u7AUD0i3reWM=
X-Google-Smtp-Source: APBJJlHyX/4KYpzhJ/ec+4w1BN10qiupfU9hubJEHjWNmdmP8KS5ReO8uI546Xlzf4/bas/iElIHbCR6RHTK60Q7AQ0=
X-Received: by 2002:a05:651c:10d2:b0:2b6:fa54:cec1 with SMTP id
 l18-20020a05651c10d200b002b6fa54cec1mr1239327ljn.48.1690372850914; Wed, 26
 Jul 2023 05:00:50 -0700 (PDT)
MIME-Version: 1.0
References: <SJ2PR01MB8635742E07E2076FA2BE0560E139A@SJ2PR01MB8635.prod.exchangelabs.com>
 <mptr0oxh8eb.fsf@arm.com> <SJ2PR01MB8635D1095B5BDDAE1CA19877E103A@SJ2PR01MB8635.prod.exchangelabs.com>
 <mptsf9cfhqz.fsf@arm.com> <SJ2PR01MB86352C18F941F7EEB65C1B55E100A@SJ2PR01MB8635.prod.exchangelabs.com>
 <CAFiYyc2pyQTVy++qbuAPhq8FNLo+k5ysEdj0JG2eyLzgsFWCdA@mail.gmail.com>
 <mptv8e7c9wi.fsf@arm.com> <CAFiYyc0F-vvXpu_UUCs_=poJw=m5Rzdgbnb8NjQVbpDZ6frYYA@mail.gmail.com>
 <mptjzunc77g.fsf@arm.com>
In-Reply-To: <mptjzunc77g.fsf@arm.com>
From: Richard Biener <richard.guenther@gmail.com>
Date: Wed, 26 Jul 2023 14:00:04 +0200
Message-ID: <CAFiYyc25JMB2KFOSiTJictPxG=eZR_jQuiXMO5RyaCjyHhVa+Q@mail.gmail.com>
Subject: Re: [PATCH] AArch64: Do not increase the vect reduction latency by
 multiplying count [PR110625]
To: Richard Biener <richard.guenther@gmail.com>, Hao Liu OS <hliu@os.amperecomputing.com>, 
	"GCC-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>, richard.sandiford@arm.com
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Spam-Status: No, score=-1.3 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

On Wed, Jul 26, 2023 at 12:12=E2=80=AFPM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Richard Biener <richard.guenther@gmail.com> writes:
> > On Wed, Jul 26, 2023 at 11:14=E2=80=AFAM Richard Sandiford
> > <richard.sandiford@arm.com> wrote:
> >>
> >> Richard Biener <richard.guenther@gmail.com> writes:
> >> > On Wed, Jul 26, 2023 at 4:02=E2=80=AFAM Hao Liu OS via Gcc-patches
> >> > <gcc-patches@gcc.gnu.org> wrote:
> >> >>
> >> >> > When was STMT_VINFO_REDUC_DEF empty?  I just want to make sure th=
at we're not papering over an issue elsewhere.
> >> >>
> >> >> Yes, I also wonder if this is an issue in vectorizable_reduction.  =
Below is the the gimple of "gcc.target/aarch64/sve/cost_model_13.c":
> >> >>
> >> >>   <bb 3>:
> >> >>   # res_18 =3D PHI <res_15(7), 0(6)>
> >> >>   # i_20 =3D PHI <i_16(7), 0(6)>
> >> >>   _1 =3D (long unsigned int) i_20;
> >> >>   _2 =3D _1 * 2;
> >> >>   _3 =3D x_14(D) + _2;
> >> >>   _4 =3D *_3;
> >> >>   _5 =3D (unsigned short) _4;
> >> >>   res.0_6 =3D (unsigned short) res_18;
> >> >>   _7 =3D _5 + res.0_6;                             <-- The current =
stmt_info
> >> >>   res_15 =3D (short int) _7;
> >> >>   i_16 =3D i_20 + 1;
> >> >>   if (n_11(D) > i_16)
> >> >>     goto <bb 7>;
> >> >>   else
> >> >>     goto <bb 4>;
> >> >>
> >> >>   <bb 7>:
> >> >>   goto <bb 3>;
> >> >>
> >> >> It looks like that STMT_VINFO_REDUC_DEF should be "res_18 =3D PHI <=
res_15(7), 0(6)>"?
> >> >> The status here is:
> >> >>   STMT_VINFO_REDUC_IDX (stmt_info): 1
> >> >>   STMT_VINFO_REDUC_TYPE (stmt_info): TREE_CODE_REDUCTION
> >> >>   STMT_VINFO_REDUC_VECTYPE (stmt_info): 0x0
> >> >
> >> > Not all stmts in the SSA cycle forming the reduction have
> >> > STMT_VINFO_REDUC_DEF set,
> >> > only the last (latch def) and live stmts have at the moment.
> >>
> >> Ah, thanks.  In that case, Hao, I think we can avoid the ICE by changi=
ng:
> >>
> >>   if ((kind =3D=3D scalar_stmt || kind =3D=3D vector_stmt || kind =3D=
=3D vec_to_scalar)
> >>       && vect_is_reduction (stmt_info))
> >>
> >> to:
> >>
> >>   if ((kind =3D=3D scalar_stmt || kind =3D=3D vector_stmt || kind =3D=
=3D vec_to_scalar)
> >>       && STMT_VINFO_LIVE_P (stmt_info)
> >>       && vect_is_reduction (stmt_info))
> >>
> >> instead of using a null check.
> >
> > But as seen you will miss stmts that are part of the reduction then?
>
> Yeah, but the code is doing a maximum of all the reductions in the loop:
>
>       /* ??? Ideally we'd do COUNT reductions in parallel, but unfortunat=
ely
>          that's not yet the case.  */
>       ops->reduction_latency =3D MAX (ops->reduction_latency, base * coun=
t);
>
> So as it stands, we only need to see each reduction (as opposed to each
> reduction statement) once.
>
> But we do want to know the length of each reduction...

Yeah, it really tells the current costing infrastructure is far from
perfect here ...

We could pass you the stmt_info for the reduction PHI (aka the
info_for_reduction)
once with a special kind, vect_reduction, so you could walk relevant stmts
from that.  You get a number of add_stmts for each reduction, that could
be a hint of the length as well ...  (but I think we always pass the 'main'
stmt_info here)

> > In principle we could put STMT_VINFO_REDUC_DEF to other stmts
> > as well.  See vectorizable_reduction in the
> >
> >   while (reduc_def !=3D PHI_RESULT (reduc_def_phi))
> >
> > loop.
>
> Nod.  That's where I'd got the STMT_VINFO_LIVE_P thing from.
>
> >> I see that vectorizable_reduction calculates a reduc_chain_length.
> >> Would it be OK to store that in the stmt_vec_info?  I suppose the
> >> AArch64 code should be multiplying by that as well.  (It would be a
> >> separate patch from this one though.)
> >
> > I don't think that's too relevant here (it also counts noop conversions=
).
>
> Bah.  I'm loath to copy that loop and just pick out the relevant
> statements though.
>
> I suppose if every statement had a STMT_VINFO_REDUC_DEF, aarch64 could
> maintain a hash map from STMT_VINFO_REDUC_DEF to total latencies, and
> then take the maximum of those total latencies.  It sounds pretty
> complex though...

Ick.  Yeah.  I think most of the costing is still nearly GIGO, I hardly see=
 any
loop vectorization disqualified because of cost (happens for BB SLP though)=
.

Richard.

> Thanks,
> Richard
>