From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=Wd3M=DM=arm.com=richard.sandiford@sourceware.org>
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
	by sourceware.org (Postfix) with ESMTP id 0CECD3858C33
	for <gcc-patches@gcc.gnu.org>; Wed, 26 Jul 2023 10:12:22 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 0CECD3858C33
Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
	by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 4F2201692;
	Wed, 26 Jul 2023 03:13:04 -0700 (PDT)
Received: from localhost (e121540-lin.manchester.arm.com [10.32.110.72])
	by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id C8F283F67D;
	Wed, 26 Jul 2023 03:12:20 -0700 (PDT)
From: Richard Sandiford <richard.sandiford@arm.com>
To: Richard Biener <richard.guenther@gmail.com>
Mail-Followup-To: Richard Biener <richard.guenther@gmail.com>,Hao Liu OS <hliu@os.amperecomputing.com>,  "GCC-patches\@gcc.gnu.org" <gcc-patches@gcc.gnu.org>, richard.sandiford@arm.com
Cc: Hao Liu OS <hliu@os.amperecomputing.com>,  "GCC-patches\@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] AArch64: Do not increase the vect reduction latency by multiplying count [PR110625]
References: <SJ2PR01MB8635742E07E2076FA2BE0560E139A@SJ2PR01MB8635.prod.exchangelabs.com>
	<mptr0oxh8eb.fsf@arm.com>
	<SJ2PR01MB8635D1095B5BDDAE1CA19877E103A@SJ2PR01MB8635.prod.exchangelabs.com>
	<mptsf9cfhqz.fsf@arm.com>
	<SJ2PR01MB86352C18F941F7EEB65C1B55E100A@SJ2PR01MB8635.prod.exchangelabs.com>
	<CAFiYyc2pyQTVy++qbuAPhq8FNLo+k5ysEdj0JG2eyLzgsFWCdA@mail.gmail.com>
	<mptv8e7c9wi.fsf@arm.com>
	<CAFiYyc0F-vvXpu_UUCs_=poJw=m5Rzdgbnb8NjQVbpDZ6frYYA@mail.gmail.com>
Date: Wed, 26 Jul 2023 11:12:19 +0100
In-Reply-To: <CAFiYyc0F-vvXpu_UUCs_=poJw=m5Rzdgbnb8NjQVbpDZ6frYYA@mail.gmail.com>
	(Richard Biener's message of "Wed, 26 Jul 2023 12:02:01 +0200")
Message-ID: <mptjzunc77g.fsf@arm.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Status: No, score=-20.4 required=5.0 tests=BAYES_00,KAM_DMARC_NONE,KAM_DMARC_STATUS,KAM_LAZY_DOMAIN_SECURITY,KAM_SHORT,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

Richard Biener <richard.guenther@gmail.com> writes:
> On Wed, Jul 26, 2023 at 11:14=E2=80=AFAM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>
>> Richard Biener <richard.guenther@gmail.com> writes:
>> > On Wed, Jul 26, 2023 at 4:02=E2=80=AFAM Hao Liu OS via Gcc-patches
>> > <gcc-patches@gcc.gnu.org> wrote:
>> >>
>> >> > When was STMT_VINFO_REDUC_DEF empty?  I just want to make sure that=
 we're not papering over an issue elsewhere.
>> >>
>> >> Yes, I also wonder if this is an issue in vectorizable_reduction.  Be=
low is the the gimple of "gcc.target/aarch64/sve/cost_model_13.c":
>> >>
>> >>   <bb 3>:
>> >>   # res_18 =3D PHI <res_15(7), 0(6)>
>> >>   # i_20 =3D PHI <i_16(7), 0(6)>
>> >>   _1 =3D (long unsigned int) i_20;
>> >>   _2 =3D _1 * 2;
>> >>   _3 =3D x_14(D) + _2;
>> >>   _4 =3D *_3;
>> >>   _5 =3D (unsigned short) _4;
>> >>   res.0_6 =3D (unsigned short) res_18;
>> >>   _7 =3D _5 + res.0_6;                             <-- The current st=
mt_info
>> >>   res_15 =3D (short int) _7;
>> >>   i_16 =3D i_20 + 1;
>> >>   if (n_11(D) > i_16)
>> >>     goto <bb 7>;
>> >>   else
>> >>     goto <bb 4>;
>> >>
>> >>   <bb 7>:
>> >>   goto <bb 3>;
>> >>
>> >> It looks like that STMT_VINFO_REDUC_DEF should be "res_18 =3D PHI <re=
s_15(7), 0(6)>"?
>> >> The status here is:
>> >>   STMT_VINFO_REDUC_IDX (stmt_info): 1
>> >>   STMT_VINFO_REDUC_TYPE (stmt_info): TREE_CODE_REDUCTION
>> >>   STMT_VINFO_REDUC_VECTYPE (stmt_info): 0x0
>> >
>> > Not all stmts in the SSA cycle forming the reduction have
>> > STMT_VINFO_REDUC_DEF set,
>> > only the last (latch def) and live stmts have at the moment.
>>
>> Ah, thanks.  In that case, Hao, I think we can avoid the ICE by changing:
>>
>>   if ((kind =3D=3D scalar_stmt || kind =3D=3D vector_stmt || kind =3D=3D=
 vec_to_scalar)
>>       && vect_is_reduction (stmt_info))
>>
>> to:
>>
>>   if ((kind =3D=3D scalar_stmt || kind =3D=3D vector_stmt || kind =3D=3D=
 vec_to_scalar)
>>       && STMT_VINFO_LIVE_P (stmt_info)
>>       && vect_is_reduction (stmt_info))
>>
>> instead of using a null check.
>
> But as seen you will miss stmts that are part of the reduction then?

Yeah, but the code is doing a maximum of all the reductions in the loop:

      /* ??? Ideally we'd do COUNT reductions in parallel, but unfortunately
	 that's not yet the case.  */
      ops->reduction_latency =3D MAX (ops->reduction_latency, base * count);

So as it stands, we only need to see each reduction (as opposed to each
reduction statement) once.

But we do want to know the length of each reduction...

> In principle we could put STMT_VINFO_REDUC_DEF to other stmts
> as well.  See vectorizable_reduction in the
>
>   while (reduc_def !=3D PHI_RESULT (reduc_def_phi))
>
> loop.

Nod.  That's where I'd got the STMT_VINFO_LIVE_P thing from.

>> I see that vectorizable_reduction calculates a reduc_chain_length.
>> Would it be OK to store that in the stmt_vec_info?  I suppose the
>> AArch64 code should be multiplying by that as well.  (It would be a
>> separate patch from this one though.)
>
> I don't think that's too relevant here (it also counts noop conversions).

Bah.  I'm loath to copy that loop and just pick out the relevant
statements though.

I suppose if every statement had a STMT_VINFO_REDUC_DEF, aarch64 could
maintain a hash map from STMT_VINFO_REDUC_DEF to total latencies, and
then take the maximum of those total latencies.  It sounds pretty
complex though...

Thanks,
Richard